  1. Oct 2025
    1. Author response:

      The issue of a control without blue light illumination was raised. Clearly, without the light we would not obtain any signal in the fluorescence microscopy experiments, so such a control would not be very informative. Instead, we varied the level of blue light illumination in the fluorescence microscopy experiments (figure 4A), and the response of the bacteria scales with dosage. It is very hard to find an explanation other than that the blue light is stressing the bacteria and modulating their membrane potentials.

      One of the referees refuses to see wavefronts in our microscopy data. We struggle to understand whether it is an issue with definitions (Waigh has published a tutorial on the subject in Chapter 5 of his book ‘The physics of bacteria: from cells to biofilms’, T.A.Waigh, CUP, 2024 – figure 5.1 shows a sketch) or something subtler on diffusion in excitable systems. We stand by our claim that we observe wavefronts, similar to those observed by Prindle et al<sup>1</sup> and Blee et al<sup>2</sup> for B. subtilis biofilms.

      The referee is questioning our use of ThT to probe the membrane potential. We believe the Pilizota and Strahl groups are treating the E. coli as unexcitable cells, leading to their problems. Instead, we believe E. coli cells are excitable (containing the voltage-gated ion channel Kch) and we now clearly state this in the manuscript. Furthermore, we include a section here discussing some of the issues with ThT.


      Use of ThT as a voltage sensor in cells

      ThT is now used reasonably widely in the microbiology community as a voltage sensor in both bacterial [Prindle et al]<sup>1</sup> and fungal cells [Pena et al]<sup>12</sup>. ThT is a small cationic fluorophore that loads into the cells in proportion to their membrane potential, thus allowing the membrane potential to be measured from fluorescence microscopy measurements.
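
      As a rough illustration of how a cationic dye reading can be turned into a voltage, the sketch below assumes ideal Nernstian partitioning and that fluorescence intensity is proportional to dye concentration. Both are simplifying assumptions rather than results from our experiments, and the function name and the example ratio are hypothetical.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1

def membrane_voltage_from_dye_ratio(ratio_in_out, temperature_k=310.15, z=1):
    """Membrane voltage (mV, inside relative to outside) for an ideal
    Nernstian cationic dye, given the inside/outside concentration ratio.

    Assumption: the ratio of fluorescence intensities equals the ratio of
    dye concentrations (no quenching, binding or photobleaching).
    """
    if ratio_in_out <= 0:
        raise ValueError("ratio_in_out must be positive")
    return -1000.0 * (R * temperature_k) / (z * F) * math.log(ratio_in_out)

# Hypothetical example: a hundredfold accumulation of dye inside the cell.
example_voltage_mv = membrane_voltage_from_dye_ratio(100.0)
print(f"Estimated membrane voltage: {example_voltage_mv:.1f} mV")
```

      A larger accumulation of the cationic dye inside the cell corresponds to a more negative (hyperpolarized) membrane voltage, which is the sense in which the fluorescence tracks the membrane potential.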

      Previously ThT was widely used to quantify the growth of amyloids in molecular biology experiments (standardized protocols exist and dedicated software has been created)<sup>13</sup> and there is a long history of its use<sup>14</sup>. ThT fluorescence is bright, stable and slow to photobleach.

      Author response image 1 shows a schematic diagram of the ThT loading in E. coli in our experiments in response to illumination with blue light. Similar results were previously presented by Mancini et al<sup>15</sup>, but regimes 2 and 3 were mistakenly labelled as artefacts.

      Author response image 1.

      Schematic diagram of ThT loading during an experiment with E. coli cells under blue light illumination i.e. ThT fluorescence as a function of time. Three empirical regimes for the fluorescence are shown (1, 2 and 3).

      The classic study of Prindle et al on bacterial biofilm electrophysiology established the use of ThT in B. subtilis biofilms by showing that similar results were obtained with DiSc3<sup>1</sup>, which is widely used as a Nernstian voltage sensor in cellular biology, e.g. for mitochondrial membrane potentials in eukaryotic organisms, where there is a large literature. We repeated this comparative calibration of ThT against DiSc3 in a previous publication with both B. subtilis and P. aeruginosa cells<sup>2</sup>. ThT thus functioned well in our previous publications with Gram-positive and Gram-negative cells.

      However, to our knowledge, there are now two groups questioning the use of ThT and DiSc3 as voltage sensors with E. coli cells<sup>15-16</sup>. The first, by the Pilizota group, claims that ThT only works as a voltage sensor in regime 1 of Author response image 1, using a method based on the rate of rotation of flagellar motors. Another, slightly contradictory, study by the Strahl group<sup>16</sup> claims that DiSc3 only acts as a voltage sensor with the addition of an ionophore for potassium, which allows free movement of potassium through the E. coli membranes.

      Our resolution to this contradiction is that ThT does indeed work reasonably well with E. coli. The Pilizota group’s model for rotating flagellar motors assumes that the membrane voltage does not vary due to excitability (otherwise a non-linear Hodgkin-Huxley type model would be needed to quantify their results), i.e. that E. coli cells are unexcitable. We show clearly in our study that ThT loading in E. coli is a function of irradiation with blue light and is a stress response of the excitable cells. This contradicts the Pilizota group’s model. Their model also requires an awkward fiction to explain why cells unload and then reload ThT in regimes 2 and 3 of Author response image 1, namely variable membrane partitioning of the ThT. Our simpler explanation is that the membrane voltage is changing, and no membrane permeability switch needs to be invoked. The Strahl group’s<sup>16</sup> results with DiSc3 are also explained by a neglect of the excitable nature of E. coli cells that are reacting to blue light irradiation. Adding ionophores to the E. coli membranes makes the cells unexcitable, reduces their response to blue light and thus leads to simple loading of DiSc3 (the physiological control of K+ in the cells by voltage-gated ion channels has been short-circuited by the addition of the ionophore).

      Further evidence for our model that ThT functions as a voltage sensor with E. coli includes:

      1) The 3 regimes in Author response image 1 from ThT correlate well with measurements of extracellular potassium ion concentration using TMRM i.e. all 3 regimes in Author response image 1 are visible with this separate dye (figure 1d).

      2) We are able to switch regime 3 in Author response image 1 off and then on again by using knock downs of the potassium ion channel Kch in the membranes of the E. coli and then reinserting the gene back into the knock downs. This cannot be explained by the Pilizota model.

      We conclude that ThT works reasonably well as a sensor of membrane voltage in E. coli, and that the contradictory conclusions of the previous studies<sup>15-16</sup> arise because they neglect the excitable nature of the membrane voltage of E. coli cells in response to the light used to make the ThT fluoresce.

      Three further criticisms of the Mancini et al method<sup>15</sup> for calibrating membrane voltages include:

      1) E. coli cells have clutches that are not included in their models. Otherwise the rotation of the flagella would be entirely enslaved to the membrane voltage allowing the bacteria no freedom to modulate their speed of motility.

      2) Ripping off the flagella may perturb the integrity of the cell membrane and lead to different loading of the ThT in the E. coli cells.

      3) Most seriously, the method ignores the activity of many other ion channels (beyond H+) on the membrane voltage that are known to exist with E. coli cells e.g. Kch for K+ ions. The Pilizota group uses a simple Nernstian battery model developed for mitochondria in the 1960s. It is not adequate to explain our results.

      An additional criticism of the Winkel et al study<sup>17</sup> from the Strahl group is that it indiscriminately switches between discussion of mitochondria and bacteria e.g. on page 8 ‘As a consequence the membrane potential is dominated by H+’. Mitochondria are slightly alkaline intracellular organelles with external ion concentrations in the cytoplasm that are carefully controlled by the eukaryotic cells. E. coli are not, i.e. they have neutral internal pHs, experience widely varying extracellular ionic concentrations and have reinforced outer membranes to resist osmotic shocks (in contrast, mitochondria can easily swell in response to moderate changes in osmotic pressure).

      A quick calculation of the equilibrium membrane voltage of E. coli can easily be done using the Nernst equation, which depends on the extracellular ion concentrations defined by the growth media (the intracellular ion concentrations in E. coli are 0.2 M K+ and 10<sup>-7</sup> M H+, i.e. there is a factor of a million fewer H+ ions). Thus, in contradiction to the claims of the groups of Pilizota<sup>15</sup> and Strahl<sup>17</sup>, H+ is a minority determinant of the membrane voltage of E. coli. The main determinant is K+. For a textbook version of this point the authors can refer to Chapter 4 of D. White, et al’s ‘The physiology and biochemistry of prokaryotes’, OUP, 2012, 4th edition.

      Even in mitochondria the assumption that H+ dominates the membrane potential and the cells are unexcitable can be questioned e.g. people have observed pulsatile depolarization phenomena with mitochondria<sup>18-19</sup>. A large number of K+ channels are now known to occur in mitochondrial membranes (not to mention Ca2+ channels; mitochondria have extensive stores of Ca2+) and they are implicated in mitochondrial membrane potentials. In this respect the seminal Nobel prize winning research of Peter Mitchell (1961) on mitochondria needs to be amended<sup>20</sup>. Furthermore, the mitochondrial work is clearly inapplicable to bacteria (the proton motive force, PMF, will instead subtly depend on non-linear Hodgkin-Huxley equations for the excitable membrane potential, similar to those presented in the current article). The mathematical biology community has developed a much more sophisticated framework to describe the activity of electrically excitable cells (e.g. neurons, sensory cells and cardiac cells), beyond Mitchell’s use of simple stationary equilibrium thermodynamics to define the Proton Motive Force via the electrochemical potential of a proton (the use of the word ‘force’ is unfortunate, since it is a potential). The tools developed in the field of mathematical electrophysiology<sup>8</sup> should be more extensively applied to bacteria, fungi, mitochondria and chloroplasts if real progress is to be made.
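
      The quick calculation referred to above can be sketched as follows. The intracellular K+ concentration (0.2 M) is the value quoted in the text; the extracellular K+ concentration and the temperature are hypothetical placeholders that would have to be replaced by the values for the actual growth medium.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1

def nernst_potential_mv(c_out, c_in, z=1, temperature_k=310.15):
    """Equilibrium (Nernst) potential in mV, inside relative to outside,
    for a single ion species with valence z."""
    return 1000.0 * (R * temperature_k) / (z * F) * math.log(c_out / c_in)

K_IN = 0.2      # M, intracellular K+ quoted in the text
K_OUT = 0.005   # M, assumed extracellular K+ (hypothetical medium)

e_k = nernst_potential_mv(K_OUT, K_IN)
print(f"E_K = {e_k:.1f} mV")
```

      With these assumed values the equilibrium potential for K+ comes out at about -99 mV; a different growth medium would shift this number through the logarithm of the concentration ratio.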


      Related to the previous point, we now cite articles from the Pilizota and Strahl groups in the main text (one from each group). Unfortunately, the space constraints of eLife mean we cannot make a more detailed discussion in the main article.

      In terms of modelling the ion channels, the Hodgkin-Huxley type model proposes that the Kch ion channel can be modelled as a typical voltage-gated potassium ion channel i.e. with an 𝑛<sup>4</sup> term in its conductivity. The literature agrees that Kch is a voltage-gated potassium ion channel based on its primary sequence<sup>3</sup>. The protein has the typical 6 transmembrane helix motif for a voltage-gated ion channel. The agent-based model assumes little about the structure of ion channels in E. coli, other than that they release potassium in response to a threshold potassium concentration in their environment. The agent-based model is thus robust to the exact molecular details chosen and predicts the anomalous transport of the potassium wavefronts reasonably well (the modelling was extended in a recent Physical Review E article<sup>4</sup>). Such a description of reaction-anomalous diffusion phenomena has not to our knowledge been previously achieved in the literature<sup>5</sup> and in general could be used to describe other signaling molecules.
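
      To make the 𝑛<sup>4</sup> term concrete, the following minimal sketch integrates a single Hodgkin-Huxley gating variable and reports the resulting potassium conductance during a voltage step. The rate functions and maximal conductance are the classic squid-axon values of Hodgkin and Huxley<sup>6</sup>, used purely as placeholders; they are not the parameters fitted for Kch in the manuscript.

```python
import math

def alpha_n(v):
    """Opening rate (1/ms); v is depolarization from rest in mV."""
    x = 10.0 - v
    if abs(x) < 1e-9:
        return 0.1  # analytic limit of the expression below as x -> 0
    return 0.01 * x / (math.exp(x / 10.0) - 1.0)

def beta_n(v):
    """Closing rate (1/ms)."""
    return 0.125 * math.exp(-v / 80.0)

def n_inf(v):
    """Steady-state open probability of one gating particle."""
    a, b = alpha_n(v), beta_n(v)
    return a / (a + b)

def potassium_conductance(v_step, g_max=36.0, t_end=20.0, dt=0.01):
    """Conductance g_K = g_max * n**4 (mS/cm^2) after a step from rest to
    v_step, integrating dn/dt = alpha*(1 - n) - beta*n by forward Euler."""
    n = n_inf(0.0)
    trace = [g_max * n ** 4]
    for _ in range(int(round(t_end / dt))):
        n += dt * (alpha_n(v_step) * (1.0 - n) - beta_n(v_step) * n)
        trace.append(g_max * n ** 4)
    return trace

g_trace = potassium_conductance(v_step=50.0)
print(f"g_K rises from {g_trace[0]:.2f} to {g_trace[-1]:.2f} mS/cm^2")
```

      The fourth power produces the characteristic sigmoidal, delayed rise of the conductance after depolarization, which is the behaviour a typical voltage-gated potassium channel is expected to show.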

      1. Prindle, A.; Liu, J.; Asally, M.; Ly, S.; Garcia-Ojalvo, J.; Süel, G. M., Ion channels enable electrical communication in bacterial communities. Nature 2015, 527, 59.

      2. Blee, J. A.; Roberts, I. S.; Waigh, T. A., Membrane potentials, oxidative stress and the dispersal response of bacterial biofilms to 405 nm light. Physical Biology 2020, 17, 036001.

      3. Milkman, R., An E. coli homologue of eukaryotic potassium channel proteins. PNAS 1994, 91, 3510-3514.

      4. Martorelli, V.; Akabuogu, E. U.; Krasovec, R.; Roberts, I. S.; Waigh, T. A., Electrical signaling in three-dimensional bacterial biofilms using an agent-based fire-diffuse-fire model. Physical Review E 2024, 109, 054402.

      5. Waigh, T. A.; Korabel, N., Heterogeneous anomalous transport in cellular and molecular biology. Reports on Progress in Physics 2023, 86, 126601.

      6. Hodgkin, A. L.; Huxley, A. F., A quantitative description of membrane current and its application to conduction and excitation in nerve. Journal of Physiology 1952, 117, 500.

      7. Dawson, S. P.; Keizer, J.; Pearson, J. E., Fire-diffuse-fire model of dynamics of intracellular calcium waves. PNAS 1999, 96, 606.

      8. Keener, J.; Sneyd, J., Mathematical Physiology. Springer: 2009.

      9. Coombes, S., The effect of ion pumps on the speed of travelling waves in the fire-diffuse-fire model of Ca2+ release. Bulletin of Mathematical Biology 2001, 63, 1.

      10. Blee, J. A.; Roberts, I. S.; Waigh, T. A., Spatial propagation of electrical signals in circular biofilms. Physical Review E 2019, 100, 052401.

      11. Gorochowski, T. E.; Matyjaszkiewicz, A.; Todd, T.; Oak, N.; Kowalska, K., BSim: an agent-based tool for modelling bacterial populations in systems and synthetic biology. PLoS One 2012, 7, 1.

      12. Pena, A.; Sanchez, N. S.; Padilla-Garfias, F.; Ramiro-Cortes, Y.; Araiza-Villanueva, M.; Calahorra, M., The use of thioflavin T for the estimation and measurement of the plasma membrane electric potential difference in different yeast strains. Journal of Fungi 2023, 9 (9), 948.

      13. Xue, C.; Lin, T. Y.; Chang, D.; Guo, Z., Thioflavin T as an amyloid dye: fibril quantification, optimal concentration and effect on aggregation. Royal Society Open Science 2017, 4, 160696.

      14. Meisl, G.; Kirkegaard, J. B.; Arosio, P.; Michaels, T. C. T.; Vendruscolo, M.; Dobson, C. M.; Linse, S.; Knowles, T. P. J., Molecular mechanisms of protein aggregation from global fitting of kinetic models. Nature Protocols 2016, 11 (2), 252-272.

      15. Mancini, L.; Tian, T.; Guillaume, T.; Pu, Y.; Li, Y.; Lo, C. J.; Bai, F.; Pilizota, T., A general workflow for characterization of Nernstian dyes and their effects on bacterial physiology. Biophysical Journal 2020, 118 (1), 4-14.

      16. Buttress, J. A.; Halte, M.; Winkel, J. D. t.; Erhardt, M.; Popp, P. F.; Strahl, H., A guide for membrane potential measurements in Gram-negative bacteria using voltage-sensitive dyes. Microbiology 2022, 168, 001227.

      17. Derk te Winkel, J.; Gray, D. A.; Seistrup, K. H.; Hamoen, L. W.; Strahl, H., Analysis of antimicrobial-triggered membrane depolarization using voltage sensitive dyes. Frontiers in Cell and Developmental Biology 2016, 4, 29.

      18. Schwarzländer, M.; Logan, D. C.; Johnston, I. G.; Jones, N. S.; Meyer, A. J.; Fricker, M. D.; Sweetlove, L. J., Pulsing of membrane potential in individual mitochondria. The Plant Cell 2012, 24, 1188-1201.

      19. Huser, J.; Blatter, L. A., Fluctuations in mitochondrial membrane potential caused by repetitive gating of the permeability transition pore. Biochemical Journal 1999, 343, 311-317.

      20. Mitchell, P., Coupling of phosphorylation to electron and hydrogen transfer by a chemi-osmotic type of mechanism. Nature 1961, 191 (4784), 144-148.

      21. Baba, T.; Ara, M.; Hasegawa, Y.; Takai, Y.; Okumura, Y.; Baba, M.; Datsenko, K. A.; Tomita, M.; Wanner, B. L.; Mori, H., Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Molecular Systems Biology 2006, 2, 1.

      22. Schindelin, J.; et al., Fiji: an open-source platform for biological-image analysis. Nature Methods 2012, 9, 676.

      23. Hartmann, R.; et al., Quantitative image analysis of microbial communities with BiofilmQ. Nature Microbiology 2021, 6 (2), 151.


      The following is the authors’ response to the original reviews.

      Critical synopsis of the articles cited by referee 2:

      (1) ‘Generalized workflow for characterization of Nernstian dyes and their effects on bacterial physiology’, L.Mancini et al, Biophysical Journal, 2020, 118, 1, 4-14.

      This is the central article used by referee 2 to argue that there are issues with the calibration of ThT for the measurement of membrane potentials. The authors use a simple Nernstian battery (SNB) model and unfortunately it is wrong when voltage-gated ion channels occur. Huge oscillations occur in the membrane potentials of E. coli that cannot be described by the SNB model. Instead a Hodgkin-Huxley model is needed, as shown in our eLife manuscript and multiple other studies (see above). Arrhenius kinetics are assumed in the SNB model for pumping with no real evidence and the generalized workflow involves ripping the flagella off the bacteria! The authors construct an elaborate ‘work flow’ to ensure their ThT results can be interpreted using their erroneous SNB model over a limited range of parameters.

      (2) ‘Non-equivalence of membrane voltage and ion-gradient as driving forces for the bacterial flagellar motor at low load’, C.J.Lo, et al, Biophysical Journal, 2007, 93, 1, 294.

      An odd de novo chimeric species is developed using an E. coli chassis which uses Na+ instead of H+ for the motility of its flagellar motor. Its relevance to wild-type E. coli is not clear, due to the massive physiological perturbations involved. An SNB model is used to fit the data over a very limited parameter range, with all the concomitant errors.

      (3) ‘Single-cell bacterial electrophysiology reveals mechanisms of stress-induced damage’, E.Krasnopeeva, et al, Biophysical Journal, 2019, 116, 2390.

      The abstract says ‘PMF defines the physiological state of the cell’. This statement is hyperbolic. An extremely wide range of molecules contribute to the physiological state of a cell. PMF does not even define the electrophysiology of the cell e.g. via the membrane potential. There are 0.2 M of K+ compared with 0.0000001 M of H+ in E. coli, so K+ is arguably a million times more important for the membrane potential than H+ and thus the electrophysiology!

      Equation (1) in the manuscript assumes no other ions are exchanged during the experiments other than H+. This is a very bad approximation when voltage-gated potassium ion channels move the majority ion (K+) around!

      In our model Figure 4A is better explained by depolarisation due to K+ channels closing than by direct irreversible photodamage. Why does the ThT fluorescence increase again for the second hyperpolarization event if the ThT is supposed to be damaged? It does not make sense.

      (4) ‘The proton motive force determines E. coli robustness to extracellular pH’, G.Terradot et al, 2024, preprint.

      This article expounds the SNB model once more. It still ignores the voltage-gated ion channels. Furthermore, it ignores the effect of the dominant ion in E. coli, K+. The manuscript is incorrect as a result and I would not recommend publication.

      In general, an important problem is being researched i.e. how the membrane potential of E. coli is related to motility, but there are serious flaws in the SNB approach and the experimental methodology appears tenuous.

      Answers to specific questions raised by the referees

      Reviewer #1 (Public Review):

      Summary:

      Cell-to-cell communication is essential for higher functions in bacterial biofilms. Electrical signals have proven effective in transmitting signals across biofilms. These signals are then used to coordinate cellular metabolisms or to increase antibiotic tolerance. Here, the authors have reported for the first time coordinated oscillation of membrane potential in E. coli biofilms that may have a functional role in photoprotection.

      Strengths:

      - The authors report original data.

      - For the first time, they showed that coordinated oscillations in membrane potential occur in E. coli biofilms.

      - The authors revealed a complex two-phase dynamic involving distinct molecular response mechanisms.

      - The authors developed two rigorous models inspired by 1) Hodgkin-Huxley model for the temporal dynamics of membrane potential and 2) Fire-Diffuse-Fire model for the propagation of the electric signal.

      - Since its discovery by comparative genomics, the Kch ion channel has not been associated with any specific phenotype in E. coli. Here, the authors proposed a functional role for the putative K+ Kch channel : enhancing survival under photo-toxic conditions.

      We thank the referee for their positive evaluations and agree with these statements.

      Weaknesses:

      - Since the flow of fresh medium is stopped at the beginning of the acquisition, environmental parameters such as pH and RedOx potential are likely to vary significantly during the experiment. It is therefore important to exclude the contributions of these variations to ensure that the electrical response is only induced by light stimulation. Unfortunately, no control experiments were carried out to address this issue.

      The electrical responses occur almost instantaneously when the stimulation with blue light begins, i.e. they are too fast to be due to a build-up of pH changes. We are not sure what the referee means by redox potential, since it is an attribute of all chemicals that are able to donate/receive electrons. The electrical response to stress appears to be caused by ROS, since when ROS scavengers are added the electrical response is removed, i.e. pH plays a very small minority role, if any.

      - Furthermore, the control parameter of the experiment (light stimulation) is the same as that used to measure the electrical response, i.e. through fluorescence excitation. The use of the PROPS system could solve this problem.

      We were enthusiastic at the start of the project to use the PROPS system in E. coli as presented by J.M.Kralj et al, ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, Science, 2011, 333, 6040, 345. However, the people we contacted in the microbiology community said that it had some technical issues and there have been no subsequent studies using PROPS in bacteria after the initial promising study. The fluorescent protein system recently presented in PNAS seems more promising, ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X.Jin et al, PNAS, 120, 3, e2208348120.

      - Electrical signal propagation is an important aspect of the manuscript. However, a detailed quantitative analysis of the spatial dynamics within the biofilm is lacking. In addition, it is unclear if the electrical signal propagates within the biofilm during the second peak regime, which is mediated by the Kch channel. This is an important question, given that the fire-diffuse-fire model is presented with emphasis on the role of K+ ions.

      We have presented a more detailed account of the electrical wavefront modelling work and it is currently under review in a physics journal, ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli, et al, 2024 https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1
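
      For readers unfamiliar with the fire-diffuse-fire idea, a deliberately simplified one-dimensional sketch is given here. It assumes point-like cells on a regular line, instantaneous release of a fixed amount of K+ when the local concentration crosses a threshold, and ordinary (Fickian) diffusion; the agent-based model in the cited work is three-dimensional and treats anomalous transport, so this is an illustration of the mechanism only, with hypothetical dimensionless parameters.

```python
import math

def simulate_fdf(n_sites=10, spacing=1.0, diff=1.0, release=1.0,
                 threshold=0.1, dt=0.01, t_max=10.0):
    """Return the firing time of each site in a 1D fire-diffuse-fire chain.

    Site 0 fires at t = 0. Every other site fires once, at the first time
    step where the summed concentration from all earlier releases (1D
    diffusion Green's function) reaches the threshold at its position.
    """
    positions = [i * spacing for i in range(n_sites)]
    fire_time = [None] * n_sites
    fire_time[0] = 0.0
    for step in range(1, int(round(t_max / dt)) + 1):
        t = step * dt
        newly_fired = []
        for i, x in enumerate(positions):
            if fire_time[i] is not None:
                continue
            conc = 0.0
            for j, t_j in enumerate(fire_time):
                if t_j is None or t <= t_j:
                    continue
                age = t - t_j
                conc += (release / math.sqrt(4.0 * math.pi * diff * age)
                         * math.exp(-(x - positions[j]) ** 2 / (4.0 * diff * age)))
            if conc >= threshold:
                newly_fired.append(i)
        for i in newly_fired:   # update after the scan so firing order is causal
            fire_time[i] = t
        if all(ft is not None for ft in fire_time):
            break
    return fire_time

times = simulate_fdf()
print("Firing times:", [round(t, 2) for t in times])
```

      The firing times increase from site to site, which is the discrete analogue of a wavefront travelling outward from the first cell; raising the threshold relative to the released amount eventually stops propagation altogether.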

      - Since deletion of the kch gene inhibits the long-term electrical response to light stimulation (regime II), the authors concluded that K+ ions play a role in the habituation response. However, Kch is a putative K+ ion channel. The use of specific drugs could help to clarify the role of K+ ions.

      Our recent electrical impedance spectroscopy publication provides further evidence that Kch is associated with large changes in conductivity, as expected for a voltage-gated ion channel (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, Nano Letters, 2024, in print).

      - The manuscript as such does not allow us to properly conclude on the photo-protective role of the Kch ion channel.

      That Kch has a photoprotective role is our current working hypothesis. The hypothesis fits with the data, but we are not saying we have proven it beyond all possible doubt.

      - The link between membrane potential dynamics and mechanosensitivity is not captured in the equation for the Q-channel opening dynamics in the Hodgkin-Huxley model (Supp Eq 2).

      Our model is agnostic with respect to the mechanosensitivity of the ion channels, although we deduce that mechanosensitive ion channels contribute to ion channel Q.

      - Given the large number of parameters used in the models, it is hard to distinguish between prediction and fitting.

      This is always an issue with electrophysiological modelling (compared with most heart and brain modelling studies we are very conservative in the choice of parameters for the bacteria). In terms of predicting the different phenomena observed, we believe the model is very successful.

      Reviewer #2 (Public Review):

      Summary of what the authors were trying to achieve:

      The authors thought they studied membrane potential dynamics in E.coli biofilms. They thought so because they were unaware that the dye they used to report that membrane potential in E.coli, has been previously shown not to report it. Because of this, the interpretation of the authors' results is not accurate.

      We believe the Pilizota work is scientifically flawed.

      Major strengths and weaknesses of the methods and results:

      The strength of this work is that all the data is presented clearly, and accurately, as far as I can tell.

      The major critical weakness of this paper is the use of ThT dye as a membrane potential dye in E.coli. The work is unaware of a publication from 2020 https://www.sciencedirect.com/science/article/pii/S0006349519308793 that demonstrates that ThT is not a membrane potential dye in E. coli. Therefore I think the results of this paper are misinterpreted. The same publication I reference above presents a protocol on how to carefully calibrate any candidate membrane potential dye in any given condition.

      We are aware of this study, but believe it to be scientifically flawed. We do not cite the article because we do not think it is a particularly useful contribution to the literature.

      I now go over each results section in the manuscript.

      Result section 1: Blue light triggers electrical spiking in single E. coli cells

      I do not think the title of the result section is correct for the following reasons. The above-referenced work demonstrates the loading profile one should expect from a Nernstian dye (Figure 1). It also demonstrates that ThT does not show that profile and explains why this is so. ThT only permeates the membrane under light exposure (Figure 5). This finding is consistent with blue light peroxidising the membrane (see also Figure 4 of the following work, https://www.sciencedirect.com/science/article/pii/S0006349519303923, on light-induced damage to the electrochemical gradient of protons; I am sure there are more references for this).

      The Pilizota group invokes some elaborate artefacts to explain the lack of agreement with a simple Nernstian battery model. The model is incorrect not the fluorophore.

      Please note that the loading profile (only observed under light) in the current manuscript in Figure 1B as well as in the video S1 is identical to that in Figure 3 from the above-referenced paper (i.e. https://www.sciencedirect.com/science/article/pii/S0006349519308793), and corresponding videos S3 and S4. This kind of profile is exactly what one would expect theoretically if the light is simultaneously lowering the membrane potential as the ThT is equilibrating, see Figure S12 of that previous work. There, it is also demonstrated by the means of monitoring the speed of bacterial flagellar motor that the electrochemical gradient of protons is being lowered by the light. The authors state that applying the blue light for different time periods and over different time scales did not change the peak profile. This is expected if the light is lowering the electrochemical gradient of protons. But, in Figure S1, it is clear that it affected the timing of the peak, which is again expected, because the light affects the timing of the decay, and thus of the decay profile of the electrochemical gradient of protons (Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923).

      We think the proton effect is a million times weaker than that due to potassium i.e. 0.2 M K+ versus 10<sup>-7</sup> M H+. We can comfortably neglect the influx of H+ in our experiments.

      I find Figure S1D interesting. There the authors load TMRM, which is a membrane voltage dye that has been used extensively (as far as I am aware this is the first reference for that and it has not been cited https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1914430/). As visible from the last TMRM reference I give, TMRM will only load the cells in Potassium Phosphate buffer with NaCl (and often we used EDTA to permeabilise the membrane). It is not fully clear (to me) whether here TMRM was prepared in rich media (it explicitly says so for ThT in Methods but not for TMRM), but it seems so. If this is the case, it likely also loads because of the damage to the membrane done with light, and therefore I am not surprised that the profiles are similar.

      The vast majority of cells continue to be viable. We do not think membrane damage is dominating.

      The authors then use CCCP. First, a small correction, as the authors state that it quenches membrane potential. CCCP is a protonophore (https://pubmed.ncbi.nlm.nih.gov/4962086/), so it collapses electrochemical gradient of protons. This means that it is possible, and this will depend on the type of pumps present in the cell, that CCCP collapses electrochemical gradient of protons, but the membrane potential is equal and opposite in sign to the DeltapH. So using CCCP does not automatically mean membrane potential will collapse (e.g. in some mammalian cells it does not need to be the case, but in E.coli it is https://www.biorxiv.org/content/10.1101/2021.11.19.469321v2). CCCP has also been recently found to be a substrate for TolC (https://journals.asm.org/doi/10.1128/mbio.00676-21), but at the concentrations the authors are using CCCP (100 µM) that should not affect the results. However, the authors then state because they observed, in Figure S1E, a fast efflux of ions in all cells and no spiking dynamics this confirms that observed dynamics are membrane potential related. I do not agree that it does. First, Figure S1E, does not appear to show transients, instead, it is visible that after 50 min treatment with 100 µM CCCP, ThT dye shows no dynamics. The action of a Nernstian dye is defined. It is not sufficient that a charged molecule is affected in some way by electrical potential, this needs to be in a very specific way to be a Nernstian dye. Part of the profile of ThT loading observed in https://www.sciencedirect.com/science/article/pii/S0006349519308793 is membrane potential related, but not in a way that is characteristic of Nernstian dye.

      Our understanding of the literature is CCCP poisons the whole metabolism of the bacterial cells. The ATP driven K+ channels will stop functioning and this is the dominant contributor to membrane potential.

      Result section 2: Membrane potential dynamics depend on the intercellular distance

      In this chapter, the authors report that the time to reach the first intensity peak during ThT loading is different when cells are in microclusters. They interpret this as electrical signalling in clusters because the peak is reached faster in microclusters (as opposed to slower because intuitively in these clusters cells could be shielded from light). However, shielding is one possibility. The other is that the membrane has changed in composition and/or the effective light power the cells can tolerate (with mechanisms to handle light-induced damage, some of which authors mention later in the paper) is lower. Given that these cells were left in a microfluidic chamber for 2 hours to attach in growth media according to Methods, there is sufficient time for that to happen. In Figure S12 C and D of that same paper from my group (https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf) one can see the effects of peak intensity and timing of the peak on the permeability of the membrane. Therefore I do not think the distance is the explanation for what authors observe.

      Shielding would provide the reverse effect, since hyperpolarization begins in the dense centres of the biofilms. For the initial 2 hours the cells receive negligible blue light. Neither of the referee’s comments thus seems tenable.

      Result section 3: Emergence of synchronized global wavefronts in E. coli biofilms

      In this section, the authors exposed a mature biofilm to blue light. They observe that the intensity peak is reached faster in the cells in the middle. They interpret this as the ion-channel-mediated wavefronts moved from the center of the biofilm. As above, cells in the middle can have different membrane permeability to those at the periphery, and probably even more importantly, there is no light profile shown anywhere in SI/Methods. I could be wrong, but the SI3 A profile is consistent with a potential Gaussian beam profile visible in the field of view. In Methods, I find the light source for the blue light and the type of microscope but no comments on how 'flat' the illumination is across their field of view. This is critical to assess what they are observing in this result section. I do find it interesting that the ThT intensity collapsed from the edges of the biofilms. In the publication I mentioned https://www.sciencedirect.com/science/article/pii/S0006349519308793#app2, the collapse of fluorescence was not understood (other than it is not membrane potential related). It was observed in Figure 5A, C, and F, that at the point of peak, electrochemical gradient of protons is already collapsed, and that at the point of peak cell expands and cytoplasmic content leaks out. This means that this part of the ThT curve is not membrane potential related. The authors see that after the first peak collapsed there is a period of time where ThT does not stain the cells and then it starts again. If after the first peak the cellular content leaks, as we have observed, then staining that occurs much later could be simply staining of cytoplasmic positively charged content, and the timing of that depends on the dynamics of cytoplasmic content leakage (we observed this to be happening over 2h in individual cells). ThT is also a non-specific amyloid dye, and in starving E. coli cells formation of protein clusters has been observed (https://pubmed.ncbi.nlm.nih.gov/30472191/), so such cytoplasmic staining seems possible.

      It is very easy to see if the illumination is flat (Köhler illumination) by comparing the intensity of background pixels on the detector. It was flat in our case. Protons have little to do with our work for reasons highlighted before. Differential membrane permittivity is a speculative phenomenon not well supported by any evidence and with no clear molecular mechanism.
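
      The background-pixel comparison described above could be quantified along the following lines. This is a minimal sketch rather than the procedure actually used for the manuscript: the function name, the tiling scheme, the flatness metric and the synthetic test images are all assumptions introduced for illustration.

```python
import math
import random
from statistics import mean

def background_flatness(image, is_background, tiles=4):
    """Relative spread (max - min) / mean of tile-averaged background intensity.

    image: 2D list of pixel intensities; is_background: 2D list of bools
    marking pixels that contain no cells. Both inputs are hypothetical.
    A value near 0 indicates flat illumination across the field of view.
    """
    h, w = len(image), len(image[0])
    tile_means = []
    for i in range(tiles):
        for j in range(tiles):
            pixels = [
                image[y][x]
                for y in range(i * h // tiles, (i + 1) * h // tiles)
                for x in range(j * w // tiles, (j + 1) * w // tiles)
                if is_background[y][x]
            ]
            if pixels:  # skip tiles fully covered by cells
                tile_means.append(mean(pixels))
    return (max(tile_means) - min(tile_means)) / mean(tile_means)

# Synthetic images (not experimental data): uniform field vs Gaussian beam.
random.seed(0)
n = 64
flat = [[100 + random.gauss(0, 1) for _ in range(n)] for _ in range(n)]
beam = [[100 * math.exp(-((x - n / 2) ** 2 + (y - n / 2) ** 2) / (2 * 25.0 ** 2))
         + random.gauss(0, 1) for x in range(n)] for y in range(n)]
mask = [[True] * n for _ in range(n)]  # treat every pixel as background
print(background_flatness(flat, mask), background_flatness(beam, mask))
```

      A Gaussian beam profile of the kind the referee suspects would give a large spread between central and peripheral tiles, whereas a flat field gives a spread close to the noise level.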

      Finally, I note that authors observe biofilms of different shapes and sizes and state that they observe similar intensity profiles, which could mean that my comment on 'flatness' of the field of view above is not a concern. However, the scale bar in Figure 2A is not legible, so I can't compare it to the variation of sizes of the biofilms in Figure 2C (67 to 280um). Based on this, I think that the illumination profile is still a concern.

      The referee now contradicts themselves and wants a scale bar to be more visible. We have changed the scale bar.

      Result section 4: Voltage-gated Kch potassium channels mediate ion-channel electrical oscillations in E. coli

      First I note at this point, given that I disagree that the data presented thus 'suggest that E. coli biofilms use electrical signaling to coordinate long-range responses to light stress' as the authors state, it gets harder to comment on the rest of the results.

      In this result section the authors look at the effect of Kch, a putative voltage-gated potassium channel, on ThT profile in E. coli cells. And they see a difference. It is worth noting that in the publication https://www.sciencedirect.com/science/article/pii/S0006349519308793 it is found that ThT is also likely a substrate for TolC (Figure 4), but that scenario could not be distinguished from the one where TolC mutant has a different membrane permeability (and there is a publication that suggests the latter is happening https://onlinelibrary.wiley.com/doi/10.1111/j.1365-2958.2010.07245.x). Given this, it is also possible that Kch deletion affects the membrane permeability. I do note that in video S4 I seem to see more of, what appear to be, plasmolysed cells. The authors do not see the ThT intensity with this mutant that appears long after the initial peak has disappeared, as they see in WT. It is not clear how long they waited for this, as from Figure S3C it could simply be that the dynamics of this is a lot slower, e.g. Kch deletion changes membrane permeability.

      The finding that TolC provides a possible passive pathway for ThT to leave cells seems slightly niche. It just demonstrates another mechanism for the cells to equilibrate the concentrations of ThT in a Nernstian manner, i.e. driven by the membrane voltage.

      The authors themselves state that the evidence for Kch being a voltage-gated channel is indirect (line 54). I do not think there is a need to claim function from a ThT profile of E. coli mutants (nor do I believe it's good practice), given how accurate single-channel recordings are currently. To know the exact dependency on the membrane potential, ion channel recordings on this protein are needed first.

      We have good evidence from electrical impedance spectroscopy experiments that Kch increases the conductivity of biofilms (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print).

      Result section 5: Blue light influences ion-channel mediated membrane potential events in E. coli

      In this chapter the authors vary the light intensity and stain the cells with PI (this dye gets into the cells when the membrane becomes very permeable), and the extracellular environment with K+ dye (I have not yet worked carefully with this dye). They find that different amounts of light influence ThT dynamics. This is in line with previous literature (both papers I have been mentioning: Figure 4 https://www.sciencedirect.com/science/article/pii/S0006349519303923 and https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf especially SI12), but does not add anything new. I think the results presented here can be explained with previously published theory and do not indicate that the ion-channel mediated membrane potential dynamics is a light stress relief process.

      The simple Nernstian battery model proposed by Pilizota et al is erroneous in our opinion for reasons outlined above. We believe it will prove to be a dead end for bacterial electrophysiology studies.

      Result section 6: Development of a Hodgkin-Huxley model for the observed membrane potential dynamics

      This results section starts with the authors stating: 'our data provide evidence that E. coli manages light stress through well-controlled modulation of its membrane potential dynamics'. As stated above, I think they are instead observing the process of ThT loading while the light is damaging the membrane and thus simultaneously collapsing the electrochemical gradient of protons. As stated above, this has been modelled before. And then, they observe a ThT staining that is independent from membrane potential.

      This is an erroneous niche opinion. Protons have little say in the membrane potential since there are so few of them. The membrane potential is mostly determined by K+.

      I will briefly comment on the Hodgkin Huxley (HH) based model. First, I think there is no evidence for two channels with different activation profiles as authors propose. But also, the HH model has been developed for neurons. There, the leakage and the pumping fluxes are both described by a constant representing conductivity, times the difference between the membrane potential and Nernst potential for the given ion. The conductivity in the model is given as gK*n^4 for potassium, gNa*m^3*h sodium, and gL for leakage, where gK, gNa and gL were measured experimentally for neurons. And, n, m, and h are variables that describe the experimentally observed voltage-gated mechanism of neuronal sodium and potassium channels. (Please see Hodgkin AL, Huxley AF. 1952. Currents carried by sodium and potassium ions through the membrane of the giant axon of Loligo. J. Physiol. 116:449-72 and Hodgkin AL, Huxley AF. 1952. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117:500-44).
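
      For reference, the conductance terms listed in the comment above enter the classical Hodgkin-Huxley membrane equation as shown below. This is the standard textbook form for the squid giant axon, not the authors' adapted E. coli model; the symbols C_m (membrane capacitance), I_ext (applied current), V_K, V_Na, V_L (reversal potentials) and the rate functions alpha_x, beta_x are the conventional ones and do not appear in the comment itself.

```latex
\begin{align}
  C_m \frac{dV}{dt} &= I_{\mathrm{ext}}
      - g_K\, n^4 \,(V - V_K)
      - g_{Na}\, m^3 h \,(V - V_{Na})
      - g_L \,(V - V_L), \\
  \frac{dx}{dt} &= \alpha_x(V)\,(1 - x) - \beta_x(V)\, x,
      \qquad x \in \{n, m, h\}.
\end{align}
```

      Each ionic current is a conductance multiplied by the driving force (V minus the Nernst potential of that ion), which is the linear form the referee refers to.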

      In the 70 years since Hodgkin and Huxley first presented their model, a huge number of similar models have been proposed to describe cellular electrophysiology. We are not being hyperbolic when we state that the HH models for excitable cells are like the Schrödinger equation for molecules. We carefully adapted our HH model to reflect the currently understood electrophysiology of E. coli.

      Thus, in applying the model to describe bacterial electrophysiology one should ensure near equilibrium requirement holds (so that (V-VQ) etc terms in authors' equation Figure 5 B hold), and potassium and other channels in a given bacterium have similar gating properties to those found in neurons. I am not aware of such measurements in any bacteria, and therefore think the pump leak model of the electrophysiology of bacteria needs to start with fluxes that are more general (for example Keener JP, Sneyd J. 2009. Mathematical physiology: I: Cellular physiology. New York: Springer or https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000144)

      The reference is to a slightly more modern version of a simple Nernstian battery model. The model will not oscillate and thus will not help in modelling membrane potentials in bacteria. We are unsure where the equilibrium requirement comes from (inadequate modelling of the dynamics?).

      Result section 7: Mechanosensitive ion channels (MS) are vital for the first hyperpolarization event in E. coli.

      The results that Msc channels affect the profile of ThT dye are interesting. It is again possible that the membrane permeability of these mutants has changed and therefore the dynamics have changed, so this needs to be checked first. I also note that our results show that the peak of ThT coincides with cell expansion. For this to be understood a model is needed that also takes into account the link between maintenance of electrochemical gradients of ions in the cell and osmotic pressure.

      The evidence for permeability changes in the membranes seems to be tenuous.

      A side note is that the authors state that the Msc responds to stress-related voltage changes. I think this is an overstatement. Mscs respond to predominantly membrane tension and are mostly nonspecific (see how their action recovers cellular volume in this publication https://www.pnas.org/doi/full/10.1073/pnas.1522185113). Authors cite references 35-39 to support this statement. These publications still state that these channels are predominantly membrane tension-gated. Some of the references state that the presence of external ions is important for tension-related gating but sometimes they gate spontaneously in the presence of certain ions. Other publications cited don't really look at gating with respect to ions (39 is on clustering). This is why I think the statement is somewhat misleading.

      We have reworded the discussion of Mscs since the literature appears to be ambiguous. We will try to run some electrical impedance spectroscopy experiments on the Msc mutants in the future to attempt to remove the ambiguity.

      Result section 8: Anomalous ion-channel-mediated wavefronts propagate light stress signals in 3D E. coli biofilms.

      I am not commenting on this result section, as it would only be applicable if ThT was membrane potential dye in E. coli.

      Ok, but we disagree on the use of ThT.

      Aims achieved/results support their conclusions:

      The authors clearly present their data. I am convinced that they have accurately presented everything they observed. However, I think their interpretation of the data and conclusions is inaccurate in line with the discussion I provided above.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      I do not think this publication should be published in its current format. It should be revised in light of the previous literature as discussed in detail above. I believe presenting it in its current form on eLife pages would create unnecessary confusion.

      We believe many of the Pilizota group articles are scientifically flawed and are causing the confusion in the literature.

      Any other comments:

      I note that while this work studies E. coli, it references papers in other bacteria using ThT. For example, in lines 35-36 authors state that bacteria (Bacillus subtilis in this case) in biofilms have been recently found to modulate membrane potential citing the relevant literature from 2015. It is worth noting that the most recent paper https://journals.asm.org/doi/10.1128/mbio.02220-23 found that ThT binds to one or more proteins in the spore coat, suggesting that it does not act as a membrane potential dye in Bacillus spores. It is possible that it still reports membrane potential in Bacillus cells and the recent results are strictly spore-specific, but these should be kept in mind when using ThT with Bacillus.

      ThT was used successfully in previous studies of normal B. subtilis cells (by our own group and A.Prindle, ‘Spatial propagation of electrical signal in circular biofilms’, J.A.Blee et al, Physical Review E, 2019, 100, 052401, J.A.Blee et al, ‘Membrane potentials, oxidative stress and the dispersal response of bacterial biofilms to 405 nm light’, Physical Biology, 2020, 17, 2, 036001, A.Prindle et al, ‘Ion channels enable electrical communication in bacterial communities’, Nature, 2015, 527, 59-63). The connection to low metabolism spore research seems speculative.

      Reviewer #3 (Public Review):

      It has recently been demonstrated that bacteria in biofilms show changes in membrane potential in response to changes in their environment, and that these can propagate signals through the biofilm to coordinate bacterial behavior. Akabuogu et al. contribute to this exciting research area with a study of blue light-induced membrane potential dynamics in E. coli biofilms. They demonstrate that Thioflavin-T (ThT) intensity (a proxy for membrane potential) displays multiphasic dynamics in response to blue light treatment. They additionally use genetic manipulations to implicate the potassium channel Kch in the latter part of these dynamics. Mechanosensitive ion channels may also be involved, although these channels seem to have blue light-independent effects on membrane potential as well. In addition, there are challenges to the quantitative interpretation of ThT microscopy data which require consideration. The authors then explore whether these dynamics are involved in signaling at the community level. The authors suggest that cell firing is both more coordinated when cells are clustered and happens in waves in larger, 3D biofilms; however, in both cases evidence for these claims is incomplete. The authors present two simulations to describe the ThT data. The first of these simulations, a Hodgkin-Huxley model, indicates that the data are consistent with the activity of two ion channels with different kinetics; the Kch channel mutant, which ablates a specific portion of the response curve, is consistent with this. The second model is a fire-diffuse-fire model to describe wavefront propagation of membrane potential changes in a 3D biofilm; because the wavefront data are not presented clearly, the results of this model are difficult to interpret. 
Finally, the authors discuss whether these membrane potential changes could be involved in generating a protective response to blue light exposure; increased death in a Kch ion channel mutant upon blue light exposure suggests that this may be the case, but a no-light control is needed to clarify this.

      In a few instances, the paper is missing key control experiments that are important to the interpretation of the data. This makes it difficult to judge the meaning of some of the presented experiments.

      (1) An additional control for the effects of autofluorescence is very important. The authors conduct an experiment where they treat cells with CCCP and see that Thioflavin-T (ThT) dynamics do not change over the course of the experiment. They suggest that this demonstrates that autofluorescence does not impact their measurements. However, cellular autofluorescence depends on the physiological state of the cell, which is impacted by CCCP treatment. A much simpler and more direct experiment would be to repeat the measurement in the absence of ThT or any other stain. This experiment should be performed both in the wild-type strain and in the ∆kch mutant.

      ThT is a very bright fluorophore (much brighter than a GFP). It is clear from the images of non-stained samples that autofluorescence provides a negligible contribution to the fluorescence intensity in an image.

      (2) The effects of photobleaching should be considered. Of course, the intensity varies a lot over the course of the experiment in a way that photobleaching alone cannot explain. However, photobleaching can still contribute to the kinetics observed. Photobleaching can be assessed by changing the intensity, duration, or frequency of exposure to excitation light during the experiment. Considerations about photobleaching become particularly important when considering the effect of catalase on ThT intensity. The authors find that the decrease in ThT signal after the initial "spike" is attenuated by the addition of catalase; this is what would be predicted by catalase protecting ThT from photobleaching (indeed, catalase can be used to reduce photobleaching in time lapse imaging).

      Photobleaching was negligible over the course of the experiments. We employed techniques such as reducing sample exposure time and using the appropriate light intensity to minimize photobleaching.

      (3) It would be helpful to have a baseline of membrane potential fluctuations in the absence of the proposed stimulus (in this case, blue light). Including traces of membrane potential recorded without light present would help support the claim that these changes in membrane potential represent a blue light-specific stress response, as the authors suggest. Of course, ThT is blue, so if the excitation light for ThT is problematic for this experiment the alternative dye tetramethylrhodamine methyl ester perchlorate (TMRM) can be used instead.

      Unfortunately the fluorescent baseline is too weak to measure cleanly in this experiment. The collective response of all the bacteria hyperpolarizing at the same time appears to dominate the signal (measurements in the eLife article and new potentiometry measurements).

      (4) The effects of ThT in combination with blue light should be more carefully considered. In mitochondria, a combination of high concentrations of blue light and ThT leads to disruption of the PMF (Skates et al. 2021 BioRXiv), and similarly, ThT treatment enhances the photodynamic effects of blue light in E. coli (Bondia et al. 2021 Chemical Communications). If present in this experiment, this effect could confound the interpretation of the PMF dynamics reported in the paper.

      We think the PMF plays a minority role in determining the membrane potential in E. coli, for reasons outlined before (H+ is a minority ion in E. coli compared with K+).

      (5) Figures 4D - E indicate that a ∆kch mutant has increased propidium iodide (PI) staining in the presence of blue light; this is interpreted to mean that Kch-mediated membrane potential dynamics help protect cells from blue light. However, Live/Dead staining results in these strains in the absence of blue light are not reported. This means that the possibility that the ∆kch mutant has a general decrease in survival (independent of any effects of blue light) cannot be ruled out.

      Both strains of bacteria have similar growth curves and both engaged in membrane potential dynamics for the duration of the experiment. We were interested in bacterial cells that exhibited membrane potential dynamics in the presence of the stress. Bacterial cells need to be alive to engage in membrane potential dynamics (hyperpolarize) under stress conditions. Cells that engaged in membrane potential dynamics and later stained red were only counted after the entire duration. We believe that the wildtype handles the light stress better than the ∆kch mutant as measured with the PI.

      (6) Additionally in Figures 4D - E, the interpretation of this experiment can be confounded by the fact that PI uptake can sometimes be seen in bacterial cells with high membrane potential (Kirchhoff & Cypionka 2017 J Microbial Methods); the interpretation is that high membrane potential can lead to increased PI permeability. Because the membrane potential is largely higher throughout blue light treatment in the ∆kch mutant (Fig. 3AB), this complicates the interpretation of this experiment.

      Kirchhoff & Cypionka 2017 J Microbial Methods, using fluorescence microscopy, suggested that changes in membrane potential dynamics can introduce experimental bias when propidium iodide is used to confirm the viability of the bacterial strains B. subtilis (DSM-10) and Dinoroseobacter shibae that are starved of oxygen (via N2 gassing) for 2 hours. They attempted to support their findings by using CCCP to stop the membrane potential dynamics (but never showed any pictorial or plotted data for this confirmatory experiment). In our experimental methodology, cell death was not forced on the cells by introducing an extra burden or via anoxia. We believe that the accumulation of PI in the ∆kch mutant is not due to high membrane potential dynamics but arises because PI shows damaged/dead cells without bias. We think that propidium iodide is suitable for this experiment. Propidium iodide is a dye that is extensively used in the life sciences. PI has also been used in the study of bacterial electrophysiology (https://pubmed.ncbi.nlm.nih.gov/32343961/) and no membrane potential related bias was reported.

      Throughout the paper, many ThT intensity traces are compared, and described as "similar" or "dissimilar", without detailed discussion or a clear standard for comparison. For example, the two membrane potential curves in Fig. S1C are described as "similar" although they have very different shapes, whereas the curves in Fig. 1B and 1D are discussed in terms of their differences although they are evidently much more similar to one another. Without metrics or statistics to compare these curves, it is hard to interpret these claims. These comparative interpretations are additionally challenging because many of the figures in which average trace data are presented do not indicate standard deviation.

      Comparison of small changes in the absolute intensities is problematic in such fluorescence experiments. We mean the shape of the traces is similar and they can be modelled using a HH model with similar parameters.

      The differences between the TMRM and ThT curves that the authors show in Fig. S1C warrant further consideration. Some of the key features of the response in the ThT curve (on which much of the modeling work in the paper relies) are not very apparent in the TMRM data. It is not obvious to me which of these traces will be more representative of the actual underlying membrane potential dynamics.

      In our experiment, TMRM was used to confirm the dynamics observed using ThT. However, ThT appears to be more photostable than TMRM (especially towards the 2nd peak). The most interesting observation is that with both dyes, all phases of the membrane potential dynamics were conspicuous (the first peak, the quiescent period and the second peak). The time periods for these three episodes were also similar.

      A key claim in this paper (that dynamics of firing differ depending on whether cells are alone or in a colony) is underpinned by "time-to-first peak" analysis, but there are some challenges in interpreting these results. The authors report an average time-to-first peak of 7.34 min for the data in Figure 1B, but the average curve in Figure 1B peaks earlier than this. In Figure 1E, it appears that there are a handful of outliers in the "sparse cell" condition that likely explain this discrepancy. Either an outlier analysis should be done and the mean recomputed accordingly, or a more outlier-robust method like the median should be used instead. Then, a statistical comparison of these results will indicate whether there is a significant difference between them.

      The key point is the comparison of standard errors on the standard deviation.
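
      The referee's suggestion of an outlier-robust summary for the time-to-first-peak can be sketched as follows. The function name is hypothetical and the listed peak times are invented purely to show how a few late outliers pull the mean away from the median; they are not data from the manuscript.

```python
from statistics import mean, median

def summarize_peak_times(times_min):
    """Mean, median and median absolute deviation (MAD) of peak times in minutes."""
    med = median(times_min)
    mad = median(abs(t - med) for t in times_min)
    return {"mean": mean(times_min), "median": med, "mad": mad}

# Hypothetical values for illustration only; NOT data from the manuscript.
sparse_cells = [5.0, 5.5, 6.0, 6.0, 6.5, 7.0, 15.0, 18.0]
summary = summarize_peak_times(sparse_cells)
print(summary)
```

      In this invented example the two late cells raise the mean well above the median, which is the kind of discrepancy between a reported mean and the peak of an averaged curve that the referee describes.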

      In two different 3D biofilm experiments, the authors report the propagation of wavefronts of membrane potential; I am unable to discern these wavefronts in the imaging data, and they are not clearly demonstrated by analysis.

      The first data set is presented in Figures 2A, 2B, and Video S3. The images and video are very difficult to interpret because of how the images have been scaled: the center of the biofilm is highly saturated, and the zero value has also been set too high to consistently observe the single cells surrounding the biofilm. With the images scaled this way, it is very difficult to assess dynamics. The time stamps in Video S3 and on the panels in Figure 2A also do not correspond to one another although the same biofilm is shown (and the time course in 2B is also different from what is indicated in 2B). In either case, it appears that the center of the biofilm is consistently brighter than the edges, and the intensity of all cells in the biofilm increases in tandem; by eye, propagating wavefronts (either directed toward the edge or the center) are not evident to me. Increased brightness at the center of the biofilm could be explained by increased cell thickness there (as is typical in this type of biofilm). From the image legend, it is not clear whether the image presented is a single confocal slice or a projection. Even if this is a single confocal slice, in both Video S3 and Figure 2A there are regions of "haze" from out-of-focus light evident, suggesting that light from other focal planes is nonetheless present. This seems to me to be a simpler explanation for the fluorescence dynamics observed in this experiment: cells are all following the same trajectory that corresponds to that seen for single cells, and the center is brighter because of increased biofilm thickness.

      We thank the reviewer for this important observation. We have made changes to the figures to address this confusion. The cell cover has no influence on the observed membrane potential dynamics. The entire biofilm was exposed to the same blue light at each time. Therefore all parts of the biofilm received equal amounts of the blue light intensity. The membrane potential dynamics were not influenced by cell density (see Fig 2C).

      The second data set is presented in Video S6B; I am similarly unable to see any wave propagation in this video. I observe only a consistent decrease in fluorescence intensity throughout the experiment that is spatially uniform (except for the bright, dynamic cells near the top; these presumably represent cells that are floating in the microfluidic and have newly arrived to the imaging region).

      A visual inspection of Video S6B shows a fast rise, a decrease in fluorescence and a second rise (supplementary figure 4B). The fluorescence data were carefully obtained using the Imaris software. We created a curved geometry on each slice of the confocal stack and analyzed the surfaces of this curved plane along the z-axis.

      3D imaging data can be difficult to interpret by eye, so it would perhaps be more helpful to demonstrate these propagating wavefronts by analysis; however, such analysis is not presented in a clear way. The legend in Figure 2B mentions a "wavefront trace", but there is no position information included - this trace instead seems to represent the average intensity trace of all cells. To demonstrate the propagation of a wavefront, this analysis should be shown for different subpopulations of cells at different positions from the center of the biofilm. Data is shown in Figure 8 that reflects the velocity of the wavefront as a function of biofilm position; however, because the wavefronts themselves are not evident in the data, it is difficult to interpret this analysis. The methods section additionally does not contain sufficient information about what these velocities represent and how they are calculated. Because of this, it is difficult for me to evaluate the section of the paper pertaining to wave propagation and the predicted biofilm critical size.

      The analysis is considered in more detail in a more expansive modelling article, currently under peer review in a physics journal, ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli, et al, 2024 https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1

      There are some instances in the paper where claims are made that do not have data shown or are not evident in the cited data:

      (1) In the first results section, "When CCCP was added, we observed a fast efflux of ions in all cells"- the data figure pertaining to this experiment is in Fig. S1E, which does not show any ion efflux. The methods section does not mention how ion efflux was measured during CCCP treatment.

      We have worded this differently to properly convey our results.

      (2) In the discussion of voltage-gated calcium channels, the authors refer to "spiking events", but these are not obvious in Figure S3E. Although the fluorescence intensity changes over time, it's hard to distinguish these fluctuations from measurement noise; a no-light control could help clarify this.

      The calcium transients observed were not due to noise or artefacts.

      (3) The authors state that the membrane potential dynamics simulated in Figure 7B are similar to those observed in 3D biofilms in Fig. S4B; however, the second peak is not clearly evident in Fig. S4B and it looks very different for the mature biofilm data reported in Fig. 2. I have some additional confusion about this data specifically: in the intensity trace shown in Fig. S4B, the intensity in the second frame is much higher than the first; this is not evident in Video S6B, in which the highest intensity is in the first frame at time 0. Similarly, the graph indicates that the intensity at 60 minutes is higher than the intensity at 4 minutes, but this is not the case in Fig. S4A or Video S6B.

      The confusion stated here has now been addressed. It should also be noted that while Fig 2.1 was obtained with an LED light source, Fig S4A was obtained using a laser light source. While obtaining the confocal images (for Fig S4A), the light intensity was controlled to further minimize photobleaching. Most importantly, there is evidence of a slow rise to the 2nd peak in Fig S4B. The first peak, quiescence and slow rise to the second peak are evident.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Scientific recommendations:

      - Although Fig 4A clearly shows that light stimulation has an influence on the dynamics of cell membrane potential in the biofilm, it is important to rule out the contribution of variations in environmental parameters. I understand that for technical reasons, the flow of fresh medium must be stopped during image acquisition. Therefore, I suggest performing control experiments, where the flow is stopped before image acquisition (15min, 30min, 45min, and 1h before). If there is no significant contribution from environmental variations (pH, RedOx), the dynamics of the electrical response should be superimposed whatever the delay between stopping the flow and switching on the light.

      In the current study, we focused on how E. coli cells and biofilms react to blue light stress via their membrane potential dynamics. This involved growing the cells and biofilms, stopping the media flow and obtaining data immediately. We believe that stopping the flow not only helped us to manage data acquisition, but also helped us reduce the effect of environmental factors. In a future study we will expand the work to include how the membrane potential dynamics evolve in the presence of changing environmental factors, for example those induced by stopping the flow at varied times.

      - Since TMRM signal exhibits a linear increase after the first response peak (Supplementary Figure 1D), I recommend mitigating the statement at line 78.

      - To improve the spatial analysis of the electrical response, I suggest plotting kymographs of the intensity profiles across the biofilm. I have plotted this kymograph for Video S3 and it appears that there is no electrical propagation for the second peak. In addition, the authors should provide technical details of how R^2(t) is measured in the first regime (Figure 7E).

      See the dedicated simulation article for more details. https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1
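
      As a rough illustration of the kymograph the reviewer suggests, the sketch below reduces a time-lapse stack to a time-versus-radius intensity map by averaging over rings around an assumed biofilm center. The function name `radial_kymograph`, the (time, y, x) array layout, the center coordinates and the number of bins are all assumptions made for illustration; this is not the analysis used in the manuscript.

```python
import numpy as np

def radial_kymograph(stack, center, n_bins=50):
    """Return an array (n_frames, n_bins) of mean intensity per radial bin.

    stack  : array of shape (n_frames, height, width)
    center : (row, col) of the assumed biofilm center, in pixels
    """
    stack = np.asarray(stack, dtype=float)
    n_frames, height, width = stack.shape
    rows, cols = np.indices((height, width))
    radius = np.hypot(rows - center[0], cols - center[1])
    # Assign every pixel to one of n_bins equal-width rings.
    edges = np.linspace(0.0, radius.max(), n_bins + 1)
    ring = np.clip(np.digitize(radius.ravel(), edges) - 1, 0, n_bins - 1)
    counts = np.bincount(ring, minlength=n_bins)
    kymo = np.full((n_frames, n_bins), np.nan)
    for t in range(n_frames):
        sums = np.bincount(ring, weights=stack[t].ravel(), minlength=n_bins)
        # Rings that contain no pixels stay NaN.
        np.divide(sums, counts, out=kymo[t], where=counts > 0)
    return kymo
```

      With time along the vertical axis, a propagating wavefront would appear in such a map as a slanted bright band, whereas a peak without propagation would appear as a horizontal band in which all radii brighten at once.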

      - Line 152: To assess the variability of the latency, the authors should consider measuring the variance divided by the mean instead of SD, which may depend on the average value.

      We are happy with our current use of standard error on the standard deviation. It shows what we claim to be true.
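
      For reference, the quantity the reviewer proposes (variance divided by mean, often called the Fano factor) and the related coefficient of variation can be computed as in the minimal sketch below. The function name is hypothetical and the latency values are placeholders for illustration only, not measurements from the study.

```python
import numpy as np

def dispersion_measures(samples):
    """Return (mean, SD, Fano factor, coefficient of variation)."""
    x = np.asarray(samples, dtype=float)
    mean = x.mean()
    var = x.var(ddof=1)  # unbiased sample variance
    sd = np.sqrt(var)
    return mean, sd, var / mean, sd / mean

# Placeholder values, not measurements from the study.
latencies_min = [2.0, 3.0, 4.0, 3.0]
mean, sd, fano, cv = dispersion_measures(latencies_min)
```

      The Fano factor carries the units of the measurement, whereas the coefficient of variation is dimensionless, which may matter when comparing latencies with different means.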

      - Line 154-155: To truly determine whether the amplitude of the "action potential" is independent of biofilm size, the authors should not normalise the signals.

      Good point. We qualitatively compared both normalized and unnormalized data. Recent electrical impedance spectroscopy measurements (unpublished) indicate that the electrical activity is an extensive quantity i.e. it scales with the size of the biofilms.

      - To clarify the role of K+ in the habituation response, I suggest using valinomycin at sub-inhibitory concentrations (10µM). Besides, the high concentration of CCCP used in this study completely inhibits cell activity. Not surprisingly, no electrical response to light stimulation was observed in the presence of CCCP. Finally, the Kch complementation experiment exhibits a "drop after the first peak" on a single point. It would be more convincing to increase the temporal resolution (1 min -> 10 s) to show that there is indeed a first and a second peak.

      An interesting experiment for the future.

      - Line 237-238: There are only two points suggesting that the dynamics of hyperpolarization are faster at higher irradiance (Fig 4A). The authors should consider adding a third intermediate point at 17µW/mm^2 to confirm the statement made in this sentence.

      Multiple repeats were performed. We are confident of the robustness of our data.

      - Line 249 + Fig 4E: It seems that the data reported on Fig 4E are extracted from Fig 4D. If this is indeed the case, the data should be normalised by the total population size to compare survival probabilities under the two conditions. It would also be great to measure these probabilities (for WT and ∆kch) in the presence of ROS scavengers.

      - To distinguish between model fitting and model predictions, the authors should clearly state which parameters are taken from the literature and which parameters are adjusted to fit the experimental data.

      - Supplementary Figure 4A: why can't we see any wavefront in this series of images?

      For the experimental data, the wavefront was analyzed using the Imaris software. We systematically created an ROI with a curved geometry within the confocal stack (the biofilm). The ThT fluorescence traced along the surface of the curved geometry was then analyzed along the z-axis.

      - Fig 7B: Could the authors explain why the plateau is higher in the simulations than in the biofilm experiments? Could they add noise on the firing activities?

      See the dedicated Martorelli modelling article. In general, adding noise would require stochastic Hodgkin-Huxley modelling, and the fluorescence data (and electrical impedance spectroscopy data) presented do not have extensive noise (due to collective averaging over many bacterial cells).

      - Supplementary Figure 4B: Why can't we see the second peak in confocal images?

      The second peak is present although not as robust as in Fig 2B. The confocal images were obtained with a laser source. Therefore we tried to create a balance between applying sufficient light stress on the bacterial cells and mitigating photobleaching.

      Editing recommendations:

      The editing recommendations below have been applied where appropriate.

      - Many important technical details are missing (e.g. R^2, curvature, and 445nm irradiance measurements). Error bars are missing from most graphs. The captions should clearly indicate if these are single-cell or biofilm experiments, strain name, illumination conditions, number of experiments, SD, or SE. Please indicate on all panels of all figures in the main text and in the supplements, which are the conditions: single cell vs. biofilm, strains, medium, centrifugal vs centripetal etc..., where relevant. Please also draw error bars everywhere.

      We have now made appropriate changes. We specifically use cells when we were dealing with single cells and biofilms when we worked on biofilms. We decided to describe the strain name either on the panel or the image description.

      - Line 47-51: The way the paragraph is written suggests that no coordinated electrical oscillations have been observed in Gram-negative biofilms. However, Hennes et al (referenced as 57 in this manuscript) have shown that a wave of hyperpolarized cells propagates in colonies of Neisseria gonorrhoeae, which is a Gram-negative bacterium.

      We are now aware of this work. It was not published when we first submitted our work and the authors claim the waves of activity are due to ROS diffusion NOT propagating waves of ions (coordinated electrical wavefronts).

      - Line 59: "stressor" -> "stress" or "perturbation".

      The correction has been made.

      - Line 153: Please indicate in the Material&Methods how the size of the biofilm is measured.

      The biofilm size was obtained using BiofilmQ, and the step-by-step guide for using BiofilmQ has been stated.

      - Figure 2A: Please provide associated brightfield images to locate bacteria.

      - Line 186: Please remove "wavefront" from the caption. Fig2B only shows the average signal as a function of time.

      This correction has been implemented.

      - Fig 3B,C: Please indicate single cell and biofilm on the panels and also WT and ∆kch.

      - Line 289: I suggest adding "in single cell experiments" to the title of this section.

      - Fig 5A: blue light is always present at regular time intervals during regime I and II. The presence of blue light only in regime I could be misleading.

      - Fig 5C: The curve in Fig 5D seems to correspond to the biofilm case. The curve given by the model, should be compared with the average curve presented in Fig 1D.

      - Fig 6A, B, and C: These figures could be moved to supplements.

      - Line 392: Replace "turgidity" with "turgor pressure".

      - Fig 7C,E: Please use a log-log scale to represent these data and indicate the line of slope 1.

      - Fig 7E: The x-axis has been cropped.

      - Please provide a supplementary movie for the data presented in Fig 7E.

      - Line 455: E. coli biofilms do not express ThT.

      - Line 466: "\gamma is the anomalous exponent". Please remove anomalous (\gamma can equal 1 at this stage).

      - Line 475: Please replace "section" with "projection".

      - Line 476: Please replace "spatiotemporal" with "temporal". There is no spatial dependency in either figure.

      - Line 500: Please define Eikonal approximation.

      - Fig 8 could be moved to supplements.

      - Line 553: "predicted" -> "predict".

      - Line 593: Could the authors explain why their model offers much better quantitative agreement?

      - Line 669: What does "universal" mean in that context?

      - Line 671: A volume can be pipetted but not a concentration.

      - Line 676: Are triplicates technical or biological replicates?

      - Sup Fig1: Please use minutes instead of seconds in panel A.

      - Model for membrane dynamics: "The fraction of time the Q+ channel is open" -> "The dynamics of Q+ channel activity can be written". Ditto for K+ channel...

      - Model for membrane dynamics: "the term ... is a threshold-linear". This function is not linear at all. Why is it called linear? Also, please describe what \sigma is.

      - ABFDF model: "releasing a given concentration" -> "releasing a local concentration" or "a given number" but it's not \sigma anymore. Besides, this \sigma is unlikely related to the previous \sigma used in the model of membrane potential dynamics in single cells. Please consider renaming one or the other. Also, ions are referred to as C+ in the text and C in equation 8. Am I missing something?
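
      The log-log representation requested above for Fig 7C,E, and the exponent \gamma discussed for Line 466, can be illustrated with a short sketch. It assumes a power law R^2(t) = A t^\gamma, which is an assumption made here for illustration; the function name `fit_power_law` and the synthetic data are hypothetical and do not reproduce the authors' analysis.

```python
import numpy as np

def fit_power_law(t, r2):
    """Least-squares fit of log(r2) = log(A) + gamma * log(t)."""
    t = np.asarray(t, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    # polyfit of degree 1 returns (slope, intercept).
    gamma, log_a = np.polyfit(np.log(t), np.log(r2), 1)
    return gamma, np.exp(log_a)

# Synthetic, noise-free example with a known exponent of 1.5.
t = np.linspace(1.0, 10.0, 20)
gamma, prefactor = fit_power_law(t, 2.0 * t**1.5)
```

      On log-log axes the fitted slope is \gamma itself, so a reference line of slope 1 makes any departure from \gamma = 1 directly visible.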

      Reviewer #2 (Recommendations For The Authors):

      I have included all my comments as one review. I have done so, despite the fact that some minor comments could have gone into this section, because I decided to review each Result section. I thus felt that not writing it as one review might be harder to follow. I have however highlighted which comments are minor suggestions or where I felt corrections.

      However, while I am happy with all my comments being public, given their nature I think they should be shown to authors first. Perhaps the authors want to go over them and think about it before deciding if they are happy for their manuscript to be published along with these comments, or not. I will highlight this in an email to the editor. I question whether in this case, given that I am raising major issues, publishing both the manuscript and the comments is the way to go as I think it might just generate confusion among the audience.

      Reviewer #3 (Recommendations For The Authors):

      I was unable to find any legends for any of the supplemental videos in my review materials, and I could not open supplemental video 5.

      I made some comments in the public review about the analysis and interpretation of the time-to-fire data. One of the other challenges in this data set is that the time resolution is limited- it seems that a large proportion of cells have already fired after a single acquisition frame. It would be ideal to increase the time resolution on this measurement to improve precision. This could be done by imaging more quickly, but that would perhaps necessitate more blue light exposure; an alternative is to do this experiment under lower blue light irradiance where the first spike time is increased (Figure 4A).

      In the public review, I mentioned the possible impact of high membrane potential on PI permeability. To address this, the experiment could be repeated with other stains, or the viability of blue light-treated cells could be addressed more directly by outgrowth or colony-forming unit assays.

      In the public review, I mentioned the possible combined toxicity of ThT and blue light. Live/dead experiments after blue light exposure with and without ThT could be used to test for such effects, and/or the growth curve experiment in Figure 1F could be repeated with blue light exposure at a comparable irradiance used in the experiment.

      Throughout the paper and figure legends, it would help to have more methodological details in the main text, especially those that are critical for the interpretation of the experiment. The experimental details in the methods section are nicely described, but the data analysis section should be expanded significantly.

      At the end of the results section, the authors suggest a critical biofilm size of only 4 µm for wavefront propagation (not much larger than a single cell!). The authors show responses for various biofilm sizes in Fig. 2C, but these are all substantially larger. Are there data for cell clusters above and below this size that could support this claim more directly?

      The authors mention image registration as part of their analysis pipeline, but the 3D data sets in Video S6B and Fig. S4A do not appear to be registered- were these registered prior to the velocity analysis reported in Fig. 8?

      One of the most challenging claims to demonstrate in this paper is that these membrane potential wavefronts are involved in coordinating a large, biofilm-scale response to blue light. One possible way to test this might be to repeat the Live/Dead experiment in planktonic culture or the single-cell condition. If the protection from blue light specifically emerges due to coordinated activity of the biofilm, the Kch mutant would not be expected to show a change in Live/Dead staining in non-biofilm conditions.

      Line 140: How is "mature biofilm" defined? Also on this same line, what does "spontaneous" mean here?

      Line 151: "much smaller": Given that the reported time for 3D biofilms is 2.73 ± 0.85 min and in microclusters is 3.27 ± 1.77 min, this seems overly strong.

      Line 155: How is "biofilm density" characterized? Additionally, the data in Figure 2C are presented in distance units (µm), but the text refers to "areal coverage"- please define the meaning of these distance units in the legend and/or here in the text (is this the average radius?).

      Lines 161-162: These claims seem strong given the data presented before, and the logic is not very explicit. For example, in the second sentence, the idea that this signaling is used to "coordinate long-range responses to light stress" does not seem strongly evidenced at this point in the paper. What is meant by a long-range response to light stress- are there processes to respond to light that occur at long-length scales (rather than on the single-cell scale)? If so, is there evidence that these membrane potential changes could induce these responses? Please clarify the logic behind these conclusions.

      Lines 235-236: In the lower irradiance conditions, the responses are slower overall, and it looks like the ThT intensity is beginning to rise at the end of the measurement. Could a more prominent second peak be observed in these cases if the measurement time was extended?

      Line 242-243: The overall trajectories of extracellular potassium are indeed similar, but the kinetics of the second peak of potassium are different than those observed by ThT (it rises some minutes earlier)- is this consistent with the idea that Kch is responsible for that peak? Additionally, the potassium dynamics also reflect the first peak- is this surprising given that the Kch channel has no effect on this peak?

      Line 255-256: Again, this seems like a very strong claim. There are several possible interpretations of the catalase experiment (which should be discussed); this experiment perhaps suggests that ROS impacts membrane potential, but does not obviously indicate that these membrane potential fluctuations mitigate ROS levels or help the cells respond to ROS stress. The loss of viability in the ∆kch mutant might indicate a link between these membrane potential experiments and viability, but it is hard to interpret without the no-light control I mention in the public review.

      Lines 313-315: "The model predicts... the external light stress". Please clarify this section. First, where does this prediction arise from in the modeling work? Second, I am not sure what is meant by "modulates the light stress" or "keeps the cell dynamics robust to the intensity of external light stress" (especially since the dynamics clearly vary with irradiance, as seen in Figure 4A).

      Line 322: I am not sure what "handles the ROS by adjusting the profile of the membrane potential dynamics" means. What is meant by "handling" ROS? Is the hypothesis that membrane potential dynamics themselves are protective against ROS, or that they induce a ROS-protective response downstream, or something else? Later in lines 327-8 the authors write that changes in the response to ROS in the model agree with the hypothesis, but just showing that ROS impacts the membrane potential does not seem to demonstrate that this has a protective effect against ROS.

      Line 365-366: This section title seems confusing- mechanosensitive ion channels totally ablate membrane potential dynamics, they don't have a specific effect on the first hyperpolarization event. The claim that mechanosensitive ion channels are specifically involved in the first event also appears in the abstract.

      Also, the apparent membrane potential is much lower even at the start of the experiment in these mutants- is this expected? This seems to imply that these ion channels also have a blue light independent effect.

      Lines 368, 371: Should be VGCCs rather than VGGCs.

      Line 477: I believe the figure reference here should be to Figure 7B, not 6B.

      Line 567-568: "The initial spike is key to registering the presence of the light stress." What is the evidence for this claim?

      Line 592-594: "We have presented much better quantitative agreement..." This is a strong claim; it is not immediately evident to me that the agreement between model and prediction is "much better" in this work than in the cited work. The model in Figure 4 of reference 57 seems to capture the key features of their data. Clarification is needed about this claim.

      Line 613: "...strains did not have any additional mutations." This seems to imply that whole genome sequencing was performed- is this the case?

      Line 627: I believe this should refer to Figure S2A-B rather than S1.

      Line 719: What percentage of cells did not hyperpolarize in these experiments?

      Lines 751-754: As I mentioned above, significant detail is missing here about how these measurements were made. How is "radius" defined in 3D biofilms like the one shown in Video S6B, which looks very flat? What is meant by the distance from the substrate to the core, since usually in this biofilm geometry, the core is directly on the substrate? Most importantly, this only describes the process of sectioning the data- how were these sections used to compute the velocity of ThT signal propagation?

      I also have some comments specifically on the figure presentation:

      Normalization from 0 to 1 has been done in some of the ThT traces in the paper, but not all. The claims in the paper would be easiest to evaluate if the non-normalized data were shown- this is important for the interpretation of some of the claims.

      Some indication of standard deviation (error bars or shading) should be added to all figures where mean traces are plotted.

      Throughout the paper, I am a bit confused by the time axis; the data consistently starts at 1 minute. This is not intuitive to me, because it seems that the blue light being applied to the cells is also the excitation laser for ThT- in that case, shouldn't the first imaging frame be at time 0 (when the blue light is first applied)? Or is there an additional exposure of blue light 1 minute before imaging starts? This is consequential because it impacts the measured time to the first spike. (Additionally, all of the video time stamps start at 0).

      Please increase the size of the scale bars and bar labels throughout, especially in Figure 2A and S4A.

      In Figure 1B and D, it would help to decrease the opacity on the individual traces so that more of them can be discerned. It would also improve clarity to have data from the different experiments shown with different colored lines, so that variability between experiments can be clearly visualized.

      Results in Figure 1E would be easier to interpret if the frequency were normalized to total N. It is hard to tell from this graph whether the edges and bin widths are the same between the data sets, but if not, they should be. Also, it would help to reduce the opacity of the sparse cell data set so that the full microcluster data set can be seen as well.
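
      The histogram suggestion can be sketched as follows: both data sets are binned on the same edges and each count is divided by that data set's total N. The function name, the bin count and the input values are placeholders chosen for illustration, not the data behind Figure 1E.

```python
import numpy as np

def normalized_histograms(a, b, n_bins=10):
    """Bin two data sets on shared edges; return fractions of total N."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Shared edges spanning the combined range of both data sets.
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    frac_a = np.histogram(a, bins=edges)[0] / a.size
    frac_b = np.histogram(b, bins=edges)[0] / b.size
    return edges, frac_a, frac_b

# Placeholder inputs of different sizes.
edges, frac_a, frac_b = normalized_histograms(
    [1, 2, 3, 4], [2, 2, 5, 6, 7, 8], n_bins=7
)
```

      Because each set of fractions sums to 1, data sets of very different size can be compared on the same axis.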

      Biofilm images are shown in Figures 2A, S3A, and Video S3- these are all of the same biofilm. Why not take the opportunity to show different experimental replicates in these different figures? The same goes for Figure S4A and Video S6B, which again are of the same biofilm.

      Figure 2C would be much easier to read if the curves were colored in order of their size; the same is true for Figure 4A and irradiance.

      The complementation data in Figure S3D should be moved to the main text figure 3 alongside the data about the corresponding knockout to make it easier to compare the curves.

      Figure S3E: Is the Y-axis in this graph mislabeled? It is labeled as ThT fluorescence, but it seems that it is reporting fluorescence from the calcium indicator?

      Video S6B is very confusing - why does the video play first forwards and then backwards? Unless I am looking very carefully at the time stamps it is easy to misinterpret this as a rise in the intensity at the end of the experiment. Without a video legend, it's hard to understand this, but I think it would be much more straightforward to interpret if it only played forward. (Also, why is this video labeled 6B when there is no video 6A?)

    1. Author response:

      The following is the authors’ response to the original reviews.

      The points raised let us critically rethink our approach, our results, and our conclusions. Furthermore, it gave us the chance to elaborate on some critical aspects that were mentioned. With the help of the reviewers, we made some clarifications in the point-by-point responses and implemented them in the manuscript. Furthermore, we modified the figures as suggested:

      - The colors in Figure 1C, D, G and H have been adapted as suggested

      - We added a Figure2-figure supplement 1, which strengthens our conclusion in Figure 2

      - As asked by reviewer #1 (weaknesses #3), we added the data about neutrophil numbers in the different organs (Figure 6-figure supplement 3C).

      Reviewer #1 (Public Review):

      Summary:

      - Extracellular ATP represents a danger-associated molecular pattern associated with tissue damage and can also act in an autocrine fashion in macrophages to promote proinflammatory responses, as observed in a previous paper by the authors in abdominal sepsis. The present study addresses an important aspect possibly conditioning the outcome of sepsis, namely the release of ATP by bacteria. The authors show that sepsis-associated bacteria do in fact release ATP in a growth-dependent and strain-specific manner. However, whether this bacteria-derived ATP plays a role in the pathogenesis of abdominal sepsis has not been determined. To address this question, a number of mutant strains of E. coli have been used, first to correlate bacterial ATP release with growth and then with outer membrane integrity and bacterial death. By using E. coli transformants expressing the ATP-degrading enzyme apyrase in the periplasmic space, the paper nicely shows that abdominal sepsis by these transformants results in significantly improved survival. This effect was associated with a reduction of peritoneal macrophages and CX3CR1+ monocytes, and an increase in neutrophils. To extrapolate the function of bacterial ATP from the systemic response to microorganisms, the authors exploited bacterial OMVs either loaded or not with ATP to investigate the systemic effects devoid of living microorganisms. This approach showed that ATP-loaded OMVs induced degranulation of neutrophils after lysosomal uptake, suggesting that this mechanism could contribute to sepsis severity.

      Strengths:

      - A strong part of the study is the analysis of E. coli mutants to address different aspects of bacterial release of ATP that could be relevant during systemic dissemination of bacteria in the host.

      We want to thank the reviewer for recognizing this important aspect of our experimental approach.

      Weaknesses:

      - As pointed out in the limitations of the study, whether ATP-loaded OMVs provide a mechanistic proof of the pathogenetic role of bacteria-derived ATP independently of live microorganisms in sepsis is interesting but not definitively convincing. It could be useful to see whether degranulation of neutrophils is differentially induced by apyrase-expressing vs control E. coli transformants.

      We thank the reviewer for raising several important points. In our study, we assessed local and systemic effects of released bacterial ATP. The consequences of local bacterial ATP release were assessed using an apyrase-expressing E. coli transformant. Locally, bacterial ATP resulted in a decrease in neutrophil numbers and we hypothesize that directly released bacterial ATP either leads to neutrophil death (e.g. via P2X7 receptor (Proietti et al., 2019)) or interferes with the recruitment of neutrophils (e.g. via P2Y receptors (Junger, 2011)).

      The systemic consequences were assessed using ATP-loaded and empty OMV. We have shown that degranulation is induced by OMV-derived bacterial ATP. ATP-containing OMV are engulfed by neutrophils, reach their endolysosomal compartment and might activate purinergic receptors, which then leads to aberrant degranulation. This concept, which needs to be explored in future studies, is fundamentally different from classical purinergic signaling via bacterial ATP released directly into the extracellular space.

      It is possible that neutrophil degranulation is also modulated by directly released bacterial ATP. We agree that this should be assessed in future studies. Also, the role of OMV-derived bacterial ATP should be assessed locally, and the relative importance of directly released vs. OMV-mediated bacterial ATP should be dissected locally. Based on our measurements (Figure 4-figure supplement 1A and Figure 5C), we estimate that the effect of OMV-derived bacterial ATP might be much smaller than the effects of directly released bacterial ATP. Thus, direct ATP release might predominate locally. However, we fully agree that this has to be investigated in a future study to reconcile the different aspects of bacterial ATP signaling. A paragraph will be added to the manuscript, in which we discuss this particular issue.

      - Also, the increase of neutrophils in bacterial ATP-depleted abdominal sepsis, which has better outcomes than "ATP-proficient" sepsis, seems difficult to correlate to the hypothesized tissue damage induced by ATP delivered via non-infectious OMVs.

      We fully acknowledge the mentioned discrepancy. What we propose is that bacterial ATP exhibits different functions that are dependent on the release mechanism (see above). Locally, in the peritoneal cavity, neutrophil numbers are decreased by directly released bacterial ATP. Remotely, ATP is delivered via OMV and impacts on neutrophil function. We agree that, in particular, in the peritoneal cavity, both effects may play a role. However, the impact of directly released bacterial ATP seems to be dominant (see above).

      We propose that neutrophils are decreased locally because of directly released bacterial ATP, which prevents efficient infection control and, therefore, impairs sepsis survival. In addition, these fewer neutrophils might even be dysregulated by the engulfment of bacterial ATP delivered via OMV, which leads to an upregulated and possibly aberrant degranulation process worsening local and remote tissue damage. We agree that in addition to neutrophil numbers, the function of local neutrophils should be assessed with and without the influence of OMV-delivered bacterial ATP. This could be done by RNA sequencing of primary neutrophils from the peritoneal cavity or neutrophil cell lines as well as degranulation assays.

      - Are the neutrophil counts affected by ATP delivered via OMVs?

      This is difficult to show in the peritoneal cavity, where we have both directly released bacterial ATP and OMV-derived bacterial ATP. We did, however, assess such a putative difference for the systemic organs and the blood, where we did not find any differences in neutrophil numbers.

      Author response image 1.

      - A comparison of cytokine profiles in the abdominal fluids of E. coli and OMV treated animals could be helpful in defining the different responses induced by OMV-delivered vs bacterial-released ATP. The analyses performed on OMV treated versus E. coli infected mice are not closely related and difficult to combine when trying to draw a hypothesis for bacterial ATP in sepsis.

      We fully agree that there are several open questions that remain to be elucidated, in particular, to differentiate the local role of directly released versus OMV-delivered bacterial ATP. In this study, we laid the foundation for future in vivo research to examine the specific role of bacterial ATP in sepsis. Such future research avenues might be to investigate the local effects of OMV-delivered bacterial ATP, and how neutrophil migration, apoptosis and degranulation are altered. We agree that exploration of the local secretory immune response and cytokine profiles is relevant to understand the different mechanisms of how bacterial ATP alters sepsis. However, such experiments should ideally be performed in systems where the source and the delivery of ATP can be modulated locally.

      - Also it was not clear why lung neutrophils were used for the RNAseq data generation and analysis.

      Thank you for this remark. We have chosen primary lung neutrophils for four reasons:

      (1) Isolation of primary lung neutrophils allowed us to assess an in vivo response that would not have been possible with cell lines.

      (2) The lung and the respiratory system are among the clinically most important organs affected during sepsis resulting in a significant cause of mortality.

      (3) We show in Figure 6C that specifically in the lung, OMV are engulfed by neutrophils, which shows the relevance of the lung also in our study context.

      (4) And finally, lung neutrophils were chosen to examine specifically distant and not local effects.

      Reviewer #2 (Public Review):

      Summary:

      - In their manuscript "Released Bacterial ATP Shapes Local and Systemic Inflammation during Abdominal Sepsis", Daniel Spari et al. explored the dual role of ATP in exacerbating sepsis, revealing that ATP from both host and bacteria significantly impacts immune responses and disease progression.

      Strengths:

      - The study meticulously examines the complex relationship between ATP release and bacterial growth, membrane integrity, and how bacterial ATP potentially dampens inflammatory responses, thereby impairing survival in sepsis models. Additionally, this compelling paper implies a concept that bacterial OMVs act as vehicles for the systemic distribution of ATP, influencing neutrophil activity and exacerbating sepsis severity.

      We thank the reviewer for mentioning these key points and supporting the relevance of our study.

      Weaknesses:

      (1) The researchers extracted and cultivated abdominal fluid on LB agar plates, then randomly picked 25 colonies for analysis. However, they did not conduct 16S rRNA gene amplicon sequencing on the fluid itself. It is worth noting that the bacterial species present may vary depending on the individual patients. It would be beneficial if the authors could specify whether they've verified the existence of unculturable species capable of secreting high levels of Extracellular ATP.

      Most septic complications are caused by a limited spectrum of bacteria, belonging mainly either to the Firmicutes or the Proteobacteria phyla, including E. coli, K. pneumoniae, S. aureus or E. faecalis (Diekema et al., 2019; Mureșan et al., 2018). We validated this well documented existing evidence by randomly assessing 25 colonies. For the planned experiments, it was crucial to work with culturable bacteria; otherwise, ATP measurements, the modulation of ATP generation or loading of OMV would not have been possible. Using such culturable bacteria allowed us to describe mechanisms of ATP release.

      We fully agree that hard-to-culture or unculturable bacteria might contribute significantly to septic complications. This, however, would need to be explored in future studies using extensive culturing methods (Cheng et al., 2022).

      (2) Do mice lacking commensal bacteria show a lack of extracellular ATP following cecal ligation puncture?

      ATP is typically secreted by many host cells, in both active and passive manners, in the case of any injury, including cecal ligation and puncture (Burnstock, 2016; Dosch et al., 2018; Eltzschig et al., 2012; Idzko et al., 2014). We hypothesize that bacterial ATP is a potential priming agent at early stages of sepsis, and indeed, at such early time points, a comparison of peritoneal ATP levels between germ-free and colonized mice could support our hypothesis. Future studies addressing this question must, however, correct for the different immune responses between germ-free and colonized mice. This is of utmost importance, especially for the cecal ligation and puncture model, since the cecum of germ-free mice is extremely large, making such experiments hard to control.

      (3) The authors isolated various bacteria from abdominal fluid, encompassing both Gram-negative and Gram-positive types. Nevertheless, their emphasis appeared to be primarily on the Gram-negative E. coli. It would be beneficial to ascertain whether the mechanisms of Extracellular ATP release differ between Gram-positive and Gram-negative bacteria. This is particularly relevant given that the Gram-positive bacterium E. faecalis, also isolated from the abdominal fluid, is recognized for its propensity to release substantial amounts of Extracellular ATP.

      We fully agree with this comment. In this paper, we used E. coli as our model organism to determine the principles of sepsis-associated bacterial ATP release and therefore focused on gram-negative bacteria. In addition to the direct, growth-dependent release, we found a relevant impact of OMV-delivered bacterial ATP. For this latter purpose, a gram-negative strain, in which OMV generation has been well described (Schwechheimer & Kuehn, 2015), was chosen. Recently, gram-positive bacteria have been shown to secrete ATP and OMV as well (Briaud & Carroll, 2020; Hironaka et al., 2013; Iwase et al., 2010). Given the fundamental differences in the structure of the cell wall of gram-positive bacteria and the mechanisms of OMV generation and release, future studies are required to assess the relevance of directly released and OMV-delivered ATP in gram-positive bacteria.

      (4) The authors observed changes in the levels of LPM, SPM, and neutrophils in vivo. However, it remains uncertain whether the proliferation or migration of these cells is modulated or inhibited by ATP receptors like P2Y receptors. This aspect requires further investigation to establish a convincing connection.

      We fully agree with this comment. The decrease in LPM and the consequent predominance of SPM have been well described after inflammatory stimuli in the context of the macrophage disappearance reaction (Ghosn et al., 2010). It has also been shown that purinergic signaling modulates infiltration of neutrophils and can lead to cell death as a consequence of P2Y and P2X receptor activation (Junger, 2011; Proietti et al., 2019). In our study, we propose that intracellular purinergic receptors contribute to neutrophil function during sepsis. Having introduced the general principles and fundamentals of bacterial ATP with our studies, we fully agree that additional experiments need to address downstream purinergic receptor activation. That, however, would go beyond the scope of our study.

      (5) Additionally, is it possible that the observed in vivo changes could be triggered by bacterial components other than Extracellular ATP? In this research field, a comprehensive collection of inhibitors is available, so it is desirable to utilize them to demonstrate clearer results.

      This question is of utmost importance and defined the choice of our model and experimental approach. When we started the project, we used two different E. coli mutants that release low (ompC) and high (eaeH) amounts of ATP. However, the limitation of this approach is that these are different bacteria, which may also differ in the components they secrete or the surface proteins they express. We, therefore, decided against that approach. With the approach we finally used (same bacterium, just with and without ATP), we aimed to minimize the influence of non-ATP bacterial components.

      (6) Have the authors considered the role of host-derived Extracellular ATP in the context of inflammation?

      Yes, the role of host-derived extracellular ATP in inflammation and sepsis is well established, although with contradictory results (Csóka et al., 2015; Ledderose et al., 2016). These conflicting data were the rationale for testing the relevance of bacterial ATP. We suggest that bacterial ATP is essential in the early phase of sepsis, when bacteria invade the sterile compartment and before an efficient host response, including the eukaryotic release of ATP, is established.

      (7) The authors mention that Extracellular ATP is rapidly hydrolyzed by ectonucleotidases in vivo. Are the changes of immune cells within the peritoneal cavity caused by Extracellular ATP released from bacterial death or by OMVs?

      This is a relevant question that was also asked by reviewer #1, and we answered it in detail above (weaknesses comment #1 and #2). From our ATP measurements (Figure 4-figure supplement 1A and Figure 5C), we conclude that locally, the role of directly released (extracellular) bacterial ATP predominates over that of OMV-derived bacterial ATP. Furthermore, directly released and OMV-derived bacterial ATP (within OMV, engulfed and transported to the endolysosomal compartment) act through different mechanisms, and extracellular ATP in particular has been described to lead to apoptosis via P2X7 signaling.

      (8) In the manuscript, the sample size (n) for the data consistently remains at 2. I would suggest expanding the sample size to enhance the robustness and rigor of the results.

      Two biological replicates (independent cultures) were used only for the bacterial cultures in Figure 1, Figure 2, and Figure 3. These replicates gave similar results with a very small standard deviation, indicating that the measurements are robust. In the in vitro experiments in Figure 5, we used a sample size of 6 (three biological replicates measured in technical duplicates), since we saw larger deviations in our measurements. For the in vivo experiments, we always used 5 or more animals in at least two independent experiments.

      Reviewer #2 (Recommendations For The Authors):

      (9) Line 37: 11 million sepsis-related deaths were reported "in" 2017.

      The passage has been corrected as suggested.

      (10) By the way, the similar colors used in Figure 1C and G are too chaotic, making it difficult to distinguish.

      We agree, the colors have been adapted.

      Author response image 2.

      (11) All "in vivo" and "in vitro" should be italicized.

      We italicized all of them.

      (12) The title of Figure 4 is confusing: "Impairs sepsis outcome in vivo?" Could you make it more specific?

      We agree, the title has been rephrased:

      “Bacterial ATP reduces neutrophil counts and reduces survival in a mouse model of abdominal sepsis.”

      (13) Line 314-316: The sentence "Potentially, despite the lack of a transporter, ATP may similarly to eukaryotic cells leak (Yegutkin et al., 2006) across the inner membrane into the periplasmic space that lacks the enzymes for ATP generation." sounds odd.

      This passage was reformulated in the manuscript.

      “Despite the lack of a transporter, ATP may leak across the inner membrane into the periplasmic space. Such leakage may be similar to baseline leakage in eukaryotic cells (Yegutkin et al., 2006).”

      (14) The numerical notation in the paper is odd: sometimes it uses a prime symbol as a superscript (such as line 504), and sometimes it does not (such as line 421). Should it be standardized to "3,200" and "150,000"?

      Thank you for this remark. The numbers have been standardized throughout the manuscript.

      (15) Line "0.4 mm EP cuvettes" should be "0.4 cm EP cuvettes"

      The specified passage has been corrected as suggested.

      References

      Briaud, P., & Carroll, R. K. (2020). Extracellular Vesicle Biogenesis and Functions in Gram-Positive Bacteria. Infection and Immunity, 88(12), 10.1128/iai.00433-20. https://doi.org/10.1128/iai.00433-20

      Burnstock, G. (2016). P2X ion channel receptors and inflammation. Purinergic Signalling, 12(1), 59–67. https://doi.org/10.1007/s11302-015-9493-0

      Cheng, A. G., Ho, P.-Y., Aranda-Díaz, A., Jain, S., Yu, F. B., Meng, X., Wang, M., Iakiviak, M., Nagashima, K., Zhao, A., Murugkar, P., Patil, A., Atabakhsh, K., Weakley, A., Yan, J., Brumbaugh, A. R., Higginbottom, S., Dimas, A., Shiver, A. L., … Fischbach, M. A. (2022). Design, construction, and in vivo augmentation of a complex gut microbiome. Cell, 185(19), 3617-3636.e19. https://doi.org/10.1016/j.cell.2022.08.003

      Csóka, B., Németh, Z. H., Törő, G., Idzko, M., Zech, A., Koscsó, B., Spolarics, Z., Antonioli, L., Cseri, K., Erdélyi, K., Pacher, P., & Haskó, G. (2015). Extracellular ATP protects against sepsis through macrophage P2X7 purinergic receptors by enhancing intracellular bacterial killing. The FASEB Journal, 29(9), 3626–3637. https://doi.org/10.1096/fj.15-272450

      Diekema, D. J., Hsueh, P.-R., Mendes, R. E., Pfaller, M. A., Rolston, K. V., Sader, H. S., & Jones, R. N. (2019). The Microbiology of Bloodstream Infection: 20-Year Trends from the SENTRY Antimicrobial Surveillance Program. Antimicrobial Agents and Chemotherapy, 63(7), e00355-19. https://doi.org/10.1128/AAC.00355-19

      Dosch, M., Gerber, J., Jebbawi, F., & Beldi, G. (2018). Mechanisms of ATP Release by Inflammatory Cells. International Journal of Molecular Sciences, 19(4), 1222. https://doi.org/10.3390/ijms19041222

      Eltzschig, H. K., Sitkovsky, M. V., & Robson, S. C. (2012). Purinergic Signaling during Inflammation. New England Journal of Medicine, 367(24), 2322–2333. https://doi.org/10.1056/NEJMra1205750

      Ghosn, E. E. B., Cassado, A. A., Govoni, G. R., Fukuhara, T., Yang, Y., Monack, D. M., Bortoluci, K. R., Almeida, S. R., Herzenberg, L. A., & Herzenberg, L. A. (2010). Two physically, functionally, and developmentally distinct peritoneal macrophage subsets. Proceedings of the National Academy of Sciences, 107(6), 2568–2573. https://doi.org/10.1073/pnas.0915000107

      Hironaka, I., Iwase, T., Sugimoto, S., Okuda, K., Tajima, A., Yanaga, K., & Mizunoe, Y. (2013). Glucose Triggers ATP Secretion from Bacteria in a Growth-Phase-Dependent Manner. Applied and Environmental Microbiology, 79(7), 2328–2335. https://doi.org/10.1128/AEM.03871-12

      Idzko, M., Ferrari, D., & Eltzschig, H. K. (2014). Nucleotide signalling during inflammation. Nature, 509(7500), 310–317. https://doi.org/10.1038/nature13085

      Iwase, T., Shinji, H., Tajima, A., Sato, F., Tamura, T., Iwamoto, T., Yoneda, M., & Mizunoe, Y. (2010). Isolation and Identification of ATP-Secreting Bacteria from Mice and Humans. Journal of Clinical Microbiology, 48(5), 1949–1951. https://doi.org/10.1128/JCM.01941-09

      Junger, W. G. (2011). Immune cell regulation by autocrine purinergic signalling. Nature Reviews Immunology, 11(3), 201–212. https://doi.org/10.1038/nri2938

      Ledderose, C., Bao, Y., Kondo, Y., Fakhari, M., Slubowski, C., Zhang, J., & Junger, W. G. (2016). Purinergic Signaling and the Immune Response in Sepsis: A Review. Clinical Therapeutics, 38(5), 1054–1065. https://doi.org/10.1016/j.clinthera.2016.04.002

      Mureșan, M. G., Balmoș, I. A., Badea, I., & Santini, A. (2018). Abdominal Sepsis: An Update. The Journal of Critical Care Medicine, 4(4), 120–125. https://doi.org/10.2478/jccm-2018-0023

      Proietti, M., Perruzza, L., Scribano, D., Pellegrini, G., D’Antuono, R., Strati, F., Raffaelli, M., Gonzalez, S. F., Thelen, M., Hardt, W.-D., Slack, E., Nicoletti, M., & Grassi, F. (2019). ATP released by intestinal bacteria limits the generation of protective IgA against enteropathogens. Nature Communications, 10(1), Article 1. https://doi.org/10.1038/s41467-018-08156-z

      Schwechheimer, C., & Kuehn, M. J. (2015). Outer-membrane vesicles from Gram-negative bacteria: Biogenesis and functions. Nature Reviews Microbiology, 13(10), 605–619. https://doi.org/10.1038/nrmicro3525

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      (1) The biological significance of the inhibitory effects of human Abeta42 on gamma-secretase activity is not clear. As the authors mentioned in the Discussion, it is plausible that Abeta42 may concentrate up to microM level in endosomes. However, subsets of FAD mutations in APP and presenilin 1 and 2 increase Abeta42/Abeta40 ratio and lead to Abeta42 deposition in brain. APP knock-in mice NLF and NLGF also develop Abeta42 deposition in an age-dependent manner, although they produce more human Abeta42 than human Abeta40.

      If the production of Abeta42 is attenuated, this results in less Abeta42 deposition in brain. So, it is unlikely that human Abeta42 interferes with gamma-secretase activity in physiological conditions. This reviewer has an impression that inhibition of gamma-secretase by human Abeta42 is an interesting artifact of high Abeta42 concentration. If the authors disagree with this reviewer's comment, this manuscript needs more discussion from this point of view.

      We thank the Reviewer for raising this key conceptual point; we acknowledge that it was insufficiently discussed in the original manuscript. In response, we introduced the following paragraphs in the discussion section of the revised manuscript:

      “From a mechanistic standpoint, the competitive nature of the Aβ42-mediated inhibition implies that it is partial, reversible, and regulated by the relative concentrations of the Aβ42 peptide (inhibitor) and the endogenous substrates (Figure 10C and 10D). The model that we put forward is that cellular uptake, as well as endosomal production of Aβ, result in increased intracellular concentration of Aβ42, facilitating γ-secretase inhibition and leading to the buildup of APP-CTFs (and γ-secretase substrates in general). As Aβ42 levels fall, the augmented concentration of substrates shifts the equilibrium towards their processing and subsequent Aβ production. As Aβ42 levels rise again, the equilibrium is shifted back towards inhibition. This cyclic inhibitory mechanism will translate into pulses of (partial) γ-secretase inhibition, which will alter γ-secretase-mediated signaling (arising from increased CTF levels at the membrane or decreased release of soluble intracellular domains from substrates). These alterations may affect the dynamics of systems oscillating in the brain, such as NOTCH signaling, implicated in memory formation, and potentially others (related to e.g. cadherins, p75 or neuregulins). It is worth noting that oscillations in γ-secretase activity induced by treatment with a γ-secretase inhibitor semagacestat have been proposed to have contributed to the cognitive alterations observed in semagacestat-treated patients in the failed Phase-3 IDENTITY clinical trial (7) and that semagacestat, like Aβ42, acts as a high affinity competitor of substrates (85).

      The convergence of Aβ42 and tau at the synapse has been proposed to underlie synaptic dysfunction in AD (86-89), and recent assessment of APP-CTF levels in synaptosome-enriched fractions from healthy control, SAD and FAD brains (temporal cortices) has shown that APP fragments concentrate at higher levels in the synapse in AD-affected than in control individuals (90). Our analysis adds that endogenous Aβ42 concentrates in synaptosomes derived from end-stage AD brains to reach ~10 nM, a concentration that in CM from human neurons inhibits γ-secretase in PC12 cells (Figure 7). Furthermore, the restricted localization of Aβ in endolysosomal vesicles, within synaptosomes, likely increases the local peptide concentration to the levels that inhibit γ-secretase-mediated processing of substrates in this compartment. In addition, we argue that the deposition of Aβ42 in plaques may be preceded by a critical increase in the levels of Aβ present in endosomes and the cyclical inhibition of γ-secretase activity that we propose. Under this view, reductions in γ-secretase activity may be a (transient) downstream consequence of increases in Aβ due to failed clearance, as represented by plaque deposition, contributing to AD pathogenesis.”
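For readers who want the competitive mechanism described above in symbols, the textbook rate law for simple competitive inhibition is sketched below. It is offered only as an illustration under Michaelis-Menten assumptions; it is not taken from the manuscript, and the kinetic model actually fitted by the authors may differ. Here $[S]$ stands for the endogenous substrate (e.g. APP-CTF), $[I]$ for Aβ42 acting as inhibitor, and $K_i$ for its inhibition constant.

```latex
v \;=\; \frac{V_{\max}\,[S]}{K_M\left(1 + \dfrac{[I]}{K_i}\right) + [S]}
```

In this form the inhibitor only raises the apparent $K_M$. At fixed $[I]$, accumulating substrate drives $v$ back towards $V_{\max}$, which corresponds to the partial and reversible inhibition described in the quoted passage.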

      We have also added figures 10C and 10D, presented here for convenience.

      Author response image 1.

      (2) It is not clear whether the FRET-based assay in living cells really reflects gamma-secretase activity. This reviewer thinks that the authors need at least biochemical data, such as levels of Abeta.

      We have established a novel HiBiT-tag-based assay that reports on global γ-secretase activity in cells, using the total levels of secreted HiBiT-tagged Aβ peptides as a proxy. The assay and findings are presented in the revised manuscript as follows:

      In the Results section, in the “Aβ42 treatment leads to the accumulation of APP C-terminal fragments in neuronal cell lines and human neuron” subsection:

      “The increments in the APP-CTF/FL ratio suggested that Aβ42 (partially) inhibits the global γ-secretase activity. To further investigate this, we measured the direct products of the γ-secretase-mediated proteolysis of APP. Since the detection of the endogenous Aβ products via standard ELISA methods was precluded by the presence of exogenous human Aβ42 (treatment), we used an N-terminally tagged version of APPC99 and quantified the amount of total secreted Aβ, which is a proxy for the global γ-secretase activity. Briefly, we overexpressed human APPC99 N-terminally tagged with a short, 11-amino-acid HiBiT tag in human embryonic kidney (HEK) cells, treated these cultures with human Aβ42 or p3 17-42 peptides at 1 μM or DAPT (GSI) at 10 µM, and determined total HiBiT-Aβ levels in conditioned media (CM). DAPT was considered to result in full γ-secretase inhibition, and hence the values recorded in DAPT-treated conditions were used for the background subtraction. We found a ~50% reduction in luminescence signal, directly linked to HiBiT-Aβ levels, in CM of cells treated with human Aβ42 and no effect of p3 peptide treatment, relative to the DMSO control (Figure 3D). The observed reduction in the total Aβ products is consistent with the partial inhibition of γ-secretase by Aβ42.”

      In Methods:

      “Analysis of γ-secretase substrate proteolysis in cultured cells using secreted HiBiT-Aβ or -Aβ-like peptide levels as a proxy for the global γ-secretase endopeptidase activity

      HEK293 stably expressing APP-CTF (C99) or a NOTCH1-based substrate (similar in size to APP-C99), both N-terminally tagged with the HiBiT tag, were plated at a density of 10,000 cells per well of a 96-well plate, and 24 h after plating were treated with Aβ or p3 peptides diluted in OPTIMEM (Thermo Fisher Scientific) supplemented with 5% FBS (Gibco). Conditioned medium was collected and subjected to analysis using the Nano-Glo® HiBiT Extracellular Detection System (Promega). Briefly, 50 µl of the medium was mixed with 50 µl of the reaction mixture containing LgBiT Protein (1:100) and Nano-Glo HiBiT Extracellular Substrate (1:50) in Nano-Glo HiBiT Extracellular Buffer, and the reaction was incubated for 10 minutes at room temperature. Luminescence signal corresponding to the amount of the extracellular HiBiT-Aβ or -Aβ-like peptides was measured using a Victor plate reader with default luminescence measurement settings.”

      As the direct substrate of γ-secretase was used in this analysis, the observed reduction (~50%) in the levels of N-terminally tagged (HiBiT) Aβ peptides in the presence of 1 µM Aβ42, relative to control conditions, demonstrates a selective inhibition of γ-secretase by Aβ42 (but not by p3). These data complement the FRET-based findings presented in Figure 5.
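The background subtraction and normalisation described for the HiBiT readout can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function name and the luminescence values are hypothetical, and it assumes the DAPT condition defines zero activity while the DMSO vehicle defines full activity.

```python
def relative_gamma_secretase_activity(sample, vehicle, full_inhibition):
    """Return background-subtracted luminescence as a fraction of the vehicle.

    sample          -- raw luminescence of the treated condition (e.g. Abeta42)
    vehicle         -- raw luminescence of the vehicle control (e.g. DMSO)
    full_inhibition -- raw luminescence under full inhibition (e.g. DAPT),
                       used here as the background (assumption)
    """
    window = vehicle - full_inhibition
    if window <= 0:
        raise ValueError("vehicle signal must exceed the background signal")
    return (sample - full_inhibition) / window


# Hypothetical raw readings, chosen only to illustrate the arithmetic.
dmso, dapt, abeta42 = 10_000.0, 2_000.0, 6_000.0
activity = relative_gamma_secretase_activity(abeta42, dmso, dapt)
print(f"Relative activity: {activity:.0%}")  # prints "Relative activity: 50%"
```

Subtracting the full-inhibition signal first means a reported "~50% reduction" refers to the γ-secretase-dependent part of the signal rather than to the raw luminescence.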

      (3) Processing of APP-CTF in living cells is not only the cleavage by gamma-secretase. This reviewer thinks that the authors need at least biochemical data, such as levels of Abeta in Figures 4, 5 and 7.

      We tried to measure the levels of Aβ peptides secreted by cells into the culture medium directly by ELISA (using different protocols) or MS (using established methods, as reported in Koch et al, 2023), but exogenous Aβ42 (treatment) present at relatively high levels interfered with the readout and rendered the analysis inconclusive. 

      However, we were successful in determining total secreted (HiBiT-tagged) Aβ peptides from the HiBiT-tagged APP-C99 substrate, as indicated in the previous point. The quantification of the levels of these peptides showed that Aβ42 treatment resulted in a ~50% reduction in the γ-secretase-mediated processing of the tagged substrate.

      In addition, we would like to highlight that our analysis of the contribution of other APP-CTF degradation pathways, using cycloheximide-based assays in the constant presence of γ-secretase inhibitor, failed to reveal significant differences between Aβ42-treated cells and controls (Figure 6B & C). The lack of a significant impact of Aβ42 on the half-life of APP-CTFs under conditions of sustained γ-secretase inhibition is consistent with the proposed Aβ42-mediated inhibitory mechanism.

      (4) Similar to comment #3. Processing of Pancad-CTF and p75 in living cells may be not only the cleavage by gamma-secretase. This reviewer thinks that the authors need at least biochemical data, such as levels of ICDs in Figures 6C and E. 

      To address this comment, we have now performed additional experiments in which we measured N-terminal Aβ-like peptides derived from a NOTCH1-based substrate using the HiBiT-based assay. These experiments showed a reduction in these peptides in cells treated with Aβ42 relative to the vehicle control, and hence further confirmed the inhibitory action of Aβ42. These new data have been included as Figure 8D in the revised manuscript and described as follows:

      Finally, we measured the direct N-terminal products generated by γ-secretase proteolysis from a HiBiT-tagged NOTCH1-based substrate, an estimate of the global γ-secretase activity. We quantified the Aβ-like peptides secreted by HEK 293 cells stably expressing this HiBiT-tagged substrate upon treatment with 1 µM Aβ1-42, p3 17-42 peptide or DAPT (GSI) (Figure 8D). DAPT treatment was considered to result in complete γ-secretase inhibition, and hence the values recorded in the DAPT condition were used for background subtraction. A significant ~20% reduction in the amount of secreted N-terminal HiBiT-tagged peptides derived from the NOTCH1-based substrate in cells treated with Aβ1-42 supports the inhibitory action of Aβ1-42 on γ-secretase-mediated proteolysis.

      Minor concerns:

      (1) Murine Abeta42 may be converted to murine Abeta38 easily, compared to human Abeta42. This may be a reason why murine Abeta42 exhibits no inhibitory effect on gamma-secretase activity. 

      To address this question, we performed additional experiments in which we assessed the processing of murine Aβ42 into Aβ38. Analogous to human Aβ42, the murine Aβ42 peptide was not processed to Aβ38 in the assay conditions. These new data have been integrated in the manuscript and added as Supplementary figure 1B.

      (2) It is curious to know the levels of C99 and C83 in cells in supplementary figure 3.  

      The conditions used in these assays were analogous to the conditions used in Figure 3 (i.e. treatment with Aβ peptides at 1 µM concentrations). Such conditions were associated with profound and consistent APP-CTF accumulation in this model system.

      Reviewer #2 (Recommendations For The Authors):

      In the current study, the authors show that Aβs with low affinity for γ-secretase, when present at relatively high concentrations, can compete with the longer, higher affinity APPC99 substrate for binding and processing. They also performed kinetic analyses and demonstrate that human Aβ1-42 inhibits γ-secretase-mediated processing of APP C99 and other substrates. Interestingly, neither murine Aβ1-42 nor human p3 (17-42 amino acids in Aβ) peptides exerted inhibition under similar conditions. The authors also show that human Aβ1-42-mediated inhibition of γ-secretase activity results in the accumulation of unprocessed substrates, which leads to p75-dependent activation of caspase 3 in basal forebrain cholinergic neurons (BFCNs) and PC12 cells.

      These analyses demonstrate that, as seen for γ-secretase inhibitors, Aβ1-42 potentiates this marker of apoptosis. However, there are no in vivo data to support the physiological significance of the current finding. The authors should show in APP KO mice whether gamma-secretase enzymatic activity is elevated or not, and whether putting back Aβ42 peptide will abolish these in vivo effects.

      The findings presented in this manuscript form the basis for further in vitro and in vivo research to investigate the mechanisms of inhibition and its contribution to brain pathophysiology. Here, we used well-controlled model systems to investigate a novel mechanism of Aβ42 toxicity. Multiple mechanisms regulate the local concentration of Aβ42 in vivo, making the dissection of the biochemical mechanisms of the inhibition more complex. Nevertheless, beyond the scope of this report, we consider these very reasonable comments as a motivation for further research activities. 

      The experimental concentrations for Aβ42 peptide in the assay are too high, which are far beyond the physiological concentrations or pathological levels. The artificial observations are not supported by any in vivo experimental evidence.

      It is correct that in the majority of the experiments we used low μM concentrations of Aβ42. However, we would like to note that we have also performed experiments in which conditioned medium collected from human neurons expressing APP.Swe was used as a source of Aβ. In these experiments, total Aβ concentration was in the low nM range (0.5-1 nM) (Figure 7). Treatment with this conditioned medium led to increased APP-CTF levels, supporting the view that low nM concentrations of Aβ are sufficient for partial inhibition of γ-secretase.

      In addition, we highlight that analyses of the brains of AD-affected individuals have shown that APP-CTFs accumulate in both sporadic and genetic forms of the disease (Pera et al. 2013, Vaillant-Beuchot et al. 2021), and a correlation between APP-CTFs and Aβ levels at the synapse has recently been revealed (Ferrer-Raventós et al. 2023). We therefore assessed the concentration of Aβ42 in synaptosomes derived from frontal cortices of post-mortem AD and age-matched non-demented (ND) control individuals. Our findings and conclusions are included in the revised version as follows:

      In the results section:

      “We next investigated the levels of Aβ42 in synaptosomes derived from frontal cortices of post-mortem AD and age-matched non-demented (ND) control individuals (Figure 10B). Towards this, we prepared synaptosomes from frozen brain tissues using a Percoll gradient procedure (62, 63). Intact synaptosomes were spun to obtain a pellet, which was resuspended in a minimum amount of PBS, allowing us to estimate the volume containing the resuspended synaptosome sample. This is likely an overestimate of the actual synaptosome volume. Finally, synaptosomes were lysed in RIPA buffer and Aβ peptide concentrations measured using ELISA (MSD). We observed that the concentration of Aβ42 in the synaptosomes from (end-stage) AD tissues was significantly higher (10.7 nM) than that in synaptosomes isolated from non-demented tissues (0.7 nM), p<0.0005***. These data provide evidence for accumulation at nM concentrations of endogenous Aβ42 in synaptosomes in end-stage AD brains. Given that we measured Aβ42 concentration in synaptosomes, we speculate that even higher concentrations of this peptide may be present in the endolysosome vesicle system, and therein inhibit the endogenous processing of APP-CTF at the synapse. Of note, treatment of PC12 cells with conditioned medium containing even lower amounts of Aβ (low nanomolar range (0.5-1 nM)) resulted in the accumulation of APP-CTFs.”
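The volume argument in the passage above (an overestimated synaptosome volume yields an underestimated concentration) can be made concrete with a short Python sketch. This is an illustration only: the function names, the lysate and pellet volumes, and the back-calculation itself are assumptions, not the authors' documented procedure. The only fixed quantity is the approximate molecular weight of human Aβ1-42 (about 4514 g/mol).

```python
ABETA42_MW_G_PER_MOL = 4514.1  # approximate molecular weight of human Abeta1-42


def pg_per_ml_to_nM(pg_per_ml, mw=ABETA42_MW_G_PER_MOL):
    """Convert a mass concentration (pg/mL) to a molar concentration (nM).

    1 pg/mL equals 1e-9 g/L; dividing by g/mol gives mol/L, and multiplying
    by 1e9 gives nM, so the powers of ten cancel.
    """
    return pg_per_ml / mw


def synaptosome_concentration_nM(lysate_pg_per_ml, lysate_volume_ul,
                                 synaptosome_volume_ul):
    """Back-calculate the concentration in the estimated synaptosome volume.

    Assumes all peptide measured in the lysate originated from the
    resuspended synaptosome volume (hypothetical simplification).
    """
    lysate_nM = pg_per_ml_to_nM(lysate_pg_per_ml)
    return lysate_nM * lysate_volume_ul / synaptosome_volume_ul


# Hypothetical numbers: the same lysate reading with two volume estimates.
print(synaptosome_concentration_nM(4514.1, 100.0, 10.0))  # larger volume estimate
print(synaptosome_concentration_nM(4514.1, 100.0, 5.0))   # half the volume, double the nM
```

Because the estimated volume sits in the denominator, any overestimate of the synaptosome volume lowers the calculated concentration, which is why the reported nM values can be read as conservative.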

      In the discussion: 

      “The convergence of Aβ42 and tau at the synapse has been proposed to underlie synaptic dysfunction in AD (86-89), and recent assessment of APP-CTF levels in synaptosome-enriched fractions from healthy control, SAD and FAD brains (temporal cortices) has shown that APP fragments concentrate at higher levels in the synapse in AD-affected than in control individuals (90).  Our analysis adds that endogenous Aβ42 concentrates in synaptosomes derived from end-stage AD brains to reach ~10 nM, a concentration that in CM from human neurons inhibits γ-secretase in PC12 cells (Figure 7). Furthermore, the restricted localization of Aβ in endolysosomal vesicles, within synaptosomes, likely increases the local peptide concentration to the levels that inhibit γ-secretase-mediated processing of substrates in this compartment. In addition, we argue that the deposition of Aβ42 in plaques may be preceded by a critical increase in the levels of Aβ present in endosomes and the cyclical inhibition of γ-secretase activity that we propose. Under this view, reductions in γ-secretase activity may be a (transient) downstream consequence of increases in Aβ due to failed clearance, as represented by plaque deposition, contributing to AD pathogenesis. ”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In 2019, Wilkinson and colleagues (PMID: 31142833) managed to lift the veil on a 20-year open question: how to properly culture and expand Hematopoietic Stem Cells (HSCs). Although this study is revolutionizing the HSC biology field, several questions regarding the mechanisms of expansion remain open. Addressing this gap, Zhang et al. embarked on a much-needed investigation of HSC self-renewal in this particular culturing setting.

      The authors firstly tacked the known caveat that some HSC membrane markers are altered during in vitro cultures by functionally establishing EPCR (CD201) as a reliable and stable HSC marker (Figure 1), demonstrating that this compartment is also responsible for long-term hematopoietic reconstitution (Figure 3). Next in Figure 2, the authors performed single-cell omics to shed light on the potential mechanisms involved in HSC maintenance, and interestingly it was shown that several hematopoietic populations like monocytes and neutrophils are also present in this culture conditions, which has not been reported. The study goes on to functionally characterize these cultured HSCs (cHSC). The authors elegantly demonstrate using state-of-the-art barcoding strategies that these culturing conditions provoke heterogeneity in the expanding HSC pool (Figure 4). In the last experiment (Figure 5), it was demonstrated that cHSC not only retain their high EPCR expression levels but upon transplantation, these cells remain more quiescent than freshly-isolated controls.

      Taken together, this study independently validates that the proposed culturing system works and provides new insights into the mechanisms whereby HSC expansion takes place.

      Most of the conclusions of this study are well supported by the present manuscript, but some aspects regarding experimental design and especially the data analysis should be clarified and possibly extended.

      1) The first major point regards the single-cell (sc) omics performed on whole cultured cells (Figure 2):

      a. The authors claim that both RNA and ATAC were performed and indeed some ATAC-seq data is shown in Figure 2B, but this collected data seems to be highly underused.

      We appreciate the opportunity to clarify our analytical approach and the rationale behind it. In our study, we employed a novel deep learning framework, SAILERX, for our analysis. This framework is specifically designed to integrate multimodal data, such as RNAseq and ATACseq. The advantage of SAILERX lies in its ability to correct for technical noise inherent in sequencing processes and to align information from different modalities. Unlike methods that force a hard alignment of modalities into a shared latent space, SAILERX allows for a more refined integration. It achieves this by encouraging the local structures of the two modalities, as measured by pairwise similarities, to be similar.

      To put it more simply, SAILERX combines RNAseq and ATACseq data, ensuring that the unique characteristics of each data type are respected and used to enhance the overall biological picture, rather than forcing them into a uniform framework.

      While it is indeed possible to analyze the ATAC-seq and RNA-seq modalities separately, and we acknowledge the potential value in such an approach, our primary objective in this study was to highlight the relatively low content of HSCs in cultures. This finding is a key point of our work, and the multiome data support this from a molecular point of view.

      The Seurat object we provide was created to facilitate further analysis by interested researchers. This object simplifies the exploration of both the ATAC-seq and RNA-seq data, allowing for additional investigations that may be of interest to the scientific community. We hope this explanation clarifies our methodology and its implications.

      b. It's not entirely clear to this reviewer the nature of the so-called "HSC signatures" (SF2C) and why exactly these genes were selected. There are genes such as Mpl and Angpt1 which are used for Mk-biased HSCs. Maybe relying on other HSC molecular signatures (PMID: 12228721, for example) would not only bring this study more into the current field context but would also have a more favorable analysis outcome. Moreover, reclustering based on a different signature can also clarify the emergence of relevant HSC clusters.

      The selection of the HSC signature in our work was based on well-referenced datasets on well-defined HSPCs, as detailed in the "v. HSC signature" section of our methods. This signature was also projected onto another single-cell RNA sequencing dataset generated from ex vivo expanded HSC culture (PMID: 35971894, see Author response image 1 below), demonstrating again an association primarily with the most primitive cells (at least based on gene expression).

      Author response image 1.

      Projection of "our" HSC signature on scRNAseq data from independent work.

      In further response to the suggestion here, we have also examined the molecular signature of HSCs referenced in PMID: 12228721 but also of another HSC signature from PMID: 26004780 in our data (Author response image 2). While these signatures do indeed enrich for cells that fall in the cluster of molecularly defined HSCs, our analysis indicates that neither of them significantly improves the identification of HSCs in our dataset compared to the signature we originally used. This finding reinforces our confidence in the appropriateness of our chosen HSC signature for this study.

      Author response image 2.

      Projection of alternative HSC signatures onto the SAILERX UMAP.

      Regarding the specific genes Mpl and Angpt1, we respectfully oppose the view that these genes are exclusively associated with MK-biased HSCs. There is substantial evidence supporting the broader role of Mpl in regulating HSCs, regardless of any particular "lineage bias". Similarly, while Angpt1 has been less extensively studied, its role in HSCs, as examined in PMID: 25821987, suggests a more general association with HSCs rather than a specific impact on MKs. Therefore, we maintain that it is more accurate to consider these genes as HSC-associated rather than restricted to MK-biased HSCs.

      Finally, addressing the comment on reclustering based on different signatures, we would like to clarify that the clustering process is independent of the projection of signatures. The clustering aims to identify cell populations based on their overall molecular profiles, and while signatures can aid in characterizing these populations, they do not influence the clustering process itself.

      c. The authors took the hard road to perform experiments with the elegant HSC-specific Fgd5-reporter, and they claim in lines 170-171 that it "failed to clearly demarcate in our single-cell multimodal data". This seems like a rather vague statement and leads to the idea that the scRNA-seq experiment is not reliable. It would be interesting to show a UMAP with this gene expression regardless and also potentially some other HSC markers.

      We understand the concerns raised about our statement on the performance of the Fgd5-reporter in our multimodal data analysis. Our aim was not to suggest that single-cell molecular data are unreliable. Instead, we intended to point out specific challenges associated with scRNA sequencing, notably the high rates of dropout. Regarding the specific example of Fgd5, it appears this transcript is not efficiently captured by 10x technology. Our previous 10x scRNA-seq experiments on cells from the Fgd5 reporter strain (Säwén et al., eLife 2018; Konturek-Ciesla et al., Cell Rep. 2023) support this observation. Despite cells being sorted as Fgd5-reporter positive, many showed no detectable transcripts.

      We consider it pertinent to note that our study integrates ATAC-seq data in conjunction with single-cell molecular data. We believe that this integration, coupled with the analytical methods we have employed, potentially offers a way to address some of the limitations typically associated with scRNA sequencing. However, in assessing frequencies, we observe that the number of candidate HSCs identified via single-cell molecular data is substantially higher than the number identified through flow cytometry, the latter of which we demonstrate correlates functionally with genuine long-term repopulating activity.

      With respect to Fgd5, as depicted in our analysis below, there appears to be an enrichment of cells in the cluster identified as HSCs, as well as a significant representation in the cycling cell cluster (Author response image 3). Regarding the projection of other individual genes, the Seurat object we have provided allows for such projections to be readily performed. This offers an opportunity for further exploration and validation of our findings by interested researchers.

      Author response image 3.

      Feature plot depicting Fgd5 expression in the SAILERX UMAP.

      2) During the discussion and in Figure 4, the authors ponder and demonstrate that this culturing system can provoke divergent HSC clone expansion, having also functional consequences. This is a known caveat from the original system, but in more recent publications from the original group (PMID: 36809781 and PMID: 37385251) small alterations to the protocol seem to alleviate clone selection. It's intriguing why the authors have not included these parameters at least in some experiments to show reproducibility or why these studies are not mentioned during the discussion section.

      Thank you for pointing out the recent publications (PMID: 36809781 and PMID: 37385251) that discuss modifications to the HSC culturing system. We appreciate the opportunity to address why these were not included in our discussion or experiments.

      Firstly, it is important to note that these papers were published after the submission of our manuscript. In fact, one of the studies (PMID: 36809781) references the preprint version of our work on bioRxiv. This timing meant that we were unable to consider these studies in our initial manuscript or incorporate any of their findings into our experimental designs.

      Furthermore, as strong advocates for the peer-review system, we prioritize references that have undergone this rigorous process. Preprints, while valuable for early dissemination of research findings, do not offer the same level of scrutiny and validation as peer-reviewed publications. Our approach was to rely on the most relevant and rigorously reviewed literature available to us at the time of submission. This included, most notably, the original and ground-breaking work by Wilkinson et al., which provided a foundational basis for our research.

      We acknowledge that the field of HSC research is rapidly evolving, and new findings, such as those mentioned, are continually emerging. These new studies undoubtedly contribute valuable insights into HSC culturing systems and their optimization. However, given the timing of their publication relative to our study, we were not able to include them in our analysis or discussion.

      3) In this reviewer's opinion, the finding that transplanted cHSC are more quiescent than freshly isolated controls is the most remarkable aspect of this manuscript. There is a point of concern and an intriguing thought that sprouts from this experiment. It is imperative that for this experiment the same HSC dose is transplanted between both groups. This however is technically difficult since the membrane markers from both groups are different. Although after 8 weeks chimerism levels seem to be the same (SF5D) for both groups, it would strengthen the evidence if the authors could demonstrate that the same number of HSCs were transplanted in both groups, likely by limiting dose experiments. Finally, it's interesting that even though EE100 cells underwent multiple replication rounds (adding to their replicative aging), these cells remained more quiescent once they were in an in vivo setting. Since the last author of this manuscript has also expertise in HSC aging, it would be interesting to explore whether these cells have "aged" during the expansion process by assessing whether they display an aged phenotype (myeloid-skewed output in serial transplantations and/or assessing their transcriptional age).

      We thank the reviewer for the insightful observations regarding the quiescence of transplanted cultured HSCs. We appreciate the opportunity to clarify the experimental design and its implications, particularly in the context of HSC aging.

      The primary aim of comparing cKit-enriched BM cells with cultured cells was to investigate if ex vivo activated HSCs exhibit a similar proliferation pattern to in vivo quiescent HSCs post-transplantation. This comparison was crucial for evaluating the similarity between in vitro cultured and "unmanipulated" HSC behavior. While we acknowledge the technical challenge of transplanting equivalent HSC doses between groups due to differing membrane markers, our study design focused on assessing stem cell activity post-culture. This was quantitatively evaluated by calculating the repopulating units (detailed in Table 1 and Fig S4G), rather than through a limiting dilution assay. There exists a plethora of literature demonstrating the correlation between these assays, although of course the limiting dilution assay is designed to provide a more exact output.
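
      As a rough illustration of how repopulating units can be derived from chimerism data, the sketch below assumes the classical competitive repopulation formula, in which 100,000 competitor BM cells count as one repopulating unit. The function name, the parameter names and the 100,000-cell convention are assumptions made for illustration; the exact calculation behind Table 1 may differ.

```python
def repopulating_units(donor_percent, competitor_cells, cells_per_ru=100_000):
    """Estimate donor repopulating units (RU) from donor chimerism.

    Assumed formula: RU = donor% * C / (100 - donor%), where C is the
    number of competitor RUs (competitor_cells / cells_per_ru).
    """
    if not 0 <= donor_percent < 100:
        raise ValueError("donor_percent must be in the range [0, 100)")
    competitor_ru = competitor_cells / cells_per_ru
    return donor_percent * competitor_ru / (100 - donor_percent)


# Hypothetical example: 50% donor chimerism against 5 million competitor cells
example_ru = repopulating_units(50, 5_000_000)
print(example_ru)
```

      Because the estimate grows steeply as donor chimerism approaches 100%, values near saturation are best read with caution.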

      Regarding the intriguing aspect of HSC aging in the context of ex vivo expansion, our observations indicate that both the subfraction of ex vivo expanded cells (Fig 3 and Fig S3) and the entire cultured population (Fig 4B, Fig 5B, Fig S4A, and Fig S5B) maintain long-term multilineage reconstitution capacity post-transplantation. This suggests that the PVA-culture system does not lead to apparent signs of "HSC aging," despite the cells undergoing active self-renewal in vitro. This is further supported by our serial transplantation experiments, where cultured cells continued to demonstrate multilineage capacity rather than any evident myeloid-biased reconstitution 16 weeks post-second transplantation (see Author response image 4 below).

      Author response image 4.

      Serial transplantation behavior of ex vivo expanded HSCs. 5 million whole BM cells from primary transplantation were transplanted together with 5 million competitor whole BM cells. The control group was transplanted with 100 cHSCs freshly isolated from BM for the primary transplantation. Mann-Whitney test was applied and the asterisks indicate significant differences. *, p < 0.05; **, p < 0.01; ***, p < 0.0001. Error bars denote SEM.

      However, we recognize the complexity of defining HSC aging and the potential for the culture system to influence certain aspects of this process. The association of aging signature genes with HSC primitiveness and young signature genes with differentiation presents an interesting dichotomy. Our analysis of a native dataset on young mice and the projection of aged signatures onto our multiome data (as shown below for a set of genes known to be induced at higher levels in aged HSCs (e.g. Wahlestedt et al., Nature Comm 2017), using aging scRNAseq data from PMID: 36581635) does not directly indicate that the culture system promotes HSC aging compared to aged Lin-Sca+Kit+ cells. Yet, we do not rule out the possibility that culturing may influence other facets of the HSC aging process.

      In conclusion, while our current data do not provide direct evidence of induced HSC aging through the culture system, this remains a compelling area for future research. The potential impact of ex vivo culture on aspects of the HSC aging process warrants further exploration, and we appreciate your suggestion in this regard.

      Author response image 5.

      No evident signs of "molecular aging" following ex vivo expansion of HSCs. Young and aged scRNAseq data from PMID: 36581635 were integrated and explored from the perspective of known genes associated with HSC aging. The top row depicts contribution to UMAPs from young and aged cells (two left plots), cell cycle scores of the cells, and the expression of EPCR and CD48 as example markers for primitive and more differentiated cells, respectively. The expression of the HSC aging-associated genes Wwtr1, Cavin2, Ghr, Clu and Aldh1a1 was then assessed in the data as well as in the SAILERX UMAP of cultured HSCs (bottom row).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Zhang and colleagues characterise the behaviour of mouse hematopoietic stem cells when cultured in PVA conditions, a recently published method for HSC expansion (Wilkinson et al., Nature, 2019), using multiome analysis (scRNA-seq and scATACseq in the same single cell) and extensive transplantation experiments. The latter are performed in several settings including barcoding and avoiding recipient conditioning. Collectively the authors identify several interesting properties of these cultures namely: 1) only very few cells within these cultures have long-term repopulation capacity, many others, however, have progenitor properties that can rescue mice from lethal myeloablation; 2) single-cell characterisation by combined scRNAseq and scATACseq is not sufficient to identify cells with repopulation capacity; 3) expanded HSCs can be engrafted in unconditioned host and return to quiescence.

      The authors also confirm previous studies that EPCRhigh HSCs have better reconstitution capability than EPCRlow HSCs when transplanted.

      Strengths:

      The major strength of this manuscript is that it describes how functional HSCs are expanded in PVA cultures to a deeper extent than what has been done in the original publication. The authors are also mindful of considering the complexities of interpreting transplantation data. As these PVA cultures become more widely used by the HSC community, this manuscript is valuable as it provides a better understanding of the model and its limitations.

      Novelty aspects include:

      • The authors determined that small numbers of expanded HSCs enable transplantation into non-conditioned syngeneic recipients.

      • This is to my knowledge the first report characterising the output of PVA cultures by multiome. This could be a very useful resource for the field.

      • They are also the first to my knowledge to use barcoding to quantify HSC repopulation capacity at the clonal level after PVA culture.

      • It is also useful to report that HSCs isolated from fetal livers do expand less than their adult counterparts in these PVA cultures.

      Weaknesses:

      • The analysis of the multiome experiment is limited. The authors do not discuss what cell types, other than functional or phenotypic HSCs are present in these cultures (are they mostly progenitors or bona fide mature cells?) and no quantifications are provided.

      The primary objective of our manuscript was to characterize the features of HSCs expanded from ex vivo culture. In this context, our analysis of the single cell multiome sequencing data was predominantly centered on elucidating the heterogeneity of cultures, along with subsequent in vivo functional analysis. This focus is reflected in our comparisons between the molecular features of ex vivo cultured candidate HSCs (cHSCs) and "fresh/unmanipulated" HSCs, as illustrated in Figures 2D-E of our manuscript.

      Our findings provide substantial evidence that ex vivo expanded cells share significant similarities with HSCs isolated from the BM in terms of molecular features, differentiation potential, heterogeneity, and in vivo stem cell activity/function. This suggests that the ex vivo culture system closely mimics several aspects of the in vivo environment, thereby broadening the potential applications of this system for HSC research.

      Regarding the presence of other cell types in the cultures, it is important to note that most cells did not express mature lineage markers, suggesting their immature status. However, we acknowledge the presence of some mature lineage marker-positive cells within the cultures. These cells are represented by the endpoints in our SAILERX UMAP, indicating a progression from immature to more differentiated states within the culture system.

      While the main emphasis of our study was on HSCs, we understand the importance of acknowledging and briefly discussing the presence and characteristics of other cell types in the cultures. This aspect provides a more comprehensive understanding of the culture system and its impact on cellular heterogeneity, although it was for the most part beyond the scope of our studies.

      • Barcoding experiments are technically elegant but do not bring particularly novel insights.

      We respectfully disagree with the view that our barcoding experiments do not offer novel insights. We believe that the application of barcoding technology in our study represents a significant advancement over previous methods, both in terms of quantitative rigor and ethical considerations.

      In the foundational work by Wilkinson et al., clonal assessments were indeed performed, but these were limited in scope and largely served as proof of concept. Our use of barcoding technology, on the other hand, allowed for a comprehensive quantitative assessment of the expansion potential of HSC clones. This technology enabled us to rigorously quantify the number of HSC clones capable of undergoing at least three self-renewing divisions (e.g. those clones present in 5 separate animals), while also revealing the heterogeneity in their expansion potential.
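
      The arithmetic behind the "at least three self-renewing divisions" statement can be sketched as follows. This is a minimal illustration that assumes each recipient carrying a given barcode received at least one HSC from that clone and that each division can at most double the number of HSCs; the function name is hypothetical.

```python
def min_self_renewing_divisions(n_recipients):
    """Smallest d with 2**d >= n_recipients.

    This is the minimum number of symmetric self-renewing divisions a
    single barcoded HSC needs to contribute one HSC to each recipient.
    """
    if n_recipients < 1:
        raise ValueError("n_recipients must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using integers only
    return (n_recipients - 1).bit_length()


# A clone detected in 5 separate animals: 2 divisions give at most 4 HSCs,
# so at least 3 divisions are required
divisions_for_five = min_self_renewing_divisions(5)
print(divisions_for_five)
```

      This is a lower bound only; asymmetric divisions or cell loss in culture would mean that more divisions actually took place.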

      One alternative approach could have been to culture single HSCs and distribute the progeny among multiple mice for analysis. However, when considering the sheer number of mice that would be required for such an experiment for quantitative assessments, it becomes evident that viral barcoding is a far superior method. Not only does it provide a more efficient and scalable approach to assessing clonal expansion, but it also significantly reduces the number of animals required for the study, aligning with the principles of ethical research and animal welfare.

      In conclusion, we assert that the barcoding experiments conducted in our study are not only technically robust but also yield novel quantitative insights into the dynamics of HSC clones within expansion cultures. These insights have value not only for current research but also hold potential implications for future applications.

      • The number of mice analysed in certain experiments is fairly low (Figures 1 and 5).

      We would like to clarify our approach in the context of the 3R (replacement, refinement, and reduction) policy, which guides ethical considerations in animal research.

      In alignment with the 3R principles, our study was designed to minimize the use of experimental animals wherever possible. For most experiments, including those presented in Figures 1 and 5, we adopted a standard of using five mice per group. Based on the effect sizes we observed, we concluded that this sample size was appropriate for most parts of our study.

      Specifically for Figure 5, we used two animals per time point, totaling seven animals per treatment group. It is important to note that we did not monitor the same animals over time but used different animals at each time point, as mice had to be sacrificed for the type of analyses conducted. Despite the seemingly small sample size, the results we obtained were remarkably consistent across groups. This consistency provided strong evidence that ex vivo activated HSCs return to a more quiescent state after being transplanted into unconditioned recipients. Given the clear and consistent nature of these results, we determined that including more animals for the purpose of additional statistical analysis was not necessary.

      Our approach reflects a balance between adhering to ethical standards in animal research and ensuring the scientific validity and reliability of our findings. We believe that the sample sizes chosen for our experiments are justified by the consistent and significant results we obtained, which contribute meaningfully to our understanding of HSC behavior post-transplantation.

      • The manuscript remains largely descriptive. While the data can be used to make useful recommendations to future users working with PVA cultures and in general with HSCs, those recommendations could be more clearly spelled out in the discussion.

      We fully agree that many aspects of our study are indeed descriptive, which is reflective of the exploratory and foundational nature of this type of research.

      We have strived to provide clear and direct recommendations for researchers interested in utilizing the PVA culture system, which we believe are evident throughout our manuscript:

      1) Utility of Viral Delivery in HSC Research: Our research, particularly through the use of barcoding experiments, underscores the effectiveness of viral delivery methods in HSC studies. While barcoding itself is a significant tool, it is the underlying process of viral delivery that truly exemplifies the potential of this approach. Our work shows that the culture system is highly conducive to maintaining HSC activity, which is critical for genetic manipulation. This is evident not only in our current study but also in our previous work that included transient delivery methods (Eldeeb et al., Cell Reports 2023).

      2) Non-conditioned transplantation: Our findings suggest that non-conditioned transplantation can be a valuable method in studying both normal and malignant hematopoiesis. This approach can complement genetic lineage tracing models, providing a more native and physiological context for hematopoietic research. We state this explicitly in our discussion.

      3) Integration with recent technical advances: The combination of the PVA culture system with recent developments in transplantation biology, genome engineering, and single-cell technologies holds significant promise. This integration is likely to yield exciting discoveries with relevance to both basic and clinically oriented hematopoietic research. This is the end statement of our discussion.

      While our manuscript is in a way tailored to those with experience in HSC research, we have made a concerted effort to ensure that the content is accessible and informative to a broader audience, including those less familiar with this area of study. Our intention is to provide a resource that is both informative for experts in the field and approachable for newcomers.

      • The authors should also provide a discussion of the other publications that have used these methods to date.

      We would like to clarify that the scope of literature on the specific methods we employed, particularly in the context of our research objectives, is not extensive. Most of the existing references on these methods come from a relatively narrow range of research groups. In preparing our manuscript, we tried to be comprehensive yet selective in our citations to maintain focus and relevance. Our referencing strategy was guided by the aim to include literature that was most directly pertinent to our study's methodologies and findings.

      Overall, the authors succeeded in providing a useful set of experiments to better interpret what type of HSCs are expanded in PVA cultures. More in-depth mining of their bioinformatic data (by the authors or other groups) is likely to highlight other interesting/relevant aspects of HSC biology in relation to this expansion methodology.

      We are grateful for the overall positive assessment of our work and the recognition of its contributions to understanding HSC expansion in PVA cultures.

      We agree that every study, including ours, has its limitations, particularly regarding the scope and depth of exploration. It is challenging to cover every aspect comprehensively in a single study. Our research aimed to provide a foundational understanding of HSCs in PVA cultures, and we are pleased that this goal appears to have been met.

      We also concur with your point on the potential for further in-depth mining of our bioinformatic data. Our hope is that this data can serve as a resource (or at least a starting point) for other investigators.

      In conclusion, we hope that our responses have adequately addressed your queries and clarified any concerns. We are committed to contributing to the growth of knowledge in HSC research and look forward to the advancements that our study might enable, both within our team and the wider scientific community.

      Reviewer #1 (Recommendations For The Authors):

      1) In Line 150, the R packages can/should be mentioned just in the method section;

      We have moved this text to the methods section.

      2) In Figure F3C adding a legend next to the plot would assist the reader in identifying which populations are referred to, as the same color palette is used for other panels;

      We have now adjusted the figure legend position to make it clearer for the reader.

      3) In Figure 4D, for the pre-culture experiments 1000 cHSCs were used and then in the post-culture 1200 cHSCs were used. Can the authors justify the different numbers?

      The decision to use 1000 cHSCs in the pre-culture experiments and 1200 cHSCs in the post-culture experiments was not based on a specific rationale favoring one cell number over the other. In our Method section, we have detailed our experimental design, which was structured to provide robust and reliable readouts of HSC behavior and characteristics in different conditions.

      We consider the two cell numbers – 1000 and 1200 – to be quite similar in the context of our experimental aims. Since the readouts here are based on clonal assessments, this slight difference in cell numbers is unlikely to significantly impact the overall conclusions drawn from these experiments. The primary focus of our study was on qualitative aspects of HSC behavior and function, rather than on quantitative differences that might arise from small variations in initial cell numbers.

      4) In SF5F it would help readers if a line plot (per group) was also shown together with the dot plots. Moreover, applying statistics to the trend lines (Wilcoxon, for example) would strengthen the argument that cHSCs divide less than control cells.

      We would like to clarify that the data presented in SF5F were derived from different animals at each respective time point. As such, the data points at each time point represent independent measurements from separate animals, rather than a continuous measurement from the same set of animals over time. Therefore, creating a line plot that connects each time point within a group would inadvertently convey a misleading impression of a longitudinal study on the same animals, which is not reflective of the actual experimental design. Instead, the dot plot format was chosen as it more accurately depicts the independent and discrete nature of the measurements at each time point. Our current data presentation method was selected to provide the most accurate and transparent representation of our findings.

      Reviewer #2 (Recommendations For The Authors):

      Listed below are recommendations to further improve this manuscript:

      Major Comments

      1) Fig 1: the authors showed that EPCRhigh HSCs have better reconstitution capability than EPCRlow HSCs via bone marrow transplantation. Additionally, mice receiving cultured EPCRhigh SLAM LSK cells were more efficiently radioprotected than those receiving PVA expanded EPCRlow SLAM LSK.

      a. In addition to Fig.1F, authors should show the lineage distributions and chimerism of mice receiving cultured EPCRhigh and EPCRlow SLAM LSK respectively.

      We have indeed analyzed the lineage distribution in these experiments, and our findings indicate no statistically significant differences between the groups (see graph in Author response image 6). This suggests that the cultured EPCRhigh and EPCRlow SLAM LSK cells do not preferentially differentiate into specific lineages in a way that would impact the overall interpretation of our results.

      Author response image 6.

      Regarding the chimerism in peripheral blood (PB) lineages, Fig. 1F in our manuscript currently shows the PB myeloid chimerism. We chose to focus on this parameter as it most directly relates to our study's objectives. We did not transplant competitor cells in these experiments, and in most cases, the chimerism levels reached 100% for lineages other than T cells (T cells being more radioresistant). Based on our analysis, including data on chimerism in other PB lineages would not significantly enhance the understanding of the functional capacity of the transplanted cells, as the myeloid chimerism data already provide a robust indicator of their engraftment and functional potential.

      We believe that our current presentation of data in Fig. 1F, along with the additional analyses provided in the results section, offers a comprehensive understanding of the behavior and potential of the cultured EPCRhigh and EPCRlow SLAM LSK cells.

      b. Fig1F: only 5 mice were used in each group. Could this result occur by chance? Testing with Fisher's exact test with the data provided results in p=0.16. The authors should consider adding more animals or adding the p-value above (or from another relevant test) for readers' consideration.

      We acknowledge the point that only five mice were used in each group and understand the concern regarding the robustness of our findings.

      As correctly noted, applying Fisher's exact test to the data in Fig. 1F results in a p-value that does not reach the conventional threshold for statistical significance. However, one might also consider the analysis of the KM survival curve, which is associated with a p-value of 0.0528 (Fig. 1F, left graph below; Gehan-Breslow-Wilcoxon test). A similar test on the single-cell culture transplantation experiment (Fig. 1E, right graph below) also demonstrated statistical significance (p-value = 0.0485).

      While these p-values meet (or are very close to) the conventional criteria for statistical significance (p<0.05), we have chosen to place greater emphasis on effect sizes rather than strictly on p-values. This decision is based on our belief that effect sizes provide a more direct and meaningful measure of the biological impact observed in our experiments. We find that the effect sizes observed are compelling and consistent with the overall narrative of our study.
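
      For readers who wish to see how a two-sided Fisher's exact test behaves with five animals per group, the following is a minimal sketch using only the Python standard library. The function name `fisher_exact_two_sided` and the counts in the example table are hypothetical and chosen purely for illustration; they are not the counts underlying Fig. 1F.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    # (small relative tolerance guards against floating-point ties)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical outcome (NOT the study's data): 5/5 survivors vs 2/5 survivors
p_value = fisher_exact_two_sided(5, 0, 2, 3)
print(f"p = {p_value:.3f}")
```

      With groups this small, even a large difference in outcome between the two arms can leave the exact test above 0.05, which is the concern the reviewer raises.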

      Author response image 5.

      2) The characterisation of the multiome experiment is highly underdeveloped.

      a. From an experimental point of view, it is not clear how the PVA culture for this experiment was started. Are there technical/biological replicates? Have several PVA cultures been pooled together?

      We have included these details in the revised text to ensure a comprehensive understanding of our experimental setup.

      b. Fig2B: The authors should present more data as to how each of the clusters was annotated (bubble plot of marker genes used for annotation?) and importantly the percentage of cells in each of the clusters. It is particularly relevant to note what % is the cluster annotated as HSCs and compare that to the % of phenotypic HSCs and the % repopulating HSCs calculated in the transplantation experiments.

      In our study, the annotation of clusters was primarily based on reference genes for cell types from prior works in the field, such as from our recent work (Konturek-Ciesla et al., Cell Reports 2023). Additionally, we employed transcription factor (TF) motifs to assign identities to these clusters. This approach is relatively standard in the field, and we believe it provides a robust framework for our analysis. We included information on some of the key TF motifs used to guide our annotations.

      Regarding the assignment of a percentage to cells within the HSC cluster, we initially had reservations about the utility of this measure. This is because the transcriptional identity of HSCs might not align precisely with their identity based on candidate HSC protein markers. There are complexities related to transcriptional continuums that could influence the interpretation of such data. However, acknowledging your request for this information, we have now included the percentage of cells in the HSC cluster in Fig. 2B for reference.

      We also wish to highlight that when isolating EPCR+ cells, which encompasses a range of CD48 expression, clustering becomes much less distinct, as shown in Fig. 2E. Most of these cells do not demonstrate long-term functional HSC activity in a transplantation setting (as presented in Figure 3). This observation underscores the challenges in deducing HSC identity based solely on molecular data and reinforces the importance of functional validation.

      c. Are there any mature cells in these PVA cultures? The annotations presented in the table under the UMAP are vague: Are cluster 4 monocytes or monocytes progenitors? Same for clusters 0,1 and 7 - are these progenitors or more mature cells? How were HPCs (cluster 3) distinguished from cHSCs (cluster 5)?

      We agree with your observation that the annotations for certain clusters, such as clusters 4, 0, 1, and 7, as well as the distinction between HPCs (cluster 3) and cHSCs (cluster 5), appear vague. This vagueness to some extent stems from the challenges inherent in comparing cultured cells to their counterparts isolated directly from animals. Most reference data defining cell types are derived from cells in their native state, and less is known about how these definitions translate to the progeny of HSPCs cultured in vitro.

      In our study, we used the expression of reference genes and enriched transcription factor motifs to annotate clusters. This method, while useful, has its limitations in precisely defining the maturation stage of cells in culture. The enrichment of lineage-defining factors at the ends of the UMAP suggests the presence of more mature cells, whereas the lack of lineage marker expression in the majority of cells implies a general lack of terminal differentiation.

      This issue is not necessarily unique to the culture situation, as similar challenges in cell type annotation are encountered in other contexts, such as the analysis of granulocyte-macrophage progenitors in bone marrow, where a vast range of cell types and clusters are identified (e.g., PMID: 26627738). To try to address these challenges, we employed an approach detailed in the methods section under the header "iv. ATAC processing and cluster annotation." We assessed marker genes for clusters using Enrichr for cell types, relying on databases designed to provide gene expression identities to defined cell types. This methodology informed our references to the clusters.

      In summary, while our annotations provide a general overview of the cell types present in the cultures, we acknowledge the complexities and limitations in precisely defining these types, particularly in distinguishing between progenitors and more mature cells. We hope this explanation clarifies our approach and the considerations behind our cluster annotations, but at the same time feel that the alternative approaches have their own drawbacks.

      d. What is the meaning of the trajectories presented in Figure 2C? In the absence of a comparison to i) what is observed either when HSCs are cultured in control/non-expanding conditions ii) an in vivo landscape of differentiation in mouse bone marrow; this analysis does not bring any relevant piece of information.

      We understand the perspective on comparisons to control conditions and in vivo differentiation landscapes. However, we respectfully disagree with the viewpoint that the analysis that we have performed does not bring relevant information.

      The trajectory analysis in Figure 2C is intended to provide insights into the cell types generated in our PVA cultures and the potential differentiation pathways they may follow. This kind of analysis is particularly valuable in the context of understanding how in vitro cultures can support HSC maintenance and differentiation, which is a topic of significant interest in the field. For instance, studies like PMID: 31974159 have highlighted the importance of combining in vitro HSC cultures with molecular investigations.

      While we acknowledge that our analysis would benefit from a direct comparison to control or non-expanding conditions, as well as to an in vivo differentiation landscape, we believe that the information provided by our current analysis still holds substantial value. It offers a glimpse into the possible cellular dynamics and differentiation routes within our culture system, which can be a valuable reference point for other investigators working with similar systems.

      Regarding the confidence in computed differentiation trajectories, we recognize that this is an area where caution is warranted. Computational approaches to define cell differentiation pathways have inherent limitations and should be interpreted within the context of their assumptions and the data available. This challenge is not unique to our work but is a broader issue in the field of computational biology.

      In conclusion, while we agree that additional comparative analyses could further enrich our findings, we maintain that the trajectory analysis presented in Figure 2C contributes meaningful insights into cell differentiation in our PVA culture system. We believe these insights are of interest and value to researchers exploring the complex interplay of HSC maintenance and differentiation in vitro.

      3) The addition of barcoding experiments is appreciated. However, it is already known that upon transplantation clonal output is highly heterogeneous, with a small number of clones predominating over others. This is particularly the case after myeloablation conditioning.

      a. The "pre-culture" experimental design makes sense. The "post-culture" one is however ambiguous in terms of result interpretation. The authors observe fewer clones contributing to a large proportion of the graft (>5%) than in the "pre-culture" setting. Their interpretation is that expanded HSCs are functionally more homogeneous than the input HSCs. However, in the pre-culture experiment, there are 19 days of expansion during which there will be selection pressures over culture plus ongoing differentiation. In the post-culture experiment, there is no time for such pressures to be exerted. Therefore the conclusion drawn by the authors is not the only conclusion. I would encourage the authors to compare the "pre-culture" experiment to an experiment in which cHSCs are in culture for 48h, then barcoded, and then transplanted. This would be much more informative and would allow a proper comparison of expanded HSCs vs input HSCs.

      We understand the perspective that a shorter culture period would reduce the influence of selection pressures and differentiation, potentially allowing for a more direct comparison between expanded HSCs and input HSCs. However, we would like to point out that similar experiments have been conducted in the past, as referenced in our work (PMID: 28224997) and others (PMID: 21964413). These studies have demonstrated a significant heterogeneity in the reconstituting clones when barcoding is done early and cells are transplanted directly.

      In light of previous research, we are confident that our methodology — tracking the fates of candidate HSC clones throughout the culture period and assessing the outcomes of individual cells from these expanding clones — yields significant and pertinent insights. We want to highlight the significance of barcoding cells late in the culture, a strategy that allows us to barcode cells that have already been subjected to potential selection pressures within the culture environment. Our primary objective is to investigate the effects of these selection pressures on the subsequent in vivo behavior of the cells that emerge from this process. By focusing on this aspect, we aim to deepen the understanding of how in vitro culture conditions influence the functional characteristics and heterogeneity of HSCs after expansion. We believe this approach provides a unique perspective on the adaptive changes HSCs undergo during culture and their implications for transplantation efficacy and HSC biology. Our study thus addresses a critical question in the field: how do the conditions and selection pressures inherent to in vitro culture impact the quality and behavior of HSCs upon their return to an in vivo environment?

      b. Another experiment the authors may consider is barcoding in unconditioned recipients as there the bottleneck of selecting specific clones should be lower. In addition, this could nicely complement the return to quiescence observed in Figure 5 (see point below)

      We agree that this experiment could provide valuable insights, particularly in understanding how different selection pressures might affect HSC clones in various transplantation contexts. It would indeed be a worthwhile complement to our observations in Figure 5 regarding the return to quiescence of HSCs post-transplantation.

      However, we would like to point out that our study already includes a substantial amount of data and analyses aimed at addressing specific research questions within this defined scope. The addition of an experiment with barcoding in unconditioned recipients, while undoubtedly relevant and interesting, would extend beyond the boundaries we set for this particular study.

      4) Figure 5D-F, only 2 animals per condition were tested, so the experiment is underpowered for any statistics. How about cell viability of cHSC after in vitro culture? The authors have also not tested whether there is a difference in cell viability post-transplant between EE100 and control. In addition, comparing cell cycle profiles of donor EPCR+ HSCs in these transplanted mice would provide additional evidence to support the conclusion.

      Regarding the sample size, we acknowledge that only two animals per condition were used in these experiments, which limits the statistical power for robust quantitative analysis. This decision was guided by ethical considerations to minimize animal use, in line with the 3Rs principle (Replacement, Reduction, Refinement). Despite the small sample size, we believe that the strong trends observed in these experiments are indicative and consistent with our broader findings, although we recognize the limitations in terms of statistical generalization. At the same time, as we have written in the public response: "Specifically for Figure 5, we used two animals per time point, totaling seven animals per treatment group. It is important to note that we did not monitor the same animals over time but used different animals at each time point, as mice had to be sacrificed for the type of analyses conducted."

      In the context of post-transplant analysis, conducting separate viability assessments on transplanted cells is not typically informative. This is because non-viable cells would naturally be eliminated through biological processes such as phagocytosis soon after transplantation. Therefore, any post-transplant viability analysis would not provide meaningful insights into the engraftment potential or behavior of the transplanted cells.

      However, it is important to note that in all our cell isolation and analysis protocols, we routinely include viability markers. This practice ensures that the cell populations we study and report on are indeed viable. Including these markers is a standard part of our methodology and contributes to the accuracy and reliability of our data.

      Regarding the comparison of cell cycle profiles, we chose to focus on the cell trace assay as a means to monitor and track cell division history, which directly addresses the central theme here - informing on the proliferation and quiescence dynamics of transplanted HSCs. While comparing cell cycle profiles could perhaps offer an additional layer of information, we did not deem it essential for our core objectives.

      5) Several publications have used these PVA cultures and made comments on their strengths and limitations. They do not overlap with this study but should be discussed here for completeness (for example Che et al, Cell Reports, 2022; Becker et al., Cell Stem Cell, 2023; Igarashi, Blood Advances, 2023).

      See comments to reviewer 1.

      Minor Comments

      Figure 1C: should add in the legend that this is in peripheral blood.

      Figure 2C: typo in the title.

      Figure 3A: typo in "equivalent".

      We thank the reviewer for catching these errors, which we have now corrected.

      Figure 3B and 3C: symbol colours of EPCRhighCD48+ and EPCR- are too similar to distinguish the 2 groups easily. We highly recommend using contrasting colours.

      For easier visualization, we have changed the symbol types and colors in our revised version.

      Fig3B and S3A-B: authors should show statistical significance in comparing the 4 fractions.

      We have now added this information.

      In the discussion, the authors rightly point out a paper that described EPCR+ HSCs. There are other papers that also looked at EPCR intensity (high vs low), for example, Umemoto et al., EMBO J, 2022.

      While we acknowledge the relevance of the paper you mentioned, we faced constraints in the number of references we could include. Therefore, we prioritized citing the original demonstration of EPCR as an HSC marker, particularly focusing on the work by the Mulligan laboratory, which established that cells expressing the highest levels of EPCR exhibit the most potent HSC activity. We believe this reference most directly supports the core focus of our study and provides the necessary context for our findings.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study reports on the potential of neural networks to emulate simulations of human ventricular cardiomyocyte action potentials for various ion channel parameters with the advantage of saving simulation time in certain conditions. The evidence supporting the claims of the authors is solid, although the inclusion of open analysis of drop-off accuracy and validation of the neural network emulators against experimental data would have strengthened the study. The work will be of interest to scientists working in cardiac simulation and quantitative pharmacology.

      Thank you for the kind assessment. It is important for us to point out that, while limited, experimental validation was performed in this study and is thoroughly described in the work.

      Reviewer 1 - Comments

      This manuscript describes a method to solve the inverse problem of finding the initial cardiac activations to produce a desired ECG. This is an important question. The techniques presented are novel and clearly demonstrate that they work in the given situation. The paper is well-organized and logical.

      Strengths:

      This is a well-designed study, which explores an area that many in the cardiac simulation community will be interested in. The article is well written and I particularly commend the authors on transparency of methods description, code sharing, etc. - it feels rather exemplary in this regard and I only wish more authors of cardiac simulation studies took such an approach. The training speed of the network is encouraging and the technique is accessible to anyone with a reasonably strong GPU, not needing specialized equipment.

      Weaknesses:

      Below are several points that I consider to be weaknesses and/or uncertainties of the work:

      C I-(a) I am not convinced by the authors’ premise that there is a great need for further acceleration of cellular cardiac simulations - it is easy to simulate tens of thousands of cells per day on a workstation computer, using simulation conditions similar to those of the authors. I do not really see an unsolved task in the field that would require further speedup of single-cell simulations. At the same time, simulations offer multiple advantages, such as the possibility to dissect mechanisms of the model behaviour, and the capability to test its behaviour in a wide array of protocols - whereas a NN is trained for a single purpose/protocol, and does not enable a deep investigation of mechanisms. Therefore, I am not sure the cost/benefit ratio is that strong for single-cell emulation currently.

      An area that is definitely in need of acceleration is simulations of whole ventricles or hearts, but it is not clear how much potential for speedup the presented technology would bring there. I can imagine interesting applications of rapid emulation in such a setting, some of which could be hybrid in nature (e.g. using simulation for the region around the wavefront of propagating electrical waves, while emulating the rest of the tissue, which is behaving more regularly/predictably, and is likely to be emulated well), but this is definitely beyond the scope of this article.

      Thank you for this point of view. Simulating a population of a few thousand cells is completely feasible on a single desktop machine, and for fixed, known parameters, emulation may not fill one's need. Yet we still foresee a great untapped potential for rapid evaluations of ionic models, such as for the gradient-based inverse problem presented in the paper. Such inverse optimization requires several thousand evaluations per cell, and thus finding maximum conductances for the presented experimental data set (13 cell pairs control/drug → 26 APs) purely through simulations would require roughly a day of simulation time even by a very conservative estimate (3.5 seconds per simulation, 1000 simulations per optimization). Additionally, the emulator provides local sensitivity information between the AP and maximum conductances in the form of the gradient, which enables a whole new array of efficient optimization algorithms [Beck, 2017]. To further emphasize these points, we added the number of emulations and runtime of each conducted experiment in the specific section and a paragraph in the discussion that addresses this point:

      "Cardiomyocyte EP models are already very quick to evaluate in the scale of seconds (see Section 2.3.1), but the achieved runtime of emulations allows to solve time consuming simulation protocols markedly more efficient. One such scenario is the presented inverse maximum conductance estimation problem (see Section 3.1.2 and Section 3.1.3), where for estimating maximum conductances of a single AP, we need to emulate the steady state AP at least several hundred times as part of an optimization procedure. Further applications include the probabilistic use of cardiomyocyte EP models with uncertainty quantification [Chang et al., 2017, Johnstone et al., 2016] where thousands of samples of parameters are potentially necessary to compute a distribution of the steady-state properties of subsequent APs, and the creation of cell populations [Muszkiewicz et al., 2016, Gemmell et al., 2016, Britton et al., 2013]." (Section 4.2)

      We believe that rapid emulations are valuable for several use-cases, where thousands of evaluations are necessary. These include the shown inverse problem, but similarly arise in uncertainty quantification, or cardiomyocyte population creation. Similarly, new use-cases may arise as such efficient tools become available. Additionally, we provided the number of evaluations along with the runtimes for each of the conducted experiments, showing how essential these speedups are to realize these experiments in reasonable timeframes. Utilizing these emulations in organ-level electrophysiological models is a possibility, but the potential problems in such scenarios are much more varied and depend on a number of factors, making it hard to pin-point the achievable speed-up using ionic emulations.
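
      The "roughly a day" estimate given earlier in this response can be reproduced with simple arithmetic. The sketch below uses only the figures quoted in the response (26 APs, 1000 simulations per optimization, 3.5 seconds per simulation); the variable names are illustrative.

```python
# Figures quoted in the response
n_aps = 13 * 2                  # 13 control/drug cell pairs -> 26 APs
sims_per_optimization = 1000    # conservative number of simulations per AP
seconds_per_simulation = 3.5    # conservative simulation time

total_seconds = n_aps * sims_per_optimization * seconds_per_simulation
total_hours = total_seconds / 3600

print(f"{total_seconds:.0f} s = {total_hours:.1f} h")
```

      This gives 91,000 s, or a little over 25 hours of pure simulation time for the full experimental data set.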

      C I-(b) The authors run a cell simulation for 1000 beats, training the NN emulator to mimic the last beat. It is reported that the simulation of a single cell takes 293 seconds, while emulation takes only milliseconds, implying a massive speedup. However, I consider the claimed speedup achieved by emulation to be highly context-dependent, and somewhat too flattering to the presented method of emulation. Two specific points below:

      First, it appears that a not overly efficient (fixed-step) numerical solver scheme is used for the simulation. On my (comparable, also a Threadripper) CPU, using the same model (“ToR-ORd-dyncl”), but a variable step solver ode15s in Matlab, a simulation of a cell for 1000 beats takes ca. 50 seconds, rather than 293 of the authors. This can be further sped up by parallelization when more cells than available cores are simulated: on 32 cores, this translates into ca. 2 seconds amortized time per cell simulation (I suspect that the NN-based approach cannot be parallelized in a similar way?). By amortization, I mean that if 32 models can be simulated at once, a simulation of X cells will not take X × 50 seconds, but (X/32) × 50 seconds (with only minor overhead, as this task scales well across cores).
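
      The amortization argument in the preceding paragraph can be sketched as follows. The function `amortized_seconds_per_cell` is a hypothetical helper; it assumes perfect scaling across cores, no scheduling overhead, and that cells are processed in full batches of one cell per core, which is a simplification of the reviewer's description.

```python
from math import ceil


def amortized_seconds_per_cell(n_cells, n_cores=32, seconds_per_cell=50.0):
    """Wall-clock time per cell when cells are simulated in parallel batches."""
    batches = ceil(n_cells / n_cores)       # one cell per core per batch
    wall_clock = batches * seconds_per_cell
    return wall_clock / n_cells


# 3200 cells on 32 cores at 50 s each: 100 batches, 5000 s in total
print(amortized_seconds_per_cell(3200))
```

      For a population that fills all cores, this yields about 1.6 seconds per cell, in line with the "ca. 2 seconds" quoted in the comment.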

      Second, and this is perhaps more important - the reported speed-up critically depends on the number of beats in the simulation - if I am reading the article correctly, the runtime compares a simulation of 1000 beats versus the emulation of a single beat. If I run a simulation of a single beat across multiple simulated cells (on a 32-core machine), the amortized runtime is around 20 ms per cell, which is only marginally slower than the NN emulation. On the other hand, if the model was simulated for aeons, comparing this to a fixed runtime of the NN, one can get an arbitrarily high speedup.

      Therefore, I’d probably emphasize the concrete speedup less in an abstract and I’d provide some background on the speedup calculation such as above, so that the readers understand the context-dependence. That said, I do think that a simulation for anywhere between 250 and 1000 beats is among the most reasonable points of comparison (long enough for reasonable stability, but not too long to beat an already stable horse; pun with stables was actually completely unintended, but here it is...). I.e., the speedup observed is still valuable and valid, albeit in (I believe) a somewhat limited sense.

      We agree that the speedup comparison only focused on a very specific case and needs to be more thoroughly discussed and benchmarked. One of the main strengths of the emulator is to cut the time of prepacing to steady state, which is known to be a potential bottleneck for the speed of the single-cell simulations. The time it takes to reach the steady state in the simulator is heavily dependent on the actual maximum conductance configuration, and the speed-up therefore varies strongly from case to case. The differences in architecture of the simulator and emulator further make direct comparisons very difficult. In the revised version we now go into more detail regarding the runtime calculations and also compare it to an adaptive time stepping simulation (Myokit [Clerx et al., 2016]) in a new subsection:

      "The simulation of a single AP (see Section 2.1) sampled at a resolution of 20kHz took 293s on one core of an AMD Ryzen Threadripper 2990WX (clock rate: 3.0GHz) in CARPentry. Adaptive timestep solvers of variable order, such as implemented in Myokit [Clerx et al., 2016], can significantly lower the simulation time (30s for our setup) by using small step sizes close to the depolarization (phase 0) and increasing the time step in all other phases. The emulation of a steady state AP sampled at a resolution of 20kHz for t ∈ [−10, 1000]ms took 18.7ms on an AMD Ryzen 7 3800X (clock rate: 3.9GHz) and 1.2ms on an Nvidia A100 (Nvidia Corporation, USA), including synchronization and data copy overhead between CPU and GPU.

      "The amount of required beats to reach the steady state of the cell in the simulator has a major impact on the runtime and is not known a-priori. On the other hand, both simulator and emulator runtime linearly depends on the time resolution, but since the output of the emulator is learned, the time resolution can be chosen arbitrarily without affecting the AP at the sampled times. This makes direct performance comparisons between the two methodologies difficult. To still be able to quantify the speed-up, we ran Myokit using 100 beats to reach steady state, taking 3.2s of simulation time. In this scenario, we witnessed a speed-up of 171 and 2 · 10^3 of our emulator on CPU and GPU respectively (again including synchronization and data copy overhead between CPU and GPU in the latter case). Note that both methods are similarly expected to have a linear parallelization speedup across multiple cells.

      For the inverse problem, we parallelized the problem for multiple cells and keep the problem on the GPU to minimize the overhead, achieving emulations (including backpropagation) that run in 120µs per AP at an average temporal resolution of 2kHz. We consider this the peak performance which will be necessary for the inverse problem in Section 3.1.2." (Section 2.3.1)

      Note that the mentioned parallelization across multiple machines/hardware applies equally to the emulator and simulator (linear speed-up), though the utilization for single cells is most likely different (single vs. multi-cell parallelization).
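
      The speed-up factors quoted from Section 2.3.1 follow directly from the reported runtimes. The sketch below recomputes them from the rounded values given in the quoted text (3.2 s for Myokit with 100 beats, 18.7 ms on CPU, 1.2 ms on GPU); because the inputs are rounded, the results are only approximate.

```python
# Runtimes as quoted in the revised manuscript text (rounded values)
simulator_s = 3.2          # Myokit, 100 beats to reach steady state
emulator_cpu_s = 18.7e-3   # emulation on CPU
emulator_gpu_s = 1.2e-3    # emulation on GPU, including data copy overhead

speedup_cpu = simulator_s / emulator_cpu_s
speedup_gpu = simulator_s / emulator_gpu_s

print(f"CPU speed-up: {speedup_cpu:.0f}x")
print(f"GPU speed-up: {speedup_gpu:.0f}x")
```

      The CPU ratio comes out at about 171, and the GPU ratio at roughly 2.7 × 10^3, i.e. a speed-up in the low thousands for this particular 100-beat scenario.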

      C I-(c) It appears that the accuracy of emulation drops off relatively sharply with increasing real-world applicability/relevance of the tasks it is applied to. That said, the authors are to be commended on declaring this transparently, rather than withholding such analyses. I particularly enjoyed the discussion of the not-always amazing results of the inverse problem on the experimental data. The point on low parameter identifiability is an important one and serves as a warning against overconfidence in our ability to infer cellular parameters from action potentials alone. On the other hand, I’m not that sure the difference between small tissue preps and single cells which authors propose as another source of the discrepancy will be that vast beyond the AP peak potential (probably much of the tissue prep is affected by the pacing electrode?), but that is a subjective view only. The influence of coupling could be checked if the simulated data were generated from 2D tissue samples/fibres, e.g. using the Myokit software.

      Given the points above (particularly the uncertain need for further speedup compared to running single-cell simulations), I am not sure that the technology generated will be that broadly adopted in the near future.

      However, this does not make the study uninteresting in the slightest - on the contrary, it explores something that many of us are thinking about, and it is likely to stimulate further development in the direction of computationally efficient emulation of relatively complex simulations.

      We agree that the parameter identifiability is an important point of discussion. While the provided experimental data already gave us great insights, we still believe that, given the differences in the setup, we cannot draw conclusions about the source of inaccuracies with absolute certainty. The suggested experiment to test the influence of coupling is of interest for future work and has been integrated into the discussion. Further details are given in the response to the recommendation R III- (t).

      Reviewer 2 - Comments

      Summary:

      This study provided a neural network emulator of the human ventricular cardiomyocyte action potential. The inputs are the corresponding maximum conductances and the output is the action potential (AP). It used the forward and inverse problems to evaluate the model. The forward problem was solved for synthetic data, while the inverse problem was solved for both synthetic and experimental data. The NN emulator tool enables the acceleration of simulations, maintains high accuracy in modeling APs, effectively handles experimental data, and enhances the overall efficiency of pharmacological studies. This, in turn, has the potential to advance drug development and safety assessment in the field of cardiac electrophysiology.

      Strengths:

      1) Low computational cost: The NN emulator demonstrated a massive speed-up of more than 10,000 times compared to the simulator. This substantial increase in computational speed has the potential to expedite research and drug development processes.

      2) High accuracy in the forward problem: The NN emulator exhibited high accuracy in solving the forward problem when tested with synthetic data. It accurately predicted normal APs and, to a large extent, abnormal APs with early afterdepolarizations (EADs). High accuracy is a notable advantage over existing emulation methods, as it ensures reliable modeling and prediction of AP behavior.

      C II-(a) Input space constraints: The emulator relies on maximum conductances as inputs, which explain a significant portion of the AP variability between cardiomyocytes. Expanding the input space to include channel kinetics parameters might be challenging when solving the inverse problem with only AP data available.

      Thank you for this comment. We consider this limitation a major drawback, as discussed in Section 4.3. Identifiability is already an issue when only considering the most important maximum conductances. Further extending the problem to include kinetics will most likely only increase the difficulty of the inverse problem. For the forward problem though, it might be of interest to people studying ionic models to further analyze the effects of channel kinetics.

      C II-(b) Simplified drug-target interaction: In reality, drug interactions can be time-, voltage-, and channel state-dependent, requiring more complex models with multiple parameters compared to the oversimplified model that represents the drug-target interactions by scaling the maximum conductance at control. The complex model could also pose challenges when solving the inverse problem using only AP data.

      Thank you for pointing out this limitation. We slightly adapted Section 4.3 to further highlight some of these limitations. Note, however, that the experimental drugs used have been shown to be influenced by this drug interaction to varying degrees [Li et al., 2017] (e.g. dofetilide vs. cisapride). However, the discrepancy in identifiability was mostly channel-based (0%-100%), whereas the variation in identifiability between drugs was much lower (39%-66%).

      C II-(c) Limited data variety: The inverse problem was solved using AP data obtained from a single stimulation protocol, potentially limiting the accuracy of parameter estimates. Including AP data from various stimulation protocols and incorporating pacing cycle length as an additional input could improve parameter identifiability and the accuracy of predictions.

      The proposed emulator architecture currently only considers the discussed maximum conductances as input and thus can only compensate when using different stimulation protocols. However, the architecture itself does not prohibit including any of these as parameters for future variants of the emulator. We potentially foresee future works extending on the architecture with modified datasets to include other parameters of importance, such as channel kinetics, stimulation protocols and pacing cycle lengths. These will however vary between the actual use-cases one is interested in.

      C II-(d) Larger inaccuracies in the inverse problem using experimental data: The reasons for this result are not quite clear. Hypotheses suggest that it may be attributed to the low parameter identifiability or the training data set were collected in small tissue preparation.

      The low parameter identifiability on some channels (e.g. GK1) poses a problem, for which we state multiple potential reasons. As of yet, no final conclusion can be drawn, warranting further research in this area.

      Reviewer 3 - Comments

      Summary:

      Grandits and colleagues were trying to develop a new tool to accelerate pharmacological studies by using neural networks to emulate the human ventricular cardiomyocyte action potential (AP). The AP is a complex electrical signal that governs the heartbeat, and it is important to accurately model the effects of drugs on the AP to assess their safety and efficacy. Traditional biophysical simulations of the AP are computationally expensive and time-consuming. The authors hypothesized that neural network emulators could be trained to predict the AP with high accuracy and that these emulators could also be used to quickly and accurately predict the effects of drugs on the AP.

      Strengths:

      One of the study’s major strengths is that the authors use a large and high-quality dataset to train their neural network emulator. The dataset includes a wide range of APs, including normal and abnormal APs exhibiting EADs. This ensures that the emulator is robust and can be used to predict the AP for a variety of different conditions.

      Another major strength of the study is that the authors demonstrate that their neural network emulator can be used to accelerate pharmacological studies. For example, they use the emulator to predict the effects of a set of known arrhythmogenic drugs on the AP. The emulator is able to predict the effects of these drugs, even though it had not been trained on these drugs specifically.

      C III-(a) One weakness of the study is that it is important to validate neural network emulators against experimental data to ensure that they are accurate and reliable. The authors do this to some extent, but further validation would be beneficial. In particular for the inverse problem, where the estimation of pharmacological parameters was very challenging and led to particularly large inaccuracies.

      Thank you for this recommendation. Further experimental validation of the emulator in the context of the inverse problem would definitely be beneficial. Still, an important observation is that the identifiability varies greatly between channels. While the inverse problem is an essential reason for utilizing the emulator, it is also empirically validated for the pure forward problem and synthetic inverse problem, together with the (limited) experimental validation. The sources of problems arising in estimating the maximum conductances of the experimental tissue preparations are important to discuss in future works, as we now further emphasize in the discussion. See also the response to the recommendations R III-(t).

      Reviewer 1 - Recommendations

      R I-(a) Could further detail on the software used for the emulation be provided? E.g. based on section 2.2.2, it sounds like a CPU, as well as GPU-based emulation, is possible, which is neat.

      Indeed as suspected, the emulator can run on both CPUs and GPUs and features automatic parallelization (per-cell, but also multi-cell), which is enabled by the engineering feats of PyTorch [Paszke et al., 2019]. This is now outlined in a bit more detail in Sec. 2 and 5.

      "The trained emulator is provided as a Python package, heavily utilizing PyTorch [Paszke et al., 2019] for the neural network execution, allowing it to be executed on both CPUs and NVidia GPUs." (Section 5)

      R I-(b) I believe that a potential use of NN emulation could be also in helping save time on prepacing models to stability - using the NN for ”rough” prepacing (e.g. 1000 beats), and then running a simulation from that point for a smaller amount of time (e.g. 50 beats). One could monitor the stability of states, so if the prepacing was inaccurate, one could quickly tell that these models develop their state vector substantially, and they should be simulated for longer for full accuracy - but if the model was stable within the 50 simulated beats, it could be kept as it is. In this way, the speedup of the NN and accuracy and insightfulness of the simulation could be combined. However, as I mentioned in the public review, I’m not sure there is a great need for further speedup of single-cell simulations. Such a hybrid scheme as described above might be perhaps used to accelerate genetic algorithms used to develop new models, where it’s true that hundreds of thousands to millions of cells are eventually simulated, and a speedup there could be practical. However one would have to have a separate NN trained for each protocol in the fitness function that is to be accelerated, and this would have to be retrained for each explored model architecture. I’m not sure if the extra effort would be worth it - but maybe yes to some people.

      Thank you for this valuable suggestion. As pointed out in C I-(a), one goal of this study was to reduce the time-consuming task of prepacing. Still, in its current form the emulator could not be utilized for prepacing simulators, as only the AP is computed by the emulator. For initializing a simulation at the N-th beat, one would additionally need all computed channel state variables. However, a simple adaptation of the emulator architecture would allow it to also output the mentioned state variables.

      R I-(c) Re: ”Several emulator architectures were tried on the training and validation data sets and the final choice was hand-picked as a good trade-off between high accuracy and low computational cost” - is it that the emulator architecture was chosen early in the development, and the analyses presented in the paper were all done with one previously selected architecture? Or is it that the analyses were attempted with all considered architectures, and the well-performing one was chosen? In the latter case, this could flatter the performance artificially and a test set evaluation would be worth carrying out.

      We apologize for the unclear description of the architectural validation. The validation was in fact carried out with 20% of the training data (data set #1), which is, however, completely disjoint from the test sets (#2, #3, #4, formerly data set #1 and #2) on which the evaluation was presented. To further clarify the four different data sets used in the study, we have now dedicated an additional section to describing each set and where it was used (see also our response below R I-(d)), and summarize them in Table 1, which we also added at R II-(a). The cited statement was slightly reworked.

      "Several emulator architectures were tried on the training and validation data sets and the final choice was hand-picked as a good trade-off between high accuracy on the validation set (#1) and low computational runtime cost." (Section 2.2.2)

      R I-(d) When using synthetic data for the forward and inverse problem, with the various simulated drugs, is it that split of the data into training/validation test set was done by the drug simulated (i.e., putting 80 drugs and the underlying models in the training set, and 20 into test set)? Or were the data all mixed together, and 20% (including drugs in the test set) were used for validation? I’m slightly concerned by the potential of ”soft” data leaks between training/validation sets if the latter holds. Presumably, the real-world use case, especially for the inverse problem, will be to test drugs that were not seen in any form in the training process. I’m also not sure whether it’s okay to reuse cell models (sets of max conductances) between training and validation tests - wouldn’t it be better if these were also entirely distinct? Could you please comment on this?

      We completely agree with the main points of apprehension that training, validation and test sets all serve a distinct purpose and should not be arbitrarily mixed. However, this is only a result of the sub-optimal description of our datasets, which we heavily revised in Section 2.2.1 (Data, formerly 2.3.1). We now present the data using four distinct numbers: The initial training/validation data, now called data set #1 (formerly no number), is split 80%/20% into training and validation sets (for architectural choices) respectively. The presented evaluations in Section 2.3 (Evaluation) are purely performed on data set #2 (normal APs, formerly #1), #3 (EADs, formerly #2) and #4 (experimental).

      R I-(e) For the forward problem on EADs, I’m not sure if the 72% accuracy is that great (although I do agree that the traces in Fig 12-left also typically show substantial ICaL reactivation, but this definitely should be present, given the IKr and ICaL changes). I would suggest that you also consider the following design for the EAD investigation: include models with less severe upregulation of ICaL and downregulation of IKr, getting a population of models where a part manifests EADs and a part does not. Then you could run the emulator on the input data of this population and be able to quantify true/false positive/negative detections. I think this is closer to a real-world use case where we have drug parameters and a cell population, and we want to quickly assess the arrhythmic risk, with some drugs being likely entirely nonrisky, some entirely risky, and some between (although I still am not convinced it’s that much of an issue to just simulate this in a couple of thousands of cells).

      Thank you for pointing out this alternative to address the EAD identification task. Even though the values chosen in Table 2 seem excessively large, we still only witnessed EADs in 171 of the 950 samples. Border cases in particular, which are close to exhibiting EADs, are hardest to estimate for the NN emulator. As suggested, we now include the study with the full 950 samples (non-EAD & EAD) and classify the emulator AP into one of the labels for each sample. The mentioned 72.5% now represents the sensitivity, whereas our accuracy in such a scenario becomes 90.8% (total ratio of correct classifications):

      "The data set #3 was used second and Appendix C shows all emulated APs, both containing the EAD and non-EAD cases. The emulation of all 950 APs took 0.76s on the GPU specified in Section 2.2.3. We show the emulation of all maximum conductances and the classification of the emulation. The comparison with the actual EAD classification (based on the criterion outlined in Appendix A) results in true-positive (EAD both in the simulation and emulation), false-negative (EAD in the simulation, but not in the emulation), false-positive (EAD in the emulation, but not in the simulation) and true-negative (no EAD both in the emulation and simulation). The emulations achieved 72.5% sensitivity (EAD cases correctly classified) and 94.9% specificity (non-EAD cases correctly classified), with an overall accuracy of 90.8% (total samples correctly classified). A substantial amount of wrongly classified APs showcase a notable proximity to the threshold of manifesting EADs. Figure 7 illustrates the distribution of RMSEs in the EAD APs between emulated and ground truth drugged APs. The average RMSE over all EAD APs was 14.5mV with 37.1mV being the maximum. Largest mismatches were located in phase 3 of the AP, in particular in emulated APs that did not fully repolarize." (Section 3.1.1)

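      The reported rates can be cross-checked with a short calculation. The sketch below is only an illustration and not the authors' code. The confusion-matrix counts (124 true positives and 739 true negatives) are not stated in the response; they are an assumption, inferred here as the integers consistent with 171 EAD cases among 950 samples and the reported percentages.

```python
def classification_rates(tp, fn, fp, tn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # EAD cases correctly classified
    specificity = tn / (tn + fp)                 # non-EAD cases correctly classified
    accuracy = (tp + tn) / (tp + fn + fp + tn)   # all samples correctly classified
    return sensitivity, specificity, accuracy


# Reported in the response: 950 APs in total, 171 of them exhibiting EADs.
N_TOTAL, N_EAD = 950, 171

# ASSUMPTION: these two counts are inferred from the reported percentages,
# they are not given explicitly in the source.
TP = 124
TN = 739
FN = N_EAD - TP
FP = (N_TOTAL - N_EAD) - TN

sens, spec, acc = classification_rates(TP, FN, FP, TN)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
```

      With these inferred counts, the three rates round to the reported 72.5%, 94.9% and 90.8%, which illustrates why accuracy exceeds sensitivity when non-EAD cases dominate the sample.
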
      R I-(f) Figure 1 - I think a large number of readers will understand the mathematical notation describing inputs/outputs; that said, there may be a substantial number of readers who may find that hard to read (e.g. lab-based researchers, or simulation-based researchers not familiar with machine learning). At the same time, this is a very important part of the paper to explain what is done where, so I wonder whether using words to describe the inputs/outputs would not be more practical and easier to understand (e.g. ”drug-based conductance scaling factor” instead of ”s” ?). It’s just an idea - it needs to be tried to see if it wouldn’t make the figure too cluttered.

      We agree that the mathematical notation may be confusing to some readers. As a compromise between using verbose wording and mathematical notation, we introduced a legend in the lower right corner of the figure that briefly describes the notation in order to help with interpreting the figure.

      R I-(g) ”APs with a transmembrane potential difference of more than 10% of the amplitude between t = 0 and 1000 ms were excluded” - I’m not sure I understand what exactly you mean here - could you clarify?

      With this criterion, we try to discard data that is far away from fully repolarizing within the given time frame, which applies to 116 APs in data set #1 and 50 APs in data set #3. We added a small side note into the text:

      "APs with a transmembrane potential difference of more than 10% of the amplitude between t = 0 and 1000ms (indicative of an AP that is far away from full repolarization) were excluded." (Section 2.2.1)

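      The exclusion criterion can be expressed as a small check. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name is hypothetical, the trace is assumed to span exactly t = 0 to 1000 ms, and the amplitude is assumed to be the difference between the maximum and minimum potential of the trace.

```python
def is_excluded(vm_trace, tolerance=0.10):
    """Flag an AP whose potential at t = 1000 ms is far from its value at t = 0.

    vm_trace: transmembrane potentials in mV, first sample at t = 0 ms and
    last sample at t = 1000 ms (assumption of this sketch).
    """
    amplitude = max(vm_trace) - min(vm_trace)   # assumed amplitude definition
    drift = abs(vm_trace[-1] - vm_trace[0])     # start-to-end potential difference
    return drift > tolerance * amplitude


# Toy traces (illustrative values only, not simulation output).
repolarized = [-85.0, 30.0, 10.0, -40.0, -84.0]   # returns close to rest
stuck = [-85.0, 30.0, 10.0, -20.0, -30.0]         # far from full repolarization

print(is_excluded(repolarized), is_excluded(stuck))
```
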
      R I-(h) Speculation (for the future) - it looks like a tool like this could be equally well used to predict current traces, as well as action potentials. I wonder, would there be a likely benefit in feeding back the currents-traces predictions on the input of the AP predictor to provide additional information? Then again, this might be already encoded within the network - not sure.

      Although not possible with the chosen architecture (see also R I-(b)), it is worth thinking about an implementation in future works and to study differences to the current emulator.

      Entirely minor points:

      R I-(i) ”principle component analysis” → principal component analysis

      Fixed

      R I-(j) The paper will be probably typeset by elife anyway, but the figures are often quite far from their sections, with Results figures even overflowing into Discussion. This can be often fixed by using the !htb parameters (\begin{figure}[!htb]), or potentially by using ”\usepackage[section]{placeins}” and then ”\FloatBarrier” at the start and end of each section (or subsection) - this prevents floating objects from passing such barriers.

      Thank you for these helpful suggestions. We tried reducing the spacing between the figures and their references in the text, hopefully improving the reader’s experience.

      R I-(k) Alternans seems to be defined in Appendix A (as well as repo-/depolarization abnormalities), but is not really investigated. Or are you defining these just for the purpose of explaining what sorts of data were also included in the data?

      We defined alternans since this was an exclusion criterion for generating simulation data.

      Reviewer 2 - Recommendations

      R II-(a) Justification for methods selection: Explain the rationale behind important choices, such as the selection of specific parameters and algorithms.

      Thank you for this recommendation. We tried to increase the transparency of our choices by introducing a separate data section (Section 2.2.1) that summarizes all data sets and their use cases, and we also collect many of the explanations there. Additionally, we added an overview table (Table 1) of the utilized data.

      Author response table 1.

      Table 1: Summary of the data used in this study, along with their usage and the number of valid samples. Note that each AP is counted individually, also in cases of control/drug pairs.

      R II-(b) Interpretation of the evaluation results: After presenting the evaluation results, consider interpretations or insights into what the results mean for the performance of the emulator. Explain whether the emulator achieved the desired accuracy or compare it with other existing methods.

      In the revised version, we tried to further expand the discussion on possible applications of our emulator (Section 4.2). See also our response to C I-(a). To the best of our knowledge, there are currently no out-of-the-box methods available for directly comparing all experiments we considered in our work.

      Reviewer 3 - Recommendations

      R III-(a) In the introduction (Page 3) and then also in the 2.1 paragraph authors speak about the ”limit cycle”: Do you mean steady state conditions? In that case, it is more common to use steady state.

      When speaking about the limit cycle, we refer to what is also sometimes called the steady state, depending on the field of research and/or personal preference. We now mention both terms at the first occurrence, but stick with the limit cycle terminology, which can also be found in other works, see e.g. [Endresen and Skarland, 2000].

      R III-(b) On page 3, while comparing NN with GP emulators, I still don’t understand the key reason why NN can solve the discontinuous functions with more precision than GP.

      The potential problems in modeling sharp discontinuities using GPs are further explained in the referenced work [Ghosh et al., 2018] and the references therein:

      "Statistical emulators such as Gaussian processes are frequently used to reduce the computational cost of uncertainty quantification, but discontinuities render a standard Gaussian process emulation approach unsuitable as these emulators assume a smooth and continuous response to changes in parameter values [...] Applying GPs to model discontinuous functions is largely an open problem. Although many advances (see the discussion about non-stationarity in [Shahriari et al., 2016] and the references in there) have been made towards solving this problem, a common solution has not yet emerged. In the recent GP literature there are two specific streams of work that have been proposed for modelling non-stationary response surfaces including those with discontinuities. The first approach is based on designing nonstationary processes [Snoek et al., 2014] whereas the other approach attempts to divide the input space into separate regions and build separate GP models for each of the segmented regions. [...]"([Ghosh et al., 2018])

      We integrated a short segment of this explanation into Section 1.

      R III-(c) Why do authors prefer to use CARPentry and not directly openCARP?

      The use of CARPentry is purely a practical choice since the simulation pipeline was already set up. However, as we now point out in Sec. 2.1 (Simulator), simulations can also be performed using any openly available ionic simulation tool, such as Myokit [Clerx et al., 2016], OpenCOR [Garny and Hunter, 2015] and openCARP [Plank et al., 2021]. We emphasized this in the text.

      "Note, that the simulations can also be performed using open-source software such as Myokit [Clerx et al., 2016], OpenCOR [Garny and Hunter, 2015] and openCARP [Plank et al., 2021]." (Section 2.1)

      R III-(d) In paragraph 2.1:

      (a) In this sentence: ”Various solver and sampling time steps were applied to generate APs and the biomarkers used in this study (see Appendix A)” this reviewer suggests putting the Appendix reference near “biomarkers”. In addition, a figure that shows the test of various solver vs. sampling time steps could be interesting and can be added to the Appendix as well.

      (b) Why did the authors set the relative difference below 5% for all biomarkers? Please give a reference to that choice. Instead, why choose 2% for the time step?

      1) We adjusted the reference to be closer to “biomarkers”. While we agree that further details on the influence of the sampling step would be of interest to some of the readers, we feel that it is far beyond the scope of this paper.

      2) There is no specific reference we can provide for the choice. Our goal was to reach 5% relative difference, which we surpassed by the chosen time steps of 0.01 ms (solver) and 0.05 ms (sampling), leading to only 2% difference. We rephrased the sentence in question to make this clear.

      "We considered the time steps with only 2% relative difference for all AP biomarkers (solver: 0.01ms; sampling: 0.05ms) to offer a sufficiently good approximation." (Section 2.1)

      R III-(e) In the caption of Figure 1 authors should include the reference for AP experimental data (are they from Orvos et al. 2019 as reported in the Experimental Data section?)

      We added the missing reference as requested. As correctly assumed, they are from [Orvos et al., 2019].

      R III-(f) Why do authors not use experimental data in the emulator development/training?

      For the supervised training of our NN emulator, we need to provide the maximum conductances of our chosen channels for each AP. While it would be beneficial to also include experimental data in the training to diversify the training data, the exact maximum conductances in the considered retrospective experiments are not known. If such data were available with low measurement uncertainty, it would be possible to include them.

      R III-(g) What is TP used in the Appendix B? I could not find the acronymous explanation.

      We are sorry for the oversight, TP refers to the time-to-peak and is now described in Appendix A.

      R III-(h) Are there any reasons for only using ST and no S1? Maybe are the same?

      The global sensitivity analysis is further outlined in Appendix B, also showing S1 (first-order effects) and ST (total-order effects, including all interactions) together (Figure 11) [Herman and Usher, 2017] and their differences (e.g. in TP). Since S1 only captures first-order effects, it may fail to capture higher-order interactions between the maximum conductances, thus we favored ST.
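
      The difference between S1 and ST can be seen on a toy function. The sketch below is illustrative only and does not use the authors' pipeline or SALib: it is a plain-Python Monte Carlo estimate with the commonly used Saltelli (first-order) and Jansen (total-order) estimators, applied to the hypothetical model f(x) = x0 * x1 with independent uniform inputs on [0, 1], for which S1 = 3/7 and ST = 4/7 analytically.

```python
import random


def sobol_indices(model, n_inputs, n_samples=40000, seed=0):
    """Monte Carlo estimates of first-order (S1) and total-order (ST) Sobol
    indices for inputs drawn independently from U(0, 1)."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    f_a = [model(x) for x in a]
    f_b = [model(x) for x in b]

    # Output variance estimated from both sample matrices.
    outputs = f_a + f_b
    mean = sum(outputs) / len(outputs)
    variance = sum((y - mean) ** 2 for y in outputs) / len(outputs)

    s1, st = [], []
    for i in range(n_inputs):
        # Matrix A with column i replaced by column i of B.
        f_ab = [model(ra[:i] + [rb[i]] + ra[i + 1:]) for ra, rb in zip(a, b)]
        # Saltelli estimator for the first-order effect of input i.
        v_i = sum(fb * (fab - fa) for fa, fb, fab in zip(f_a, f_b, f_ab)) / n_samples
        # Jansen estimator for the total effect of input i.
        e_i = sum((fa - fab) ** 2 for fa, fab in zip(f_a, f_ab)) / (2 * n_samples)
        s1.append(v_i / variance)
        st.append(e_i / variance)
    return s1, st


# Hypothetical toy model with a pure interaction between its two inputs.
s1, st = sobol_indices(lambda x: x[0] * x[1], n_inputs=2)
print("S1:", [round(v, 2) for v in s1], "ST:", [round(v, 2) for v in st])
```

      Because the interaction term contributes to ST but not to S1, ST exceeds S1 for both inputs here, which is the effect the response refers to when favoring ST.
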

      R III-(i) In Training Section Page 8. It is not clear why it is necessary to resample data. Can you motivate?

      The resampling is motivated by the need to capture the swift depolarization dynamics accurately, whereas the output from CARPentry is uniformly sampled. This is now further highlighted in the text.

      "Then, the data were non-uniformly resampled from the original uniformly simulated APs, to emphasize the depolarization slope with a high accuracy while lowering the number of repolarization samples. For this purpose, we resampled the APs [...]" (Section 2.2.1)

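      The idea of non-uniform resampling can be sketched as follows. The actual scheme is elided in the quoted passage, so the grid below is purely hypothetical: the 10 ms dense window, the 5 ms coarse step and all function names are assumptions chosen for illustration, and linear interpolation is used only as the simplest option.

```python
from bisect import bisect_left


def nonuniform_grid(t_dense_end=10.0, t_end=1000.0, dt_dense=0.05, dt_sparse=5.0):
    """Hypothetical time grid (ms): fine steps early, coarse steps afterwards."""
    n_dense = round(t_dense_end / dt_dense)
    n_sparse = round((t_end - t_dense_end) / dt_sparse)
    dense = [k * dt_dense for k in range(n_dense)]
    sparse = [t_dense_end + k * dt_sparse for k in range(n_sparse + 1)]
    return dense + sparse


def resample(times, values, new_times):
    """Linearly interpolate a sampled trace onto new_times (times must be sorted)."""
    out = []
    for t in new_times:
        j = bisect_left(times, t)
        if j == 0:
            out.append(values[0])
        elif j == len(times):
            out.append(values[-1])
        else:
            t0, t1 = times[j - 1], times[j]
            w = (t - t0) / (t1 - t0)
            out.append(values[j - 1] + w * (values[j] - values[j - 1]))
    return out


# Toy uniformly sampled trace (0.05 ms step); a linear ramp, not a real AP.
times = [k * 0.05 for k in range(20001)]
values = [2.0 * t - 85.0 for t in times]

grid = nonuniform_grid()
resampled = resample(times, values, grid)
print(len(times), "->", len(grid), "samples")
```

      The dense part of the grid keeps the original resolution where the upstroke occurs, while the coarse part reduces the number of samples in the slower repolarization phase.
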
      R III-(j) For the training of the neuronal network, the authors used the ADAM algorithm: have you tested any other algorithm?

      For training neural networks, ADAM has become the current de-facto standard and is certainly a robust choice for training our emulator. While there may exist slightly faster, or better-suited training algorithms, we witnessed (qualitative) convergence in the training (Equation (2)). We thus strongly believe that the training algorithm is not a limiting factor in our study.

      R III-(k) What is the amount of the drugs tested? Is the same dose reported in the description of the second data set or the values are only referring to experimental data? Moreover, it is not clear if in the description of experimental data, the authors are referring to newly acquired data (since they described in detail the protocol) or if they are obtained from Orvos et al. 2019 work.

      In all scenarios, we tested 5 different drugs (cisapride, dofetilide, sotalol, terfenadine, verapamil). We revised our previous presentation of the available data, and now try to give a concise overview of the utilized data (Section 2.2.1 and Table 1) and the drug comparison with the CiPA distributions (Table 5, former 4). Note that in the latter case, the available expected channel scaling factors by the CiPA distributions vary, but are now clearly shown in Table 5.

      R III-(l) In Figure 4, I will avoid the use of “control” in the legend since it is commonly associated with basal conditions and not with the drug administration.

      The terminology “control” in this context is in line with works from the CiPA initiative, e.g. [Li et al., 2017], and refers to the state of cell conditions before the drug wash-in. We added a minor note the first time we use the term control in the introduction to emphasize that we refer to the state of the cell before administering any drugs.

      "To compute the drugged AP for given pharmacological parameters is a forward problem, while the corresponding inverse problem is to find pharmacological parameters for given control (before drug administration) and drugged AP." (Section 1)

      R III-(m) In Table 1 when you referred to Britton et al. 2017 work, I suggest adding also 10.1371/journal.pcbi.1002061.

      We added the suggested article as a reference.

      R III-(n) For the minimization problem, only data set #1 has been used. Have you tested data set #2?

      In the current scenario, we only tested the inverse problem for data set #2 (former #1). The main purpose for data set #3 (former #2), was to test the possibility to emulate EAD APs. Given the overall lower performance in comparison to data set #2 (former #1), we also expect deteriorated results in comparison to the existing inverse synthetic problem.

      R III-(o) In Figure 6 you should have the same x-axis (we could not see any points in the large time scale for many biomarkers). Why dVmMax is not uniformed distributed compared to the others? Can you comment on that?

      As suggested, we re-adjusted the x-range to show the center of distributions. Additionally, we denoted in each subplot the number of outliers which lie outside of the shown range. The error distribution on dVmMax exhibits a slightly off-center, left-tailed normal distribution, which we now describe a bit more in the revised text:

      "While the mismatches in phase 3 were simply a result of imperfect emulation, the mismatches in phase 0 were a result of the difficulty in matching the depolarization time exactly. [...] Likewise, the difficulty in exactly matching the depolarization time leads to elevated errors and more outliers in the biomarkers influenced by the depolarization phase (TP and dVmMax)," (Section 3.1.1)

      R III-(p) Page 14. Can the authors better clarify ”the average RMSE over all APs 13.6mV”: is it the mean for all histograms in Figure 7? (In Figure 5 is more evident the average RMSE).

      The average RMSE uses the same definition for Figures 5 and 7: it is the average over all the RMSEs for each pair of traces (simulated/emulated), though the number of samples is much lower for the EAD data set and the RMSEs are not normally distributed.
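
      This definition can be written out in a few lines. The sketch below is a minimal illustration with hypothetical function names and toy values; it assumes that each simulated/emulated pair is sampled on the same time grid.

```python
from math import sqrt


def rmse(simulated, emulated):
    """Root-mean-square error (mV) between two traces on the same time grid."""
    if len(simulated) != len(emulated):
        raise ValueError("traces must have the same number of samples")
    squared = [(s - e) ** 2 for s, e in zip(simulated, emulated)]
    return sqrt(sum(squared) / len(squared))


def average_and_max_rmse(pairs):
    """Average and maximum of the per-pair RMSEs over all (simulated, emulated) pairs."""
    errors = [rmse(sim, emu) for sim, emu in pairs]
    return sum(errors) / len(errors), max(errors)


# Toy pairs (illustrative values only): one offset by 3 mV, one identical.
pairs = [
    ([0.0, 0.0, 0.0, 0.0], [3.0, 3.0, 3.0, 3.0]),
    ([1.0, 2.0], [1.0, 2.0]),
]
avg_rmse, max_rmse = average_and_max_rmse(pairs)
print(avg_rmse, max_rmse)
```
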

      R III-(q) In Table 4, the information on which drugs are considered should be added.

      For each channel, we added the names of the drugs for which respective data from the CiPA initiative were available.

      R III-(r) Pag. 18, second paragraph, there is a repetition of ”and”.

      Fixed

      R III-(s) The pair’s combination of scaling factors for simulating synthetic drugs reported in Table 2, can be associated with some effects of real drugs? In this case, I suggest including the information or justifying the choice.

      The scaling factors in Table 2 are used to create data set #3 (former #2), and are meant to provide several APs which exhibit EADs. This is described in more detail in the new data section, Section 2.2.1:

      "Data set #3: The motivation for creating data set #3 was to test the emulator on data of abnormal APs showing the repolarization abnormality EAD. This is considered a particularly relevant AP abnormality in pharmacological studies because of their role in the genesis of drug-induced ventricular arrhythmias [Weiss et al., 2010]. Drug data were created using ten synthetic drugs with the hERG channel and the Cav1.2 channel as targets. To this end, ten samples with pharmacological parameters for GKr and PCa (Table 2) were generated and the synthetic drugs were applied to the entire synthetic cardiomyocyte population by scaling GKr and PCa with the corresponding pharmacological parameter. Of the 1000 APs simulated, we discarded APs with a transmembrane potential difference of more than 10% of the amplitude between t = 0 and 1000ms (checked for the last AP), indicative of an AP that does not repolarize within 1000ms. This left us with 950 APs, 171 of which exhibit EAD (see Appendix C)." (Section 2.2.1)

      R III-(t) A general comment on the work is that the authors claim that their study highlights the potential of NN emulators as a powerful tool for increased efficiency in future quantitative systems pharmacology studies, but they wrote ”Larger inaccuracies were found in the inverse problem solutions on experimental data highlight inaccuracies in estimating the pharmacological parameters”: so, I was wondering how they can claim the robustness of NN use as a tool for more efficient computation in pharmacological studies.

      The robustness we discuss refers directly to efficiently emulating steady-state/limit cycle APs from a set of maximum conductances (forward problem, Section 3.1.1). We extensively evaluated the algorithm and feel that, given the low emulation RMSE of APs (< 1 mV), the statement is warranted. The inverse estimation, enabled through this rapid evaluation, performs well on synthetic data but shows difficulties for experimental data. Note, however, that at this point there are multiple potential sources for these problems, as highlighted in the Evaluation section (Section 4.1). Table 5 (formerly 4) highlights the difference in accuracy of estimating per-channel maximum conductances, revealing a potentially large discrepancy. The emulator also offers future possibilities to incorporate additional information in the form of either priors or more detailed measurements (e.g. calcium transients), and can potentially be improved to a point where the inverse problem can also be satisfactorily solved in experimental preparations, though further analysis will be required.

      References

      [Beck, 2017] Beck, A. (2017). First-order methods in optimization. SIAM.

      [Britton et al., 2013] Britton, O. J., Bueno-Orovio, A., Ammel, K. V., Lu, H. R., Towart, R., Gallacher, D. J., and Rodriguez, B. (2013). Experimentally calibrated population of models predicts and explains intersubject variability in cardiac cellular electrophysiology. Proceedings of the National Academy of Sciences, 110(23).

      [Chang et al., 2017] Chang, K. C., Dutta, S., Mirams, G. R., Beattie, K. A., Sheng, J., Tran, P. N., Wu, M., Wu, W. W., Colatsky, T., Strauss, D. G., and Li, Z. (2017). Uncertainty quantification reveals the importance of data variability and experimental design considerations for in silico proarrhythmia risk assessment. Frontiers in Physiology, 8.

      [Clerx et al., 2016] Clerx, M., Collins, P., de Lange, E., and Volders, P. G. A. (2016). Myokit: A simple interface to cardiac cellular electrophysiology. Progress in Biophysics and Molecular Biology, 120(1):100–114.

      [Endresen and Skarland, 2000] Endresen, L. and Skarland, N. (2000). Limit cycle oscillations in pacemaker cells. IEEE Transactions on Biomedical Engineering, 47(8):1134–1137.

      [Garny and Hunter, 2015] Garny, A. and Hunter, P. J. (2015). OpenCOR: a modular and interoperable approach to computational biology. Frontiers in Physiology, 6.

      [Gemmell et al., 2016] Gemmell, P., Burrage, K., Rodríguez, B., and Quinn, T. A. (2016). Rabbit-specific computational modelling of ventricular cell electrophysiology: Using populations of models to explore variability in the response to ischemia. Progress in Biophysics and Molecular Biology, 121(2):169–184.

      [Ghosh et al., 2018] Ghosh, S., Gavaghan, D. J., and Mirams, G. R. (2018). Gaussian process emulation for discontinuous response surfaces with applications for cardiac electrophysiology models.

      [Herman and Usher, 2017] Herman, J. and Usher, W. (2017). SALib: An open-source python library for sensitivity analysis. J. Open Source Softw., 2(9):97.

      [Johnstone et al., 2016] Johnstone, R. H., Chang, E. T., Bardenet, R., de Boer, T. P., Gavaghan, D. J., Pathmanathan, P., Clayton, R. H., and Mirams, G. R. (2016). Uncertainty and variability in models of the cardiac action potential: Can we build trustworthy models? Journal of Molecular and Cellular Cardiology, 96:49–62.

      [Li et al., 2017] Li, Z., Dutta, S., Sheng, J., Tran, P. N., Wu, W., Chang, K., Mdluli, T., Strauss, D. G., and Colatsky, T. (2017). Improving the in silico assessment of proarrhythmia risk by combining hERG (human ether-à-go-go-related gene) channel–drug binding kinetics and multichannel pharmacology. Circulation: Arrhythmia and Electrophysiology, 10(2).

      [Muszkiewicz et al., 2016] Muszkiewicz, A., Britton, O. J., Gemmell, P., Passini, E., Sánchez, C., Zhou, X., Carusi, A., Quinn, T. A., Burrage, K., Bueno-Orovio, A., and Rodriguez, B. (2016). Variability in cardiac electrophysiology: Using experimentally-calibrated populations of models to move beyond the single virtual physiological human paradigm. Progress in Biophysics and Molecular Biology, 120(1):115–127.

      [Orvos et al., 2019] Orvos, P., Kohajda, Z., Szlovák, J., Gazdag, P., Árpádffy-Lovas, T., Tóth, D., Geramipour, A., Tálosi, L., Jost, N., Varró, A., and Virág, L. (2019). Evaluation of possible proarrhythmic potency: Comparison of the effect of dofetilide, cisapride, sotalol, terfenadine, and verapamil on hERG and native iKr currents and on cardiac action potential. Toxicological Sciences, 168(2):365–380.

      [Paszke et al., 2019] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc.

      [Plank et al., 2021] Plank, G., Loewe, A., Neic, A., Augustin, C., Huang, Y.-L., Gsell, M. A., Karabelas, E., Nothstein, M., Prassl, A. J., Sánchez, J., Seemann, G., and Vigmond, E. J. (2021). The openCARP simulation environment for cardiac electrophysiology. Computer Methods and Programs in Biomedicine, 208:106223.

      [Shahriari et al., 2016] Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and de Freitas, N. (2016). Taking the Human Out of the Loop: A Review of Bayesian Optimization. Proceedings of the IEEE, 104(1):148–175.

      [Snoek et al., 2014] Snoek, J., Swersky, K., Zemel, R., and Adams, R. (2014). Input Warping for Bayesian Optimization of Non-Stationary Functions. In Proceedings of the 31st International Conference on Machine Learning, pages 1674–1682. PMLR.

      [Weiss et al., 2010] Weiss, J. N., Garfinkel, A., Karagueuzian, H. S., Chen, P.-S., and Qu, Z. (2010). Early afterdepolarizations and cardiac arrhythmias. Heart Rhythm, 7(12):1891–1899.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study by Ghafari et al. addresses a question that is highly relevant for the field of attention as it connects structural differences in subcortical regions with oscillatory modulations during attention allocation. Using a combination of magnetoencephalography (MEG) and magnetic resonance imaging (MRI) data in human subjects, inter-individual differences in the lateralization of alpha oscillations are explained by asymmetry of subcortical brain regions. The results are important, and the strength of the evidence is convincing. Yet, clarifying the rationale, reporting the data in full, a more comprehensive analysis, and a more detailed discussion of the implications will strengthen the manuscript further.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors re-analysed the data of a previous study in order to investigate the relation between asymmetries of subcortical brain structures and the hemispheric lateralization of alpha oscillations during visual spatial attention. The visual spatial attention task crossed the factors of target load and distractor salience, which made it possible to also test the specificity of the relation of subcortical asymmetries to lateralized alpha oscillations for specific attentional load conditions. Asymmetry of globus pallidus, caudate nucleus, and thalamus explained inter-individual differences in attentional alpha modulation in the left versus right hemisphere. Multivariate regression analysis revealed that the explanatory potential of these regions' asymmetries varies as a function of target load and distractor salience.

      Strengths:

      The analysis pipeline is straightforward and follows in large parts what the authors have previously used in Mazzetti et al (2019). The authors use an interesting study design, which allows for testing of effects specific to different dimensions of attentional load (target load/distractor salience). The results are largely convincing and in part replicate what has previously been shown. The article is well-written and easy to follow.

      We thank the reviewer for their interest in our study.

      Weaknesses:

      While the article is interesting to read for researchers studying alpha oscillations in spatial attention, I am somewhat sceptical about whether this article is of high interest to a broader readership. Although I read the article with interest, the conceptual advance made here can be considered mostly incremental. As the authors describe, the present study's main advance is that it does not include reward associations (as in previous work) and includes different levels of attentional load. While these design features and the obtained results indeed improve our general understanding of how asymmetries of subcortical structures relate to lateralized alpha oscillations, the conceptual advance is somewhat limited.

      We thank the reviewer for their constructive comment. We would like to highlight that this is the first study to show a relationship between the asymmetry of subcortical structures and attention-modulated alpha oscillations in a task that did not involve any reward associations, reward processing being the most studied role of the basal ganglia. We also believe there is value in having a second study linking the asymmetry in the volume of subcortical structures to the modulation of alpha oscillations, as this surprising finding also has important clinical implications (see below). We edited the manuscript as below to explain the advances made in this study:

      Introduction (Line 112): “Our current findings broaden our understanding of how subcortical structures are involved in modulating alpha oscillations during top-down spatial attention, in the absence of any reward or value associations.”

      Discussion (Line 301): “It has also been shown that the spatial extent of pathological change in subcortical structures can predict cognitive changes in Parkinson’s Disease (43). […] Changes in neocortical oscillatory activity have also been observed in neurological disorders which mainly are known to affect subcortical structures. For example, individuals with Alzheimer's Disease demonstrate an increase in slow oscillatory activities and a decrease in higher frequency oscillations (45). Moreover, in patients with Parkinson’s Disease, the power of beta oscillations is relatively increased when they are dopamine-depleted compared with when they are on dopaminergic medication (46).”

      While the analysis of the relation of individual subcortical structures to alpha lateralization in different attentional load conditions is interesting, I am not convinced that the present analysis is suited to draw strong conclusions about the subcortical regions' specificity. For example, the Thalamus (Fig. 5) shows a significant negative beta estimate only in one condition (low-load target, non-salient distractor) but not in the other conditions. However, the actual specificity of the relation of thalamus asymmetry to lateralized alpha oscillations would require that the beta estimate for this one condition is significantly higher than the beta estimates for the other three conditions, which has not been tested as far as I understand.

      We thank the reviewer for this constructive comment. We agree with the reviewer that we should compare the beta values amongst the conditions. We therefore decided to better harness the multivariate nature of our analysis. Multivariate regression analysis allows one to test the null hypothesis that a given predictor does not contribute to all the dependent variables. A rejection of this hypothesis would suggest that lateralization of a given region of interest significantly predicts variability across all 4 of the task conditions, whereas failure to reject the null would imply that the predictive relationship holds only for that single condition. We tested this global null hypothesis using a MANOVA test and found the following, which we have added to the manuscript:

      Results (Line 250): “To ascertain whether each predictor contributes to all conditions, we conducted statistical tests on the results of our MMR using the null hypothesis that a given regressor does not impact all dependent variables. We found that while, with marginal significance, caudate nucleus can predict variability across all four of the task conditions (F(26,4) = 2.82, p-value = 0.046), the predictive relationships of thalamus (F(26,4) = 2.43, p-value = 0.073) with condition 1, and globus pallidus (F(26,4) = 2.29, p-value = 0.087) with conditions 2 and 3 hold only for these conditions. In sum, this demonstrates that when the task is easiest (condition 1), the thalamus is related to alpha modulation. When the task is most difficult (condition 4), the caudate nucleus relates to the alpha modulation; however, its contributions are substantial enough to predict outcomes across all conditions. For the conditions with medium difficulty (conditions 2 and 3), the globus pallidus is related to the alpha band modulation.”

      Method (Line 599): “To examine the specificity of each regressor for lateralized alpha in each condition, we statistically assessed the results of the MMR against the null hypothesis that a particular predictor does not contribute to all dependent variables, employing a MANOVA test in RStudio (version 2022.02.2) (80).”

      Discussion (Line 337): “Thalamus, Globus Pallidus, and Caudate nucleus play varying roles across different load conditions.”

      Discussion (Line 361): “Although these findings highlight the varying contributions of different regions, they do not imply a lack of evidence for correlations between these subcortical structures and other load conditions.”

      Discussion (Line 379): “Additionally, we refrained from directly comparing the contributions of subcortical structures to different conditions due to low statistical power. […] In future studies it would be interesting to design an experiment directly addressing which subcortical regions contribute to distractor and target load in terms of modulating the alpha band activity. To ensure sufficient statistical power for doing so, each factor may need to be addressed in a separate experiment.”

      Reviewer #3 (Public Review):

      Summary:

      In this study, Ghafari et al. explored the correlation between hemispheric asymmetry in the volume of various subcortical regions and lateralization of posterior alpha-band oscillations in a spatial attention task with varying cognitive demands. To this end, they combined structural MRI and task MEG to investigate the relationship between hemispheric differences in the volume of basal ganglia, thalamus, hippocampus, and amygdala and hemisphere-specific modulation of alpha-band power. The authors report that differences in the thalamus, caudate nucleus, and globus pallidus volume are linked to the attention-related changes in alpha band oscillations with differential correlations for different regions in different conditions of the design (depending on the salience of the distractor and/or the target).

      Strengths:

      The manuscript contributes to filling an important gap in current research on attention allocation which commonly focuses exclusively on cortical structures. Because it is not possible to reliably measure subcortical activity with non-invasive electrophysiological methods, they correlate volumetric measurements of the relevant subcortical regions with cortical measurements of alpha band power. Specifically, they build on their own previous finding showing a correlation between hemispheric asymmetry of basal ganglia volumes and alpha lateralization by assessing a task without an explicit reward component. Furthermore, the authors use differences in saliency and perceptual load to disentangle the individual contributions of the subcortical regions.

      We appreciate the reviewer’s interest in our study.

      Weaknesses:

      The theoretical bases of several aspects of the design and analyses remain unclear. Specifically, we missed statements in the introduction about why it is reasonable, from a theoretical perspective, to expect:

      (i) a link between volumetric measurements and task activity;

      We thank the reviewer for this constructive feedback. We have now addressed this concern in the revised manuscript.

      Discussion (Line 293): “It has been demonstrated that extensive navigation experience enlarges the size of the right hippocampus (40). Furthermore, in terms of neurological disorders, it is well established that shrinkage (atrophy) in specific regions is a predictor of a number of neurological and psychiatric conditions including Parkinson’s disease, dementia, and Huntington’s disease. […] It has also been shown that the spatial extent of pathological change in subcortical structures can predict cognitive changes in Parkinson’s Disease (43). […] Changes in neocortical oscillatory activity have also been observed in neurological disorders which mainly are known to affect subcortical structures. For example, individuals with Alzheimer's Disease demonstrate an increase in slow oscillatory activities and a decrease in higher frequency oscillations (45). Moreover, in patients with Parkinson’s Disease, the power of beta oscillations is relatively increased when they are dopamine-depleted compared with when they are on dopaminergic medication (46).”

      (ii) a specific link with hemispheric asymmetry in subcortical structures (While focusing on hemispheric lateralization might circumvent the problem of differences in head size, it would be better to justify this focus theoretically, which requires for example a short review of evidence showing ipsilateral vs contralateral connections between the relevant subcortical and cortical structures);

      We thank the reviewer for this helpful comment, which resulted in clarification of the manuscript. We addressed this issue in the revised manuscript; we have also now complemented the revised manuscript with papers directly investigating asymmetry of subcortical regions in relation to neurological disorders:

      Introduction (Line 102): “We utilized the hemispheric laterality of subcortical structures and alpha modulation to overcome issues related to individual variations in oscillatory power and head size.”

      Discussion (Line 314): “Employing hemispheric lateralization was motivated by the organizational characteristic of structural asymmetry in healthy brain (47). Additionally, considering the effects of aging (48) and neurodegenerative disorders, such as Alzheimer's Disease (49), on brain symmetry influenced this approach. Furthermore, computing lateralization indices for individuals addresses the challenge of accommodating variations in both head size and the power of oscillatory activity.”

      Discussion (Line 374): “Furthermore, in this study, our emphasis has been on assessing the size of subcortical structures. Future investigations could explore subcortical white matter connectivities and hemispheric asymmetries. This approach has previously been conducted on superior longitudinal fasciculus (SLF) (61,62) and holds potential for examining cortico-subcortical connectivity in the context of oscillatory asymmetries.”
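      The argument that a lateralization index removes overall scale (head size or overall oscillatory power) can be illustrated with a short sketch. The normalized-difference formula below is an assumption, since the response does not state the exact formula; it is chosen so that negative values mean a larger left-hemisphere value, matching the sign convention given in the supplementary figure legend. The function name and the volumes are hypothetical.

```python
def lateralization_index(right, left):
    """Normalized difference (right - left) / (right + left).

    Negative values mean the left-hemisphere value is larger.
    The formula is an assumed, conventional choice for illustration.
    """
    total = right + left
    if total == 0:
        raise ValueError("right + left must be non-zero")
    return (right - left) / total


# Hypothetical thalamus volumes (mm^3) for two participants whose brains
# differ in overall size but share the same left/right proportion.
small_head = lateralization_index(right=7200.0, left=7600.0)
large_head = lateralization_index(right=9000.0, left=9500.0)
```

      Both hypothetical participants receive the same index even though their absolute volumes differ, which is the property the authors rely on.
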

      (iii) effects not only in basal ganglia and thalamus, but also hippocampus and amygdala (a justification of selection of all ROIs);

      We thank the reviewer for this comment. We assessed the hippocampus and amygdala because they are automatically segmented by the FIRST algorithm. As our analysis showed no relation between these regions and the modulation of alpha oscillations, they also provide a useful control for our approach. Therefore, we included all subcortical structures in the model and evaluated their predictive impact. This is now addressed in the revised manuscript.

      Method (Line 477): “FIRST is an automated model-based tool that runs a two-stage affine transformation to MNI152 space, to achieve a robust pre-alignment of thalamus, caudate nucleus, putamen, globus pallidus, hippocampus, amygdala, and nucleus accumbens based on individual’s T1-weighted MR images.”

      Method (Line 576): “The absence of a relationship between modulations of alpha oscillations and the hippocampus and amygdala was expected as these regions typically are not associated with the allocation of spatial attention and thus add validity to our approach.”

      (iv) effects that depend on distractor versus target salience (a rationale for the specific two-factor design is missing);

      We thank the reviewer for this comment, which helped us clarify the manuscript. The two-factor design serves to investigate how the allocation of attentional resources relates specifically to mechanisms of excitability and suppression. For this reason, both the salience of the distractor (associated with suppression) and the perceptual load of the target (associated with excitability) had to be manipulated. We clarified the rationale in the revised version as below:

      Introduction (Line 96): “We analyzed MEG and structural data from a previous study (27), in which spatial cues guided participants to covertly attend to one stimulus (target) and ignore the other (distractor). To investigate the relationship between the allocation of attentional resources and mechanisms of neural excitability and suppression, the target load and the visual saliency of the distractor were manipulated using a noise mask. This load/salience manipulation resulted in four conditions that affect the attentional demands of target and distractor.”

      (v) effects in the absence of reward (why it is important to show that the effect seen previously in a task with reward is seen also in a task without reward);

      We thank the reviewer for this clarifying comment. We addressed this question in the introduction and discussion as below:

      Introduction (Line 107): “By examining their role in a task without explicit reward, we aim to elucidate the generalizability of the contributions of subcortical structures to spatial attention modulation. Such a finding would implicate a role for the basal ganglia in cognition beyond the well-studied realm of the estimation of choice values (33). Specifically, in a prior study (28), we observed that the contributions of the basal ganglia were most pronounced when the items in question were associated with a reward. Our current findings broaden our understanding of how subcortical structures are involved in modulating alpha oscillations during top-down spatial attention, in the absence of any reward or value associations.”

      Discussion (Line 333): “This convergence of results not only corroborates the validity and consistency of our findings but also extends the empirical foundation supporting the predictive role of the asymmetry of globus pallidus in modulating alpha oscillations beyond reward valence and to the context of attention.”

      (vi) effects on rapid frequency tagging.

      We thank the reviewer for this constructive comment. We have now included this analysis and added the results to the revised manuscript.

      Results (Line 224): “It is worth noting that neither the behavioural nor the rapid invisible frequency tagging (RIFT) measures showed significant relationships with LVs and HLM() (Supplementary material, Figure 1 and Table 3).”

      Discussion (Line 396): “We did not find any association between the power of the RIFT signal and the size asymmetry of subcortical structures. Since the Bayes factors were less than 0.1, we conclude that our RIFT null findings are robust, suggesting a dissociation between how alpha oscillations and neuronal excitability indexed by RIFT relate to subcortical structures.”

      Method (Line 548): “We computed the modulation index (MI) for rapid invisible frequency tagging (RIFT) by averaging the power of the signal in sensors on the right when attention was directed to the right compared to when it was directed to the left. This calculation was also performed for sensors on the left. Consequently, we identified the top 5 sensors on each side with the highest MI as the Region of Interest (ROI). Utilizing the sensors within the ROI, we computed hemispheric lateralization modulation (HLM) of RIFT by summing the average MI(RIFT) of the right sensors and the average MI(RIFT) of the left sensors, obtaining one HLM(RIFT) value for each participant. For a more comprehensive analysis, refer to reference (24).”

      Supplementary Materials (Line 839): “Figure 1. Lateralization volume of thalamus, caudate nucleus and globus pallidus in relation to hemispheric lateralization modulation of rapid invisible frequency tagging (HLM(RIFT)) on the right and behavioural asymmetry on the left. A and E, The beta coefficients for the best model (having three regressors) associated with a generalized linear model (GLM) where lateralization volume (LV) values were defined as explanatory variables for HLM(RIFT) (A) and behavioural asymmetry (E). Error bars indicate standard errors of mean (SEM). B and F, Partial regression plot showing the association between LVTh and HLM(RIFT) (B, p-value = 0.59) and behavioural asymmetry (F, p-value = 0.38) while controlling for LVGP and LVCN. C and G, Partial regression plot showing the association between LVGP and HLM(RIFT) (C, p-value = 0.16) and behavioural asymmetry (G, p-value = 0.80) while controlling for LVTh and LVCN. D and H, Partial regression plot showing the association between LVCN and HLM(RIFT) (D, p-value = 0.53) and behavioural asymmetry (H, p-value = 0.74) while controlling for LVTh and LVGP. Negative (or positive) LV indices denote greater left (or right) volume for a given substructure; similarly, negative HLM(RIFT) values indicate stronger modulation of RIFT power in the left compared with the right hemisphere, and vice versa; a positive behavioural asymmetry value indicates higher accuracy when the target was on the right as compared with the left, and vice versa for negative behavioural asymmetry values. The dotted curves in B, C, D, F, G, and H indicate 95% confidence bounds for the regression line fitted on the plot in red.”

      Author response image 1.

      Second, the results are not fully reported. The model space and the results from the model comparison are omitted. Behavioral data and rapid frequency tagging results are not shown. Without having access to the data or the results of the analyses, the reader cannot evaluate whether the null effect corresponds to the absence of evidence or (as claimed in the discussion) evidence of absence.

      We thank the reviewer for this constructive suggestion. In the revised manuscript, we incorporated the model space, model comparisons, BIC values from the models, behavioral and rapid frequency tagging analysis methods, and their respective results. Additionally, we computed Bayes factors for our null findings to enhance the interpretability of our results.

      Results (Line 199): “This model predicted the HLM(α) values significantly in the GLM (F3,29 = 7.4824, p = 0.0007, adjusted R2 = 0.376) as compared with an intercept-only null model (Figure 4A).”

      Although the beta estimate of LVGP only showed a positive trend, removing it from the regression resulted in worse models (AIC and BIC tables in supplementary material).
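      The model comparison described here can be sketched as an enumeration of regressor subsets scored by AIC and BIC. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the candidate list is taken from the seven structures named in the FIRST description, the Gaussian least-squares forms of AIC and BIC (up to an additive constant) are assumed, and the helper names are hypothetical.

```python
from itertools import combinations
from math import log

# Candidate regressors: lateralized volumes of the seven segmented structures.
STRUCTURES = ["thalamus", "caudate nucleus", "putamen", "globus pallidus",
              "hippocampus", "amygdala", "nucleus accumbens"]


def model_space(regressors):
    """Return every non-empty subset of the candidate regressors."""
    return [subset
            for size in range(1, len(regressors) + 1)
            for subset in combinations(regressors, size)]


def aic(rss, n_obs, n_params):
    """AIC for a least-squares fit with Gaussian errors (constant dropped)."""
    return n_obs * log(rss / n_obs) + 2 * n_params


def bic(rss, n_obs, n_params):
    """BIC for a least-squares fit with Gaussian errors (constant dropped)."""
    return n_obs * log(rss / n_obs) + n_params * log(n_obs)


models = model_space(STRUCTURES)
```

      For each subset one would fit the regression, take its residual sum of squares, and keep the model with the lowest criterion value. With 33 participants, BIC penalizes each extra regressor more heavily than AIC, because log(33) is greater than 2.
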

      Supplementary materials (Line 827): “Table 1. Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values for all possible combinations of regressors (Lateralized Volume of subcortical structures). The selected model, with lowest AIC, is marked in green.”

      Author response table 1.

      Author response table 2.

      Author response table 3.

      Bayes factors for the correlation of hemispheric laterality of subcortical structures with hemispheric lateralization modulation of rapid invisible frequency tagging (HLM(RIFT)) and with behavioural asymmetry (BA). The Pearson correlation of each subcortical structure with HLM(RIFT) and with behavioural asymmetry was calculated. The likelihood of the data under the alternative hypothesis (presence of a correlation) was subsequently compared to the likelihood under the null hypothesis (absence of a correlation). As demonstrated in the table, all Bayes factors were below or very close to 1, indicating evidence for the null hypothesis.

      For the results of frequency tagging signal, we have now included this analysis and added the results to the revised manuscript. We refer the reviewer to our response to the weakness (vi) from reviewer #3.

      Third, it remains unclear whether the MMS is the best approach to analyzing effects as a function of target and distractor salience. To address the question of whether the effects of subcortical volumes on alpha lateralization vary with task demands (which we assume is the primary research question of interest, given the factorial design), we would like to evaluate some sort of omnibus interaction effect, e.g., by having target and distractor saliency interact with the subcortical volume factors to predict alpha lateralization. Without such analyses, the results are very hard to interpret. What are the implications of finding the differential effects of the different volumes for the different task conditions without directly assessing the effect of the task manipulation? Moreover, the report would benefit from a further breakdown of the effects into simple effects on unattended and attended alpha, to evaluate whether effects as a function of distractor (vs target) salience are indeed accompanied by effects on unattended (vs attended) alpha.

      The reviewer is correct that we did not directly compare between task conditions when we assessed the predictive relationship between basal ganglia lateralization and alpha lateralization. We opted for the multivariate regression approach as this allowed us to simultaneously model the predictive relationship between our continuous predictors and HLM alpha in each condition, allowing us to be most efficient with our level of statistical power (N=33). Indeed, directly comparing between task conditions within one model would result in an extra 16 regressors (1 (intercept) + 4-1 to model the difference between conditions + 3 to model the regressors + 3 x 3 to model each region x task condition interaction). This approach would be underpowered given our sample size, and the ensuing results are likely to be unreliable.
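      The regressor count in the preceding paragraph can be checked with a few lines of arithmetic. The function name is hypothetical; the terms follow the breakdown given in the response (intercept, condition contrasts, region main effects, region-by-condition interactions).

```python
def full_model_regressors(n_conditions, n_regions):
    """Count the columns of a design with region x condition interactions."""
    intercept = 1
    condition_terms = n_conditions - 1  # differences between conditions
    region_terms = n_regions            # one main effect per lateralized volume
    interaction_terms = n_regions * (n_conditions - 1)
    return intercept + condition_terms + region_terms + interaction_terms


total = full_model_regressors(n_conditions=4, n_regions=3)
participants_per_regressor = 33 / total
```

      With four conditions and three regions the count is 1 + 3 + 3 + 9 = 16, which leaves roughly two participants per regressor at N = 33.
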

      However, we statistically analysed our regression results. Multivariate regression analysis allows one to test the null hypothesis that a given predictor does not contribute to all the dependent variables. A rejection of this hypothesis would suggest that lateralization of a given region of interest significantly predicts variability across all 4 of the task conditions, whereas failure to reject the null would imply that the predictive relationship holds only for that single condition. We tested this global null hypothesis using a MANOVA test and reported the findings in response to weakness two from reviewer #1.

      Discussion (Line 384): “In future studies it would be interesting to design an experiment directly addressing which subcortical regions contribute to distractor and target load in terms of modulating the alpha band activity. To ensure sufficient statistical power for doing so, each factor may need to be addressed in a separate experiment.”

      The fourth concern is that the discussion section is not quite ready to help the reader appreciate the implications of key aspects of the findings. What are the implications for our understanding of the roles of different subcortical structures in the various psychological component processes of spatial attention? Why does the volumetric asymmetry of different subcortical structures have diametrically opposite effects on alpha lateralization? Instead, the discussion section highlights that the different subcortical structures are connected in circuits: "Globus pallidus also has wide projections to the thalamus and can thereby impact the dorsal attentional networks by modulating prefrontal activities." If this is true, then why does the effect of the GP dissociate from that of the thalamus? Also, what is it about the current behavioural paradigm that makes the behavioral readout insensitive to variation in subcortical volume (or alpha lateralization?)?

      We thank the reviewer for this feedback. These are indeed all good points, and we hope that our findings will inspire further research to address these issues. In the revised manuscript we now write:

      Discussion (Line 349): “The opposite effect of the globus pallidus compared to the thalamus is striking, and possibly explained by the globus pallidus containing GABAergic interneurons. Thus the inhibitory nature of the globus pallidus projections to thalamus could explain why they are related to the alpha modulation in different manners (57).”

      Discussion (Line 379): “Moreover, the current study faced methodological constraints, limiting the analysis to the entire thalamus. […] It would be of great interest to conduct further investigations to quantify the distinct impacts of individual thalamic nuclei on the association between subcortical structures and the modulation of oscillatory activity.”

      Discussion (Line 388): “Moreover, our failure to identify a relationship between the lateralized volume of subcortical structures and behavioural measures should be addressed in studies that are better designed to capture performance asymmetries (63). Individual preferences toward one hemifield, which were not addressed in the current study design, could potentially strengthen the power to detect correlations between structural variations in the subcortical structures and behavioural measures.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comment:

      Between-subject correlation/regression analyses always rely on the assumption that the underlying dependent measures are reliable. While the reliability of asymmetries of subcortical structures can be assumed, the reliability of lateralized alpha oscillations during spatial attention can be questioned. It would be helpful if the authors could test the reliability of alpha lateralization, for instance by calculating HLM(a) in the first and second half of the experiment and correlating the resulting HLM(a) values (split-half reliability).

      We thank the reviewer for their insightful comment. We acknowledge that the between-subject regression relies on the reliability of alpha lateralization. Nonetheless, a previous study has demonstrated consistent results regarding HLM(α). We have further elaborated on these aspects in the discussion section:

      Discussion (Line 328): “Furthermore, our regression analysis outcomes align with the findings of Mazzetti et al. (28) underscoring the significant predictive influence exerted by the lateralized volume of globus pallidus on the modulation of hemispheric lateralization in alpha oscillations during spatial attention tasks. This convergence of results not only corroborates the validity and consistency of our findings but also extends the empirical foundation supporting the predictive role of the asymmetry of globus pallidus in modulating alpha oscillations within the context of attention.”
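
      For reference, the split-half check described by the reviewer could be sketched as follows. This is a minimal illustration on simulated values, not an analysis of the study data; the function name and the Spearman-Brown correction step are additions for illustration only.

```python
import numpy as np

def split_half_reliability(hlm_first_half, hlm_second_half):
    """Correlate per-participant HLM values from two halves of the session.

    Returns the Pearson correlation between halves and the Spearman-Brown
    corrected estimate for the full-length measurement.
    """
    first = np.asarray(hlm_first_half, dtype=float)
    second = np.asarray(hlm_second_half, dtype=float)
    r_half = np.corrcoef(first, second)[0, 1]
    r_full = 2 * r_half / (1 + r_half)   # Spearman-Brown prophecy formula
    return r_half, r_full

# Hypothetical data: 33 participants, a stable trait plus half-specific noise.
rng = np.random.default_rng(0)
trait = rng.normal(size=33)
half_1 = trait + 0.5 * rng.normal(size=33)
half_2 = trait + 0.5 * rng.normal(size=33)
r_half, r_full = split_half_reliability(half_1, half_2)
print(f"split-half r = {r_half:.2f}, Spearman-Brown corrected = {r_full:.2f}")
```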

      Reviewer #3 (Recommendations For The Authors):

      We recommend that a revised version of the manuscript

      • Clarifies the theoretical basis for the 6 key design & analysis choices that we have outlined above;

      We thank the reviewer for their precision. We addressed the concerns outlined above in the previous section.

      • Also clarifies the task description (perhaps referring to target and distractor salience instead of target load versus distractor salience might help);

      Thank you for this constructive comment. We used the terms ‘load’ for targets and ‘salience’ for distractors because the noise manipulation of the faces reduces the salience of the image, which makes distractors less distracting (easier) but targets more perceptually loaded (harder). The explanation of these terms is made clear in the revised manuscript.

      Method (Line 447): “Over trials, the perceptual load of targets was manipulated using a noise mask; noisy targets are harder to detect than clear targets and therefore incur greater perceptual load in their detection. The saliency of distractor stimuli was also manipulated using a noise mask; noisy distractor stimuli are less salient than clear distractors and therefore less disruptive to performance on the detection task. The noise mask was created by randomly swapping 50% of the stimulus pixels (Figure 1B). This manipulation resulted in four target-load/distractor-saliency conditions: (1) target: low load, distractor: low saliency (i.e., clear target, noisy distractor), (2) target: high load, distractor: low saliency (i.e., noisy target, noisy distractor), (3) target: low load, distractor: high saliency (i.e., clear target, clear distractor), (4) target: high load, distractor: high saliency (i.e., noisy target, clear distractor) (Figure 1B and C).”
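
      As an illustration of the noise manipulation, the sketch below swaps a chosen fraction of pixels in a 2-D grayscale array. It rests on an assumption: “swapping” is implemented here by picking a random subset of pixel positions and shuffling the values among those positions, which may differ from the stimulus-generation code actually used. The function name `apply_noise_mask` is hypothetical.

```python
import numpy as np

def apply_noise_mask(image, fraction=0.5, seed=None):
    """Return a copy of `image` with `fraction` of its pixels randomly swapped.

    Assumption: a random subset of positions is chosen and the pixel values
    are shuffled among those positions, so the set of pixel values (and hence
    the luminance histogram) is unchanged.
    """
    rng = np.random.default_rng(seed)
    flat = np.asarray(image).reshape(-1).copy()          # flattened working copy
    n_swap = int(round(fraction * flat.size))
    positions = rng.choice(flat.size, size=n_swap, replace=False)
    flat[positions] = flat[rng.permutation(positions)]   # shuffle within the subset
    return flat.reshape(np.shape(image))

# Stand-in for a face image: every pixel has a unique value.
face = np.arange(64 * 64, dtype=np.uint16).reshape(64, 64)
noisy = apply_noise_mask(face, fraction=0.5, seed=1)
print(f"fraction of pixels changed: {(noisy != face).mean():.3f}")
```

      Because values are only moved, never altered, the clear and noisy versions keep the same mean luminance under this assumption.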

      • Fully reports all the data, including those of the model comparisons, the behavioural results, and the rapid frequency tagging results;

      We thank the reviewer for this constructive comment. We refer the reviewer to our response to the second comment and comment (vi) from reviewer #3.

      • Reports interaction effects to directly test the modulating role of task demands in the link between volume and alpha, and break down the alpha lateralization indices into their simple effects on the ipsilateral and contralateral hemispheres;

      Task demands have been addressed in our response to weakness two from reviewer #1.

      Regarding the second part of the comment: to compare the lateralized modulation of alpha oscillations between the right and left hemispheres, we computed the hemispheric lateralization modulation. This involved dividing trials into attention-right and attention-left trials and then calculating the lateralization index separately for sensors on the right and on the left. Specifically, this entailed computing ipsilateral – contralateral for sensors on the right and contralateral – ipsilateral for sensors on the left side of the brain. We addressed this concern in the Methods section as below:

      Method (Line 537): “As MI(α) consistently represents power of alpha in attention right versus attention left conditions, it entails the comparison between ipsilateral and contralateral alpha modulation power for sensors located on the right side of the head. The same comparison applies inversely for sensors situated on the left side of the brain.”

      • Clarifies in the discussion section the specific implications of the results for our understanding of the link between distinct subcortical structures and distinct component processes of spatial attention.

      We thank the reviewer for their constructive comment. This point is addressed in response to the fourth concern of reviewer #3.

      More detailed specific recommendations are provided below:

      • Line 40ff: In this paragraph, the theoretical framework concerning the function of the subcortical regions of interest is described. Here, the authors jump back and forth between the role of the basal ganglia and the role of the thalamus. For clarity, we would advise to describe the functions of these two structures one after the other. And include a justification for assessing the hippocampus and the amygdala.

      We appreciate the reviewer’s precision in this comment. In the revised manuscript, we now describe these structures one after the other, as below:

      Introduction (Line 44): “For instance, it has been shown that the pulvinar plays an important role in the modulation of neocortical alpha oscillations associated with the allocation of attention (9). Studies in rats and non-human primates have shown that both the thalamus and superior colliculus are involved in the control of spatial attention by contributing to the regulation of neocortical activity (9-11). Notably, when the largest nucleus of the thalamus, the pulvinar, was inactivated after muscimol infusion, the monkey’s ability to detect colour changes in attended stimuli was lowered. This behavioral deficit occurred when the target was in the receptive field of V4 neurons that were connected to lesioned pulvinar (12). The basal ganglia play a role in different aspects of cognitive control, encompassing attention (13,14), behavioural output (15), and conscious perception (16). Moreover, the basal ganglia contribute to visuospatial attention by linking with cortical regions like the prefrontal cortex via the thalamus.”

      Justification for assessing the hippocampus and the amygdala has been addressed in response to weakness (iii) from reviewer #3.

      • The authors mention they defined symmetric clusters of 5 sensors in each hemisphere that showed the highest modulation, but it is not clear how this number of sensors was determined a priori.

      We thank the reviewer for their comment. We edited the revised manuscript as below:

      Method (Line 536): “Ten sensors were selected to ensure sufficient coverage of the region exhibiting alpha modulation as judged from prior work (62).”

      • In line 141, the abbreviation HLM is first mentioned but the concept of "hemispheric lateralization modulation of alpha power" is only mentioned in the following section. For the ease of the reader, the abbreviation could be mentioned together with this concept at the beginning of this paragraph.

      We thank the reviewer for their attention. In the revised manuscript, HLM(α) is now introduced together with its concept.

      Results (Line 153): “Next, we computed the hemispheric lateralization modulation of alpha power (HLM(α)) in each individual.”

      • In line 188 of the results section, it is mentioned that the table including the AIC values for model comparisons is in the supplementary material, however, we could not locate this table.

      We thank the reviewer for their constructive feedback. The supplementary materials were uploaded in a separate file, and it must not have been available to the reviewers. We have now added the supplementary materials to the end of the manuscript for convenience.

      • Figure 4 is missing the panel headers A, B, C, and D.

      We thank the reviewer for their precision. This figure is now fixed.

      Author response image 2.

      • In lines 205 and 206, behavioral and rapid frequency tagging analysis are mentioned. For the behavioral analysis, the method is described, but no results are provided. For the rapid frequency tagging, neither the methods nor the results are described. To evaluate the strength of this (non)-evidence, we would advise to elaborate on these analysis steps and report the results in the supplementary material.

      We thank the reviewer for this constructive comment. A brief explanation of the analysis method of rapid frequency tagging signal is added to the revised manuscript.

      Method (Line 548): “We computed the modulation index (MI) for rapid invisible frequency tagging (RIFT) by averaging the power of the signal in sensors on the right when attention was directed to the right compared to when it was directed to the left. This calculation was also performed for sensors on the left. Consequently, we identified the top 5 sensors on each side with the highest MI as the Region of Interest (ROI). Utilizing the sensors within the ROI, we computed hemispheric lateralization modulation (HLM) of RIFT by summing the average MI(RIFT) of the right sensors and the average MI(RIFT) of the left sensors, obtaining one HLM(RIFT) value for each participant. For a more comprehensive analysis, refer to reference (24).” For a more detailed answer, we refer the reviewer to the second comment from reviewer #3.
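
      The steps described above can be sketched as follows. This is a hedged illustration only: the normalised-difference form of the modulation index and the choice to rank sensors by absolute MI are assumptions, since neither is specified in the text, and all function names and values are hypothetical.

```python
import numpy as np

def modulation_index(power_attend_right, power_attend_left):
    """Per-sensor modulation index (assumed normalised-difference form)."""
    right = np.asarray(power_attend_right, dtype=float)
    left = np.asarray(power_attend_left, dtype=float)
    return (right - left) / (right + left)

def hemispheric_lateralization_modulation(mi_right_sensors, mi_left_sensors, n_roi=5):
    """Sum the mean MI of the top `n_roi` sensors on each side.

    Assumption: "highest MI" is taken as largest absolute MI, because the two
    hemispheres are expected to show modulations of opposite sign.
    """
    mi_r = np.asarray(mi_right_sensors, dtype=float)
    mi_l = np.asarray(mi_left_sensors, dtype=float)
    roi_r = np.argsort(np.abs(mi_r))[-n_roi:]   # ROI sensor indices, right side
    roi_l = np.argsort(np.abs(mi_l))[-n_roi:]   # ROI sensor indices, left side
    return mi_r[roi_r].mean() + mi_l[roi_l].mean()

# Hypothetical per-sensor MI values for one participant.
mi_right = np.array([0.10, 0.30, 0.05, 0.20, 0.25, 0.15, 0.02])
mi_left = np.array([-0.05, -0.10, -0.02, -0.15, -0.08, -0.12, -0.01])
hlm = hemispheric_lateralization_modulation(mi_right, mi_left)
print(f"HLM = {hlm:.2f}")
```

      Under these assumptions a value near zero indicates symmetric modulation across hemispheres, while a non-zero value indicates that one hemisphere is modulated more strongly than the other.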

      • For the paragraph starting at line 209, we would recommend referring to Figure 1.

      We thank the reviewer for their suggestion. This paragraph is now referring to Figure 1.

      Results (Line 229): “To relate load and salience conditions of the task to the relationship between subcortical structures and the alpha activity, we combined low-load or high-load targets with high-saliency or low-saliency distractors to manipulate the perceptual load appointed to each trial (Method section, Figure 1).”

      • Figure 5 as well as the report of the beta weights in this section shows a difference in the direction of the effect for the thalamus compared to the globus pallidus and caudate nucleus which is not discussed in this section.

      We thank the reviewer for bringing this important point to our attention. We addressed this comment in the discussion section as below:

      Discussion (Line 349): “The opposite effect of the globus pallidus compared to the thalamus is striking, and possibly explained by the globus pallidus containing GABAergic interneurons. Thus the inhibitory nature of the globus pallidus projections to thalamus could explain why they are related to the alpha modulation in different manners (54).”

      Discussion (Line 379): “Moreover, the current study faced methodological constraints, limiting the analysis to the entire thalamus. […] It would be of great interest to conduct further investigations to quantify the distinct impacts of individual thalamic nuclei on the association between subcortical structures and the modulation of oscillatory activity.”

      • Comment 2 on line 80 is addressed in the paragraph following 264 by describing volumetric changes in basal ganglia in neurodegenerative disorders such as PD or Huntington's. Still, the link of how a decrease in volume in this region could be causally linked to changes in alpha-band power could be better supported.

      We thank the reviewer for their constructive feedback. Here we are highlighting the significant correlation between subcortical structures and changes in attention-modulated alpha oscillations. We added a few more references to the discussion supporting the relationship between size and function in relation to neurological disorders. We also edited the manuscript to make this point clearer, as below:

      Introduction (Line 113): “Our current findings broaden our understanding of how subcortical structures are involved in modulating alpha oscillations during top-down spatial attention, independent of any reward or value associations.”

      Discussion (Line 305): “Changes in neocortical oscillatory activity have also been observed in neurological disorders that are mainly known to affect subcortical structures. For example, individuals with Alzheimer's Disease demonstrate an increase in slow oscillatory activities and a decrease in higher frequency oscillations (42). Moreover, in patients with Parkinson’s Disease, the power of beta oscillations increases when they are dopamine-depleted compared with when they are on dopaminergic medication (43).”

      • Related to the previous comment on behavioral and rapid frequency tagging results, these are difficult to evaluate without mention of the methods and/or results.

      We thank the reviewer for this comment. We refer the reviewer to our response to the second comment from reviewer #3.

      • The authors show differential effects of target load and distractor saliency; however, we missed the description of how these two variables differ conceptually as they are both described as contributing to task difficulty and it is not described why we would expect differential effects for these concepts (or in other words, how the authors explain the differential effects).

      We thank the reviewer for their comment. Directly comparing between task conditions within one model would result in an extra 16 regressors (1 (intercept) + 4-1 to model the difference between conditions + 3 to model the regressors + 3 x 3 to model each region x task condition interaction). Given our sample size, this study is underpowered to directly compare alpha lateralisation in contralateral versus ipsilateral conditions. For a more detailed answer please refer to our response to weakness two from reviewer #1.

      • Line 364ff: Based on the description of the experimental design, it is not clear to us whether participants only had to report on the change in gaze for the stimulus in the cued hemifield.

      We thank the reviewer for this comment, which prompted us to clarify the experimental design as below:

      Method (Line 440): “Then followed a 1000 ms response interval where participants were asked to respond with their right or left index finger whether the gaze direction of the cued face shifted left or right.”

      • Line 47ff: As mentioned above, the AIC table is not included. Further, as it is mentioned that BIC values led to similar results (indicating that they are not identical), it would be valuable to report both AIC and BIC values.

      We thank the reviewer for their constructive feedback. The supplementary materials were uploaded in a separate file, and it must not have been available to the reviewers. We have now added the BIC values and attached the supplementary materials to the end of the manuscript for convenience.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      This article by Zhai et al. investigates sterol transport in bacteria. Synthesis of sterols is rare in bacteria but occurs in some, such as M. capsulatus, where the sterols are found primarily in the outer membrane. In a previous paper the authors discovered an operon consisting of five genes, with two of these genes encoding demethylases involved in sterol demethylation. In this manuscript, the authors set out to investigate the functions of the other three genes in the operon. Interestingly, through a bioinformatic analysis, they show that they are an inner membrane transporter of the RND family, a periplasmic binding protein, and an outer membrane-associated protein, all potentially involved with lipid transport, so providing a means of transporting the lipids to the outer membrane. These proteins are then extensively investigated through lipid pulldowns, binding analysis on all three, and X-ray crystallography and docking of the latter two.

      Strengths

      The lipid pulldowns and associated MST binding analysis are convincing, clearly showing that sterols are able to bind to these proteins. The structures of BstB and BstC are high resolution with excellent maps that allow docking studies to be carried out. These structures are distinct from sterol-binding proteins in eukaryotes.

      We thank the reviewer for their favorable impression of this work.

      Weaknesses

      While the docking and molecular dynamics studies are consistent with the binding of sterols to BstB and BstC, this is not backed up particularly well. The MST results of mutants in the binding pocket of BstB have relatively little effect, and while I agree with the authors this may be because of the extensive hydrophobic interactions that the ligand makes with the protein, it is difficult to make any firm conclusions about binding.

      We agree with the reviewer that at this point, there is no experimental evidence to define the sterol binding site in BstB. While in the manuscript we allude to the extensive hydrophobic interactions as being especially stabilizing and difficult to eliminate with one or two mutations, we are now also aware that hydrogen-bonding interactions with the polar head of the sterols are quite important (see data on BstC, where disruption of that interaction significantly reduces the equilibrium affinity for sterols). Our MD simulations show that at least 3 protein amino acids can participate in H-bonding with the sterols. Moreover, recent work from our lab shows that even ligand site waters can extend an H-bonding network around the polar head of the lipid (Zhai et al., ChemBioChem 2023, 24, e202300156), thereby enabling H-bonding with amino acids that are further away from the ligand site. It is therefore difficult to predict which mutations will sufficiently destabilize the binding. While this question is one we will tackle in future studies focused on obtaining high-resolution substrate-bound structures of BstB or homologs, the findings reported here are still relevant and timely, and we posit will spur the discovery of functional homologs, including some in organisms that are more tractable.

      The authors also discuss the possibility of a secondary binding site in BstB based on a slight cavity in domain B next to a flexible loop. This is not backed up in any way and seems unlikely.

      The reviewer is correct in that the evidence for this second binding site is weak. While the crystallographic structure shows a highly hydrophobic region and the binding studies suggest cooperativity exists in the binding of the 4-methylsterol substrate, the docking studies do not strongly support binding at that site. As such, we have clarified in the manuscript that a second hydrophobic cavity is observed, but that its role in ligand interaction remains unexplored.

      Reviewer #2 (Public Review):

      Summary:

      In eukaryotes, sterols are crucial for signaling and regulating membrane fluidity, however, the mechanism governing cholesterol production and transport across the cell membrane in bacteria remains enigmatic. The manuscript by Zhai et al. sheds light on this topic by uncovering three potential cholesterol transport proteins. Through comprehensive bioinformatics analysis, the authors identified three genes bstA, bstB, and bstC encoding proteins which share homology with transporters, periplasmic binding proteins, and periplasmic components superfamily, respectively. Furthermore, the authors confirmed the specific interaction between these three proteins and C-4 methylated sterols and determined the structures of BstB and BstC. Combining these structural insights with molecular dynamics simulation, they postulated several plausible substrate binding sites within each protein.

      Strengths:

      The authors have identified 3 proteins that seem likely to be involved in sterol transport between the inner and outer membrane. The structures are of high quality, and the sterol binding experiments support a role for these proteins in sterol transport.

      We thank the reviewer for this positive view of our work.

      Weaknesses:

      While the authors' model is very plausible, direct evidence for a role of BstABC in transport, or that the 3 proteins function together in a single pathway, is limited.

      The reviewer is correct that we were unable to demonstrate that the three proteins work together to transport 4-methylsterols. This is not for lack of trying. We first attempted gene deletion studies, and as mentioned in the manuscript (with more details now provided in the experimental section), this appeared to be lethal. We then attempted in vitro exchange experiments, in which the proteins would be used to transfer sterols from sterol-loaded “heavy” liposomes to sterol-free “light” liposomes – such exchange assays are frequently performed with eukaryotic sterol transporters (see Chung et al., Science 2015, https://doi.org/10.1126/science.aab1370). These assays were not successful because 1) sterols incorporated poorly into liposomes made with E. coli polar lipids and yielded leaky liposomes; 2) liposomes prepared with the TLE of M. capsulatus proved more stable, but no appreciable exchange was observed; we reasoned that this might be due to the absence of an energy source for BstA, the RND component for which we have expressed and purified only the soluble periplasmic domain. Given the technical difficulty of these in vitro transport experiments, we will continue to pursue in vivo demonstration of function as new homologs are identified.

      Reviewer #3 (Public Review):

      Summary:

      The work in this manuscript builds on prior efforts by this team to understand how sterols are biosynthesized and utilized in bacteria. The study reports a new function for three genes encoded near sterol biosynthesis enzymes, suggesting the resulting proteins function as a sterol transport system. Biochemical and structural characterization of the two soluble components of the pathway establishes that both proteins can bind sterols, with a preference for 4-methylated derivatives. High-resolution x-ray structures of the apoproteins reveal hydrophobic cavities of the appropriate size to accommodate these substrates. Docking and molecular dynamics simulations confirm this observation and provide specific insights into residues involved in substrate binding.

      Strengths:

      The manuscript is comprehensive and well-written. The annotation of a new function in a set of proteins related to bacterial sterol usage is exciting and likely to enable further study of this phenomenon - which is currently not well understood. The work also has implications for improving our understanding of lipid usage in general among bacterial organisms.

      We thank the reviewer for this synopsis of our work.

      Weaknesses:

      The authors might consider moving some of the bioinformatics figures to the main text, given how much space is devoted to this topic in the results section.

      We have taken this advice and moved Figure S1 to the main manuscript.

      Reviewer #1 (Recommendations For The Authors):

      1. In the analysis of the MST data, the authors quote Hill coefficients. How reliable are these numbers? For BstB, for instance, it seems unlikely that more than one molecule would bind. Can the analysis be done without needing to include Hill coefficients?

      We used fits that did and did not invoke cooperativity – see below. We are certain that both BstA and BstB are better fit with cooperativity invoked.

      Author response image 1.
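
      To illustrate the kind of comparison involved, the sketch below fits a simulated titration with and without a Hill coefficient using a simple grid search. It is not the MST analysis performed for the manuscript: the data are simulated, the signal is assumed to be expressed as fraction bound, and all names and parameter grids are hypothetical.

```python
import numpy as np

def fraction_bound(ligand, kd, n=1.0):
    """Hill equation; n = 1 reduces to simple one-site binding."""
    ligand = np.asarray(ligand, dtype=float)
    return ligand**n / (kd**n + ligand**n)

def best_fit_sse(ligand, observed, kd_grid, n_grid):
    """Grid-search least squares; returns (sse, kd, n) of the best fit."""
    best = (np.inf, None, None)
    for kd in kd_grid:
        for n in n_grid:
            sse = float(np.sum((fraction_bound(ligand, kd, n) - observed) ** 2))
            if sse < best[0]:
                best = (sse, kd, n)
    return best

# Hypothetical titration generated with cooperativity (n = 2), not the MST data.
ligand = np.logspace(-1, 2, 12)                 # arbitrary concentration units
observed = fraction_bound(ligand, kd=5.0, n=2.0)
kd_grid = np.linspace(0.5, 20.0, 40)            # 0.5, 1.0, ..., 20.0
sse_simple, kd_simple, _ = best_fit_sse(ligand, observed, kd_grid, [1.0])
sse_hill, kd_hill, n_hill = best_fit_sse(ligand, observed, kd_grid, np.linspace(0.5, 4.0, 8))
print(f"no cooperativity: SSE = {sse_simple:.4f}; Hill fit: SSE = {sse_hill:.4f}, n = {n_hill}")
```

      A lower residual alone does not justify the extra parameter, since the Hill model can never fit worse than the one-site model it contains; a formal comparison would also penalise model complexity, for example with an F-test or an information criterion.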

      2. In looking at the maps associated with the structures, which were included in the review package, I see that two citric acid molecules fit beautifully into the density where currently PEG has been modelled. This needs to be fixed and some comments may be appropriate in the manuscript.

      We thank the reviewer for calling our attention to this. Citric acid has now been added to the model, and we reason that these are present in the structure because citric acid was used in the crystallization condition. The revised model is now present in the PDB.

      3. It is not necessary to show the two molecules in the asymmetric unit in Figure 4 given that it is not a dimer. This doesn't add anything to the manuscript.

      We now show a single molecule of BstC in Figure 4 (now Figure 5).

      4. I wouldn't consider the loops shown in Figure S4 as disordered. They have slightly higher B-values but are not completely mobile.

      We did not refer to these loops as disordered. In the text, we say they “exhibit poor electron densities, suggesting conformational sampling of more than one state (Fig. S4A).”

      Reviewer #2 (Recommendations For The Authors):

      pg 7, "hinting at an astounding distinction": I might suggest a word other than astounding that conveys how statistically unlikely, unusual, etc. this result is.

      Thank you – we have removed “astounding”.

      pg 7, paragraph 2: Here the authors show that in the SSN analysis, BstB proteins cluster separately and suggest this implies a distinction in function. However, they also show that PhnD homologs do not cluster separately (distributed across multiple clusters), yet presumably have similar functions. I am not familiar with SSN, but it seems to me that the second statement about PhnD implies that the first statement about BstB might not be valid, i.e., if PhnD doesn't cluster based on function, on what basis can we conclude that BstB does? On what basis does clustering occur in the SSN analysis? Might it be driven by things other than function? This comment also concerns the final paragraph of this section.

      The reviewer is correct in that PhnD homologs occupy separate clusters of the SSN. Many of these homologs were crystallized with phosphate-like compounds, but it is possible that they have non-overlapping substrate scopes and are therefore functionally distinct. As for the basis of clustering, the SSN is fully sequence-based. What has been observed is that proteins with highly similar sequences can have similar functions – but this is not always true.

      pg 8, paragraph 1: The authors suggest that BstABC may be essential. This is probably not a critical claim and it might be simplest to just remove it, but if it is mentioned, the authors should probably explain what was attempted that failed, so a reader can assess the strength of the evidence supporting essentiality. For example, I don't see anything in the methods about genetic manipulations of M. capsulatus, so currently, this falls within the realm of "Data not shown".

      We have provided additional information about the experimental techniques used to do this. This statement was included so that it is understood that the reason for the experimental failure is unlikely to be technical in nature, as we have successfully deleted some sterol related genes while others remain intractable.

      Fig. 2A: It is unclear to me what is being plotted here, perhaps more experimental detail is required in the form of labels and/or legend. Is this a quantification of each sterol in each fraction separated by GC? There are essentially no methods provided for the GC-MS experiments. A reference is provided, but I think providing detailed methods for these specific experiments will provide a higher degree of scientific rigor. I am not sure what is standard for GCMS, but perhaps showing spectra in the supplement that establish the identity of the bound molecules as species I and II would be appropriate?

      Additional experimental details have been provided and the figure legend has been revised for clarity. Moreover, we now clearly state that the chromatograms shown were used to identify lipids on the basis of retention times for spectra that were previously published in Wei et al., 2016.

      pg 10-11, comparison with PhnD structure: Perhaps it is worth mentioning a 3rd possible explanation for the relative opening/closing of the cleft is simply crystal packing? I don't think it necessarily has to imply anything about a difference in function. Also, the focus seems to be on this pairwise comparison, but perhaps more insights could be gleaned from an analysis that included a wider range of homologs, especially if any are thought to bind hydrophobic substrates.

      This could be true, and we have included a statement to that effect. We are unaware of homologs shown to bind to large, hydrophobic molecules.

      I think that BstB is shown upside-down in sup movies relative to other figures. If it isn't changed, perhaps adding some labels would help orient the reader.

      We have rotated the movies to be more consistent with the figures.

      Fig. S7: No units are indicated for Kds (uM?).

      Thank you – this has been fixed.

      pg 11, paragraph 2. "adjacent to three residues: Glu118, Tyr120 and Asn192": The residue number used in the text doesn't seem to match the numbering in the PDB file. I think these residues correspond to Glu98, Tyr100, and Asn172 in the PDB file.

      We regret this error. The correct numbering for both structures is now present in the deposited PDB files (7T1M for BstB and 7T1S for BstC).

      pg 12, final paragraph: The authors present binding data for BstB variants with mutations in the putative sterol binding pocket identified in the structural and MD analyses. However, these mutants had no effect on binding. The authors rationalize this in terms of the size of the interface and hydrophobic nature (which indeed, may be correct and is very plausible), and it is worth noting that many of their mutations are to Ala and would largely preserve the hydrophobic nature of the cleft. However, these mutants raise questions about where sterols actually bind. No experimental evidence is presented that substrates bind in the cleft, it is only hypothesized based on structural homology, MD simulations, etc. These mutations formally provide evidence against the hypothesis being tested; I think that has to be discussed a bit more directly, alongside the caveats the authors already discuss about hydrophobicity, etc.

      This is a valid point by the reviewer, and it is one we have attempted to address with our statement in the manuscript and in our response to reviewer 1. We have modified the relevant text to more clearly state that there is as of yet no experimental evidence for the binding of sterols to the cavity identified via molecular docking.

      pg 13: Presumably this is not the full-length lipoprotein, but has been truncated/mutated in some way? Some statement of roughly what was purified/crystallized should be stated.

      The SI methods on protein purification state that the genes of BstB and BstC without their respective signal peptides were obtained.

      pg 13, last paragraph "TN1 exhibits hybrid hydrophobicity, with the sides horizontal to cavities being hydrophobic while the vertical sides are more hydrophilic". I don't really follow the horizontal vs vertical sides. Perhaps this could be described in a different way.

      Noted and changed to “TN1 is closer to the N-terminal face of the structure, while CA1 and CA2 are proximal to the C-terminal face and form two open hydrophobic pockets; TN1 exhibits a mixture of hydrophobic and hydrophilic amino acids (Fig. 4B and Fig. S9B, Table S4).”

      pg 15-16, "Comparison to eukaryotic sterol transporters": Perhaps this would be better suited for the discussion section? Could also be streamlined; it is mostly discussing and comparing eukaryotic sterol binding domains to each other, not to BstABC.

      Given that BstB and BstC are the first identified proteins (and putative transporters) for bacterial sterol engagement, we thought a careful description of the existing sterol transporters (which are all eukaryotic) was warranted.

      Reviewer #3 (Recommendations For The Authors):

      I have just two minor suggestions for the authors if they wish to comment on or address them.

      1. Do the three proteins (BstA/B/C) form any sort of complex? Perhaps this property was not assessed - but it seemed possible that the B and C components might constitute a shuttle for the membrane-bound transporter?

      This is an important observation – the unliganded versions of these proteins show no appreciable affinity for each other. However, BstB (which would be expected to engage both with BstA and BstC) belongs to a family of proteins known to undergo significant conformational change upon substrate binding. It is possible that with substrate present, complexes are formed – we have yet to investigate this.

      2. In Figure S1, panel C - it appears that the label for the BstC cluster may have migrated away from the intended location. In this figure, it might also be useful to indicate in the caption the meaning of the red coloring of the nodes?

      The label is now fixed – thank you for drawing our attention to this.

    1. Author Response

      The following is the authors’ response to the original reviews.

      REVIEWER #1

      Leanza et al. investigated the regulation of Wnt signaling factors in the bone tissue obtained from individuals with or without type 2 diabetes. They showed that typical canonical Wnt ligands and downstream factors (Wnt10b, LEF1) are down-regulated, while Wnt5a and sclerostin mRNA are upregulated in diabetic bone tissue. Further, Wnt5a and sclerostin associated with the content of AGEs, and SOST mRNA levels also correlated with glycemic control and disease duration.

      Strengths:

      • A strength of the study is the investigation of Wnt signaling in bone tissue from humans with type 2 diabetes. Most studies measure only serum levels of Wnt inhibitors, but this study takes it further and looks into bone specifically.

      • The measurement of AGEs and its correlation to the Wnt signaling molecules is interesting and important. The correlation of sclerostin and Wnt5a with AGEs and disease duration suggests that inhibited Wnt signaling is paralleled by higher AGE levels and potentially weaker bone.

      • The methodology in terms of obtaining the bone samples and the rigorous evaluation of RNA integrity is great and provides a solid basis for further analyses.

      Weaknesses:

      • A weakness may include the rather limited number of samples. Especially for some sub-analyses (e.g. RNA analyses), only a subset of samples was used.

      • How was the sample size determined? It seems like more samples might have been necessary to obtain significant results for methods with a higher standard deviation (e.g. histomorphometry).

      We apologize for the oversight in the description of the statistical analysis, and we thank the reviewer for the careful reading. For the sample size calculation for bone histomorphometry, we used the cohort from the only paper analyzing trabecular bone in T2D postmenopausal women by dynamic histomorphometry (Manavalan JS et al, JCEM 2012). We performed an a priori sample size calculation using G*Power 3.1.9.7, based on the t-test for the difference between two independent groups. The analysis showed that, given an effect size of 2.2776769, we needed a total of 12 patients (6/group) to reach a power of 0.978. Gene expression analyses were performed not in a subset of patients but in all subjects recruited for this study. Based on the gene expression results for our main outcome (Wnt signaling), the effect size for the SOST gene was 1.2733824, with a power of 0.9490065, confirming that the sample size was sufficient to achieve adequate statistical power.
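
      As a rough illustration of the kind of power calculation described above, the sketch below approximates the power of a two-group comparison with equal group sizes. It is not G*Power's method: G*Power works with the noncentral t distribution, whereas this sketch swaps in a normal approximation so that it runs with the Python standard library alone. The function name `approx_power_two_sample`, the significance level of 0.05 and the two-sided default are assumptions made for illustration; the response does not state the alpha or tail setting used, so the result should not be expected to match the reported power exactly.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(d, n_per_group, alpha=0.05, two_sided=True):
    """Approximate power of a two-group comparison (normal approximation).

    d            -- standardized effect size (Cohen's d)
    n_per_group  -- number of subjects in each group (equal sizes assumed)
    alpha        -- significance level (assumed here, not taken from the text)
    two_sided    -- whether the test is two-sided (assumed default)
    """
    z = NormalDist()  # standard normal distribution
    # Expected value of the test statistic under the alternative hypothesis.
    shift = d * sqrt(n_per_group / 2)
    # Critical value for the chosen significance level and tail setting.
    crit = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    # Probability of exceeding the upper critical value.
    power = 1 - z.cdf(crit - shift)
    if two_sided:
        # Add the (usually negligible) probability of rejecting in the lower tail.
        power += z.cdf(-crit - shift)
    return power


# Hypothetical usage with the effect size and group size quoted in the response.
estimated_power = approx_power_two_sample(2.2776769, 6)
```

      With only six subjects per group the normal approximation is optimistic compared with an exact noncentral t calculation, so this is best read as a sanity check on the order of magnitude rather than a replacement for the G*Power output.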

      • Why is the number of samples different for the mRNA measurements? In most cases, there were 9, but in some 8 and in some 10?

      We sincerely thank the reviewer for the opportunity to clarify such important aspects. The number of samples used for mRNA quantification may differ between the analyzed genes for several reasons. First, for real-time PCR we used only samples with a high-quality 260/280 ratio between 1.8 and 2.0, as stated in the method section of the manuscript (Page 8, lines 163-164). Moreover, we decided not to use undetermined values, that is, those undetectable after the amplification cycles (40 cycles in total), as specified in the method section (Page 8, line 167).

      Overall, this study validates findings from the group that reported similar findings in 2020. This validates their methodology and shows that alterations in Wnt signaling are reproducible in human bone tissue.

      We thank the reviewer for the positive comment; we really value her/his opinion.

      COMMENTS:

      (1) The authors could provide more details on how much of the bone was analyzed for bone histomorphometry (what area?).

      We truly thank the reviewer for allowing us to explain our methodology in more depth. First, a biopsy containing trabecular bone from the femoral head was fixed in 10% neutral buffered formalin for 24 h prior to storage in 70% ethanol. Tissues were embedded in methylmethacrylate and sectioned sagittally by the Washington University Musculoskeletal Histology and Morphometry Core. Sections were stained with Goldner’s trichrome. Then, a rectangular region of interest (ROI) containing trabecular bone was chosen below the cartilage-lined joint surface and primary spongiosa. This region had an average area of 45 mm². Tissue processing artifacts, such as folding and edges, were excluded from the ROI. A threshold was chosen using the BIOQUANT software to automatically select trabeculae and measure bone volume. Finally, osteoid was highlighted in the software and quantified semi-automatically using a threshold and correcting with the brush tool (as shown in the image below).

      We specify that in the methods section (Page 7, lines 146-152).

      Author response image 1.

      (2) Could the number of samples used for histomorphometry be increased? That may also lead to more significant results.

      We sincerely appreciate this suggestion from the reviewer, but unfortunately all available samples for histomorphometry have been analyzed and we are not able to increase the number of recruited participants at this time. Recruitment of people with T2D undergoing hip replacement is extremely difficult given the limited number of those approved for elective surgery and compliant with our inclusion criteria. Considering also the long time needed to process bone samples for gene expression and histology analysis, it would take several months to achieve a meaningful increase in recruited subjects. However, we previously calculated the sample size for bone histomorphometry analysis using the only available data on trabecular bone in T2D postmenopausal women measured by dynamic histomorphometry (Manavalan JS et al, JCEM 2012). We performed an a priori sample size calculation using G*Power 3.1.9.7, based on the t-test for two independent groups. The analysis showed that, given an effect size of 2.2776769, we needed a total of 12 patients (6/group) to reach a power of 0.978.

      (3) It would have been interesting to assess the biomechanical behavior of the bone specimens. While it is known that BMD is often higher in patients with T2D, the resistance to fractures is lower. Ideally, bone strength measures could be correlated with Wnt molecule expression and AGEs.

      We agree with the reviewer that assessing biomechanical parameters in our cohort increases the importance of this study, giving more insight into the effect of Wnt signaling downregulation on bone strength. We therefore followed the reviewer's suggestion and performed bone compression tests on trabecular bone cores. We found a significant decrease in bone plasticity in T2D compared to controls [Young’s Modulus 21.6 (13.46-30.10 MPa) vs. 76.24 (26.81-132.9 MPa); p=0.0025]. We added the results of the bone compression test in a new paragraph (Page 8, lines 191-194). To assess the validity of our results, we performed a post-hoc power calculation using G*Power 3.1.9.7. The effect size was 1.4716626, with a power of 0.9730784, confirming that the sample size was sufficient to achieve adequate statistical power. We added the methods to the related section and the biomechanical data to table 3; we modified the manuscript accordingly (modifications are shown in track changes). Moreover, we also performed correlation analysis between Wnt target genes, AGEs and biomechanical parameters, showing significant correlations as reported in the added paragraph in the results section (Page 11, Lines 225-233).

      REVIEWER #2

      This study reports the levels of expression of selected genes implicated in Wnt signaling in trabecular bone from femur heads obtained after surgery from post-menopausal women with (15 women) or without (21 women) type 2 diabetes. They found higher expression levels of SOST and WNT5A, and lower expression levels of LEF-1 and WNT10B in tissues from subjects with T2D, correlating with glycemia and advanced glycation products. No significant differences in bone density were observed. Overall, this is a cross-sectional, observational study measuring a limited set of genes found to vary with glycemia in postmenopausal women undergoing hip surgery.

      Strengths:

      The study demonstrates the feasibility of measuring gene expression in post-surgical trabecular bone samples, and finds differences associated with glycemia despite a relatively small number of subjects. It can form the basis for further research on the causes and consequences of changes in elements of the WNT signaling pathway in bone biology and disease.

      Weaknesses:

      The small number of targeted genes does not provide a comprehensive view of the transcriptional landscape within which the effects are observed. The gene expression changes are not associated with cellular or physiological properties of the tissue, raising questions about the biological significance of the observations.

      We thank the reviewer for the comment. In response to his/her concerns, we have increased the number of Wnt target genes, including more interactors of the Wnt/β-catenin pathway. We measured GSK3B, AXIN2, BETA-CATENIN and SFRP5 gene expression levels, showing a significant increase in GSK3B, in line with a downregulation of Wnt signaling in T2D. We modified the manuscript according to this new analysis and updated the figure 1 panel (Page 10, lines 210-213). Unfortunately, in this paper we were not able to perform experiments on cellular or physiological properties. However, in order to analyze the biological effect of the analyzed genes on the phenotype, we measured bone strength by performing compression tests on trabecular bone cores (Page 10, lines 201-203 and table 3) and used biomechanical parameters for correlation analysis with the targeted genes, showing significant correlations between bone strength and Wnt genes. We added a new paragraph to the results section and a new figure panel to the main manuscript (Page 11, lines 225-233 and figure 4).

      COMMENTS:

      (1) The small number of targeted genes does not provide a comprehensive view of the transcriptional landscape within which the effects are observed. Given the author's success in obtaining good-quality RNA from trabecular bone, a more comprehensive exploration would greatly improve the quality of the study.

      We agree with the reviewer that broadening the transcriptional landscape related to Wnt signaling is of interest for this work, and we thank the reviewer for this opportunity. We were able to increase the number of Wnt target genes, including more interactors of the Wnt/β-catenin pathway, using the same cohort of patients in which we performed the other analyses. We also measured GSK3B, AXIN2, BETA-CATENIN and SFRP5 gene expression levels, showing a significant increase in GSK3B, in line with a downregulation of Wnt signaling in T2D. We modified the manuscript according to this new analysis and updated the figure panels (Page 10, lines 210-213 and Figure 1).

      (2) The gene expression changes are not associated with cellular or physiological properties of the tissue, raising questions about the biological significance of the observations. Can the authors perform immunohistochemistry to associate the changes in gene expression with protein expression?

      We sincerely acknowledge this comment for focusing attention on such an important aspect. We have partially replied to this comment in the previous paragraph. Regarding immunohistochemistry analysis, it is not possible to further use the available samples. This is mainly due to the fact that non-decalcified bones were embedded in plastic to allow for separate analysis of newly formed osteoid and mineralized bone. This process leads to poor antigen preservation and unsuitable detection of most targets. Moreover, antibodies for Wnt are also unreliable due to the secreted nature of the protein. Overall, this approach is unlikely to work efficiently. Similarly, RNAscope is not possible due to the resin. Optimization and validation of these analyses will need to be saved for a future study with fresh specimens.

      REVIEWER #3

      The manuscript by Leanza and colleagues explores the regulation of Wnt signaling and its association with advanced glycation end products (AGEs) accumulation in postmenopausal women with type 2 diabetes (T2D). The paper provides valuable insights into the potential mechanisms underlying bone fragility in individuals with T2D. Overall, the manuscript is well-structured, and the methodology is sound. I would suggest some minor revisions to improve clarity.

      Strengths:

      The study addresses an important and clinically relevant question concerning the mechanisms underlying bone fragility in postmenopausal women with T2D.

      The study's methodology appears sound, and the inclusion of postmenopausal women with and without T2D undergoing hip arthroplasty adds to the clinical relevance of the findings. Additionally, measuring gene expression and AGEs in bone samples provides direct insights into the study's objectives.

      The manuscript presents data clearly, and the results are well-organized.

      Weaknesses:

      Title. The title could be more specific to better reflect the content of the study. Also, the abstract should concisely summarize the study's main findings, providing some figures.

      We thank the reviewer for this suggestion, and we modified the title giving specific information on the main findings of this study. The new title is “Bone canonical Wnt signaling is downregulated in type 2 diabetes and associates with higher Advanced Glycation End-products (AGEs) content and reduced bone strength”. Moreover, we added as suggested a graphical abstract summarizing our study results.

      Introduction: the introduction would benefit from the addition of a clearer, more focused statement of the research questions or hypotheses guiding this study.

      We thank the reviewer for this opportunity, and we reformulated the hypothesis of this study based on our data and new findings as follows: “we hypothesized that T2D and AGEs accumulation downregulate Wnt canonical signaling and negatively affect bone strength” (page 6, lines 116-117).

      Methods: more information is needed on the histomorphometry analysis. Surgical samples from 8 T2D and 9 non-diabetic subjects were used for histomorphometry analysis. How did these subjects compare with the other subjects in the T2D and control groups? Were they representative? How were they selected?

      We thank the reviewer for the opportunity to clarify this important point. The number of subjects included in the different analyses of the paper differs for multiple reasons. In particular, we used only bone specimens with enough trabecular bone material to perform histomorphometry analysis. Therefore, the samples used in the histomorphometry analysis belong to the same subjects enrolled in the study and analyzed for the other experiments of this paper. However, we previously calculated the sample size for bone histomorphometry analysis using the only available data on trabecular bone in T2D postmenopausal women measured by dynamic histomorphometry (Manavalan JS et al, JCEM 2012). We performed an a priori sample size calculation using G*Power 3.1.9.7, based on the t-test for two independent groups. The analysis showed that, given an effect size of 2.2776769, we needed a total of 12 patients (6/group) to reach a power of 0.978.

      COMMENTS:

      (1) In the Abstract, values and p-values for comparisons, and Spearman's rho and p-values for correlations should be provided. Most adverbs (thus, accordingly, importantly) could be omitted to improve conciseness and clarity.

      We kindly thank the reviewers for this precise and careful comment. We changed the Abstract accordingly. According to the abstract style of the journal we initially reported only the main findings. We have now modified providing values and p values as requested. We defer to the wishes of the editor as to the format in which the abstract should be reported.

      (2) Result presentation: 25th and 75th percentile should be provided rather than the interquartile range, to better reflect data distribution.

      We thank the reviewer for the opportunity to better clarify this part of the results section. We changed the manuscript accordingly.

      (3) Estimated glomerular filtration rate should be calculated and provided as a marker of renal function, rather than serum creatinine values.

      We thank the reviewer for the comment, and we modified the manuscript accordingly, adding the eGFR values to table 1 and to the results section.

      (4) The manuscript should include a statement confirming compliance with the Declaration of Helsinki, considering that human subjects were involved in the study.

      We thank the reviewer for the comment. The study was conducted in accordance with the Declaration of Helsinki. The Ethics Committee of Campus Bio-Medico University approved the present study. Informed consent was obtained from all subjects involved in the study (Page 6, lines 134-137).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, the authors investigated the role of Elg1 in the regulation of telomere length. The main role of the Elg1/RLC complex is to unload the processivity factor PCNA, mainly after completion of synthesis of the Okazaki fragment in the lagging strand. They found that Elg1 physically interacts with the CST (Cdc13-Stn1-Ten1) and propose that Elg1 negatively regulates telomere length by mediating the interaction between Cdc13 and Stn1 in a pathway involving SUMOylation of both PCNA and Cdc13. Accumulation of SUMOylated PCNA upon deletion of ELG1 or overexpression of RAD30 leads to elongated telomeres. On the other hand, the interaction of Elg1 with Stn1 is SIM-dependent and occurs concurrently with telomere replication in late S phase. In contrast, the Elg1-Cdc13 interaction is mediated by PCNA-SUMO, is independent of the SIM of Elg1 but still dependent on Cdc13 SUMOylation. The authors present a model containing two main messages: 1) PCNA-SUMO acts as a positive signal for telomerase activation; 2) Elg1 promotes Cdc13/Stn1 interaction at the expense of Cdc13/Est1 interaction, thus terminating telomerase action.

      The manuscript contains a large amount of data that make a major inroad on a new type of link between telomere replication and regulation of the telomerase. Nevertheless, the detailed choreography of the events as well as the role of PCNA-SUMO remain elusive and the data do not fully explain the role of the Stn1/Elg1 interaction. The data presented do not sufficiently support the claim that SUMO-PCNA is a positive signal for telomerase activation.

      We thank the reviewer for her/his review efforts and opinion. We have re-submitted a new version of the manuscript in which we clarify some of the criticisms presented. In a point-by-point letter we respond to all the specific queries.

      Reviewer #2 (Public Review):

      This paper purports to unveil a mechanism controlling telomere length through SUMO modifications controlling interactions between PCNA unloader Elg1 and the CST complex that functions at telomeres. This is an extremely interesting mechanism to understand, and this paper indeed reveals some interesting genetic results, leading to a compelling model, with potential impact on the field. The conclusions are largely supported by experiments examining protein-protein interactions at low resolution and ambiguous regarding directness of interactions like co-IP and yeast two-hybrid (Y2H) combined with genetics. However, some results appear contradictory and there's a lack of rigor in the experimental data needed to support claims. There is significant room for improvement and this work could certainly attain the quality needed to support the claims. The current version needs substantial revision and lacks the necessary experimental detail. Stronger support for the claims would add detail to help distinguish competing models.

      We thank the reviewer for her/his positive opinion. We have re-submitted a new version of the manuscript in which we clarify some of the criticisms presented by the referees, and added all the missing experimental details. In a point-by-point letter we respond to all the specific queries.

      Reviewer #3 (Public Review):

      This paper reveals interesting physical connections between Elg1 and CST proteins that suggest a model where Elg1-mediated PCNA unloading is linked to regulation of telomere length extension via Stn1, Cdc13, and presumably Ten1 proteins. Some of these interactions appear to be modulated by SUMOylation and connected with Elg1's PCNA unloading activity. The strength of the paper is in the observations of new interactions between CST, Elg1, and PCNA. These interactions should be of interest to a broad audience interested in telomeres and DNA replication.

      We thank the reviewer for her/his positive opinion. We have re-submitted a new version of the manuscript in which we clarify some of the criticisms presented. In a point-by-point letter we respond to all the specific queries.

      What is not well demonstrated from the paper is the functional significance of the interactions described. The model presented by the authors is one interpretation of the data shown, and proposes that the role of SUMOylation is to temporally regulate the Elg1, PCNA and CST interactions at telomeres. This model makes some assumptions that are not demonstrated by this work (such as Stn1 SUMOylation, as noted) and are left for future testing. Alternative models that envision SUMOylation as a key in promoting spatial localization could also be proposed based on the data here (as mentioned in the discussion), in addition to or instead of a role for SUMOylation in enforcing a series of switches governing a tightly sequenced series of interactions and events at telomeres. Critically, the telomere length data from the paper indicate that the proposed model depicts interactions that are not necessary for telomerase activation or inhibition, as telomeres in pol30-RR strains are normal length and telomeres in elg1∆ strains are not nearly as elongated as in stn1 strains. One possibility mentioned in the paper is that the PCNAS and Elg1 interactions are contributing to the negative regulation of telomerase under certain conditions that are not defined in this work. Could it also be possible that the role of these interactions is not primarily directed toward modulating telomerase activity? It will be of interest to learn more about how these interactions and regulation by SUMO function intersect with regulation of telomere extension.

      We present compelling evidence for a role of SUMOylated PCNA in telomere length regulation. Figure 1 shows that this modification is both necessary and sufficient to elongate the telomeres, indicating that PCNA SUMOylation plays a positive role in telomere elongation. The model we present is consistent with all our results. There are, of course, possible alternative models, but they usually fail to explain some of the results. We agree that the fact that pol30-RR presents normal-sized telomeres implies that SUMO-PCNA is not required for telomerase to solve the "end replication problem", but rather is needed for "sustained" activity of telomerase. Since elongated telomeres (by absence of Elg1 or by over-expression of SUMO-PCNA) was the phenotype monitored, this may require sustained telomerase activity. Similar results were seen in the past for Rnr1 (Maicher et al., 2017), and this mode depends on Mec1, rather than Tel1 (Harari and Kupiec, 2018). Telomere length regulation is complex, and we may not yet understand the whole picture. It appears that for normal “end replication problem” solution, very little telomerase activity may be needed, and spontaneous interactions at a low level may suffice. Future work may find the conditions at which telomerase switches from "end replication problem" to "sustained" activity. We have added further explanations on this subject to the Discussion section.

      We suspect, but could not prove, a role for Stn1 SUMOylation in the interactions. SUMOylation is usually transient, and notoriously hard to detect, and despite the fact that many telomeric proteins are SUMOylated, Stn1 SUMOylation could not be shown directly by us and others (Hang et al, 2011).

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses.

      • My main concern is the claim that SUMOylated PCNA acts as a positive signal for telomerase activation. Yet the pol30-RR mutant has no impact on telomere length. The explanation of the authors is not entirely convincing.

      We are aware that the regulation of telomere length is complex, and we may not fully understand it yet. Just consider the fact that ~500 genes participate in determining the final telomere length of a yeast (Askree et al., 2004). Since mutation in EACH of these genes has a phenotype, the implication is that the joint action of 500 players determines the outcome (a dialogue of 500 participants). Having said this, we clearly show in figure 1 that mutations that prevent PCNA SUMOylation prevent telomere length elongation in cells lacking Elg1, and overexpressing SUMOylated PCNA is enough to elongate the telomeres. Thus, SUMOylation of PCNA does act as a positive signal for elongation.

      However, it appears that to fulfill the minimal requirement of dealing with the "end-replication problem", PCNA SUMOylation is not required, and only a "sustained activity" mode requires the S-PCNA signal (as we have also shown, surprisingly, for RNR1, Maicher et al. 2017). This sustained activity mode depends on Mec1, rather than Tel1 (Harari and Kupiec, 2018). Since elongated telomeres (by absence of Elg1 or by over-expression of SUMO-PCNA) was the phenotype monitored, this may require sustained telomerase activity. Telomere length regulation is complex, and we may not yet understand the whole picture. It appears that for normal “end replication problem” solution, very little telomerase activity may be needed, and spontaneous interactions at a low level may suffice (for example, unmodified PCNA may promote telomerase activity at a lower level than that of SUMO-PCNA). Future work may find the conditions at which telomerase switches from "end replication problem" to "sustained" activity.

      We have added further explanations on this subject to the Discussion section.

      • The model is entitled « Elg1 negatively regulates the telomere length by forming an interaction with the CST complex ». Nevertheless, expression of PCNA-RR completely reversed the long telomere phenotype of elg1∆ cells. Thus it appears that although the interaction between Stn1 and Cdc13 is reduced in the absence of Elg1, Elg1/Stn1 interaction is not instrumental in the formation of the CST complex and thus in the termination of telomerase activity. Does the elg1∆SIM mutant that does not interact with Stn1 impact telomere length?

      • In the model part (line 318), it is argued that the complex Elg1-Stn1 unloads SUMOylated PCNA. Elg1-Stn1 interaction depends on the SIM of Elg1. This SIM is however not required for Elg1's function in genome-wide SUMO-PCNA unloading; is it required specifically at telomeres?

      The interactions between Elg1 and SUMOylated PCNA are carried out through both the SIM and the Threonines 386 and 387 (Shemesh et al, 2017). Consistently, the single elg1-SIM mutant has telomeres of normal length, and its effects on telomere length can only be seen when combined with mutations in the Threonines (elg1-TT386/7AA or elg1-TT386/7DD). Although the unloading of SUMOylated PCNA by Elg1 is important, the gene is not essential, and PCNA is either eventually unloaded by RFC, or spontaneously disassembles. This explains why the telomere length does not reach the same length in the absence of Elg1 as in the absence of, say, Stn1.

      • The model suggests that Elg1 promotes the interaction between Cdc13 and Stn1. This is based on the data presented in Figure 5 E and F. This is an important result. Because the experiment has been done on cells synchronized in S phase and the Elg1/Stn1 interaction occurs specifically at the end of S-phase, the FACS profile should be shown or a control provided to show that the two conditions are comparable.

      The FACS profile for this experiment is shown in Figure 5C.

      • Does the interaction between Cdc13 and Pol30 depend on the SUMOylation of Pol30?

      Yes. We have added this as new Figure S2, and presented the results together with Figure 3 (Figure 3 is already too crowded).

      Other points:

      • Fig 1: it should be mentioned in the Materials and Methods or in the figure legend how the average telomere lengths (horizontal bar) were calculated from the teloblot, as the position of the bar is not always intuitive.

      We estimate telomere length by using TelQuant (Rubinstein et al., 2014). We have added this to the Methods section.

      • Fig 2: Owing to the large span of telomere length in the stn1 mutants, the epistatic relationship between elg1∆ and stn1 mutants is poorly illustrated by the teloblot.

      We repeated this experiment several times, and stn1 mutants consistently gave a widely spread telomere length distribution. In ALL the blots, however, the elg1 stn1 double mutants showed a telomere length similar to that of the single stn1 mutant, and never longer.

      • It is mentioned that other mutants in the collection showed epistasis. Are any of these mutants related to telomere replication or the proposed model?

      Since we used the collection of non-essential mutants (so far), it was quite devoid of genes involved in DNA replication, which are mostly essential. An exception was siz1, which showed epistasis with elg1Δ.

      • The section entitled « Elg1's functional activity is essential for its interaction with Cdc13 » (line 205) is difficult to follow. The hierarchy between the different mutants of Elg1 in their capacity to unload PCNA is not totally in agreement with the data published in Itzkovich et al 2023 and Shemesh et al. 2017. In particular, it appears to me from these papers that the elg1-WalkerA 238 (KK343/4AA) mutant did not show a defect, in contrast to elg1-WalkerA 238(KK343/4DD).

      We are sorry for the typo in the results. We used the elg1-WalkerA (KK343/4DD) allele, which has a normal SIM but no activity. In a nutshell, we used mutants that either did or did not show unloading activity and/or SIM. The results clearly show that you need to unload PCNA in order for the N-ter of Elg1 to interact with Cdc13.

      • Are the synchronizations done at 30°C?

      Yes. We have added the information to the Methods section.

      • ChIP experiments are not described in the Materials and Methods

      We apologize for this. They are now described.

      • In Figure 6, the PCNA rings are curiously placed at the beginning of the Okazaki fragments.

      We thank the referee for noticing this; we have corrected the figure.

      Reviewer #2 (Recommendations For The Authors):

      This paper purports to unveil a mechanism controlling telomere length through SUMO modifications controlling interactions between PCNA unloader Elg1 and the CST complex that functions at telomeres. This is an extremely interesting mechanism to understand, and this paper indeed reveals some interesting genetic results, leading to a compelling model, with potential impact on the field. The conclusions are largely supported by experiments examining protein-protein interactions at low resolution and ambiguous regarding directness of interactions like co-IP and yeast two-hybrid (Y2H) combined with genetics. However, some results appear contradictory and there's a lack of rigor in the experimental data needed to support claims. There is significant room for improvement and this work could certainly attain the quality needed to support the claims. The current version needs substantial revision and lacks necessary experimental detail. Stronger support for the claims would add detail to help distinguish competing models.

      Specific comments:

      Insufficient technical detail: I could find no explanation of how overexpression was achieved. No description of how teloChIP is performed, either for the PCNA IP or how the sequence analysis is performed. Too limited details on growth like exact temperatures for the cell cycle time course.

      We have significantly expanded the Methods section to include all the technical information.

      Please do not bold and underline text for emphasis-EVER

      We have removed those from the text.

      Lines 130-132: they have not shown "accumulation of SUMOylated PCNA" anywhere; this is an inference.

      We have modified the text; it now says: “show that SUMOylated PCNA, and not unmodified or ubiquitinated PCNA, is both necessary and sufficient for telomere elongation in the presence or in the absence of Elg1.”

      Fig 2A Can authors show any other very long-telomere mutant like stn1 that does show enhancement in combination with elg1∆ to show feasibility of such phenotype?

      We don't think it is appropriate for the paper, but we have systematically created double mutants with elg1Δ and found many additive and even synergistic interactions. An example is shown in Author response image 1, taken from the PhD thesis of Taly Ben-Shitrit, a PhD student in the lab.

      Author response image 1.

      What about cdc13 or ten1? Epistatic?

      We did not test telomere length in combination with Ten1. Combining elg1 with cdc13-50 resulted in synergistic elongation. Given the complex genetic relationship between Stn1/Ten1 and Cdc13, it is hard to interpret this result.

      Seems tenuous to use Y2H to decipher protein-protein interactions occurring out of context (i.e., not at telomere but at reporter gene promoter)

      Y2H is a great method to detect interactions, even if they are transient. Whenever possible, we confirm our findings using co-IP or telo-ChIP.

      Lines 268-270: It would be more accurate to state "can be" instead of "becomes" or "is" as they have not shown that SUMOylation or PCNA unloading have occurred.

      We agree, and have changed the text.

      Cdc13snm protein level?

      Unfortunately our Western blot is not presentable, but the level of Cdc13snm was similar to that of the wt Cdc13, and this result has been already published by Hang et al., 2011.

      Fig S3A: If SUMOylated Cdc13 mediates the Stn1-Elg1 interaction, why is Stn1-Elg1 interaction maintained in cdc13snm strain? This result seems to directly contradict the premise and overall conclusion of this section that Cdc13-SUMO mediates the (Y2H) interaction of Elg1 and Stn1.

      According to our model, the interaction between Stn1 and Elg1 takes place upstream, and only then this complex interacts with SUMOylated Cdc13. Hence, if Cdc13 cannot be SUMOylated, the interaction Elg1-Stn1 is not lost, although Stn1 fails to interact with Cdc13, leading to a telomeric phenotype.

      Line 279: which data establish the Stn1-Elg1 interaction as direct? The co-IP in Fig 2B indicates a physical but not necessarily direct interaction, but later the authors suggest that the interaction requires a SUMOylated intermediary, and the Y2H in Fig. S3B doesn't demonstrate direct interaction.

      We have changed the text, taking out the word "direct".

      Co-IP shows that the interaction of Elg1 with Stn1 occurs mainly during late S-phase and with an overall delay compared to the initial Elg1-Pol3 interaction. Co-IP interaction between Cdc13 and Stn1 is reduced in the absence of Elg1.

      The subsection title: "The interaction of Elg1 with Stn1 takes place at telomeres only at late S-phase" is not well supported by the data. I agree the data are consistent with the idea of the interactions occurring at telomeres but there's no direct evidence of this.

      We have changed the subsection title. It now reads: "The interaction of Elg1 with Stn1 takes place only at late S-phase".

      Model: Is unloading happening at the fork? Doesn't PCNA unloading have to follow its loading, which occurred behind the fork, particularly on the lagging strand? The model now suggests that Stn1 itself is SUMOylated.

      Yes, according to the model Elg1 moves with the fork, unloading PCNA from the lagging strand. Once Elg1 reaches the telomeres, it interacts with Stn1 (Figure 5). This interaction requires SUMOylation of Stn1 or of some other protein, which is neither PCNA (Figure 3D) nor Cdc13 (Figure S3A) and could be Stn1 itself or another telomeric protein (Hang et al., 2011).

      Title is rather vague.

      We think it summarizes what we present in the paper.

      Abstract:

      "We report that SUMOylated PCNA acts as a signal that positively regulates telomerase activity."

      I don't think this is supported or a good description of what they find

      Figure 1B clearly shows that SUMO-PCNA is both necessary and sufficient for telomere elongation.

      "and dissected the mechanism by which Elg1 and Stn1 negatively regulates telomere elongation, coordinated by SUMO."

      Again, I don't think this is sufficiently supported and the model invokes SUMOylation events not demonstrated like Stn1, which might be a significant step forward.

      On the positive side, their model makes several predictions that they could test much more directly and rigorously: for example, examining the impact of the relevant mutations in the recruitment of proteins to the telomere.

      We have dissected the mechanism, and future work will be devoted to examining the impact of the relevant mutations in the recruitment of proteins to the telomere.

      Reviewer #3 (Recommendations For The Authors):

      Comments:

      1) The telomere length analysis data presented here is consistent with an interpretation that Stn1 and Elg1 play roles in a similar telomere maintenance pathway because the telomere restriction fragment pattern in the double mutants are not longer than the stn1 single mutants. No comment is made with respect to the yellow bars in Figure 2 that presumably measure telomere length appearing to be slightly shorter than in the stn1 single mutants. It may be interesting and informative if the double mutants do in fact have some phenotype distinct from the single stn1 mutants. Is there an impact on viability in the double mutant?

      Given the variable telomeric phenotype of the single stn1 mutants, slight variations in the measurement of the median telomere size are expected. The difference observed is not likely to be significant. What is important is that the double mutants with elg1 do not show longer telomeres. In terms of fitness, the stn1 mutants grow slightly slowly, but the elg1 mutation does not slow them down further.

      2) It is somewhat surprising that no additional telomere length analysis is included that actually tests the proposed model, including whether this path could be operational only under certain conditions. Maybe this is a topic of the next paper?

      Indeed, future work will explore the conditions under which PCNA SUMOylation is essential, and those under which it is only needed.

      3) Were the error bars in Figure 5F determined only from the experiment in E? Does this represent error in measuring the data from one biological replicate? The type of error should be made clear to avoid readers assuming the data represents measurements from more than one sample in more than one experiment. The data would be stronger if it represented measurements from multiple experiments.

      The graph was made with data from three biological replicates. We show the best blot in Figure 5E. We have now stressed this in the Figure Legend.

      4) Why was only one two hybrid reporter shown? Having the multiple reporters can give confidence in interactions. (Not a big deal here given the nice co-IP data.)

      We thought that it was enough to show one reporter, as the results with a different reporter (β-gal assay) led to the same conclusions. Since these did not add information and made the paper too lengthy (and boring), we took them out. In any case, all data were verified by co-IP.

      5) Line 414 - what are the 32P-radio labeled PCR fragments? Are these solely comprised of TG1-3 repeats of some length? A bit more detail in this aspect of the method could be helpful.

      We have added an explanation on the probe in the Methods section.

      6) Line 432-433 - which anti-HA or anti-Myc antibodies are these? (very minor detail)

      We have added the details.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Liu and colleagues applied the hidden Markov model on fMRI to show three brain states underlying speech comprehension. Many interesting findings were presented: brain state dynamics were related to various speech and semantic properties, timely expression of brain states (rather than their occurrence probabilities) was correlated with better comprehension, and the estimated brain states were specific to speech comprehension but not at rest or when listening to non-comprehensible speech. 

      Strengths: 

      Recently, the HMM has been applied to many fMRI studies, including movie watching and rest. The authors cleverly used the HMM to test the external/linguistic/internal processing theory that was suggested in comprehension literature. I appreciated the way the authors theoretically grounded their hypotheses and reviewed relevant papers that used the HMM on other naturalistic datasets. The manuscript was well written, the analyses were sound, and the results had clear implications. 

      Weaknesses: 

      Further details are needed for the experimental procedure, adjustments needed for statistics/analyses, and the interpretation/rationale is needed for the results. 

      For the Experimental Procedure, we will provide a more detailed description about stimuli, and the comprehension test, and upload the audio files and corresponding transcriptions as the supplementary dataset. 

      For statistics/analyses, we have reproduced the states' spatial maps using unnormalized activity patterns. For the resting state, we observed a state resembling the baseline state described in Song, Shim, & Rosenberg (2023). However, for the speech comprehension task, all three states were characterized by network activities varying largely from zero. In addition, we have re-generated the null distribution for behavior-brain state correlations using circular shift. The results are largely consistent with the previous findings. We have also made some other adjustments to the analyses and added some new analyses as recommended by the reviewer. We will revise the manuscript to incorporate these changes.

      For the interpretation/rationale: We will add a more detailed interpretation for the association between state occurrence and semantic coherence. Briefly speaking, higher semantic coherence may allow for the brain to better accumulate information over time.

      State #2 seems to be involved in the integration of information at shorter timescales (hundreds of milliseconds) while State #3 seems to be involved in the longer timescales (seconds). 

      We greatly appreciate the reviewer for the insightful comments and constructive suggestions.  

      Reviewer #2 (Public review): 

      Liu et al. applied hidden Markov models (HMM) to fMRI data from 64 participants listening to audio stories. The authors identified three brain states, characterized by specific patterns of activity and connectivity, that the brain transitions between during story listening. Drawing on a theoretical framework proposed by Berwick et al. (TICS 2023), the authors interpret these states as corresponding to external sensory-motor processing (State 1), lexical processing (State 2), and internal mental representations (State 3). States 1 and 3 were more likely to transition to State 2 than between one another, suggesting that State 2 acts as a transition hub between states. Participants whose brain state trajectories closely matched those of an individual with high comprehension scores tended to have higher comprehension scores themselves, suggesting that optimal transitions between brain states facilitated narrative comprehension. 

      Overall, the conclusions of the paper are well-supported by the data. Several recent studies (e.g., Song, Shim, and Rosenberg, eLife, 2023) have found that the brain transitions between a small number of states; however, the functional role of these states remains under-explored. An important contribution of this paper is that it relates the expression of brain states to specific features of the stimulus in a manner that is consistent with theoretical predictions. 

      (1) It is worth noting, however, that the correlation between narrative features and brain state expression (as shown in Figure 3) is relatively low (~0.03). Additionally, it was unclear if the temporal correlation of the brain state expression was considered when generating the null distribution. It would be helpful to clarify whether the brain state expression time courses were circularly shifted when generating the null. 

      In the revision, we generated the null distribution by circularly shifting the state time courses. The results remain consistent with our previous findings: p = 0.002 for the speech envelope, p = 0.007 for word-level coherence, and p = 0.001 for clause-level coherence.

      We note that in other studies which examined the relationship between brain activity and word embedding features, the group-mean correlation values are similarly low but statistically significant and theoretically meaningful (e.g., Fernandino et al., 2022; Oota et al., 2022). We think these relatively low correlations are primarily due to the high level of noise inherent in neural data. Brain activity fluctuations are shaped by a variety of factors, including task-related cognitive processing, internal thoughts, physiological states, as well as arousal and vigilance. Additionally, the narrative features we measured may account for only a small portion of the cognitive processes occurring during the task. As a result, the variance in narrative features can only explain a limited portion of the overall variance in brain activity fluctuations.

      We will replace Figure 3 and the related supplementary figures with new ones, in which the null distribution is generated via circular shift. Furthermore, we will expand our discussion to address why the observed brain-stimuli correlations are relatively small, despite their statistical significance.
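
      The circular-shift null described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the two-sided comparison, the default number of permutations and the list-based inputs are all assumptions. The idea is that circularly shifting the state time course keeps its autocorrelation intact while breaking its alignment with the stimulus feature.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def circular_shift_pvalue(state_tc, feature, n_perm=1000, seed=0):
    """Permutation p-value for corr(state_tc, feature) under a circular-shift null.

    state_tc and feature are plain Python lists of equal length.
    The two-sided test and the +1 correction are illustrative choices.
    """
    rng = random.Random(seed)
    n = len(state_tc)
    observed = pearson(state_tc, feature)
    exceed = 0
    for _ in range(n_perm):
        k = rng.randrange(1, n)                  # random non-zero shift
        shifted = state_tc[k:] + state_tc[:k]    # wrap the tail to the front
        if abs(pearson(shifted, feature)) >= abs(observed):
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

      A call such as `circular_shift_pvalue(state_tc, feature)` returns the observed correlation together with its permutation p-value.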

      (2) A strength of the paper is that the authors repeated the HMM analyses across different tasks (Figure 5) and an independent dataset (Figure S3) and found that the data was consistently best fit by 3 brain states. However, it was not entirely clear to me how well the 3 states identified in these other analyses matched the brain states reported in the main analyses. In particular, the confusion matrices shown in Figure 5 and Figure S3 suggest that the states were confusable across studies (State 2 vs. State 3 in Fig. 5A and S3A, State 1 vs. State 2 in Figure 5B). I don't think this takes away from the main results, but it does call into question the generalizability of the brain states across tasks and populations. 

      We identified matching states across analyses based on similarity in the activity patterns of the nine networks. For each candidate state identified in other analyses, we calculated the correlation between its network activity pattern and the three predefined states from the main analysis, and set the one it most closely resembled to be its matching state. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly. 

      Each column in the confusion matrix depicts the similarity of each candidate state with the three predefined states. In Figure S3 (analysis for the replication dataset), the highest similarity occurred along the diagonal of the confusion matrix. This means that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from two analyses.

      For the comparison of speech comprehension task with the resting and the incomprehensible speech condition, there was some degree of overlap or "confusion."

      In Figure 5A, there were two candidate states showing the highest similarity to State #2. In this case, we labelled the candidate state with the strongest similarity as State #2, while the other candidate state was assigned as State #3 based on the ranking of similarity. This strategy was also applied to the naming of states for the incomprehensible condition. The observed confusion supports the idea that the tripartite-state space is not an intrinsic, task-free property. To make the labeling clearer in the presentation of results, we will use a prime symbol (e.g., State #3') to indicate cases where such confusion occurred, helping to distinguish these ambiguous matches.
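
      The matching procedure described above can be illustrated with a short sketch. It is hypothetical: the function names and the conflict check are assumptions, and it does not implement the ranking rule used to resolve ambiguous cases. Each candidate state is a list of nine network activity values, correlated against each reference state; the reference with the highest correlation gives the label, and conflicts are reported when two candidates claim the same reference.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def match_states(candidates, references):
    """For each candidate, return (index of best-matching reference, similarities).

    The list of similarities corresponds to one column of a confusion matrix.
    """
    labels = []
    for cand in candidates:
        sims = [pearson(cand, ref) for ref in references]
        best = max(range(len(sims)), key=sims.__getitem__)
        labels.append((best, sims))
    return labels


def find_conflicts(labels):
    """Map each reference index claimed by several candidates to those candidates."""
    claimed = {}
    for cand_idx, (best, _) in enumerate(labels):
        claimed.setdefault(best, []).append(cand_idx)
    return {ref: cands for ref, cands in claimed.items() if len(cands) > 1}
```

      An empty result from `find_conflicts` corresponds to a one-to-one match along the diagonal of the confusion matrix; a non-empty result flags the kind of overlap that would call for a primed label.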

      (3) The three states identified in the manuscript correspond rather well to areas with short, medium, and long temporal timescales (see Hasson, Chen & Honey, TiCs, 2015).

      Given the relationship with behavior, where State 1 responds to acoustic properties, State 2 responds to word-level properties, and State 3 responds to clause-level properties, the authors may want to consider a "single-process" account where the states differ in terms of the temporal window for which one needs to integrate information over, rather than a multi-process account where the states correspond to distinct processes. 

      The temporal window hypothesis provides a more fitting explanation for our results. Based on the spatial maps and their modulation by speech features, States #1, #2, and #3 seem to correspond to short, medium, and long processing timescales, respectively. We will update the discussion to reflect this interpretation.

      We sincerely appreciate the constructive suggestions from the two anonymous reviewers, which have been highly valuable in improving the quality of the manuscript.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) The "Participants and experimental procedure" section deserves more details. I've checked Liu et al. (2020), and the dataset contained 43 participants aged 20-75 years, whereas this study contained data from 64 young adults and 30 old adult samples. The previous dataset seems to have two stories, whereas this study seems to have three. Please be specific, given that the dataset does not seem the same. Could the authors also include more descriptions of what the auditory stories were? For example, what were the contents, and how were they recorded? 

      The citation is partially incorrect. The dataset of young adults is shared with our work published in (2022). The 64 participants listened to one of three stories told by a female college student in Mandarin, recounting her real-life experience of hiking, a graduate admission interview, and her first time taking a flight, respectively. The sample of older adults is from our work published in (2020), which includes 30 older adults and additionally 13 young adults. The stimuli in this case were two stories told by an older woman in a Chinese dialect, describing her experience in Thailand and riding a warship, respectively. Since we aim to explore whether the main results can be replicated on a different age group, we excluded the 13 young adults from the analysis. 

      All the stories were recorded during fMRI scanning using a noise-canceling microphone (FOMRI-III; Optoacoustics Ltd, Or-Yehuda, Israel) positioned above the speaker’s mouth. The audio recordings were subsequently processed offline with Adobe Audition 3.0 (Adobe Systems Inc., USA) to further eliminate MRI scanner noise.

      In the revised manuscript, we have updated the citation, and provided a more detailed description of the stimuli in the supplementary material. We have also uploaded the audio files along with their corresponding transcriptions to GitHub.

      (2) I am curious about individual differences in comprehension scores. Did participants have less comprehension of the audio-narrated story because the story was a hard-to-comprehend narrative or because the audio quality was low? Could the authors share examples of comprehension tests? 

      We believe two factors contribute to the individual differences in comprehension scores. First, the audio quality is indeed moderately lower than in daily-life story-listening conditions. This is because those stories were recorded and played during fMRI scanning. Although noise-canceling equipment was used, there was still some noise accompanying the speech, which may have made speech perception and comprehension more difficult than usual.

      Second, the comprehension test measured how much information about the story (including both main themes and details) participants could recall. Specifically, participants were asked to retell the stories in detail immediately after the scanning session. Following this free recall, the experimenters posed a few additional questions drawn from a pre-prepared list, targeting information not mentioned in their recall. If participants experienced lapses of attention or did not store the incoming information into memory promptly, they might fail to recall the relevant content. In several studies, such a task has been called a narrative recall test. However, memory plays a crucial role in real-time speech comprehension, while comprehension affects the depth of processing during memory encoding, thereby influencing subsequent recall performance. To align with prior work (e.g., Stephens et al., 2010) and our previous publications, we chose to refer to this task as narrative comprehension. 

      In the revised manuscript, we have provided a detailed description of the comprehension test (Line 907-933) and shared the examples on GitHub. 

      (3) Regarding Figure 3, what does it mean for a state occurrence to follow semantic coherence? Is there a theoretical reason why semantic coherence was measured and related to brain state dynamics? A related empirical question is: is it more likely for the brain states to transition from one state to another when nearby time points share low semantic similarity compared to chance? 

      We analyzed semantic coherence and sound envelope as they capture different layers of linguistic and acoustic structure that unfold over varying temporal scales. Changes in the sound envelope typically occur on the order of milliseconds to a few hundred milliseconds, changes in word-level semantic coherence span approximately 0.24 ± 0.15 seconds, and changes in clause-level semantic coherence extend to 3.2 ± 1.7 seconds. Previous theory and empirical studies suggest that the timescales of information accumulation vary hierarchically, progressing from early sensory areas to higher-order areas (Hasson et al., 2015; Lerner et al., 2011). Based on this work, we anticipate that the three brain states, which are respectively associated with the auditory and sensory motor network, the language network and the DMN, would be selectively modulated by these speech properties corresponding to distinct timescales. 

      Accordingly, when a state occurrence aligns with (clause-level) semantic coherence, it suggests that this state is engaged in processing information accumulated at the clause level (i.e., its semantic relationship). Higher coherence facilitates better accumulation, making it more likely for the associated brain state to be activated. 

      We analyzed the relationship between state transition probability and semantic coherence, but did not find significant results. Here, the transition probability was calculated as Gamma(t) – Gamma(t-1), where Gamma refers to the state occurrence probability. The lack of significant findings may be because brain state transitions are driven primarily by more slowly changing factors. Indeed, we found that the average dwell time of the three states ranges from 9.66 to 15.29 s, which reflects much slower temporal dynamics than the relatively rapid shifts in acoustic/semantic properties. 

      In the revised version, we have updated the Introduction to clarify the rationale for selecting the three speech properties and to explore their relationship with brain dynamics (Line 111-118).

      (4) When running the HMM, the authors iterated K of 2 to 10 and K = 4, 10, and 12. However, the input features of the model consist of only 9 functional networks. Given that the HMM is designed to find low-dimensional latent state sequences, the choice of the number of latent states being higher than the number of input features sounds odd to me - to my speculation, it is bound to generate almost the exact same states as 9 networks and/or duplicates of the same state. I suggest limiting the K iterations from 2 to 8. For replication with Yeo et al.'s 7 networks, K iteration should also be limited to K of less than 7, or optionally, Yeo's 7 network scheme could be replaced with a 17-network scheme. 
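
      The two quantities mentioned above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the dwell time is computed here from a hard-assigned state sequence (one label per TR), and the TR duration is a free parameter whose value is not taken from the manuscript.

```python
from itertools import groupby


def gamma_change(gamma):
    """First difference Gamma(t) - Gamma(t-1) of one state's occurrence probability."""
    return [gamma[t] - gamma[t - 1] for t in range(1, len(gamma))]


def mean_dwell_time(state_seq, state, tr_seconds):
    """Mean duration in seconds of uninterrupted visits to `state`.

    state_seq holds one state label per TR; tr_seconds is the TR duration
    (an input supplied by the caller, not a value from the manuscript).
    """
    runs = [len(list(group)) for label, group in groupby(state_seq) if label == state]
    if not runs:
        return 0.0
    return tr_seconds * sum(runs) / len(runs)
```

      Comparing the dwell times obtained this way with the sub-second to few-second timescales of the speech features makes the mismatch in temporal dynamics explicit.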

      We understand your concern. However, the determination of the number (K) of hidden states is not directly related to the number of features (in this case, the number of networks), but rather depends on the complexity of the time series and the number of underlying patterns. Given that each state corresponds to a distinct combination of the features, even a small number of features can be used to model a system with complex temporal behaviors and multiple states. For instance, for a system with n features, assuming each is a binary variable (0 or 1), there are maximally 2<sup>n</sup> possible underlying states. 

      In our study, we recorded brain activity over 300 time points and used the 9 networks as features. At different time points, the brain can exhibit distinct spatial configurations, reflected in the relative activity levels of the nine networks and their interactions. To accurately capture the temporal dynamics of brain activity, it is essential to explore models that allow for more states than the number of features. We note that in other HMM studies, researchers have also explored more states than the number of networks to find the best number of hidden states (e.g., Ahrends et al., 2022; Stevner et al., 2019). 

      Furthermore, Ahrends et al. (2022) suggested that “Based on the HCP-dataset, we estimate as a rule of thumb that the ratio of observations to free parameters per state should not be inferior to 200”, where free parameters per state is [K(K − 1) + (K − 1) + K·N(N + 1)/2]/K. According to this, there should be more than 10,980 observations when the number of states (K) is 10 (the maximal number in our study) and the number of networks (N) is 9. In our group-level HMM model, there were 64 (valid runs) * 300 (TR) = 19200 observations for young adults, and 50 (valid runs) * 210 (TR) = 10500 observations for older adults. Aside from the older adults' data being slightly insufficient (4.37% less than the suggestion), all other hyperparameter combinations in this study meet the recommended number of observations. 
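
      The arithmetic behind this rule of thumb can be checked with a short sketch. The function names are hypothetical, and the code simply evaluates the formula as quoted above, with the ratio of 200 taken from the cited rule of thumb.

```python
def free_params_total(K, N):
    """Numerator of the quoted formula: K(K-1) + (K-1) + K*N*(N+1)/2.

    N*(N+1) is always even, so integer division is exact here.
    """
    return K * (K - 1) + (K - 1) + K * N * (N + 1) // 2


def min_observations(K, N, ratio=200):
    """Observations needed so that observations / (free parameters per state) >= ratio."""
    return ratio * free_params_total(K, N) / K


required = min_observations(K=10, N=9)
young = 64 * 300   # valid runs * TRs, young adults
older = 50 * 210   # valid runs * TRs, older adults
shortfall_older = 1 - older / required
```

      For K = 10 and N = 9 this gives a requirement of 10,980 observations: the 19,200 observations from young adults clear it, while the 10,500 from older adults fall about 4.37% short, matching the figures quoted above.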

      (5) In Figure 2, the authors write that the states' spatial maps were normalized for visualization purposes. Could the authors also show visualization of brain states that are not normalized? The reason why I ask is, for example, in Song, Shim, & Rosenberg (2023), the base state was observed which had activity levels all close to the mean (which is 0 because the BOLD activity was normalized). If the activity patterns of this brain state were to be normalized after state estimation, the base state would have looked drastically different than what is reported. 

      We derived the spatial maps of the states using unnormalized activity patterns, with the BOLD signals Z-score normalized to a mean of zero. Under the speech comprehension task, the three states exhibited relatively large fluctuations in network activity levels. The activity ranges were as follows: [-0.71 to 0.51] for State #1, [-0.26 to 0.30] for State #2, and [-0.82 to 0.40] for State #3. For the resting state, we observed a state resembling the baseline state as described in Song, Shim, & Rosenberg (2023), with activity values ranging from -0.133 to 0.09. 

      In the revision, we have replaced the states' spatial maps with versions showing unnormalized activity patterns. 

      (6) In line 297, the authors speculate that "This may be because there is too much heterogeneity among the older adults". To support this speculation, the authors can calculate the overall ISC of brain state dynamics among older adults and compare it to the ISC estimated from younger adults.  

      We analyzed the overall ISC of brain state dynamics, and found the ISC was indeed significantly lower among the older adults than that among the younger adults. We have revised this statement as follows:

      These factors can diminish the inter-subject correlation of brain state dynamics— indeed, ISCs among older adults were significantly lower than those among younger adults (Figure S5)—and reduce ISC's sensitivity to individual differences in task performance (Line 321-326).

      Other comments: 

      (7) In Figure 4, the authors showed a significant positive correlation between head movement ISC with the best performer and comprehension scores. Does the average head movement of all individuals negatively correlate with comprehension scores, given that the authors argue that "greater task engagement is accompanied by decreased movement"? 

      We examined the relationship between participants' average head movement across the comprehension task and their comprehension scores. There was no significant correlation (r = 0.041, p = 0.74). In the literature (e.g., Ballenghein et al., 2019), the relationship between task engagement and head movement was also assessed at the moment-by-moment level, rather than by using time-averaged data.

      Real-time head movements reflect fluctuations in task engagement and cognitive state. In contrast, mean head movement, as a static measure, fails to capture these changes, and thus is not effective in predicting task performance.

      (8) The authors write the older adults sample, the "independent dataset". Technically, however, this dataset cannot be independent because they were collected at the same time by the same research group. I would advise replacing the word independent to something like second dataset or replication dataset. 

      We have replaced the phrase “independent dataset” with “replication dataset”. 

      (9) Pertaining to a paragraph starting in line 586: For non-parametric permutation tests, the authors note that the time courses of brain state expression were "randomly shuffled". How was this random shuffling done: was this circular-shifted randomly, or were the values within the time course literally shuffled? The latter approach, literal shuffling of the values, does not make a fair null distribution because it does not retain temporal regularities (autocorrelation) that are intrinsic to the fMRI signals. Thus, I suggest replacing all non-parametric permutation tests with random circular shifting of the time series (np.roll in Python).  

      In the original manuscript, the time course was literally shuffled. In the revised version, we circular-shifted the time course randomly (circshift.m in Matlab) to generate the null distribution. The results remain consistent with our previous findings: p = 0.002 for the speech envelope, p = 0.007 for word-level coherence, and p = 0.001 for clause-level coherence (Line 230-235). 
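
      The circular-shift null can be sketched as follows. This is a minimal Python illustration using np.roll, as the reviewer suggests, and is not the circshift.m code we ran; the function name, the use of Pearson's r as the test statistic, and the default of 1000 permutations are assumptions made for the example.

```python
import numpy as np


def circular_shift_null(state_tc, feature, n_perm=1000, seed=0):
    """Null distribution of correlations from randomly circular-shifted time courses.

    state_tc : 1-D array, state expression time course
    feature  : 1-D array of the same length (e.g. a narrative feature)
    """
    state_tc = np.asarray(state_tc, dtype=float)
    feature = np.asarray(feature, dtype=float)
    rng = np.random.default_rng(seed)
    n = state_tc.size
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shift by 1..n-1 samples: the values and their autocorrelation are kept,
        # only the alignment with the feature is broken.
        shift = rng.integers(1, n)
        null[i] = np.corrcoef(np.roll(state_tc, shift), feature)[0, 1]
    return null
```

      Unlike literal shuffling, each surrogate here keeps the original sequence of values intact, so the temporal structure of the signal is preserved in the null.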

      (10) The p value calculation should be p = (1+#(chance>=observed))/(1+#iterations) for one-tailed test and p = (1+#(abs(chance)>=abs(observed)))/(1+#iterations) for two-tailed test. Thus, if 5,000 iterations were run and none of the chances were higher than the actual observation, the p-value is p = 1/5001, which is the minimal value it can achieve. 

      We have corrected this. 
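
      For reference, the formula given by the reviewer can be written as a short function. This is a sketch in Python with a hypothetical function name, not our analysis code.

```python
import numpy as np


def permutation_p(observed, null, two_tailed=False):
    """p = (1 + #exceedances) / (1 + #iterations), as specified by the reviewer."""
    null = np.asarray(null, dtype=float)
    if two_tailed:
        exceed = np.sum(np.abs(null) >= abs(observed))
    else:
        exceed = np.sum(null >= observed)
    return (1 + exceed) / (1 + null.size)
```

      With 5,000 iterations and no chance value reaching the observed one, this returns 1/5001, the minimum attainable value noted in the comment.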

      (11) State 3 in Figure S2 does not resemble State 3 of the main result. Could the authors explain why they corresponded State 3 of the Yeo-7 scheme to State 3 of the nine-parcellation scheme, perhaps using evidence of spatial overlap? 

      The correspondence of states between the two schemes was established using evidence from the state expression time courses. 

      To assess temporal overlap, we calculated Pearson’s correlation between each candidate state obtained by the Yeo-7 scheme and the three predefined states obtained by the nine-network parcellation scheme in terms of state expression probabilities. The time courses of the 64 participants were concatenated, resulting in 19200 (300*64) time points for each state. The one that the candidate state most closely resembled was set to be its corresponding state. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly. As demonstrated in the confusion matrix, each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from the two schemes.
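
      The matching step described above can be sketched as follows. This is a minimal Python illustration rather than our analysis code; the function name and the array layout (time points in rows, states in columns) are assumptions made for the example.

```python
import numpy as np


def match_states(candidates, references):
    """Match each candidate state to its most correlated reference state.

    candidates : (T, C) array of state expression probabilities (e.g. Yeo-7 scheme)
    references : (T, R) array of state expression probabilities (e.g. nine-network scheme)
    Returns the (C, R) Pearson correlation matrix and, for each candidate,
    the index of the best-matching reference state.
    """
    candidates = np.asarray(candidates, dtype=float)
    references = np.asarray(references, dtype=float)
    n_cand = candidates.shape[1]
    # np.corrcoef stacks the rows of both inputs; keep the candidate-by-reference block.
    corr = np.corrcoef(candidates.T, references.T)[:n_cand, n_cand:]
    return corr, corr.argmax(axis=1)
```

      A one-to-one correspondence, as reported above, means that the returned indices are all distinct.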

      We also assessed the spatial overlap between the two schemes. First, for each brain state, a state activity value was assigned to each voxel across the whole brain (a total of 34,892 voxels covered by both parcellation schemes). Next, we calculated Spearman’s correlation between each candidate state obtained by the Yeo-7 scheme and the three predefined states obtained by the nine-network scheme in terms of whole-brain activity. The pattern of spatial overlap is consistent with the pattern of temporal overlap, such that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively.

      Author response image 1.

      We noted that the networks between the two schemes are not well aligned in their spatial location, especially for the DMN (as shown below). This may lead to the low spatial overlap of State #3, which is dominated by DMN activity. Consequently, establishing state correspondence based on temporal information is more appropriate in this context. We therefore only reported the results of temporal overlap in the manuscript. 

      We have added a paragraph in the main text for “Establishing state correspondence between analyses” (Line 672-699). We have also updated the associated figures (Fig. S2, Fig. S3 and Fig. 5).

      Author response image 2.

      (12) Line 839: gamma parameter, on a step size of? 

      (16) Figure 3. Please add a legend in the "Sound envelope" graph what green and blue lines indicate. The authors write Coh(t) and Coh(t, t+1) at the top and Coh(t) and Coh(t+1) at the bottom. Please be consistent with the labeling. Shouldn't they be Coh(t-1, t) and Coh(t, t+1) to be exact for both? 

      We have corrected this. 

      (17) In line 226, is this one-sample t-test compared to zero? If so, please write it inside the parentheses. In line 227, the authors write "slightly weaker"; however, since this is not statistically warranted, I suggest removing the word "slightly weaker" and just noting significance in both States 1 and 2.  

      We have corrected this.

      (18) In line 288, please fix "we also whether". 

      We have corrected this. 

      (19) In Figure 2C, what do pink lines in the transition matrix indicate? Are they colored just to show authors' interests, or do they indicate statistical significance? Please write it in the figure legend.   

      Yes, the pink lines indicate statistical significance: the corresponding between-state transition probabilities are significantly higher than those obtained in the permutations.

      We have added this information to the figure legend. 

      Reviewer #2 (Recommendations for the authors):

      (1) It is unclear how the correspondence between states across different conditions and datasets was computed. Given the spatial autocorrelation of brain maps, I recommend reporting the Dice coefficient along with a spin-test permutation to test for statistical significance.  

      The state correspondence between different conditions and between the two datasets was established using evidence of spatial overlap. The spatial overlap between states was quantified by Pearson’s correlation using the activity values (derived from the HMM) of the nine networks. For each candidate state identified in the other analyses (for the Rest, MG and older-adult datasets), we calculated the correlation between its network activity pattern and those of the three predefined states from the main analysis (for the young-adults dataset), and set the one it most closely resembled as its matching state. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly. 

      For the comparison between the young and older adults’ datasets (as shown below), the largest spatial overlap occurred along the diagonal of the confusion matrix, with high correlation values. This means that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from the two datasets. As the HMM is modelled at the level of networks, which lack accurate coordinates, we did not apply the spin-test to assess the statistical significance of the overlap. Instead, we extracted the state activity patterns from the 1000 permutations (wherein the original BOLD time courses were circularly shifted and an HMM was fitted) for the older-adults dataset. Applying the same state-correspondence strategy, we generated a null distribution of spatial overlap. The real overlap of the three states was greater than 97.97%, 95.34% and 92.39% of the instances from the permutations, respectively (as shown below). 

      Author response image 3.

      For the comparison of the main task with the resting and the incomprehensible speech conditions, there was some degree of confusion: two candidate states showed the highest similarity to State #2. In this case, we labeled the more similar candidate as State #2. The other candidate was then assigned to the predefined state with which it had the second-highest correlation. We used a prime symbol (e.g., State #3') to denote cases where such confusion occurred. These findings support our conclusion that the tripartite organization of brain states is not a task-free, intrinsic property.
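
      The tie-breaking rule above can be illustrated with a small sketch. It is written in Python under stated assumptions and is not our analysis code: the function name is hypothetical, candidates are processed in order of the strength of their best match, and a displaced candidate takes the next-best reference state that is still free (which, with three states and a single conflict, is its second-highest correlation).

```python
import numpy as np


def assign_with_ties(corr):
    """Label candidate states, resolving conflicts over the same reference state.

    corr : (C, R) matrix of candidate-by-reference correlations.
    Returns {candidate index: (reference index, primed)}, where primed is True
    when the candidate could not keep its best match (e.g. State #3').
    """
    corr = np.asarray(corr, dtype=float)
    labels = {}
    taken = set()
    # Candidates with the strongest best match claim their reference state first.
    for c in np.argsort(-corr.max(axis=1)):
        for rank, r in enumerate(np.argsort(-corr[c])):
            if int(r) not in taken:
                taken.add(int(r))
                labels[int(c)] = (int(r), rank > 0)
                break
    return labels


# Toy example: candidates 1 and 2 both correlate most strongly with reference 1.
toy = [[0.9, 0.2, 0.1],
       [0.3, 0.8, 0.4],
       [0.2, 0.6, 0.5]]
print(assign_with_ties(toy))
```

      In the toy matrix, the third candidate loses the conflict and is assigned to the remaining state with the prime flag set.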

      When establishing the correspondence between the Yeo-7 network and the nine-network parcellation schemes, we primarily relied on evidence from temporal overlap measures, as a clear network-level alignment between the two parcellation schemes is lacking. Temporal overlap was quantified by calculating the correlation of state occurrence probabilities between the two schemes. To achieve this, we concatenated the time courses of 64 participants, resulting in a time series consisting of 19,200 time points (300 time points per participant) for each state. Each of the three candidate states from the Yeo-7 network scheme was best matched to State #1, State #2, and State #3 from the main analyses, respectively. To determine the statistical significance of the temporal overlap, we circularly shifted each participant’s time course of state expression obtained from the Yeo-7 network scheme 1000 times. Applying the same strategy to find the matching states, we generated a null distribution of overlap. The real overlap was much higher than the instances from the permutations. 

      Author response image 4.

      In the revision, we have provided a detailed description of how the state correspondence is established and reported the statistical significance of those correspondences (Line 671-699). The associated figures have also been updated (Fig. 5, Fig. S2 and Fig. S3).  

      (2) Please clarify if circle-shifting was applied to the state expression time course when generating the null distribution for behavior-brain state correlations reported in Figure (3). This seems important to control for the temporal autocorrelation in the time courses.  

      We have updated the results by using circular shifting to generate the null distribution. The results are largely consistent with the previous ones obtained without circular shifting (Line 230-242). 

      (3) Figure 3: What does the green shaded area around the sound envelope represent? In the caption, specify whether the red line in the null distributions indicates the mean or median R between brain state expression and narrative features. It would also be beneficial to report this value in the main text. 

      The green shaded area indicates the original amplitude of the speech signal, while the blue line indicates the smoothed, low-frequency contour of amplitude changes over time (i.e., the speech envelope). We have updated the figure and explained this in the figure caption. 

      The red line in the null distributions indicates the R between brain state expression and narrative features for the real data. We have reported the mean R of the permutations in the main text. 

      (4) The manuscript is missing a data availability statement (https://elifesciences.org/inside-elife/51839f0a/for-authors-updates-to-elife-s-datasharing-policies). 

      We have added a statement of data availability in the revision, as follows: 

      “The raw and processed fMRI data are available on OpenNeuro: https://openneuro.org/datasets/ds005623. The experimental stimuli, behavioral data and main scripts used in the analyses are provided on Github. ”

      (5) There is a typo in line 102 ("perceptual alalyses"). 

      We have corrected this. 

      We sincerely thank the two reviewers for their constructive feedback, thorough review, and the time they dedicated to improving our work.

      References: 

      Ahrends, C., Stevner, A., Pervaiz, U., Kringelbach, M. L., Vuust, P., Woolrich, M. W., & Vidaurre, D. (2022). Data and model considerations for estimating time-varying functional connectivity in fMRI. NeuroImage, 252, 119026. 

      Ballenghein, U., Megalakaki, O., & Baccino, T. (2019). Cognitive engagement in emotional text reading: concurrent recordings of eye movements and head motion. Cognition and Emotion. 

      Fernandino, L., Tong, J.-Q., Conant, L. L., Humphries, C. J., & Binder, J. R. (2022). Decoding the information structure underlying the neural representation of concepts. Proceedings of the National Academy of Sciences, 119(6), e2108091119. https://doi.org/10.1073/pnas.2108091119  

      Hasson, U., Chen, J., & Honey, C. J. (2015). Hierarchical process memory: memory as an integral component of information processing. Trends in Cognitive Sciences, 19(6), 304-313. 

      Lerner, Y., Honey, C. J., Silbert, L. J., & Hasson, U. (2011). Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. Journal of Neuroscience, 31(8), 2906-2915. https://doi.org/10.1523/JNEUROSCI.3684-10.2011  

      Liu, L., Li, H., Ren, Z., Zhou, Q., Zhang, Y., Lu, C., Qiu, J., Chen, H., & Ding, G. (2022). The “two-brain” approach reveals the active role of task-deactivated default mode network in speech comprehension. Cerebral Cortex, 32(21), 4869-4884. 

      Liu, L., Zhang, Y., Zhou, Q., Garrett, D. D., Lu, C., Chen, A., Qiu, J., & Ding, G. (2020). Auditory–Articulatory Neural Alignment between Listener and Speaker during Verbal Communication. Cerebral Cortex, 30(3), 942-951. https://doi.org/10.1093/cercor/bhz138

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors show that SVZ-derived astrocytes respond to a middle cerebral artery occlusion (MCAO) hypoxia lesion by secreting and modulating hyaluronan at the edge of the lesion (penumbra) and that hyaluronan is a chemoattractant to SVZ astrocytes. They use lineage tracing of SVZ cells to determine their origin. They also find that SVZ-derived astrocytes express Thbs-4 but astrocytes at the MCAO-induced scar do not. Also, they demonstrate that decreased HA in the SVZ is correlated with gliogenesis. While much of the paper is descriptive/correlative they do overexpress Hyaluronan synthase 2 via viral vectors and show this is sufficient to recruit astrocytes to the injury. Interestingly, astrocytes preferred to migrate to the MCAO than to the region of overexpressed HAS2.

      Strengths:

      The field has largely ignored the gliogenic response of the SVZ, especially with regard to astrocytic function. These cells and especially newborn cells may provide support for regeneration. Emigrated cells from the SVZ have been shown to be neuroprotective via creating pro-survival environments, but their expression and deposition of beneficial extracellular matrix molecules are poorly understood. Therefore, this study is timely and important. The paper is very well written and the flow of results is logical.

      Weaknesses:

      The main problem is that they do not show that Hyaluronan is necessary for SVZ astrogenesis and or migration to MCAO lesions. Such loss of function studies have been carried out by studies they cite (e.g. Girard et al., 2014 and Benner et al., 2013). Similar approaches seem to be necessary in this work. 

      We appreciate the comments by the reviewer. The article is, indeed, largely descriptive since we attempt to describe in detail what happens to newborn astrocytes after MCAO. Still, we have not attempted any modification to the model, such as amelioration of ischemic damage. This is a limitation of the study that we do not hide. However, we use several experimental approaches, such as lineage tracing and hyaluronan modification, to strengthen our conclusions.

      Regarding the weaknesses found by the reviewer, we do not claim that hyaluronan is necessary for SVZ astrogenesis. Indeed, we observe that when the MCAO stimulus (i.e. inflammation) is present, the HMW-HA (AAV-Has2) stimulus is less powerful (we discuss this in line 330-332). We do claim, and we believe we successfully demonstrate, the reverse situation: that SVZ astrocytes modulate hyaluronan, not at the SVZ but at the site of MCAO, i.e. the scar. However, regarding whether hyaluronan is necessary for SVZ astrogenesis, we only show a correlation between its degradation and the time-course of astrogenesis. We suggest this result as a starting point for a follow-up study. We have included a phrase in the discussion (line 310), stating that further experiments are needed to fully establish a link between hyaluronan and astrogenesis in the SVZ.

      Major points:

      (1) How good of a marker for newborn astrocytes is Thbs4? Did you co-label with B cell markers like EGFr? Is the Thbs4 gene expressed in B cells? Do scRNAseq papers show it is expressed in B cells? Are they B1 or B2 cells?

      We chose Thbs4 as a marker of newborn astrocytes based on published research (Beckervordersanforth et al., 2010; Benner et al., 2013; Llorens-Bobadilla et al., 2015; Codega et al., 2014; Basak et al., 2018; Mizrak et al., 2019; Kjell et al., 2020; Cebrian-Silla et al., 2021). Of those studies, at least three associate Thbs4 with B-type cells based on scRNAseq data (Llorens-Bobadilla et al., 2015; Cebrian-Silla et al., 2021; Basak et al., 2018). We have included a sentence about this, with the associated references, in line 92. 

      We co-labeled Thbs4 with EGFR, but in the context of MCAO. We observed an increase in EGFR expression with MCAO, similar to the increase in Thbs4 alongside ischemia (see Author response image 1). We did not include this figure in the manuscript since we did not have tissue available from all the time points we used (7d, 60d post-ischemia). 

      Author response image 1.

      Thbs4 cells, in basal and ischemic conditions, only represent a small amount of IdU-positive cells (Fig 3F), suggesting that they are mostly quiescent cells, i.e., B1 cells. However, the scRNAseq literature is not consistent about this.

      (2) It is curious that there was no increase in Type C cells after MCAO - do the authors propose a direct NSC-astrocyte differentiation?

      Type C cells are fast-proliferating cells, and our BrdU/IdU experiment (Fig. 3) suggests that Thbs4 cells are slow-proliferating cells. Some authors (Encinas lab, Spain) suggest that when the hippocampus is challenged by a harsh stimulus, such as kainate-induced epilepsy, the NSCs differentiate directly into reactive astrocytes and deplete the DG neurogenic niche (Encinas et al., 2011, Cell Stem Cell; Sierra et al., 2015, Cell Stem Cell). We believe this might be the case in our MCAO model and the SVZ niche, since we observe a decrease in DCX labeling in the olfactory bulb (Fig S5) and an increase in astrocytes in the SVZ, which migrate to the ischemic lesion. We did not want to overcomplicate an already complicated paper by dwelling on direct NSC-astrocyte differentiation or on the reactive status of these newborn astrocytes. 

      (3) The paper would be strengthened with orthogonal views of z projections to show colocalization.

      We thank the reviewer for this observation. We have now included orthogonal projections in the critical colocalization IF of CD44 and hyaluronan (hyaluronan internalization) in Fig S6D, and a zoomed-in inset. Hyaluronan membrane synthesis is already depicted with orthogonal projection in Fig 6F.

      (4) It is not clear why the dorsal SVZ is analysed and focused on in Figure 4. This region emanates from the developmental pallium (cerebral cortex anlagen). It generates some excitatory neurons early postnatally and is thought to have differential signalling such as Wnt (Raineteau group).

      We decided to analyze the dorsal SVZ in depth after the BrdU experiment (Fig S3), where we observed an increase in BrdU+/Thbs4+ cells mostly in the dorsal area. Hence, the electrodes for electroporation were oriented so as to label the dorsal area. We appreciate the paper by the Raineteau lab, but we assume that this region may play other roles (apart from generating excitatory neurons early postnatally) depending on the developmental stage (our model is in adults) and/or pathological conditions (MCAO). 

      (5) Several of the images show the lesion and penumbra as being quite close to the SVZ. Did any of the lesions contact the SVZ? If so, I would strongly recommend excluding them from the analysis as such contact is known to hyperactivate the SVZ.

      We thank the referee for the suggestion to exclude the more severely lesioned MCAO animals from the analysis. Indeed, MCAO ischemia can produce variable tissue damage that is not easily controlled methodologically. Based on TTC staining, we had therefore already excluded the animals with more severe damage that contacted the SVZ.

      (6) The authors switch to a rat in vitro analysis towards the end of the study. This needs to be better justified. How similar are the molecules involved between mouse and rat?

      We chose the rat culture since it is already established in our lab and, in our hands, is much more reproducible than the mouse brain cell culture that we occasionally use (for transgenic animals only) (Benito-Muñoz et al., Glia, 2016; Cavaliere et al., Front Cell Neurosci, 2013). It is true that there could be differences between rat and mouse Thbs4-cell physiology, despite a 96% identity between the rat and mouse Thbs4 protein sequences (BLASTp). In vitro, we only confirm the capacity of astrocytes to internalize hyaluronan, which was a finding that we did not expect in our in vivo experiments. Indeed, these observations, notwithstanding the obvious differences between in vivo and in vitro scenarios, suggest that HA internalization by astrocytes is a cross-species event, at least in rodents. Regarding HA, hyaluronan is similar in all species, since it is a glycan (this is why there are no antibodies against HA, and one has to rely on binding proteins such as HABP to label it).

      (7) Similar comment for overexpression of naked mole rat HA.

      We chose the naked mole rat hyaluronan synthase (HAS) because it produces HA of very high molecular weight, similar to the HA that accumulates in the glial scar at the lesion border. The use of naked mole rat HAS in mice (Gorbunova Lab) is a known tool in the ECM field (Zhang et al., 2023, Nature; Tian et al., 2013, Nature).

      Reviewer 1 (Recommendation to authors):

      (1) Line 22: most of the cells that migrate out of the SVZ are not stem cells but cells further along in the lineage - neuroblasts and glioblasts.

      We thank the reviewer for this clarification. We have modified the abstract accordingly. 

      (2) In Figure 3d the MCAO group staining with GFAP looks suspiciously like ependymal cells which have been shown to be dramatically activated by stroke models.

      The picture does show ependymal cells, which are located next to the ventricle and are indeed very proliferative in stroke. However, these cells do not express Thbs4 (Shah et al., 2018, Cell). In the quantifications from the SVZ of BrdU- and IdU-injected animals (Fig 3e and f), we only take into account Thbs4+ GFAP+ cells, not cells that are GFAP+ only. 

      (3) The TTC injury shown in Figure 5c is too low mag.

      We apologize for the low mag. We have increased the magnification two-fold without compromising resolution. The problem might also have arisen from the compression of TIF into JPEG in the PDF export process. We will address this in the revised version by carefully selecting export settings. The images we used are all publication quality (300 ppi).

      (4) How specific to HA is HABP?

      Hyaluronic Acid Binding Protein is a canonical marker for hyaluronan that is used also in ELISA to quantify it specifically, since it does not bind other glycosaminoglycans. The label has been used for years in the field for immunochemistry, and some controls and validations have been published: Deepa et al., 2006, JBC performed appropriate controls of HABP-biotin labeling using hyaluronidase (destroys labeling) and chondroitinase (preserves labeling). Soria et al., 2020, Nat Commun checked that (i) streptavidin does not label unspecifically, and (ii) that HABP staining is reduced after hyaluronan depletion in vivo with HAS inhibitor 4MU.

      (5) A number of images are out of focus and thus difficult to interpret (e.g. SFig. 4e).

      This is true. We realized that the PDF conversion process for the preprint version has severely compressed the larger images, such as the one found in Fig. S4e. We have submitted a revised version in a better-quality PDF (the final paper will have the original TIFF files). We apologize for the technical problem.

      (6) "restructuration" is not a word.

      We apologize for the mistake and thank the reviewer for the correction. We replaced “restructuration” with “reorganization” in line 67.

      (7) While much of the manuscript is well-written and logical it could use an in-depth edit to remove awkward words and phrasings.

      A native English speaker has revised the manuscript to correct these awkward phrases. All changes are labeled in red in the revised version.

      (8) Please describe why and how you used skeleton analysis for HABP in the methods, this will be unfamiliar to most readers. The one-sentence description in the methods is insufficient.

      We have modified the text accordingly, explaining in depth the logic behind the skeleton analysis (Line 204). We also added several lines of text describing the image analysis in detail (CD44/HABP spots, fractal dimension, and masks for membranal HABP, among others) in lines 484-494. 

      Reviewer #2 (Public Review)

      Summary:

      In their manuscript, Ardaya et al have addressed the impact of ischemia-induced gliogenesis from the adult SVZ and their effect on the remodeling of the extracellular matrix (ECM) in the glial scar. They use Thbs4, a marker previously identified to be expressed in astrocytes of the SVZ, to understand its role in ischemia-induced gliogenesis. First, the authors show that Thbs4 is expressed in the SVZ and that its expression levels increase upon ischemia. Next, they claim that ischemia induces the generation of newborn astrocyte from SVZ neural stem cells (NSCs), which migrate toward the ischemic regions to accumulate at the glial scar. Thbs4-expressing astrocytes are recruited to the lesion by Hyaluronan where they modulate ECM homeostasis.

      Strengths:

      The findings of these studies are in principle interesting and the experiments are in principle good.

      Weaknesses:

      The manuscript suffers from an evident lack of clarity and precision in regard to their findings and their interpretation.

      We thank the reviewer for the valuable feedback. We hope the changes proposed improve clarity and precision throughout the manuscript.

      (1) The authors talk about Thbs4 expression in NSCs and astrocytes, but neither of both is shown in Figure 1, nor have they used cell type-specific markers.

      As we also reported to Referee #1 (major point 1), Thbs4 is widely considered in the literature to be a valid marker for newly formed astrocytes (Beckervordersanforth et al., 2010; Benner et al., 2013; Llorens-Bobadilla et al., 2015; Codega et al., 2014; Basak et al., 2018; Mizrak et al., 2019; Kjell et al., 2020; Cebrian-Silla et al., 2021). Some of the studies mentioned here and discussed in the manuscript text also associate Thbs4 with B-type cells based on scRNAseq data (Llorens-Bobadilla et al., 2015; Cebrian-Silla et al., 2021; Basak et al., 2018). Moreover, we also showed colocalization of Thbs4 with the activated stem cell marker nestin (Fig. 2), the glial marker GFAP (Fig. 3) and the dorsal NSC marker tdTOM (from electroporation, Fig. 4). 

      (2) Very important for all following experiments is to show that Thbs4 is not expressed outside of the SVZ, specifically in the areas where the lesion will take place. If Thbs4 was expressed there, the conclusion that Thbs4+ cells come from the SVZ to migrate to the lesion would be entirely wrong.

      In Figure 1a, we show that Thbs4 is expressed in the telencephalon, exclusively in the neurogenic regions like SVZ, RMS and OB, together with cerebellum and VTA, which are likely not directly topographically connected to the damaged area (cortex and striatum). Regarding the origin of Thbs4+ cells, we demonstrated their SVZ origin by lineage tracking experiments after in vivo cell labeling (Fig. 4).

      (3) Next, the authors want to confirm the expression level of Thbs4 by electroporation of pThbs4-eGFP at P1 and write that this results in 20% of total cells expressing GFP, especially in the rostral SVZ. I do not understand the benefit of this sentence. This may be a confirmation of expression, but it also shows that the GFP+ cells derive from early postnatal NSCs.

      Furthermore, these cells look all like astrocytes, so the authors could have made a point here that indeed early postnatal NSCs expressing Thbs4 generate astrocytes alongside development. Here, it would have been interesting to see how many of the GFP+ cells are still NSCs.

      We thank the reviewer for this useful remark. We have rephrased this paragraph in the results section (Line 99).

      (4) In the next chapter, the authors show that Thbs4 increases in expression after brain injury. I do not understand the meaning of the graphs showing expression levels of distinct cell types of the neuronal lineage. Please specify why this is interesting and what to conclude from that.

      Also here, the expression of Thbs4 should be shown outside of the SVZ as well.

      In Fig 2, we show the temporal expression of two markers (besides Thbs4) in the SVZ. Nestin and DCX are the gold standard markers for NSCs, with DCX present in neuroblasts. This is already explained in line 119. What we did not explain, and now state in line 124, is that Nestin and DCX decrease immediately after ischemia (7d time-point). This probably means that the NSCs stop differentiating into neuroblasts to favor glioblast formation. This is also supported by the experiments in the olfactory bulb depicted in Fig. S5C-H.

      (5) Next, the origin of newborn astrocytes from the SVZ upon ischemia is revealed. The graphs indicate that the authors perfused at different time points after tMCAO. Did they also show the data of the early time points? If only of the 30dpi, they should remove the additional time points indicated in the graph. In line 127 they talk about the origin of newborn astrocytes. Until now they have not even mentioned that new astrocytes are generated. Furthermore, the following sentences are imprecise: first they write that the number of slow proliferation NSCs is increased, then they talk about astrocytes. How exactly did they identify astrocytes and separate them from NSCs? Morphologically? Because both cell types express GFAP and Thbs4.

      The same problem also occurs throughout the next chapter.

      We thank the reviewer for this interesting comment. The experiment in Fig 3 combines BrdU and IdU. This is a tricky experiment: chronic BrdU is normally analyzed after 30d, because the experimenter must wait for the washout of BrdU (it labels slow-proliferating cells). Since we also wanted to label fast-proliferating cells with IdU, we used IP injections of this nucleotide at the different time points, and perfused the day after. It wouldn’t make sense to show BrdU at earlier time points. We do so in Fig 3e, just to colocalize with Thbs4 and read the tendency of the experiment. However, the quantification of BrdU (not of IdU) is done only at 30 DPI, which is explained in the methods (line 407).

      “In line 127, they talk about the origin of newborn astrocytes…” 

      Indeed, we wanted to introduce in the paragraph title that ischemia induced the generation of new astrocytes, which is more clearly described in the text. We changed the paragraph title to “Characterization of Ischemia-induced cell populations”.

      “How exactly did they identify astrocytes and separate them from NSC?” 

      With this experiment, using two different protocols to label proliferating cells (BrdU vs IdU), we wanted to track the precursor cells that give rise to astrocytes and that already expressed the marker Thbs4. Indeed, the different increase and rate of proliferation relate only to the progenitor cells that will later differentiate into astrocytes. In this experiment we only referred to the astrocytes in the last sentence: “These results suggest that, after ischemia, Thbs4-positive astrocytes derive from the slow proliferative type B cells”.

      (6) "These results suggest that ischemia-induced astrogliogenesis in the SVZ occurs in type B cells from the dorsal region, and that these newborn Thbs4-positive astrocytes migrate to the ischemic areas." This sentence is a bit dangerous and bears at least one conceptual difficulty: if NSCs generate astrocytes under normal conditions and along the course of postnatal development (which they do), then local astrocytes (expressing the tdTom because they stem from a postnatal NSC) may also react to MCAO and proliferate locally. So the astrocytes along the scar do not necessarily come from adult NSCs upon injury but from local astrocytes. If the authors state that NSCs generate astrocytes that migrate to the lesion, I would like to see that no astrocytes inside the striatum carry the tdTom reporter before MCAO is committed.

      We understand the referee’s concern about the postnatal origin of astrocytes that can also be labeled with tdTom. Our hypothesis, tested at the beginning of the paper, is that SVZ-derived astrocytes derive from slow proliferative NSC. Thus, it is reasonable that Tom+ cells can reach the cortical region in such a short time frame. This is why we assumed that local astrocytes can’t be positive for tdTom. We characterized the expression of tdTom in sham animals and we observed few tdTom+ cells in the cortex and striatum (Author response image 2 and Figure S4). The expression of tdTom mainly remains in the SVZ and the corpus callosum under physiological conditions. However, proliferation of local astrocytes labeled with tdTom (early postnatal astrocytes) could explain the small percentage of tdTom+ cells in the ischemic regions that do not express Thbs4, even though this percentage could represent other cell types such as OPCs or oligodendrocytes.

      Author response image 2.

      (7) If astrocytes outside the SVZ do not express Thbs4, I would like to see it. Otherwise, the discrimination of SVZ-derived GFAP+/Thbs4+ astrocytes and local astrocytes expressing only GFAP is shaky.

      Regarding Thbs4 outside the SVZ, we already answered this in point 2 (please refer to Fig 1A). We also quantified the expression of Thbs4+/GFAP+ astrocytes in the corpus callosum, cortex and striatum of sham and MCAO mice (Figure S5a-b) and we did not observe that local astrocytes express Thbs4 under physiological conditions.

      (8) Please briefly explain what a Skeleton analysis and a Fractal dimension analysis is, and what it is good for.

      We apologize for the brief information on Skeleton and Fractal dimension analysis. We included a detailed explanation of these analyses in methods (lines 484-494).

      (9) The chapter on HA is again a bit difficult to follow. Please rewrite to clarify who produces HA and who removes it by again showing all astrocyte subtypes (GFAP+/Thbs4+ and GFAP+/Thbs4-).

      We apologize for the lack of clarity. We rewrote some passages of those chapters (changes in red), trying to convey the ideas more clearly. We also changed a panel in Figure S6b-c to clarify all astrocyte subtypes that internalize hyaluronan (Thbs4+/GFAP+ and Thbs4-/GFAP+). See Author response image 3.

      Author response image 3.

      (10) Why did the authors separate dorsal, medial, and ventral SVZ so carefully? Do they comment on it? As far as I remember, astrogenesis in physiological conditions has some local preferences (dorsal?)

      We performed the electroporation protocol in the dorsal SVZ based on previous results (Figure 3 and Figure S3). NSCs produce specific neurons in the olfactory bulb according to their location in the SVZ. However, postnatal production of astrocytes mainly occurs through local astrocyte proliferation, and the SVZ contribution is very limited at this time point.

      Reviewer #3 (Public Review)

      Summary:

      The authors aimed to study the activation of gliogenesis and the role of newborn astrocytes in a post-ischemic scenario. Combining immunofluorescence, BrdU-tracing, and genetic cellular labelling, they tracked the migration of newborn astrocytes (expressing Thbs4) and found that Thbs4-positive astrocytes modulate the extracellular matrix at the lesion border by synthesis but also degradation of hyaluronan. Their results point to a relevant function of SVZ newborn astrocytes in the modulation of the glial scar after brain ischemia. This work's major strength is the fact that it is tackling the function of SVZ newborn astrocytes, whose role is undisclosed so far.

      Strengths:

      The article is innovative, of good quality, and clearly written, with properly described Materials and Methods, data analysis, and presentation. In general, the methods are designed properly to answer the main question of the authors, being a major strength. Interpretation of the data is also in general well done, with results supporting the main conclusions of this article.

      Weaknesses:

      However, there are some points of this article that still need clarification to further improve this work.

      (1) As a first general comment, is it possible that the increase in Thbs4-positive astrocytes can also happen locally close to the glia scar, through the proliferation of local astrocytes or even from local astrocytes at the SVZ? As it was shown in published articles most of the newborn astrocytes in the adult brain actually derive from proliferating astrocytes, and a smaller percentage is derived from NSCs. How can the authors rule out a contribution of local astrocytes to the increase of Thbs4-positive astrocytes? The authors also observed that only about one-third of the astrocytes in the glial scar derived from the SVZ.

      We thank the reviewer for the interesting comment. We have extended the discussion of this topic in the manuscript (lines 333-342), including the statement that about a third of glial scar astrocytes come from the SVZ, without downplaying the role of local astrocytes. Whether the glial scar is populated by newborn astrocytes derived from the SVZ or from local astrocytes is under debate, since some groups found a contribution from local astrocytes (Frisén group, Magnusson et al., 2014) while others observed the opposite (Li et al., 2010; Benner et al., 2013; Faiz et al., 2015; Laug et al., 2019 & Pous et al., 2020).

      In our study we observed that Thbs4 expression is almost absent in the cortex and striatum of sham mice. To demonstrate that new-born astrocytes are derived from SVZ we used two techniques: the chronic BrdU treatment and the cell tracing which mainly labels SVZ neural stem cells. Fast proliferating cells lose BrdU quickly so local astrocytes under ischemic conditions do not express BrdU. In addition, we injected IdU the day before perfusion in order to see if local astrocytes express Thbs4 when they respond to the brain ischemia. However, we did not observe proliferating local astrocytes expressing Thbs4 after MCAO (see Author response image 4)

      Author response image 4.

      As mentioned in the response to reviewer 2, the cell tracing technique could label early postnatal astrocytes. We characterized the technique and only a small percentage of tdTom expression was found in the cortex and striatum of sham animals. This tdTom population could explain the percentage of tdTom+ cells in the ischemic regions that do not express Thbs4, even though this percentage could represent other cell types such as OPCs or oligodendrocytes. Taken together, the evidence suggests that the Thbs4+ astrocyte population derives from the SVZ.

      We indeed observed a small contribution of Thbs4+ astrocytes to the glial scar. However, Thbs4+ astrocytes arrive at the lesion at a critical temporal window - when local hyper-reactive astrocytes die or lose their function. We hypothesized that Thbs4+ astrocytes could help local astrocytes or replace them in reorganizing the extracellular space and the glial scar, an instrumental process for the recovery of the ischemic area. 

      (2) It is known that the local, GFAP-reactive astrocytes at the scar can form the required ECM. The authors propose a role of Thbs4-positive astrocytes in the modulation, and perhaps maintenance, of the ECM at the scar, thus participating in scar formation likewise. So, this means that the function of newborn astrocytes is only to help the local astrocytes in the scar formation and thus contribute to tissue regeneration. Why do we need specifically the Thbs4-positive astrocytes migrating from the SVZ to help the local astrocytes? Can you discuss this further?

      Unfortunately, we could not demonstrate which molecular machinery is involved in these mechanisms, and we can only speculate on the functional meaning of a second wave of glial activation. We added a lengthy discussion in lines 333-342.

      (3) The authors observed that the number of BrdU- and DCX-positive cells decreased 15 dpi in all OB layers (Fig. S5). They further suggest that ischemia-induced a change in the neuroblasts ectopic migratory pathway, depriving the OB layers of the SVZ newborn neurons. Are the authors suggesting that these BrdU/DCX-positive cells now migrate also to the ischemic scar, or do they die? In fact, they see an increase in caspase-3 positive cells in the SVZ after ischemia, but they do not analyse which type of cells are dying. Alternatively, is there a change in the fate of the cells, and astrogliogenesis is increased at the expense of neurogenesis?  The authors should understand which cells are Cleaved-caspase-3 positive at the SVZ and clarify if there is a change in cell fate. Also please clarify what happens to the BrdU/DCX-positive cells that are born at the SVZ but do not migrate properly to the OB layers.

      Actually, we cannot demonstrate the fate of the missing BrdU/DCX cells in the OB. We can reasonably speculate that following the ischemic insult, the neurogenic machinery steers toward investing more energy in generating glial cells to support the lesion. We didn’t analyze the fate of the DCX-positive cells that would normally migrate to the OB and differentiate there (whether they die or whether there is a shift in the differentiation program in the SVZ), since we consider that question to be outside the study’s scope.

      (4) The authors showed decreased Nestin protein levels at 15 dpi by western blot and immunostaining shows a decrease already at 7div (Figure 2). These results mean that there is at least a transient depletion of NSCs due to the promotion of astrogliogenesis. However, the authors show that at 30dpi there is an increase of slow proliferating NSCs (Figure 3). Does this mean, that there is a reestablishment of the SVZ cytogenic process?  How does it happen, more specifically, how NSCs number is promoted at 30dpi?  Please explain how are the NSCs modulated throughout time after ischemia induction and its impact on the cytogenic process.

      Based on the chronic BrdU treatment, the results suggest a restoration of the SVZ cytogenic process (also observed in Nestin and DCX protein expression at 30dpi). However, we did not analyze how it happens (from asymmetric or symmetric divisions). As suggested by the Encinas group, we hypothesized that brain ischemia induces the exhaustion of the neurogenic niche of the SVZ through symmetric divisions of NSCs into reactive astrocytes.

      (5) The authors performed a classification of Thbs4-positive cells in the SVZ according to their morphology. This should be confirmed with markers expressed by each of the cell subtypes.

      We thank the referee for the comment. Classifying NSC based on different markers could also be tricky because different NSC cell types share markers. This classification was made considering the specific morphology of each NSC cell type. In addition, Thbs4 expression in B-type cells is also observed in other studies (Llorens-Bobadilla et al., 2015; Cebrian-Silla et al., 2021; Basak et al., 2018).

      (6) In Figure S6, the authors quantified HABP spots inside Thbs4-positive astrocytes. Please show a higher magnification picture to show how this quantification was done.

      We quantified HABP area and HABP spots inside Thbs4+ astrocytes with a custom FIJI script. The Thbs4 cell mask was generated by automatic thresholding within the GFAP cell mask. The HABP channel was thresholded and the resulting binary image was processed with a 1-pixel median filter (to eliminate 1 px noise-related spots). The “Analyze particles” tool was used to sort HABP spots in the cell ROI. The HABP spot number per compartment and population was exported to Excel, and the data were normalized by dividing HABP spots per ROI by total HABP spots. See Author response image 5.

      Author response image 5.
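
      The quantification steps described above (threshold, 1-pixel median filter, particle detection, normalization by total spots) can be sketched roughly as follows. This is not the custom FIJI script itself: it is a minimal pure-Python illustration, and the function names, the fixed cutoff, the zero padding at image borders, the 8-connectivity, and the rule that a spot belongs to the ROI when its centroid falls inside it are all assumptions made for the example.

```python
from collections import deque
from statistics import median


def threshold(image, cutoff):
    """Binarize a 2-D intensity grid: 1 where the pixel exceeds the cutoff."""
    return [[1 if v > cutoff else 0 for v in row] for row in image]


def median_filter_3x3(binary):
    """Radius-1 median filter; pixels outside the image count as 0 (assumption)."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                binary[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0
                for yy in (y - 1, y, y + 1)
                for xx in (x - 1, x, x + 1)
            ]
            out[y][x] = int(median(window))
    return out


def find_spots(binary):
    """Return 8-connected components of a binary grid as lists of (y, x)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    spots = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny in (cy - 1, cy, cy + 1):
                    for nx in (cx - 1, cx, cx + 1):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            spots.append(pixels)
    return spots


def spot_fraction_in_roi(image, roi_mask, cutoff):
    """Spots whose centroid lies in the ROI, divided by all detected spots."""
    spots = find_spots(median_filter_3x3(threshold(image, cutoff)))
    if not spots:
        return 0.0
    inside = 0
    for pixels in spots:
        cy = round(sum(p[0] for p in pixels) / len(pixels))
        cx = round(sum(p[1] for p in pixels) / len(pixels))
        if roi_mask[cy][cx]:
            inside += 1
    return inside / len(spots)
```

      In this sketch the median filter removes isolated single-pixel detections before counting, which mirrors the stated purpose of the 1-pixel filter; the normalization step is the final division by the total number of spots.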

    1. Author Response

      We thank all three Reviewers and the editors for the time and effort they put in reading and critiquing the manuscript. Our revised manuscript includes new data and analyses that address the original concerns. These include, 1) a new Supplemental Figure characterizing Cre expression and cellular phenotypes in the hippocampus, 2) new tables that give a more comprehensive picture of the EEG recordings and statistical analyses, 3) addition of whole cell electrophysiology data, and 4) rewriting to ensure that we do not state that either mTORC1 or mTORC2 hyperactivation is sufficient to cause epilepsy. We discuss the issue of statistical power to detect reduction in generalized seizure rate in the responses below. These suggestions and additions have improved the paper and we hope they will raise both significance and strength of support for the conclusions.

      Reviewer #1 (Public Review):

      Hyperactivation of mTOR signaling causes epilepsy. It has long been assumed that this occurs through overactivation of mTORC1, since treatment with the mTORC1 inhibitor rapamycin suppresses seizures in multiple animal models. However, the recent finding that genetic inhibition of mTORC1 via Raptor deletion did not stop seizures while inhibition of mTORC2 did, challenged this view (Chen et al, Nat Med, 2019). In the present study, the authors tested whether mTORC1 or mTORC2 inhibition alone was sufficient to block the disease phenotypes in a model of somatic Pten loss-of-function (a negative regulator of mTOR). They found that inactivation of either mTORC1 or mTORC2 alone normalized brain pathology but did not prevent seizures, whereas dual inactivation of mTORC1 and mTORC2 prevented seizures. As the functions of mTORC1 versus mTORC2 in epilepsy remain unclear, this study provides important insight into the roles of mTORC1 and mTORC2 in epilepsy caused by Pten loss and adds to the emerging body of evidence supporting a role for both complexes in the disease development.

      Strengths:

      The animal models and the experimental design employed in this study allow for a direct comparison between the effects of mTORC1, mTORC2, and mTORC1/mTORC2 inactivation (i.e., same animal background, same strategy and timing of gene inactivation, same brain region, etc.). Additionally, the conclusions on brain epileptic activity are supported by analysis of multiple EEG parameters, including seizure frequencies, sharp wave discharges, interictal spiking, and total power analyses.

      Weaknesses:

      (1) The sample size of the study is small and does not allow for the assessment of whether mTORC1 or mTORC2 inactivation reduces seizure frequency or incidence. This is a limitation of the study.

      We agree that this is a minor limitation of the present study. However, for several reasons we decided not to pursue this question by increasing the number of animals. First, we performed a power analysis of the existing data. This analysis showed that we would need to use 89 animals per group to detect a significant difference (0.8 Power, p= 0.05, Mann-Whitney test) in the frequency of generalized seizures in the Pten-Raptor group and 31 animals per group in the Pten-Rictor group versus Pten alone. It is simply not feasible to perform video-EEG monitoring on this many animals for a single study. Second, even if we did do enough experiments to detect a reduction in seizure frequency, it is clear that neither Rptor nor Rictor deletion provides the kind of normalization in brain activity that we seek in a targeted treatment. Both Pten-Rptor and Pten-Rictor animals still have very frequent spike-wave events (Fig. 3D) and highly abnormal interictal EEGs (Fig. 4), suggesting that even if generalized seizures were reduced, epileptic brain activity persists. This is in contrast to the triple KO animals, which have no increase in SWD above control level and very normal interictal EEG.
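
      The response does not state how the power analysis was computed. As an illustration of the general kind of calculation involved, the sketch below uses Noether's approximation for the sample size of a two-sided Mann-Whitney test with equal group sizes, n per group = (z_{1-α/2} + z_{power})² / (6 (p − 0.5)²), where p is the probability that a random observation from one group exceeds a random observation from the other. The function name and the example values of p are hypothetical and are not taken from the study's data.

```python
from math import ceil
from statistics import NormalDist


def noether_n_per_group(p_superiority, alpha=0.05, power=0.8):
    """Approximate animals per group for a two-sided Mann-Whitney test.

    p_superiority is P(X > Y) for random draws from the two groups;
    0.5 means no effect, so the required sample size is undefined there.
    Equal group sizes are assumed.
    """
    if p_superiority == 0.5:
        raise ValueError("p_superiority of 0.5 means there is no effect to detect")
    std_normal = NormalDist()
    z_total = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    return ceil(z_total ** 2 / (6 * (p_superiority - 0.5) ** 2))


# Hypothetical effect sizes: the smaller the shift, the more animals needed.
for p in (0.60, 0.70, 0.80):
    print(p, noether_n_per_group(p))
```

      The inverse-square dependence on (p − 0.5) is why modest differences in seizure frequency translate into group sizes that are impractical for video-EEG monitoring.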

      (2) The authors describe that they inactivated mTORC1 and mTORC2 in a new model of somatic Pten loss-of-function in the cortex. This is slightly misleading since Cre expression was found both in the cortex and the underlying hippocampus, as shown in Figure 1. Throughout the manuscript, they provide supporting histological data from the cortex. However, since Pten loss-of-function in the hippocampus can lead to hippocampal overgrowth and seizures, data showing the impact of the genetic rescue in the hippocampus would further strengthen the claim that neither mTORC1 nor mTORC2 inactivation prevents seizures.

      Thank you for pointing out this issue. Cre expression was observed in both the cortex and the dorsal hippocampus in most animals, and we agree that differences in cortical versus hippocampal mTOR signaling could have differential contributions to epilepsy. We initially focused our studies on the cortex because spike-and-wave discharge, the most frequent and fully penetrant EEG phenotype in our model, is associated with cortical dysfunction. In our revised submission we have included a new Figure that quantifies Cre expression in the hippocampal subfields, as well as pS6, pAkt and soma size. These new data show that the amount of Cre expression in the hippocampus is not related to the occurrence of generalized seizures. The pattern of cell size changes in hippocampal neurons is the same as observed in cortical neurons. The levels of pS6 and pAkt are not much changed in the hippocampus, likely due to the sparse Cre expression there. We interpret these findings as supporting the conclusion that the reason we do not see seizure prevention by mTORC1 or mTORC2 inactivation is not due to hippocampal-specific dysfunction.

      (3) Some of the methods for the EEG seizure analysis are unclear. The authors describe that for control and Pten-Raptor-Rictor LOF animals, all 10-second epochs in which signal amplitude exceeded 400 μV at two time-points at least 1 second apart were manually reviewed, whereas, for the Pten LOF, Pten-Raptor LOF, and Pten-Rictor LOF animals, at least 100 of the highest-amplitude traces were manually reviewed. Does this mean that not all flagged epochs were reviewed? This could potentially lead to missed seizures.

      We reviewed at least 48 hours of data from each animal manually. All seizures that were identified during manual review were also identified by the automated detection program. It is possible but unlikely that there are missed seizures in the remaining data. We have added these details to the Methods of the revised submission.
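
      The flagging rule quoted in the comment above (10-second epochs in which the amplitude exceeds 400 μV at two time points at least 1 second apart) could be expressed along the following lines. This is a minimal sketch and not the authors' detection program: the function name, the use of non-overlapping epochs, and the use of absolute amplitude are assumptions.

```python
def flag_epochs(signal_uv, fs_hz, epoch_s=10.0, threshold_uv=400.0, min_sep_s=1.0):
    """Return indices of epochs containing two supra-threshold samples
    separated by at least min_sep_s seconds.

    signal_uv: sequence of EEG samples in microvolts.
    fs_hz: sampling rate in Hz.
    Epochs are non-overlapping; a trailing partial epoch is ignored.
    """
    epoch_len = int(epoch_s * fs_hz)
    min_sep = int(min_sep_s * fs_hz)
    flagged = []
    for start in range(0, len(signal_uv) - epoch_len + 1, epoch_len):
        over = [
            i for i in range(start, start + epoch_len)
            if abs(signal_uv[i]) > threshold_uv
        ]
        # The first and last crossings are the most widely separated pair.
        if over and over[-1] - over[0] >= min_sep:
            flagged.append(start // epoch_len)
    return flagged
```

      Requiring two crossings at least one second apart means a single brief artifact does not flag an epoch, whereas sustained high-amplitude activity does; flagged epochs would then go to manual review.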

      (4) Additionally, the inclusion of how many consecutive hours were recorded among the ~150 hours of recording per animal would help readers with the interpretation of the data.

      Thank you for this recommendation. Our revised submission includes a table with more information about the EEG recordings. The number of consecutive hours recorded varied because the wireless system depends on battery life, which was inconsistent, but each animal was recorded for at least 48 consecutive hours on at least two occasions.

      (5) Finally, it is surprising that mTORC2 inactivation completely rescued cortical thickness since such pathological phenotypes are thought to be conserved down the mTORC1 pathway. Additional comments on these findings in the Discussion would be interesting and useful to the readers.

      We agree that the relationship between mTORC2, cortical thickness, and growth in general is an interesting topic with conflicting results in the literature. We didn’t add anything to the Discussion along these lines because we are up against word limits, but comment here that soma size was increased 120% by Pten inactivation and partially normalized to a 60% increase from Controls by mTORC2 inactivation (Fig. 2C). We and others have previously shown that mTORC2 inactivation (Rictor deletion) in neurons reduces brain size, neuron soma size, and dendritic outgrowth (PMIDs: 36526374, 32125271, 23569215). In our revised submission we also include new data showing that the membrane capacitance of Pten-Ric LOF neurons is normal. Thus, we do not find it completely surprising that mTORC2 inactivation reduces the cortical thickness increase caused by Pten loss. There may still be a slight increase in cortical thickness in Pten-Rictor animals, but it is statistically indistinguishable from Controls.

      Reviewer #2 (Public Review):

      Summary:

      The study by Cullen et al presents intriguing data regarding the contribution of mTOR complex 1 (mTORC1) versus mTORC2 or both in Pten-null-induced macrocephaly and epileptiform activity. The role of mTORC2 in mTORopathies, and in particular Pten loss-of-function (LOF)-induced pathology and seizures, is understudied and controversial. In addition, recent data provided evidence against the role of mTORC1 in PtenLOF-induced seizures. To address these controversies and the contribution of these mTOR complexes in PtenLOF-induced pathology and seizures, the authors injected an AAV9-Cre into the cortex of conditional single, double, and triple transgenic mice at postnatal day 0 to remove Pten, Pten+Raptor or Rictor, and Pten+raptor+rictor. Raptor and Rictor are essentially binding partners of mTORC1 and mTORC2, respectively. One major finding is that despite preventing mild macrocephaly and increased cell size, Raptor knockout (KO, decreased mTORC1 activity) did not prevent the occurrence of seizures and the rate of SWD events, and aggravated seizure duration. Similarly, Rictor KO (decreased mTORC2 activity) partially prevented mild macrocephaly and increased cell size but did not prevent the occurrence of seizures and did not affect seizure duration. However, Rictor KO reduced the rate of SWD events. Finally, the pathology and seizure/SWD activity were fully prevented in the double KO. These data suggest the contribution of both increased mTORC1 and mTORC2 in the pathology and epileptic activity of Pten LOF mice, emphasizing the importance of blocking both complexes for seizure treatment. Whether these data apply to other mTORopathies due to Tsc1, Tsc2, mTOR, AKT or other gene variants remains to be examined.

      Strengths:

      The strengths are as follows: 1) they address an important and controversial question that has clinical application, 2) the study uses a reliable and relatively easy method to KO specific genes in cortical neurons, based on AAV9 injections in pups, 3) they perform careful video-EEG analyses correlated with some aspects of cellular pathology.

      Weaknesses:

      The study has nevertheless a few weaknesses: 1) the conclusions are perhaps a bit overstated. The data do not show that increased mTORC1 or mTORC2 are sufficient to cause epilepsy. However the data clearly show that both increased mTORC1 and mTORC2 activity contribute to the pathology and seizure activity and as such are necessary for seizures to occur.

      We agree that our findings do not directly show that either mTORC1 or mTORC2 hyperactivity is sufficient to cause seizures, as we do not individually hyperactivate each complex in the absence of any other manipulation. We interpreted our findings in this model as suggesting that either is sufficient based on the result that there is no epileptic activity when both are inactivated, and thus assumed that there is not a third, mTOR-independent, mechanism that is contributing to epilepsy in Pten, Pten-Raptor, and Pten-Rictor animals. In addition, the histological data show that Raptor and Rictor loss each normalize activity through mTORC1 and mTORC2 respectively, suggesting that one in the absence of the other is sufficient. However, we agree that there could be other potential mTOR-independent pathways downstream of Pten loss that contribute to epilepsy. We have revised the manuscript to reflect this.

      (2) The data related to the EEG would benefit from having more mice. Adding more mice would have helped determine whether there was a decrease in seizure activity with the Rictor or Raptor KO.

      Please see response to Reviewer 1’s first Weakness.

      (3) It would have been interesting to examine the impact of mTORC2 and mTORC1 overexpression related to point #1 above.

      We are not sure that overexpression of individual components of mTORC1 or mTORC2 would result in their hyperactivation or lead to increases in downstream signaling. We believe that cleanly and directly hyperactivating mTORC1 or especially mTORC2 in vivo without affecting the other complex or other potential interacting pathways is a difficult task. Previous studies have used mTOR gain-of-function mutations as a means to selectively activate mTORC1 or pharmacological agents to selectively activate mTORC2, but it is not clear to us that the former does not affect mTORC2 activity as well, or that the latter achieves activation of mTORC2 targets other than p-Akt 473, or that it is truly selective. We agree that these would be key experiments to further test the sufficiency hypothesis, but the amount of work that would be required to perform them is more than what we can do in this Short Report.

      Reviewer #3 (Public Review):

      Summary: This study investigated the role of mTORC1 and 2 in a mouse model of developmental epilepsy which simulates epilepsy in cortical malformations. Given activation of genes such as PTEN activates TORC1, and this is considered to be excessive in cortical malformations, the authors asked whether inactivating mTORC1 and 2 would ameliorate the seizures and malformation in the mouse model. The work is highly significant because a new mouse model is used where Raptor and Rictor, which regulate mTORC1 and 2 respectively, were inactivated in one hemisphere of the cortex. The work is also significant because the deletion of both Raptor and Rictor improved the epilepsy and malformation. In the mouse model, the seizures were generalized or there were spike-wave discharges (SWD). They also examined the interictal EEG. The malformation was manifested by increased cortical thickness and soma size.

      Strengths: The presentation and writing are strong. The quality of data is strong. The data support the conclusions for the most part. The results are significant: Generalized seizures and SWDs were reduced when both Torc1 and 2 were inactivated but not when one was inactivated.

      Weaknesses: One of the limitations is that it is not clear whether the area of cortex where Raptor or Rictor were affected was the same in each animal.

      Our revised submission includes new data showing that the area of affected cortex and hippocampus are similar across groups. (Figure 1A and Supplementary Figure 1)

      Also, it is not clear which cortical cells were measured for soma size.

      Soma size was measured by dividing Nissl stain images into a 10 mm² grid. The somas of all GFP-expressing cells fully within three randomly selected grid squares in Layer II/III were manually traced. Three sections per animal at approximately Bregma -1.6, -2.1, and -2.6 were used. As Cre expression was driven by the hSyn promoter, these cells include both excitatory and inhibitory cortical neurons.
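
      The sampling scheme described above (randomly chosen grid squares, cells counted only when fully inside a square, manually traced outlines) can be illustrated with a short sketch. It is not the authors' procedure: the function names, the representation of a traced soma as a list of (x, y) vertices, the representation of a grid square as (xmin, ymin, xmax, ymax), and the fixed random seed are all assumptions for the example. Outline area is computed with the standard shoelace formula.

```python
import random


def polygon_area(vertices):
    """Area of a simple polygon given as ordered (x, y) vertices (shoelace)."""
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2


def fully_within(vertices, square):
    """True if every traced vertex lies inside the grid square."""
    xmin, ymin, xmax, ymax = square
    return all(xmin <= x <= xmax and ymin <= y <= ymax for x, y in vertices)


def mean_soma_area(outlines, squares, n_squares=3, seed=0):
    """Mean area of outlines lying fully within randomly selected squares.

    Returns None when no outline qualifies.
    """
    chosen = random.Random(seed).sample(squares, n_squares)
    areas = [
        polygon_area(outline)
        for outline in outlines
        if any(fully_within(outline, sq) for sq in chosen)
    ]
    return sum(areas) / len(areas) if areas else None
```

      Excluding cells that straddle a grid boundary avoids partial outlines biasing the mean toward smaller areas.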

      Another limitation is that the hippocampus was affected as well as the cortex. One does not know the role of cortex vs. hippocampus. Any discussion about that would be good to add.

      See response to Reviewer 1’s second Weakness.

      It would also be useful to know if Raptor and Rictor are in glia, blood vessels, etc.

      Raptor and Rictor are thought to be ubiquitously active in mammalian cells including glia and endothelial cells. Previous studies have shown that mTOR manipulation can affect astrocyte function and blood vessel organization, however, our study induced gene knockout using an AAV that expressed Cre under control of the hSyn promoter, which has previously been shown to be selective for neurons. Manual assessment of Cre expression compared with DAPI, NeuN, and GFAP stains suggested that only neurons were affected.

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      In addition to the comments in the public review, it is recommended that the authors provide a more representative figure for p-Akt staining in the Pten LOF condition in Figure 1 D2. The current figure is not convincing.

      Thanks for the suggestion. We have replaced the images with zoomed-in panels that better demonstrate the difference.

      Additionally, in the last paragraph of the discussion, there is a reference error to an incorrect paper (reference 18) that should be corrected.

      Thanks, corrected.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      Comment 1: Some statements need clarifications or changes.

      (1) Abstract: "spontaneous seizures and epileptiform activity persisted despite mTORC1 or mTORC2 inactivation alone but inactivating both mTORC1 and mTORC2 normalized pathology." Did inactivation of one only also normalized the pathology? Did inactivating both normalized the seizures? Pathology is not equal to seizures.

      We have altered this statement to avoid ambiguity.

      (2) Abstract: "These results suggest that hyperactivity of both mTORC1 and mTORC2 are sufficient to cause epilepsy,". Based on the abstract, it is not clear that it is sufficient. It is necessary.

      We have altered this statement by removing the term “sufficient.”

      (3) "Thus, there is strong evidence that hyperactivation of mTORC1 downstream of PTEN disruption causes the macrocephaly, epilepsy, early mortality, and synaptic dysregulation observed in humans and model organisms [17]" I would suggest adding that the strongest evidence is that mTOR GOF mutations lead to the same pathology and epilepsy, suggesting mTORC1 is sufficient. The other findings suggest that it is necessary.

      Unless we misunderstand the Reviewer’s point, we believe this viewpoint is already encompassed by the preceding text that “These phenotypes resemble those observed in models of mTORC1-specific hyperactivation.”

      (4) Introduction (end): "suggesting that hyperactivity of either complex can lead to neuronal hyperexcitability and epilepsy".

      Comment 2: I do not agree with the title based on comment 1 above. You did not provide evidence that the mTORCs cause seizures. Your data suggest that they are necessary for seizures or contribute to seizures, but there is no evidence that mTORC2 can induce seizure.

      We softened the title by replacing “cause” with “mediate.”

      Comment 3: Fig. 1B. Could you better describe the affected regions? I can see other regions than just the cortex and hippocampus.

      Almost all affected cell bodies were in the cortex and hippocampus. The virus in the image is cell-filling and as such projections from affected neurons throughout the brain can also be seen. We have clarified this in the figure legend.

      Comment 4: I feel unease about the number of animals recorded for EEG to assess seizure frequency. There is not enough power to draw clear conclusions. So, please make sure to not oversell your findings since it is all-or-nothing data (seizure or no seizure) in this case and the seizure frequency could very well be decreased with single mTOR LOF, but it is impossible to conclude. Maybe discuss this limitation of your study.

      We have addressed this in the public comments response.

      Minor:

      (1) Pten LOF: define the abbreviation.

      Done

      (2) Make sure that gene name in mice are not capitalized and italicized.

      OK

      (3) Fig 1C: could you specify in the results where the analysis was done.

      Detail added to Methods (to keep Results concise for word limit)

      (4) In the subtitle: "Concurrent mTORC1/2 inactivation, but neither alone, rescues epilepsy and interictal EEG abnormalities in focal Pten LOF". Replace "rescues" with "prevents". This is not a rescue experiment since the LOF is done at the same time.

      OK

      (5) "GS did not appear to be correlated with mTOR pathway activity (Supplementary Figure 2)." Please can you do proper correlation analysis, by plotting all the values as a function of seizure frequency independent of the condition? There is also no correlation between pAKt and seizures.

      Here are those data in Author response image 1. They are now part of Supplementary Figure 2.

      Author response image 1.

      Reviewer #3 (Recommendations For The Authors):

      Figures 1 D, and E show images that are too small to judge. Where are the layers? Please add marks.

      We replaced these images with larger, zoomed-in images to show group differences more clearly. The images no longer show multiple differentiable cortical layers.

      If Fig 1 characterizes the model, where is the seizure data? When did they start? Where did they start? Was the focus of the cortical area affected by PTEN loss of function?

      Updated figure name to reflect content. Information about the seizure phenotypes is included in Figure 3.

      Figure 2 The font size for the calibration is too small. The correlations are hard to see. Colors are not easy to discriminate.

      We edited the figure to correct these problems.

      Figure 3 shows a clear effect on generalized seizures but the text of the Results does not reflect that.

      We wanted to be cautious about interpreting these data based on the issue raised by other reviewers that they are underpowered to detect seizure reduction in the Pten-Raptor and Pten-Rictor groups. We have updated the language to attempt to strike a better balance between over- and under-interpretation. We also performed an additional analysis of the occurrence of generalized seizures to emphasize that only Control and PtRapRic animals have significantly lower seizure occurrence than Pten LOF mice (Fig 3C).

      For interictal power, was the same behavioral state chosen? Was a particular band affected?

      Epochs to be analyzed were selected automatically and were agnostic to behavioral state. Band-specific effects are outlined in Figure 4B and Table [2].

      There is no information about whether the model exhibits altered sleep, food intake, weight, etc.

      We didn’t collect information on food intake. It would be possible to look at sleep from the EEG, but that is not something that we are prepared to do at this point. Weight at endpoint was not different between genotypes but we did not collect longitudinal data on weight.

      Were the sexes different?

      Included in new Table [1]

      Where were EEG electrodes and were they subdural or not?

      Additional detail on this has been added to Methods. The screws are placed in the skull but above the dura.

      How long were continuous EEG records- the method just says 150 hr. per mouse in total.

      Included in new Table [1]

      The statistics don't discuss power, normality, whether variance was checked to ensure it did not differ significantly between groups, or whether data are mean +- sem or sd. For ANOVAs, were there multifactorial comparisons and what were F, df, and p values? Exact p for post hoc tests?

      We have added a new table (Table [3]) that gives information on the exact test used, F, p values, and exact p for post hoc tests. Information regarding power, normality, variance, post-tests and multiple comparisons corrections has been added to Methods section “Statistical Analysis.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study addresses the long-term effect of warming and altered precipitation on microbial growth, as a proxy for understanding the impact of global warming. While the methods are compelling and the evidence supporting the claims is solid, additional analysis of the data would strengthen the study, which should be of broad interest to microbial ecologists and microbiologists.

      We sincerely appreciate your assessment and thoughtful comments, which are valuable and very helpful for improving our manuscript. We have carefully considered all comments and made extensive, thorough corrections and additional analyses of the data, which we hope will meet with approval.

      Reviewer #1 (Public Review):

      Warming and precipitation regime change significantly influences both above-ground and below-ground processes across Earth's ecosystems. Soil microbial communities, which underpin the biogeochemical processes that often shape ecosystem function, are no exception to this, and although research shows they can adapt to this warming, population dynamics and ecophysiological responses to these disturbances are not currently known. The Qinghai-Tibet Plateau, the Third Pole of the Earth, is considered among the most sensitive ecosystems to climate change. The manuscript described an integrated, trait-based understanding of these dynamics with the qSIP data. The experimental design and methods appear to be of sufficient quality. The data and analyses are of great value to the larger microbial ecological community and may help advance our understanding of how microbial systems will respond to global change. There are very few studies in which the growth rates of bacterial populations from multifactorial manipulation experiments on the Qinghai-Tibet Plateau have been investigated via qSIP, and the large quantity of data that comprises the study described in this manuscript, will substantially advance our knowledge of bacterial responses to warming and precipitation manipulations.

      We appreciate the encouragement and positive comments.

      Specific comments:

      (1) Please add some names of microbial groups with most common for the growth rates.

      We have added the sentence “The members in Solirubrobacter and Pseudonocardia genera had high growth rates under changed climate regimes” in the Abstract (Line 57-58).

      (2) L47-48, consider changing "microbial growth and death" to "microbial eco-physiological processes (e.g., growth and death)", and changing "such eco-physiological traits" to "such processes".

      Done (Line 47 and 48).

      (3) L50-51, the author estimated bacterial growth in alpine meadow soils of the Tibetan Plateau after warming and altered precipitation manipulation in situ. Actually, the soil samples were collected and incubated in the laboratory rather than in the field like the previous experiment conducted by Purcell et al. (2021, Global Change Biology). "In situ" would lead me to believe that the qSIP incubation was conducted in the field, so I think the use of the word in situ is inappropriate here. [https://onlinelibrary.wiley.com/doi/full/10.1111/gcb.15911]

      Agreed. We have deleted “in situ”.

      (4) L52, what does "interactive global change factors" mean?

      We have revised this sentence to “the growth of major taxa was suppressed by the single and combined effects of temperature and precipitation” (Line 52-53).

      (5) L61, in my opinion, "Microbial diversity" belongs to the category of species composition, rather than ecosystem functional services. Please revise it.

      Agreed. We have deleted it.

      (6) L69, consider changing "further" to "thus".

      Done (Line 70).

      (7) L82, delete "The evidence is overwhelming that".

      Done.

      (8) L85-90, these two sentences have similar meanings, please express them concisely.

      We have deleted the sentence “Altered precipitation, particularly drought or heavy precipitation events, also tends to negatively influence soil processes and biodiversity”.

      (9) L91, the effect of drought on soil microorganisms is lacking here.

      We have added the sentence “Reduced precipitation affects soil processes notably by directly stressing soil organisms, and also altering the supply of substrates to microbes via dissolution, diffusion, and transport” in the Introduction (Line 87-89).

      (10) L102, "Growth" should be highlighted here, as changes in relative abundance can also be classified as population dynamics. The use of the term "population dynamics" will eliminate the highlight of this study in calculating the growth rate of microbial species in in-situ soil based on qSIP. Consider changing "population dynamics" to "population-growth responses" or something like that.

      Done (Line 98).

      (11) L105, please note that this citation focuses on plant physiological characteristics.

      We have revised the reference (Line 102).

      (12) L115, "soil temperature, water availability" should be considered as a direct impact of climate change, rather than an indirect impact on microorganisms.

      We have deleted them.

      (13) L134-135, please clarify the interaction types between which climate factors.

      We have deleted this sentence.

      (14) L135-138, suggest modifying or deleting this sentence. The results in this study are already eco-physiological data and do not need to be further "understood and predicted".

      We have deleted this sentence.

      (15) L150, "The experimental design has been described in previously". I think this refers to another study and not the actual incubations in this study. Also in L198, suggest a change to "Incubation conditions were similar to those previously described". So, it's clear it's not the same experiment.

      We have revised these sentences to “has been described previously in (Ma et al., 2017)” (Line 136) and “according to a previous publication” (Line 194).

      Reference:

      Ma, Z., Liu, H., Mi, Z., Zhang, Z., Wang, Y., Xu, W. et al. (2017). Climate warming reduces the temporal stability of plant community biomass production. Nature Communications, 8, 15378.

      (16) L188, change "pre-wet soil samples" to "pre-wet samples" and change "soil samples for 48h incubation" to "incubation samples". What does "pre-wet" mean? Does it represent soil pre-cultivation?

      Done. The pre-wet samples, i.e., the soil samples before incubation (T = 0 d), were used to estimate the initial microbial composition. "pre-wet" does not mean soil pre-cultivation. We have added the description “A portion of the air-dried soil samples was taken as the pre-wet treatment (i.e., before incubation without H2O addition)” in MATERIALS AND METHODS (Line 174-175).

      (17) Unify the time unit of incubation (hour or day). Consider changing "48 h" to "2 d" in Materials and Methods.

      Done.

      (18) L247, what version of RDP Classifier was used?

      We used RDP v16 database for taxonomic annotation. We have added this information in the revision (Line 246).

      (19) L270, "average molecular weights".

      Done (Line 268).

      (20) L272-275, based on the preceding description, it appears that the culture period was limited to 48 hours. Please confirm it.

      We apologize for this mistake. We have revised it (Line 273).

      (21) L297, switch the order of the first two sentences of this paragraph.

      Done (Line 297).

      (22) L331, change "smaller-than-additive" to "smaller than their expected additive effect".

      Done (Line 331).

      (23) L374 and 381, I struggle with why "larger combined effects" than single factor effects represent higher degree of antagonism, and I think it should be "smaller combined effects".

      Agreed. We have revised it according to this suggestion (Line 369 and 374).

      (24) L375, remove "than that of drought and warming".

      Done.

      (25) L405, simplify the expression, change "between different warming and rainfall regimes" to "between climate regimes"

      We have deleted this sentence.

      (26) L406-408, species are already on the phylogenetic tree and they can not "clustered at the phylogenetic branches", but the functional traits of microbes can. Please revise it.

      We have revised this sentence to “Overall, the most incorporators whose growth was influenced by the antagonistic interaction of T × P showed significant phylogenetic clustering (i.e., species clustered at the phylogenetic branches; NTI > 0, P < 0.05)” (Line 402-404).

      (27) L409, the same as above, and consider removing "The incorporators subjected to".

      We have revised this sentence to “The incorporators whose growth subjected to the additive interaction of warming × drought also showed significant phylogenetic clustering (P < 0.05)” (Line 404-406).

      (28) L412, consider changing "incorporators subjected to the synergistic interaction" to "the synergistic growth responses under multifactorial changes".

      We have revised the sentence to “incorporators whose growth is influenced by the synergistic interaction showed phylogenetically random distribution under both climate scenarios (P > 0.05)” (Line 407-409).

      (29) L505-506, please add a reference for this sentence.

      Done (Line 488).

      (30) L511-514, It should be noted that the production of MBC does not necessarily imply a net change in the C pool size. The accelerated growth rates may result in expedited turnover of MBC, rather than an increase in carbon sequestration.

      Thanks. We have deleted this sentence.

      (31) Language precision. In the discussion section there must be some additional caveats introduced to some of the claims the authors are making. For instance, L518, the author should clarify that "in this study, the bacterial growth in alpine grassland may be influenced by antagonistic interactions between multiple climatic factors after a decadal-long experiment". Because other studies may exhibit different results due to the focus on different ecosystem functions as well as environmental conditions. As such, softening of the language is recommended- lines are noted below- and these will not adjust the outcomes of this study, but support more precise interpretation.

      We have revised the sentence to “In this study, a decade-long experiment revealed that bacterial growth in alpine meadows is primarily influenced by the antagonistic interaction between T × P” (Line 497-499).

      (32) PICRUSt analysis is a good way to connect species and their functions, especially PICRUSt2, which updated the reference database and optimized the algorithm to improve its prediction accuracy (Douglas et al., 2020, Nature Biotechnology). However, the link between microbial taxonomy and microbial metabolism is still not straightforward, especially in diverse microbial communities like soils. The authors should introduce caveats within discussion that they know the limitations of their methods. For context, as a reader who does metabolisms in soils, I found myself somewhat disappointed when PICRUSt data was introduced and not properly caveated. Particularly, it might be helpful to introduce briefly in the last paragraph of the results. These caveats are necessary to not potentially overstate the author's findings, and to make sure the reader knows the authors understand the very clear limitations of these methods. [https://www.nature.com/articles/s41587-020-0548-6]

      Thanks. We have introduced caveats in DISCUSSION, that is “This is, however, still to be verified, as the functional output from PICRUSt2 is less likely to resolve rare environment-specific functions (Douglas et al., 2020)” (Line 540-542).

      Reference:

      Douglas, G., Maffei, V., Zaneveld, J., Yurgel, S., Brown, J., Taylor, C. et al. (2020). PICRUSt2 for prediction of metagenome functions. Nature Biotechnology, 38, 685-688.

      (33) Although the author has explained the potential causes for the negative effects of different climate change factors (i.e., warming, drought, and wet) on microbial growth, there seems to be a lack of a summary assertion and an extension on how climate change affects microbial growth and related ecosystem functions. It is recommended to make a general summary of the results in the last part of Discussion.

      We have added a general summary in the last paragraph of DISCUSSION, that is “Our results demonstrated that both warming and altered precipitation negatively affect the growth of grassland bacteria; However, the combined effects of warming and altered precipitation on the growth of ~70% soil bacterial taxa were smaller than the single-factor effects, suggesting antagonistic interaction. This suggests the development of multifactor manipulation experiments in precise prediction of future ecosystem services and feedbacks under climate change scenarios” (Line 552-558).

      (34) L546, please add the taxonomic information for "OTU 14".

      Done (Line 533).

      (35) L800, change "The phylogenetic tree" to "A phylogenetic tree".

      Done (Line 762).
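
      The distinction between additive, antagonistic, and synergistic responses raised in comments (22) and (23) can be illustrated with a minimal Python sketch. It assumes that effects are expressed as changes in growth rate relative to the ambient control, and that an interaction counts as additive when the combined effect falls within a tolerance of the sum of the single-factor effects. The function name, the tolerance, and the example values are hypothetical, and this is not necessarily the statistical procedure used in the manuscript.

```python
def classify_interaction(effect_t, effect_p, effect_tp, tol=0.05):
    """Classify a two-factor interaction from effect sizes (hypothetical helper).

    effect_t, effect_p: single-factor effects relative to the control.
    effect_tp: observed effect of the combined treatment.
    tol: half-width of the band treated as additive (illustrative assumption).
    """
    expected = effect_t + effect_p  # expected additive effect
    deviation = effect_tp - expected
    if abs(deviation) <= tol:
        return "additive"
    # A combined effect weaker than the expected additive effect is antagonistic.
    if abs(effect_tp) < abs(expected):
        return "antagonistic"
    # Otherwise the combined effect exceeds the additive expectation.
    return "synergistic"


# Illustrative values only: warming and drought each reduce growth.
print(classify_interaction(-0.30, -0.40, -0.35))
```

      In this sketch a combined effect of -0.35 is weaker than the expected additive effect of -0.70, which corresponds to the "smaller than their expected additive effect" wording adopted in the revision.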

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to describe the effect of different temperature and precipitation regimes on microbial growth responses in an alpine grassland ecosystem using quantitative 18O stable isotope probing. It was found that all climate manipulations had negative effects on microbial growth, and that single-factor manipulations exerted larger negative effects as compared to combined-factor manipulations. The degree of antagonism between factors was analyzed in detail, as well as the differential effect of these divergent antagonistic responses on microbial taxa that incorporated the isotope. Finally, a hypothetical functional profiling was performed based on taxonomic affiliations. This work gives additional evidence that altered warming and precipitation regimes negatively impact microbial growth.

      Strengths:

      A long term experiment with a thorough experimental design in apparently field conditions is a plus for this work, making the results potentially generalisable to the alpine grassland ecosystem. Also, the implementation of a qSIP approach to determine microbial growth ensures that only active members of the community are assessed. Finally, particular attention was given to the interaction between factors and a robust approach was implemented to quantify the weight of the combined-factor manipulations on microbial growth.

      We appreciate the reviewer’s positive comments.

      Weaknesses:

      The methodology does not mention whether the samples taken for the incubations were rhizosphere soil, bulk soil or a mix between both type of soils. If the samples were taken from rhizosphere soil, I wonder how the plants were affected by the infrared heaters and if the resulting shadow (also in the controls with dummy heaters) had an effect on the plants and the root exudates of the parcels as compared to plants outside the blocks? If the samples were bulk soil, are the results generalisable for a grassland ecosystem? In my opinion, it is needed to add more info on the origin of the soil samples and how these were taken.

      The samples taken for the incubations can be considered as a mixture of rhizosphere and bulk soils. During soil sampling, we did not use conventional rhizosphere soil collection methods. However, there is a certain proportion of fragmented roots in the soil samples we collected, indicating that soil properties are influenced by plants. We have added this description in MATERIALS AND METHODS (Line 158).

      To minimize the impact of physical shading on the plants, each sampling point was as far away from infrared heaters as possible. We have added this information of soil collection in MATERIALS AND METHODS, that is “In each plot, three soil cores of the topsoil (0-5 cm in depth) were randomly collected and combined as a composite sample, which can be considered as a mixture of rhizosphere and bulk soils. Each sampling point was as far away from infrared heaters as possible to minimize the impact of physical shading on the plants. The fresh soil samples were shipped to the laboratory and sieved (2-mm) to remove root fragments and stones.” (Line 157-162).

      Previous studies based on our field experiment assessed the effects of warming and altered precipitation on soil microbial communities (Zhang et al., 2016), the temporal stability of plant community biomass (Ma et al., 2017), shifting plant species composition and grassland primary production (Liu et al., 2018). These studies provide guidance for the experiment design and execution.

      Reference:

      Zhang, KP., Shi, Y., Jing, X. et al. (2016). Effects of Short-Term Warming and Altered Precipitation on Soil Microbial Communities in Alpine Grassland of the Tibetan Plateau. Frontiers in Microbiology, 7, 1-11.

      Ma, ZY., Liu, HY., Mi, ZR. et al. (2017). Climate warming reduces the temporal stability of plant community biomass production. Nature Communications, 8, 15378.

      Liu, HY., Mi, ZR., Lin, L. et al. (2018). Shifting plant species composition in response to climate change stabilizes grassland primary production. Proceedings of the National Academy of Sciences, 115, 4051-4056.

      The qSIP calculations reported in the methodology for this work are rather superficial and the reader must be experienced in this technique to understand how the incorporators were identified and their growth quantified. For instance, the GC content of taxa was calculated for reads clustered in OTUs, and it is not discussed in the text the validity of such approach working at genus level.

      We have added the description of qSIP calculations in Supplementary Materials.

      The approach of GC content calculation can be used at genus level (Koch et al., 2018). The GC content of each bacterial taxon (Gi) was calculated using the mean density for the unlabeled (WLIGHTi) treatments (Hungate et al. 2015), rather than OTU sequence information. We have revised the sentence in MATERIALS AND METHODS, that is “the number of 16S rRNA gene copies per OTU taxon (e.g., genus or OTU) in each density fraction was calculated by multiplying the relative abundance (acquisition by sequencing) by the total number of 16S rRNA gene copies (acquisition by qPCR)” (Line 255-258).

      Reference:

      Hungate, B., Mau, R., Schwartz, E., Caporaso, J., Dijkstra, P., Van Gestel, N. et al. (2015). Quantitative microbial ecology through stable isotope probing. Applied and Environmental Microbiology, 81, 7570-7581.

      Koch, B., McHugh, T., Hayer, M., Schwartz, E., Blazewicz, S., Dijkstra, P. et al. (2018). Estimating taxon-specific population dynamics in diverse microbial communities. Ecosphere, 9, e02090.
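
      The calculation described in this response can be sketched in a few lines of Python. The sketch multiplies the relative abundance of a taxon in each density fraction by the total 16S rRNA gene copies in that fraction, then derives the weighted average density and an estimated GC content from the unlabeled treatment. The function names and example numbers are hypothetical, and the regression constants in the GC function are those attributed to Hungate et al. (2015) as best recalled here; they should be verified against that paper before use.

```python
def taxon_copies_per_fraction(rel_abundance, total_copies):
    """Copies of one taxon per fraction: relative abundance x total copies."""
    return [ra * total for ra, total in zip(rel_abundance, total_copies)]


def weighted_mean_density(densities, copies):
    """Average buoyant density of a taxon, weighted by its copies per fraction."""
    return sum(d * c for d, c in zip(densities, copies)) / sum(copies)


def gc_content_from_density(w_light):
    """Estimate GC fraction from the unlabeled mean density (g/ml).

    Constants are an assumption taken from the regression attributed to
    Hungate et al. (2015); check them against the original publication.
    """
    return (w_light - 1.646057) / 0.083506


# Hypothetical three-fraction example for a single taxon.
densities = [1.70, 1.71, 1.72]       # g/ml, from the density gradient
rel_abundance = [0.10, 0.20, 0.10]   # from sequencing
total_copies = [1000, 2000, 1000]    # from qPCR

copies = taxon_copies_per_fraction(rel_abundance, total_copies)
w_light = weighted_mean_density(densities, copies)
print(copies, round(w_light, 4), round(gc_content_from_density(w_light), 3))
```

      Because the GC estimate depends only on the mean density of the unlabeled treatment, it can be computed at whichever taxonomic level the copies were aggregated to.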

      The selection of V4-V5 region over V3-V4 region to quantify the number of copies of the 16S rRNA gene should be substantiated in the text. Classic works determined one decade ago that primer pairs that amplify V3-V4 are most suitable to assess soil bacterial communities. Hungate et al. (2015), worked with the V3-V4 region when establishing the qSIP method. Maybe the number of unassigned OTUs is related with the selection of this region.

      Both primer sets (V3-V4 and V4-V5 regions) are widely used across various sample sets and yield highly similar representations of the total microbial community composition (Fadeev et al., 2021; Zhang et al., 2018).

      A previous study based on our Field Research Station of Alpine Grassland Ecosystem used V4-V5 primer pairs to investigate the effect of warming and altered precipitation on the overall bacterial community composition (Zhang et al., 2016).

      Another reason for choosing the V4-V5 primer set in this study was to integrate and compare the data with that of two previous qSIP studies (Ruan et al., 2023; Guo et al., submitted), both of which focused on the growth responses of active species to global change and used V4-V5 primer pairs.

      We have added an explanation about primer selection as “The V4-V5 primer pairs were chosen to facilitate integration and comparison with data from previous studies (Ruan et al., 2023; Zhang et al., 2016)” (Line 213-215).

      Reference:

      Fadeev, E., Cardozo-Mino, M.G., Rapp, J.Z. et al. (2021). Comparison of Two 16S rRNA Primers (V3–V4 and V4–V5) for Studies of Arctic Microbial Communities. Frontiers in Microbiology, 12.

      Zhang, J.Y., Ding, X., Guan, R. et al. (2018). Evaluation of different 16S rRNA gene V regions for exploring bacterial diversity in a eutrophic freshwater lake. Science of The Total Environment, 618, 1254-1267.

      Zhang, K.P., Shi, Y., Jing, X. et al. (2016). Effects of Short-Term Warming and Altered Precipitation on Soil Microbial Communities in Alpine Grassland of the Tibetan Plateau. Frontiers in Microbiology, 7, 1-11.

      Ruan, Y., Kuzyakov, Y., Liu, X. et al. (2023). Elevated temperature and CO2 strongly affect the growth strategies of soil bacteria. Nature Communications, 14, 1-12.

      Guo, J.J., Kuzyakov, Y., Li, L. et al. (2023). Bacterial growth acclimation to long-term nitrogen input in soil. The ISME Journal, Submitted.

      Report of preprocessing and processing of the sequences does not comply state of the art standards. More info on how the sequences were handled is needed, taking into account that a significant part of the manuscript relies on taxonomic classification of such sequences. Also, an OTU approach for an almost species-dependent analysis (GC contents) should be replaced or complemented with an ASV or subOTUs approach, using denoisers such as DADA2 or deblur. Usage of functional prediction tools underestimates gene frequencies, including those related with biogeochemical significance for soil-carbon and nitrogen cycling.

      (1) We have supplemented the information about sequence processing as “The raw sequences were quality-filtered using the USEARCH v.11.0 (Edgar, 2010). In brief, the paired-end sequences were merged and quality filtered with “fastq_mergepairs” and “fastq_filter” commands, respectively. Sequences < 370 bp and total expected errors > 0.5 were removed. Next, “fastx_uniques” command was implemented to remove redundant sequences. Subsequently, high-quality sequences were clustered into operational taxonomic units (OTUs) with “cluster_otus” command at a 97% identity threshold, and the most abundant sequence from each OTU was selected as a representative sequence.” (Line 238-245).

      (2) We have added a zero-radius OTU (ZOTU) analysis using the unoise3 command in USEARCH (https://drive5.com/usearch/manual/pipe_otus.html), as shown in Fig. S1-S2. The results showed that the overall growth responses of soil bacteria to warming and precipitation changes were similar based on OTU and ZOTU analyses, i.e., warming and altered precipitation tend to negatively affect the growth of grassland bacteria, and antagonistic interactions of T × P are prevalent. The similarity of results between the different methods is reflected at the overall community level, the phylum level, the genus level and the species (i.e., OTU or ZOTU) level (Fig. S1 and S2).

      Author response image 1.

      The growth responses of grassland bacteria to warming and altered precipitation based on ZOTU analysis. The results of growth rates at the community level (A), the phylum level (B), and the ZOTU level (C and D) were similar to those based on OTU analysis. C the single and combined factor effects of climate factors on species growth, by comparing with the growth rates in T0nP. D the proportions of species growth influenced by different interaction types of T × P. T0-P represents the ambient temperature and decreased precipitation; T+-P represents warming and decreased precipitation; T0cP represents ambient temperature and precipitation; T+cP represents warming and ambient precipitation; T0+P represents ambient temperature and enhanced precipitation; T++P represents warming and enhanced precipitation. Values represent mean and the error bars represent standard deviation. Different letters indicate significant differences between climate treatments.

      Author response image 2.

      The growth responses of grassland bacteria at the genus level to warming and altered precipitation based on OTU analysis (A and C) and ZOTU analysis (B and D). A and B the single and combined factor effects of climate factors on growth in genera, by comparing with those in T0nP. C and D the proportions of genera whose growth was influenced by different interaction types of T × P.

      (3) Agreed. We have introduced a caveat about the limitations of functional prediction tools at the end of DISCUSSION, that is “This is, however, still to be verified, as the functional output from PICRUSt2 is less likely to resolve rare environment-specific functions (Douglas et al., 2020)” (Line 540-542). This caveat ensures that readers know the limitations of these methods and that we do not overstate our findings.

      Reference:

      Douglas, G.M., Maffei, V.J., Zaneveld, J.R. et al. (2020) PICRUSt2 for prediction of metagenome functions. Nat Biotechnol. 38, 685–688.

      Reviewer #2 (Recommendations For The Authors):

      General suggestions:

      Regarding the qSIP method, would be of help to see the differences in density vs number of 16S rRNA gene abundance for the most responsive bacterial groups in the different treatments, taking into account that with only 7 fractions the entire change in bacterial growth was resolved.

      We have selected three representative bacterial taxa (OTU1 belonging to Bradyrhizobium, OTU14 belonging to Solirubrobacter, OTU15 belonging to Pseudoxanthomonas), which have high growth rates in the climate change treatments. The results showed that the peaks in the 18O treatment are shifted "backwards" (toward a greater weighted average buoyant density) compared with the 16O treatment, indicating that these species assimilated the 18O isotope into their DNA during growth.

      Author response image 3.

      The distribution of 16S rRNA gene abundance of three representative bacterial taxa (OTU1- Bradyrhizobium, OTU14-Solirubrobacter, and OTU15-Pseudoxanthomonas) in different buoyant density fractions. Values represent mean and the error bars represent standard deviation.

      Seven fractionated DNA samples were selected for sequencing because they contained more than 99% of the gene copies in each sample (please see the figure below). The DNA concentrations of the other fractions were too low to construct sequencing libraries.

      Author response image 4.

      Relative abundance of 16S rRNA gene copies in each fraction. The fractions with density between 1.703 and 1.727 g ml-1 were selected because they contained more than 99% gene copy numbers of each sample. T0-P represents the ambient temperature and decreased precipitation; T+-P represents warming and decreased precipitation; T0cP represents ambient temperature and precipitation; T+cP represents warming and ambient precipitation; T0+P represents ambient temperature and enhanced precipitation; T++P represents warming and enhanced precipitation. Values represent mean and the error bars represent standard deviation.

      With such dataset additional multivariate analysis would be of help to better interpret the ecological framework.

      Thanks for the suggestion. Interpreting the ecological framework is meaningful for understanding microbial responses to environmental changes.

      The main objective of this study is to investigate the growth response of soil microbes in alpine grasslands to temperature and precipitation changes, and the interaction between climate factors. Our results, as well as the results of complementary analyses (based on subOTU analyses, shown below), indicate that warming and altered precipitation tend to negatively affect the growth of grassland bacteria, and that antagonistic interactions of T × P are prevalent.

      We have emphasized our research objectives and main conclusions in the revised manuscript: “The goal of current study is to comprehensively estimate taxon-specific growth responses of soil bacteria following a decade of warming and altered precipitation manipulation on the alpine grassland of the Tibetan Plateau” (Line 112-114);

      “Our results demonstrated that both warming and altered precipitation negatively affect the growth of grassland bacteria; However, the combined effects of warming and altered precipitation on the growth of ~70% soil bacterial taxa were smaller than the single-factor effects, suggesting antagonistic interaction” (Line 552-556).

      Extension of interaction analysis and its conclusions should be shortened, summarizing the most relevant findings. In my opinion, it becomes a bit redundant.

      We have shortened the discussion of the interaction analysis by deleting less relevant content.

      Below are some, but not all, examples of text that has been deleted or revised in the Discussion:

      (1) Deleted “This result supports our second hypothesis that the interactive effects between warming and altered precipitation on soil microbial growth are not simply additive”;

      (2) Deleted “A previous study suggested that multiple global change factors had negative effects on soil microbial diversity (Yang et al., 2021)”;

      (3) Revised “A meta‐analysis of experimental manipulation revealed that the combined effects of different climate factors were usually less than expected additive effects, revealing antagonistic interactions on soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011). Moreover, two experimental studies on N cycling and net primary productivity demonstrated that the majority of interactions among multiple factors are antagonistic rather than additive or synergistic, thereby dampening the net effects (Larsen et al., 2011; Shaw et al., 2002)” to “A range of ecosystem processes have been revealed to be potentially subject to antagonistic interactions between climate factors, for instance, net primary productivity (Shaw et al., 2002), soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011; Larsen et al., 2011)” (Line 499-503);

      (4) Revised “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022). During the first phase of soil warming (~ 10 years), microbial activity increased, resulting in rapid soil carbon mineralization and respiration (Melillo et al., 2017)” to “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022), mainly because of the rapid soil carbon mineralization and respiration (Melillo et al., 2017)” (Line 464-466).

      I strongly suggest a functional analysis based on shotgun sequencing or RNAseq approaches. With this approach this work would be able to answer who is growing under altered T and Precipitation regimes and what are those that are growing doing.

      Thanks for the suggestion. Metagenomic sequencing is a popular approach for evaluating the potential functions of microbial communities in the environment. However, two main reasons limit the application of metagenomic or metatranscriptomic sequencing in this study: 1) Most of the fractionated samples in the SIP experiment have low DNA concentrations and do not meet the requirements of library construction for sequencing; 2) Metagenomics and metatranscriptomics usually have relatively low sensitivity to rare species, reducing the diversity of detected active species.

      This study focused on active microbial taxa and their growth in response to multifactorial climate change. We have added the prospect in DISCUSSION, that is “This suggests the development of methods combining qSIP with metagenomes and metatranscriptomes to assess the functional shifts of active microorganisms under global change scenarios” (Line 542-544).

      Minor suggestions:

      L121. _As

      We have deleted this sentence and relocated the hypotheses to the last paragraph of the INTRODUCTION (following the suggestion of Reviewer #3).

      Line150. Described previously in.

      Done (Line 136).

      Line500. Check whether it is better to use the word acclimatization (Coordinated response to several simultaneous stressors) in exchange of acclimation

      We have revised it according to this suggestion (Line 481).

      Fig.4C Drought

      Done (Line 761).

      Reviewer #3 (Public Review):

      Summary:

      In this paper, Ruan et al. studied the long-term impact of warming and altered precipitations on the composition and growth of the soil microbial community. The researchers adopted an experimental approach to assess the impact of climate change on microbial diversity and functionality. This study was carried out within a controlled environment, wherein two primary factors were assessed: temperature (in two distinct levels) and humidity (across three different levels). These factors were manipulated in a full factorial design, resulting in a total of six treatments. This experimental setup was maintained for ten years. To analyze the active microbial community, the researchers employed a technique involving the incorporation of radiolabeled water into biomolecules (particularly DNA) through quantitative stable isotope probing. This allowed for the tracking of the active fraction of microbes, accomplished via isopycnic centrifugation, followed by Illumina sequencing of the denser fraction. This study was followed by a series of statistical analysis to identify the impact of these two variables on the whole community and specific taxonomic groups. The full factorial design arrangement enabled the researchers to discern both individual contributions as well as potential interactions among the variables

      Strengths:

      This work presents a timely study that assesses in a controlled fashion the potential impact of global warming and altered precipitations on microbial populations. The experimental setup, experimental approach and data analysis seem to be overall solid. I consider the paper of high interest for the whole community as it provides a baseline to the assessment of global warming on microbial diversity.

      Thanks for the encouragement and positive comments.

      Weaknesses:

      While taxonomic information is interesting, it would have been highly valuable to include transcriptomics data as well. This would allow us to understand what active pathways become enriched under warming and altered precipitations. Non-metabolic OTUs hold significance as well. The authors could have potentially described these non-incorporators and derived hypotheses from the gathered information. The work would have benefited from using more biological replicates of each treatment.

      Thanks for the valuable suggestions.

      (1) Metatranscriptomics can assess the functional profiles of the community, but it has relatively low sensitivity to rare species, which makes it difficult to link functional pathways to the numerous active taxa identified by qSIP. Additionally, because of the low DNA concentration, sequencing libraries could not be constructed for most fractionated samples, whereas amplicon-based sequencing was feasible. This study therefore focused on active microbial taxa and their growth in response to multifactorial climate change. We have added the prospect in DISCUSSION, that is “This suggests the development of methods combining qSIP with metagenomes and metatranscriptomes to assess the functional shifts of active microorganisms under global change scenarios” (Line 542-544).

      (2) 18O-qSIP identifies the growing microbial species (i.e., 18O incorporators) in the environment rather than all metabolically active taxa. The non-incorporators in our study may have been metabolically active, i.e., maintaining life activities without reproduction, or recently deceased (Blazewicz et al., 2013). Therefore, it is hard to determine whether these non-incorporators were metabolically active.

      (3) Agreed. The qSIP experiments involve the use of isotopes and the sequencing of a large number of DNA samples (90 samples per biological replicate in this study). Considering its high cost, we selected three replicates for analysis. We have explained this issue in MATERIALS AND METHODS, that is “Considering the cost of qSIP experiment (i.e., the use of isotopes and the sequencing of a large number of DNA samples), we randomly selected three out of the six plots, serving as three replicates for each treatment” (Line 154-157).

      Reference:

      Nuccio, E.E., Starr, E., Karaoz, U. et al. (2020) Niche differentiation is spatially and temporally regulated in the rhizosphere. ISME J 14, 999–1014.

      Blazewicz, S.J., Barnard, R.L., Daly, R.A., Firestone, M.K (2013). Evaluating rRNA as an indicator of microbial activity in environmental communities: limitations and uses. The ISME Journal, 7, 2061–2068.

      Reviewer #3 (Recommendations For The Authors):

      Major comments:

      The manuscript should be written in a clearer way. The language should be more direct, so the message is conveyed faster and clearer. Some sentences, for instance, could be shortened or re-organized. Below, you will find some examples.

      We have rewritten the sentences to make the manuscript clearer. Below are some, but not all, examples that have been revised:

      (1) Deleted “(reduced precipitation, hereafter ‘drought’, or enhanced precipitation, hereafter ‘wet’)” in INTRODUCTION;

      (2) Deleted “Controlled experiments simulating climate change have investigated changes in microbial community composition as measured by shifts in the relative abundances (Evans & Wallenstein, 2014; Barnard et al., 2015). However, changes in relative abundances may be poor indicators of population responses to environmental change in some cases (Blazewicz et al., 2020). Another challenge is the presence of a large number of inactive microbial cells in the soil, which hinders the direct, quantitative measure of the ecological drivers in population dynamics (Fierer, 2017; Lennon & Jones, 2011).” in DISCUSSION;

      (3) Deleted “This result supports our second hypothesis that the interactive effects between warming and altered precipitation on soil microbial growth are not simply additive” in DISCUSSION;

      (4) Deleted “A previous study suggested that multiple global change factors had negative effects on soil microbial diversity (Yang et al., 2021)” in DISCUSSION;

      (5) Revised “A meta‐analysis of experimental manipulation revealed that the combined effects of different climate factors were usually less than expected additive effects, revealing antagonistic interactions on soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011). Moreover, two experimental studies on N cycling and net primary productivity demonstrated that the majority of interactions among multiple factors are antagonistic rather than additive or synergistic, thereby dampening the net effects (Larsen et al., 2011; Shaw et al., 2002)” to “A range of ecosystem processes have been revealed to be potentially subject to antagonistic interactions between climate factors, for instance, net primary productivity (Shaw et al., 2002), soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011; Larsen et al., 2011)” in DISCUSSION (Line 499-503);

      (6) Revised “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022). During the first phase of soil warming (~ 10 years), microbial activity increased, resulting in rapid soil carbon mineralization and respiration (Melillo et al., 2017)” to “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022), mainly because of the rapid soil carbon mineralization and respiration (Melillo et al., 2017)” in DISCUSSION (Line 464-466).

      I'm curious about why, even though there were six replicates of the experiment, only three samples were collected for analysis. Metagenomic analyses tend to display high variability.

      The qSIP experiments involve the use of isotopes and the sequencing of a large number of DNA samples (90 samples per biological replicate in this study). Considering the high cost, we selected three replicates for analysis.

      In Fig. 3A, the absolute growth rates (16S copies/d*g) are shown. How do you know that the efficiency of DNA extraction was similar across all treatments and therefore the absolute numbers are comparable?

      To avoid differences in extraction efficiency caused by experimental procedures, all DNA samples were extracted by the same person (the first author) within 2-3 hours using a uniform procedure for cell lysis and DNA extraction, i.e., cells were disrupted mechanically by bead beating with multi-size beads at 6 m s-1 for 40 s, and DNA was then extracted with the FastDNA™ SPIN Kit for Soil (MP Biomedicals, Cleveland, OH, USA).

      We have measured the concentration of extracted DNA and found no significant difference between treatments (Author response table 1).

      Author response table 1.

      Soil DNA concentration in climate change treatments after qSIP incubation (measured by Qubit® DNA HS Assay Kits).

      Values represent mean and standard deviation. T0-P represents the ambient temperature and decreased precipitation; T+-P represents warming and decreased precipitation; T0cP represents ambient temperature and precipitation; T+cP represents warming and ambient precipitation; T0+P represents ambient temperature and enhanced precipitation; T++P represents warming and enhanced precipitation. The results of ANOVA indicated no significant difference of extracted DNA concentration between treatments (p > 0.05).

      We have introduced the caveat in the DISCUSSION, that is “Note that the experimental parameters such as DNA extraction and PCR amplification efficiencies also have significant effects on the accuracy of growth assessment. This alerts the need to standardize experimental practices to ensure more realistic and reliable results” (Line 544-547).

      Line 96-99 and 121-124: "Hypotheses are typically placed at the end of the final paragraph in the Introduction section. It is advisable to relocate them there and provide a clearer description of the paper's main goal."

      We have relocated the hypotheses to the end of the INTRODUCTION and stated the main goal of this study, that is “The goal of current study is to comprehensively estimate taxon-specific growth responses of soil bacteria following a decade of warming and altered precipitation manipulation on the alpine grassland of the Tibetan Plateau, by using the 18O-quantitative stable isotope probing (18O-qSIP)” (Line 112-115).

      Line 399: Although you describe the classification among antagonistic interactions in the Methods section, I think you should describe this in further detail here. Can you clarify how you carried out this categorization and how these results were interpreted considering the phylogenetic classification.

      We have added the description of antagonistic interactions, that is “The interaction type of T × P on the growth of ~70% incorporators was antagonistic (i.e., the combined effect size is smaller than the additive expectation) (Fig. 4C)” (Line 388-390).

      The interaction types between factors can be classified into three categories: additive, synergistic and antagonistic. In an additive interaction, the combined effect size of the factors is equal to the sum of their single effects (i.e., the additive expectation, Fig. 1B). In a synergistic interaction, the combined effect size is larger than the additive expectation. In contrast, in an antagonistic interaction, the combined effect size is smaller than the additive expectation. In this study, the antagonistic interactions were further divided into three sub-categories: weak antagonistic interaction, strong antagonistic interaction, and neutralizing effect (Fig. 1B). In a weak antagonistic interaction, the combined effect size is smaller than the additive expectation but larger than any of the single-factor effects. In a strong antagonistic interaction, the combined effect size is smaller than any of the single-factor effects but larger than 0. In a neutralizing effect, the combined effect size is equal to 0, implying that the effects of the different factors cancel each other out.
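
      As an illustration only (this is not the code used in the study), the categories described above can be sketched as a small decision rule. The function name, the comparison of absolute effect sizes, and the numerical tolerance `tol` are assumptions made for this sketch; in the study the interaction effect was assessed with Hedges’ d. The description above does not say how to label a combined effect that lies between the two single-factor effects, so the sketch reports that case separately.

```python
def classify_interaction(single_a, single_b, combined, tol=1e-6):
    """Classify an interaction from single and combined effect sizes.

    Effect sizes are compared by magnitude (an assumption of this sketch),
    and `tol` is a hypothetical tolerance standing in for a statistical test.
    """
    a, b, c = abs(single_a), abs(single_b), abs(combined)
    additive_expectation = a + b

    if c <= tol:
        # Combined effect is (numerically) zero: the factors cancel out.
        return "neutralizing"
    if abs(c - additive_expectation) <= tol:
        return "additive"
    if c > additive_expectation:
        return "synergistic"
    # From here on the combined effect is below the additive expectation.
    if c > max(a, b):
        return "weak antagonistic"
    if c < min(a, b):
        return "strong antagonistic"
    # Between the two single effects: not covered by the definitions above.
    return "antagonistic (unclassified)"


if __name__ == "__main__":
    # Hypothetical effect sizes, for illustration only.
    for combined in (-0.9, -0.7, -0.5, -0.1, 0.0):
        print(combined, classify_interaction(-0.4, -0.3, combined))
```

      A real analysis would replace the tolerance with confidence intervals around the interaction effect size, so that "equal to" and "larger than" are decided statistically rather than numerically.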

      Methodologically, the single and combined effects of two climate factors and their interaction effects were calculated by the natural logarithm of response ratio (lnRR) and Hedges’ d, respectively (Yue et al., 2017).
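
      For readers unfamiliar with the response ratio, a minimal sketch of lnRR follows. It assumes the usual definition, the natural logarithm of the treatment mean divided by the control mean; the function name and the example growth rates are hypothetical, and the Hedges’ d calculation for the interaction term is not reproduced here.

```python
import math


def ln_response_ratio(treatment_mean, control_mean):
    """Natural log of the response ratio (treatment mean / control mean).

    Negative values indicate that the treatment lowered the response,
    positive values that it raised it, and 0 means no change.
    """
    if treatment_mean <= 0 or control_mean <= 0:
        raise ValueError("lnRR is only defined for positive means")
    return math.log(treatment_mean / control_mean)


if __name__ == "__main__":
    # Hypothetical growth rates (arbitrary units): control vs. warming.
    print(ln_response_ratio(treatment_mean=50.0, control_mean=100.0))
```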

      We have added an interpretation of the phylogenetic distribution patterns of incorporators, that is “The degree of phylogenetic relatedness can indicate the processes that influenced community assembly, like the extent a community is shaped by environmental filtering (clustered by phylogeny) or competitive interactions (life strategy is phylogenetically random distribution) (Evans & Wallenstein, 2014; Webb et al., 2002). The results showed that the incorporators whose growth was influenced by the antagonistic interaction of T × P showed significant phylogenetic relatedness, indicating the occurrence of taxa more likely shaped by environment filtering (i.e., selection pressure caused by changes in temperature and moisture conditions). In contrast, the growing taxa affected by synergistic interactions of T × P showed random phylogenetic distributions (Table S1), which may be explained by competition between taxa with similar eco-physiological traits or changes in genotypes (possibly through horizontal gene transfer) (Evans & Wallenstein, 2014). We also found that the extent of phylogenetic relatedness to which taxa groups of T × P interaction types varied by climate scenarios, suggesting that different climate history processes influenced the ways bacteria survive temperature and moisture stress” (Line 515-529).

      Reference:

      Evans, S.E. & Wallenstein, M.D. (2014). Climate change alters ecological strategies of soil bacteria. Ecology Letters, 17, 155-164.

      Webb, C.O., Ackerly, D.D., McPeek, M.A. & Donoghue, M.J. (2002). Phylogenies and Community Ecology. Annual Review of Ecology and Systematics, 33, 475-505.

      Yue, K., Fornara, D.A., Yang, W., Peng, Y., Peng, C., Liu, Z. et al. (2017). Influence of multiple global change drivers on terrestrial carbon storage: additive effects are common. Ecology Letters, 20, 663-672.

      Line 407-8: What do you mean with "...clustered at the phylogenetic branches" and Line 410: "cluster near the tips of the phylogenetic tree". Can you please clarify?

      Sorry for the unclear statement. We have added the explanation of NTI, that is “Nearest taxon index (NTI) was used to determine whether the species in a particular growth response are more phylogenetically related to one another than to other species (i.e., close or clustering on phylogenetic tree). NTI is an indicator of the extent of terminal clustering, or clustering near the tips of the tree (Evans & Wallenstein, 2014; Webb et al., 2002)” (Line 397-401).
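
      To make the idea of terminal clustering concrete, the sketch below computes NTI in its commonly used form: the mean nearest taxon distance (MNTD) of a group is compared with the MNTD of random groups of the same size, and the standardized difference is multiplied by -1 so that positive values indicate clustering near the tips. The distance matrix, the function names, the number of randomizations and the random-draw null model are all assumptions of this sketch; the null model used in the manuscript is not described in this response.

```python
import random
import statistics


def mean_nearest_taxon_distance(distances, taxa):
    """Mean distance from each taxon in `taxa` to its closest relative in `taxa`."""
    nearest = [min(distances[i][j] for j in taxa if j != i) for i in taxa]
    return statistics.mean(nearest)


def nearest_taxon_index(distances, taxa, n_randomizations=999, seed=0):
    """NTI = -(observed MNTD - mean null MNTD) / sd of null MNTD."""
    rng = random.Random(seed)
    pool = range(len(distances))
    observed = mean_nearest_taxon_distance(distances, taxa)
    # Null distribution: MNTD of randomly drawn groups of the same size.
    null = [
        mean_nearest_taxon_distance(distances, rng.sample(pool, len(taxa)))
        for _ in range(n_randomizations)
    ]
    null_sd = statistics.stdev(null)
    if null_sd == 0:
        raise ValueError("null distribution has zero variance")
    return -(observed - statistics.mean(null)) / null_sd


def example_distances():
    """Toy pairwise distances: taxa 0-2 are close relatives, taxa 3-5 are distant."""
    n = 6
    matrix = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    for i in range(3):
        for j in range(3):
            if i != j:
                matrix[i][j] = 0.1
    return matrix


if __name__ == "__main__":
    # The closely related group should give a positive NTI (terminal clustering).
    print(nearest_taxon_index(example_distances(), [0, 1, 2]))
```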

      Reference:

      Evans, S.E. & Wallenstein, M.D. (2014). Climate change alters ecological strategies of soil bacteria. Ecology Letters, 17, 155-164.

      Webb, C.O., Ackerly, D.D., McPeek, M.A. & Donoghue, M.J. (2002). Phylogenies and Community Ecology. Annual Review of Ecology and Systematics, 33, 475-505.

      Could you provide some info about the biochemistry of the incorporation of heavy water into DNA molecules? What specific enzymes are typically involved?

      Because of the low DNA concentration in most fractionated samples (less than 10 ng/μL, measured by Qubit DNA HS Assay Kits), only amplicon-based sequencing analyses were feasible. This study therefore focused only on active microbial taxa and their growth in response to multifactorial climate change.

      What might be the impact of soil desiccation on bacterial survival and subsequent water uptake?

      Slow dehydration and air drying of soil is a very common phenomenon in nature (Koch et al., 2018). In this process, microorganisms reduce their metabolism and shift towards a potentially active state (Blagodatskaya and Kuzyakov, 2013). A previous study suggested that the potentially active microbial population exists permanently in soil in a state between the active and dormant physiological states. Even under long-term starvation, the potentially active microorganisms maintain ‘physiological alertness’ to be ready for occasional substrate input (Blagodatskaya and Kuzyakov, 2013). These microorganisms, which are important participants in biogeochemical cycles, are the focus of this study.

      Replacing the environmental water in the soil with 18O-labelled water is a typical practice in qSIP studies (Hungate et al. 2015; Koch et al., 2018). This process may disturb the microbial community. In this study, the soil samples were placed in a thermostatic incubator (14℃ and 16℃), rather than air-dried at 25℃ (as in most studies). The incubation temperature is relatively low (compared to 25℃) and there is no violent air convection in the incubator, resulting in slower evaporation and no significant discoloration caused by severe soil dehydration after 48 h. The soil drying process in this study thus simulated the natural phenomenon, i.e., slow water loss from soil.

      We have added the description in MATERIALS AND METHODS, that is “There is no violent air convection in the incubator and the incubation temperature is relatively low (compared to 25℃ used in previous studies), resulting slower evaporation and no significant discoloration caused by severe soil dehydration after 48 h” (Line 171-174).

      Reference:

      Blagodatskaya, E. & Kuzyakov, Y. (2013) Active microorganisms in soil: Critical review of estimation criteria and approaches. Soil Biology and Biochemistry, 67, 192-211.

      Hungate, B., Mau, R., Schwartz, E., Caporaso, J., Dijkstra, P., Van Gestel, N. et al. (2015). Quantitative microbial ecology through stable isotope probing. Applied and Environmental Microbiology, 81, 7570-7581.

      Koch, B., McHugh, T., Hayer, M., Schwartz, E., Blazewicz, S., Dijkstra, P. et al. (2018). Estimating taxon-specific population dynamics in diverse microbial communities. Ecosphere, 9, e02090.

      The analysis of the 18O incorporators is interesting as it defines what microbes are metabolically active and hence growing under the different conditions tested. Should not be worth to analyze the non-incorporators? Is it possible to identify a pattern to generate a hypothesis of why they are metabolically inactive based on this information? In the Methods section, the authors state that they identified a total of 6,938 OTUs, of which only 1,373 were found to be incorporators.

      Microbes exist in a range of metabolic states: growing, active (non-growing), dormant and recently deceased (Blazewicz et al., 2013), and there is still no clear threshold for distinguishing them. 18O-DNA qSIP can identify the growing microbial species (i.e., 18O incorporators) rather than all metabolically active taxa, because some cells are measurably metabolizing (catabolic and/or anabolic processes) without reproducing. Therefore, the non-incorporators in our study may or may not be metabolically active (e.g., recently deceased microorganisms). This study focuses on the growing microorganisms identified by 18O-qSIP.

      In this study, ~20% microbial taxa (1,373/6,938) were identified as 18O incorporators. Microorganisms in soils suffer from resource and energy constraints frequently (Blagodatskaya and Kuzyakov, 2013). The energy requirements of species in the growing state are much higher (~30 fold) than those in the non-growing state, so the percentage of growing bacterial taxa in soil tends to be low.

      Reference:

      Blazewicz, S.J., Barnard, R.L., Daly, R.A., Firestone, M.K (2013). Evaluating rRNA as an indicator of microbial activity in environmental communities: limitations and uses. The ISME Journal, 7, 2061–2068.

      Blagodatskaya, E. & Kuzyakov, Y. (2013) Active microorganisms in soil: Critical review of estimation criteria and approaches. Soil Biology and Biochemistry, 67, 192-211.

      Minor comments:

      Fig. 3A and 3B. Please show the results of the multiple comparisons.

      Done.

      Author response image 5.

      Bacterial growth responses to climate change and the interaction types between warming and altered precipitation. The growth rates (A), and responses (LnRR) of soil bacteria to warming and altered precipitation (B) at the whole community level. The growth rates (C), and responses of the dominant bacterial phyla (D) had similar trends with that of the whole community. Values represent mean and the error bars represent standard deviation. Different letters indicate significant differences between climate treatments.

      Fig. 4. This figure should be self-explanatory. This diagram is challenging to understand.

      We have revised Fig. 4 to improve clarity.

      Author response image 6.

      The growth responses and phylogenetic relationship of incorporators subjected to different interaction types under two climate scenarios. A phylogenetic tree of all incorporators observed in the grassland soils (A). The inner heatmap represents the single and combined factor effects of climate factors on species growth, by comparing with the growth rates in T0nP. The outer heatmap represents the interaction types between warming and altered precipitation under two climate change scenarios. The proportions of positive or negative responses in species growth to single and combined manipulation of climate factors by summarizing the data from the inner heatmap (B). The proportions of species growth influenced by different interaction types of T × P by summarizing the data from the outer heatmap (C).

      Fig. 4. It says "Dorought" instead of "drought"

      Done (Line 760).

      Line 109: "relieves" instead of "relieved"

      Done (Line 102).

      Line 129: Should be: "We classified the interaction types as additive, synergistic, antagonistic, null and neutralizing."

      Done (Line 117).

      Line 233: How were the 16S rRNA sequences from each density fraction analyzed?

      (1) Raw sequencing data processing:

      The raw 16S rRNA gene sequences of each density fraction were quality-filtered using USEARCH v.11.0 (Edgar, 2010). The paired-end sequences were merged and quality-filtered with the “fastq_mergepairs” and “fastq_filter” commands, respectively. Sequences shorter than 370 bp or with total expected errors > 0.5 were removed. Next, the “fastx_uniques” command was used to identify the unique sequences. Subsequently, high-quality sequences were clustered into operational taxonomic units (OTUs) with the “cluster_otus” command at a 97% identity threshold, and the most abundant sequence from each OTU was selected as a representative sequence. The taxonomic affiliation of the representative sequence was determined using the RDP classifier (Wang et al., 2007).
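
      The expected-error criterion may be unfamiliar, so a minimal sketch of the filtering rule is given here. It is not the USEARCH implementation: it assumes Phred+33 quality encoding and uses the standard definition of expected errors as the sum of per-base error probabilities, 10^(-Q/10). The function names are hypothetical; the thresholds are the ones stated above.

```python
def expected_errors(quality_string, phred_offset=33):
    """Sum of per-base error probabilities for one read (Phred+33 assumed)."""
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string)


def passes_filter(sequence, quality_string, min_length=370, max_expected_errors=0.5):
    """Keep a merged read only if it is long enough and has few expected errors."""
    if len(sequence) < min_length:
        return False
    return expected_errors(quality_string) <= max_expected_errors


if __name__ == "__main__":
    # Hypothetical merged read of 370 bases, all with quality Q40 ('I').
    read = "A" * 370
    quals = "I" * 370
    print(expected_errors(quals), passes_filter(read, quals))
```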

      (2) qSIP calculation:

      Sequencing data reflect the relative abundance of taxa in the community. We multiplied each OTU’s relative abundance (obtained by sequencing) by the number of 16S rRNA gene copies (obtained by qPCR) to obtain the number of gene copies per OTU in each fraction. Then, the proportion of a specific OTU’s gene copies in each fraction relative to its total gene copies in the sample was calculated and used as the weight for calculating the weighted average buoyant density (the critical parameter for assessing microbial growth).
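
      The calculation described above can be sketched as follows for a single OTU in a single sample. This is a simplified illustration with hypothetical names and numbers, not the analysis code used in the study.

```python
def weighted_average_density(densities, relative_abundances, total_copies):
    """Weighted average buoyant density of one OTU across density fractions.

    densities:            buoyant density of each fraction (g ml-1)
    relative_abundances:  the OTU's relative abundance in each fraction (sequencing)
    total_copies:         total 16S rRNA gene copies in each fraction (qPCR)
    """
    # Gene copies of this OTU in each fraction.
    otu_copies = [r * t for r, t in zip(relative_abundances, total_copies)]
    total = sum(otu_copies)
    if total == 0:
        raise ValueError("OTU not detected in any fraction")
    # Weight of each fraction = share of the OTU's copies found in that fraction.
    weights = [c / total for c in otu_copies]
    return sum(w * d for w, d in zip(weights, densities))


if __name__ == "__main__":
    # Hypothetical values for three fractions.
    print(weighted_average_density(
        densities=[1.70, 1.71, 1.72],
        relative_abundances=[0.1, 0.2, 0.1],
        total_copies=[1000, 1000, 1000],
    ))
```

      Comparing this value between the 18O and 16O treatments for the same OTU gives the density shift that indicates 18O assimilation.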

      Line 366: "Three single-factor ... between warming and altered precipitation" -> "The individual impact of warming, drought, and wet conditions resulted in the most substantial negative effects on bacterial growth compared with the effects of warming x drought and warming x wet. A result that illustrates the negative interactions between warming and modified precipitations patterns."

      Done (Line 365-368).

      Line 376: "Similar with the result of whole growth of bacteria community, the growth responses of the major bacterial phyla were also negatively influenced by single climate factors". This sentence is hard to read. Maybe something like this: "Growth of the major bacterial phyla was also negatively influenced by the individual climate factors".

      Done (Line 371-372).

      Line 383: "In particular, the effects of wet and warming neutralized each other, resulting the net effects became zero on the growth rates of the phyla Actinobacteria and Bacteroidetes" -> "In Actinobacteria and Bacteroidetes, the effect of wet and warming neutralized each other, as the combined effect of these two factors had no effect on growth".

      Done (Line 377-379).

      Line 390: "The individual warming treatment (T+nP) reduced the growth rates of 75% incorporators..." -> "Warming (T+nP) reduced the growth of 75% of the taxonomic groups, which was followed by drought and wet."

      Done (Line 384-385).

      Line 392: "The combined manipulations of warming and altered precipitation lowered the percentages of incorporators with negative responses compared with single factor manipulation, especially warming and enhanced precipitation manipulation" -> "Warming x drought and warming x wet had a smaller impact on the growth of incorporators, compared with single effects."

      Done (Line 385-387).

      Line 468. This sentence "To the best ..." is not necessary.

      We have deleted this sentence.

      Line 476. Is it really "synthesis" the word you want to use?

      We have deleted this sentence.

      Line 477. Maybe it should be written like this: "Consistent with our findings, a recent experimental study demonstrated that 15 years of warming reduced the growth rate of soil bacteria in a montane meadow in northern Arizona."

      Done (Line 459-461).

      Line 490 and 502. Consider using "however" only once in a paragraph.

      We have deleted the second “however” (Line 483).

      Line 555-559. Based on genomic data you cannot predict the functional role of microbes in the environment. These sentences are speculative. Please, consider using less strong affirmations and focus more on the pathways that are enriched in the incorporators.

      Agreed. We have deleted this content.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their careful, critical, and insightful evaluation of our manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The preprint by Laganowsky and co-workers describes the use of mutant cycles to dissect the thermodynamic profile of specific lipid recognition by the ABC transporter MsbA. The authors use native mass spectrometry with a variable temperature source to monitor lipid binding to the native protein dimer solubilized in detergent. Analysis of the peak intensities (that is, relative abundance) of 1-3 bound lipids as a function of solution temperature and lipid concentration yields temperature-dependent Kds. The authors use these to then generate van't Hoff plots, from which they calculate the enthalpy and entropy contributions to binding of one, two, and in some cases, three lipids to MsbA.

      The authors then employ mutant cycles, in which basic residues involved in headgroup binding are mutated to alanine. By comparing the thermodynamic signatures of single and double (and in one instance triple) mutants, they aim to identify cooperativity between the different positions. They furthermore use inward and outward locking conditions which should control access to the different binding sites determined previously.

      The main conclusion is that lipid binding to MsbA is driven mainly by energetically favorable entropy increase upon binding, which stems from the release of ordered water molecules that normally coordinate the basic residues, which helps to overcome the enthalpic barrier of lipid binding. The authors also report an increase in lipid binding at higher temperatures which they attribute to a non-uniform heat capacity of the protein. Although they find that most residue pairs display some degree of cooperativity, particularly between the inner and outer lipid binding sites, they do not provide a structural interpretation of these results.

      Strengths:

      The use of double mutant cycles and mass spectrometry to dissect lipid binding is novel and interesting. For example, the observation that mutating a basic residue in the inner and one in the outer binding site abolishes lipid binding to a greater extent than the individual mutations is highly informative even without having to break it down into thermodynamic terms (see "weaknesses" section). In this sense, the method and data reported here opens new avenues for the structure/activity relationship of MsbA. The "mutant cycle" approach is in principle widely applicable to other membrane proteins with complex lipid interactions.

      Weaknesses:

      The use of double mutant cycles to dissect binding energies is well-established, and has, as the authors point out, been employed in combination with mass spectrometry to study protein-protein interactions. Its application to extract thermodynamic parameters is robust in cases where a single binding event is monitored, e.g. the formation of a complex with well-defined stoichiometry, where dissociation constants can be determined with high confidence. It is, however, complicated significantly by the fact that for MsbA-lipid interactions, we are not looking at a single binding event, but a stochastic distribution of lipids across different sites. Even if the protein is locked in a specific conformation, the observation of a single lipid adduct does not guarantee that the one lipid is always bound to a specific site. In some of the complexes detected by MS, the lipid is likely bound somewhere else. Lipid binding Kds from mass spectrometry, although helpful in some instances as a proxy for global binding affinities, should therefore be taken with a grain of salt.

      We agree with the reviewer that, although we measure lipid binding (as a mass shift), we do not know the binding location(s). Given this issue, we have added a discussion of this important point and elaborate more broadly on this problem in the context of membrane protein-lipid interactions. Tackling this issue represents a frontier challenge for the field.

      The authors analyze the difference in binding upon mutating binding sites (ddG etc). Here, another complicating factor comes into play, the fact that mutation of a binding site (which the authors show reduces lipid binding) may instead allow the lipid to bind to a lower-affinity site elsewhere. Unfortunately, the authors do not specify the protein concentration, but assuming it is in the single-digit micromolar range, as common for native MS experiments, lipid and protein concentrations are almost equal for most of the data points, resulting in competition between binding sites for free lipids. As a rule of thumb, for Kd measurements, the concentration of the constant component, the protein, should be far below the Kd, to avoid working in the "titration" regime rather than the "binding" regime (see Jarmoskaite et al, eLife 2020). I cannot determine whether this is the case here. The way I understand the double mutant cycle approach, reliable Kd measurements are required to accurately determine dH and TdS, so I would encourage the authors to confirm their Kd values using complementary methods before in-depth interpretations of the thermodynamic components.

      The reviewer references an article in eLife by Jarmoskaite and co-workers describing “titration” vs “binding” regimes. Below we paste a snippet from this article:

      Author response image 1.

      Equation 4a is an expression for the fraction of protein bound to ligand, which holds universally, i.e., if we know the concentrations of all molecules at equilibrium (including those unbound or free), then we can obtain the equilibrium constant at a given temperature. Jarmoskaite et al. note that in practice (using traditional biophysical approaches) one cannot readily distinguish protein that is free from protein that is bound to ligand (see highlighted part above). While this assumption is the basis of their eLife assessment, it does NOT apply to native mass spectrometry data. It is important to realize that the mole fraction (or concentration) of the apo and each lipid-bound state, i.e., [P], [PL], [PL2], …, [PLn+1], can readily be obtained directly from the deconvoluted mass spectrum. This is unlike other biophysical methods, which are ensemble measurements that report the amount of heat or the fraction of total ligand bound to protein. Since we can discern each lipid-bound state, including the free protein and free ligand concentrations, the equilibrium binding constants can be directly calculated, and the protein and ligand concentrations become irrelevant. In principle, equilibrium constants for protein-lipid interactions can be calculated from one mass spectrum. To increase transparency, we have updated the results section to highlight the important difference between the native MS approach and less robust traditional approaches that are riddled with underlying issues/assumptions.
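
      As a rough illustration of how stepwise dissociation constants could follow from a single spectrum, the Python sketch below takes the mole fractions of the apo and lipid-bound states and applies the mass-action ratio for each step. The function name, the mass-balance step used to estimate free ligand, and the example numbers are assumptions made for this sketch; they are not the authors' actual processing pipeline.

```python
def sequential_kds(mole_fractions, protein_total, ligand_total):
    """Stepwise Kd values from one set of state abundances.

    mole_fractions[i] is the fraction of protein carrying i lipids
    (index 0 = apo). Concentrations share one unit, e.g. micromolar.
    """
    s = sum(mole_fractions)
    fracs = [f / s for f in mole_fractions]  # normalise to 1
    # Assumed mass balance: bound ligand = P_total * sum(i * f_i)
    bound = protein_total * sum(i * f for i, f in enumerate(fracs))
    free_ligand = ligand_total - bound
    if free_ligand <= 0:
        raise ValueError("mass balance gives no free ligand")
    kds = []
    for i in range(1, len(fracs)):
        if fracs[i] == 0:
            raise ValueError(f"state with {i} lipids not observed")
        # Kd_i = [PL(i-1)][L] / [PLi]; total protein cancels in the ratio
        kds.append(fracs[i - 1] * free_ligand / fracs[i])
    return free_ligand, kds


# Made-up abundances for apo, +1 lipid, +2 lipids
free_l, kds = sequential_kds([0.5, 0.3, 0.2], protein_total=1.0, ligand_total=2.0)
print(free_l, kds)
```

      Because each Kd is a ratio of adjacent state abundances multiplied by the free ligand concentration, the total protein concentration only enters through the estimate of free ligand.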

      We appreciate the reviewer’s suggestion of using complementary methods to confirm Kd values. In our previous report [1], we determined binding thermodynamics for soluble protein-ligand interactions using native MS, surface plasmon resonance (SPR), and isothermal titration calorimetry (ITC) and found that the techniques yield similar binding constants and thermodynamic parameters. Because soluble proteins with well-defined ligand binding were used, that complementary study was rather straightforward to carry out. We have also shown consistent findings for native MS and SPR of a membrane protein interacting with a soluble, regulatory protein [2]. However, membrane proteins can bind the first few lipids very specifically and, with the addition of more lipid, bind even more lipids that represent rather weak binding. Thus, traditional approaches would report on the ensemble of lipids bound to membrane proteins. Specific lipid binding sites (such as the inner and outer LPS binding sites in MsbA) are saturable, but additional binding will also be observed, i.e., the behavior does not follow that of traditional soluble protein-ligand binding studies. In the past we have used a fluorescent-lipid competition binding assay [3] to corroborate native MS results for Kir3.2, which showed a direct correlation. The disadvantage of this complementary approach is the use of a non-natural, fluorescently modified lipid. Unfortunately, there is no commercial source for a fluorophore-modified KDL.

      It is somewhat counterintuitive that for many double mutants, and the triple mutant, the entropic component becomes more favorable compared to the WT protein. If the increase in entropy upon lipid binding comes from the release of ordered water molecules around the basic residues (a reasonable assumption) why does this apply even more in proteins where several basic residues have been changed to alanine, which coordinate far fewer water molecules?

      There are many factors that contribute to the change in entropy of the system, beyond solvation entropy, and deciphering the entropic contributions of the various components remains a challenging task. We have revised the manuscript to emphasize that solvation is one component of the entropic term and other components are likely at play.

      The authors could devote more attention to the fact that they use detergent micelles as a vehicle for lipid binding studies. To a limited extent, detergents compete with lipids for binding, and are present in extreme excess over the lipid. The micelle likely changes its behavior in response to temperature changes. For example, the packing around the protein loosens up upon heating, which may increase the chance for lipids to bind. In this case, the increase in binding at higher temperatures may not be related to a change in heat capacity. This question could be addressed by MD simulations, if it's not already in the literature.

      The detergent and its concentration are consistent for all the different MsbA proteins in this study. In fact, we observe linear van’t Hoff plots with positive and negative slopes as well as non-linear curves that are convex or concave. The MsbA proteins (wt or mutant), trapped or not, all display unique temperature-dependent responses. The reviewer’s suggestion that increasing temperature loosens the packing of detergent and thereby promotes lipid binding is clearly NOT that simple. If detergent were significantly influencing lipid binding (as suggested by the reviewer), then increasing its concentration should impact lipid binding. In a previous study, we found no difference in membrane protein-lipid thermodynamics even when the concentration of detergent was increased five-fold [1]. We repeated similar experiments for MsbA and find that the increased detergent concentration does not impact the abundances of lipid-bound states. Author response image 2 shows MsbA in the presence of lipid in 2x CMC (panels a and b) and 10x CMC (panels c and d). As can be seen, no appreciable difference in the lipid-bound signal is observed.

      Author response image 2.

      We applaud the suggestion of MD simulation. However, it is far beyond the scope of this paper and it's not clear what will really be learned.

      Reviewer #2 (Public Review):

      Summary:

      This is a solid study that dissects the thermodynamics of lipopolysaccharide (LPS) transporter MsbA and LPS. Native ESI-MS and the novel strategies developed by the authors were employed to quantify the affinities of LPS-MsbA interactions and its temperature dependence. Here, the equilibrium of lipid-protein interactions occurs in the micellar phase. The double-/triple-mutant cycle analysis and van't Hoff analysis allowed a full thermodynamic description of the lipid-protein interactions and the analysis of thermodynamic coupling between LPS binding sites. The most notable result would be that LPS-MsbA interaction is largely driven by entropy involving the negative heat capacity, a signature of the solvent reorganization effect (here authors attribute the solvent effect to "water" reorganization). The entropy driven lipid binding has been previously reported by the same authors for Kir1,2-PIP2 interactions.

      Strengths:

      1. This is overall a very thorough and rigorous study providing the detailed thermodynamic principles of LPS-MsbA interaction.

      2. The double and triple-mutant cycle approaches are newly applied to lipid-protein interactions, enabling detailed thermodynamics between LPS binding sites.

      3. The entropy-driven protein-lipid interaction is surprising. The binding seems to be mainly mediated by the electrostatic interaction between the positively charged residues on the protein and the negatively charged or polar headgroup of LPS, which could be thought of as "enthalpic" (making of a strong bond relative to that with solvent).

      Weaknesses:

      1. This study is a good contribution to the field, but it was difficult to find novel biological insights or methodological novelty from this study.

      1a. Thermodynamic analysis of lipid-protein interactions, an example of entropy-driven lipid-protein interactions, and the cooperativity between lipid binding sites have been reported by the author's group. Also, the cooperativity between binding sites in general have been reported from numerous studies of biomolecular interactions.

      We appreciate the reviewer for highlighting our previous work. Of course, a single study does not establish a pattern, such as entropy-driven lipid-protein interactions.

      While we agree with the reviewer that cooperativity in biomolecular interactions has been established for many soluble protein systems, by no means do we have a detailed understanding of membrane protein-lipid interactions. This work is an important contribution to expanding on classical work on soluble protein systems to more challenging membrane protein systems and their interactions with lipids.

      1b. It is not clear how this study provides new insights into the understanding of LPS transport mechanisms. Probably, authors could strengthen the Discussion by providing biological insights-how the residue coupling.

      The thermodynamics provide a deeper insight into the chemical principles that drive specific membrane protein-lipid interactions. We have revised the discussion to highlight the importance of thermodynamics and the contribution of individual residues to KDL binding. We also note that the inner and outer LPS binding sites appear to be coupled, which is a new finding.

      2. One to three LPS molecules bind to MsbA, but it is unclear whether bound KDL occupies inner or outer cavities, or both and how a specific mutation affects the affinity of specific LPS (i.e., to inner or to outer cavities). Based on the known structures, the maximal number of LPS is three. It is possible that the inner and outer cavities have different LPS affinities. Also, there can be multiple one-LPS-bound states, two-LPS-bound states if LPS strictly binds to the binding sites indicated by the structures. This aspect is beyond the scope of this study and difficult to address, but without this information, it seems hard to tell what is going on in the system.

      In our response above, we note that lipids will bind to membrane proteins at specific site(s) and weaker sites, often described as non-annular lipids. The revision includes this discussion point.

      3. If a single mutation is introduced to the inner cavity, its effect will be "doubled" because the inner cavity is shared by two identical subunits. This effect needs to be clarified in the result section.

      Great point. In addition, an outer mutant will also impact not one but both outer binding sites. The revised manuscript makes note of this point.

      4. In the result section, "Mutant cycle analysis of KDL binding to vanadate-trapped MsbA.":

      4a. It seems necessary to show the mass spectra for MsbA-ADP-vanadate complex as well as its lipid bound forms.

      In the original submission, the mass spectra of vanadate trapped MsbA with KDL binding was provided in Supplementary Figures 10 and 11.

      4b. The rationale of this section (i.e., what mechanistic insights can be obtained from this study) is unclear. For example, it is not sure what meaningful information can be obtained from a single type (ADP/vanadate) of the bound state regarding the ATP-driven function of MsbA.

      MsbA is dynamic and populates different conformations. Trapping with vanadate locks the transporter in an outward-facing state with the NBDs interacting. This provides the opportunity to characterize binding to the exterior site. We revised the manuscript to note this point.

      Reviewer #3 (Public Review):

      Summary:

      In this paper presented by Liu et al, native MS on the lipid A transporter MsbA was used to obtain thermodynamic insight into protein-lipid interactions. By performing the analyses at different lipid A concentrations and temperatures, dissociation constants for 2-3 lipid A binding sites were determined, and enthalpies were calculated using nonlinear van't Hoff fitting. Changes in Gibbs free energies were then calculated based on the determined dissociation constants, and together with the enthalpy values obtained via van't Hoff analysis, the entropic contribution to lipid binding (DeltaS*T) was indirectly determined.

      Strengths:

      This is an extensive high quality native MS dataset that provides unique opportunities to gain insights into the thermodynamic parameters underlying lipid A binding. In addition, it provides coupling energies between mutations introduced into MsbA, that are implicated in lipid A binding.

      Weaknesses:

      The data all rely on the accuracy of determining KD values for lipid binding to MsbA. For the weaker binding sites, the range of lipid concentrations probed were in fact too low to generate highly accurate data. Another weakness is a lack of clear evidence, which KD values belong to which of the possible lipid A binding sites.

      See our detailed response to reviewer 1 regarding Kd determination using native MS compared to other techniques. We chose to focus on the first three lipid binding events and adjusted the concentrations accordingly to titrate these three. As noted above, the Kd values can be determined from one mass spectrum. For rigor, we include different titration points and fit a sequential binding model to the data; the fits are shown in the supplemental material and are quite reasonable.
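
      For readers unfamiliar with sequential binding models, the Python sketch below shows the forward calculation such a fit relies on: given a free lipid concentration and a set of stepwise Kd values, it returns the predicted mole fraction of each lipid-bound state. The function name and numbers are hypothetical, and this is a generic textbook form rather than the authors' fitting code.

```python
def predicted_mole_fractions(free_ligand, kds):
    """Mole fractions of P, PL, PL2, ... for a sequential binding model.

    kds[i] is the stepwise dissociation constant for adding lipid i+1,
    in the same concentration unit as free_ligand.
    """
    # Statistical weight of the state with i lipids: product of L / Kd_j, j <= i
    weights = [1.0]
    for kd in kds:
        weights.append(weights[-1] * free_ligand / kd)
    z = sum(weights)  # binding polynomial
    return [w / z for w in weights]


# Made-up values: free lipid 1.3 uM, Kd1 = 13/6 uM, Kd2 = 1.95 uM
print(predicted_mole_fractions(1.3, [13.0 / 6.0, 1.95]))
```

      A fit would adjust the Kd values until the predicted mole fractions match the measured ones across all titration points.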

      Regarding multiple lipids binding to different site(s), we have been able to distinguish high-affinity vs low-affinity PIP binding to Kir3.2 in a previous study [4]. This was apparent from the mole fraction curves for some lipid-bound states not returning to zero. We agree binding to multiple sites can be an issue. However, other techniques report on the ensemble of binding and, hence, no real useful information is obtained. Native MS enables one step in the right direction by dissecting the different lipid-bound states. Future work will need to further address this forefront question in the field, a point we now make in the discussion.

      Reviewer #1 (Recommendations For The Authors):

      Experiments/analysis: In short, there should be a proof of principle experiment that the thermodynamic constants determined by MS are accurate. Once that is done, the authors can add a more engaging structural interpretation of the results from the mutant cycles (which they seem to consciously avoid in the present manuscript?). How are cooperative residues coupled? Why?

      See our detailed response to reviewer 1 above.

      The manuscript is well-written, but Figures 3-5 are somewhat repetitive and require a lot of time to understand. Schematics of the main findings in each figure would help the uninitiated reader.

      We agree the illustrations are complex but there is rich data being shown.

      Figure 2 C contains an x-axis label error.

      Corrected.

      Reviewer #2 (Recommendations For The Authors):

      1. Lines 128-129: "Like other mutant cycle studies, we assume the single- and double-mutations do not disrupt binding at specific sites on MsbA."

      This statement is obscure and needs to be clarified. Does this mean that the mutations still allow binding of KDL, or the mutations do not disrupt the conformational integrity of the binding sites?

      This statement has been removed.

      2. Lines 137-139: "More specifically, R78 coordinates one of the characteristic phosphoglucosamine (P-GlcN) substituents of KDL whereas K299 interacts with a carboxylic acid group in the headgroup of KDL."

      Two identical subunits form a dimer interface that forms an LPS binding site. Thus, a single mutation on the inner cavity will disrupt two binding sites on LPS. One R78 to P-GlcN and the other to a sugar backbone. Also, one K299 interacts with a carboxylic acid group in the headgroup and the other to an unknown (not clear in the figure).

      As also noted above, mutation of the outer site will impact both outer sites. We have made note of this caveat.

      3. Lines 171-172: "leading to an increase in ΔG by ~4 kJ/mol (Fig. 2d)"

      Relative to what?

      Corrected.

      4. Lines 172-173: "Mutant cycle analysis indicates a coupling energy (ΔΔGint) of 1.7 (plus minus) 0.4 kJ/mol that contributes to the stability of KDL-MsbA complex."

      The sign of DDG (DDH,DDS)_int is a bit confusing. I recommend that authors define the meaning of negative or positive sign of DDG_int (DDH,DDS) at this point. Here, a positive sign means favorable cooperation between the two mutated residues. Sometimes, researchers designate a positive cooperativity as a negative sign.

      The literature on mutant cycles does not appear to follow a consensus on the sign. Here, we have revised the manuscript to note positive sign means favorable cooperation and follow the formalism recently described by Horovitz, Sharon, and co-workers [5].
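
      To make the sign convention concrete, here is a minimal Python sketch of a double mutant cycle. It assumes one common form of the calculation in which the coupling energy is the sum of the two single-mutant changes minus the double-mutant change, so that a positive value means the two residues cooperate favorably. The function name and the energies are hypothetical, and the exact formalism in the revised manuscript may differ.

```python
def coupling_energy(dg_wt, dg_m1, dg_m2, dg_m1m2):
    """Coupling energy (kJ/mol) from binding free energies of a mutant cycle.

    Assumed convention: positive result = favorable cooperation between
    the two mutated residues in the wild-type protein.
    """
    ddg_m1 = dg_m1 - dg_wt          # effect of mutation 1 alone
    ddg_m2 = dg_m2 - dg_wt          # effect of mutation 2 alone
    ddg_double = dg_m1m2 - dg_wt    # effect of both mutations together
    # If the residues were independent, the double-mutant effect would
    # equal the sum of the single-mutant effects and the result is zero.
    return ddg_m1 + ddg_m2 - ddg_double


# Made-up binding free energies (kJ/mol): WT, two single mutants, double mutant
print(coupling_energy(-40.0, -36.0, -37.0, -35.0))
```

      The same arithmetic applies to enthalpy and to the entropic term, which is how a coupling free energy near zero can still hide compensating enthalpic and entropic couplings.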

      5. Lines 182-185: "Enthalpy and entropy for KDL binding MsbA R188A was largely similar to the wild-type protein (Fig 3a). However, the R243A mutation resulted in an increase in entropy, compensated for by an increase in positive enthalpy (Fig 3a)."

      The thermodynamic parameters for R243A mutation change in a similar manner to WT and R188A. It is R238A, not R243A, whose DH-DS interplay shows a distinct pattern from WT. Please, reword this sentence.

      The sentence has been revised.

      6. Lines 252-253: Solvation of polar groups in aqueous solvent has been ascribed to positive heat capacities whereas negative for apolar solvation.

      This statement is not precise. More precisely, the collapse of apolar molecules from their solvated state leads to the negative "change" in heat capacity.

      The sentence has been corrected.

      7. Line 262-267: "These hydrophilic patches will be highly solvated, which will be desolvated upon binding lipids contributing favorably to entropy. In the case of MsbA, the selected lysine and arginine residues (based alpha carbon position) are separated by about 9 to 18 Å (PDB 8DMM). This distance could result in overlap of solvation shells that collectively contribute to the positive coupling enthalpy observed for MsbA-KDL interactions."

      This statement is too speculative without presenting the degree of solvation of the residues targeted for mutation. More quantitative arguments seem to be needed.

      We have removed the speculative statement.

      Reviewer #3 (Recommendations For The Authors):

      In this paper presented by Liu et al, native MS on the lipid A transporter MsbA was used to obtain thermodynamic insight into protein-lipid interactions. By performing the analyses at different lipid A concentrations and temperatures, dissociation constants for 2-3 lipid A binding sites were determined, as well as enthalpies were calculated using nonlinear van't Hoff fitting.

      Changes in Gibbs free energies were then calculated based on the determined dissociation constants, and together with the enthalpy values obtained via van't Hoff analysis the entropic contribution to lipid binding (DeltaS*T) was indirectly determined.

      Correction: in the case of linear van’t Hoff plots, dH and dS were determined directly from the plot. For the nonlinear form of the van’t Hoff equation, which does not include an entropy fitting parameter, we back-calculated dS using dH and dG at a given temperature.
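
      The two routes to dS described in this correction can be sketched in Python as follows. The linear case fits ln(Ka) against 1/T, where the slope is -dH/R and the intercept is dS/R; the back-calculation uses dG = -RT ln(Ka) and dS = (dH - dG)/T. The function names and the synthetic data are assumptions for illustration, Ka is taken relative to a 1 M standard state, and the nonlinear fit itself is not shown.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K)


def linear_vant_hoff(temps_k, kas):
    """Least-squares line through ln(Ka) vs 1/T; returns (dH, dS)."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k) for k in kas]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return -slope * R, intercept * R  # dH in kJ/mol, dS in kJ/(mol K)


def entropy_from_dh_and_ka(dh, ka, temp_k):
    """Back-calculate dS at one temperature from dH and dG = -RT ln(Ka)."""
    dg = -R * temp_k * math.log(ka)
    return (dh - dg) / temp_k


# Synthetic data generated from dH = 20 kJ/mol and dS = 0.2 kJ/(mol K)
temps = [288.15, 298.15, 308.15]
kas = [math.exp(-20.0 / (R * t) + 0.2 / R) for t in temps]
print(linear_vant_hoff(temps, kas))
print(entropy_from_dh_and_ka(20.0, kas[1], temps[1]))
```

      In the nonlinear case dH itself depends on temperature, so the back-calculated dS is specific to the temperature at which dH and dG are evaluated.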

      The authors then included single, double and triple mutants of residues known based on cryo-EM and X-ray structures to interact with Lipid A either in the large inward-facing cavity or at a secondary binding site accessible at the surface of outward-facing MsbA, and determined the thermodynamic parameters of these mutants alone and combined to gain access to coupling energies of pairwise interactions. This method has its roots in studying pair-wise interactions of protein-protein interfaces, generally known as thermodynamic mutant cycle analysis.

      Having the main expertise in ABC transporter structure-function, I will judge the paper mostly from the standpoint of what I can learn as a transporter expert from this study and whether the insights are of value for researchers with average biophysical knowledge.

      My overall impression of the manuscript is that, while it contains a wealth of experimental data using the innovative and unique method of native mass spectrometry, it is hard to understand what one can learn from this analysis beyond their interesting key finding that entropy plays an important role in lipid binding (but only at certain temperatures). In particular, the lessons learned from the coupling energy analysis of the introduced mutations is hard to grasp/digest for me with regards to what I can learn from these numbers (other than learning that there are such coupling effects).

      We agree the thermodynamic data is rich. Often a ddGint of zero is reported as having no coupling/significance, but here the value is due to compensating ddH and -TddS terms. In our view, this work forms the foundation of additional studies to better understand the coupling energetic terms, beyond ddGint.

      In some instances, the text/figure legends are a bit unclear or contain some typos; but this part can easily be handled in a revision. The discussion is well written and embeds the main findings in the (still rather limited) literature on thermodynamic analyses of lipid binding of membrane proteins.

      Major points

      1. The authors may have clarified the following point in a previous paper; but at least in this paper, it is unclear to me how they purified MsbA without lipid A. The reason I am asking is that in our experience, if one purifies MsbA expressed from E. coli with standard detergents (e.g. beta-DDM) one will find a perfect density for Lipid A when determining an inward-facing structure by cryo-EM. According to the Methods, MsbA is purified initially in DDM, and rebuffered to C10E5 during size exclusion chromatography. When looking at Fig. 2b, the authors state (or assume?) that if no lipid A is added, MsbA has 0 % lipid A bound.

      We have previously reported details of MsbA sample prep and optimization [6]. The revised manuscript makes note of this previous work and refers the reader to the publication. Yes, we see no appreciable signal for lipid A bound to MsbA (see Fig 2b).

      We also note that samples of MsbA prepared using DDM are highly heterogeneous and contaminated by a battery of small molecules (that we suspect are co-purified lipids). These contaminants will inadvertently impact biochemical studies.

      2. A second topic where further clarification is in my view needed is the question of the conformations that were probed and the lipid binding sites. If I get the experimental rationale correctly, most of the data were determined in the absence of nucleotides, and only a small subset (Fig. 5) of data were determined in the presence of ATP-vanadate. However, structural evidence for the cytosolic lipid A binding site has been only determined for outward-facing MsbA (PDB: 8DMM), but has thus far not been seen in any of the inward-facing cryo-EM structures of MsbA, including recent well-resolved cryo-EM structures showing excellent density for the lipid A bound to the inward-facing cavity (PDB: 7PH2). Further, there is only one lipid A molecule that can be accommodated by the inward-facing cavity, whereas (owing to the symmetry of the homodimer) two lipid A can be bound sideways to outward-facing MsbA. Now, my understanding problem is why one does see up to three lipid A molecules bound to inward-facing apo MsbA, e.g. Fig. 2b and elsewhere. Where are they expected to bind? And what is the evidence supporting these additional binding sites?

      See our detailed response to reviewer 1. If we add more lipid, we see more lipid binding to MsbA, like every other membrane protein we have studied. This data clearly indicates that there are more KDL binding site(s) – deciphering the affinity of these site(s) represents a problem on the horizon.

      A further question is which lipid A binding sites are present in vanadate-trapped MsbA. Here, there are two identical binding sites (at the surface of each MsbA molecule), and it is therefore surprising to see that the affinities for the first and the second binding site are so different (see e.g. Supplementary Fig. 13).

      Great point. A logical explanation (described for other biochemical systems) is the two exterior LPS binding sites display negative cooperativity i.e., binding at one site weakens the affinity at the other site.

      Finally, what is the evidence that in vanadate-trapped MsbA, all molecules have closed NBDs and thus assume the outward-facing conformation? It is not uncommon that vanadate trapping leads to NBD closure only in a subfraction of all transporters (hence not in 100 % of them).

      Yes, the native mass spectrum shows no appreciable signal for MsbA not trapped with vanadate/ADP. In our previous cryoEM study [6], using the vanadate-trapped transporter, we did not observe particles with NBDs dissociated in space. Regarding samples from other labs, a native mass spectrum could shed light on the population of untrapped protein; however, most studies use SDS-PAGE for quality control of their purified samples. This technology is not sufficient to address underlying biochemical issues.

      We do have a new report in preparation describing a new discovery regarding trapping efficiency of MsbA.

      1. The key parameter underlying the entire thermodynamic analysis of wt and mutant MsbA is the dissociation/association constant, which is used to calculate the Gibbs free energy and, via van't Hoff analysis, the enthalpy. Entropy is not determined directly, but indirectly from these two numbers, both of which depend on the measurement quality of the dissociation/association constant. Now, when looking at the fitted curves as shown in Figure 2b (and in the supplement), the determination of the dissociation constant for KDL1 (blue curves) looks reasonable and the determined KDs are within the range of measured points. However, for KDL2 (red) and even more so KDL3 (yellow), the determined KD values (Supplementary Table 5) are typically higher than the highest KDL concentration used in the assay (1.5 µM). For this reason, and despite the fact that error bars of the fits look reasonably small, I still have doubts about the reliability of these KD values for KDL2 and KDL3.

      Hence, the surprisingly strong changes of enthalpy/entropy values for different mutants/temperatures may have their origin in incorrectly determined KD values.

      The increase in binding affinity of subsequent lipid binding events is consistent with many reports from our group [1, 2, 4, 6-9] and that of Prof. Robinson [10, 11] on this topic. As noted above, we indeed observe linear van’t Hoff plots with positive and negative slopes as well as non-linear curves that are convex or concave. The MsbA proteins (wt or mutant), trapped or not, all display unique temperature-dependent responses. If, as the reviewer suggests, the Kd values were incorrectly or randomly determined, then none of the binding data should follow thermodynamic van’t Hoff equations. This is simply not the case: the error bars and fits are reasonable. Backing up even further, looking at the raw native mass spectra (see supplemental figures 1-3 and 10-11), one can see different temperature dependence of lipid binding.
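      For readers less familiar with the analysis under discussion, the following is a minimal sketch of a linear van’t Hoff fit, ln Ka = -ΔH/(RT) + ΔS/R, with ΔG = RT ln Kd. It assumes a temperature-independent ΔH (it does not cover the non-linear cases mentioned above) and a 1 M standard state. The function names and the synthetic numbers are illustrative assumptions, not values or code from the manuscript.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)


def gibbs_from_kd(kd_molar, temp_k):
    """Binding free energy (J/mol) from a dissociation constant: dG = RT ln(Kd)."""
    return R * temp_k * math.log(kd_molar)


def vant_hoff_fit(temps_k, kds_molar):
    """Least-squares line through ln(Ka) versus 1/T; returns (dH in J/mol, dS in J/(mol*K))."""
    xs = [1.0 / t for t in temps_k]
    ys = [-math.log(kd) for kd in kds_molar]  # ln Ka = -ln Kd
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope * R, intercept * R


if __name__ == "__main__":
    # Synthetic Kd values generated from assumed dH and dS (illustrative only).
    dh_true, ds_true = -40_000.0, -20.0
    temps = [288.15, 293.15, 298.15, 303.15, 308.15]
    kds = [math.exp(dh_true / (R * t) - ds_true / R) for t in temps]
    dh_fit, ds_fit = vant_hoff_fit(temps, kds)
    print(f"dH = {dh_fit:.1f} J/mol, dS = {ds_fit:.2f} J/(mol*K)")
```

      Because ΔH comes from the slope and ΔS from the intercept of the same fit, any bias in the Kd values propagates to both, which is the dependency the reviewer raises.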

      Minor points

      1. Lines 116-131: this section reads as an extended introduction/aims, and does not contain any results.

      This section has been moved to introduction.

      1. Lines 137-139: suggested to check whether these interactions are also present in recently determined cryo-EM structures determined at fairly high resolution (PDB: 7PH2)

      The interactions of MsbA and LPS (bound at the interior site) are comparable for PDB 7PH2 and 6BPL.

      1. Lines 144-146: suggested to elaborate in more detail on the fitting procedure here, as the KD values determined in this way are the foundation of all quantitative assessments.

      Details of data analysis and the fitting procedure are provided in methods.

      1. Figure legend, Fig. 2: Technically, MsbA was solubilized and purified in DDM and detergent exchange was done on SEC to C10E5.

      Corrected.

      1. Figure legend, Fig. 4: description in a) on deconvoluted mass spec data is incorrect. Letter below needs to be adjusted accordingly.

      Corrected.

      1. Figure legend, Fig. 5: suggested to mention in Figure legend title that here we look at ADP-vanadate trapped MsbA.

      Corrected.

      References

      1. Cong, X., et al., Determining Membrane Protein–Lipid Binding Thermodynamics Using Native Mass Spectrometry. Journal of the American Chemical Society, 2016. 138(13): p. 4346-4349.

      2. Cong, X., et al., Allosteric modulation of protein-protein interactions by individual lipid binding events. Nat Commun, 2017. 8(1): p. 2203.

      3. Qiao, P., et al., Insight into the Selectivity of Kir3.2 toward Phosphatidylinositides. Biochemistry, 2020. 59(22): p. 2089-2099.

      4. Qiao, P., et al., Entropy in the Molecular Recognition of Membrane Protein-Lipid Interactions. J Phys Chem Lett, 2021. 12(51): p. 12218-12224.

      5. Sokolovski, M., et al., Measuring inter-protein pairwise interaction energies from a single native mass spectrum by double-mutant cycle analysis. Nat Commun, 2017. 8(1): p. 212.

      6. Lyu, J., et al., Structural basis for lipid and copper regulation of the ABC transporter MsbA. Nat Commun, 2022. 13(1): p. 7291.

      7. Patrick, J.W., et al., Allostery revealed within lipid binding events to membrane proteins. Proc Natl Acad Sci U S A, 2018. 115(12): p. 2976-2981.

      8. Schrecke, S., et al., Selective regulation of human TRAAK channels by biologically active phospholipids. Nature Chemical Biology, 2021. 17(1): p. 89-95.

      9. Zhu, Y., et al., Cupric Ions Selectively Modulate TRAAK-Phosphatidylserine Interactions. J Am Chem Soc, 2022. 144(16): p. 7048-7053.

      10. Tang, H., et al., The solute carrier SPNS2 recruits PI(4,5)P(2) to synergistically regulate transport of sphingosine-1-phosphate. Mol Cell, 2023. 83(15): p. 2739-2752 e5.

      11. Yen, H.Y., et al., PtdIns(4,5)P(2) stabilizes active states of GPCRs and enhances selectivity of G-protein coupling. Nature, 2018. 559(7714): p. 423-427.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I have one major concern regarding this draft of the manuscript:

      (1) In the manuscript (lines 130-131) it is stated that "About 55% (8/15) of mice with unilateral AAV-hM3Dq centered in the PMv showed an increase in LH release above 0.5ng/ml within 10-20 min following the CNO injection". However, data at time zero are not shown for 4 of the 8 "LH peak" animals. The missing data at time zero seem problematic for the analysis of the CNO-stimulated cohort. As mentioned in the manuscript, the area under the curve was calculated between the range of -10 to 20 min post-injection. Because diestrus animals have spontaneous LH pulses, it is highly possible that an LH pulse is initiated in the 10 minutes prior to drug delivery, as seen in the AAV-mCherry group in 1D, and similarly in 2C. Given the current form of analysis, it seems possible that a spontaneous LH pulse initiated anywhere up to 10 minutes prior to drug delivery could conceivably count as an experimentally induced "LH peak". Can you address this concern?

      We understand the reviewer’s concern about spontaneous LH pulses. This is why we have been very strict in our analysis and have taken multiple approaches to analyze these data. In our hM3Dq group, 55% of the animals responded to CNO with an increase in LH, while none responded in the negative control group. Moreover, in the clozapine group, where no time 0 points were missing, 100% of the animals with hM3Dq showed an LH increase after the injection, while only 28% (2/7) showed the increase in the negative control group. Strictly speaking, the DREADDs approach doubled the chances of an LH increase. Note that the spontaneous LH peaks observed in negative controls or during baseline show a very sharp increase followed by a decrease at the next time point, while the 4 “PMv hits” with an increase in LH but without time 0 in the CNO-hM3Dq group showed a sustained rise after 10 min or prolonged high LH levels (above 1ng/ml) even 30 min after the injection. Ultimately, the cFOS levels in the PMv of the CNO-hM3Dq group with an increase in LH are significantly higher than in any other group, and the number of cFOS neurons is highly correlated with LH levels. Another important aspect that should not be dismissed is that in this experimental design we used unilateral injections in animals in a fed state; therefore, the role of leptin in raising LH levels is probably dampened.

      We have added a statement to clarify this issue.

      The following are minor concerns:

      a) Figure 4 a-d, it is clear that Vglut2 is absent in the VMH, but it seems more relevant to show this expression pattern in the PMv.

      We chose the VMH because it has a very dense collection of either LeprCre;VGlut2 or Vglut2 only cells and it illustrates very well the conditional Vglut2 deletion at low and high magnifications. In the PMv, however, the distribution of these cells is sparse. The reviewer is correct that for the current study, the PMv is more relevant and therefore, we have included images of the PMv showing a control and a LeprCre-Vglut2floxed animal in higher magnification.

      b) Methods section, targeting PMv: please check the injection coordinate: "dura-mater [dorsoventral -0.54]"

      Thank you for noticing this mistake. All coordinates for the injection have now been corrected (-5.4 mm, ±0.5 and -5.4mm).

      Reviewer #2 (Recommendations For The Authors):

      This is a very well-written manuscript by Saenz de Meira and colleagues on a careful study reporting on the key role of glutamate transporter vGlut2 expression in the neurons of the ventral premammillary nucleus (PMv) of the hypothalamus expressing the leptin receptor LepRb in energy homeostasis, puberty, and estrous cyclicity. The authors first show using cre-dependent chemogenetic viral tools that the selective activation of PMv LepRb neurons induces luteinizing hormone (LH) release. Then the authors demonstrate that the selective invalidation of vGlut2 in LepRb-expressing cells throughout the body induces obesity and mild alteration of sexual maturation in both sexes and blunted estrous cyclicity in females. Finally, the authors knock out vGlut2 in PMv neurons in which they reintroduce LepRb expression in an otherwise LepRb-null background using an AAV Cre approach. This latter very elegant experiment shows that while the sole re-expression of LepRb in PMv neurons in LepRb-null mice was shown before to restore puberty onset, deleting vGlut2 in LepRb-expressing PMv neurons blunts this effect.

      My specific comments are as follows. Please note that none of them require additional experiments and that they can be answered by amending the text.

      (1) Please provide information on the serotypes and promoters of the AAVs used in the study to enhance reproducibility.

      Thank you, serotypes and promoters have been added for all AAVs.

      (2) Please reformulate lines 220-221. Indeed, this reviewer does not agree with the fact that balanopreputial separation (BPS) is a sign of puberty completion. BPS is merely a sign of the advancement of sexual maturation, akin to vaginal opening in females. In certain mouse strains, BPS coincides with mini puberty rather than puberty. The definitive sign of puberty completion involves the presence of spermatozoa in the vas deferens (equivalent to the first ovulation/first estrus in females).

      Thank you for this remark, this statement has now been modified.

      (3) The authors convincingly show that the potential contamination of the arcuate nucleus of the hypothalamus (ARH) with the AAV injections targeted to the PMv should not account for the DREADD-mediated activation of LH release. However, do the authors believe that DREADD activation of LepRb-expressing PMv neurons, inducing cFOS expression in these neurons, could also activate ARH kisspeptin neurons (which do not express LepRb) via transsynaptic action? Alternatively, do they posit direct activation of GnRH cell bodies in the preoptic region or GnRH axon/dendrites in the ARH/median eminence region?

      Thank you for this comment. We don’t have enough evidence from this DREADDs experiment to make a strong prediction on the downstream pathways. However, as discussed, from the DREADDs khrGFP females, we observed very few kisspeptin cells expressing cFOS, reducing the evidence for a PMv to ARH kisspeptin action in this case. With the evidence from our LepR-Cre;Vglut2flox animals that showed no alterations in kiss1 gene expression but a strong decrease in GnRH release, we hypothesize that this acute activation of LH is mediated by direct inputs from PMv to GnRH neurons, while acknowledging the possible existence of alternative pathways. These arguments have been added to the discussion. 

      (4) This reviewer finds it intriguing that glutamatergic signaling is required for LepRb re-expression in the PMv to restore fertility. Given that the authors and others have shown that PMv neurons heavily express NOS1, the activity of which is known to heavily rely on glutamatergic NMDAR activation, the authors may want to contextualize their results in light of the recent study showing that NOS1 is found to be a new causative gene in people with congenital hypogonadotropic hypogonadism.

      Thank you for the advice, we have added a paragraph discussing the possible involvement of nNos from PMv neurons in the discussion.

      (5) Does the absence of vGlut2 have any impact on the obesity phenotype in mice where LepRb is selectively re-expressed in the PMv?

      We have followed the weight of these animals after the AAV injections. However, due to the difficulty of generating dual homozygous mice (LepRnull homozygous mice are infertile) and producing adequate stereotaxic injections with minimum contamination of adjacent nuclei, the groups could not be run all together and thus, we refrained from performing comparative analysis of energy balance. Analyses of body weight in LepRnull mice with reactivation of LepR in PMv neurons have been published before (Donato et al., 2011 using the Flp/Frt model and Mahany et al., 2018 using the Cre/loxP system). No difference in body weight was observed in either study. Below is the progression of body weight in mice with reactivation of LepR and deletion of Vglut2 in PMv neurons. We added a comment in this regard.

      Author response image 1.

      Reviewer #3 (Recommendations For The Authors):

      The authors examined the effects of glutamate release from PMv LepR neurons in the regulation of puberty and reproduction in female mice. Multiple genetic mouse models were utilized to either manipulate PMv LepR neuron activities, or to delete glutamate vesicle transporters from LepR neurons. The authors have been quite rigorous in validating these models and exploring potential contaminations. Most of the data presented are solid and convincing, and support the conclusion. This reviewer has the following suggestions for the authors to further improve this work and the manuscript.

      (1) The DREADD study had some issues. For example, "2 out of 7 control mice with no AAV showed an increase in LH...", indicating that LH increase may just happen randomly. More importantly, 45% of PMv-hit mice did not show LH response to CNO, making it hard to interpret the positive LH responses from the other 55% PMv-hit mice undergoing the same treatment. Overall, there are just too many variabilities in these DREADD data for anyone to come up with a clean and convincing conclusion. This reviewer suggests repeating these experiments or removing the DREADD data altogether. After all, the rest of the results are much more convincing and stand alone to support the role of glutamate release from these PMv LepR neurons.

      We appreciate the reviewer’s concern. Indeed, LH shows spontaneous pulsatility, which is one of the biggest challenges in our field. We have answered this concern for Reviewer 1 above and modified the text accordingly. We decided to keep the data in the publication because we believe that this is very important evidence supporting our observations, since this is the only experiment that approaches the role of the PMv in a free-moving, ad libitum fed mouse model that is not deficient for leptin signaling or glutamatergic neurotransmission. Altogether this paper strongly supports a role for glutamate signaling in leptin’s action in reproductive function. Evidence for this role had been dismissed or contested until now.

      (2) The mCherry signals in Figure 3 are of low quality and do not look like cell bodies.

      We have now equally increased the contrast and brightness in all higher magnification images of mCherry neurons (Fig 3F, G, I and J) to improve their visibility. The lower magnification images are high quality images of areas with high density of mCherry positive neurons. Thick sections (30µm) at low magnification compromise the focus at different Z-axis levels. We feel that images 3E and 3H are important to define the location of cells in the arcuate nucleus. Colocalization and mCherry expression are clear in high magnification images.

      (3) The validation of Vglut2 deletion in LepR neurons (Fig. 4A-D) is very nice and convincing, but the images are from the VMH region. Why not show the PMv region?

      As mentioned to Reviewer 1, we chose the VMH because it has a very dense collection of either LeprCre;VGlut2 or Vglut2 only cells and it illustrates very well the Vglut2 deletion at low and high magnifications. In the PMv, however, the distribution of these cells is sparse. The reviewer is correct that for the current study, the PMv is more relevant and therefore, we have included images of the PMv showing a control and a LeprCre-Vglut2floxed animal in higher magnification.

      (4) Figures 4-5 used LepR-Cre as controls, while Figure 6 used Vglut2flox as controls. Why? Also, how did the authors set up the breedings to generate "littermates" in each of these studies?

      We used the LepR-Cre as controls for our experiments since Cre homozygosity is needed for proper Cre expression and we had the LepR-Cre homozygous colony from the DREADDs experiment. Also, these mice had previously been thoroughly evaluated and no metabolic and/or reproductive disruptions were noticed (please see lines 213-214 of the original submission). However, our LepR-Cre colony had to be drastically reduced during COVID and suffered from unexpected Δ recombination leading to loss of Vglut2 homozygotes. To overcome these issues, we used VGlut2-floxed controls for the gene expression and GnRH immunoreactivity experiments. These mice had previously been used as controls for metabolic experiments with the LepCre-Vglut2fl genotype (Xu et al., 2013 Mol Metab), showing no deficiencies in the metabolic phenotype.

      As described in the methods section (lines 464-466 of the original preprint), to inactivate glutamate in leptin responsive cells, LepRb-Cre mice were crossed with mice carrying loxP-modified Vglut2 alleles. Our experimental mice were homozygous for the LepRb-Cre allele (LepRb<sup>cre/cre</sup>) and homozygous for the Vglut2-loxP allele (Vglut2<sup>fl/fl</sup>). Our controls consisted of mice homozygous for the Cre allele (LepRb<sup>cre/cre</sup>;Vglut2<sup>+/+</sup>, named LepRb-Cre) or homozygous for the Vglut2-loxP allele (LepRb<sup>+/+</sup>;Vglut2<sup>fl/fl</sup>, named Vglut2<sup>flox</sup>). Both experimental (LepRb<sup>cre/cre</sup>;Vglut2<sup>fl/fl</sup>, named LepRbΔVglut2) and control mice were derived from the same litters with parents homozygous for one of the genes and heterozygous for the other gene (LepRb<sup>cre/cre</sup>;Vglut2<sup>fl/+</sup> or LepRb<sup>cre/+</sup>;Vglut2<sup>fl/fl</sup>). Mice were genotyped at weaning (21 days) and again at the end of the experiments.
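      To make the littermate logic concrete, the sketch below computes expected Mendelian offspring frequencies for a two-locus cross. It assumes independent assortment of the two loci and, purely as an example, an intercross of two parents that are cre/cre at LepRb and fl/+ at Vglut2; the exact parental pairings used in the study are not specified beyond the genotypes listed above, and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross_locus(parent_a, parent_b):
    """Offspring genotype probabilities at one locus; each parent is a pair of alleles."""
    counts = Counter(tuple(sorted(pair)) for pair in product(parent_a, parent_b))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def cross(parent_a, parent_b):
    """Joint offspring probabilities over all loci, assuming independent assortment.

    Parents are dicts mapping locus name -> (allele, allele).
    Keys of the result are tuples of per-locus genotypes, in the parents' locus order.
    """
    per_locus = [cross_locus(parent_a[locus], parent_b[locus]) for locus in parent_a]
    offspring = {}
    for combo in product(*(d.items() for d in per_locus)):
        genotype = tuple(g for g, _ in combo)
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        offspring[genotype] = prob
    return offspring


if __name__ == "__main__":
    parent = {"LepRb": ("cre", "cre"), "Vglut2": ("fl", "+")}
    for genotype, prob in sorted(cross(parent, parent).items()):
        print(genotype, prob)
```

      Under these assumptions, such a litter contains both the experimental genotype (cre/cre; fl/fl) and the LepRb-Cre control genotype (cre/cre; +/+), each at an expected frequency of one quarter.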

      (5) The labeling of Figures 5E-F is missing, making it hard to read.

      We have confirmed that Figure 5E and F were mentioned in the figure legends and in the results text. To improve the analysis of the figure we have added the Y axis titles to Figure 5 C,D, E and F, previously only shown in Fig 5A and B.

      (6) The last experiment was very nice confirming the role of glutamate release from PMv LepR neurons. However, the key phenotypes (puberty development, pregnancy) were not graphed and only stated in the text.

      Thank you for your comment. Since the key result is that none of the LeprLoxTb;Vglut2flox animals showed vaginal opening or pregnancy, we don’t feel the need to graph this. All the details of the reproductive and metabolic phenotyping of the Lepr-loxTB with re-expression of LepR in the PMv were described in Mahany et al., 2018.

    1. Author Response

      The following is the authors’ response to the original reviews.

      This important study shows that two methods of sleep induction in the fly, optogenetic activation of the dorsal fan-shaped body (which is rapidly reversible and maintains a neuronal activity signature similar to wakefulness), and Gaboxadol-induced sleep (which shuts down neuronal activity), produce distinct forms of sleep and have different effects on brain-wide neural activity. The majority of the conclusions of the paper are supported by compelling data, but the evidence supporting the claim that the two interventions trigger distinct transcriptional responses is incomplete.

      Thank you for the helpful and detailed reviews. We feel that these have improved the manuscript considerably, and hopefully the additional figures in this Reply letter will help further convince our readers.

      Public Review

      In this study, Anthoney and coworkers continue an important, unique, and technologically innovative line of inquiry from the van Swinderen lab aimed at furthering our understanding of the different sleep stages that may exist in Drosophila. Here, they compare the physiological and transcriptional hallmarks of sleep that have been induced by two distinct means, a pharmacological block of GABA signaling and optogenetic activation of dorsal fan-shaped-body neurons. They first employ an incredibly impressive fly-on-the-ball 2-photon functional imaging setup to monitor neural activity during these interventions, and then perform bulk RNA sequencing of fly brains at different stages. These transcriptomic analyses lead them to (a) knock out nicotinic acetylcholine receptor subunits and (b) knock down AkhR throughout the fly brain, testing the impact of these genetic interventions on sleep behaviors in flies. Based on this work, the authors present evidence that optogenetically and pharmacologically induced sleep produces highly distinct brain-wide effects on physiology and transcription. The study is of significant interest, is easy to read, and the figures are mostly informative. However, there are features of the experimental design and the interpretation of results that diminish enthusiasm.

      a- Conditions under which sleep is induced for behavioral vs neural and transcriptional studies

      1- There is a major conceptual concern regarding the relationships between the physiological and transcriptomic effects of optogenetic and pharmacological sleep promotion, and the effects that these manipulations have on sleep behavior. The authors show that these two means of sleep-induction produce remarkably distinct physiological and transcriptional responses, however, they also show that they produce highly similar effects on sleep behavior, causing an increase in sleep through increases in the duration of sleep bouts. If dFB neurons were promoting active sleep, the sleep it produces should be more fragmented than the sleep induced by the drug, because the latter is supposed to produce quiet sleep. Yet both manipulations seem to be biasing behavior toward quiet sleep.

      This is a correct observation, which is already evident in our sleep architecture data (Figure 2E-H): chronic optogenetic sleep induction promotes longer sleep bouts that are similar in structure (bout number vs bout duration) to those produced by THIP feeding. Since our plots in Figure 2E-H follow the 5min sleep criterion cutoff, upon the Reviewer’s advice we re-analyzed our optogenetic experiments for short (1-5min) sleep. These are graphed below in Author response image 1. As can be seen, and as suspected by the Reviewer, the optogenetic manipulation does not increase the total amount of short sleep; indeed, it decreases it compared to baseline (these are for the exact same data as in Figure 2). Optogenetic sleep induction does not create a bunch of short sleep bouts.

      Author response image 1.

      Short sleep in optogenetic experiments. A. Average baseline (±SEM) 1-5min sleep across a day and night. B. Average (±SEM) 1-5min sleep in optogenetically activated flies, across a day and night.

      We agree with the reviewer that this observation might seem inconsistent with the idea that optogenetic activation promotes active sleep, and that short sleep is active sleep. However, it does not necessarily follow that optogenetic activation has to produce short sleep. Indeed, we know from our brain imaging data (and the associated behavioral analysis) that active sleep will persist for as long as we induce it with red light. While we have not induced it for longer than 15 minutes (Tainton-Heap et al, Current Biology, 2021; Troup et al, J. of Neuroscience, 2023), this is already clearly longer than a <5min sleep bout. So our interpretation is that the longer sleep bouts induced by optogenetic activation are prolonged active sleep, rather than quiet sleep. In other words, this artificial sleep manipulation induces prolonged active sleep, rather than many short sleep bouts. This is of course different from what happens during spontaneous sleep. We have tried to be clearer about sleep bout durations in the revised manuscript (e.g., the new Figure 3), and we now admit early in the results (lines 376-380) that we don’t know what optogenetic activation looks like in the fly brain beyond 15 minutes.
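      The distinction between short (1-5min) sleep and sleep meeting the 5min criterion can be sketched as follows. This is an illustrative example, not the analysis code used for Figure 2 or Author response image 1. It assumes a per-minute inactivity trace (1 = no movement in that minute) and treats bouts of at least 1 but fewer than 5 minutes as short sleep; the exact handling of the 5min boundary is an assumption.

```python
from itertools import groupby


def inactivity_bouts(inactive_per_min):
    """Durations (in minutes) of consecutive runs of inactive minutes."""
    return [
        sum(1 for _ in run)
        for is_inactive, run in groupby(inactive_per_min)
        if is_inactive
    ]


def split_sleep(inactive_per_min, criterion_min=5):
    """Summarize short sleep (1 to <criterion) and sleep meeting the criterion (>=criterion)."""
    bouts = inactivity_bouts(inactive_per_min)
    short = [d for d in bouts if 1 <= d < criterion_min]
    long_ = [d for d in bouts if d >= criterion_min]
    return {
        "short_total_min": sum(short),
        "short_bouts": len(short),
        "long_total_min": sum(long_),
        "long_bouts": len(long_),
    }


if __name__ == "__main__":
    # Hypothetical 12-minute trace: bouts of 2, 6 and 1 inactive minutes.
    trace = [1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1]
    print(split_sleep(trace))
```

      With this kind of split, a manipulation that lengthens bouts moves minutes from the short category into the criterion category, which is the pattern described above for optogenetic activation.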

      2- The authors show that the pharmacological block of GABA signaling and the optogenetic activation of dorsal fan-shaped-body neurons cause different responses on brain activity. Based on these recordings and the behavioral and brain transcriptomic data they then claim that these responses correspond to different sleep states and are associated with the expression and repression of a different constellation of genes. Nevertheless, neural activity in animals was recorded following short stimulations whereas behavioral and transcriptomic data were obtained following chronic stimulation. In this regard, it would be interesting to determine how the 12-hour pharmacological intervention they employed for their transcriptomic analysis changes neural activity throughout the brain - 12 hours will likely be too long for the open-cuticle preps, but an in-between time-point (e.g. 1h) would probably be equally informative.

      The longest we’ve imaged brain activity for optogenetic sleep induction is 15 minutes, as discussed above. We see no changes in activity across this time, which would normally have led to a quiet sleep stage in spontaneous sleep recordings. Whole-brain imaging after 10 hours of optogenetic sleep induction (our RNA collection timepoint) is not realistic, and even 1 hour is difficult. We have however conducted overnight electrophysiological recordings (with multichannel silicon probes), where we activated the same R23E10 neurons for successive 20-minute bouts (alternating with 20min of no red light). We are preparing this work for publication (Van De Poll, et al). We see no evidence of optogenetic activation of this circuit ever producing anything resembling quiet sleep. Since we are not in a position to provide this new electrophysiological data in the current study, we are careful to clarify that we have not investigated what brain imaging looks like after chronic optogenetic activation (lines 376-380). We are showing through diverse lines of evidence that what is called sleep can look different in flies.

      b- Efficiency of THIP treatment under different conditions

      1- There are no data to quantify how THIP alters food consumption. It is evident that flies consume it otherwise they would not show increased sleep. However, they may consume different amounts of food overall than the minus THIP controls. This might have an influence on the animal's metabolism, which could at least explain the fact that metabolism-related genes are regulated (Figure 5). Therefore, in the current state, it is not possible to be certain that gene regulation events measured in this experiment are solely due to THIP effects on sleep.

      We have two arguments against this reasonable criticism. First, as discussed above, the optogenetic flies are sleeping at least as much as the THIP-fed flies, so in principle they also might be feeding less. But we see no metabolic gene downregulation in the optogenetic dataset. We include this counterargument in the discussion (lines 752-756). Second, together with our co-author Paul Shaw, we have shown that THIP-fed flies are not eating less compared to controls (Dissel et al, Current Biology, 2015), by tracking dye consumption. We show those results again below in Author response image 2 to support our reasoning that feeding is not an issue.

      Author response image 2.

      Flies were fed blue dye in their food while being sleep deprived (SD), or while being induced to sleep with 0.1mg/ml THIP in their food, or both. Dye consumption was measured in triplicate for pooled groups of 16 flies. Average absorbance at 625nm (± standard deviation) is shown. Experiments were not significantly different (ANOVA of means).

      2- A similar problem exists in the sleep deprivation experiments. If flies are snapped every 20 seconds, they may not have the freedom to consume appropriate amounts of food, and therefore their consumption of THIP or ATR may be smaller than in non-sleep deprived controls. Thus, it would be crucial to know whether the flies that are sleep-deprived (i.e. shaken every 20 seconds for 12 hours) actually consume comparable amounts of food (and therefore THIP) as those that are undisturbed. If not, then perhaps the transcriptional differences between the two groups are not sleep-specific, but instead reflect varying degrees of exposure to THIP.

      Please see our response to the similar critique above, and how Author response image 2 addresses this concern.

      3- The authors should further discuss the slow action of THIP perfusion vs dFB activation, especially as flies only seem to fall asleep several minutes after THIP is being washed away. Is it a technical artifact? If not, it may not be unreasonable to hypothesize that THIP, at the concentration used, could prevent flies from falling asleep, and that its removal may lower the concentration to a point that allows its sleep-promoting action. The authors could easily test this by extending THIP treatment for another 4-5 minutes.

      The reviewer is partially correct in suggesting a technical artifact: THIP does not get washed away immediately after 5min of perfusion. The drip system we employ means that THIP concentration will slowly increase to the maximum concentration of 0.2mg/ml, and then slowly get diluted away at a rate of 1.25ml/minute (this is all in the Methods). In a previous study (Yap et al, Nature Communications, 2017) we used this exact same perfusion procedure to test a range of THIP concentrations, and settled on 0.2mg/ml as the lowest that reliably induced quiet sleep within 5 minutes. Higher concentrations induced quiet sleep faster, so the alternate explanation proposed by the Reviewer is not supported. We feel that our previous electrophysiological study provided the necessary groundwork for using the same approach and dosage here for our whole-brain imaging readout.
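      The delayed rise and slow wash-out described above can be pictured with a simple well-mixed chamber model. This is a sketch under stated assumptions, not a description of the actual rig: the chamber volume (`volume_ml`) is a hypothetical value, and complete mixing is assumed. Only the 0.2mg/ml inflow concentration, the 1.25ml/minute rate and the 5min perfusion come from the text.

```python
import math


def chamber_concentration(t_min, c_in=0.2, flow_ml_min=1.25,
                          volume_ml=2.0, perfusion_min=5.0):
    """THIP concentration (mg/ml) in a well-mixed chamber at time t_min.

    Drug enters at c_in for perfusion_min minutes, then drug-free saline
    flows at the same rate. volume_ml is a hypothetical chamber volume.
    """
    rate = flow_ml_min / volume_ml  # exchange rate constant, per minute
    if t_min <= 0:
        return 0.0
    if t_min <= perfusion_min:
        # Wash-in: exponential approach toward the inflow concentration.
        return c_in * (1.0 - math.exp(-rate * t_min))
    # Wash-out: exponential decay from the level reached at the end of perfusion.
    c_end = c_in * (1.0 - math.exp(-rate * perfusion_min))
    return c_end * math.exp(-rate * (t_min - perfusion_min))


if __name__ == "__main__":
    for t in (0, 1, 3, 5, 7, 10):
        print(f"t = {t:2d} min: {chamber_concentration(t):.3f} mg/ml")
```

      In such a model the drug is still present for several minutes after the 5min perfusion ends, which is consistent with flies falling asleep during the nominal wash-out period.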

      c- Comments regarding the behavioral assays

      1- L319-322: the authors conclude that dFB stimulation and THIP consumption have similar behavioral effects on sleep. However, this is inaccurate as in Figure S1 they explain that one increases bout number in both day and night and the other one only during the day.

      We have now added a caveat about night bout architecture being different (lines 353-356). Figure S1 is now Figure 3.

      2- The behavioral definitions used for active and quiet sleep do not fit well with strong evidence that deep sleep (defined by lowered metabolic rates) is probably most closely associated with bouts of inactivity that are much longer than the >5min duration used here, i.e., probably 30min and longer (Stahl et al. 2017 Sleep 40: zsx084). Given that the authors are providing evidence that quiet sleep is correlated with changes in the expression of metabolism related genes, they should at least discuss the fact that reductions in metabolism have been shown to occur after relatively long bouts of inactivity and might reconsider their behavioral sleep analysis (i.e., their criteria for sleep state) with this in mind.

      Interestingly, induced sleep bout durations are on average longer for the optogenetic manipulation (40min vs 25min); this was evident in Figure S1C vs S1F (now Figure 3). So as discussed above, this provides a counterargument for sleep bout duration alone being indicative of metabolic processes associated with quiet sleep: the optogenetic dataset did not uncover metabolic-related pathways as relevant to that sleep manipulation. We refer to Stahl et al, Sleep, 2017, in our discussion (lines 748-750), making exactly this point about metabolic rates being decreased in longer sleep bouts, and following up with our observation that optogenetic flies sleep just as much, and their bouts are actually longer. So clearly different processes must be involved.

      d- Comments regarding the recordings of neuronal activity

      1- There is an additional concern regarding the proposed active and quiet sleep states that rest at the heart of this study. Here these two states in the fly are compared to the REM and NREM sleep states observed in mammals and the parallels between active fly sleep and REM and quiet fly sleep and NREM provide the framework for the study. The establishment of such parallel sleep states in the fly is highly significant and identifying the physiological and molecular correlates of distinct sleep stages in the fly is of critical importance to the field. However, the proposal that the dorsal fan shaped body (dFB) neurons promote active sleep runs counter to the prevailing model that these neurons act as a major site of sleep homeostasis. If quiet sleep were akin to NREM, wouldn't we expect the major site of sleep homeostasis in the brain to promote it? Furthermore, the authors state that the effects of dFB neuron excitation on transcription have "almost no overlap" (line 500) with the transcriptomic effects of sleep deprivation (Supplementary Table 3), which is not what would be expected if dFB neurons are tracking sleep pressure and promoting sleep, as suggested by a growing body of convergent work summarized on page four of the manuscript. Wouldn't the 10h excitation of the dFB neurons be predicted to mimic the effects of sleep deprivation if these neurons "...serve as the discharge circuit for the insect's sleep homeostat..." (line 60)? Shouldn't their prolonged excitation produce an artificial increase in sleep drive (even during sleep) that would favor deep, restorative sleep? How do the authors interpret their results with regard to the current prevailing model that dFB neurons act as a major site of sleep homeostasis? This study could be seen as evidence against it, but the authors do not discuss this in their Discussion.

      These are all excellent and thoughtful points, which have made us re-think parts of our discussion. First off, the potential comparison with REM and NREM is entirely speculative, and we have tried to make that more obvious in the introduction and the discussion (e.g., see lines 43, 708, 818). The evidence that the FB neurons (and maybe others) are involved in the homeostatic regulation of sleep is well-supported in the literature, so that part of the discussion holds. However, we concede that the timing of our sleep manipulations could benefit from more explanation. We conducted these during the flies’ subjective day, after the animals had presumably had a good night’s sleep. This means that we induced either kind of sleep for 10 daytime hours, which presumably replaced whatever behavioural states would ‘naturally’ be happening during the day. Female flies sleep less during the day than at night, and we have shown in previous work that daytime sleep quality is different than night-time sleep (van Alphen et al, Journal of Neuroscience, 2013), leading us to suggest that most ‘deep’ or quiet sleep happens at night, for flies. Following this reasoning, daytime optogenetic activation might not be depriving flies of much quiet sleep, or accumulating a deep sleep drive as the Reviewer proposes. Rather, both induced sleep manipulations could be providing 10 hours of either kind of sleep that the flies don’t really ‘need’. Why did we design it this way? Firstly, we were interested in simply asking what these chronic sleep manipulations do to gene expression in rested flies, and how they might be similar or different. We focussed on daytime manipulations to avoid precisely the confound of sleep pressure, and also because we observed red-light artifacts at night for our optogenetic experiments (which we reported).
Our sleep deprivation strategy was designed specifically as a control for the THIP (Gaboxadol) experiments, to control for non-sleep related effects of the drug (see below our rationale for why this was less crucial for the optogenetic experiments). In conclusion, we had a logical rationale for how the experiments were done, centred on the straightforward question of whether these two different approaches to sleep induction were having similar effects in well-rested flies. In retrospect, we were not anticipating the Reviewer’s thoughtful logic regarding the dFB’s potential role in also regulating deep sleep homeostasis. We now provide some discussion along these lines to make readers aware of this line of reasoning, as well as our rationale for why prolonged optogenetic sleep induction was not sleep-depriving (lines 768-777).

      2- Regarding the physiological effects of Gaboxadol, to what extent is the quieting induced by this drug reminiscent of physiology of the brains of flies spontaneously meeting the behavioral criterion for quiet sleep? Given the relatively high dose of the drug being delivered to the de-sheathed brain in the imaging experiments (at least when compared to the dose used in the fly food), one worries that the authors may be inducing a highly abnormal brain state that might bear very little resemblance to the deeply sleeping brain under normal conditions. As the authors acknowledge, it is difficult to compare these two situations. Comparing the physiological state of brains put to sleep by Gaboxadol and brains that have spontaneously entered a deep sleep state therefore seems critical.

      As discussed above, our Gaboxadol (THIP) perfusion concentration (0.2mg/ml) was the minimal dosage that effectively induced sleep within 5 minutes, based upon previously published work (Yap et al, Nature Communications, 2017). Lower concentrations were unreliable, with some never inducing sleep at all. Comparisons with feeding THIP are tenuous, and we make that clear in our discussion (lines 731-735). Nevertheless, the Reviewer makes an excellent point about comparisons with spontaneous ‘quiet’ sleep. Here, we feel well supported (please see Author response image 3 below, comparing THIP-induced sleep (this work, B) and spontaneous sleep (A) from our previous study). In our previous study (Tainton-Heap et al, 2021) we showed that neural activity and connectivity decrease during spontaneous quiet sleep. This is what we also see with THIP perfusion. In contrast, in Troup et al, J. of Neuroscience (2023) we confirm that neither neural activity nor connectivity changes during optogenetic R23E10 activation, and general anesthesia – unlike THIP – does NOT produce a quiet brain state. Our finding that THIP effects are nothing like general anesthesia (at the level of brain activity levels) suggests a physiological sleep state closer to spontaneous quiet sleep. We elaborate on this important observation in our results, also pointing to crucial differences with general anesthesia (lines 411-415).

      Author response image 3.

      THIP-induced sleep resembles quiet spontaneous sleep. A. Calcium imaging data from spontaneously sleeping flies, taken from Tainton-Heap et al, 2021. Left, percent neurons active; right, mean degree, a measure of connectivity among active neurons. Both measures decrease during later stages of sleep. B. Calcium imaging data from flies induced to sleep with 5min of 0.2mg/ml THIP perfusion (this study). Left, percent neurons active; right, mean degree. Both measures are significantly decreased, resembling the later stages of spontaneous sleep, which we have termed ‘quiet sleep’. Hence THIP-induced sleep resembles quiet sleep. Note that the genetic background is different in A and B, hence the different baseline activity levels.

      3- There are some issues with Figure 3, in particular 3C-D. It is not clear whether these panels show representative traces or an average, however both the baseline activity and fluorescence are different between C and D, in particular in their amplitude. Therefore, it is difficult to attribute the differences between C and D to the stimulation itself or to the previously different baseline. In addition, the fact that flies with dFB activation seem to keep a basal level of locomotor activity whereas THIP-treated ones don't is quite striking, however it is not being discussed. Finally, the authors claim that the flies eventually wake up from THIP-induced sleep (L360-361), however there are no data to support this statement.

      These are representative traces, which is a way of showing the raw calcium data (Cell ID) so readers can see for themselves that one manipulation silences whereas the other does not – even though flies become inactive for both. The Y-axis scale is standard deviation of the experiment mean. Since THIP decreases neural activity, then the baseline is comparatively higher. Since optogenetic activation does not change average neural activity levels, the baseline is centered on zero. This is an outcome of our analysis method and does not reflect any ‘true’ baseline. We have now clarified this in our figure legend. We now also confess that flies rendered asleep optogenetically can be ‘twitchy’ (line 374). Finally, we show data for 3 flies that were recorded until they woke up. The rest were verified behaviorally, after the experiment. This is now explained in the Methods.

      4- In Figure 4C, it is strange that the SEM is always exactly the same across the whole experiment. Readers should be aware that there might have been an issue when plotting the figure.

      This is not a mistake, the standard errors are just all quite close (between 0.17 and 0.22). This is because of the way we did the analysis, asking how many flies responded to each stimulus event, with incremental levels of responsiveness. This is explained in the Methods. The figure makes the important point of sleep and recovery.

      e- Comments regarding the transcript analyses

      1- General comment: the title of this manuscript is inaccurate - the "transcriptome" commonly refers to the entirety of all transcripts in a cell/tissue/organ/animal (including genes that are not differentially expressed following their interventions), and it is therefore impossible to "engage two non-overlapping transcriptomes" in the same tissue. Perhaps the words "transcriptional programs" or "transcriptional profiles" would be more accurate here?

      We thank the Reviewer for this advice and have changed the title as proposed.

      2- Given the sensitivity of transcriptomic methods, there is a significant concern that the optogenetic experiments are not as well controlled as they could be. Given the need for supplemental all-trans retinal (ATR) for functional light gating of channelrhodopsins in the fly, it is convenient to use flies with Gal4-driven opsin that have not been given supplemental ATR as a negative control, particularly as a control for the effects of light. However, there is another critical control to do here. Flies bearing the UAS-opsin responder element but lacking the GAL4 driver and that have been fed ATR are critical for confirming that the observed effects of optogenetic stimulation are indeed caused by the specific excitation of the targeted neurons and not due to leaky opsin expression, or the effect of ATR feeding under light stimulation or some combination of these factors. Given the sensitivity of transcriptomic methods, it would be good to see that the candidate transcripts identified by comparing ATR+ and ATR- R23E10GAL4/UAS-Chrimson flies are also apparent when comparing R23E10GAL4/UAS-Chrimson (ATR+) with UAS-Chrimson (ATR+) alone.

      We have not done these experiments on UAS-Chrimson/+ controls. Like many others in our field, we viewed non-ATR flies as the best controls, because this involves identical genotypes. Since we were however aware that ATR feeding itself could affect gene expression, we specifically checked for this with our early (1-hour) collection timepoint. We only found 26 gene expression differences between ATR and -ATR flies at this early timepoint, compared with 277 for the 10-hour timepoint. We detail this rationale in our results, explaining why this is a convincing control for ATR feeding. If there was leaky opsin expression / activity, this would have been evident in our design. Regarding the cumulative effect of light, this would also have been accounted for in our design, as only 1 hour would have elapsed in our first timepoint compared to 10 hours in our second. While the Reviewer is correct in saying that parental controls are called for in many Drosophila experiments, this becomes quickly unmanageable in transcriptomic studies, which is exactly why well-designed +ATR vs -ATR comparisons in the exact same strain are most appropriate. We feel that our 1-hr timepoint mostly addresses this concern.

      3- Figures about qPCR experiments (5G and 6G) are problematic. First, whereas the authors seem satisfied with the 'good correspondence' between their RNA-seq and qPCR results, this is true for only ~9/19 genes in 5G and 2/6 genes in 6G. Whereas discrepancies are not rare between RNA-seq and qPCR, the text in L460-461 and 540-541 is misleading. In addition, it is unclear whether the n=19 in L458 refers to the number of genes tested or the number of replicates. If the qPCR includes replicates, this should be more clearly mentioned, and error bars should be added to the corresponding figures.

      We consider that our qPCR validations were convincing, as they were all mostly changed in the ‘right’ direction. We agree that there are some discrepancies, so have modified our language to reflect this. We have also clarified that 19 refers to the number of genes validated by qPCR in that THIP dataset. All qPCRs involved three technical replicates. We prefer to keep these histograms the way they are to convey these simple trends. For complete transparency, we now provide a supplemental Excel worksheet with all of the qPCR data, alongside corresponding RNAseq data and stats for the selected genes (Supplementary Table 9).

      4- There is a lack of error bars for all their RNAseq and qPCR comparisons, which is particularly surprising because the authors went to great lengths and analyzed an applaudably large amount of independent biological replicates, yet the variability observed in the corresponding molecular data is not reported.

      The genes reported in each of our datasets and associated supplemental figures and tables were all significant, as determined by criteria outlined in the Methods. However, we appreciate that readers might want to get a sense of the values and variances involved, as well as access to the entire gene datasets. We now provide all of these as additional ‘sheets’ in our existing supplemental tables (S2-S7), so this should be very easy to navigate and evaluate. In addition to the previously provided lists for significant genes, in the second Excel sheet (‘All genes’) readers will be able to see the data for all 5 replicates, for the significant genes as well as all other ~15,000 genes (listed in alphabetical order). We feel that this will be a helpful resource, because admittedly significance thresholds can still be a little arbitrary and some readers might want to look up ‘their’ genes of interest.

      Comments to authors

      Other comments

      1- Text in L441 & 606 is misleading. According to ref 52, AkhR is involved specifically in starvation-induced sleep loss, and not in general sleep regulation.

      Corrected.

      2- The language used in L568-570 and 573-574 is confusing. The authors should specify that the knock down of cholinergic subunits, rather than the subunits themselves is what causes sleep to increase or decrease.

      Corrected.

      3- The authors' investigation of cholinergic receptor subunits function is very preliminary, and it is difficult to draw any conclusion from what is presented here. In particular, their behavioral data is difficult to reconcile with the RNA-seq data showing overexpression of both short sleep increasing and short sleep decreasing subunits. Without knowing where in the brain these subunits are required for controlling sleep, the data in Figure 7 is difficult to appreciate.

      We have now conducted additional experiments where we specifically knocked down these alpha receptor subunits (all 7 of them) in the R23E10 neurons. This seemed an obvious knockdown location, to determine if any of these subunits regulated activity in the same sleep promoting neurons that were the focus of this study. We found that alpha1 knockdown in these neurons had similar sleep phenotypes, which we believe is an important result. Since this functional localisation is a logical ending for the paper, we have now made it the final figure.

      Suggestions & comments

      1- It would be interesting if the authors could discuss their findings that metabolism genes are downregulated in THIP flies in the context of recent work that showed upregulation of mitochondrial ROS after sleep deprivation (Kempf et al, 2019).

      We now add the Kempf 2019 reference and allude to how those findings could be consistent with ours.

      2- The fact that THIP-induced sleep persists long after THIP removal (Fig 3D) is very intriguing and interesting. This suggests that the drug might trigger a sleep-inducing pathway that can continue on its own without the drug, once activated.

      This is correct, and in stark contrast to the optogenetic manipulation we employ, which does not appear to show such sleep inertia. We have now added a sentence highlighting this interesting difference (lines 394-396).

      3- The authors identify many new genes regulated in response to specific methods for sleep induction. These are all potentially interesting candidates for further studies investigating the molecular basis of sleep. It would be interesting to know which of these genes are already known to display circadian expression patterns.

      By providing all of the gene lists, these are now available to ask questions such as these. We hesitate however to delve into this domain for this work, as our main goal was to compare these two kinds of sleep in flies.

      4- The brain-wide monitoring of neural activity invites a number of very exciting follow-up experiments - most importantly, it would be fascinating to establish, which neurons are active in the different phases the authors describe! Are these neurons that are involved in transmitting external visual stimuli to the central brain? Do they also project into the central complex? They could make use of the large collection of existing driver lines in the fly and they could also exploit the extraordinary knowledge of the connectome and transcriptome of the fly brain.

      Thank you for sharing our enthusiasm for these likely future directions.

      5- The Dalpha2,3,4,6 and 7 Knock-out strains they generate will be a useful reagent for the Drosophila neuroscience community once the efficiency/success of the knock-out has been confirmed by qPCR.

      These knockout strains have all been confirmed by our co-authors Hang Luong, Trent Perry, and Philip Batterham. These knockout confirmations are outlined in publications that we reference (Perry et al, 2021).

      Materials and methods:

      1- This study has employed custom-built apparatus and custom-written code/scripts, but these do not appear to be available to the reader. For the sake of replicability, the authors should make these available.

      The code/scripts are available via the University of Queensland research data management system as described in the Methods, and can be sent by the Lead Contact. The imaging hardware and analysis code are identical to what was described in a previous publication, and available as directed therein (Tainton-Heap et al, 2021).

      2- Also, the authors should give details on the food used to rear their flies. Fly media comes in several common forms and sleep is sensitive to diet.

      This has now been elaborated in the beginning of the Methods.

      3- The light regime used for optogenetic excitation of dFB neurons consists of 12h of uninterrupted bright red LED light. Most optogenetic stimulations consist of pulsed high frequency flashes interlaced with pauses in illumination. Can dFB neurons be driven constitutively with 12 hours of bright light?

      We showed in Tainton-Heap (2021) that 7Hz pulsed red light had exactly the same effect on R23E10/Chrimson readouts as continuous red light, which is why we opted here to provide continuous red light. That optogenetic sleep induction can be driven continuously for 12 hours is evident by our 24-hour sleep profiles. However, we agree that one could question whether sleep quality is similar after 12 hours. To address this, we did an additional experiment where we stimulated the flies hourly, to determine if their behavioural responsiveness to mechanical stimuli changed over the course of continued sleep induction, for both optogenetic and THIP-induced sleep. We present the data below in Author response image 4. As can be seen in these new analyses, while optogenetic sleep induction persists across 12 daytime hours (speed is close to zero throughout), flies do indeed become more responsive later in the day. This could have two different interpretations: either some sleep functions are being satisfied over time, or the activation regime is becoming less effective over time. Either way, these data show that at our 10-hour daytime timepoint, unstimulated flies are still largely inactive, even though their arousal thresholds might have gradually changed; so the uninterrupted red-light regime is still effective. The comparison with THIP is interesting: here there does not seem to be a change in responsiveness over time; the drug just decreases behavioral responsiveness throughout. Together, these experiments support our view that both approaches are sleep-promoting throughout the 12-hour day, although we appreciate that sleep quality is not identical.

      Author response image 4.

      A) The average speed of baseline (grey) and optogenetically-activated flies (green) across 24 hours. Red dots indicate vibration stimulus times. B) The average speed of control (grey) and THIP-fed flies (blue) across 24 hours. Flies are all R23E10/Chrimson. n=87 for optogenetic, n=88 for -THIP, n=85 for +THIP.

      4- The authors use the SNAP apparatus to prevent THIP-treated flies from sleeping to tease out possible sleep-independent effects. This is an excellent control. Why have the authors not done the same with the optogenetic treatment? It's surprising not to see this control given the concern the authors express (lines 501 - 502) that the dFB manipulation might be paralyzing awake flies, which certainly seems possible given the light regimes used. Why not test this directly with SNAP?

      We appreciate that this may have been a valuable additional control. However, we designed this control for the THIP experiments specifically because of concerns about THIP’s (yet unknown) mechanism of action in flies. THIP is a GABAergic drug with most likely many off-target effects that have little to do with sleep, hence the need for a control where we compare to flies that ingested THIP but have been prevented from sleeping. In contrast, R23E10-driven sleep induction is exactly that: a circuit that, when activated, induces sleep. Whatever specific neurons might really be involved, the Gal4 circuit is sleep-inducing. This is well supported by multiple publications. The most appropriate control for assessing transcriptomic effects during optogenetic sleep here is not preventing sleep, but rather no increased sleep in flies that have not ingested ATR, and comparing that to effects of ATR alone, which is what we have done. Adding a sleep-deprivation layer onto both of these analyses may have been interesting, but would have required a lot more analyses and is not strictly required to identify relevant sleep-related genes. We have rephrased the misleading sentence about paralyzing flies, to instead clarify that lack of overlap with the SD dataset suggests that optogenetic activation is not preventing sleep functions from being engaged.

      5- A pairwise comparison of ZT01 and ZT10 does not address circadian expression cycles in a meaningful way. There will be strong effects of the LD cycle here. I suggest toning this down. (Though it is gratifying to see the expected changes in the core clock genes.)

      We have changed the language from ‘circadian’ to ‘light-dark’ to address this, although have kept the word ‘circadian’ when referring specifically to genes such as per, clock, timeless, etc.

      6- Line 109: There is a reference missing.

      We now provide the relevant reference.

      Results

      1- General comment regarding the figures: a general effort could be made to improve the design and quality of the figures and make them more readable. There are a lot of issues such as stretched or misaligned text, badly drawn frames, etc.

      We think we know which figures this might relate to (e.g., Figures 3,4B), so we have adjusted where appropriate.

      2- Instead of 'dFB-induced' (e.g., L77) it would be more accurate to use 'optogenetically-induced'

      Thank you for this helpful advice. We have changed our language throughout to say ‘optogenetically-induced’.

      3- Figure S1 should be integrated in the main figure to make the quantification more easily accessible.

      We have integrated Figure S1 into the main figures. It is now Figure 3.

      5- It would be good to include red light controls in Figure 2C, E, G.

      Making Figure S1 a main figure has better highlighted the fact that we have done red light controls (‘baseline’).

      6- line 313: Fig2E-H - these graphs would benefit if the authors made it more obvious where the maximum sleep amount would fall - i.e. the combination of bouts and minutes that add up to 12 hours (and therefore the entire day/night)

      If a fly were to sleep uninterrupted for all 12 hours of a day or night, that would amount to a sleep bout 720 minutes long. We do not feel that identifying this maximum on these graphs would be helpful. It should be clear from the data that a floor is reached with very few sleep bouts exceeding 60 minutes in our paradigm. To help orient the reader though, we now clarify in the figure legend that the maximum is 720 minutes or 12 hours.

      7- Fig. 2B, D: It was not clear why the authors took the 3-day average here. Doesn't that lead to a whole range of very different behaviors? I could, perhaps naively, imagine that a fly's behavior changes after 2 days of almost-permanent sleep?

      We took the 3-day average because the effect of THIP on each successive day was not significantly different (see Author response image 5, below). Flies wake up enough to have a good feed (see Author response image 2) and then go back to sleep. Since this is however an important point raised by the reviewer, we now mention in the Methods that sleep duration was not different among the 3 averaged days and nights (lines 193-195).

      Author response image 5.

      Data from THIP feeding experiment (Figure 2B) in manuscript, separated into 3 successive days and nights, with THIP-fed flies (blue) compared to controls (white). Averages ± SD are shown; sample sizes are the same as in Figure 2D. No THIP data was significantly different across days and nights (ANOVA of means).

      8- In Figure 2C the authors compare optogenetically induced to "spontaneous sleep," which I think refers to baseline sleep before stimulation, according to the figure. I think the proper comparison would be to the red light control (ATR-); though see the comment above regarding optogenetic controls).

      This information was provided in Figure S1. We now provide it as a main Figure 3, as requested above.

      We also made a point about red light having an effect at night, which is why we focussed on daytime effects for our transcriptomic comparisons. We feel that the ATR-fed flies (minus red light) are an appropriate control here for optogenetically-induced sleep: same exact genotype and ATR feeding, just no optogenetic activation. We therefore would prefer to keep these graphs as they are, especially since we show -ATR data subsequently.

      9- Figures 3A and 4A are redundant; Figure 3B has some active ROIs that are outside of the brain. I am not sure how this is possible?

      We have removed the redundant 4A and replaced it with the THIP molecule to clearly signal what this figure is focussed on. In Figure 3B (now 4B), the brain mask is a visual estimate made from the middle of the image stack. Some neurons in other layers are outside this single-layer estimate. All neurons were accounted for.

      10- Figure 4B is confusing. It took me a while to understand and so it can do with re-drawing in a more accessible way.

      We agree that this was confusing, e.g. there were too many arrows. We have redrawn and simplified (Now 5A).

      11- The authors state that flies wake up from THIP-induced sleep on the ball, but in Figure 4D there appears to be fewer samples for flies who have woken up from THIP (3) compared to those observed before THIP administration. Are flies dying?

      None of the flies died. Most flies were removed from imaging to confirm recovery, while 3 were left in our imaging setup to measure brain activity upon recovery. These results are in Figure 5C and now clarified in the Methods.

      12- Fig5C,D: I'm surprised that by far the most significant changes (in terms of log2-FC and p-val) occur in the sleep-deprived flies? It is not clear to me what the authors mean by effects that "relate waking process"? Perhaps they could elaborate on this?

      We have removed the phrase ‘relates to waking processes’. We now also remark on the high level of fold-change in many of these genes but refrain from discussing this further in the results. It is interesting though.

      13- The sentence in L425-428 is unclear - it would be good to rephrase this.

      We have rephrased this sentence; hopefully it’s clearer now.

      14- Text in L544-545 is confusing. What do you mean by 'less clear'?

      We have replaced ‘less clear’ with ‘not dominated by a single category’.

      15- It is unclear what is the control in Fig 7A. It would be good to mention what strain was used.

      Different knockout strains had different controls. These are identified in the figure legend and Methods.

      16- L579-581: it would be helpful to include this data in a supplementary figure.

      We now provide this as a supplementary figure as requested (Supplementary Figure 6).

      17- There is no information about R57C10 in the methods - it would be good to explain which neurons this line labels, and why you chose it.

      We now clarify in the methods that R57C10-Gal4 is a pan-neural driver, and provide a reference.

      18- Table S5 - If I'm not mistaken then the first line should say 1h, not 10h.

      Corrected

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for helping us improve our article and software. The feedback that we received was very helpful and constructive, and we hope that the changes that we have made are indeed effective at making the software more accessible, the manuscript clearer, and the online documentation more insightful as well. A number of comments related to shared concerns, such as:

      • the need to describe various processing steps more clearly (e.g. particle picking, or the nature of ‘dust’ in segmentations)

      • describing the features of Ais more clearly, and explaining how it can interface with existing tools that are commonly used in cryoET

      • a degree of subjectivity in the discussion of results (e.g. about Pix2pix performing better than other networks in some cases.)

      We have now addressed these important points, with a focus on streamlining not only the workflow within Ais but also making interfacing between Ais and other tools easier. For instance, we explain more clearly which file types Ais uses and we have added the option to export .star files for use in, e.g., Relion, or meshes instead of coordinate lists. We also include information in the manuscript about how the particle picking process is implemented, and how false positives (‘dust’) can be avoided. Finally, all reviewers commented on our notion that Pix2pix can work ‘better’ despite reaching a higher loss after training. As suggested, we included a brief discussion about this idea in the supplementary information (Fig. S6) and used it to illustrate how Ais enables iteratively improving segmentation results. 

      Since receiving the reviews we have also made a number of other changes to the software that are not discussed below but that we nonetheless hope have made the software more reliable and easier to use. These include expanding the available settings, slight changes to the image processing that can help speed it up or avoid artefacts in some cases, improving the GUI-free usability of Ais, and incorporating various tools that should help make it easier to use Ais with remote data (e.g. doing annotation on an office PC, but model training on a more powerful remote PC). We have also been in contact with a number of users of the software, who reported issues or suggested various other miscellaneous improvements, and many of whom had found the software via the reviewed preprint.

      Reviewer 1 (Public Review):

      This paper describes "Ais", a new software tool for machine-learning-based segmentation and particle picking of electron tomograms. The software can visualise tomograms as slices and allows manual annotation for the training of a provided set of various types of neural networks. New networks can be added, provided they adhere to a Python file with an (undescribed) format. Once networks have been trained on manually annotated tomograms, they can be used to segment new tomograms within the same software. The authors also set up an online repository to which users can upload their models, so they might be re-used by others with similar needs. By logically combining the results from different types of segmentations, they further improve the detection of distinct features. The authors demonstrate the usefulness of their software on various data sets. Thus, the software appears to be a valuable tool for the cryo-ET community that will lower the boundaries of using a variety of machine-learning methods to help interpret tomograms. 

      We thank the reviewer for their kind feedback and for taking the time to review our article. On the basis of their comments, we have made a number of changes to the software, article, and documentation that we think have helped improve the project and render it more accessible (especially for interfacing with different tools, e.g. the suggestions to describe the file formats in more detail). We respond to all individual comments one-by-one below.

      Recommendations:

      I would consider raising the level of evidence that this program is useful to *convincing* if the authors would adequately address the suggestions for improvement below.

      (1) It would be helpful to describe the format of the Python files that are used to import networks, possibly in a supplement to the paper. 

      We have now included this information in both the online documentation and as a supplementary note (Supplementary Note 1). 

      (2) Likewise, it would be helpful to describe the format in which particle coordinates are produced. How can they be used in subsequent sub-tomogram averaging pipelines? Are segmentations saved as MRC volumes? Or could they be saved as triangulations as well? More implementation details like this would be good to have in the paper, so readers don't have to go into the code to investigate. 

      Coordinates: previously, we only exported arrays of coordinates as tab-separated .txt files, compatible with e.g. EMAN2. We have now added a selection menu where users can specify whether to export either .star files or tab-separated .txt files, which together we think should cover most software suites for subtomogram averaging.

      Triangulations: We have now improved the functionality for exporting triangulations. In the particle picking menu, there is now the option to output either coordinates or meshes (as .obj files). This was previously possible in the Rendering tab, but with the inclusion in the picking menu exporting triangulations can now be done for all tomograms at once rather than manually one by one.

      Edits in the text: the output formats were previously not clear in the text. We have now included this information in the introduction:

      “[…] To ensure compatibility with other popular cryoET data processing suites, Ais employs file formats that are common in the field, using .mrc files for volumes, tab-separated .txt or .star files for particle datasets, and the .obj file format for exporting 3D meshes.”
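
      For illustration, the two coordinate formats named above could be written along the following lines. This is a minimal sketch and not the Ais export code: the function names, the number formatting, and the choice of the RELION-style column labels `_rlnCoordinateX/Y/Z` are assumptions made for this example, and the coordinates are made up.

```python
import io


def write_tsv(coords, handle):
    """Write one particle per line as tab-separated x, y, z values."""
    for x, y, z in coords:
        handle.write(f"{x:.2f}\t{y:.2f}\t{z:.2f}\n")


def write_star(coords, handle):
    """Write a minimal STAR loop block with one row per particle."""
    # RELION-style column labels; which labels Ais writes is an assumption.
    labels = ("_rlnCoordinateX", "_rlnCoordinateY", "_rlnCoordinateZ")
    handle.write("data_\n\nloop_\n")
    for i, label in enumerate(labels, 1):
        handle.write(f"{label} #{i}\n")
    for x, y, z in coords:
        handle.write(f"{x:.2f} {y:.2f} {z:.2f}\n")


coords = [(120.0, 64.5, 33.0), (18.25, 240.0, 71.0)]  # made-up coordinates
tsv, star = io.StringIO(), io.StringIO()
write_tsv(coords, tsv)
write_star(coords, star)
```

      The in-memory `io.StringIO` handles stand in for files opened with `open(path, "w")`.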

      (3) In Table 2, pix2pix has much higher losses than alternatives, yet the text states it achieves fewer false negatives and fewer false positives. An explanation is needed as to why that is. Also, it is mentioned that a higher number of epochs may have improved the results. Then why wasn't this attempted? 

      The architecture of Pix2pix is quite different from that of the other networks included in the test. Whereas all others are trained to minimize a binary cross entropy (BCE) loss, Pix2pix uses a composite loss function that is a weighted combination of the generator loss and a discriminator penalty, neither of which employs BCE. However, to be able to compare loss values, we do compute a BCE loss value for the Pix2pix generator after every training epoch. This is the value reported in the manuscript and in the software. Although the BCE loss of Pix2pix does indeed diminish during training, the model is not actually optimized to minimize this particular value and a comparison by BCE loss is therefore not entirely fair to Pix2pix. This is pointed out (in brief) in the legend to the table:

      “Unlike the other architectures, Pix2pix is not trained to minimize the bce loss but uses a different loss function instead. The bce loss values shown here were computed after training and may not be entirely comparable.”
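
      As a generic illustration of why a BCE value alone does not determine segmentation quality, the sketch below computes the mean BCE for two toy predictions that give the same binary mask when thresholded at 0.5. The function name `bce_loss`, the clipping constant, and the toy arrays are assumptions for this example; it does not reproduce the loss computation in Ais or explain the Pix2pix result specifically.

```python
import numpy as np


def bce_loss(prediction, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    p = np.clip(prediction, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))


target = np.array([[0.0, 1.0], [1.0, 0.0]])     # toy annotation
confident = np.array([[0.1, 0.9], [0.9, 0.1]])  # close to the annotation
hesitant = np.array([[0.4, 0.6], [0.6, 0.4]])   # less confident, same sign

# Both predictions give the same mask at a threshold of 0.5,
# but the hesitant one has a markedly higher BCE.
same_mask = np.array_equal(confident > 0.5, hesitant > 0.5)
loss_confident = bce_loss(confident, target)
loss_hesitant = bce_loss(hesitant, target)
```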

      Regarding the extra number of epochs for Pix2pix: here, we initially ran into the problem that the number of samples in the training data was low for the number of parameters in Pix2pix, leading to divergence later during training. This problem did not occur for most other models, so we decided to keep the data for the discussion around Table 1 and Figure 2 limited to that initial training dataset. After that, we increased the sample size (from 58 to 170 positive samples) and trained the model for longer. The resulting model was used in the subsequent analyses. This was previously implicit in the text but is now mentioned explicitly and in a new supplementary figure. 

      “For the antibody platform, the model that would be expected to be one of the worst based on the loss values, Pix2pix, actually generates segmentations that seem well-suited for the downstream processing tasks. It also output fewer false positive segmentations for sections of membranes than many other models, including the lowest-loss model UNet. Moreover, since Pix2pix is a relatively large network, it might also be improved further by increasing the number of training epochs. We thus decided to use Pix2pix for the segmentation of antibody platforms, and increased the size of the antibody platform training dataset (from 58 to 170 positive samples) to train a much improved second iteration of the network for use in the following analyses (Fig. S6).”

      (4) It is not so clear what absorb and emit mean in the text about model interactions. A few explanatory sentences would be useful here. 

      We have expanded this paragraph to include some more detail.

      “Besides these specific interactions between two models, the software also enables pitching multiple models against one another in what we call ‘model competition’. Models can be set to ‘emit’ and/or ‘absorb’ competition from other models. Here, to emit competition means that a model’s prediction value is included in a list of competing models. To absorb competition means that a model’s prediction value will be compared to all values in that list, and that this model’s prediction value for any pixel will be set to zero if any of the competing models’ prediction value is higher. On a pixel-by-pixel basis, all models that absorb competition are thus suppressed whenever their prediction value for a pixel is lower than that of any of the emitting models.”
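
      The competition rule described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the Ais implementation: the function name `apply_competition`, the dictionary-based interface, and the toy values are hypothetical, and a model is assumed not to compete with itself.

```python
import numpy as np


def apply_competition(predictions, emit, absorb):
    """Suppress absorbing models wherever a competing emitter scores higher.

    predictions: dict of model name -> array of prediction values (same shape).
    emit / absorb: sets of model names with the respective setting enabled.
    """
    result = {}
    for name, pred in predictions.items():
        # Values emitted by all other models (a model never suppresses itself).
        rivals = [predictions[other] for other in emit if other != name]
        if name in absorb and rivals:
            strongest = np.max(rivals, axis=0)
            # Zero the pixel only if a rival is strictly higher; ties are kept.
            result[name] = np.where(strongest > pred, 0.0, pred)
        else:
            result[name] = pred.copy()
    return result


out = apply_competition(
    {"membrane": np.array([0.9, 0.2, 0.5]), "carbon": np.array([0.3, 0.8, 0.5])},
    emit={"membrane", "carbon"},
    absorb={"membrane", "carbon"},
)
```

      In this toy case each model keeps only the pixels where it is not outscored, and the tied third pixel survives in both outputs.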

      (5) Under Figure 4, the main text states "the model interactions described above", but because multiple interactions were described it is not clear which ones they were. Better to just specify again. 

      Changed as follows:

      “The antibody platform and antibody-C1 complex models were then applied to the respective datasets, in combination with the membrane and carbon models and the model interactions described above (Fig. 4b): the membrane avoiding carbon, and the antibody platforms colocalizing with the resulting membranes”.

      (6) The next paragraph mentions a "batch particle picking process to determine lists of particle coordinates", but the algorithm for how coordinates are obtained from segmented volumes is not described. 

      We have added a paragraph to the main text to describe the picking process:

      “This picking step comprises a number of processing steps (Fig. S7). First, the segmented (.mrc) volumes are thresholded at a user-specified level. Second, a distance transform of the resulting binary volume is computed, in which every nonzero pixel in the binary volume is assigned a new value, equal to the distance of that pixel to the nearest zero-valued pixel in the mask. Third, a watershed transform is applied to the resulting volume, so that the sets of pixels closest to any local maximum in the distance transformed volume are assigned to one group. Fourth, groups that are smaller than a user-specified minimum volume are discarded. Fifth, groups are assigned a weight value, equal to the sum of the prediction value (i.e. the corresponding pixel value in the input .mrc volume) of the pixels in the group. For every group found within close proximity to another group (using a user-specified value for the minimum particle spacing), the group with the lower weight value is discarded. Finally, the centroid coordinate of the grouped pixels is considered the final particle coordinate, and the list of all coordinates is saved in a tab-separated text file.

      “As an alternative output format, segmentations can also be converted to and saved as triangulated meshes, which can then be used for, e.g., membrane-guided particle picking. After picking particles, the resulting coordinates are immediately available for inspection in the Ais 3D renderer (Fig. S8).”
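
      The picking steps quoted above can be sketched with NumPy and SciPy as follows. This is a simplified illustration, not the Ais code: the function name `pick_particles`, its parameter names and defaults are hypothetical, and the watershed transform is replaced here by assigning every foreground pixel to its nearest local maximum of the distance map, which follows the wording of the description but is not a true watershed. Coordinates are returned in array-axis order.

```python
import numpy as np
from scipy import ndimage as ndi


def pick_particles(volume, threshold=0.5, min_volume=10, min_spacing=10.0):
    """Return an (N, 3) array of particle centroids from a segmented volume."""
    # 1. Threshold the prediction volume to a binary mask.
    mask = volume > threshold
    if not mask.any():
        return np.empty((0, 3))

    # 2. Distance of every foreground pixel to the nearest background pixel.
    dist = ndi.distance_transform_edt(mask)

    # 3. Seeds are local maxima of the distance map; every foreground pixel
    #    joins its nearest seed (a stand-in for the watershed transform).
    peaks = (dist == ndi.maximum_filter(dist, size=3)) & mask
    seeds, n_groups = ndi.label(peaks, structure=np.ones((3, 3, 3)))
    _, nearest = ndi.distance_transform_edt(seeds == 0, return_indices=True)
    groups = seeds[tuple(nearest)]
    groups[~mask] = 0

    # 4. Group volume in pixels, used to discard groups that are too small.
    sizes = np.bincount(groups.ravel(), minlength=n_groups + 1)[1:]
    # 5. Weight = summed prediction value of the pixels in each group.
    weights = np.bincount(groups.ravel(), weights=volume.ravel(),
                          minlength=n_groups + 1)[1:]
    ids = np.arange(1, n_groups + 1)
    centroids = np.array(ndi.center_of_mass(mask.astype(float), groups, ids))
    keep = sizes >= min_volume
    weights, centroids = weights[keep], centroids[keep]

    # 6. Visit groups from highest to lowest weight and drop any group that
    #    lies within min_spacing of an already accepted, heavier group.
    accepted = []
    for i in np.argsort(-weights):
        if all(np.linalg.norm(centroids[i] - centroids[j]) >= min_spacing
               for j in accepted):
            accepted.append(i)
    return centroids[np.array(accepted, dtype=int)]


# Toy volume with two spherical "particles" of radius 4 pixels.
zz, yy, xx = np.ogrid[:20, :20, :40]
volume = np.zeros((20, 20, 40))
for cx in (10, 30):
    volume[(zz - 10) ** 2 + (yy - 10) ** 2 + (xx - cx) ** 2 <= 16] = 1.0
coords = pick_particles(volume, threshold=0.5, min_volume=10, min_spacing=8.0)
```

      The resulting array could then be written to a tab-separated text file, for example with `np.savetxt(path, coords, delimiter="\t")`.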

      The two supplementary figures are pasted below for convenience. Fig. S7 is new, while Fig. S8 was previously Fig. S10; the reference to this figure was originally missing in the main text, but is now included.

      (7) In the Methods section, it is stated that no validation splits are used "in order to make full use of an input set". This sounds like an odd decision, given the importance of validation sets in the training of many neural networks. Then how is overfitting monitored or prevented? This sounds like a major limitation of the method. 

      In our experience, the best way of preparing a suitable model is to (iteratively) annotate a set of training images and visually inspect the result. Since the manual annotation step is the bottleneck in this process, we decided not to use a validation split in order to make full use of an annotated training dataset (i.e. a validation split of 20% would mean that 20% of the manually annotated training data is not used for training).

      We do recognize the importance of using separate data for validation, or at least offering the possibility of doing so. We have now added a parameter to the settings (and made a Settings menu item available in the top menu bar) where users can specify what fraction (0, 10, 20, or 50%) of training datasets should be set aside for validation. If the chosen value is not 0%, the software reports the validation loss as well as the size of the split during training, rather than (as was done previously) the training loss. We have, however, set the default value for the validation split to 0%, for the same reason as before. We also added a section to the online documentation about using validation splits, and edited the corresponding paragraph in the methods section:

      “The reported loss is that calculated on the training dataset itself, i.e., no validation split was applied. During regular use of the software, users can specify whether to use a validation split or not. By default, a validation split is not applied, in order to make full use of an input set of ground truth annotations. Depending on the chosen split size, the software reports either the overall training loss or the validation loss during training.”
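
      A validation split of this kind can be sketched as below. This is a generic illustration, not the Ais implementation: the function name `split_training_data`, the random shuffling, the seed, and the placeholder array sizes are all assumptions made for the example.

```python
import numpy as np


def split_training_data(images, labels, validation_fraction=0.0, seed=0):
    """Shuffle annotated samples and hold out a fraction for validation."""
    n_samples = len(images)
    n_val = int(round(n_samples * validation_fraction))
    order = np.random.default_rng(seed).permutation(n_samples)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (images[train_idx], labels[train_idx]), (images[val_idx], labels[val_idx])


images = np.zeros((100, 8, 8))  # placeholder image stack
labels = np.zeros((100, 8, 8))  # placeholder annotations

# With a 20% split, 20 of the 100 annotated samples are not used for training.
train, val = split_training_data(images, labels, validation_fraction=0.2)
# With the default of 0%, every annotated sample is used for training.
train_all, val_none = split_training_data(images, labels)
```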

      (8) Related to this point: how is the training of the models in the software modelled? It might be helpful to add a paragraph to the paper in which this process is described, together with indicators of what to look out for when training a model, e.g. when should one stop training? 

      We have expanded the paragraph where we write about the utility of comparing different network architectures to also include a note on how Ais facilitates monitoring the output of a model during training:

      “When taking the training and processing speeds into account as well as the segmentation results, there is no overall best architecture. We therefore included multiple well-performing model architectures in the final library, in order to allow users to select from these models to find one that works well for their specific datasets. Although it is not necessary to screen different network architectures and users may simply opt to use the default (VGGNet), these results thus show that it can be useful to test different networks in order to identify one that is best. Moreover, these results also highlight the utility of preparing well-performing models by iteratively improving training datasets and re-training models in a streamlined interface. To aid in this process, the software displays the loss value of a network during training and allows for the application of models to datasets during training. Thus, users can inspect how a model’s output changes during training and decide whether to interrupt training and improve the training data or choose a different architecture.”

      (9) Figure 1 legend: define the colours of the different segmentations. 

      Done

      (10) It may be better to colour Figure 2B with the same colours as Figure 2A. 

      We tried this, but the effect is that the underlying density is much harder to see. We think the current grayscale image paired with the various segmentations underneath is better for visually identifying which density corresponds to membranes, carbon film, or antibody platforms.

      Reviewer 2 (Public Review):

      Summary: 

      Last et al. present Ais, a new deep learning-based software package for the segmentation of cryo-electron tomography data sets. The distinguishing factor of this package is its orientation to the joint use of different models, rather than the implementation of a given approach. Notably, the software is supported by an online repository of segmentation models, open to contributions from the community. 

      The usefulness of handling different models in one single environment is showcased with a comparative study on how different models perform on a given data set; then with an explanation of how the results of several models can be manually merged by the interactive tools inside Ais. 

      The manuscript presents two applications of Ais on real data sets; one is oriented to showcase its particle-picking capacities on a study previously completed by the authors; the second one refers to a complex segmentation problem on two different data sets (representing different geometries as bacterial cilia and mitochondria in a mouse neuron), both from public databases. 

      The software described in the paper is compactly documented on its website, additionally providing links to some YouTube videos (less than an hour in total) where the authors videocapture and comment on major workflows. 

      In short, the manuscript describes a valuable resource for the community of tomography practitioners. 

      Strengths: 

      A public repository of segmentation models; easiness of working with several models and comparing/merging the results. 

      Weaknesses: 

      A certain lack of concretion when describing the overall features of the software that differentiate it from others. 

      We thank the reviewer for their kind and constructive feedback. Following the suggestion to use the Pix2pix results to illustrate the utility of Ais for analyzing results, we have added a new supplementary figure (Fig. S6) and brief discussion, showing the use of Ais in iteratively improving segmentation results. We have also expanded the online documentation and included a note in the supplementary information about how models are saved/loaded (Supplementary Note 1).

      Recommendations:

      I would like to ask the authors about some concerns about the Ais project as a whole: 

      (1) The website that accompanies the paper (aiscryoet.org), albeit functional, seems to be in its first steps. Is it planned to extend it? In particular, one of the major contributions of the paper (the maintenance of an open repository of models) could use better documentation describing the expected formats to submit models. This could even be discussed in the supplementary material of the manuscript, as this feature is possibly the most distinctive one of the paper. Engaging third-party users would require giving them an easier entry point, and the superficial mention of this aspect in the online documentation could be much more generous.

      We have added a new page to the online documentation, titled ‘Sharing models’, where we include an explanation of the structure of model files and demonstrate the upload page. We also added a note to the Supplementary Information that explains the file format for models, and how they are loaded/saved (i.e., that these are standard Keras model objects). 

      To make it easier to interface Ais with other tools, we have now also made some of the core functionality available (e.g. training models, batch segmentation) via the command line interface. Information on how to use this is included in the online documentation. All file formats are common formats used in cryoET, so that using Ais in a workflow with, e.g. AreTomo -> Ais -> Relion should now be more straightforward.

      (2) A different major line advanced by the authors to underpin the novelty of the software, is its claimed flexibility and modularity. In particular, the restrictions of other packages in terms of visualization and user interaction are mentioned. Although in the manuscript it is also mentioned that most of the functionalities in Ais are already available in major established packages, as a reader I am left confused about what exactly makes the offer of Ais different from others in terms of operation and interaction: is it just the two aspects developed in the manuscript (possibility of using different models and tools to operate model interaction)? If so, it should probably be stated; but if the authors want to pinpoint other aspects of the capacity of Ais to drive smoothly the interactions, they should be listed and described, instead of leaving it as an unspecific comment. As a potential user of Ais, I would suggest the authors add (maybe in the supplementary material) a listing of such features. Figure 1 does indeed carry the name "overview of (...) functionalities", but it is not clear to me which functionalities I can expect to be absent or differently solved on the other tools they mention.

      We have rewritten the part of the introduction where we previously listed the features as below. We think it should now be clearer for the reader to know what features to expect, as well as how Ais can interface with other software (i.e. what the inputs and outputs are). We have also edited the caption for Figure 1 to make it explicit that panels A to C represent the annotation, model preparation, and rendering steps of the Ais workflow and that the images are screenshots from the software.

      “In this report we present Ais, an open-source tool that is designed to enable any cryoET user – whether experienced with software and segmentation or a novice – to quickly and accurately segment their cryoET data in a streamlined and largely automated fashion. Ais comprises a comprehensive and accessible user interface within which all steps of segmentation can be performed, including: the annotation of tomograms and compiling datasets for the training of convolutional neural networks (CNNs), training and monitoring performance of CNNs for automated segmentation, 3D visualization of segmentations, and exporting particle coordinates or meshes for use in downstream processes. To help generate accurate segmentations, the software contains a library of various neural network architectures and implements a system of configurable interactions between different models. Overall, the software thus aims to enable a streamlined workflow where users can interactively test, improve, and employ CNNs for automated segmentation. To ensure compatibility with other popular cryoET data processing suites, Ais employs file formats that are common in the field, using .mrc files for volumes, tab-separated .txt or .star files for particle datasets, and the .obj file format for exporting 3D meshes.”

      “Figure 1 – an overview of the user interface and functionalities. The various panels represent sequential stages in the Ais processing workflow, including annotation (a), testing CNNs (b), visualizing segmentation (c). These images (a-c) are unedited screenshots of the software. a) […]”

      (3) Table 1 could have the names of the three last columns. The table has enough empty space in the other columns to accommodate this. 

      Done.

      (4) The comment about Pix2pix needing a larger number of training epochs (being a larger model than the other ones considered) is interesting. It also lends itself for the authors to illustrate the ability of their software to precisely do this: allow the users to flexibly analyze results and test hypotheses.

      Please see the response to Reviewer 1 comment #3. We agree that this is a useful example of the ability to iterate between annotation and training, and have added an explicit mention of this in the text:

      “Moreover, since Pix2pix is a relatively large network, it might also be improved further by increasing the number of training epochs. In a second iteration of annotation and training, we thus increased the size of the antibody platform training dataset (from 58 to 170 positive samples) and generated an improved Pix2pix model for use in the following analyses.”

      Reviewer 3 (Public Review):

      We appreciate the reviewer’s extensive and very helpful feedback and are glad to read that they consider Ais potentially quite useful for the users. To address the reviewer’s comments, we have made various edits to the text, figures, and documentation, that we think have helped improve the clarity of our work. We list all edits below. 

      Summary

      In this manuscript, Last and colleagues describe Ais, an open-source software package for the semi-automated segmentation of cryo-electron tomography (cryo-ET) maps. Specifically, Ais provides a graphical user interface (GUI) for the manual segmentation and annotation of specific features of interest. These manual annotations are then used as input ground-truth data for training a convolutional neural network (CNN) model, which can then be used for automatic segmentation. Ais provides the option of several CNNs so that users can compare their performance on their structures of interest in order to determine the CNN that best suits their needs. Additionally, pre-trained models can be uploaded and shared to an online database. 

      Algorithms are also provided to characterize "model interactions" which allows users to define heuristic rules on how the different segmentations interact. For instance, a membrane-adjacent protein can have rules where it must colocalize a certain distance away from a membrane segmentation. Such rules can help reduce false positives; as in the case above, false negatives predicted away from membranes are eliminated. 

      The authors then show how Ais can be used for particle picking and subsequent subtomogram averaging and for the segmentation of cellular tomograms for visual analysis. For subtomogram averaging, they used a previously published dataset and compared the averages of their automated picking with the published manual picking. Analysis of cellular tomogram segmentation was primarily visual. 

      Strengths:

      CNN-based segmentation of cryo-ET data is a rapidly developing area of research, as it promises substantially faster results than manual segmentation as well as the possibility for higher accuracy. However, this field is still very much in development and the overall performance of these approaches, even across different algorithms, still leaves much to be desired. In this context, I think Ais is an interesting package, as it aims to provide both new and experienced users with streamlined approaches for manual annotation, access to a number of CNNs, and methods to refine the outputs of CNN models against each other. I think this can be quite useful for users, particularly as these methods develop. 

      Weaknesses: 

      Whilst overall I am enthusiastic about this manuscript, I still have a number of comments: 

      (1) On page 5, paragraph 1, there is a discussion on human judgement of these results. I think a more detailed discussion is required here, as from looking at the figures, I don't know that I agree with the authors' statement that Pix2pix is better. I acknowledge that this is extremely subjective, which is the problem. I think that a manual segmentation should also be shown in a figure so that the reader has a better way to gauge the performance of the automated segmentation.

      Please see the answer to Reviewer 1’s comment #3.

      (2) On page 7, the authors mention terms such as "emit" and "absorb" but never properly define them, such that I feel like I'm guessing at their meaning. Precise definitions of these terms should be provided. 

      We have expanded this paragraph to include some more detail:

      “Besides these specific interactions between two models, the software also enables pitching multiple models against one another in what we call ‘model competition’. Models can be set to ‘emit’ and/or ‘absorb’ competition from other models. Here, to emit competition means that a model’s prediction value is included in a list of competing models. To absorb competition means that a model’s prediction value will be compared to all values in that list, and that this model’s prediction value for any pixel will be set to zero if any of the competing models’ prediction value is higher. On a pixel-by-pixel basis, all models that absorb competition are thus suppressed whenever their prediction value for a pixel is lower than that of any of the emitting models.” 

      (3) For Figure 3, it's unclear if the parent models shown (particularly the carbon model) are binary or not.

      The figure looks to be grey values, which would imply that it's the visualization of some prediction score. If so, how is this thresholded? This can also be made clearer in the text. 

      The figures show the grayscale output of the parent model, but this grayscale output is thresholded to produce a binary mask that is used in an interaction. We have edited the text to include a mention of thresholding at a user-specified threshold value:

      “These interactions are implemented as follows: first, a binary mask is generated by thresholding the parent model’s predictions using a user-specified threshold value. Next, the mask is dilated using a circular kernel with a radius 𝑅, a parameter that we call the interaction radius. Finally, the child model’s prediction values are multiplied with this mask.”
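
      The three steps quoted above (threshold, dilate, multiply) can be sketched for a single 2D slice as follows. This is an illustrative sketch, not the Ais implementation: the function name `colocalize`, the parameter defaults, and the toy arrays are assumptions made for this example.

```python
import numpy as np
from scipy import ndimage as ndi


def colocalize(child, parent, threshold=0.5, radius=3):
    """Keep child predictions only within `radius` pixels of the thresholded parent."""
    # 1. Binary mask from the parent model's prediction.
    mask = parent > threshold
    # 2. Dilate with a circular kernel of the interaction radius R.
    yy, xx = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    disk = (yy ** 2 + xx ** 2) <= radius ** 2
    dilated = ndi.binary_dilation(mask, structure=disk)
    # 3. Multiply the child model's prediction values with the mask.
    return child * dilated


parent = np.zeros((9, 9))
parent[4, 4] = 1.0           # a single parent pixel above the threshold
child = np.full((9, 9), 0.7)  # child model predicts 0.7 everywhere
out = colocalize(child, parent, threshold=0.5, radius=2)
```

      In this toy case the child prediction survives only inside a disk of radius 2 pixels around the parent pixel and is zero elsewhere.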

      To avoid confusion, we have also edited the figure to show the binary masks rather than the grayscale segmentations. 

      (4) Figure 3D was produced in ChimeraX using the hide dust function. I think some discussion on the nature of this "dust" is in order, e.g. how much is there and how large does it need to be to be considered dust? Given that these segmentations can be used for particle picking, this seems like it may be a major contributor to false positives. 

      ‘Dust’ in segmentations is essentially unavoidable; it would require a perfect model that does not produce any false positives. However, when models are sufficiently accurate, the volume of false positives is typically smaller than that of the structures that were intended to be segmented. In these cases, discarding particles based on size is a practical way of filtering the segmentation results. Since it is difficult to generalize when to consider something ‘dust’, we decided to include this additional text in the Methods section rather than in the main text:

      “… with the use of the ‘hide dust’ function (the same settings were used for each panel, different settings used for each feature).

      This ‘dust’ corresponds to small (in comparison to the segmented structures of interest) volumes of false positive segmentations, which are present in the data due to imperfections in the used models. The rate and volume of false positives can be reduced either by improving the models (typically by including more examples of the images of what would be false negatives or positives in the training data) or, if the dust particles are indeed smaller than the structures of interest, they can simply be discarded by filtering particles based on their volume, as applied here. In particle picking a ‘minimum particle volume’ is specified – particles with a smaller volume are considered ‘dust’.”

      This text should be read in combination with the newly included text about the method of converting volumes into lists of coordinates (see Reviewer 1’s comment #6):

      “Third, a watershed transform is applied to the resulting volume, so that the sets of pixels closest to any local maximum in the distance transformed volume are assigned to one group. Fourth, groups that are smaller than a user-specified minimum volume are discarded…”

      We think it should now be clearer that (some form of) discarding ‘dust’ is a step that is typically included in the particle picking process.
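
      The size-based filtering described above can be illustrated with a short sketch. Note that this example swaps in plain connected-component grouping (6-connectivity) in place of the distance transform and watershed steps quoted above, so it is a simplified stand-in rather than the procedure used in Ais; the function name `remove_dust`, the toy volume, and the `min_volume` value are assumptions made for illustration.

```python
from collections import deque


def remove_dust(volume, min_volume):
    """Keep only connected groups of nonzero voxels with at least `min_volume` voxels.

    `volume` is a nested list indexed as volume[z][y][x]. Returns the filtered
    binary volume and the sizes of the groups that were kept.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    seen = [[[False] * nx for _ in range(ny)] for _ in range(nz)]
    out = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    kept_sizes = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not volume[z][y][x] or seen[z][y][x]:
                    continue
                # Breadth-first search collects one connected group of voxels.
                group = []
                queue = deque([(z, y, x)])
                seen[z][y][x] = True
                while queue:
                    cz, cy, cx = queue.popleft()
                    group.append((cz, cy, cx))
                    for dz, dy, dx in neighbours:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        if (0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                                and volume[pz][py][px] and not seen[pz][py][px]):
                            seen[pz][py][px] = True
                            queue.append((pz, py, px))
                # Groups below the minimum volume are treated as 'dust' and dropped.
                if len(group) >= min_volume:
                    kept_sizes.append(len(group))
                    for gz, gy, gx in group:
                        out[gz][gy][gx] = 1
    return out, kept_sizes


# Toy volume: one 6-voxel structure and one isolated voxel of 'dust'.
volume = [
    [[1, 1, 0, 0, 0],
     [1, 1, 0, 0, 1],
     [0, 0, 0, 0, 0]],
    [[1, 1, 0, 0, 0],
     [0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0]],
]
filtered, kept_sizes = remove_dust(volume, min_volume=3)
```

      In this toy case the isolated voxel is discarded and the larger structure is kept unchanged. As noted in the response, the choice of the minimum volume depends on the feature being segmented.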

      (5) Page 9 contains the following sentence: "After selecting these values, we then launched a batch particle picking process to determine lists of particle coordinates based on the segmented volumes." Given how important this is, I feel like this requires significant description, e.g. how are densities thresholded, how are centers determined, and what if there are overlapping segmentations? 

      Please see the response to Reviewer 1’s comment #6.

      (6) The FSC shown in Figure S6 for the auto-picked maps is concerning. First, a horizontal line at FSC = 0 should be added. It seems that starting at a frequency of ~0.045, the FSC of the autopicked map increases above zero and stays there. Since this is not present in the FSC of the manually picked averages, this suggests the automatic approach is also finding some sort of consistent features. This needs to be discussed. 

      Thank you for pointing this out. Awkwardly, this was due to a mistake made while formatting the figure. In the two separate original plots, the Y axes had slightly different ranges, but this was missed when they were combined to prepare the joint supplementary figure. As a result, the FSC values for the autopicked half maps are displayed incorrectly. The original separate plots are shown below to illustrate the discrepancy:

      Author response image 1.

      The corrected figure is Figure S9 in the manuscript. The values of 44 Å and 46 Å were not determined from the graph and remain unchanged.

      (7) Page 11 contains the statement "the segmented volumes found no immediately apparent false positive predictions of these pores". This is quite subjective and I don't know that I agree with this assessment. Unless the authors decide to quantify this through subtomogram classification, I don't think this statement is appropriate. 

      We originally included this statement and the supplementary figure because we wanted to show another example of automated picking, this time in the more crowded environment of the cell. We do agree that it requires better substantiation, but also think that the demonstration of automated picking of the antibody platforms and IgG3-C1 complexes for subtomogram averaging suffices to demonstrate Ais’ picking capabilities. Since the supplementary information includes an example of picked coordinates rendered in the Ais 3D viewer (Figure S7) that also used the pore dataset, we still include the supplementary figure (S10) but have edited the statement to read:

      “Moreover, we could identify the molecular pores within the DMV, and pick sets of particles that might be suitable for use in subtomogram averaging (see Fig. S11).”

      We have also expanded the text that accompanies the supplementary figure to emphasize that results from automated picking are likely to require further curation, e.g. by classification in subtomogram averaging, and that the selection of particles is highly dependent on the thresholds used in the conversion from volumes to lists of coordinates.

      (8) In the methods, the authors note that particle picking is explained in detail in the online documentation. Given that this is a key feature of this software, such an explanation should be in the manuscript. 

      Please see the response to Reviewer 1’s comment #6. 

      Recommendations:

      (9) The word "model" seems to be used quite ambiguously. Sometimes it seems to refer to the manual segmentations, the CNN architectures, the trained models, or the output predictions. More precision in this language would greatly improve the readability of the manuscript.

      This was indeed quite ambiguous, especially in the introduction. We have edited the text to be clearer on these differences. The word ‘model’ is now only used to refer to trained CNNs that segment a particular feature (as in ‘membrane model’ or ‘model interactions’). Where we used terms such as ‘3D models’ to describe scenes rendered in 3D, we now use ‘3D visualizations’ or similar terms. Where we previously used the term ‘models’ to refer to CNN architectures, we now use terms such as ‘neural network architectures’ or ‘architecture’. Some examples:

      … with which one can automatically segment the same or any other dataset …

      Moreover, since Pix2pix is a relatively large network, …       

      … to generate a 3D visualization of ten distinct cellular …

      … with the use of the same training datasets for all network architectures …

      In Figure 1, the text in panels D and E is illegible. 

      We have edited the figure to show the text more clearly (the previous images were unedited screenshots of the website).

      (10) Prior to the section on model interactions, I was under the impression that all annotations were performed simultaneously. I think it could be clarified that models are generated per annotation type. 

      Multiple different features can be annotated (i.e. drawn by hand by the user) at the same time, but each trained CNN only segments one feature. CNNs that output segmentations for multiple features can be implemented straightforwardly, but this introduces the need to provide training data where for every grayscale image, every feature is annotated. This can make preparing the training data much more cumbersome. Reusability of the models is also hampered. We now mention the separateness of the networks explicitly in the introduction:

      “Multiple features, such as membranes, microtubules, ribosomes, and phosphate crystals, can be segmented and edited at the same time across multiple datasets (even hundreds). These annotations are then extracted and used as ground truth labels upon which to condition multiple separate neural networks, …”

      (11) On page 6, there is the text "some features are assigned a high segmentation value by multiple of the networks, leading to ambiguity in the results". Do they mean some false features? 

      To avoid ambiguity of the word ‘features’, we have edited the sentence to read:

      “… some parts of the image are assigned a high segmentation value by multiple of the networks, leading to false classifications and ambiguity in the results.”

      (12) Figures 2 and 3 would be easier to follow if they had consistent coloring. 

      We have changed the colouring in Figure 2 to match that of Figure 3 better:

      (13) For Figure 3D, I'm confused as to why the authors showed results from the tomogram in Figure 2B. It seems like the tomogram in Figure 3C would be a more obvious choice, as we would be able to see how the 2D slices look in 3D. This would also make it easier to see the effect of interactions on false negatives. Also, since the orientation of the tomogram in 2B is quite different than that shown in 3D, it's a bit difficult to relate the two.

      We chose to show this dataset because it exemplifies the effects of both model competition and model interactions better than the tomogram in Figure 3C. See Figure 3D and Author response image 2 for a comparison:

      Author response image 2.

      (14) I'm confused as to why the tomographic data shown in Figures 4D, E, and F are black on white while all other cryo-ET data is shown as white on black. 

      The images in Figure 4DEF are now inverted.

      (15) For Figure 5, there needs to be better visual cueing to emphasize which tomographic slices are related to the segmentations in Panels A and B. 

      We have edited the figure to show more clearly which grayscale image corresponds to which segmentation:

      (16) I don't understand what I should be taking away from Figures S1 and S2. There are a lot of boxes around membrane areas and I don't know what these boxes mean. 

      We have added more descriptive text to these figures. The boxes are placed by the user to select areas of the image that will be sampled when saving training datasets.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      The study examines how pyruvate, a key product of glycolysis that influences TCA metabolism and gluconeogenesis, impacts cellular metabolism and cell size. It primarily utilizes the Drosophila liver-like fat body, which is composed of large post-mitotic cells that are metabolically very active. The study focuses on the key observations that overexpression of the pyruvate importer MPC complex (which imports pyruvate from the cytoplasm into mitochondria) can reduce cell size in a cell-autonomous manner. They find this is by metabolic rewiring that shunts pyruvate away from TCA metabolism and into gluconeogenesis. Surprisingly, mTORC and Myc pathways are also hyper-active in this background, despite the decreased cell size, suggesting a non-canonical cell size regulation signaling pathway. They also show a similar cell size reduction in HepG2 organoids. Metabolic analysis reveals that enhanced gluconeogenesis suppresses protein synthesis. Their working model is that elevated pyruvate mitochondrial import drives oxaloacetate production and fuels gluconeogenesis during late larval development, thus reducing amino acid production and thus reducing protein synthesis. 

      Strengths: 

      The study is significant because stem cells and many cancers exhibit metabolic rewiring of pyruvate metabolism. It provides new insights into how the fate of pyruvate can be tuned to influence Drosophila biomass accrual, and how pyruvate pools can influence the balance between carbohydrate and protein biosynthesis. Strengths include its rigorous dissection of metabolic rewiring and use of Drosophila and mammalian cell systems to dissect carbohydrate:protein crosstalk. 

      Weaknesses: 

      However, questions on how these two pathways crosstalk, and how this interfaces with canonical Myc and mTORC machinery remain. There are also questions related to how this protein:carbohydrate crosstalk interfaces with lipid biosynthesis. Addressing these will increase the overall impact of the study. 

      We thank the reviewer for recognizing the significance of our work and for providing constructive feedback. Our findings indicate that elevated pyruvate transport into mitochondria acts independently of canonical pathways, such as mTORC1 or Myc signaling, to regulate cell size. To investigate these pathways, we utilized immunofluorescence with well-validated surrogate measures (p-S6 and p-4EBP1) in clonal analyses of MPC expression, as well as RNAseq analyses in whole fat body tissues expressing MPC. These methods revealed surprising hyperactivation of mTORC1 and Myc signaling in Drosophila fat body cells expressing MPC, which are dramatically smaller than control cells. One explanation of these seemingly contradictory observations could be an excess of nutrients that activate mTORC1 or Myc pathways. However, our data is inconsistent with a nutrient surplus that could explain this hyperactivation. Instead, we observed reduced amino acid abundance upon MPC expression, which is very surprising given the observed hyperactivation of mTORC1. This led us to hypothesize the existence of a feedback mechanism that senses an inappropriate reduction in cell size and activates signaling pathways to promote cell growth. The best-characterized “sizer” pathway for mammalian cells is the Cyclin D/CDK4 complex, which has been well studied in the context of cell size regulation of the cell cycle (PMID 10970848, 34022133). However, the mechanisms that sense cell size in post-mitotic cells, such as fat body cells and hepatocytes, remain poorly understood. Investigating the hypothesized size-sensing mechanisms at play here is a fascinating direction for future research.

      For the current study, we conducted epistatic analyses with mTORC1 pathway members by overexpressing PI3K and knocking down the TORC1 inhibitor Tuberous Sclerosis Complex 1 (Tsc1). These manipulations increased the size of control fat body cells but not those overexpressing the MPC (Supplementary Fig. 3c, 3d). Regarding Myc, its overexpression increased the size of both control and MPC+ clones (Supplementary Fig. 3e), but Myc knockdown had no additional effect on cell size in MPC+ clones (Supplementary Fig. 3f). These results suggest that neither mTORC1, PI3K, nor Myc is epistatic to the cell size effects of MPC expression. Consequently, we shifted our focus to metabolic mechanisms regulating biomass production and cell size.

      When analyzing cellular biomolecules contributing to biomass, we observed a significant impact on protein levels in Drosophila fat body cells and mammalian MPC-expressing HepG2 spheroids. Triglyceride abundance in MPC-expressing HepG2 spheroids and whole fat body cells showed a statistically insignificant decrease compared to controls. Furthermore, lipid droplets in fat body cells were comparable in MPC-expressing clones when normalized to cell size.

      Interestingly, RNA-seq analysis revealed modestly increased expression of fatty acid and cholesterol biosynthesis pathways in MPC-expressing fat body cells. Upregulated genes included major SREBP targets, such as ATPCL (2.08-fold), FASN1 (1.15-fold), FASN2 (1.07-fold), and ACC (1.26-fold). Since mTORC1 promotes SREBP activation and MPC-expressing cells showed elevated mTOR activity and upregulation of SREBP targets, we hypothesize that SREBP is modestly activated in these cells. Nonetheless, our data on amino acid abundance and its impact on protein synthesis activity suggest that protein abundance is likely to play a prominent causal role in regulating cell size in response to increased pyruvate transport into mitochondria.

      Reviewer #2 (Public review): 

      In this manuscript, the authors leverage multiple cellular models including the Drosophila fat body and cultured hepatocytes to investigate the metabolic programs governing cell size. By profiling gene programs in the larval fat body during the third instar stage - in which cells cease proliferation and initiate a period of cell growth - the authors uncover a coordinated downregulation of genes involved in mitochondrial pyruvate import and metabolism. Enforced expression of the mitochondrial pyruvate carrier restrains cell size, despite active signaling of mTORC1 and other pathways viewed as traditional determinants of cell size. Mechanistically, the authors find that mitochondrial pyruvate import restrains cell size by fueling gluconeogenesis through the combined action of pyruvate carboxylase and phosphoenolpyruvate carboxykinase. Pyruvate conversion to oxaloacetate and use as a gluconeogenic substrate restrains cell growth by siphoning oxaloacetate away from aspartate and other amino acid biosynthesis, revealing a tradeoff between gluconeogenesis and provision of amino acids required to sustain protein biosynthesis. Overall, this manuscript is extremely rigorous, with each point interrogated through a variety of genetic and pharmacologic assays. The major conceptual advance is uncovering the regulation of cell size as a consequence of compartmentalized metabolism, which is dominant even over traditional signaling inputs. The work has implications for understanding cell size control in cell types that engage in gluconeogenesis but more broadly raises the possibility that metabolic tradeoffs determine cell size control in a variety of contexts. 

      We thank the reviewer for their thoughtful recognition of our efforts, and we are honored by the enthusiasm the reviewer expressed for the findings and the significance of our research. We share the reviewer’s opinion that our work might help to unravel metabolic mechanisms that regulate biomass gain independent of the well-known signaling pathways.

      Reviewer #3 (Public review): 

      Summary: 

      In this article, Toshniwal et al. investigate the role of pyruvate metabolism in controlling cell growth. They find that elevated expression of the mitochondrial pyruvate carrier (MPC) leads to decreased cell size in the Drosophila fat body, a transformed human hepatocyte cell line (HepG2), and primary rat hepatocytes. Using genetic approaches and metabolic assays, the authors find that elevated pyruvate import into cells with forced expression of MPC increases the cellular NADH/NAD+ ratio, which drives the production of oxaloacetate via pyruvate carboxylase. Genetic, pharmacological, and metabolic approaches suggest that oxaloacetate is used to support gluconeogenesis rather than amino acid synthesis in cells over-expressing MPC. The reduction in cellular amino acids impairs protein synthesis, leading to impaired cell growth. 

      Strengths: 

      This study shows that the metabolic program of a cell, and especially its NADH/NAD+ ratio, can play a dominant role in regulating cell growth.

      The combination of complementary approaches, ranging from Drosophila genetics to metabolic flux measurements in mammalian cells, strengthens the findings of the paper and shows a conservation of MPC effects across evolution.

      Weaknesses: 

      In general, the strengths of this paper outweigh its weaknesses. However, some areas of inconsistency and rigor deserve further attention. 

      Thank you for reviewing our manuscript and offering constructive feedback. We appreciate your recognition of the significance of our work and your acknowledgment of the compelling evidence we have presented. We have carefully revised the manuscript in line with the reviewers' recommendations.

      The authors comment that MPC overrides hormonal controls on gluconeogenesis and cell size (Discussion, paragraph 3). Such a claim cannot be made for mammalian experiments that are conducted with immortalized cell lines or primary hepatocytes. 

      We appreciate the reviewer’s insightful comment. Pyruvate is a primary substrate for gluconeogenesis, and our findings suggest that increased pyruvate transport into mitochondria increases the NADH-to-NAD+ ratio, and thereby elevates gluconeogenesis. Notably, we did not observe any changes in the expression of key glucagon targets, such as PC, PEPCK2, and G6PC, suggesting that the glucagon response is not activated upon MPC expression. By the statement referenced by the reviewer, we intended to highlight that excess pyruvate import into mitochondria drives gluconeogenesis independently of hormonal and physiological regulation. 

      It seems the reviewer might also have been expressing the sentiment that our in vitro models may not fully reflect the in vivo situation, and we completely agree.  Moving forward, we plan to perform similar analyses in mammalian models to test the in vivo relevance of this mechanism. For now, we will refine the language in the manuscript to clarify this point.

      Nuclear size looks to be decreased in fat body cells with elevated MPC levels, consistent with reduced endoreplication, a process that drives growth in these cells. However, acute, ex vivo EdU labeling and measures of tissue DNA content are equivalent in wild-type and MPC+ fat body cells. This is surprising - how do the authors interpret these apparently contradictory phenotypes? 

      We thank the reviewer for raising this important issue. The size of the nucleus is regulated by DNA content and various factors, including the physical properties of DNA, chromatin condensation, the nuclear lamina, and other structural components (PMID 32997613). Additionally, cytoplasmic and cellular volume also impact nuclear size, as extensively documented during development (PMID 17998401, PMID 32473090).

      In MPC-expressing cells, it is plausible that the reduced cellular volume impacts chromatin condensation or the nuclear lamina in a way that slightly decreases nuclear size without altering DNA content. Specifically, in our whole-fat body experiments using CG-Gal4 (as shown in Supplementary Figure 2a-c), we noted that after 12 hours of MPC expression, cell size was significantly reduced (Supplementary Figure 2c and Author Response Figure 1A). However, nuclear size was only modestly reduced at 24 hours and significantly reduced at 36 hours (Author Response Figure 1B), suggesting that the reduction in cell size is a more acute response to MPC expression, followed only later by effects on nuclear size.

      In clonal analyses, this relationship was further clarified. MPC-expressing cells with a size greater than 1000 µm² displayed nuclear sizes comparable to control cells, whereas those with a drastic reduction in cell size (less than 1000 µm²) exhibited smaller nuclei (Author Response Figure 1C and 1D). These observations collectively suggest that changes in nuclear size are more likely to be downstream rather than upstream of cell size reduction. Given that DNA content remains unaffected, we focused on investigating the rate of protein synthesis. Our findings suggest that protein synthesis might play a causal role in regulating cell size, thereby reinforcing the connection between cellular and nuclear size in this context.

      Author response image 1.

      Cell Size vs. Nuclear Size in MPC-Expressing Fat Body Cells. A. Cell size comparison between control (blue, ay-GFP) and MPC+ (red, ay-MPC) fat body cells over time, measured in hours after MPC expression induction. B. Nuclear area measurements from the same fat body cells in ay-GFP and ay-MPC groups. C. Scatter plot of nuclear area vs. cell area for control (ay-GFP) cells, including the corresponding R² value. D. Scatter plot of nuclear area vs. cell area for MPC-expressing (ay-MPC) cells, with the respective R² value.

      This figure highlights the relationship between nuclear and cell size in MPC-expressing fat body cells, emphasizing the distinct cellular responses observed following MPC induction.

      In Figure 4d, oxygen consumption rates are measured in control cells and those overexpressing MPC. Values are normalized to protein levels, but protein is reduced in MPC+ cells. Is oxygen consumption changed by MPC expression on a per-cell basis? 

      As described in the manuscript, MPC-expressing cells are smaller in size. In this context, we felt that it was most appropriate to normalize oxygen consumption rates (OCR) to cellular mass to enable an accurate interpretation of metabolic activity. Therefore, we normalized OCR with protein content to account for variations in cellular size and (probably) mitochondrial mass. 

      Trehalose is the main circulating sugar in Drosophila and should be measured in addition to hemolymph glucose. Additionally, the units in Figure 4h should be related to hemolymph volume - it is not clear that they are. 

      We appreciate this valuable suggestion. In the revised manuscript, we have quantified trehalose abundance in circulation and within fat bodies. As described in the Methods section and following the approach outlined in Ugrankar-Banerjee et al. (2023), we bled 10 larvae (either control or MPC-expressing) using forceps onto parafilm. From this, 2 microliters of hemolymph were collected for glucose measurement. The hemolymph was treated with trehalase overnight, and the resulting glucose derived from trehalose was measured. We observed that trehalose levels were also elevated in the hemolymph of fat body-specific MPC-expressing larvae, further supporting our conclusion that MPC expression in the fat body induces a hyperglycemic state. These data are now included in Figure 4h of the revised manuscript, and further details are given in the revised Materials and Methods.

      Measurements of NADH/NAD ratios in conditions where these are manipulated genetically and pharmacologically (Figure 5) would strengthen the findings of the paper. Along the same lines, expression of manipulated genes - whether by RT-qPCR or Western blotting - would be helpful to assess the degree of knockdown/knockout in a cell population (for example, Got2 manipulations in Figures 6 and S8). 

      We appreciate this suggestion, which will provide additional rigor to our study. We have already quantified NADH/NAD+ ratios in HepG2 cells under UK5099, NMN, and Asp supplementation, as presented in Figure 6k. As suggested, we have used RT-qPCR to quantify Got2 expression for the manipulations shown in Figure 6j; these data are presented in revised Supplementary Figure 8f-h. In addition, Supplementary Figure 8i has been updated with western blot analysis of Got2 expression in the knock-out cells used to perform the size analysis in HepG2 cells.

      Additionally, we have analysed the efficiency of the pcb (Supplementary Figure 6a-c), pdha (Supplementary Figure 6f-h), dlat (Supplementary Figure 6f, g and i), pepck2 (Supplementary Figure 6n-p), and fbp (Supplementary Figure 6n, m, q) manipulations used to modulate the expression of these genes. These validations support the robustness of our findings and strengthen the conclusions of our study.

      Reviewer #1 (Recommendations for the authors): 

      General questions: 

      (1) MPC over-expression in HepG2 cells altered the redox balance and the NADH/NAD+ ratio. This is suggested to help drive the metabolic rewiring from protein to carbohydrate biosynthesis. In line with this overexpression of Nmnat (which makes NAD+) or NDX rescues cell size and elevates protein biosynthesis. However, mechanistically it is unclear exactly how these redox NAD+ changes directly impact protein biosynthesis. Some additional explanations will strengthen this portion of the study. 

      Our data indicate that the altered redox state of the cell, particularly elevated NADH levels, affects the rate of protein synthesis. A similar relationship between redox balance and protein synthesis has been observed during embryonic development (PMID: 39879975), although the underlying mechanism remains uncharacterized. Our study suggests that increased NADH levels reprogram cellular carbohydrate metabolism, shifting it from glycolysis toward gluconeogenesis. This metabolic shift necessitates the use of oxaloacetate by PEPCK2, instead of its diversion toward GTP-mediated aspartate synthesis. Aspartate, which can be anaplerotically converted into glutamate and proline, plays a critical role in protein biosynthesis. Thus, the conversion of oxaloacetate to phosphoenolpyruvate represents a key metabolic node influencing protein synthesis under altered redox conditions. Additionally, since aspartate serves as a precursor for NAD biosynthesis, this may suggest a feedforward loop reinforcing the metabolic rewiring. Nonetheless, the precise relationship between NADH concentration and redox status and the regulation of protein synthesis warrants further investigation in future studies.

      (2) In the MPC1/2 (MPC+) over-expression background, can blocking of gluconeogenesis downstream in the carbohydrate synthesis pathway rescue the phenotype? 

      We knocked down FBPase (Drosophila fbp) using an RNAi construct, achieving approximately 60% reduction in FBPase expression in Drosophila. Notably, FBPase knockdown in fat body cells overexpressing MPC rescued the reduced cell size phenotype. These findings are presented in Figure 4o and Supplementary Figures 6n–q.

      (3) Biomass accrual and cell size are also influenced by lipogenesis. The study suggests mTORC and Myc are uncoupled to cell size determination per se, but how lipogenesis regulatory pathways like SREBP are impacted by MPC overexpression is not really explored. How lipid membrane synthesis inter-relates to this protein/carbohydrate crosstalk would add to the understanding of the system. 

      As mentioned above, when analyzing cellular biomolecules contributing to biomass, we observed a significant impact on protein levels in Drosophila fat body cells and mammalian MPC-expressing HepG2 spheroids. Triglyceride abundance in MPC-expressing HepG2 spheroids and whole fat body cells showed a statistically insignificant decrease compared to controls. Furthermore, lipid droplets in fat body cells were comparable in MPC-expressing clones when normalized to cell size.

      Interestingly, RNA-seq analysis revealed increased expression of fatty acid and cholesterol biosynthesis pathways in MPC-expressing fat body cells. Upregulated genes included major SREBP targets, such as ATPCL (2.08-fold), FASN1 (1.15-fold), FASN2 (1.07-fold), and ACC (1.26-fold). Since mTOR promotes SREBP activation and MPC-expressing cells showed elevated mTOR activity and upregulation of SREBP targets, we hypothesize that SREBP is modestly activated in these cells. Nonetheless, our data on amino acid abundance and its impact on protein synthesis activity suggest that protein abundance, rather than lipids, is likely to play a larger causal role in regulating cell size in response to increased pyruvate transport into mitochondria.

      Reviewer #2 (Recommendations for the authors): 

      I have only minor suggestions for the authors to consider. 

      Minor points 

      (1) Wherever possible, scale bars should be labeled with units or indicated comparisons (e.g., Supplementary Fig. 1). To make the data as accessible as possible, it would be helpful for the authors to include the data presented in Supplementary Figure 1 as an associated table as well. 

      We have corrected this in the revised manuscript and included the table. 

      (2) To support the conclusions about TCA cycle flux (lines 280-284), it will be helpful for the authors to consider relative metabolite pool sizes (which they should have on hand) in addition to labeling rate and fraction. 

      We thank the reviewer for this suggestion. We have included the metabolite counts with fractional abundance changes side by side in Supplementary Figure 5. 

      (3) I believe (?) there is a typo in lines 326-328; PEPCK KO increases (not decreases) the size of spheroids/cells. 

      We thank the reviewer for pointing out this error. We have corrected this in the revised manuscript.

      (4) Supplementary Figure 7b: PHD has 3 phospho sites that have independent regulation; the specific phosphosite queried should be listed on the figure and unless all 3 sites are probed the claims about lack of change in phosphorylation (line 337) should be removed. 

      We thank the reviewer for bringing this to our attention. We have included this in the revised manuscript.

      (5) (Optional) I appreciate the effort the authors undertook to acquire cytoplasmic and mitochondrial ratios of NADH/NAD. While I recognize that many labs perform this assay, it is difficult for this reviewer to envision how accurately these values reflect the ratios present in the intact cell given how quickly these redox couples interconvert and significant post-harvest metabolic flux (see, for example, PMID: 31767181), even with the extremely rapid fractionation protocol described in the methods. The present data certainly support the notion that MPC+ cells are more reduced, but these ratios may reflect a capacity for reductive metabolism rather than a bona fide NADH/NAD ratio; for example, Figure 7f shows almost identical NADH/NAD ratios in the cytoplasm and mitochondria, even though these compartments are frequently considered to have (sometimes vastly) different redox states. If the authors are willing, I would support them by including a brief discussion of the caveat of this method for new readers in the field. 

      We agree with this important note from the reviewer. This is an important caveat of the technique that we used for these analyses. We have included a description of this caveat in the revised manuscript (lines 393 to 395).

      Reviewer #3 (Recommendations for the authors): 

      Minor points: 

      (1) Line 327 - "smaller" should be "bigger". 

      We thank the reviewer for pointing out this error. We have corrected this in the revised manuscript.

      (2) For Figure 7 - references to panels e and f in the text, and descriptions of e and f in the Figure Legend are switched with regard to the Figure itself. 

      We thank the reviewer for pointing out this error. We have corrected this in the revised manuscript.

      (3) Line 449 - "reduced" is missing its R 

      We thank the reviewer for pointing out this error. We have corrected this in the revised manuscript.

      (4) Some additional, careful proofreading is needed - several other punctuation errors were found. 

      We thank the reviewer for pointing out these errors. We have carefully proofread the manuscript and corrected them.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Jin, Briggs, and colleagues use light sheet imaging to reconstruct the islet three-dimensional Ca2+ network. The authors find that early/late responding (leader) cells are dynamic over time, and located at the islet periphery. By contrast, highly connected or hub cells are stable and located toward the islet center. Suggesting that the two subpopulations are differentially regulated by fuel input, glucokinase activation only influences leader cell phenotype, whereas hubs remain stable.

      Strengths:

      The studies are novel in providing the first three-dimensional snapshot of the beta cell functional network, as well as determining the localization of some of the different subpopulations identified to date. The studies also provide some consensus as to the origin, stability, and role of such subpopulations in islet function.

      We thank the reviewers for their positive assessment.

      Weaknesses:

      Experiments with metabolic enzyme activators do not take into account the influence of cell viability on the observed Ca2+ network data. Limitations of the imaging approach used need to be recognized and evaluated/discussed.

      We worked very hard to make sure the islets remained stable and healthy over the duration of the imaging time course. We imaged the islet in 3D and observed that all beta cells displayed glucose-dependent oscillations, which can only arise from functioning cells. From the raw calcium traces (displayed in the figures) we observed no detectable loss of signal over 60 min of continuous imaging regardless of drug treatment; this is because the laser excitation is below the bleach threshold for GCaMP6s, and it is bleaching that generates phototoxicity. To demonstrate this clearly, we performed a bleach test using 6x laser power; in this case calcium amplitude dropped 30% over 60 min of imaging, but islet calcium oscillatory behavior was preserved. Light-sheet is well documented to be 1000x more gentle than other optical sectioning techniques, which is why it was chosen for this application.

      Regarding the limitations of the imaging approach, we recognize that studying islets ex vivo is necessarily performed in the absence of native surrounding tissue, as highlighted in the discussion.

      Reviewer #2 (Public Review):

      The manuscript by Erli Jin, Jennifer Briggs et al. utilizes light sheet microscopy to image islet beta cell calcium oscillations in 3D and determine where beta cell populations are located that begin and coordinate glucose-stimulated calcium oscillations. The light sheet technique allowed clear 3D mapping of beta cell calcium responses to glucose, glucokinase activation, and pyruvate kinase activation. The manuscript finds that synchronized beta-cells are found at the islet center, that leader beta cells showing the first calcium responses are located on the islet periphery, that glucokinase activation helped maintain beta cells that lead calcium responses, and that pyruvate kinase activation primarily increases islet calcium oscillation frequency. The study is well-designed, contains a significant amount of high-quality data, and the conclusions are largely supported by the results.

      It has recently been shown that beta cells within islets containing intact vasculature (such as those in a pancreatic slice) show different calcium responses compared to isolated islets (such as that shown in PMID: 35559734). It would be important to include some discussion about the potential in vitro artifacts in calcium that arise following islet isolation (this could be included in the discussion about the limitations of the study).

      Although isolated islets reproduce the slow oscillatory calcium behavior observed in vivo, we agree that missing elements such as blood flow, cholinergic innervation, and surrounding tissues may each impact islet calcium responses. Pancreatic regional blood flow also links the endocrine and exocrine signaling which can directly influence the behavior of beta cells. We have highlighted some of these issues in the discussion “In addition to α-cells, vasculature may also impact islet Ca2+ responses, and may induce additional heterogeneity in vivo.” (see line 375, Ref. 46).

      Reviewer #3 (Public Review):

      Summary:

      Jin, Briggs et al. made use of light-sheet 3D imaging and data analysis to assess the collective network activity in isolated mouse islets. The major advantage of using whole islet imaging, despite compromising on the speed of acquisition, is that it provides a complete description of the network, while 2D networks are only an approximation of the islet network. In static-incubation conditions, excluding the effects of perfusion, they assessed two subpopulations of beta cells and their spatial consistency and metabolic dependence.

      Strengths:

      The authors confirmed that coordinated Ca2+ oscillations are important for glycemic control. In addition, they definitively disproved the role of individual privileged cells, which were suggested to lead or coordinate Ca²⁺ oscillations. They provided evidence for differential regional stability, confirming the previously described stochastic nature of the beta cells that act as strongly connected hubs as well as beta cells in initiating regions (doi.org/10.1103/PhysRevLett.127.168101).

      The fact that islet cores contain beta cells that are more active and more coordinated has also been readily observed in high-frequency 2D recordings (e.g. DOI: 10.2337/db22-0952), suggesting that the high-speed capture of fast activity can partially compensate for incomplete topological information.

      They also found an increased metabolic sensitivity of mantle regions of an islet with a subpopulation of beta cells with a high probability of leading the islet activity which can be entrained by fuel input. They discuss a potential role of alpha/delta cell interaction; however, the relative lack of beta cells in the islet border region could also be a factor contributing to less connectivity and higher excitability.

      The Methods section contains a useful series of direct instructions on how to approach fast 3D imaging with currently available hardware and software.

      The Discussion is clear and includes most of the issues regarding the interpretation of the presented results.

      Some issues concerning inconsistencies between data presented and statements made as well as statistical analysis need to be addressed.

      Taken together it is a strong technical paper to demonstrate the stochasticity regarding the functions subpopulations of beta cells in the islets may have and how less well-resolved approaches (both missing spatial resolution as well as missing temporal resolution) led us to jump to unjustified conclusions regarding the fixed roles of individual beta cells within an islet.

      We thank the reviewers for the comments on the many strengths of the manuscript and address the specific critiques below.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Essential revisions:

      (1) How useful is GK activation as a subpopulation-level perturbation, given that all beta cells would be affected? Previous studies by the authors have shown that GK gradients likely dictate subpopulation behaviour, so the concern here is that GK activation across all cells might mask the influence of such gradients i.e. a U-shaped effect. Also, does the GK activator differentially penetrate the islet such that first responders/leaders are more vulnerable than hubs?

      As we previously published, non-saturating concentrations of GK activator (as used here) have the same effect on calcium oscillations as raising glucose (PMID: 33147484). In other words, the activator boosts the activity of the endogenous GK. To the second point, recent ex vivo islet studies (PMID: 28380380) document the islet penetration of a fluorescent glucose analogue within seconds even under static conditions, and in our study the islet calcium oscillations reached steady state, so we are not concerned about drug penetration. The real limitation with any drug study in the islet is that non-beta cells are also activated; this limitation is included in the discussion along with the recommendation that genetic tools are needed to assess the effect of GK activation in the various endocrine subpopulations. 

      An additional concern with the GK activation experiment is that GK activation might push beta cells into a more stressed state such that they are more susceptible to phototoxicity. Although the authors state that photobleaching is low, they provide no data to support such a statement. Given the long duration of imaging and acquisition rate, phototoxicity might be more of an issue, especially with GK activation. Some further analysis (e.g. apoptosis) would be useful here to exclude an effect of beta cell viability versus GK activation on the observed phenotype of the different subpopulations.

      Acute GK activation (for 30 min) does not stress the islet; the drug has the same effect as raising glucose (PMID: 33147484). To determine whether photobleaching was impacted by GK activation, we examined the peak of consecutive oscillations in response to vehicle and GK activator. The average photobleaching was less than 2% of the calcium fluorescence over 30 min of continuous imaging. Furthermore, GK activation did not significantly increase photobleaching (see Author response image 1). 

      Author response image 1.
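
      As a minimal sketch of the peak-based estimate described above (not the analysis code used in the study): the function names, the simple local-maximum rule, and the example trace are all illustrative assumptions. Photobleaching is expressed as the percent drop from the first to the last oscillation peak of a single trace.

```python
def local_peaks(trace):
    """Return simple local maxima: samples higher than the previous one
    and at least as high as the next one (an assumed peak rule)."""
    return [
        trace[i]
        for i in range(1, len(trace) - 1)
        if trace[i - 1] < trace[i] >= trace[i + 1]
    ]


def photobleaching_percent(trace):
    """Percent drop from the first to the last oscillation peak of one trace."""
    peaks = local_peaks(trace)
    if len(peaks) < 2:
        raise ValueError("need at least two oscillation peaks")
    return 100.0 * (peaks[0] - peaks[-1]) / peaks[0]


# Hypothetical trace: three oscillations whose peaks decline slightly.
example_trace = [1.0, 2.0, 1.0, 1.98, 1.0, 1.96, 1.0]
print(round(photobleaching_percent(example_trace), 2))
```

      Real traces are noisy, so a smoothing step or a dedicated peak detector would normally precede this calculation; the sketch omits both for brevity.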

      To the reviewer’s second point, apoptosis cannot occur on the timescale of the drug treatment (30 min), and raw calcium traces are included showing that all beta cells display oscillatory behavior throughout the course of the experiment.

      (2) The authors show that glucokinase activation increases the duration of islet calcium oscillations and in some islets (3 of 15 islets) causes "a Ca2+ plateau." The authors indicate that "Glucokinase, as the 'glucose sensor' for the β-cell, controls the input of glucose carbons into glycolysis, and opens KATP channels." It would be nice to have some experimental evidence that the change in oscillation rate caused by the glucokinase activator is due to KATP activation. This could be accomplished by treating islets with subthreshold KATP activators (e.g., diazoxide) or subthreshold KATP inhibitors (e.g., tolbutamide).

      The statement that glucokinase activation opens KATP channels was a typo; glucose metabolism closes KATP channels by raising the ATP/ADP ratio. We now include additional citations that document the relationship between GK and KATP and the oscillatory behavior. See Ref 22 (PMID: 33147484) and Ref 34 (PMID: 33147484).

      The manuscript finds that "Early phase cells were maintained to a greater degree upon GKa application." Yet GKa is proposed to activate KATP. Some discussion about how the early phase is maintained in cell populations by GKa activation in the context of KATP activity would be useful.

      As discussed above, we meant to say that GKa will close KATP channels, and we apologize for the confusion. As we mention in the discussion, early phase cells are most likely maintained to a greater degree following GK activation as a result of an enhanced GK gradient and a reduced effect of stochastic alpha cell input. 

      (3) Membrane potential depolarization precedes calcium channel activation and subsequent calcium entry. In many cases, electrical coupling across beta cells happens on millisecond timescale. It would be good to confirm that the calcium is showing the same time scale in terms of elevation following beta cell membrane potential depolarization. One concern is that the islet beta cells could be depolarizing at the same speed and lagging in terms of calcium channel activation and calcium entry.

      We thank the reviewer for making this point, which is almost certainly true, particularly since plasma membrane calcium influx is not the sole source of intracellular calcium. Previously published “simultaneous” recordings of Vm and calcium show their same phase relationship but do not have sufficient time resolution to capture depolarization of each cell. A quantification of phase lag would require the field to generate mice with voltage sensors expressed in beta cells; these tools are not yet available.  

      A related issue: in the text, the authors discuss changes in membrane potential (not measured in this study), while in the figures they exclusively describe Ca2+ oscillations (which were measured). Examples are on lines 149, 150, 153, 154, 263. It is recommended that the silent and active phases in the Results section describe processes actually measured in this study as shown in 6A.

      To clarify, we did not use the term ‘membrane potential’ anywhere in the manuscript. We do sometimes refer to calcium influx as a proxy for membrane depolarization; we think this is valid given the abundant evidence that these processes are interdependent in beta cells.

      (4) It would be good to include the timing of the phases of calcium entry. When was the beta cell calcium entry monitored for the response time? Were the response times between the late and early phases consistent for each oscillation? It looks as if the start of the calcium upstroke was similar for many beta cells (such as for the Figure 2I traces). It would be nice to include a shorter time duration graph of calcium oscillation traces right when the upstroke starts. This would allow the community to observe the differences in the start time of calcium entry. 

      We agree this is an important point. We now include an inset showing the expanded time scale of the calcium upstroke in Fig.2I. The response time spread between early and late phase cells is now shown in Fig.7F (and in Author response image 2). We also quantified the coefficient of variation in the response time spread (0 = no variation and 1 = maximal variation) and found no significant differences between metabolic activators (Author response image 2). 

      Author response image 2.
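
      The two quantities described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the response time spread is taken as the latest minus the earliest onset time within one oscillation, and the coefficient of variation uses the standard definition (sample standard deviation divided by the mean). The function names and onset values are hypothetical.

```python
from statistics import mean, stdev


def response_time_spread(onset_times):
    """Spread between the latest and earliest cell onset in one oscillation."""
    return max(onset_times) - min(onset_times)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (0 means no variation)."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(values) / m


# Hypothetical onset times (seconds) for three cells across three oscillations.
onsets_per_oscillation = [
    [10.0, 10.5, 12.0],
    [50.0, 51.0, 52.0],
    [90.0, 90.5, 92.0],
]
spreads = [response_time_spread(o) for o in onsets_per_oscillation]
print(spreads, coefficient_of_variation(spreads))
```

      A value near 0 indicates that the early-to-late spread is consistent from one oscillation to the next.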

      Also, for most of the GCaMP6s traces shown, the authors indicate that they are plotted as F/F0. However, this normalization (F/F0) is not done for the actual traces shown. For example, Figure 2D shows the traces starting from what looks to be 0 to 0.3 F/F0, but the traces for an F/F0 group should all start at 1. Please change this for all representative oscillations so the start of calcium entry for example traces all line up.

      This has been corrected in Fig. 2D, Fig. 2I, and Fig. 3B. In addition, the traces in Fig. 6 should be labeled F, not F/F0.
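
      For readers unfamiliar with the normalization the reviewer refers to, a minimal sketch follows. It assumes F0 is the mean fluorescence over an initial baseline window; the window length and the function name are illustrative assumptions, and the manuscript's own definition of F0 may differ.

```python
from statistics import mean


def normalize_f_over_f0(trace, baseline_frames=10):
    """Divide a raw fluorescence trace by its baseline F0.

    F0 is assumed here to be the mean of the first `baseline_frames` samples,
    so a correctly normalized trace starts near 1.
    """
    if baseline_frames < 1 or baseline_frames > len(trace):
        raise ValueError("baseline_frames must be between 1 and len(trace)")
    f0 = mean(trace[:baseline_frames])
    if f0 == 0:
        raise ValueError("baseline fluorescence F0 must be non-zero")
    return [f / f0 for f in trace]


# Hypothetical raw trace: flat baseline followed by a calcium upstroke.
raw = [2.0, 2.0, 4.0, 6.0]
print(normalize_f_over_f0(raw, baseline_frames=2))
```

      Because every trace is divided by its own baseline, traces from different cells line up at 1 before the upstroke, which is the alignment the reviewer requested.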

      Reviewer #1 (Recommendations for the authors):

      (1) Line 53: "Silencing the electrical activity of these hub cells with optogenetics was found to abolish the coordination within that plane of the islet". The authors should acknowledge that studies also showed that beta cell transcription factor (Pdx1/Mafa) dosage was important for hub cell phenotype and islet function.

      Thank you, this reference to Nasteska et al. (PMID: 33514698, Ref. 16) has been added to the discussion.

      (2) Light sheet imaging is used to image the 3D islet volume. Whilst speed is undoubtedly an advantage of this technique, axial resolution is ~1.1 µm over 4 µm z-step size. How confident are the authors that single nuclei can be reliably identified given their ~6 µm size in a beta cell (e.g. do some elongated nuclei appear, which could be "doublets")?

      The axial resolution of 1.1 µm exceeds the resolution needed for the Nyquist criterion (i.e. sampling every 2-3 µm). As a practical matter, it is not possible to double-count nuclei because the software will exclude nuclei that occupy the same volume. Only a very elongated nucleus (>10 µm) would be double-counted, and this does not occur.

      (3) The authors discuss the advantages of the light sheet imaging approach used, including speed and phototoxicity. Some more balance is needed here since other approaches such as two-photon excitation achieve similar speeds with much better axial resolution (see dozens of neural circuit studies).

      We are careful to point out that two-photon excitation has better axial resolution, better tissue penetration, and often higher speeds (kHz using linescans); however, these neuronal studies are limited to the cells in a few planes, and the laser power is orders of magnitude higher than with lightsheet. For this reason, two-photon imaging has not been used to image islet calcium in three dimensions. The bottom line is that lightsheet trades axial resolution for gentle volumetric imaging. 

      (4) Line 340: "Laser ablation or optogenetic inactivation of these early phase cells would be predicted to have little impact on islet function, as suggested previously by electrophysiological studies in which surface β-cells have been voltage-clamped with no impact on β-cell oscillations". This statement is slightly ambiguous since the authors showed in their previous studies that laser ablation of first responder cells/leaders was able to influence the Ca2+ network. Do the authors mean that laser ablation would only temporarily influence islet function before another cell picked up the role of a first responder/leader? As written, the sentence seems to imply that first responders/leaders are unimportant for the islet function.

      We intended to imply that the oscillatory system is sufficiently robust that a new cell takes over when leader cells are ablated. We also cite Korosak et al. (PMID: 34723613, Ref. 40) and Dwulet et al. (PMID: 33939712, Ref. 15) to make this point, although, to clarify, we are not examining first responders in this study.

      (5) Line 369: "In contrast with leader cells, we found that the highly synchronized cells are both spatially and temporally stable." The sentence needs qualifying- what would spatiotemporal stability be expected to confer on such a subpopulation?

      We believe that the spatiotemporal stability of highly synchronized cells is a consequence of beta cells in the center of the islet lacking the stochastic input of nearby alpha cells; we raise this point in the discussion: “The preponderance of α-cells on the periphery of mouse islets, which influence β-cell oscillation frequency, would be expected to disrupt β-cell synchronization on the periphery and stabilize it in the islet center – which is precisely the pattern of network activity we observed.” (see line 372). 

      (6) Line 370: "However, in conflict with the description of hub cells as intermingled with other cells throughout the islet, the location of such cells in 3D space is close to the center." The study by Johnston et al did not have the axial resolution to exclude that some cells might have been grouped together.

      We agree and have included the reviewer’s comment in the text (See line 384); that’s an important reason for conducting this 3D study.  

      (7) Line 380: "One explanation may be that paracrine communication within the islet determines which region of cells will show high or low degree. For example, more peripheral cells that are in contact with nearby δ-cells may show some suppression in their Ca2+ dynamics, and thus reduced synchronization." A potentially exciting future study. Should however probably cite DOI s41467-022-31373-6 here.

      We thank the reviewer for their input. This reference to Ren et al. (PMID:35764654) was previously included as Ref. 42 (now Ref. 45)

      Reviewer #3 (Recommendations for the authors):

      (1) There are in fact no radially oriented networks in the core of an islet (l. 130, Figure 4) apart from the fact that every hub has somewhat radially oriented edges. For radiality to have some general meaning, the normalized distance from the geometric center would need to be lower than 0.4. The networks are centrally located, which does not change the major conclusions of the study.

      Thank you for pointing out this imprecise language. We did not intend to imply that the functional network is orientated radially. We corrected the text (see lines 131 and 145) to indicate that the cells with high and low synchronization are distributed in a radial pattern. 

      (2) The study would benefit from acknowledging that Ca2+ influx is not a sole mechanism to drive insulin secretion and that KATP channels are not the sole target sensitive to changes in the cytosolic (global or local) ADP and ATP concentration or that there is an absolute concentration-dependence of these ligands on KATP channels. The relatively small conductance changes that have been found to be associated with active and silent phases (closing and opening of the KATP channels as interpreted by the authors, respectively, doi: 10.1152/ajpendo.00046.2013) and should be due to metabolic factors, could be also associated to desensitization of KATP channels to ATP due to the increase in cytosolic Ca2+ changes after intracellular Ca2+ flux (DOI: 10.1210/endo.143.2.8625) as they have been found to operate also at time scales, significantly faster (DOI: 10.2337/db22-0952) than reported before (refs. 21,22). Metabolic changes influence intracellular Ca2+ flux as well.

      The reviewer is absolutely correct that there are amplifying factors and other sources of calcium beyond plasma membrane influx and there are other mechanisms that regulate insulin secretion beyond calcium levels. These alternative mechanisms are introduced in Refs. 1-2, however they are not the focus of this study. 

      (3) There is no explanation for why KL divergence is so different between the pre-test regional consistency of the islets used to test the vehicle compared to those where GKa and PKa have been tested.

      We thank the reviewer for their careful observation. This arises because there are larger differences between preparations than within a preparation. This has been described previously (PMID: 16306370 and 20037650) and could be expected to account for the differences in KL divergence between animals. 

      (4) Statistical analysis would profit from testing the normality of the data distribution before choosing the statistical test and then learning the difference between parametric and nonparametric tests. For example, in Figures 3CD and 5EF, the data density is lower at the calculated mean than below and above this value and there are other examples in other figures too.

      We thank the reviewer for this very important comment, and we apologize for the oversight on our part. To address this comment, we applied two normality tests, Anderson-Darling and Kolmogorov-Smirnov, to all statistical analyses in the manuscript. If the data were not normally distributed, we changed the analysis to the Wilcoxon matched-pairs signed rank test (non-parametric version of the t-test) or the Friedman test (non-parametric version of ANOVA). Three results were changed based on this statistical correction: Figure 4D; Figure 5F, 3D (from P = 0.01 to P = 0.0526); and Figure 5F, ¼ z-depth (from P = 0.005 to P = 0.012). We have updated the manuscript methods, results, and figures accordingly. Importantly, these changes did not alter the main points of the paper.
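
      The decision procedure described above can be sketched in a few lines. This is an illustration only, not the analysis used for the manuscript: it checks only the Kolmogorov-Smirnov distance (not Anderson-Darling), applies it to the paired differences, and uses the asymptotic 5% critical value 1.36/sqrt(n), which is conservative when the normal parameters are estimated from the same sample. All function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(values):
    """KS distance between the empirical CDF and a normal fitted to the sample."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def choose_paired_test(before, after, critical_coeff=1.36):
    """Pick a paired test from the normality of the paired differences.

    critical_coeff / sqrt(n) is an assumed asymptotic 5% threshold.
    """
    diffs = [a - b for a, b in zip(after, before)]
    d = ks_statistic_normal(diffs)
    critical = critical_coeff / math.sqrt(len(diffs))
    if d < critical:
        return "paired t-test"
    return "Wilcoxon matched-pairs signed rank test"


# Hypothetical paired measurements with roughly symmetric differences.
print(choose_paired_test([0, 0, 0, 0, 0], [-2, -1, 0, 1, 2]))
```

      In practice a statistics package would supply both normality tests and exact p-values; the sketch only makes the branching logic explicit.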

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript the authors re-examine the developmental origin of cortical oligodendrocyte (OL) lineage cells using a combination of strategies, focussing on the question of whether the LGE generates cortical OL cells. The paper is interesting to myelin biologists, the methods used are appropriate and, in general, the study is well-executed, thorough, and persuasive, but not 100% convincing.

      Thank you very much for approving our paper.

      Strengths, weaknesses, and recommendations:

      The first evidence presented that the LGE does not generate OLs for the cortex is that there are no OL precursors 'streaming' from the LGE during embryogenesis, unlike the MGE (Figure 1A). This in itself is not strong evidence, as they might be more dispersed. In fact, in the images shown, there is no obvious 'streaming' from the MGE either. Note that in Figure 1 there is no reference to the star that is shown in the figure.

      We totally agree with you. While the absence of an OPC migration stream is not, on its own, strong evidence that the LGE does not generate OPCs for the cortex, when considered together with our additional evidence, the absence of obvious 'streaming' from the LGE to the cortex provides supplementary support for this conclusion. Finally, we have removed the star from the figure.

      The authors then electroporate a reporter into the LGE at E13.5 and examine the fate of the electroporated cells (Figures 1C-E). They find that electroporated cells became neurons in the striatum and in the cortex but no OLs for the cortex. There are two issues with this: first, there is no quantification, which means there might indeed be a small contribution from the LGE that is not immediately obvious from snapshot images. Second, it is unexpected to find labelled neurons in the cortex at all since the LGE does not normally generate neurons for the cortex. Electroporations are quite crude experiments as targeting is imprecise and variable and not always discernible at later stages. For example, in Figure 1D, one can see tdTOM+ cells near the AEP, as well as the striatum. Hence, IUE cannot on its own be taken as proof that there is no contribution of the LGE to the cortical OL population.

      Thank you for your constructive suggestions.

      (1) Following the reviewer's suggestion, we have added these statistics, please see Figure 1F.

      (2) The reviewer raised a good point. We occasionally found a very small number of electroporated cells in the MGE/AEP VZ in our IUE system. We can therefore identify these electroporated cells in the cortex, most of which expressed the neuronal marker NeuN. We suspect these are MGE-derived cortical interneurons. It is worth noting that these electroporated (MGE-derived) cells are not glial cells. The probable reason is that the MGE/AEP generates cortical OPCs mainly before E13.5 (in this study we performed IUE at E13.5).

      The authors then use an alternative fate-mapping approach, again with E13.5 electroporations (Figure 2). They find only a few GFP+ cells in the cortex at E18 (Figures 2C-D) and P10 (Figure 2E) and these are mainly neurons, not OL lineage cells. Again, there is no quantification.

      Thank you very much for your suggestions. In this fate-mapping approach, very few electroporated cells were found in the cortex. We analyzed four mice and found that none of the GFP-positive cells (139 GFP+ cells) expressed OLIG2, SOX10, or PDGFRA.

      Figure 3 is more convincing, but the experiments are incomplete. Here the authors generate triple-transgenic mice expressing Cre in the cortex (Emx1-Cre) and the MGE (Nkx2.1-Cre) as well as a strong nuclear reporter (H2B-GFP). They find that at P0 and P10, 97-98% of OL-lineage cells (SOX10+ or PDGFRA+) in the cortex are labelled with GFP (Figure 3). This is a more convincing argument that the LGE/CGE might not contribute significant numbers of OL lineage cells to the cortex, in contrast to the Kessaris et al. (2006) paper, which showed that Gsh2-Cre mice label ~50% of SOX10+ve cells in the motor cortex at P10. The authors of the present paper suggest that the discrepancy between their study and that of Kessaris et al. (2006) is based on the authors' previous observation (Zhang et al 2020) (https://doi.org/10.1016/j.celrep.2020.03.027) that GSH2 is expressed in intermediate precursors of the cortex from E18 onwards. If correct, then Kessaris et al. might have mistakenly attributed Gsh2-Cre+ lineages to the LGE/CGE when they were in fact intrinsic to the cortex. However, the evidence from Zhang et al 2020 that GSH2 is expressed by cortical intermediate precursors seems to rest solely on their location within the developing cortex; a more convincing demonstration would be to show that the GSH2+ putative cortical precursors co-label for EMX1 (by immunohistochemistry or in situ hybridization), or that they co-label with a reporter in Emx1-driven reporter mice. This demonstration should be simple for the authors as they have all the necessary reagents to hand. Without these additional data, the assertion that GSX2+ve cells in the cortex are derived from the cortical VZ relies partly on an act of faith on the part of the reader. Note that Tripathi et al. (2011, "Dorsally- and ventrally-derived oligodendrocytes have similar electrical properties but myelinate preferred tracts." J. Neurosci. 31, 6809-6819) found that the Gsh-Cre+ OL lineage contributed only ~20% of OLs to the mature cortex, not ~50% as reported by Kessaris et al. (2006). If it is correct that these Gsh2-derived OLs are from the cortical anlagen as the current paper claims, then it would raise the possibility that the ventricular precursors of GSH2+ intermediate progenitors are not uniformly distributed through the cortical VZ but are perhaps localized to some part of it. Then the contribution of Gsh2-derived OLs to the cortical population could depend on precisely where one looks relative to that localized source. It would be a nice addition to the current manuscript if the authors could explore the distribution of their GSH2+ intermediate precursors throughout the developing cortex. In any case, Tripathi et al. (2011) should be cited.

      Thank you for your constructive suggestions.

      (1) We used the Emx1Cre; RosaH2B-GFP mouse and found that nearly all GSX2+ cells in the cortical SVZ are derived from the Emx1+ lineage at P0 (Please see our new Figure 3-supplement 1A-C). 

      (2) According to your suggestion, we have cited this paper (Tripathi et al.) in our revised manuscript.

      (3) The study conducted by Kessaris et al. (2006) revealed that roughly 50% of cortical oligodendrocytes (OLs) originate from the Gsx2 lineage (LGE/CGE-derived). In contrast, Tripathi et al. (2011) observed that Gsx2-derived OLs contribute only around 20% to the corpus callosum (CC). To investigate the reasons behind these disparate findings, we conducted three experiments. Firstly, using Emx1Cre; RosaH2B-GFP mice, we found that approximately 89% of lateral CC (LCC) OLs originate from the Emx1 lineage, with only around 11% derived from the ventral source (refer to Author response image 1A and B below). Secondly, employing Nkx2-1Cre; RosaH2B-GFP mice, we determined that approximately 11% of LCC OLs originate from the Nkx2.1 lineage (refer to Author response image 1C and D below). Finally, we found that approximately 98.3% of LCC OLs originate from the Emx1 and Nkx2.1 lineages combined, with only around 1.7% possibly derived from the LGE (see Author response image 1E and F below). Taken together, our results indicate that approximately 89% of LCC OLs originate from the Emx1 lineage, while 11% of LCC OLs are derived from the medial ganglionic eminence (MGE).

      It is worth noting that OLs from Emx1 and Nkx2.1 lineages were equally distributed in the medial CC (mCC) (see Author response image 1G below). This finding suggests that MGE-derived OLs exhibit spatial heterogeneity in their distribution within the CC. These results provide evidence that the contribution of the lateral ganglionic eminence (LGE) and caudal ganglionic eminence (CGE) to CC OLs is minimal.

      Author response image 1.

      Finally, the authors deleted Olig2 in the MGE and found a dramatic reduction of PDGFRA+ and SOX10+ cells in the cortex at E14 and E16 (Figure 4A-F). This further supports their conclusion that, at least at E16, there is no significant contribution of OLs from ventral sources other than the MGE/AEP. This does not exclude the possibility that the LGE/CGE generates OLs for the cortex at later stages. Hence, on its own, this is not completely convincing evidence that the LGE generates no OL lineage cells for the cortex.

      There are three reasons why we did not analyze Olig2-NCKO mice after E16.5. 1. The expression of Nkx2.1Cre is lower within the dorsal-most region of the MGE than in other Nkx2.1-expressing regions. Even at E15.5, we can still find a small number of OPCs in the lateral cortex. We speculate that these OPCs are derived from the dorsal MGE. 2. Considering the possibility of incomplete recombination at the Olig2 locus, we speculate that the OPCs (Olig2+) in the lateral cortex are derived from the MGE. Indeed, we found a few OPCs in the MGE/AEP in the Olig2-NCKO mice (Figure 4F). 3. A recent study (bioRxiv preprint doi: https://doi.org/10.1101/2024.01.23.576886) showed that the contribution of the LGE/CGE to cortical OPCs is minimal, which further supports our findings. Taken together, our results provide additional evidence supporting the limited contribution of the LGE/CGE to cortical OPCs (OLs).

      Reviewer #2 (Public Review):

      Traditional thinking has been that cortical oligodendrocyte progenitor cells (OPCs) arise in the development of the brain from the medial ganglionic eminence (MGE), lateral/caudal ganglionic eminence (LGE/CGE), and cortical radial glial cells (RGCs). Indeed a landmark study demonstrated some time ago that cortical OPCs are generated in three waves, starting with a ventral wave derived from the medial ganglionic eminence (MGE) or the anterior entopeduncular area (AEP) at embryonic day E12.5 (Nkx2.1+ lineage), followed by a second wave of cortical OLs derived from the lateral/caudal ganglionic eminences (LGE/CGE) at E15.5 (Gsx2+/Nkx2.1- lineage), and then a final wave occurring at P0, when OPCs originate from cortical glial progenitor cells (Emx1+ lineage). However, the authors challenge the idea in this paper that cortical progenitors are produced from the LGE. They found previously that cortical glial progenitor cells also express Gsx2, suggesting this may not have been the best marker for LGE-derived OPCs. They have used fate mapping experiments and lineage analyses to suggest that cortical OPCs do not derive from the LGE.

      Strengths:

      (1) The data is high quality and very well presented, and experiments are thoughtful and elegant to address the questions being raised.

      (2) The authors use two elegant approaches to lineage trace LGE-derived cells, namely fate mapping of LGE-derived OPCs by combining IUE (intrauterine electroporation) with a Cre recombinase-dependent IS reporter, and lineage tracing of LGE-derived OPCs by combining IUE with the PiggyBac transposon system. Both approaches show convincingly that labelled LGE-derived cells that enter the cortex do not express OPC markers, but that those co-labelling with oligodendrocyte markers remain in the striatum.

      (3) The authors then use further approaches to confirm their findings. Firstly they lineage trace Emx1-Cre; Nkx2.1-Cre; H2B-GFP mice. Emx1-Cre is expressed in cortical RGCs and Nkx2.1-Cre is specifically expressed in MGE/AEP RGCs. They find that close to 98% of OPCs in the cortex co-label with GFP at later times, suggesting the contribution of OPCs from LGE is minimal.

      (4) They use one further approach to strengthen the findings yet further. They cross Nkx2.1-Cre mice with Olig2 F/+ mice to eliminate Olig2 expression in the SVZ/VZ of the MGE/AEP (Figures 4A-B). The generation of MGE/AEP-derived OPCs is inhibited in these Olig2-NCKO conditional mice. They find that the number of cortical progenitors at E16.5 is reduced 10-fold in these mice, suggesting that LGE contribution to cortical OPCs is minimal.

      We thank the reviewer for summarizing the strengths of our manuscript.

      Weaknesses:

      (1) The authors use IUE in experiments mentioned in point 2 of 'Strengths' above (Figures 1 and 2) and claim that the reporter was delivered specifically into LGE VZ at E13.5 using this IUE. It would be nice to see some sort of time course of delivery after IUE to show the reporter is limited to LGE VZ at early times post-IUE.

      Thank you very much for your suggestions. Indeed, when using IUE in our system, we occasionally found a small number of electroporated cells in the MGE/AEP VZ. Consequently, we find very few electroporated (MGE/AEP-derived) cells in the cortex, and these electroporated cells are neurons (perhaps interneurons).

      (2) In the experiments mentioned in point 3 of 'Strengths' (Figure 3), statistical analysis showed that only approximately 2% of OPCs were GFP-negative cells. This 2% could possibly be derived from the LGE/CGE so does not totally rule out that LGE contributes some cortical OPCs.

      Thank you for your constructive suggestions. We apologize for any imprecise descriptions. We suspect that this 2% may originate from the MGE (considering the possibility of incomplete recombination at the Olig2 locus, we speculate that the Olig2+ OPCs may be derived from the MGE; indeed, we found a few OPCs in the MGE/AEP in the Olig2-NCKO mice (Figure 4F)) or from the dMGE (the expression of Nkx2.1Cre is lower within the dorsal-most region of the MGE than in other Nkx2.1-expressing regions). Nevertheless, we have softened the assertion throughout our revised manuscript.

      (3) In the experiments mentioned in point 4 of 'Strengths' (Figure 4), they do still find cortical OPCs at E16.5 in the Olig2-NCKO conditional mice. It is unclear whether this is due to the recombination efficiency of the CRE enzyme not being 100%, or whether there is some LGE contribution to the cortical OPCs.

      This experiment alone may not provide strong evidence that the LGE does not contribute to cortical OPCs during development. However, when combining this result with our other results, we can conclude that the contribution of the LGE to cortical OPCs is minimal. Furthermore, a recent study reported that LGE/CGE-derived OLs make minimal contributions to the neocortex and corpus callosum, which further supports the reliability of our conclusion.

      We would like to thank the reviewers and editors for their valuable comments and suggestions again.

      Impact of Study:

      The authors show elegantly and convincingly that the contribution of the LGE to the pool of cortical OPCs is minimal. The title should perhaps be that the LGE contribution is minimal rather than no contribution at all, as they are not able to rule out some small contribution from the LGE. These findings challenge the traditional belief that the LGE contributes to the pool of cortical OPCs. The authors do show that the LGE does produce OPCs, but that they tend to remain in the striatum rather than migrate into the cortex. It is interesting to wonder why their migration patterns may be different from the MGE-derived OPCs which migrate to the cortex. The functional significance of these different sources of OPCs for adult cortex in homeostatic or disease states remains unclear though.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      (1) Change the title to e.g. 'limited contribution of the LGE to cortical oligodendrocytes'. Alternatively, it might be more useful to highlight where they come from, e.g. "Cortical oligodendrocytes originate predominantly or exclusively from the MGE and cortical VZ".

      As suggested, we have changed the old title to the following: The lateral/caudal ganglionic eminence makes a limited contribution to cortical oligodendrocytes

      (2) Demonstrate using lineage tracing that GSH2+ cells in the cortex are derived from the Emx1-lineage, e.g. using immunohistochemistry for GSX2 and a reporter in Emx1-Cre mice crossed to a reporter.

      In our revised manuscript, we have added a new figure (Figure 3-supplement 1A-C) to demonstrate that the GSX2+ cells in the cortex are derived from the Emx1-lineage.

      (3) Make it clear in their discussion that they have not explored the CGE so it is possible that this region generates some OLs.

      The Emx1Cre; Nkx2.1Cre; H2B-GFP mice showed that only ~2% of cortical OLs are derived from the LGE/CGE. Considering the efficiency of Cre recombination and the relatively low Cre activity in the dMGE of Nkx2.1Cre mice, the actual contribution of the LGE/CGE to cortical OLs could be even lower than our current observation. Therefore, our results demonstrate that the LGE/CGE generates very few, possibly even no, OLs for the cortex.

      (4) Soften the assertion that the LGE does not generate any OL lineage cells that reach the cortex by e.g. changing the word 'sole' to 'predominant' (line 88) and, elsewhere in the paper, leaving open the possibility that small numbers of LGE-derived OLs might enter the cortex, depending on where exactly one looks.

      As suggested, we have softened the assertion everywhere in our manuscript.

      (5) Lines 255-260: 'First, the time window during which the MGE generates OLs is very brief, perhaps occurring before MGE neurogenesis. The high level of SHH in the MGE allows for the production of a small population of cortical OPCs around E12.5. Subsequently, multipotent intermediate progenitors begin to express DLX transcription factors resulting in ending the generation of OPCs in the MGE'. What is the evidence that OL genesis precedes neurogenesis? If there is none (as I suspect) then this statement should be removed.

      The editors raised a good point. We have no strong evidence that OL genesis precedes neurogenesis in the MGE, so we have removed these sentences from our manuscript.

      (6) Figure 1E should show quantification of cells as a % of electroporated cells and as a % of PDGFRA+ or OLIG2+ or SOX10+ cells, so that the reader might have a clear view of the extent of labelling.

      Done.

      (7) Figure 4: This is interesting but incomplete. At E14.5 the authors show the presence of PDGFRA+ cells in the telencephalon. However, at E16.5 they show images only of the dorsal-most region of the cortex. If the LGE/CGE begins to generate OLPs for the early cortex, they would be expected to appear near the cortico-striatal boundary, as shown in Kessaris 2006 Fig1g-h. In the current manuscript, the authors do not show these regions, or the LGE and CGE, in their images. It is essential to show PDGFRA immunolabelling at the cortico-striatal boundary and also in the LGE and CGE at E16.5 in control and Olig2 mutant mice. It is also necessary to extend this analysis to E18.5, perhaps showing PDGFRA+ cells streaming from the cortical VZ/SVZ.

      There are three reasons why we did not analyze Olig2-NCKO mice after E16.5. 1. The expression of Nkx2.1Cre is lower within the dorsal-most region of the MGE than in other Nkx2.1-expressing regions. Even at E15.5, we can still find a small number of OPCs in the lateral cortex. We speculate that these OPCs are derived from the dMGE. 2. Considering the possibility of incomplete recombination at the Olig2 locus, we speculate that these OPCs (Olig2+) are derived from the MGE. In fact, we found a few OPCs in the MGE/AEP in the Olig2-NCKO mice (Figure 4F). 3. A recent study (bioRxiv preprint doi: https://doi.org/10.1101/2024.01.23.576886) showed that the contribution of the LGE/CGE to cortical OPCs is minimal. Taken together, our results provide additional evidence supporting the limited contribution of the LGE/CGE to cortical OPCs (OLs).

      (8) Cite Tripathi et al. (2011) and mention the disparity between the findings of that paper and Kessaris et al. (2006) and possible reasons - see main review above.

      Done.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript "Self-inhibiting percolation and viral spreading in epithelial tissue" describes a model, based on a 5-state cellular automaton, of the development of an infection. The model is motivated and qualitatively justified by time-resolved measurements of expression levels of viral, interferon-producing, and antiviral genes. The model is set up in such a way that the crucial difference in outcomes (infection spreading vs. confinement) depends on the initial fraction of special virus-sensing cells. Those cells (denoted as 'type a') cannot be infected and do not support the propagation of infection, but rather inhibit it in a somewhat autocatalytic way. Presumably, such feedback makes the transition between two outcomes very sharp: a minor variation in the concentration of 'a' cells results in a qualitative change from one outcome to another. As in any percolation-like system, the transition between propagation and inhibition of infection goes through a critical state with all its attributes: a power-law distribution of the cluster size (corresponding to the fraction of infected cells) with a fairly universal exponent and a cutoff at the upper limit of this distribution.

      Strengths:

      The proposed model suggests an explanation for the apparent diversity of outcomes of viral infections such as COVID.

      Author response: We thank the referee for the concise and accurate summary of our work.

      Weaknesses:

      Those are not real points of weakness, though I think addressing them would substantially improve the manuscript.

      Author response: Below we will address these point by point.

      The key point in the manuscript is the reduction of actual biochemical processes to the NOVAa rules. I think more could be said about it, be it referring to a set of well-known connections between expression states of cells and their reaction to infection or justifying it as an educated guess.

      Author response: We have now improved this part in the model section. We have added a few sentences explaining how the cell state transitions are motivated by the UMAP results:

      “The cell state transitions triggered by IFN signaling or viral replication are known in viral infection, but how exactly the transitions are orchestrated for specific infections is poorly understood. The UMAP cell state distribution hints at possible preferred transitions between states. The closer two cell states are on the UMAP, the more likely transitions between them are, all else being equal. For instance, the antiviral state (𝐴) is easily established from a susceptible cell (𝑂), but not from the fully virus-hijacked cell (𝑉 ). The IFN-secreting cell state (𝑁) requires the co-presence of the viral and antiviral genes and thus the cell cluster is located between the antiviral state (𝐴) and virus-infected state (𝑉 ) but distant from the susceptible cells (𝑂).

      Inspired by the UMAP data visualization (Fig. 1a), we propose the following transitions between five main discrete cell states”

      Another aspect where the manuscript could be improved would be to look a little beyond the strange and 'not-so-relevant for a biomedical audience' focus on the percolation critical state. While the presented calculation of the precise percolation threshold and the critical exponent confirm the numerical skills of the authors, the probability that an actual infected tissue is right at the threshold is negligible. So in addition to the critical properties, it would be interesting to learn about the system not exactly at the threshold: For example, how the speed of propagation of infection depends on subcritical p_a and what is the cluster size distribution for supercritical p_a.

      Author response: We agree that further exploring the model away from the critical threshold is worthwhile. While our main focus has been on explaining the large degree of heterogeneity in outcomes – readily explained as a consequence of the sharp threshold-like behavior – we now include plots of the time evolution of the infection (as well as the remaining states) for subcritical values of p_a. The plots can be found in Figure S4 of the supplement.

      Reviewer #2 (Public Review):

      Xu et al. introduce a cellular automaton model to investigate the spatiotemporal spreading of viral infection. In this study, the authors first analyze the single-cell RNA sequencing data from experiments and identify four clusters of cells at 48 hours post-viral infection, including susceptible cells (O), infected cells (V), IFN-secreting cells (N), and antiviral cells (A). Next, a cellular automaton model (NOVAa model) is introduced by assuming the existence of a transient pre-antiviral state (a). The model consists of an LxL lattice; each site represents one cell. The cells change their state following the rules depending on the interaction of neighboring cells. The model introduces a key parameter, p_a, representing the fraction of pre-antiviral state cells. Cell apoptosis is omitted in the model. Model simulations show a threshold-like behavior of the final attack rate of the virus when p_a changes continuously. There is a critical value p_c, so that when p_a < p_c, infections typically spread to the entire system, while at a higher p_a > p_c, the propagation of the infected state is inhibited. Moreover, the radius R that quantifies the diffusion range of N cells may affect the critical value p_c; a larger R yields a smaller critical value p_c. The structure of clusters is different for different values of R; greater R leads to a different microscopic structure with fewer A and N cells in the final state. Compared with the single-cell RNA seq data, which implies a low fraction of IFN-positive cells - around 1.7% - the model simulation suggests R=5. The authors also explored a simplified version of the model, the OVA model, with only three states. The OVA model also has an outbreak size. The OVA model shows dynamics similar to the NOVAa model. However, the change in microstructure as a function of the IFN range R observed in the NOVAa model is not observed in the OVA model.

      Author response: We thank the referee for the comprehensive summary of our work.

      Data and model simulation mainly support the conclusions of this paper, but some weaknesses should be considered or clarified.

      Author response: Thank you - we will address these point by point below.

      (1) In the automaton model, the authors introduce a parameter p_a, representing the fraction of pre-antiviral state cells. The authors wrote: "The parameter p_a can also be understood as the probability that an O cell will switch to the N or A state when exposed to the virus or IFNs, respectively." Nevertheless, biologically, the fraction of pre-antiviral state cells does not mean the same value as the probability that an O cell switches to the N or A state. Moreover, in the numerical scheme, the cell state changes according to the deterministic rules N(O)=a and N(a)=A. Hence, the probability p_a did not apply to the model simulation. The exact meaning of the parameter p_a may need to be clarified.

      Author response: We acknowledge that this was an imprecise formulation, and have now changed it.

      What we tried to convey with that comment was that, alternatively to having a certain fraction of cells be in the a state initially, one could instead have devised a model in which each O-state cell simply had a probability to act as an a-state cell upon exposure to the virus or to interferons, i.e. to switch to an N state (if exposed to virus) or to the A state (if exposed to interferons). In this simplified model, there would be no functional difference, since it would simply amount to whether each cell had a probability to be designated an a-cell initially (as in our model), or upon exposure. So our remark mainly served to explain that the role of the p_a parameter is simply to encode that a certain fraction of virus-naive cells behave this way (whether predetermined or not).

      (2) The current model is deterministic. However, biologically, considering the probabilistic model may be more realistic. Are the results valid when the probability update strategy is considered? By the probability model, the cells change their state randomly to the state of the neighbor cells. The probability of cell state changes may be relevant for the threshold of p_a. It is interesting to know how the random response of cells may affect the main results and the critical value of p_a.

      Author response: This is a good point - we are firm believers in the importance of stochasticity. We should note that even the current model has a level of stochasticity, since we choose the cells to be updated with a constant probability rate - we choose N cells to update in each timestep, with replacement.

      However, based on your suggestion, we simulated a version of the dynamics which included stochastic conversion, i.e. each action of a cell on a nearby cell happens only with a probability p_conv (and the original model is recovered as the p_conv=1 scenario). Of course, this slows down the dynamics (or effectively rescales time by a factor p_conv), but crucially we find that it does not appreciably affect the location of the threshold p_c. Below we include a parameter scan across p_a values for R=1 and p_conv=0.5, which shows that the threshold continues to appear at around p_a=27%.

      We now discuss these findings in the supplement and include the figure below as Fig. S5.

      Author response image 1.

      (3) Figure 2 shows a critical value p_c = 27.8% following a simulation on a lattice with dimension L = 1000. However, it is unclear if dimension changes may affect the critical value.

      Author response: Re-running the simulations on a lattice 4x as large (i.e. L=2000) yields a similar critical value of 27-28% for R=1, so we are confident that finite size effects do not play a major role at L=1000 and beyond. For R=5, however, we find that a minimum lattice size greater than L=1000 is necessary to determine the critical threshold. Concretely, we find that the threshold value p_c for R=5 changes somewhat when the lattice size is increased from 1000 to 2000, but is invariant under a change from 2000 to 3000, so we conclude that L=2000 is sufficient for R=5. The p_c value for R=5 cited in the manuscript (~0.4%) was determined from simulations at L=2000.
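
      The stochastic-conversion variant from point (2) and the lattice-size check from point (3) can be illustrated with a minimal sketch. This is not the authors' C++ code. The rule tables are assumptions pieced together from the text above (virus converts O to V and a to N; IFN converts O to a and a to A, per the reviewer's summary). Other assumptions: the virus acts on the four nearest neighbours, IFN acts within a Euclidean disc of radius R, and a timestep is L*L random picks with replacement. The function names `simulate` and `attack_rate` are hypothetical.

```python
import random

# Assumed rule tables (see lead-in); keys are target states, values are new states.
VIRUS_RULE = {"O": "V", "a": "N"}  # effect of a V cell on a nearest neighbour
IFN_RULE = {"O": "a", "a": "A"}    # effect of an N cell within range R


def offsets_within(radius):
    """Lattice offsets (excluding the origin) within Euclidean distance `radius`."""
    r = int(radius)
    return [(dx, dy) for dx in range(-r, r + 1) for dy in range(-r, r + 1)
            if (dx, dy) != (0, 0) and dx * dx + dy * dy <= radius * radius]


def simulate(L, p_a, R=1, p_conv=1.0, seed=0):
    """Run until no V or N cell can convert anything; return the final grid.

    Assumes 2*R + 1 <= L so that periodic wrapping never visits a cell twice.
    """
    rng = random.Random(seed)
    grid = [["a" if rng.random() < p_a else "O" for _ in range(L)] for _ in range(L)]
    grid[L // 2][L // 2] = "V"  # single seed infection
    reach = {"V": (VIRUS_RULE, offsets_within(1)), "N": (IFN_RULE, offsets_within(R))}

    def convertible(x, y):
        """Yield (nx, ny, new_state) for every cell that (x, y) could still convert."""
        if grid[x][y] not in reach:
            return
        rule, offsets = reach[grid[x][y]]
        for dx, dy in offsets:
            nx, ny = (x + dx) % L, (y + dy) % L  # periodic boundaries
            if grid[nx][ny] in rule:
                yield nx, ny, rule[grid[nx][ny]]

    def active():
        return any(next(convertible(x, y), None) is not None
                   for x in range(L) for y in range(L))

    while active():
        for _ in range(L * L):  # one timestep: L*L picks with replacement
            x, y = rng.randrange(L), rng.randrange(L)
            for nx, ny, new_state in list(convertible(x, y)):
                if rng.random() < p_conv:  # each action succeeds with prob. p_conv
                    grid[nx][ny] = new_state
    return grid


def attack_rate(grid):
    """Fraction of cells that end in the V or N state."""
    cells = [c for row in grid for c in row]
    return sum(c in ("V", "N") for c in cells) / len(cells)
```

      Scanning `p_a` at two lattice sizes (for example `attack_rate(simulate(L, p_a, R=1, p_conv=0.5))` for increasing `L`) is one way to check whether a threshold estimate is stable. This pure-Python sketch is far too slow for L=1000 and is meant only to make the update rule concrete.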

      Reviewer #3 (Public Review):

      Summary:

      This study considers how to model distinct host cell states that correspond to different stages of a viral infection: from naïve and susceptible cells to infected cells and a minority of important interferon-secreting cells that are the first line of defense against viral spread. The study first considers the distinct host cell states by analyzing previously published single-cell RNAseq data. Then an agent-based model on a square lattice is used to probe the dependence of the system on various parameters. Finally, a simplified version of the model is explored, and shown to have some similarity with the more complex model, yet lacks the dependence on the interferon range. By exploring these models one gains an intuitive understanding of the system, and the model may be used to generate hypotheses that could be tested experimentally, telling us "when to be surprised" if the biological system deviates from the model predictions.

      Author response: Thank you for the summary! We agree with the role that you describe for a model such as this one.

      Strengths:

      -  Clear presentation of the experimental findings and a clear logical progression from these experimental findings to the modeling.

      -  The modeling results are easy to understand, revealing interesting behavior and percolation-like features.

      -  The scaling results presented span several decades and are therefore compelling.

      -  The results presented suggest several interesting directions for theoretical follow-up work, as well as possible experiments to probe the system (e.g. by stimulating or blocking IFN secretion).

      Weaknesses:

      -  Since the "range" of IFN is an important parameter, it makes sense to consider lattice geometries other than the square lattice, which is somewhat pathological. Perhaps a hexagonal lattice would generalize better.

      -  Tissues are typically three-dimensional, not two-dimensional. (Epithelium is an exception). It would be interesting to see how the modeling translates to the three-dimensional case. Percolation transitions are known to be very sensitive to the dimensionality of the system.

      Author response: We agree that probing different lattice geometries (2- and 3-dimensional alike) would be interesting and worthwhile. However, for this manuscript, we prefer to confine the analysis to the current, simple case. We do agree, however, that an extensive exploration of the role of geometry is an interesting future possibility.

      -  The fixed time-step of the agent-based modeling may introduce biases. I would consider simulating the system with Gillespie dynamics where the reaction rates depend on the ambient system parameters.

      -  Single-cell RNAseq data typically involves data imputation due to the high sparsity of the measured gene expression. More information could be provided on this crucial data processing step since it may significantly alter the experimental findings.

      Justification of claims and conclusions:

      The claims and conclusions are well justified.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It is necessary to explain what UMAP does. Is clustering done in the space of twenty-something original dimensions or 2D? How UMAP1 and UMAP2 are selected and are those the same in all plots?

      Author response: We have now added a few sentences to clarify the point raised above - the second snippet explains how clustering is performed:

      “As a dimension reduction algorithm, UMAP is a manifold learning technique that favors the preservation of local distances over global distances (McInnes et al., 2018; Becht et al., 2019). It constructs a weighted graph from the data points and optimizes the graph layout in the low-dimensional space.”

      “We cluster the cells with the principal components analysis (PCA) results from their gene expression. With the first 16 principal components, we calculate k-nearest neighbors and construct the shared nearest neighbor graph of the cells then optimize the modularity function to determine clusters. We present the cluster information on the UMAP plane and use the same UMAP coordinates for all the plots in this paper hereafter.”
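
      The graph-based clustering described in the passage above can be sketched in a few lines. This is an illustration of the general technique only, not the authors' pipeline. It assumes the input points are already PCA-reduced coordinates, uses brute-force Euclidean nearest neighbours, weights shared-nearest-neighbour edges by Jaccard overlap (one common choice), and evaluates the modularity of a given labelling rather than optimizing it. All function names are hypothetical.

```python
import math


def knn(points, k):
    """Indices of the k nearest neighbours of each point (brute force, Euclidean)."""
    result = []
    for i, p in enumerate(points):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(p, points[j]))
        result.append(set(order[:k]))
    return result


def snn_graph(points, k):
    """Shared-nearest-neighbour graph: edge weight is the Jaccard overlap of the
    two neighbourhoods (each neighbourhood includes the point itself)."""
    hoods = [nbrs | {i} for i, nbrs in enumerate(knn(points, k))]
    weights = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            shared = len(hoods[i] & hoods[j])
            if shared:
                weights[(i, j)] = shared / len(hoods[i] | hoods[j])
    return weights


def modularity(weights, labels):
    """Newman modularity Q of a labelling on a weighted, undirected graph."""
    total = sum(weights.values())  # total edge weight m
    degree = {}
    for (i, j), w in weights.items():
        degree[i] = degree.get(i, 0.0) + w
        degree[j] = degree.get(j, 0.0) + w
    within = sum(w for (i, j), w in weights.items() if labels[i] == labels[j])
    by_cluster = {}
    for node, d in degree.items():
        by_cluster[labels[node]] = by_cluster.get(labels[node], 0.0) + d
    return within / total - sum((d / (2 * total)) ** 2 for d in by_cluster.values())


# Two well-separated toy groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
graph = snn_graph(points, k=2)
q_split = modularity(graph, [0, 0, 0, 0, 1, 1, 1, 1])
q_merged = modularity(graph, [0] * 8)
```

      A clustering algorithm of this family searches for the labelling that maximizes the modularity. In the toy example the two-group labelling scores higher than putting all points in one cluster.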

      Figure 1, what do bars in the upper right corners of panels d,e,f, and g indicate? "Averaged" refers to time average? Something is missing in "Cell proportions are labeled with corresponding colors in a)".

      Author response: Thank you - we have now modified the figure caption. The bars in the upper right corners of panels d, e, f are color keys for gene expression; the brighter the color, the higher the gene expression.

      “Averaged” gene expression refers to the mean expression of that particular gene across the cells within each indicated cluster.

      The lines in c) correspond to cell proportions in different states at different time points. The same state in a) and c) is shown in the same color.

      Line 46, "However" does not sound right in this context. Would "Also" be better?

      Author response: We agree and have corrected it in the revised manuscript.

      Line 96: "The viral genes are also partially expressed in these cells, but different from the 𝑁 cluster, the antiviral genes are fully expressed (Fig. S1 and S2)." The sentence needs to be rephrased.

      Author response: We have rephrased the sentence: “As in the N cluster, the viral gene E is barely detected in these cells, indicating incomplete viral replication. However, in contrast to the N cluster, the antiviral genes are expressed to their full extent (Fig. S1 and S2).”

      Line 126, missing "be", "large" -> "larger".

      Author response: Thank you, we have now corrected these typos.

      Line 139-140 The logical link between ignoring apoptosis and the diffusion of IFN is unclear.

      Author response: We modified the sentence as “Here, we assume that the secretion of IFNs by the 𝑁 cells is a faster process than possible apoptosis (Wen et al., 1997; Tesfaigzi, 2006) of these cells and that the diffusion of IFNs to the neighborhood is not significantly affected by apoptosis.”

      Fig. 2a Do the yellow arrows show the effect of IFN and the purple arrows the propagation of viral infection?

      Author response: That is correct. We have added this information to the figure caption: “The straight black arrows indicate transitions between cell states. The curved yellow arrows indicate the effects of IFNs on activating antiviral states. The curved purple arrows indicate viral spread to cells with 𝑂 and 𝑎 states.”

      Fig. 3, n(s) as the axis label vs P(s) in the text? How do the curves in panel a) look when the p_a is well above or below p_c?

      Author response: Thank you. We have edited the labels in the figure to reflect the symbols used in the text.

      Boundary conditions? From Fig. 4, apparently periodic?

      Author response: Yes, we use periodic boundary conditions in the model. We clarify it in the model section now (last sentence).

      It will be good to see a plot with time dependences of all cell types for a couple of values of p_a, illustrating propagation and cessation of the infection.

      Author response: We agree, and have added a Figure S4 in the supplement which explores exactly that. Thank you for the suggestion.

      A verbal qualitative description of why p_a has such importance and how the infection is terminated for large p_a would help.

      Reviewer #2 (Recommendations For The Authors):

      Below are two minor comments:

      (1) In the single-cell RNA sequencing data analysis, the authors describe the cell clusters O, V, A, and N. However, showing how the clusters are identified from the data might be more straightforward.

      Author response: Technically, we cluster the cells using principal components analysis (PCA) results of their gene expression. With the first 16 principal components, we calculate k-nearest neighbors and construct the shared nearest neighbor graph of the cells and then optimize the modularity function to determine clusters. We manually annotate the clusters with O, V, A, and N based on the detected abundance of viral genes, antiviral genes, and IFNs.

      (2) In Figure 3, what does n(s) mean in Figure 3a? And what is the meaning of the distribution P(s) of infection clusters? It may be stated clearly.

      Author response: The use of n(s) was inconsistent, and we have now edited the figure to instead say P(s), to harmonize it with the text. P(s) is the distribution of cluster sizes, s, expressed as a fraction of the whole system. In other words, once a cluster has reached its final size, we record s=(N+V)/L^2 where N and V are the number of N and V state cells in the cluster (note that, by design, each simulation leads to a single cluster, since we seed the infection in one lattice point). We now indicate more clearly in the caption and the main text what exactly P(s) and s refer to.
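
      To make the definition of s concrete, here is a minimal sketch that computes s = (N + V) / L^2 for one finished run and bins the values from many runs into an empirical P(s). The one-letter state encoding, the example lattice, and the function names are assumptions for illustration; they are not taken from the shared simulation code.

```python
from collections import Counter


def cluster_fraction(lattice):
    """Return s = (N + V) / L^2 for one finished simulation on an L x L lattice.
    Each site holds a one-letter state; only 'N' and 'V' sites count."""
    L = len(lattice)
    infected = sum(row.count("N") + row.count("V") for row in lattice)
    return infected / (L * L)


def empirical_distribution(sizes, n_bins):
    """Histogram s values from many runs into n_bins equal bins on [0, 1],
    normalized so that the bin probabilities sum to 1."""
    counts = Counter(min(int(s * n_bins), n_bins - 1) for s in sizes)
    return [counts[b] / len(sizes) for b in range(n_bins)]


# Hypothetical 4 x 4 final state: one N site and two V sites.
example = ["OOAO", "ANVA", "OAVO", "OOOO"]
print(cluster_fraction(example))
```

      Because each simulation is seeded at a single lattice point, one run contributes exactly one value of s to the distribution.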

      Reviewer #3 (Recommendations For The Authors):

      - Would the authors kindly share the simulation code with the community? Also, the data analysis code should be shared to follow current best practices. This needs to be standard practice in all publications. I would go as far as to say that in 2024 publishing a data analysis / simulation study without sharing the relevant code should be ostracized by the community.

      Author response: We absolutely agree and have created a GitHub repository in which we share the C++ source code for the simulations and a Python notebook for plotting. The public repository can be found at https://github.com/BjarkeFN/ViralPercolation. We have added this information to the supplement under the section “Code availability”.

      - I would avoid the use of the wording "critical" threshold since this is almost guaranteed to infuriate a certain type of reader.

      - Line 265 has a curious use of " ... " which should be replaced with something more appropriate.

      Author response: Thank you for pointing it out! We have corrected the typos.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      The manuscript suggests the zebrafish homolog of ctla-4 and generates a new mutant in it. However, the locus that is mutated is confusingly annotated as both CD28 (current main annotation in ZFIN) and CTLA-4/CD152 (one publication from 2020), see: https://zfin.org/ZDB-GENE-070912-128. Both human CTLA-4 and CD28 align with relatively similar scores to this gene. There seem to be other orthologs of these receptors in the zebrafish genome, including CD28-like (https://zfin.org/ZDB-GENE-070912-309) which neighbors the gene annotated as CD28 (exhibiting similar synteny as human CD28 and CTLA-4). It would be helpful to provide more information to distinguish between this family of genes and to further strengthen the evidence that this mutant is in ctla-4, not cd28. Also, is one of these genes in the zebrafish genome (e.g. cd28l) potentially a second homolog of CTLA-4? Is this why this mutant is viable in zebrafish and not mammals? Some suggestions:

      (a) A more extensive sequence alignment that considers both CTLA-4 and CD28, potentially identifying the best homolog of each human gene, especially taking into account any regions that are known to produce the functional differences between these receptors in mammals and effectively assigns identities to the two genes annotated as "cd28" and "cd28l" as well as the gene "si:dkey-1H24.6" that your CD28 ORF primers seem to bind to in zebrafish.

      In response to the reviewer's insightful suggestions, we have conducted more extensive sequence alignment and phylogenetic analyses that consider CTLA-4, CD28, and CD28-like molecules, taking into account key regions crucial for the functions of, and functional differences between, these molecules across various species, including mammals and zebrafish.

      Identification of zebrafish Ctla-4: We identified zebrafish Ctla-4 as a homolog of mammalian CTLA-4 based on key conserved structural and functional characteristics. Structurally, the Ctla-4 gene shares similar exon organization compared to mammalian CTLA-4. Ctla-4 is a type I transmembrane protein with typical immunoglobulin superfamily features. Multiple amino acid sequence alignments revealed that Ctla-4 contains a <sup>113</sup>LFPPPY<sup>118</sup> motif and a <sup>123</sup>GNGT<sup>126</sup> motif in the ectodomain, and a tyrosine-based <sup>206</sup>YVKF<sup>209</sup> motif in the distal C-terminal region. These motifs closely resemble MYPPPY, GNGT, and YVKM motifs in mammalian CTLA-4s, which are essential for binding to CD80/CD86 ligands and molecular internalization and signaling inhibition. Despite only 23.7% sequence identity to human CTLA-4, zebrafish Ctla-4 exhibits a similar tertiary structure with a two-layer β-sandwich architecture in its extracellular IgV-like domain. Four cysteine residues responsible for the formation of two pairs of disulfide bonds (Cys<sup>20</sup>-Cys<sup>91</sup>/Cys<sup>46</sup>-Cys<sup>65</sup> in zebrafish and Cys<sup>21</sup>-Cys<sup>92</sup>/Cys<sup>48</sup>-Cys<sup>66</sup> in humans) that connect the two-layer β-sandwich are conserved. Additionally, a separate cysteine residue (Cys<sup>120</sup> in zebrafish and Cys<sup>120</sup> in humans) involved in dimerization is also present, and Western blot analysis under reducing and non-reducing conditions confirmed Ctla-4’s dimerization. Phylogenetically, Ctla-4 clusters with other known CTLA-4 homologs from different species with high bootstrap probability, while zebrafish Cd28 groups separately with other CD28s. Functionally, Ctla-4 is predominantly expressed on CD4<sup>+</sup> T and CD8<sup>+</sup> T cells in zebrafish. 
      It plays a pivotal inhibitory role in T cell activation by competing with CD28 for binding to CD80/86, as validated through a series of both in vitro and in vivo assays, including microscale thermophoresis assays which demonstrated that Ctla-4 exhibits a significantly higher affinity for Cd80/86 than Cd28 does (KD = 0.50 ± 0.25 μM vs. KD = 2.64 ± 0.45 μM). These findings confirm Ctla-4 as an immune checkpoint molecule, reinforcing its identification within the CTLA-4 family.
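
      As a rough worked example of what the two reported KD values imply, the sketch below computes their ratio and the equilibrium occupancy under a simple 1:1 binding model. The 1:1 model, the choice of evaluation concentration, and the function name `fraction_bound` are assumptions for illustration; the measurement uncertainties quoted above are ignored.

```python
def fraction_bound(conc_uM, kd_uM):
    """Equilibrium fraction bound in a simple 1:1 binding model:
    fraction = [X] / (KD + [X]), with both values in micromolar."""
    return conc_uM / (kd_uM + conc_uM)


KD_CTLA4_uM = 0.50  # reported KD for Ctla-4 binding to Cd80/86
KD_CD28_uM = 2.64   # reported KD for Cd28 binding to Cd80/86

# A lower KD means tighter binding; the ratio gives the fold difference.
fold_difference = KD_CD28_uM / KD_CTLA4_uM

# Occupancy at a hypothetical free concentration equal to the Ctla-4 KD.
print(round(fold_difference, 2))
print(fraction_bound(0.50, KD_CTLA4_uM), fraction_bound(0.50, KD_CD28_uM))
```

      Under these assumptions the central KD estimates differ by roughly fivefold, which is consistent with Ctla-4 outcompeting Cd28 for the shared ligand.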

      Comparison between zebrafish Cd28 and "Cd28l": Zebrafish Cd28 contains an extracellular SYPPPF motif and an intracellular FYIQ motif. The extracellular SYPPPF motif is essential for binding to Cd80/86, while the intracellular FYIQ motif likely mediates kinase recruitment and co-stimulatory signaling. In contrast, the "Cd28l" molecule lacks the SYPPPF motif, which is critical for Cd80/86 binding, and exhibits strong similarity in its C-terminal 79 amino acids to Ctla-4 rather than Cd28. Consequently, "Cd28l" resembles an atypical Ctla-4-like molecule but fails to exhibit Cd80/86 binding activity.

      We have incorporated the relevant analysis results into the main text of the revised manuscript and updated Supplementary Figure 1. Additionally, we provide key supplementary analyses here for the reviewer's convenience.  

      Author response image 1.

      Illustrates the alignment of Ctla-4 (XP_005167576.1) and Ctla-4-like (XP_005167567.1, previously referred to as "Cd28l") in zebrafish, generated using ClustalX and Jalview. Conserved and partially conserved amino acid residues are highlighted in color gradients ranging from carnation to red, respectively. The B7-binding motif is encircled with a red square.

      (b) Clearer description in the main text of such an analysis to better establish that the mutated gene is a homolog of ctla-4, NOT cd28.

      We appreciate the reviewer's advice. Additional confirmation of zebrafish Ctla-4 is detailed in lines 119-126 of the revised manuscript.

      (c) Are there mammalian anti-ctla-4 and/or anti-cd28 antibodies that are expected to bind to these zebrafish proteins? If so, looking to see whether staining is lost (or western blotting is lost) in your mutants could be additionally informative. (Our understanding is that your mouse anti-Ctla-4 antibody is raised against recombinant protein generated from this same locus, and so is an elegant demonstration that your mutant eliminates the production of the protein, but unfortunately does not contribute additional information to help establish its homology to mammalian proteins).

      This suggestion holds significant value. However, a major challenge in fish immunology research is the limited availability of antibodies suitable for use in fish species; antibodies developed for mammals are generally not applicable. We attempted to use human and mouse anti-CTLA-4 and anti-CD28 antibodies to identify Ctla-4 and Cd28 in zebrafish, but the results were inconclusive, with no expected signals. This outcome likely arises from the low sequence identity between human/mouse CTLA-4 and CD28 and their zebrafish homologs (ranging from 21.3% to 23.7% for CTLA-4 and 21.2% to 24.0% for CD28). Therefore, developing specific antibodies against zebrafish Ctla-4 is essential for advancing this research.

      The methods section is generally insufficient and doesn't describe many of the experiments performed in this manuscript. Some examples:

      (a) No description of antibodies used for staining or Western blots (Figure1C, 1D, 1F).

      (b) No description of immunofluorescence protocol (Figure 1D, 1F).

      (c) No description of Western blot protocol (Figure 1C, 2C).

      (d) No description of electron microscopy approach (Figure 2K).

      (e) No description of the approach for determining microbial diversity (Entirety of Figure 6).

      (f) No description of PHA/CFSE/Flow experiments (Figure 7A-E).

      (g) No description of AlphaFold approach (Figures 7F-G).

      (h) No description of co-IP approach (Figure 7H).

      (i) No description of MST assay or experiment (Figure 7I).

      (j) No description of purification of recombinant proteins, generation of anti-Ctla-4 antibody, or molecular interaction assays (Figures S2 and S6).

      We apologize for this oversight. The methods section was inadvertently incomplete due to an error during the file upload process at submission. This issue has been addressed in the revised manuscript. We appreciate your understanding.

      Figure 5 suggests that there are more Th2 cells 1, Th2 cells 2, and NKT cells in ctla-4 mutants through scRNA-seq. However, as the cell numbers for these are low in both genotypes, there is only a single replicate for each genotype scRNA-seq experiment, and dissociation stress can skew cell-type proportions, this finding would be much more convincing if another method that does not depend on dissociation was used to verify these results. Furthermore, while Th2 cells 2 are almost absent in WT scRNA-seq, KEGG analysis suggests that a major contributor to their clustering may be ribosomal genes (Fig. 5I). Since no batch correction was described in the methods, it would be beneficial to verify the presence of this cluster in ctla-4 mutants and WT animals through other means, such as in situ hybridization or transgenic lines.   

      We are grateful for the insightful comments provided by the reviewer. Given that research on T cell subpopulations in fish is still in its nascent stages, the availability of specific marker antibodies and relevant transgenic strains remains limited. Our single-cell RNA sequencing (scRNA-seq) analysis revealed that a distinct Th2 subset 2 was predominantly observed in Ctla-4 mutants but was rare in wild-type zebrafish, suggesting that this subset may primarily arise under pathological conditions associated with Ctla-4 mutation. Due to the near absence of Th2 subset 2 in wild-type samples, KEGG enrichment analysis was performed exclusively on this subset from Ctla-4-deficient intestines. The ribosome pathway was significantly enriched, suggesting that these cells may be activated to fulfill their effector functions. However, confirming the presence of Th2 subset 2 using in situ hybridization or transgenic zebrafish lines is currently challenging due to the lack of lineage-specific markers for detailed classification of Th2 cell subsets and the preliminary nature of scRNA-seq predictions.

      To address the reviewers' suggestion to confirm compositional changes in Th2 and NKT cells using dissociation-independent methods, we quantified mRNA levels of Th2 (il4, il13, and gata3) and NKT (nkl.2, nkl.4, and prf1.1) cell marker genes via RT-qPCR in intestines from wild-type and mutant zebrafish. As shown in Figure S7B and S7C, these markers were significantly upregulated in Ctla-4-deficient intestines compared to wild-type controls. This indicates an overall increase in Th2 and NKT cell activity in mutant zebrafish, which aligns with our scRNA-seq analysis and supports the validity of our initial findings.

      Before analyzing the scRNA-seq data, we performed batch correction using the Harmony algorithm via cloud-based Cumulus v1.0 on the aggregated gene-count matrices. This methodological detail has been included in the “Materials and Methods” section of the revised manuscript. Moreover, the RT-qPCR results are presented in Supplementary Figures S7B and S7C.

      Quality control (e.g., no. of UMIs, no. of genes, etc.) metrics of the scRNAseq experiments should be presented in the supplementary information for each sample to help support that observed differential expression is not merely an outcome of different sequencing depths of the two samples.

      As illustrated in Fig. S5, the quality control data have been supplemented to include the effective cell number of the sample, along with pre- and post-filtering metrics such as nFeature_RNA, nCount_RNA and mitochondrial percentage (percent.mito). Furthermore, scatter plots comparing the basic information of the sample cells before and after filtering are provided.

      Some references to prior research lack citations. Examples:

      (a)"Given that Ctla-4 is primarily expressed on T cells (Figure 1E-F), and its absence has been shown to result in intestinal immune dysregulation, indicating a crucial role of this molecule as a conserved immune checkpoint in T cell inhibition."

      The references were incorporated into line 71 of the revised manuscript.

      (b) Line 83: Cite evidence/review for the high degree of conservation in adaptive immunity.

      The references were incorporated into line 93 of the revised manuscript.

      (c) Lines 100-102: Cite the evidence that MYPPPY is a CD80/86 binding motif.

      The references were incorporated into line 117 of the revised manuscript.

      The text associated with Figure 8 (Lines 280-289) does not clearly state that rescue experiments are being done in mutant zebrafish.

      We have provided a clear explanation of the rescue experiments conducted in Ctla-4-deficient zebrafish. This revision has been incorporated into line 319.

      Line 102: Is there evidence from other animals that LFPPPY can function as a binding site for CD80/CD86? Does CD28 also have this same motif?

      The extracellular domains of CTLA-4 and CD28, which bind to CD80/CD86, are largely conserved across various species. This conservation is exemplified by a central PPP core motif, although the flanking amino acids exhibit slight variations. In mammals, both CTLA-4 and CD28 feature the conserved MYPPPY motif. By contrast, in teleost fish, such as rainbow trout, CTLA-4 contains an LYPPPY motif, while CD28 has an MYPPPI motif (Ref. 1). Grass carp CTLA-4 displays an LFPPPY motif, whereas its CD28 variant bears an IYPPPF motif. Yeast two-hybrid assays confirm that these motifs facilitate interactions between grass carp CTLA-4 and CD28 with CD80/CD86 (Ref. 2). Similarly, zebrafish Ctla-4 contains the LFPPPY motif observed in grass carp, while Cd28 exhibits a closely related SYPPPF motif.

      References:

      (1) Bernard, D et al. (2006) Costimulatory Receptors in a Teleost Fish: Typical CD28, Elusive CTLA-4. J Immunol. 176: 4191-4200.

      (2) Lu T Z et al. (2022) Molecular and Functional Analyses of the Primordial Costimulatory Molecule CD80/86 and Its Receptors CD28 and CD152 (CTLA-4) in a Teleost Fish. Frontiers in Immunology. 13:885005.

      Line 110-111: Suggest adding citation of these previously published scRNAseq data to the main text in addition to the current description in the Figure legend.

      The reference has been added in line 129 in the main text.

      Figure 3B: It would be helpful to label a few of the top differentially expressed genes in Panel B?

      The top differentially expressed genes have been labeled in Figure 3B.

      Figure 3G: It's unclear how this analysis was conducted, what this figure is supposed to demonstrate, and in its current form it is illegible.

      Figure 3G displays a protein-protein interaction network constructed from differentially expressed genes. The densely connected nodes, representing physical interactions among proteins, provide valuable insights for basic scientific inquiry and biological or biomedical applications. As proteins are crucial to diverse biological functions, their interactions illuminate the molecular and cellular mechanisms that govern both healthy and diseased states in organisms. Consequently, these networks facilitate the understanding of pathogenic and physiological processes involved in disease onset and progression.

      To construct this network, we first utilized the STRING database (https://string-db.org) to generate an initial network diagram using the differentially expressed genes. This diagram was subsequently imported into Cytoscape (version 3.9.1) for visualization and further analysis. Node size and color intensity reflect the density of interactions, indicating the relative importance of each protein. Figure 3G illustrates that IL1β was a central cytokine hub in the disease process of intestinal inflammation in Ctla-4-deficient zebrafish.
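
      The idea that a node's prominence reflects its density of interactions can be illustrated by counting how many edges touch each node. The edge list below is a hypothetical placeholder with generic protein names, not the STRING output used for Figure 3G, and degree is only one of several ways a hub could be scored.

```python
from collections import defaultdict


def node_degrees(edges):
    """Count the number of interaction partners (degree) of every node
    in an undirected edge list."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


# Hypothetical interactions, for illustration only.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
degrees = node_degrees(edges)

# The node with the highest degree is the candidate hub.
hub = max(degrees, key=degrees.get)
print(hub, degrees[hub])
```

      In a visualization tool, a value such as this degree can then be mapped to node size or color intensity.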

      Expression scale labeling:

      (a) Most gene expression scales are not clearly labeled: do they represent mean expression or scaled expression? Has the expression been log-transformed, and if so, which log (natural log? Log10? Log2?). See: Figure 3E, 3I, 4D, 4E, 5B, 5G, 5H, 6I.

      The gene expression scales are detailed in the figure legends. Specifically, Figures 3E, 3I, and 6I present heatmaps depicting row-scaled expression levels for the corresponding genes. In contrast, Figures 4D and 4E display heatmaps illustrating the mean expression of these genes. Additionally, the dot plots in Figures 5B, 5G, and 5H visualize the mean expression levels of the respective genes.
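
      The difference between a row-scaled heatmap and one showing mean expression can be sketched as follows. Row scaling is shown here as a z-score per gene across groups; using the population standard deviation is an assumption for this example, and the function name `row_scale` is hypothetical.

```python
from statistics import mean, pstdev


def row_scale(values):
    """Z-score one gene's expression across groups: subtract the row mean
    and divide by the row standard deviation (population SD assumed)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A gene with identical values everywhere carries no contrast.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical mean expression of one gene in three cell groups.
mean_expression = [1.0, 2.0, 3.0]
scaled = row_scale(mean_expression)
print(scaled)
```

      Row scaling centers each gene at zero, so a diverging color scheme suits it, whereas unscaled mean expression is non-negative and reads better on a sequential scale.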

      (b) For some plots, diverging color schemes (i.e. with white/yellow in the middle) are used for non-diverging scales and would be better represented with a sequential color scale. See: 4D, 4E, and potentially others (not fully clear because of the previous point).

      The color schemes in Figures 4D and 4E have been updated to a sequential color scale. The gene expression data depicted in these figures represent mean expression values and have not undergone log transformation. This information has been incorporated into the figure legend for clarity.

      Lines 186-187: Though it is merely suggested, apoptotic gene expression can be upregulated as part of the dissociation process for single-cell RNAseq. This would be much stronger if supported by a staining, such as anti-Caspase 3.

      Following the reviewer's insightful recommendations, we conducted a TUNEL assay to evaluate apoptosis in the posterior intestinal epithelial cells of both wild-type and Ctla-4-deficient zebrafish. As expected, our results demonstrate a significant increase in epithelial cell apoptosis in Ctla-4-deficient zebrafish compared with wild-type fish. The corresponding data are presented in Figure S6D and have been incorporated into the manuscript. Detailed protocols for the TUNEL assay have also been included in the Materials and Methods section.

      Author response image 2.

      Illustrates the quantification of TUNEL-positive cells per 1 × 10<sup>4</sup> μm<sup>2</sup> in the posterior intestines of both wild-type (WT) and ctla-4<sup>⁻/⁻</sup> zebrafish (n = 5). The data demonstrate a comparative analysis of apoptotic cell density between the two genotypes.

      Lines 248-251: This manuscript demonstrates gut inflammation and also changes in microbial diversity, but I don't think it demonstrates an association between them, which would require an experiment that for instance rescues one of these changes and shows that it ameliorates the other change, despite still being a ctla-4 mutant.

      We appreciate the valuable comments from the reviewer. Recently, the relationship between inflammatory bowel disease (IBD) and gut microbial diversity has garnered considerable attention, with several key findings emerging from human IBD studies. For instance, patients with IBD (including ulcerative colitis and Crohn's disease) exhibit reduced microbial diversity, which is correlated with disease severity. This decrease in microbial richness is thought to stem from the loss of normal anaerobic bacteria, such as Bacteroides, Eubacterium, and Lactobacillus (Refs. 1-6). Research using mouse models has shown that inflammation increases oxygen and nitrate levels within the intestinal lumen, along with elevated host-derived electron acceptors, thereby promoting anaerobic respiration and overgrowth of Enterobacteriaceae (Ref. 7). Consistent with these findings, our study observed a significant enrichment of Enterobacteriaceae in the inflamed intestines of Ctla-4-deficient zebrafish, which supports the observations in mice. Despite this progress, the zebrafish model for intestinal inflammation remains under development, with limitations in available techniques for manipulating intestinal inflammation and reconstructing gut microbiota. These challenges hinder investigations into the association between intestinal inflammation and changes in microbial diversity. We plan to address these issues through ongoing technological advancements and further research. We thank the reviewer for their understanding.

      References:

      (1) Ott S J, Musfeldt M, Wenderoth D F, Hampe J, Brant O, Fölsch U R et al. (2004) Reduction in diversity of the colonic mucosa associated bacterial microflora in patients with active inflammatory bowel disease. Gut 53:685-693.

      (2) Manichanh C, Rigottier-Gois L, Bonnaud E, Gloux K, Pelletier E, Frangeul L et al. (2006) Reduced diversity of faecal microbiota in Crohn's disease revealed by a metagenomic approach. Gut 55:205-211.

      (3) Qin J J, Li R Q, Raes J, Arumugam M, Burgdorf K S, Manichanh C et al. (2010) A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464:59-U70.

      (4) Sha S M, Xu B, Wang X, Zhang Y G, Wang H H, Kong X Y et al. (2013) The biodiversity and composition of the dominant fecal microbiota in patients with inflammatory bowel disease. Diagn Micr Infec Dis 75:245-251.

      (5) Ray K. (2015) IBD. Gut microbiota in IBD goes viral. Nat Rev Gastroenterol Hepatol 12:122.

      (6) Papa E, Docktor M, Smillie C, Weber S, Preheim S P, Gevers D et al. (2012) Non-Invasive Mapping of the Gastrointestinal Microbiota Identifies Children with Inflammatory Bowel Disease. Plos One 7: e39242-39254.

      (7) Hughes E R, Winter M G, Duerkop B A, Spiga L, de Carvalho T F, Zhu W H et al. (2017) Microbial Respiration and Formate Oxidation as Metabolic Signatures of Inflammation-Associated Dysbiosis. Cell Host Microbe 21:208-219.

      Lines 270-272 say that interaction between Cd28/ctla-4 and Cd80/86 was demonstrated through bioinformatics, flow-cytometry, and Co-IP. Does this need to reference Fig S6D for the flow data? Figures 7F-G are very hard to read or comprehend as they are very small. Figure 7H is the most compelling evidence of this interaction and might stand out better if emphasized with a sentence referencing it on its own in the manuscript. 

      In this study, we utilized an integrated approach combining bioinformatics prediction, flow cytometry, and co-immunoprecipitation (Co-IP) to comprehensively investigate and validate the interactions between Cd28/Ctla-4 and Cd80/86. Flow cytometry analysis, as depicted in Supplementary Figure 6D (revised as Supplementary Figure 8F), demonstrated the surface expression of Cd80/86 on HEK293T cells and quantified their interactions with Cd28 and Ctla-4. These experiments not only validated the interactions between Cd80/86 and Cd28/Ctla-4 but also revealed a dose-dependent relationship, providing robust supplementary evidence for the molecular interactions under investigation. Furthermore, in Figure 7F-G, the axis font sizes were enlarged to improve readability. Additionally, in response to reviewers' feedback, we have emphasized Figure 7H, which presents the most compelling evidence for molecular interactions, by including a standalone sentence in the text to enhance its prominence.

      For Figure 7A-E, for non-immunologists, it is unclear what experiment was performed here - it would be helpful to add a 1-sentence summary of the assay to the main text or figure legend.

      We apologize for this oversight. Figures 7A–E illustrate the functional assessment of the inhibitory role of Ctla-4 in Cd80/86 and Cd28-mediated T cell activation. A detailed description of the methodologies associated with Figures 7A–E is provided in the ‘Materials and Methods’ section of the revised manuscript.

      For Figure 7F-G, it is extremely hard to read the heat map legends and the X and Y-axis. Also, what the heatmaps show and how that fits the overall narrative can be elaborated significantly.

      We regret this oversight. To enhance clarity, we have increased the font size of the heatmap legends and the X and Y-axes, as shown in the following figure. Additionally, a detailed analysis of these figures is provided in lines 299–306 of the main text.

      In general, the main text that accompanies Figure 7 should be expanded to more clearly describe these experiments/analyses and their results.

      We have conducted a detailed analysis of the experiments and results presented in Figure 7. This analysis is described in lines 278-314.

      Reviewer #2:

      The scRNASeq assay is missing some basic characterization: how many WT and mutant fish were assayed in the experiment? how many WT and mutant cells were subject to sequencing? Before going to the immune cell types, are intestinal cell types comparable between the two conditions? Are there specific regions in the tSNE plot in Figure 4A abundant of WT or ctla-4 mutant cells?

      In the experiment, we analyzed 30 wild-type and 30 mutant zebrafish for scRNA-seq, with an initial dataset comprising 8,047 cells in the wild-type group and 8,321 cells in the mutant group. Sample preparation details are provided on lines 620-652. Due to the relatively high expression of mitochondrial genes in intestinal tissue, quality control filtering yielded 3,263 cells in the wild-type group and 4,276 cells in the mutant group. Given that the intestinal tissues were dissociated using identical protocols, the resulting cell types are comparable between the two conditions. Both the wild-type and Ctla-4-deficient groups contained enterocytes, enteroendocrine cells, smooth muscle cells, neutrophils, macrophages, B cells, and a cluster of T/NK/ILC-like cells. Notably, no distinct regions were enriched for either condition in the tSNE plot (Figure 4A).
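
      The cell counts quoted above imply the following retention rates after quality control filtering. This is a small worked calculation using only the numbers stated in the response; the variable names are illustrative.

```python
# Cell counts before and after quality control, as stated in the response.
cells_before = {"wild-type": 8047, "mutant": 8321}
cells_after = {"wild-type": 3263, "mutant": 4276}

# Fraction of cells that passed filtering in each group.
retained = {group: cells_after[group] / cells_before[group] for group in cells_before}

for group, fraction in retained.items():
    print(f"{group}: {fraction:.1%} of cells retained")
```

      This gives about 40.5% retention for the wild-type sample and about 51.4% for the mutant sample, so the two groups were filtered at somewhat different rates.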

      The cell proliferation experiment using PHA stimulation assay demonstrated the role of Ctla-4 in cell proliferation, while the transcriptomic evidence points towards activation rather than an overall expansion of T-cell numbers. This should be discussed towards a more comprehensive model of how subtypes of cells can be differentially proliferating in the disease model.

      In the PHA-stimulated T cell proliferation assay, we aimed to investigate the regulatory roles of Ctla-4, Cd28, and Cd80/86 in T cell activation, focusing on validating Ctla-4's inhibitory function as an immune checkpoint. While our study examined general regulatory mechanisms, it did not specifically address the distinct roles of Ctla-4 in different T cell subsets. We appreciate the reviewer's suggestion to develop a more comprehensive model that elucidates differential T cell activation across various subsets in disease models. However, due to the nascent stage of research on fish T cell subsets and limitations in lineage-specific antibodies and transgenic strains, such investigations are currently challenging. We plan to pursue these studies in the future. Despite these constraints, our single-cell RNA sequencing data revealed an increased proportion of Th2 subset cells in Ctla-4-deficient zebrafish, a finding supported by elevated expression levels of Th2 markers (Il4, Il13, and Gata3) measured via RT-qPCR (see Figure S7B). Notably, recent studies in mouse models have shown that naïve T cells from CTLA-4-deficient mice tend to differentiate into Th2 cells post-proliferation, with activated Th2 cells secreting higher levels of cytokines like IL-4, IL-5, and IL-13, thereby exerting their effector functions (Refs. 1-2). Consequently, our findings align with observations in mice, suggesting conserved CTLA-4 functions across species. We have expanded the "Discussion" section to clarify these points.

      References:

      (1) Bour-Jordan H, Grogan J L, Tang Q Z, Auger J A, Locksley R M, Bluestone J A et al. (2003) CTLA-4 regulates the requirement for cytokine-induced signals in T<sub>H</sub>2 lineage commitment. Nature Immunology 4: 182-188.

      (2) Khattri Roli, Auger, Julie A, Griffin Matthew D, Sharpe Arlene H, Bluestone Jeffrey A et al. (1999) Lymphoproliferative Disorder in CTLA-4 Knockout Mice Is Characterized by CD28-Regulated Activation of Th2 Responses. The Journal of Immunology 162:5784-5791.

      It would be nice if the authors could also demonstrate whether other tissues in the zebrafish have an inflammation response, to show whether the model is specific to IBD.

      In addition to intestinal tissues, we also performed histological analysis on the liver of Ctla-4-deficient zebrafish. The results showed that Ctla-4 deficiency led to mild edema in a few hepatocytes, and lymphocyte infiltration was not significant. Compared to the liver, we consider intestinal inflammation to be more pronounced.

      Some minor comments on terminology

      (a) "multiomics" usually refers to omics experiments with different modalities (e.g. transcriptomics, proteomics, metabolomics etc), while the current paper only has transcriptomics assays. I wouldn't call it "multiomics" analysis.

      We appreciate the reviewer's attention to this issue. The "multi-omics" has been revised to "transcriptomics".

      (b) In several parts of the figure legend the author mentioned "tSNE nonlinear clustering" (Figures 4A and 5A). tSNE is an embedding method rather than a clustering method.

      The "tSNE nonlinear clustering" has been revised to "tSNE embedding".

      (c) Figure 1E is a UMAP rather than tSNE.

      The "tSNE" has been revised to "UMAP" in the figure legend in line 1043.

      Reviewer #3: 

      Line 28: The link is not directly reflected in this sentence describing CTLA-4 knockout mice.

      We appreciate the reviewer for bringing this issue to our attention. We have expanded our description of CTLA-4 knockout mice on lines 77-84.

      Line 80-83: There is a lack of details about the CTLA-4-deficient mice. The factor that Th2 response could be induced has been revealed in mouse model. See the reference entitled "CTLA-4 regulates the requirement for cytokine-induced signals in TH2 lineage commitment" published in Nature Immunology.

      We thank the reviewer for providing valuable references. We have added descriptions detailing the differentiation of T cells into Th2 cells in CTLA-4-deficient mice on lines 78–81, and the relevant references have been cited in the revised manuscript.

      To better introduce the CTLA-4 immunobiology, the paper entitled "Current Understanding of Cytotoxic T Lymphocyte Antigen-4 (CTLA-4) Signaling in T-Cell Biology and Disease Therapy" published in Molecules and Cells should be referred.

      We have provided additional details on CTLA-4 immunology (lines 75-84) and have included the relevant reference in the revised manuscript.

      In current results, there are many sentences that should be moved to the discussion, such as lines 123-124, lines 152-153, lines 199-200, and lines 206-207. So, the result sections just describe the results, and the discussions should be put together in the discussion.

      We have relocated these sentences to the 'Discussion' section and refined the writing.

      In the discussion, the zebrafish enteritis model, such as DSS/TNBS and SBMIE models, should also be compared with the current CTLA-4 knockout model. Also, the comparison between the current fish IBD model and the previous mouse model should also be included, to enlighten the usage of CTLA-4 knockout zebrafish IBD model.

      We compared the phenotypes of our current Ctla-4-knockout zebrafish IBD model with other models, including DSS-induced IBD models in zebrafish and mice, as well as TNBS- and SBM-induced IBD models in zebrafish. The details are included in the "Discussion" section (lines 353-365).

      As to the writing, the structure of the discussion is poor. The paragraphs are very long and hard to follow. Many findings from current results were not yet discussed. I just can't find any discussion about the alteration of intestinal microbiota.

      In response to the reviewers' constructive feedback, we have revised and enhanced the discussion section. Furthermore, we have integrated the most recent research findings relevant to this study into the discussion to improve its relevance and comprehensiveness.

      In the discussion, the aerobic-related bacteria in 16s rRNA sequencing results should be focused on echoing the histopathological findings, such as the emptier gut of CTLA-4 knockout zebrafish.

      As mentioned above, the discussion section has been revised and expanded to provide a better understanding of the potential interplay among intestinal inflammatory pathology, gut microbiota alterations, and immune cell dysregulation in Ctla-4-deficient zebrafish. Furthermore, promising avenues for future research that warrant further investigation were also discussed.

      In the current method, there are no descriptions for many used methods, which already generated results, such as WB, MLR, MST, Co-IP, AlphaFold2 prediction, and how to make currently used anti-zfCTLA4 antibody. Also, there is a lack of description of the method of the husbandry of knockout zebrafish line.

      We regret these flaws. The methods section was inadvertently incomplete due to an error during the file upload process at submission. This issue has been rectified in the revised manuscript. Additionally, Ctla-4-deficient zebrafish were reared under the same conditions as wild-type zebrafish, and the rearing methods are now described in the "Generation of Ctla-4-deficient zebrafish" section of the Materials and Methods.

      Line 360: the experimental zebrafish with different ages could be a risk for unstable intestinal health. See the reference entitled "The immunoregulatory role of fish-specific type II SOCS via inhibiting metaflammation in the gut-liver axis" published in Water Biology and Security. The age-related differences in zebrafish could be observed in the gut.

      We appreciate the reviewers' reminders. The Ctla-4 mutant zebrafish used in our experiments were 4 months old, while the wild-type zebrafish ranged from 4 to 6 months old. These experimental fish were relatively young and uniformly distributed in age. During our study, we examined the morphological structures of the intestines in zebrafish aged 4 to 6 months and observed no significant abnormalities. These findings align with previous research indicating no significant difference in intestinal health between 3-month-old and 6-month-old wild-type zebrafish (Ref. 1). Consequently, we conclude that there is no notable aging-related change in the intestines of zebrafish aged 4 to 6 months. This reduces the risk associated with age-related variables in our study. We have added an explanation stating that the Ctla-4 mutant zebrafish used in the experiments were 4 months old (Line 449) in the revised manuscript.

      Reference

      (1) Shan Junwei, Wang Guangxin, Li Heng, Zhao Xuyang et al. (2023) The immunoregulatory role of fish-specific type II SOCS via inhibiting metaflammation in the gut-liver axis. Water Biology and Security 2: 100131-100144.

      Section "Generation of Ctla-4-deficient zebrafish": There is a lack of description of PCR condition for the genotyping.

      The target DNA sequence was amplified at 94 °C for 4 min, followed by 35 cycles at 94°C for 30 s, 58°C for 30 s and 72°C for 30 s, culminating in a final extension at 72 °C for 10 min. The polymerase chain reaction (PCR) conditions are described in lines 458-460.

      How old were the mutant fish used? There should be a "Sampling" section to provide the sampling details.

      The "Sampling" information has been incorporated into the "Materials and Methods" section of the revised manuscript. Wild-type and Ctla-4-deficient zebrafish of varying ages were housed in separate tanks, each labeled with its corresponding birth date. Experiments used Ctla-4-deficient zebrafish aged 4 months and wild-type zebrafish aged between 4 and 6 months.

      Line 378-380: The index for the histopathological analysis should be detailed, rather than just provide a reference. I don't think these indexes are good enough to specifically describe the pathological changes of intestinal villi and mucosa. It is suggested to improve with detailed parameters. As described in the paper entitled "Pathology of Gastric Intestinal Metaplasia: Clinical Implications" published in Am J Gastroenterol., histochemically, normal gastric mucins are pH neutral, and they stain magenta with periodic acid-Schiff (PAS). In an inflamed gut, acid mucins replace the original gastric mucins and are stained blue with Alcian blue (AB). So, to reveal the pathological changes of goblet cells and involved mucin components, AB staining should be added. Also, for the number of goblet cells in the inflammatory intestine, combining PAS and AB staining is the best way to reveal all the goblet cells. In Figure 2, there were very few goblet cells. The infiltration of lymphocytes and the empty intestinal lumen could be observed. Thus, the ratio between the length of intestinal villi and the intestinal ring radius should be calculated.

      In response to the reviewers’ valuable suggestions, we have augmented the manuscript by providing additional parameters related to the pathological changes observed in the Ctla-4-deficient zebrafish intestines, including the mucin component changes identified through PAS and AB-PAS staining, the variations in the number of goblet cells evaluated by AB-PAS staining, and the ratio of intestinal villi length to the intestinal ring radius, as illustrated in the following figures. These new findings are detailed in the "Materials and Methods" (lines 563-566) and "Results" (lines 143-146) sections, along with Supplementary Figure S3 of the revised manuscript.

      Section "Quantitative real-time PCR": What's the machine used for qPCR? How about the qPCR validation of RNA seq data? I did not see any related description of data and methods for qPCR validation. In addition, beta-actin is not a stable internal reference gene, to analyze inflammation and immune-related gene expression. See the reference entitled "Actin, a reliable marker of internal control?" published in Clin Chim Acta. Other stable housekeeping genes, such as EF1alpha and 18s, could be better internal references.

      RT-qPCR experiments were conducted using a PCR thermocycler device (CFX Connect Real-Time PCR Detection System with Precision Melt Analysis™ Software, Bio-Rad, Cat. No. 1855200EM1). This information has been incorporated into lines 608-610 of the "Materials and Methods" section. In these experiments, key gene sequences of interest, including il13, mpx, and il1β, were extracted from RNA-seq data for RT-qPCR validation. To ensure accurate normalization, potential internal controls were evaluated, and β-actin was identified as a suitable candidate due to its consistent expression levels in the intestines of both wild-type and Ctla-4-deficient zebrafish. The use of β-actin as an internal control is further supported by its application in recent studies on intestinal inflammation (Refs 1–2).

      References:

      (1) Tang Duozhuang, Zeng Ting, Wang Yiting, Cui Hui et al. (2020) Dietary restriction increases protective gut bacteria to rescue lethal methotrexate-induced intestinal toxicity. Gut Microbes 12: 1714401-1714422.

      (2) Malik Ankit, Sharma Deepika et al. (2023) Epithelial IFNγ signaling and compartmentalized antigen presentation orchestrate gut immunity. Nature 623: 1044-1052.

      How to generate sCtla-4-Ig, Cd28-Ig and Cd80/86? No method could be found.

      We apologize for the omission of these methods. The detailed protocols have now been added to the "Materials and Methods" section of the revised manuscript (lines 464-481).

      Figure 5: As reviewed in the paper entitled "Teleost T and NK cell immunity" published in Fish and Shellfish Immunology, two types of NK cell homologues have been described in fish: non-specific cytotoxic cells and NK-like cells. There is no NKT cell identified in the teleost yet. Therefore, "NKT-like" could be better to describe this cell type.

      We refer to "NKT" cells as "NKT-like" cells, as suggested.

      For the supplementary data of scRNA-seq, details of the expression levels are lacking.

      The expression levels of the corresponding genes are provided in Supplemental Table 4.

      Supplemental Table 1: There are no accession numbers of amplified genes.

      The accession numbers of the amplified genes are included in Supplemental Table 1.

      The English needs further editing.

      We have made efforts to enhance the English to meet the reviewers' expectations.

      Line 32: The tense should be the past.

      This tense error has been corrected.

      Line 363-365: The letter of this approval should be provided as an attachment.

      The approval document is provided as an attachment.

      Line 376: How to distinguish the different intestinal parts? Were they judged as the first third, second third, and last third parts of the whole intestine?

      The differences among the three segments of zebrafish intestine are apparent. The intestinal tube narrows progressively from the anterior to the mid-intestine and then to the posterior intestine. Moreover, the boundaries between the intestinal segments are well-defined, facilitating the isolation of each segment.

      Line 404: Which version of Cytoscape was used?

      The version of Cytoscape used in this study is 3.9.1. Information about the Cytoscape version is provided on line 603.

      The product information of both percoll and cell strainer should be provided.

      The information regarding Percoll and cell strainers has been added on lines 626 and 628, respectively.

      Line 814: Here should be a full name to tell what is MST.

      The acronym MST stands for "Microscale Thermophoresis", a technique that has been referenced on lines 1157-1158.

    1. Author Response

      The following is the authors’ response to the original reviews.

      In this manuscript, Xie et al report the development of SCA-seq, a multiOME mapping method that can obtain chromatin accessibility, methylation, and 3D genome information at the same time. This method is highly relevant to a few previously reported long read sequencing technologies. Specifically, NanoNome, SMAC-seq, and Fiber-seq have been reported to use m6A or GpC methyltransferase accessibility to map open chromatin, or open chromatin together with CpG methylation; Pore-C and MC-3C have been reported to use long read sequencing to map multiplex chromatin interactions, or together with CpG methylation. Therefore, as a combination of NanoNome/SMAC-seq/Fiber-seq and Pore-C/MC-3C, SCA-seq is one step forward. The authors tested SCA-seq in 293T cells and performed benchmark analyses testing the performance of SCA-seq in generating each data module (open chromatin and 3D genome). The QC metrics appear to be good and the methods, data and analyses broadly support the claims. However, there are some concerns regarding data analysis and conclusions, and some important information seems to be missing.

      1. The chromatin accessibility tracks from SCA-seq seem to be noisy, with higher background than DNase-seq and ATAC-seq (Fig. 2f, Fig. 4a and Fig. S5). Also, SCA-seq is much less sensitive than both DNase-seq and ATAC-seq (Figs. 2a and 2b). This and other limitations of SCA-seq (high background, high sequencing cost, requirement of specific equipment, etc) need to be carefully discussed.

      We thank the reviewer for the important comment about the noisy GpC methylation signal in SCA-seq. We acknowledge that the SCA-seq signal presented in Fig. 2f, Fig. 4a, and Fig. S5 in our first draft was indeed noisy, as we presented the raw 1D genomic signal. In this revision, we have taken steps to reduce the noise in the GpC methylation signal by identifying the accessible regions on each segment of every single molecule. For each segment, we performed a sliding-window analysis (50 bp window, 10 bp step) with a binomial test to identify accessible windows that significantly deviate from the background GpC methylation ratio. Overlapping accessible windows (p < 0.05 in the binomial test and containing at least two GpC sites) on a single fragment are merged into accessible regions. We then retain only the GpC methylation signal inside the accessible regions to reduce the background noise (Sfig 5ab). The details of the noise filtering steps are described in the Methods section (page 22 lines 13-23).
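
      The window-calling step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the one-sided (upper-tail) form of the binomial test, the way the background ratio is supplied, and the names `binom_upper_tail`, `call_accessible_regions`, and `example_sites` are all assumptions made for the example.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_accessible_regions(sites, seg_start, seg_end, background,
                            window=50, step=10, alpha=0.05, min_sites=2):
    """Call accessible regions on one read segment.

    sites      -- list of (position, is_methylated) GpC calls on the segment
    background -- assumed background GpC methylation ratio (hypothetical input)
    Returns merged (start, end) half-open intervals.
    """
    regions = []
    last_start = max(seg_start, seg_end - window)
    for start in range(seg_start, last_start + 1, step):
        end = start + window
        calls = [m for pos, m in sites if start <= pos < end]
        n, k = len(calls), sum(calls)
        if n < min_sites:
            continue  # too few GpC sites to test this window
        if binom_upper_tail(k, n, background) < alpha:
            if regions and start < regions[-1][1]:
                regions[-1][1] = end  # merge with the overlapping previous window
            else:
                regions.append([start, end])
    return [tuple(r) for r in regions]


# Toy segment: a methylated (accessible) cluster flanked by unmethylated sites.
example_sites = [(10, 0), (20, 0), (60, 1), (65, 1), (70, 1), (75, 1), (80, 1),
                 (150, 0), (160, 0), (170, 0)]
regions = call_accessible_regions(example_sites, 0, 200, background=0.1)
print(regions)
```

      Only the GpC calls falling inside the returned intervals would then be kept for the filtered 1D track.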

      Visually, the updated example view of the 1D signal track shows that the noise is dramatically reduced in the filtered SCA-seq GpC methylation signal compared to the raw signal (Sfig5c). The cleaned SCA-seq GpC methylation 1D signals were also updated (Fig2f and Fig4a). We have observed an increase in the TSS enrichment score, which is a commonly used metric for assessing signal-to-noise ratio in ATAC-seq quality control. Specifically, the TSS enrichment score increased to 2.74 when using the filtered signal, compared to 1.93 when using the raw signal (Sfig5d). After noise filtering, 80% of SCA-seq 1D peaks overlap with peaks called by ATAC-seq and/or DNase-seq (Fig2ab), compared to 74% from the raw signal in the first draft.

      We thank the reviewer for raising the concern about the sequencing cost and the requirement for specific equipment. The sequencing cost is approximately 1300 USD per sample to sequence a human sample at 30X depth and obtain a saturated GpC methylation signal (Sfig4d) as well as a loop signal similar to that of NGS-based Hi-C (Fig3gh). Considering that SCA-seq simultaneously provides higher-order chromatin structure and chromatin accessibility at single-molecule resolution, we believe the cost is acceptable. However, it is worth noting that SCA-seq requires a regular Oxford Nanopore sequencer with an R9.4.1 chip, which is currently available but might be discontinued by Oxford Nanopore in the future. We have addressed all these concerns in the discussion section.

      1. In Fig. 2f, many smaller peaks are present besides the major peaks. Are they caused by baseline DNA methylation? How many of the small methylation signals are called peaks? In Fig. 4a, it seems that the authors define many more enhancers from SCA-seq data than what will be defined from ATAC-seq or DHS. Are those additional enhancers false positives? Also, it is difficult to distinguish the gray "inaccessible segments" from the light purple "accessible segments".

      We thank the reviewer for bringing up these concerns.

      Regarding the smaller peaks in the 1D genomic GpC methylation signal, we have addressed this issue by implementing noise filtering in this revision, and the small peaks on the 1D tracks are greatly reduced (Fig2f, Sfig5c). It is important to note that SCA-seq generates accessibility signals specifically on ligation junctions, which differs from the one-dimensional (1D) signals obtained through ATAC-seq or DNase-seq. The remaining small peaks in the SCA-seq data can be attributed to varying sequencing depth, which is influenced by the enriched spatial interactions occurring in regions of the genome that are enriched with ligation junctions. In general, the SCA-seq 1D peaks are well correlated with the high-confidence peaks from the 1D tracks of ATAC-seq and DNase-seq (Fig2b).

      We apologize for the lack of clarity in our enhancer annotation. The enhancer regions were obtained from The Ensembl Regulatory Build (PMID: 25887522). We have now included this information in the method section (page 24 line 16).

      We thank the reviewer for pointing out this visualization problem. The color scheme has been revised, with purple now representing the inaccessible segments and yellow representing the accessible segments.

      1. For 3D genome analysis, it is important to provide information about data yield from SCA-seq. With 30X sequencing depth, how many contacts are obtained (with long-read sequencing, this should be the number of ligation junctions)? How does the number compare to Hi-C?

      We thank the reviewer for raising this crucial point about the sequencing yield, which we had missed. We have now included this information in the revised result section (page 11, lines 11-14).

      We have checked the public data of a successful HEK293T Hi-C run (PMID: 34400762). The Hi-C experiment produced 699,464,541 reads (105 Gb), from which we obtained 388,031,859 contacts.

      From 100 Gb of HEK293T SCA-seq data, we obtained 81,229,369 ligation junctions and 378,848,187 virtual pairwise contacts (3.8M pairwise contacts per Gb). The number of virtual pairwise contacts per Gb in SCA-seq is similar to that of PORE-C (PMID: 35637420).
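
      As a quick check of the per-Gb yields, the arithmetic from the counts quoted above can be reproduced as below. The helper `virtual_pairwise_contacts` reflects the usual convention of expanding a concatemer of n segments into all n(n-1)/2 segment pairs; that this is exactly how the contacts here were counted is an assumption, and the function names are illustrative only.

```python
def contacts_per_gb(contacts, gigabases):
    """Pairwise contacts obtained per gigabase of sequencing."""
    return contacts / gigabases


def virtual_pairwise_contacts(n_segments):
    """All unordered segment pairs in one concatemer: n * (n - 1) / 2 (assumed convention)."""
    return n_segments * (n_segments - 1) // 2


# Counts quoted in the response above.
hic_per_gb = contacts_per_gb(388_031_859, 105)   # Hi-C: contacts / Gb
sca_per_gb = contacts_per_gb(378_848_187, 100)   # SCA-seq: virtual pairwise contacts / Gb

print(f"Hi-C:    {hic_per_gb / 1e6:.2f}M contacts per Gb")
print(f"SCA-seq: {sca_per_gb / 1e6:.2f}M contacts per Gb")
print("pairs from a 4-segment concatemer:", virtual_pairwise_contacts(4))
```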

      1. Fig 3j. Because SCA-seq only performs GpC methylation, the capability to detect the footprint at individual CTCF peaks depends on the density of GpC nearby. Have the authors taken GpC density into account when defining CTCF sites with or without footprint?

      We appreciate the reviewer bringing up the concern about the GpC site density at CTCF sites. We would like to highlight that Battaglia et al. have demonstrated the feasibility of identifying transcription factor binding events using GpC labeling (PMID: 36195755). In our study, we have implemented a high-resolution sliding-window approach to enhance the sensitivity of CTCF binding detection. We have taken GpC density into account by performing a sliding-window (50 bp window, 10 bp step) binomial test on every single molecule overlapping a CTCF site to call accessible regions. The detailed steps for calling accessible regions are described in our answer to the first question. Based on the pattern in Fig3j, we identify a CTCF footprint if accessible regions are called near the CTCF site (at least 20 bp away from the center of the CTCF site) but not on the CTCF site itself.

      To ensure that the GpC site density is sufficient for binomial test of each sliding window of the regions around CTCF site genome-wide, we examined the number of GpC sites in each window. Our analysis revealed that GpC sites are evenly distributed, and over 87% of the windows contain at least 2 GpC sites, which qualifies them for a binomial test (Author response image 1). This indicates that we are able to detect the CTCF footprint at most of the CTCF sites, taking into consideration the GpC density.

      Author response image 1.

      Genome wide GpC site density at CTCF site centered region. Distribution of the number of GpC sites (y-axis) at each 50 bp sliding window region (x-axis) was presented in violin plots.
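
      The density check described above can be illustrated with a small sketch that counts GpC sites ("GC" dinucleotides) in 50 bp windows sliding by 10 bp and reports the fraction of windows with at least two sites. The toy sequence and the function names are hypothetical; the real analysis would run over the reference sequence around each CTCF site.

```python
def gpc_positions(seq):
    """0-based positions of the G in every GpC ('GC') dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GC"]


def gpc_counts_per_window(seq, window=50, step=10):
    """Number of GpC sites in each sliding window across the sequence."""
    positions = gpc_positions(seq)
    counts = []
    for start in range(0, len(seq) - window + 1, step):
        end = start + window
        counts.append(sum(start <= p < end for p in positions))
    return counts


def fraction_testable(counts, min_sites=2):
    """Fraction of windows with enough GpC sites for the binomial test."""
    return sum(c >= min_sites for c in counts) / len(counts)


# Toy 100 bp sequence with two GpC sites near the middle (illustration only).
toy_seq = "AT" * 25 + "GCGC" + "AT" * 23
counts = gpc_counts_per_window(toy_seq)
print(counts, fraction_testable(counts))
```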

      1. This study only performs higher resolution chromatin interaction analysis based on individual read concatenates. It is unclear to me if the data have enough depth to perform loop analysis with Hi-C pipelines.

      We thank the reviewer for highlighting this important concern about the depth of data for performing loop analysis. We have performed aggregate peak analysis for SCA-seq and Hi-C side-by-side using the hiccups function in Juicer (v1.9.9) (PMID: 27467249). We acknowledge that the level of loop signal enrichment is relatively weaker (one-fold less) in SCA-seq compared to Hi-C (Fig3h). This difference can be attributed to the lower sequencing yield per Gb in SCA-seq, which resulted in 4.93M pairwise contacts per Gb, compared to the 7M contacts per Gb in Hi-C. Despite this discrepancy, we were still able to observe a clear genome-wide loop enrichment pattern in SCA-seq (Fig3gh).

      1. It appears that SCA-seq is of low efficiency in detecting chromatin interactions. As shown in Fig. S7a, 65.4% of sequenced reads contained only one restriction enzyme (RE) fragment/segment (with no genomic contact), which is much higher than that reported in published PORE-C methods. In addition, Fig. S7g is very confusing and in conflict with Fig. S7a. For example, in Fig. S7g, 21.4% and 22.2% of SCA-seq concatemers contain one and two segments, whereas the numbers are 65.4% and 14.7% in Fig. S7a, respectively. Please explain.

      We apologize for the confusion between Fig. S7a and Fig. S7g.

      Fig. S7a was intended to illustrate the cardinality of concatemers counting only chr7 segments, i.e. intra-chromosome cardinality rather than genome-wide cardinality. We have revised Fig. S7a and its legend to clarify that it describes segments of intra-chromosome interactions.

      In contrast, Fig. S7g shows concatemers including both intra-chromosome and inter-chromosome segments, which explains why the percentages in each cardinality range differ from those in Fig. S7a. Moreover, the percentages reported in Fig. S7g are similar to those typically reported for PORE-C methods when both intra- and inter-chromosome interactions are considered.

      To provide a comprehensive view of the genome-wide concatemer cardinality distribution, we have also included a histogram in Fig3k, which demonstrates the detailed distribution of cardinality for genome-wide concatemers.

      1. I disagree with the rationale of the entire Fig. S9. Biologically there is no evidence that chromatin accessibility will change due to genome interactions (the opposite is more likely), therefore the definition of "expected chromatin accessibility" is hard to believe. If the authors truly believe this is possible, they will need to test their hypothesis by deleting cohesin and check if the chromatin accessibility driven by "power center" are truly abolished. The math in Fig. S9 is also confusing. Firstly, the dimension of the contact matrix in Fig. S9 appears to be wrong, it should have 8 rows. Secondly, I don't understand why the interaction matrix is not symmetric. Third, if I understand correctly the diagonal of the matrix should be all 1, it is also hard to understand why the matrix only has 1, 0 or -1. It appears that the authors assume that the observed accessibility is a simple sum of the expected accessibility of all its interacting regions; this is wrong. In my opinion, the whole Fig. S9 should be deleted unless the authors can make sense of it and ideally also provide more evidence.

      We apologize for any confusion caused by the rationale and figures in Fig. S9. The purpose of the hypothesis presented in the figure is to explore the potential relationship between chromatin accessibility and genome interactions. While there is currently no direct biological evidence supporting this hypothesis, it is a possibility that warrants further investigation.

      Regarding the suggestion to delete Fig. S9 unless more evidence is provided, it is important to note that this paper primarily focuses on the methodology and theoretical framework. Experimental validation of the hypothesis falls outside the scope of this particular study.

      We have made corrections to the schematic matrix in Fig. S9 to accurately represent the dimensions and symmetry. The numbers in the matrix represent mean accessible values of the contacts. Specifically, accessible-accessible contacts are represented by 2, accessible-inaccessible contacts are represented by 0, and inaccessible-inaccessible contacts are represented by -2.

      Minor concerns:

      1. The authors may want to clearly demonstrate the specificity and sensitivity of the ATAC part and the efficiency of the Hi-C part of SCA-seq.

      We appreciate the reviewer’s suggestion to demonstrate the specificity and sensitivity of the ATAC-seq part and the efficiency of the Hi-C part in SCA-seq.

      We considered the non-peak region genomic bins shared by ATAC-seq and DNase-seq as true negatives and the overlapping peaks of ATAC-seq and DNase-seq as true positives. Based on these criteria, the specificity of SCA-seq 1D peaks is calculated as TN / N, where TN represents the number of true negatives (89107) and N represents the sum of true negatives and false positives (89107 + 9345). The resulting specificity is 0.91. The sensitivity of SCA-seq 1D peaks is calculated as TP / P, where TP represents the number of true positives (33190) and P represents the sum of true positives and false negatives (33190 + 11758). The resulting sensitivity is 0.73.

      We evaluate the efficiency of spatial interaction detection by the fraction of restriction-enzyme-digested fragments recovered in pairwise contacts that contain ligation junctions. In SCA-seq, the efficiency is calculated as the number of DpnII-digested fragments recovered by pairwise contacts (5625908) divided by the total number of in silico DpnII-digested fragments (7127633). The resulting efficiency is 0.79.

      We have now included this information in the revised result section (page 8 lines 15-18).
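
      For readers who want to retrace these ratios, the short sketch below recomputes them from the counts quoted in this response. It only restates the standard definitions (specificity = TN / (TN + FP), sensitivity = TP / (TP + FN)); the function and variable names are illustrative and not taken from the manuscript.

```python
def specificity(true_neg, false_pos):
    """Fraction of true-negative bins that SCA-seq also leaves uncalled."""
    return true_neg / (true_neg + false_pos)


def sensitivity(true_pos, false_neg):
    """Fraction of shared ATAC-seq/DNase-seq peaks recovered by SCA-seq."""
    return true_pos / (true_pos + false_neg)


def recovery_efficiency(recovered_fragments, total_fragments):
    """Fraction of in silico DpnII fragments seen in pairwise contacts."""
    return recovered_fragments / total_fragments


# Counts quoted in the response above.
spec = specificity(89_107, 9_345)
sens = sensitivity(33_190, 11_758)
eff = recovery_efficiency(5_625_908, 7_127_633)

print(f"specificity = {spec:.3f}")
print(f"sensitivity = {sens:.3f}")
print(f"efficiency  = {eff:.3f}")
```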

      1. Fig 4g, colors with apparent differences might be used to clearly discriminate the three types of interactions (I-I, I-A and A-A).

      We appreciate the reviewer for bringing up the issue regarding the visualization in Fig 4g. The color scheme has been revised, with purple now representing I-I interactions, orange representing I-A interactions, and red representing A-A interactions. We believe that these modifications have significantly improved the clarity.

      1. Fig. 4c, when fitting an unknown curve, R-square becomes meaningless.

      We appreciate the reviewer for pointing out the issue regarding the interpretation of R-square. We have removed the R-square value from Fig. 4c.

      1. Fig 5a, "oCGIs comprised 65% CGIs that did not directly contact enhancers or promoters". Should it be "oCGIs comprised 65% of all CGIs"?

      We appreciate the reviewer for pointing out the clarification needed in Fig 5a. We have revised the phrase in the figure legend to accurately state that “oCGIs comprised 65% of all CGIs”. Thank you for bringing this to our attention.

      1. Page 15 lines 5-8, "By examining the methylation status on reads, as expected, these read segments demonstrated lower CpG methylation and higher chromatin accessibility (GpC methylation), which further supports their roles in gene activation (Fig 5b)". This statement seems to be inconsistent with the figure legend.

      We appreciate the reviewer for pointing out the inconsistency in the legend of Fig 5b. We have revised the legend of Fig 5b to accurately highlight the low CpG methylation on oCGI regions. Thank you for bringing this to our attention.

      1. Language editing and proof reading are needed.

      We apologize for any errors in the language. We have carefully reviewed the manuscript and made the necessary language editing and proofreading revisions to ensure its quality for publication.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) The mechanism by which fenofibrate rescues memory loss in Kallistatin-transgenic mice is unclear. As a PPARalpha agonist, does fenofibrate target the Kallistatin pathway directly or indirectly? Please provide a discussion based on literature supporting either possibility.

      Thank you for your important suggestion. Fenofibrate does act as a PPARα agonist. Fenofibrate has been shown to protect memory and cognitive function by downregulating α- and β-secretases[1]. Activation of PPARα can reduce Aβ plaques by upregulating ADAM10, thereby protecting memory and cognition[2]. However, Fenofibrate can also act through a PPARα-independent pathway[3]. In our previous study, we showed that Fenofibrate can directly down-regulate the expression of Kallistatin in hepatocytes[4]. Here, our findings showed that Kallistatin induces cognitive memory deterioration by increasing amyloid-β plaque accumulation and tau protein hyperphosphorylation (Fig. 1-3), and that Fenofibrate can directly down-regulate the serum level of Kallistatin (Fig. 8G). In addition, the expression of PPARα in the hippocampal tissue of Kallistatin-transgenic (KAL-TG) mice showed no significant difference compared to the WT group (Author response image 1A-B). Therefore, we think Fenofibrate may improve memory and cognitive function at least in part through a PPARα-independent effect, which suggests a new mechanism for Fenofibrate in AD with elevated Kallistatin levels.

      Author response image 1.

      (A-B) Protein levels of PPARα in hippocampal tissue were measured by western blot analysis, and the results were statistically analyzed.

      (2) The current study exclusively investigated the hippocampus. What about other cognitive memory-related regions, such as the prefrontal cortex? Including data from these regions or discussing the possibility of their involvement could provide a more comprehensive understanding of the role of Kallistatin in memory impairment.

      Thank you for your suggestion. In addition to hippocampal tissue analysis, we performed immunohistochemical detection of Aβ and phosphorylated Tau levels in the prefrontal cortex. Our findings revealed that KAL-TG mice exhibited significantly elevated Aβ and phosphorylated Tau levels in the prefrontal cortex compared to WT mice. These observations align with the pathological patterns observed in hippocampal tissues, demonstrating consistent neurodegenerative pathology across both the hippocampus and prefrontal cortex. The data are shown below.

      Author response image 2.

      (A-B) Immunofluorescence staining of Aβ and phosphorylated tau (p-tau T231) was carried out in the prefrontal cortex tissue of KAL-TG and WT mice. Error bars represent the standard error of the mean (SEM); **p < 0.01. Scale bar, 100 μm.

      (3) Fenofibrate rescued phenotypes in Kallistatin-transgenic mice while rosiglitazone, a PPARgamma agonist, did not. This result contradicts the manuscript's emphasis on a PPARgamma-associated mechanism. Please address this inconsistency.

      Thank you for the reminder. In fact, our results showed a trend towards improved memory and cognitive function in KAL-TG mice treated with Rosiglitazone, although its effect was not as significant as that of Fenofibrate. Several studies have reported that Rosiglitazone has a beneficial effect on memory and cognitive function in mouse models of dementia, but these studies involved treatment periods of 3 to 4 months[5, 6], whereas our treatment period was only one month. Extending the treatment period with Rosiglitazone may result in a more pronounced improvement. In addition, Fenofibrate may act through a PPAR-independent pathway by downregulating Kallistatin directly, as discussed above, and may therefore show stronger effects.

      (4) Most of the immunohistochemistry images are unclear. Inserts have similar magnification to the original representative images, making judgments difficult. Please provide larger inserts with higher resolution.

      According to your suggestion, we provided larger inserts with higher resolution in Fig 3A and Fig 4B, as follows:

      (5) The immunohistochemistry images in different figures were taken from different hippocampal subregions with different magnifications. Please maintain consistency, or explain why CA1, CA3, or DG was analyzed in each experiment.

      Thank you for your advice. The trends of changes across the different hippocampal subregions (CA1, CA3, and DG) are consistent. Following your suggestion, we have now replaced the images of the different hippocampal subregions with images of the DG region and re-conducted the statistical analysis in Fig 5I & 6C, as follows. Because significant Aβ deposition occurred only in the CA1 region, Fig 2A was not replaced.

      (6) Figure 5B is missing a title. Please add a title to maintain consistency with other graphs.

      Thanks for your suggestion. We have added a title to Figure 5B, as follows:

      (7) Please list statistical methods used in the figure legends, such as t-test or One-way ANOVA with post-hoc tests.

      Thanks for your suggestion. We have listed the statistical methods used in the figure legends.

      Reviewer #2:

      (1) It was suggested that Kallistatin is primarily produced by the liver. The study demonstrates increased Kallistatin levels in the hippocampus tissue of AD mice. It would be valuable to clarify if Kallistatin is also increased in the liver of AD mice, providing a comprehensive understanding of its distribution in disease states.

      Thank you for your suggestion. We extracted liver tissue from APP/PS1 mice, and the Western blot results indicated that the expression of Kallistatin in the liver of APP/PS1 mice was elevated, as follows:

      Author response image 3.

      (A-B) Protein levels of Kallistatin in the liver tissue were tested by western blot analysis and then statistically analyzed. Error bars represented the Standard Error of Mean (SEM); **p < 0.01.

      (2) Does Kallistatin interact directly with Notch1 ligands? Clarifying this interaction mechanism would enhance understanding of how Kallistatin influences Notch1 signaling in AD pathology.

      Thank you for your suggestion. This study reveals that Kallistatin directly binds to Notch1 and contributes to the activation of the Notch1-HES1 signaling pathway. Whether Kallistatin can also bind to the ligands of Notch1 requires further investigation in future studies. Our preliminary data showed that Jagged1 was upregulated in the hippocampal tissues of KAL-TG mice by qPCR and Western blot analyses.

      Author response image 4.

      Kallistatin promoted Notch ligand Jagged1 expression to activate Notch1 signaling. (A) QPCR analysis of Notch ligands (Dll1, Dll3, Jagged1, Jagged2) expression in the 9 months hippocampus tissue. (B) Western blotting analysis of Notch ligand Jagged1 expression in the hippocampus tissue. (C) Western blotting analysis of Notch ligand Jagged1 expression in the hippocampus primary neuron. β-actin served as the loading control. Error bars represented the Standard Error of Mean (SEM); *p < 0.05.

      (3) Is there any observed difference in AD phenotype between male and female Kallistatin-transgenic (KAL-TG) mice? Including this information would address potential gender-specific effects on cognitive decline and pathology.

      Thank you for your suggestion. Actually, we have previously used female mice for Morris Water Maze experiments, and the results showed that both male and female KAL-TG mice exhibited a phenotype of decreased memory and cognitive function compared to the gender-matched WT group, while there was no significant difference between male and female KAL-TG mice as follows:

      Author response image 5.

      (A-D) Behavioral performance was assessed through the Morris water maze test. (A) The escape latency time was presented during 1-5 days. (B-D) Cognitive functions were evaluated by spatial probe test on day 6 by analyzing, for each group of mice, the number of platform crossings (B), the percentage of time spent in the target area (C), and the path-trace heatmap (D). Error bars represented the Standard Error of Mean (SEM); F represents Female, M represents Male, and TG refers to KAL-TG; *p < 0.05.

      (4) It is recommended to include molecular size markers in Western blots for clarity and accuracy in protein size determination.

      Thank you for your reminder. We have shown the molecular weight on each blot.

      (5) The language should be revised for enhanced readability and clarity, ensuring that complex scientific concepts are communicated effectively to a broader audience.

      According to your suggestion, we have polished the article to enhance readability and clarity.

      Reviewer #3:

      (1) The authors did not illustrate whether the protective effect of fenofibrate against AD depends on Kallistatin.

      Thank you for your important suggestion. Fenofibrate is indeed acting as a PPARα agonist. Fenofibrate has been shown to protect memory and cognitive function by downregulating α- and β-secretases[1]. Activation of PPARα can reduce Aβ plaques by upregulating ADAM10, thereby protecting memory and cognition[2]. However, Fenofibrate can also act through a PPARα-independent pathway[3]. In our previous study, we showed that Fenofibrate can directly down-regulate the expression of KAL in hepatocytes[4]. Here, our findings showed that Kallistatin induces cognitive memory deterioration by increasing amyloid-β plaque accumulation and tau protein hyperphosphorylation (Fig. 1-3), and Fenofibrate can directly down-regulate the serum level of Kallistatin (Fig. 8G). In addition, the expression of PPARα in the hippocampal tissue of Kallistatin (KAL-TG) mice showed no significant difference compared to the WT group (Author response image 1-B). Therefore, we think Fenofibrate may improve memory and cognitive function at least in part through downregulating Kallistatin. To conclusively determine whether fenofibrate’s therapeutic effects depend on Kallistatin, future studies should employ Kallistatin-knockout AD animal models to evaluate fenofibrate’s impact on cognitive and memory functions. These investigations will further clarify the mechanistic underpinnings of fenofibrate in AD therapy.

      (2) The conclusions are supported by the results, but the quality of some results should be improved.

      Thank you for your kind suggestion. We have updated the magnified images in the immunohistochemistry section of the article, ensuring that the fields of view for the immunohistochemistry are within the same brain region, and have shown the molecular weights on each blot. Additionally, we have conducted a quantitative analysis of the protein levels in the Western blot results presented in Fig 6 & 8.

      (3) Figures 2c, 3c, and 4a present the Western blot results of p-tau from mice of different ages on one membrane, showing age-dependent expression. The authors analyzed the results of mice of different ages in one statistical chart, which will create ambiguity with the results of the representative images. For example, the expression of p-tau 396 in the blot was lower in the WT-12 M group than in the WT-9 M group (Figure 3c), which is contradictory to the statistical analysis.

      Thank you for your reminder. The statistical presentation here does not match the figure. At that time, the WB experiments for the hippocampal tissue at each age group were conducted separately, and it was not appropriate to compare different age groups together. This graph cannot illustrate age dependency. We have replaced the statistical graph in Figure 3B&D, as follows:

      (4) Figure 4b shows that KAL-TG-9 M had greater BACE1 expression than KAL-TG-12 M. Furthermore, the nuclei are not uniformly colored. Please provide more representative figures.

      Thank you for your reminder. Due to the fact that these sets of data were not processed in a single batch, the ages in the graph are not comparable. Regarding the issue of inconsistent nuclear staining, we have provided another representative image from this group, as follows:

      (5) Unclear why the BACE1 and Aβ levels seems less with KAL+shHES1 treatment than GFP+shNC treatment (Fig 6H)? This finding contradicts the conclusion.

      Thank you for your reminder. This experiment was repeated three times, and here we present the representative results along with the corresponding statistical data. There is no difference between KAL+shHES1 treatment and GFP+shNC treatment. We have updated Fig. 6H.

      (6) The Western blot results in figure 6e-h, 8h-i, and S3-S5 were not quantified.

      Thank you for your reminder. We have added statistical graphs and original images of the pictures in figure 6e-h, 8h-i, and S3-S5.

      (7) The authors did not provide the detection range of the Aβ42 ELISA kit.

      Thank you for your suggestion. The Aβ42 ELISA kit is from the IBL, with the product number 27721. Its standard range is 1.56 - 100 pg/mL, and the sensitivity is 0.05 pg/mL.

      (8) The authors did not specify the sex of the mice. This is important since sex could have had a dramatic impact on the results.

      Thank you for your suggestion. The results we present in the text are all statistically obtained from male mice. Actually, we have previously used female mice for Morris Water Maze experiments, and the results showed that both male and female KAL-TG mice exhibited a phenotype of decreased memory and cognitive function compared to the gender-matched WT group, while there was no significant difference between male and female KAL-TG mice (Author response image 5).

      Minor:

      (1) In Figure 2b, there are no units for the vertical coordinates of the statistical graph.

      Thank you for your reminder. We have added units for the vertical coordinates in Figure 2b.

      (2) In Figure 2c, the left Y-axis title is lacking in the statistic chart.

      Thank you for your reminder. We have added the left Y-axis title in the statistic chart.

      Reference:

      (1) Assaf N, El-Shamarka ME, Salem NA, Khadrawy YA, El Sayed NS. Neuroprotective effect of PPAR alpha and gamma agonists in a mouse model of amyloidogenesis through modulation of the Wnt/beta catenin pathway via targeting alpha- and beta-secretases. Progress in Neuro-Psychopharmacology and Biological Psychiatry 2020, 97: 109793.

      (2) Rangasamy SB, Jana M, Dasarathi S, Kundu M, Pahan K. Treadmill workout activates PPARα in the hippocampus to upregulate ADAM10, decrease plaques and improve cognitive functions in 5XFAD mouse model of Alzheimer’s disease. Brain, Behavior, and Immunity 2023, 109: 204-218.

      (3) Yuan J, Tan JTM, Rajamani K, Solly EL, King EJ, Lecce L, et al. Fenofibrate Rescues Diabetes-Related Impairment of Ischemia-Mediated Angiogenesis by PPARα-Independent Modulation of Thioredoxin-Interacting Protein. Diabetes 2019, 68(5): 1040-1053.

      (4) Fang Z, Shen G, Wang Y, Hong F, Tang X, Zeng Y, et al. Elevated Kallistatin promotes the occurrence and progression of non-alcoholic fatty liver disease. Signal Transduct Target Ther 2024, 9(1): 66.

      (5) Nelson ML, Pfeifer JA, Hickey JP, Collins AE, Kalisch BE. Exploring Rosiglitazone's Potential to Treat Alzheimer's Disease through the Modulation of Brain-Derived Neurotrophic Factor. Biology (Basel) 2023, 12(7).

      (6) Pedersen WA, McMillan PJ, Kulstad JJ, Leverenz JB, Craft S, Haynatzki GR. Rosiglitazone attenuates learning and memory deficits in Tg2576 Alzheimer mice. Exp Neurol 2006, 199(2): 265-273.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weiler, Teichert, and Margrie systematically analyzed long-range cortical connectivity, using a retrograde viral tracing strategy to identify layer and region-specific cortical projections onto the primary visual, primary somatosensory, and primary motor cortices. Their analysis revealed several hundred thousand inputs into each region, with inputs originating from almost all cortical regions but dominated in number by connections within cortical sub-networks (e.g. anatomical modules). Generally, the relative areal distribution of contralateral inputs followed the distribution of corresponding ipsilateral inputs. The largest proportion of inputs originated from layer 6a cells, and this layer 6 dominance was more pronounced for contralateral than ipsilateral inputs, which suggests that these connections provide predominantly feedback inputs. The hierarchical organization of input regions was similar between ipsi- and contralateral regions, except for within-module connections, where ipsilateral connections were much more feed-forward than contralateral. These results contrast earlier studies which suggested that contralateral inputs only come from the same region (e.g. V1 to V1) and from L2/3 neurons. Thus, these results provide valuable data supporting a view of interhemispheric connectivity in which layer 6 neurons play an important role in providing modulatory feedback.

      The conclusions of this paper are mostly well-supported by the data and analysis, but additional consideration of possible experimental biases is needed.

      We thank the reviewer for their positive feedback on our manuscript.

      Further discussion or analysis is needed about possible biases in uptake efficiency for different cell types. Is it possible that the nuclear retro-AAV has a tropism for layer 6 axons? Quantitative comparisons with results obtained with alternative methods such as rabies virus (Yao et al., 2023) or anterograde tracing (Harris et al., 2019) may be helpful for this.

      We appreciate this technical comment. For the reasons indicated below we are confident that our AAV approach successfully and rather comprehensively labels inputs to the three target areas. Firstly, in the brains in which we injected our retrograde nuclear-AAV tracer into VISp, SSp-bfd or MOp we found several instances where layer 5 and/or layer 2/3 was the dominant cortical projection layer (please see e.g. Figure 3 heatmaps). This was true for both ipsilateral and contralateral projections.

      Secondly, by way of comparison, Yao et al., 2023 injected rabies virus into VISp (but not into SSp-bfd or MOp) and their results show notable similarities to ours: 1) They show that contralateral inputs to VISp (and higher visual areas) were mainly located in Layers 5 and 6. 2) Retrogradely labelled neurons in higher visual areas revealed an anatomical hierarchy that reflects the known functional hierarchy of the mouse cortical visual system and that shown by our retro-AAV approach. Thus, as AAV and rabies-based tracing lead to similar results, this is further evidence against bias via tropism of our AAV tracer. That said, direct comparisons of the results between our study and the Yao et al., 2023 study should be viewed with some caution since Yao et al. injected rabies virus into specific Cre-driver lines in which the rabies virus targets individual genetically defined cell types in specific layers. Importantly, because no specific Cre-driver line is available for them, L6 cortico-cortical (L6 CC) cells could not be targeted by their approach. Thus, the dataset in Yao et al. overlooks the contribution of L6 CCs.

      Thirdly, in a recent study (Weiler et al., 2024) we found that in a specific pathway (SSp-bfd → VISp) both retro-AAV and the more traditional non-viral tracer cholera toxin subunit B (CTB) identified neurons in Layer 6 as the main source of projection neurons. The same result for the same pathway was shown by Bieler et al. (2017) using Fluorogold for retrograde tracing. Thus, the described dominance of Layer 6 projection neurons in specific pathways is likely not the result of a tropism of retro-AAV tracers.

      Please also see that we have now further extended the summary of these points in our revised manuscript in the discussion section (e.g. lines 457-463): 

      Quantitative analysis of the injection sites should be included to account for possible biases. For example, L6 neurons are known to be the main target of contralateral inputs into the visual cortex (Yao et al., 2023). Thus, if the injections are biased towards or against layer 6 neurons, this may change the layer distribution of retrogradely labeled input cells. Comparison across biological replicates may help reveal sensitivity to particular characteristics of the injections.

      In response to the reviewers' feedback, please see we have now quantified the injection volume per cortical layer, as shown in the revised Fig. S3D. Our results indicate that the injections were not biased toward Layer 6. Instead, the injected tracer volumes in Layers 1, 4, 5, and 6 were similar across all animals and injected areas. However, we observed that the injected tracer volume in Layer 2/3 tended to be higher than in other layers. Although the tracer volumes in Layers 2/3 appeared to be higher, the proportion of input neurons located in Layers 2/3 for most of the cortical projection areas was consistently lower than that from Layer 6. These findings provide strong evidence against injection bias towards L6 inputs.

      The possibility of labelling axons of passage within the white matter should be addressed. This could potentially lead to false positive connections, contributing to the broad connectivity from most cortical regions that were observed.

      For clarification, please see Fig. S2B in our revised manuscript. In this panel we plot the average percentage volume of the viral boli in the target areas and in all other nearby structures including the white matter. The percentage of virus injected into the white matter (WM) was 0.0824 ± 0.0759% for VISp and 0.0650 ± 0.0481% for SSp-bfd injections. Notably, injections into MOp showed no leakage into white matter (0%). These minimal volumes of virus in the white matter are unlikely to significantly influence the observed profile of widespread connectivity. Please see that we have added a sentence to the Results section (lines 84-86) where we state that we only used brains that had a transduction of the white matter below 0.1%.
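      The inclusion criterion described above can be sketched as a simple filter. This is a minimal illustration only: the brain identifiers and percentages are hypothetical, the function names are not taken from the manuscript, and the comparison is assumed to be strict ("below 0.1%").

```python
# Minimal sketch of the white-matter inclusion criterion.
# All brain IDs and values below are hypothetical, for illustration only.

WM_THRESHOLD_PERCENT = 0.1  # "transduction of the white matter below 0.1%"


def passes_wm_criterion(wm_percent: float,
                        threshold: float = WM_THRESHOLD_PERCENT) -> bool:
    """Return True if white-matter transduction is strictly below the threshold."""
    return wm_percent < threshold


def select_brains(wm_by_brain: dict[str, float]) -> list[str]:
    """Keep the IDs of brains that satisfy the white-matter criterion."""
    return [brain_id for brain_id, wm in wm_by_brain.items()
            if passes_wm_criterion(wm)]


# Hypothetical percentages of bolus volume located in white matter.
example = {"brain_a": 0.08, "brain_b": 0.00, "brain_c": 0.25}
selected = select_brains(example)
print(selected)
```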

      Reviewer #2 (Public review):

      Summary:

      Weiler et al use retrograde tracers, two-photon tomography, and automatic cell detection to provide a detailed quantitative description of the laminar and area sources of ipsi- and contralateral cortico-cortical inputs to two primary sensory areas and a primary motor area. They found considerable bilateral symmetry in the areas providing cortico-cortical inputs. However, although the same regions in both hemispheres tended to supply inputs, a larger proportion of inputs from contralateral areas originated from deeper layers (L5 and L6).

      Strengths:

      The study applies state-of-the-art anatomical methods, and the data is very effectively presented and carefully analyzed. The results provide many novel insights into the similarities and differences of inputs from the two hemispheres. While over the past decade there have been many studies quantitatively and comprehensively describing cortico-cortical connections, by directly comparing inputs from the ipsi and contralateral hemispheres, this study fills in an important gap in the field. It should be of great utility and an important reference for future studies on inter-hemispheric interactions.

      We thank the reviewer for this encouraging feedback on our manuscript.

      Weaknesses:

      Overall, I do not find any major weakness in the analyses or their interpretation. However, one must keep in mind that the study only analyses inputs projecting to three areas. This is not an inherent flaw of the study; however, it warrants caution when extrapolating the results to callosal projections terminating in other areas. As only inputs to two primary sensory areas and one primary motor area are studied, some of the conclusions could potentially be different for inputs terminating in high-order sensory and motor areas. Given that primary areas were injected, there are few instances of feedforward connections sampled in the ipsilateral hemisphere. The study finds that while ipsi-projections from the visual cortex to the barrel cortex are feedforward given its fILN values, those from the contralateral visual cortex are feedback instead. One is left to wonder whether this is due to the cross-modal nature of these particular inputs and whether the same rule (that contralateral inputs consistently exhibit feedback characteristics regardless of the hierarchical relationship of their ipsilateral counterparts with the target area) would also apply to feedforward inputs within the same sensory cortices.

      We acknowledge that what we find for primary sensory and motor target areas may not hold for other functionally different areas such as anterior cingulate cortex, retrosplenial cortex or frontal lobe that might be expected to receive strong feedforward cortical input. To begin to understand the organization of the global cortical input, however, we first explored primary sensory and motor areas. Please see that we have now added a sentence to the Discussion section of our manuscript that highlights the importance of investigating the hierarchical organization of intra- and interhemispheric input onto higher cortical areas or within subregions of a given sensory area.

      Another issue that is left unexplored is that, in the current analyses, the barrel and primary visual cortex are analyzed as uniform structures. It is well established that both the laminar sources of callosal inputs and their terminations differ in the monocular and binocular areas of the visual cortex (border with V2L). Similarly, callosal projections terminating at the border of S1 (a row of whiskers) differ from those terminating in other parts of S1. Thus, some of the conclusions regarding the laminar sources of callosal inputs might depend on whether one is analyzing inputs terminating or originating in these border regions.

      The aim of the present study was to analyse the global projectome to the VISp, SSp-bfd and MOp, irrespective of which subregions were included. Importantly, we purposely injected rather large bolus volumes to achieve large sample sizes of target neurons in each cortical layer.  For SSp-bfd, we utilised our previously reconstructed barrel map (Weiler et al., 2024) to precisely map our viral injection sites onto the barrels (Author response image 1). Analysis revealed that the six injection sites consistently encompassed 7–13 barrels (Author response image 1, three exemplary injection sites). Additionally, we determined the centres of mass for each injection site and mapped them onto the barrel map. Four of the injection sites were located in the lateral part of SSp-bfd, two in the central region, and none in the medial part. Notably, the injection sites within SSp-bfd exhibited significant overlap. As a result, a selective analysis of callosal projections targeting these injection sites would likely not yield distinct projection patterns, as the projectomes would inevitably include projections to surrounding barrels, leading to contamination.

      Author response image 1.

      Left: Exemplary injection sites mapped onto the 3D barrel map of SSp-bfd within the Mouse Allen Brain Atlas. Barrels were reconstructed using a specialized software as described previously (Weiler et al., 2024). Right: Centres of mass of all SSp-bfd injection sites mapped onto the 3D barrel map.

      Because we covered a significant proportion of the respective target primary sensory area, any further subdivision of these data is not possible and would require more tailored injections into clearly defined subareas. Investigating the separate projectomes onto these subregions (e.g. onto V1M and V1B) remains an important research question that we, at least in part, will address in a future study.

      Finally, while the paper emphasizes that projections from L6 "dominate" intra and contralateral cortico-cortical inputs, the data shows a more nuanced scenario. While it is true that the areas for which L6 neurons are the most common source of cortico-cortical projections are the most abundant, the picture becomes less clear when considering the number of neurons sending these connections. In fact, inputs from L2/3 and L5 combined are more abundant than those from L6 (Figure 3B), challenging the view that projections from L6 dominate ipsi- and contralateral projecting cortico-cortical inputs.

      We agree in the case of the barrel cortex, layer 5 significantly contributes in terms of the number of brain areas projecting from within the ipsilateral and contralateral hemispheres. Please see we have replaced the term “dominates” in the title, abstract and in the manuscript where relevant.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      The sections analyzing the role of L6 towards feedback (pg. 11-13, Figure 6) were a bit verbose and confusing to me. Three possible models are proposed:

      (1) a decrease in L23 projections, (2) an increase in L56 projections, or (3) both.

      However, what is being quantified appears to be the fraction of inputs, with L23, L5, and L6 summing to 1. Thus, a decrease in L23 would necessarily result in an increase in L56 projections. It seems like it would make more sense to quantify the percent change in the total number of inputs (rather than fractional) from each layer so that the 3 models are actually independent possibilities.

      The issue with the suggested analysis is that, with one exception (one area projecting to MOp), the number of projection neurons in contralateral areas is always ~60-80% lower compared to their ipsilateral counterparts. Consequently, this is also true for the number of projection neurons in the different cortical layers. Thus, quantifying the percentage change from the ipsilateral to the contralateral hemisphere in the total number of inputs from each layer will always result in negative values. 
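      The two arguments in this exchange can be illustrated with a short numerical sketch. The counts below are hypothetical (chosen so that the contralateral total is 70% lower than the ipsilateral total) and the function names are illustrative; the sketch only shows that per-layer percent changes are all negative in such a case, while within-hemisphere layer fractions necessarily sum to 1.

```python
# Minimal sketch with hypothetical neuron counts (not data from the study).

def percent_change(ipsi: float, contra: float) -> float:
    """Percent change in neuron count from the ipsilateral to the contralateral side."""
    return 100.0 * (contra - ipsi) / ipsi


def layer_fractions(counts: dict[str, float]) -> dict[str, float]:
    """Normalise per-layer counts so that they sum to 1 within one hemisphere."""
    total = sum(counts.values())
    return {layer: n / total for layer, n in counts.items()}


# Hypothetical counts for one projection area.
ipsi = {"L2/3": 400.0, "L5": 300.0, "L6": 300.0}
contra = {"L2/3": 60.0, "L5": 90.0, "L6": 150.0}

# Absolute counts drop in every layer, so every percent change is negative.
changes = {layer: percent_change(ipsi[layer], contra[layer]) for layer in ipsi}

# Fractions are compositional: a lower L2/3 share forces a higher L5 + L6 share.
ipsi_frac = layer_fractions(ipsi)
contra_frac = layer_fractions(contra)

print(changes)
print(ipsi_frac, contra_frac)
```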

      Nevertheless, we addressed the reviewer’s issue by calculating the preservation index (1(ipsi-contra)/(ipsi+contra)) for the sensory-motor areas independently for the absolute number of neurons within L2/3, 5 and 6 for the cortical areas projecting to VISp, SSp-bfd and MOp (see Author response image 2). When analysing the shift from the ipsilateral to the contralateral hemisphere, we observed that significantly more projection neurons were preserved in L6 compared to L2/3 for VISp and SSp-bfd. This shows that the number of L6 projection neurons declines less from the ipsilateral to the contralateral hemisphere compared to L2/3. However, our focus was on the fraction of projection neurons within each layer relative to the other layers per hemisphere (see Fig.6 of our manuscript). This measure is critical for distinguishing between feedforward and feedback connectivity. Calculating the change for each layer independently unfortunately does not provide insights into this comparison, as it does not capture the relative distribution of projection neurons across layers, which is central to our analysis. Therefore, we chose to present the data as layer fractions normalised within each hemisphere separately, enabling a comparison of relative changes between hemispheres, as shown in Fig.6 in the manuscript. We agree that with our approach a decrease in the fraction of L2/3 neurons would necessarily lead to an increase in the fraction of L5+6 neurons. However, as we analysed the fractional change for L5 and L6 separately, we found that the fraction of projection neurons in L5 generally showed only minor changes, while the fraction of L6 projection neurons increased substantially (Fig.6C). In addition, excluding L5 from the ipsi- or contralateral default network had significant effects on the fILN in only a relatively small number of projection areas. Excluding L6 resulted in significant changes in many more projection areas than layer 5.

      Author response image 2.

      Preservation index for L2/3, L5 and L6 of the 24 sensory-motor areas projecting onto the three target areas VISp, SSp-bfd and MOp.
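      As a rough sketch of how such a preservation index could be computed per layer: the formula as printed in the response appears to have lost an operator after the leading "1", so the form used here, 1 - (ipsi - contra)/(ipsi + contra), is an assumption. Under that assumption the index is 1 when both hemispheres contain equal numbers of projection neurons and 0 when no contralateral neurons are present. The counts are hypothetical.

```python
# Minimal sketch; the exact formula is an ASSUMPTION (see the text above).

def preservation_index(ipsi: float, contra: float) -> float:
    """Assumed form: 1 - (ipsi - contra) / (ipsi + contra).

    Algebraically equal to 2 * contra / (ipsi + contra).
    """
    if ipsi + contra == 0:
        raise ValueError("no labelled neurons in either hemisphere")
    return 1.0 - (ipsi - contra) / (ipsi + contra)


# Hypothetical per-layer counts for one projection area.
ipsi_counts = {"L2/3": 400.0, "L5": 300.0, "L6": 300.0}
contra_counts = {"L2/3": 60.0, "L5": 90.0, "L6": 150.0}

index_by_layer = {
    layer: preservation_index(ipsi_counts[layer], contra_counts[layer])
    for layer in ipsi_counts
}
print(index_by_layer)
```

      In this hypothetical example the L6 index is higher than the L2/3 index, which is the pattern the response describes for VISp and SSp-bfd.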

      Reviewer #2 (Recommendations for the authors):

      I feel that there are a few conclusions that could be strengthened in the paper:

      (1) The laminar sources of callosal inputs and their terminations differ in the monocular and binocular areas of the visual cortex (border with V2L). Similarly, callosal inputs are different close to the border of S1 with S2 than in the rest of the barrel cortex. From the methods sections and Figure S2, it seems that some injections targeted the V1 binocular zone while others were aimed at the monocular zone. Thus, it would be of interest to compare the laminar distribution and fILN of the contra inputs to the binocular and monocular zones (and S1 border vs the rest, if possible within this dataset).

      Please see the answer for the reviewer’s second point in the public review (above).

      (2) The results are currently a bit unclear on whether the contra inputs reflect the cortical hierarchy. Figure 4E-F makes it clear that the ipsi and contra fILNs do not always match. However, it seems from the plots in Figure 4D and Figure S6 that, while the contra fILN values are always higher, there might be a correlation between the ipsi and contra fILN. This could be addressed by directly plotting contra vs ipsi fILN.

      Similarly, it would be useful to directly address if there is any hint of the visual hierarchy, as calculated in Figure S5 for the contra inputs.

      Regarding the first point of the reviewer: We appreciate this comment. We do indeed find a positive correlation between the ipsilateral and contralateral fILN across the individual cortical areas for all three targets (please see Author response image 3 below). This is indeed an interesting observation that indicates a high degree of preservation concerning the rank order of the anatomical hierarchy within the input arising from both hemispheres. Please see that we have included this new figure 4F in the manuscript and added a sentence in the results (lines 282-288): 

      Regarding the second point of the reviewer: For visual hierarchy, although weaker, we find that the hierarchical ranking of the higher cortical visual areas is preserved for the contralateral hemisphere (see Author response image 3 below). 

      Author response image 3.

      Rank ordered average fILN values (± sem) of higher visual cortical areas of the ventral and dorsal visual stream for the ipsilateral and contralateral hemisphere.
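      A comparison of ipsilateral and contralateral hierarchy of this kind can be sketched as follows. Two assumptions are made here and are not stated in the response itself: fILN is taken to be the infragranular share of labelled neurons, (L5 + L6)/(L2/3 + L5 + L6), and rank-order preservation is quantified with a Spearman rank correlation (the response does not specify which correlation was used). The per-area counts are hypothetical.

```python
# Minimal sketch under the assumptions stated above; counts are hypothetical.

def filn(l23: float, l5: float, l6: float) -> float:
    """Assumed definition: infragranular (L5 + L6) share of labelled neurons."""
    total = l23 + l5 + l6
    if total == 0:
        raise ValueError("no labelled neurons")
    return (l5 + l6) / total


def ranks(values: list[float]) -> list[int]:
    """Ranks starting at 1; assumes no tied values, for brevity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation for tie-free data of equal length."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Hypothetical (L2/3, L5, L6) counts for four projection areas.
ipsi_counts = [(50, 25, 25), (40, 30, 30), (30, 30, 40), (20, 30, 50)]
contra_counts = [(30, 20, 50), (25, 25, 50), (10, 30, 60), (15, 25, 60)]

ipsi_filn = [filn(*c) for c in ipsi_counts]
contra_filn = [filn(*c) for c in contra_counts]
rho = spearman(ipsi_filn, contra_filn)
print(ipsi_filn, contra_filn, rho)
```

      A positive value of `rho` indicates that areas ranked high ipsilaterally also tend to rank high contralaterally, even when the contralateral fILN values are shifted upwards.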

      (3) I find the emphasis in the title and other parts of the paper on Layer 6 corticocortical cells dominating the anatomical organization of intra and interhemispheric feedback a bit of an overstatement. While it is true that the areas for which L6 is the most abundant source of cortico-cortical projections are the most abundant (Figure 3C), when just focusing on the number of neurons sending corticocortical connections (Figure 3B), this is less clear. Ipsi connections are roughly divided 1/3, 1/3, 1/3 between L2/3, L5 and L6. In the contra, while projections from L6 neurons are the most abundant, they are not a majority and are fewer than those of L2/3 and L5 together. I suggest revising the statement about L6 cells dominating cortico-cortical connections to more accurately reflect these nuances.

      (4) The observations from Figure 3 discussed above suggest that L6 inputs dominate in areas with less abundant projections to the injected areas. Is this the case? Is the fraction of L6 inputs inversely correlated with the number of inputs from that area?

      Please see the following correlation plots for the total number of inputs versus the fraction of L6 inputs per area for both the ipsilateral and contralateral hemisphere. We do find on the ipsilateral hemisphere a negative correlation between the total number of inputs and the L6 input fraction for VISp and to a lesser degree for SSp-bfd. Interestingly, we find the opposite correlation for the ipsilateral MOp, contralateral VISp, SSp-bfd and MOp (Author response image 4, Author response table 1). While this is an interesting finding, the correlations often appeared to be weak and often absent within the individual animals and across the three target areas (Author response table 1). Thus, these correlations are seemingly not a general feature of cortical connectivity.

      Author response image 4.

      Total number of cells versus fraction of cells within L6 per cortical brain area (average across animals) for the ipsilateral (top) and contralateral (bottom) hemisphere for the three target areas VISp, SSp-bfd and MOp.

      Author response table 1: Respective correlations between total numbers of cells and fraction of cells within L6 per cortical brain area for the ipsilateral and contralateral hemisphere for the three target areas (significant correlations highlighted with green).
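
      As a minimal sketch of the kind of per-area correlation reported above, the snippet below computes a Pearson coefficient between the total number of input cells and the L6 fraction. The numbers are hypothetical toy values, not data from the study, and the choice of Pearson (rather than a rank-based coefficient) is an assumption, since the response does not state which statistic was used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical per-area values (one entry per cortical area)
    total_inputs = [1200, 800, 450, 300, 150]
    l6_fraction = [0.25, 0.30, 0.38, 0.41, 0.55]
    print(f"r = {pearson_r(total_inputs, l6_fraction):.3f}")
```

      A negative coefficient here would correspond to the pattern described for ipsilateral VISp, where areas sending fewer inputs have a larger L6 fraction.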

      Minor issues:

      (5) Where was the mouse in Figure 3A injected?

      In this exemplary mouse the retrograde tracer was injected into VISp. We added this information in the Figure legend of Figure 3A1. 

      (6) Clarify in panel 4F that the position of the circle corresponds to the area location.

      Done as suggested. 

      References

      Bieler M, Sieben K, Cichon N, Schildt S, Röder B, Hanganu-Opatz IL. 2017. Rate and Temporal Coding Convey Multisensory Information in Primary Sensory Cortices. eNeuro 4. doi:10.1523/ENEURO.0037-17.2017

      Weiler S, Rahmati V, Isstas M, Wutke J, Stark AW, Franke C, Graf J, Geis C, Witte OW, Hübener M, Bolz J, Margrie TW, Holthoff K, Teichert M. 2024. A primary sensory cortical interareal feedforward inhibitory circuit for tacto-visual integration. Nat Commun 15:3081. doi:10.1038/s41467-024-47459-2

      Yao S, Wang Q, Hirokawa KE, Ouellette B, Ahmed R, Bomben J, Brouner K, Casal L, Caldejon S, Cho A, Dotson NI, Daigle TL, Egdorf T, Enstrom R, Gary A, Gelfand E, Gorham M, Griffin F, Gu H, Hancock N, Howard R, Kuan L, Lambert S, Lee EK, Luviano J, Mace K, Maxwell M, Mortrud MT, Naeemi M, Nayan C, Ngo N-K, Nguyen T, North K, Ransford S, Ruiz A, Seid S, Swapp J, Taormina MJ, Wakeman W, Zhou T, Nicovich PR, Williford A, Potekhina L, McGraw M, Ng L, Groblewski PA, Tasic B, Mihalas S, Harris JA, Cetin A, Zeng H. 2023. A whole-brain monosynaptic input connectome to neuron classes in mouse visual cortex. Nat Neurosci 26:350–364. doi:10.1038/s41593-022-01219-x

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors analyzed the causative association between circulating immune cells and periodontitis, and reported three risk immune cells related to periodontitis. The significance of the findings is fundamental, which substantially advances our understanding of periodontitis. The strength of evidence is convincing.

      Reviewer #1 (Public Review):

      Ye et al. used Mendelian randomization method to evaluate the causative association between circulating immune cells and periodontitis and finally screened out three risk immune cells related to periodontitis. Overall, this is an important and novel piece of work that has the potential to contribute to our understanding of the causal relationship between circulating immune cells related to periodontitis. However, there are still some concerns that need to be addressed.

      We sincerely appreciate the constructive feedback from the editor and reviewers, which has been instrumental in enhancing the quality of our manuscript.

      (1) The authors used 1e-9 as the threshold to select effective instrumental variables (IVs), which should give the corresponding references. Meanwhile, the authors should test and discuss the potential impact of inconsistent thresholds for exposure (1e-9, 5e-6 were selected by the author respectively) and outcome IVs (5e-8) on the robustness of the results.

      Thank you for your insightful comments. We have selected two GWAS databases as the data sources for the exposure group: the BCC Consortium with a sample size of 563,946, and the Sardinian cohort of 3,757. The considerable disparity in sample size between them may result in variations in outcomes, primarily showcased in the differences in positive SNP numbers. We, therefore, adopted an unconventional (non 5e-8) yet rigorously controlled screening strategy, an approach that is widely accepted in MR studies (Li et al., 2022; Liu et al., 2023). We believe that the present thresholds are sufficiently rigorous to guarantee the validity of the subsequent Mendelian randomization analysis.

      However, employing two distinct methods in exposure screening is not typical, and we posit that this method can be viewed as an innovative strategy, providing a reference for future research dealing with two databases with significant discrepancies (Huang et al., 2023; Kong et al., 2023). As you perceptively noted, we acknowledge that this strategy may exert a certain influence on the research outcomes, and we have factored this potential limitation into our manuscript. “Third, the considerable variation in sample size between the two exposure databases contributes to the discrepancies in the number of positive SNPs. Despite our exploration of multiple selection thresholds for IVs, the inconsistency in screening methods and the discrepancy in the included SNPs could potentially introduce bias.” (Page 14)

      As for the "outcome IVs with 5e-8" you mentioned, we didn't implement this screening threshold in the outcome IVs. Indeed, we applied the same screening criteria as specified at 5e-06 (refer to Stable 2). Is the statement that you're referring to the following: "Additionally, SNPs that displayed a direct association with the outcome would also be excluded to uphold the third MR assumption (P < 5e-8)" (Page 6)? In this context, we adopted a standard criterion in the IVs screening process to remove SNPs directly associated with the outcome.

      Reference

      Huang W, Wang Z, Zou C, Liu Y, Pan Y, Lu J, Zhou K, Jiao F, Zhong S, Jiang G. 2023. Effects of metabolic factors in mediating the relationship between Type 2 diabetes and depression in East Asian populations: A two-step, two-sample Mendelian randomization study. J Affect Disorders 335:120–128. doi:10.1016/j.jad.2023.04.114

      Kong L, Ye C, Wang Y, Zheng J, Zhao Z, Li M, Xu Y, Lu J, Chen Y, Xu M, Wang W, Ning G, Bi Y, Wang T. 2023. Causal effect of lower birthweight on non-alcoholic fatty liver disease and mediating roles of insulin resistance and metabolites. Liver Int 43:829–839. doi:10.1111/liv.15532

      Li P, Wang H, Guo L, Gou X, Chen G, Lin D, Fan D, Guo X, Liu Z. 2022. Association between gut microbiota and preeclampsia-eclampsia: a two-sample Mendelian randomization study. BMC Med 20:443. doi:10.1186/s12916-022-02657-x

      Liu B, Lyu L, Zhou W, Song J, Ye D, Mao Y, Chen G-B, Sun X. 2023. Associations of the circulating levels of cytokines with risk of amyotrophic lateral sclerosis: a Mendelian randomization study. BMC Med 21:39. doi:10.1186/s12916-023-02736-7

      (2) What is the reference for selecting Smoking, Fasting plasma glucose, and BMI as covariates? They do not seem to be directly related to immune cells as confounding factors.

      Smoking, fasting plasma glucose (FPG), and body mass index (BMI) are commonly used as covariates in multivariable Mendelian randomization studies (Kong et al., 2023; Liu et al., 2023). The association of smoking, FPG, and BMI with immune cells may not be immediately apparent. However, these factors have been identified as potential confounders that could impact overall health, which in turn may indirectly modulate systemic immune responses, susceptibility, and inflammation.

      (1) Smoking: It has been well-documented that smoking can cause inflammation and impair immune function, thereby increasing an individual's susceptibility to infections and diseases (Shiels et al., 2014). As such, smoking is recognized as a covariate that could potentially influence the outcomes of an investigation into immune cells.

      (2) FPG: Elevated FPG levels indicate poor glycemic control, potentially leading to conditions like diabetes (Choi et al., 2018). Consequently, studies have demonstrated that elevated FPG levels can compromise the immune system's ability to combat infections.

      (3) BMI: It is a measure of body fat that takes into account a person's weight and height. Both obesity, characterized by a high BMI, and underweight, characterized by a low BMI, have been associated with a range of health issues, including a compromised immune system (Piñeiro-Salvador et al., 2022). Consequently, BMI is factored in as a covariate in this study.

      We have thus incorporated these factors as covariates in our study to mitigate their potential confounding effects. The selection of these covariates is primarily guided by previous research and established knowledge concerning the potential influences on immune function. We appreciate your query and will ensure to clarify this point in our revised manuscript. “We have incorporated covariates, including the number of cigarettes smoked, fasting plasma glucose (FPG) levels, and body mass index (BMI) into the MVMR analysis, given that these factors could indirectly affect systemic immune responses and inflammation (Liu et al., 2023).” (Page 6-7)

      Reference

      Choi S-C, Titov AA, Abboud G, Seay HR, Brusko TM, Roopenian DC, Salek-Ardakani S, Morel L. 2018. Inhibition of glucose metabolism selectively targets autoreactive follicular helper T cells. Nat Commun 9:4369. doi:10.1038/s41467-018-06686-0

      Kong L, Ye C, Wang Y, Zheng J, Zhao Z, Li M, Xu Y, Lu J, Chen Y, Xu M, Wang W, Ning G, Bi Y, Wang T. 2023. Causal effect of lower birthweight on non-alcoholic fatty liver disease and mediating roles of insulin resistance and metabolites. Liver Int 43:829–839. doi:10.1111/liv.15532

      Liu Y, Lai H, Zhang R, Xia L, Liu L. 2023. Causal relationship between gastro-esophageal reflux disease and risk of lung cancer: insights from multivariable Mendelian randomization and mediation analysis. Int J Epidemiol 52:1435–1447. doi:10.1093/ije/dyad090

      Piñeiro-Salvador R, Vazquez-Garza E, Cruz-Cardenas JA, Licona-Cassani C, García-Rivas G, Moreno-Vásquez J, Alcorta-García MR, Lara-Diaz VJ, Brunck MEG. 2022. A cross-sectional study evidences regulations of leukocytes in the colostrum of mothers with obesity. BMC Med 20:388. doi:10.1186/s12916-022-02575-y

      Shiels MS, Katki HA, Freedman ND, Purdue MP, Wentzensen N, Trabert B, Kitahara CM, Furr M, Li Y, Kemp TJ, Goedert JJ, Chang CM, Engels EA, Caporaso NE, Pinto LA, Hildesheim A, Chaturvedi AK. 2014. Cigarette smoking and variations in systemic immune and inflammation markers. J Natl Cancer Inst 106:dju294. doi:10.1093/jnci/dju294

      (3) It is not entirely clear about the correction of P-value for the total number of independent statistical tests.

      In our study, we used the Bonferroni correction to adjust the P-values for multiple comparisons. The adjusted P-value is calculated as the original P-value times the total number of independent statistical tests. Specifically, we applied multiple corrections in the following two aspects: First, we corrected the results of the FUSION algorithm in TWAS, with a correction value of P < 6.27 × 10⁻⁶ (0.05/7,890 genes) (Page 8). Second, we performed multiple corrections on the initial results of MR (P < 0.05/17 traits = 0.003). However, none of the results met the criteria after the correction, which is one of the limitations detailed in the discussion section of our study (Page 14).
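
      The Bonferroni procedure described above can be sketched in a few lines. This is only an illustration of the general rule (significance threshold = alpha / number of tests; adjusted P = raw P × number of tests, capped at 1). The function names and the toy P-values are hypothetical and are not taken from the authors' analysis code.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def bonferroni_adjust(p_values, n_tests):
    """Adjusted P-values: raw P times the number of tests, capped at 1."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return [min(1.0, p * n_tests) for p in p_values]


if __name__ == "__main__":
    # 17 traits, as in the MR analysis described in the response
    print(f"MR threshold: {bonferroni_threshold(0.05, 17):.4f}")
    # Toy raw P-values (hypothetical), adjusted for 17 tests
    print(bonferroni_adjust([0.001, 0.02, 0.5], 17))
```

      A result is declared significant either when its raw P-value falls below the threshold or, equivalently, when its adjusted P-value falls below the nominal alpha.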

      (4) The author used whole blood data to apply FUSION algorithm. Although whole blood is a representative site, the authors should add FUSION testing of periodontally relevant tissues, such as oral mucosa.

      We appreciate your insightful comments and suggestions. We concur that employing periodontally relevant tissues, like oral mucosa, for FUSION testing might yield more precise and pertinent results. However, in the Genotype-Tissue Expression project (GTEx) database, we could not find transcriptome data related to oral tissues, such as gums, oral mucosa, and alveolar bone (Review Table 1). Owing to the limitations of the database, in the context of our study, we primarily relied on whole blood data, given its availability and the extensive precedent documented in the literature for its utilization (Xu et al., 2023; Yuan et al., 2022).

      We acknowledge that this is a limitation of our study and will certainly consider incorporating periodontally relevant tissues in our future research. In the revised manuscript, we have explicitly stated this limitation and underscored the necessity for additional studies to corroborate our findings with periodontally relevant tissues. “Fifth, we relied on the whole blood data For FUSION algorithm due to the lack of transcriptome data associated with oral tissues (such as gums, oral mucosa, and alveolar bone) in the GTEx database. This has led to an excessive focus on systemic immunological changes, thereby overlooking the significance of alterations in local periodontal tissue immunity. Such an oversight could potentially compromise the precision and pertinence of our research findings.” (Page 15)

      Author response table 1.

      Tissues and sample sizes in the GTEx database

      Reference

      Xu J, Si H, Zeng Y, Wu Y, Zhang S, Shen B. 2023. Transcriptome-wide association study reveals candidate causal genes for lumbar spinal stenosis. Bone Joint Res 12:387–396. doi:10.1302/2046-3758.126.BJR-2022-0160.R1

      Yuan J, Wang T, Wang L, Li P, Shen H, Mo Y, Zhang Q, Ni C. 2022. Transcriptome‐wide association study identifies PSMB9 as a susceptibility gene for coal workers’ pneumoconiosis. Environmental Toxicology 37:2103–2114. doi:10.1002/tox.23554

      (5) The authors chose gingival hyperplasia as a secondary validation phenotype of periodontitis in this study. However, gingival recession, as another important phenotype associated with periodontitis, should also be tested and discussed.

      We appreciate your insightful feedback highlighting the significance of incorporating gingival recession as a phenotype in periodontitis studies. Our emphasis on gingival hyperplasia in the study was primarily dictated by the initial study design and the data available from FinnGen R9K11. Notwithstanding the lack of gingival recession data in the available databases, we identified chronic gingivitis data in an earlier version of the Finnish database (FinnGen R5K11) as an alternative. We performed a Mendelian Randomization analysis on this dataset, with the results integrated into Supplementary Table 10. Concurrently, Table 1, Supplementary Table 1, Figure 4, and the corresponding descriptions in the manuscript were updated. We trust this adjustment can address the limitations identified in our research. We are confident that this not only augments the comprehensiveness of our study but also fosters a more holistic comprehension of periodontal disease.

      (6) This study used GLIDE data as a replicated validation, but the results were inconsistent with FinnGen's dataset.

      Thank you for your insightful comments and for bringing this issue to our attention. Indeed, it is of utmost importance to ensure the validity and reliability of our findings across various datasets. The observed inconsistency between the GLIDE data and FinnGen's dataset could be attributed to several reasons.

      Firstly, this discrepancy might originate from the differences in population composition. The former is grounded on a comprehensive meta-analysis of cohorts focusing on periodontitis, whereas the latter utilizes a dataset from a full-phenotype cohort. In the former, the ratio of periodontitis to the control groups is approximately 1:2. In contrast, the ratio in the latter seems to be minuscule. The sample size in the FinnGen data may not suffice to detect the effects observed in the GLIDE dataset, given that larger exposure sizes enhance the ability to detect genuine associations.

      Moreover, the heterogeneity of periodontitis can potentially result in variable outcomes. Phenotypic definition methods differ between the two databases. The GLIDE database bases its diagnoses on the Centers for Disease Control and Prevention/American Academy of Periodontology (CDC/AAP) criteria and the Community Periodontal Index (CPI) for physical signs, whereas the FinnGen database adopts the International Classification of Diseases (ICD) 10 standard for a comprehensive diagnosis. The former database employs a more practical yet broader standard for periodontitis, which might encompass pseudo-periodontitis.

      Finally, the observed differences could be attributed to the variations in immune responses at distinct stages of periodontitis. During the initial stages of periodontitis, neutrophils and macrophages primarily mediate the immune response. With the progression of the disease, the involvement of T cells and B cells increases, thereby leading to a more intricate immune response (Darveau, 2010). Besides, the immune system's response to these oral health conditions is not uniform and can be influenced by multiple factors, including the individual's overall health, genetics, and lifestyle, potentially impacting the results (Hung et al., 2023).

      Reference

      Darveau RP. 2010. Periodontitis: a polymicrobial disruption of host homeostasis. Nat Rev Microbiol 8:481–490. doi:10.1038/nrmicro2337

      Hung M, Kelly R, Mohajeri A, Reese L, Badawi S, Frost C, Sevathas T, Lipsky MS. 2023. Factors Associated with Periodontitis in Younger Individuals: A Scoping Review. J Clin Med 12:6442. doi:10.3390/jcm12206442

      Reviewer #2 (Public Review):

      This manuscript presents a well-designed study that combines multiple Mendelian randomization analyses to investigate the causal relationship between circulating immune cells and periodontitis. The main conclusions of the manuscript are appropriately supported by the statistics, and the methodologies used are comprehensive and rigorous.

      These findings have significant implications for periodontal care and highlight the potential for systemic immunomodulation management on periodontitis, which is of interest to readers in the fields of periodontology, immunology, and epidemiology.

      We greatly appreciate the positive feedback and valuable insights provided by the reviewer, which have significantly contributed to the improvement of our manuscript.

      Reviewer #2 (Recommendations for The Authors):

      *Abstract

      Line 30-32: "Two-sample bidirectional univariable MR followed by sensitivity testing, multivariable MR, subgroup analysis, and the Bayesian model averaging (MR-BMA) were performed to explore the causal association between them. " What does the term "them" refer to here, please clarify it. The research method here is unclear, please reorganize it.

      Line 39: "S100A9 and S100A12" here should be italic.

      We appreciate your meticulous suggestions and have revised the methods section accordingly. Additionally, the two genes have been highlighted in italics for emphasis.

      "Univariable MR, multivariable MR, subgroup analysis, reverse MR, and Bayesian model averaging (MR-BMA) were utilized to investigate the causal relationships. Furthermore, transcriptome-wide association study (TWAS) and colocalization analysis were deployed to pinpoint the underlying genes." (Page 1)

      Introduction

      Line 78-80: "As reported, the number of immune cells in periodontal tissue changes as periodontitis progresses, featuring an increase in monocytes, and B cells and a decrease in T cells." Does the author mean that both monocytes and B cells increase as periodontitis progresses?

      We are grateful for your meticulous reading and perceptive inquiries. We would like to confirm the accuracy of your understanding. In lines 78-80, our intended message was to communicate that with the progression of periodontitis, there is an increase in both monocytes and B cells in the periodontal tissue. This represents a typical immune response to the infection, where these cells play a pivotal role in counteracting periodontal pathogens. To enhance clarity, we have revised these lines in the manuscript as follows:

      "With the progression of periodontitis, there is a significant alteration in the quantity of immune cells present within the periodontal tissue. Specifically, an increase in the count of both monocytes and B cells is observed, whereas a decrease is noted in the count of T cells." (Page 3)

      Method

      Line 164-165: "As the main test, the MVMR-IVW method, offered by the MVMR-least absolute shrinkage and selection operator (MVMR-LASSO), and the MVMR-Egger method were chosen." The author's expression here is ambiguous.

      In response to your comment on the ambiguity in lines 164-165, we have revised the sentence for clarity. We hope this addresses your concern and clarifies our point more effectively.

      "The MVMR-IVW method was utilized as the primary test, supplemented by the MVMR-least absolute shrinkage and selection operator (MVMR-LASSO) and the MVMR-Egger method." (Page 7)

      Table 1: FinnGen has a greater sample size and more SNPs than GLIDE; why do authors choose the latter as the primary analysis?

      Our choice to use GLIDE for the primary analysis, instead of FinnGen, was mainly guided by the specific research question we aimed to address. Although FinnGen offers a larger sample size and more SNPs, GLIDE offers a more specialized and targeted dataset that suits the requirements of our study. Most MR studies adopt a similar strategy, in which a large disease-specific GWAS meta-analysis is used for exploration, followed by validation in a full-phenotype cohort (such as UK Biobank and FinnGen) (Liu et al., 2023; Yuan et al., 2023). To summarize, the reasons primarily include the following:

      Firstly, GLIDE offers a concentrated and targeted methodology for examining genetic data pertinent to periodontitis. This dataset is grounded in a comprehensive meta-analysis of cohorts centered on periodontitis, wherein the ratio of periodontitis cases to control groups is approximately 1:2. Conversely, the proportion in FinnGen seems to be negligible, given that it employs a dataset derived from a comprehensive phenotype cohort. Consequently, employing the GLIDE database as a primary investigative tool can generate more abundant genetic information associated with periodontitis.

      Furthermore, the methodological facets of GLIDE align more accurately with the analytical framework of our study. For instance, the diagnostic criteria methods vary between the two databases. The GLIDE database derives its basis from the Centers for Disease Control and Prevention/American Academy of Periodontology (CDC/AAP) and Community Periodontal Index (CPI) for physical indicators. In contrast, the FinnGen database employs the International Classification of Diseases (ICD) 10 standard for an exhaustive diagnosis. The former adopts a more pragmatic, yet broader, standard for diagnosing periodontitis. The latter continues to use concepts of diseases such as "chronic periodontitis", which have been replaced by "periodontitis" in the latest disease classification from the "2017 World Workshop on the Classification of Periodontal and Peri-Implant Diseases and Conditions" in the periodontal field (Caton et al., 2018).

      Reference

      Caton JG, Armitage G, Berglundh T, Chapple ILC, Jepsen S, Kornman KS, Mealey BL, Papapanou PN, Sanz M, Tonetti MS. 2018. A new classification scheme for periodontal and peri-implant diseases and conditions - Introduction and key changes from the 1999 classification. J Clin Periodontol 45 Suppl 20:S1–S8. doi:10.1111/jcpe.12935

      Liu Y, Lai H, Zhang R, Xia L, Liu L. 2023. Causal relationship between gastro-esophageal reflux disease and risk of lung cancer: insights from multivariable Mendelian randomization and mediation analysis. Int J Epidemiol 52:1435–1447. doi:10.1093/ije/dyad090

      Yuan S, Xu F, Li X, Chen J, Zheng J, Mantzoros CS, Larsson SC. 2023. Plasma proteins and onset of type 2 diabetes and diabetic complications: Proteome-wide Mendelian randomization and colocalization analyses. Cell Rep Med 4:101174. doi:10.1016/j.xcrm.2023.101174

      Result

      Line 224: "The observed significant results remained robust after removing pleiotropic SNPs." It is not clear what the authors mean by "remain robust".

      Line 229-231: "The causal relationship between neutrophils and periodontitis remained stable with no evidence of heterogeneity or pleiotropy." It is also not clear what the authors mean by "remain stable". How does the author get to the conclusion that there is no evidence of heterogeneity or pleiotropy?

      Figure S5: Please offer a brief explanation on how to investigate outlier or influential changes using scatter plots and Cochran's Q test and Cook's distance.

      Line 224: We apologize for the confusion caused by the term "remain robust". In the revised manuscript, we clarified this by stating, "The observed significant results are considered 'robust' if the effect of sensitivity analyses was identical to that of Inverse Variance Weighted (IVW) method, yielding a P-value less than 0.05." (Page 6)

      Line 229-231: We used the terms "remain stable" and "remain robust" interchangeably to express the same idea. To clarify, we have now unified the expression in the revised manuscript. As for the conclusion of "no evidence of heterogeneity or pleiotropy", it is derived from the results of Cochran's Q and Egger's intercept tests (P > 0.05). We have added this explanation to the revised manuscript for better clarity.

      Figure S5: In the revised manuscript and Table, we have provided a succinct explanation regarding the investigation of outliers or influential changes as follows: "A genetic variant was defined as either an outlier or an influential variant if it possessed a q-value exceeding 10 or if its Cook's distance surpassed the median of the corresponding F-distribution." (Page 7)
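
      To illustrate the outlier rule quoted above, the sketch below computes an inverse-variance weighted (IVW) estimate from per-variant Wald ratios, then each variant's contribution to Cochran's Q, and flags contributions above 10. Several points are assumptions rather than the authors' implementation: the "q-value" is taken to mean the per-variant Q contribution, first-order weights are used, the SNP identifiers and summary statistics are hypothetical, and the Cook's distance criterion is not implemented here.

```python
def ivw_q_contributions(beta_exp, beta_out, se_out):
    """Return the IVW estimate and each variant's contribution to Cochran's Q.

    Uses Wald ratios with first-order weights (an assumption):
    ratio_j = beta_out_j / beta_exp_j, se_j = se_out_j / |beta_exp_j|.
    """
    ratios = [bo / be for be, bo in zip(beta_exp, beta_out)]
    weights = [(be / so) ** 2 for be, so in zip(beta_exp, se_out)]  # 1 / se_j^2
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q_contrib = [w * (r - ivw) ** 2 for w, r in zip(weights, ratios)]
    return ivw, q_contrib


def flag_outliers(snp_ids, q_contrib, cutoff=10.0):
    """SNPs whose individual Q contribution exceeds the cutoff."""
    return [s for s, q in zip(snp_ids, q_contrib) if q > cutoff]


if __name__ == "__main__":
    # Hypothetical summary statistics for four variants
    snps = ["snp_a", "snp_b", "snp_c", "snp_d"]
    beta_exp = [0.10, 0.12, 0.08, 0.15]
    beta_out = [0.020, 0.024, 0.016, 0.150]
    se_out = [0.005, 0.005, 0.005, 0.020]

    ivw, q = ivw_q_contributions(beta_exp, beta_out, se_out)
    print(f"IVW estimate: {ivw:.3f}, Cochran's Q: {sum(q):.2f}")
    print("Flagged:", flag_outliers(snps, q))
```

      The total Q (the sum of the contributions) is what would be compared against a chi-squared distribution with one fewer degree of freedom than the number of variants when testing for heterogeneity.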

      We have made all the necessary changes in the revised manuscript based on your comments. We hope our responses and revisions adequately address your concerns.

      Discussion

      I have consulted several pieces of literature to ensure a thorough explanation, which may be helpful for your writing.

      (1) Hajishengallis G, Li X, Divaris K, Chavakis T. Maladaptive trained immunity and clonal hematopoiesis as potential mechanistic links between periodontitis and inflammatory comorbidities. Periodontol 2000. 2022;89(1):215-230. doi:10.1111/prd.12421

      (2) Hajishengallis G, Chavakis T. Mechanisms and Therapeutic Modulation of Neutrophil-Mediated Inflammation. J Dent Res. 2022;101(13):1563-1571. doi:10.1177/00220345221107602

      We appreciate your valuable feedback and the additional references you provided to enrich our manuscript. Upon receiving your comments, we have meticulously reviewed and incorporated the suggested literature into our revised manuscript. These references have furnished insightful information, which has been assimilated into the revised manuscript (Page 12) to enhance the explanation of the mechanisms of neutrophil-mediated inflammation and the potential association between periodontitis and inflammatory comorbidities.

      "The quantity and functionality of neutrophils both act as critical indicators of inflammation severity. The reduction in neutrophil count and inflammatory mediators, observed after successful periodontitis treatment, suggests a reduction in systemic inflammation (Hajishengallis , 2022)." (Page 12)

      "Trained myeloid cells have the potential to amplify the functionality of neutrophils, thereby fortifying the body's defense against subsequent infections. Nevertheless, within the framework of chronic inflammation, these cells could potentially intensify tissue damage (Hajishengallis and Chavakis, 2022)." (Page 12)

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This manuscript reveals important insights into the role of ipsilateral descending pathways in locomotion, especially following unilateral spinal cord injury. The study provides solid evidence that this method improves the injured side's ability to support weight, and as such the findings may lead to new treatments for stroke, spinal cord injuries, or unilateral cerebral injuries. However, the methods and results need to be better detailed, and some of the statistical analysis enhanced.

      Thank you for your assessment. We incorporated various text improvements in the final version of the manuscript to address the weaknesses you have pointed out. The specific improvements are outlined below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript provides potentially important new information about ipsilateral cortical impact on locomotion. A number of issues need to be addressed.

      Strengths:

      The primary appeal and contribution of this manuscript are that it provides a range of different measures of ipsilateral cortical impact on locomotion in the setting of impaired contralateral control. While the pathways and mechanisms underlying these various measures are not fully defined and their functional impacts remain uncertain, they comprise a rich body of results that can inform and guide future efforts to understand cortical control of locomotion and to develop more effective rehabilitation protocols.

      Weaknesses:

      (1) The authors state that they used a cortical stimulation location that produced the largest ankle flexion response (lines 102-104). Did other stimulation locations always produce similar, but smaller responses (aside from the two rats that showed ipsilateral neuromodulation)? Was there any site-specific difference in response to stimulation location?

      We derived motor maps in each rat, akin to the representation depicted in Fig 6. In each rat, alternative cortical sites did, indeed, produce distal or proximal contralateral leg flexion responses. Distal responses were more likely to be evoked in the rostral portion of the array, similarly to proximal responses early after injury. This distribution in responses across different cortical sites is reported in this study (Fig. 6) and is consistent with our prior work. The Results section has been revised to provide additional clarification of the passage you indicated and context for the data presented in Figure 6:

      On page 4, we have clarified: “Stimulation through these channels produced a strong whole-leg flexion movement, with an evident distal component. From visual inspection, all responding electrodes in the array produced contralateral leg flexion, although with different strength of contraction for a fixed stimulation intensity (100μA). Moreover, some sites did not present a distal movement component, failing in eliciting ankle flexion and resulting in a generally weaker proximal flexion.”

      On page 12, we have further noted: “By visually inspecting the responses elicited by stimulation delivered through each of the array electrodes, we categorized movements as proximal or distal. This classification was based on whether the ankle participated in the evoked response or if the movement was restricted to the proximal hindlimb. Each leg was scored independently.”

      (2) Figure 2: There does not appear to be a strong relationship between the percentage of spared tissue and the ladder score. For example, the animal with the mild injury (based on its ladder score) in the lower left corner of Figure 2A has less than 50% spared tissue, which is less spared tissue than in any animal other than the two severe injuries with the most tissue loss. Is it possible that the ladder test does not capture the deficits produced by this spinal cord injury? Have the authors looked for a region of the spinal cord that correlates better with the deficits that the ladder test produces? The extent of damage to the region at the base of the dorsal column containing the corticospinal tract would be an appropriate target area to quantify and compare with functional measures.

      In Fig. S6 of our 2021 publication "Bonizzato and Martinez, Science Translational Medicine", we investigated the predictive value of tissue sparing in specific sub-regions of the spinal cord for ladder performance. Among others, we examined the correlation between the accuracy of left leg ladder performance in the acute state and the preservation of the corticospinal tract (CST). Our results indicated that dorsal CST sparing serves as a mild predictor for ladder deficits, confirming the results obtained in this study.

      (3) Lines 219-221: The authors state that "phase-coherent stimulation reinstated the function of this muscle, leading to increased burst duration (90±18% of the deficit, p=0.004, t-test, Fig. 4B) and total activation (56±13% of the deficit, p=0.014, t-test, Fig. 3B)." This way of expressing the data is unclear. For example, the previous sentence states that after SCI, burst duration decreased by 72%. Does this mean that the burst duration after stimulation was 90% higher than the -72% level seen with SCI alone, i.e., 90% + -72% = +18%? Or does it mean that the stimulation recovered 90% of the portion of the burst duration that had been lost after SCI, i.e., -72% * (100%-90%)= -7%? The data in Figure 4 suggests the latter. It would be clearer to express both these SCI alone and SCI plus stimulation results in the text as a percent of the pre-SCI results, as done in Figure 4.

      Your assessment is correct; we intended to report that the stimulation recovered 90% of the portion of the burst duration that had been lost after SCI. This point has been clarified (see page 9):

      “…leading to increased burst duration (recovered 90±18% of the lost burst duration, p=0.004, t-test, Fig. 4B) and total activation (recovered 56±13% of the total activation, p=0.014, t-test, Fig. 3B)”
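
      The two readings discussed in this exchange can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the 72% deficit and 90% recovery figures quoted above, and the function names are hypothetical, not taken from the manuscript.

```python
# Illustrative arithmetic only; function names are hypothetical.

def additive_reading(deficit_pct: float, gain_pct: float) -> float:
    """Reading 1: the gain is added to the post-SCI change from baseline."""
    return -deficit_pct + gain_pct


def recovered_fraction_reading(deficit_pct: float, recovered_pct: float) -> float:
    """Reading 2: stimulation recovers a fraction of the lost amount.

    Returns the remaining change from the pre-SCI baseline, in percent.
    """
    return -deficit_pct * (1.0 - recovered_pct / 100.0)


deficit = 72.0    # burst duration lost after SCI, percent of pre-SCI value
recovered = 90.0  # reported recovery, percent of the deficit

reading_1 = additive_reading(deficit, recovered)
reading_2 = recovered_fraction_reading(deficit, recovered)

print(f"additive reading:           {reading_1:+.1f}% vs. pre-SCI")
print(f"recovered-fraction reading: {reading_2:+.1f}% vs. pre-SCI")
print(f"burst duration as percent of pre-SCI: {100.0 + reading_2:.1f}%")
```

      Under the second reading, which is the one confirmed in the response, the remaining deficit is about 7% of the pre-SCI burst duration.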

      (4) Lines 227-229: The authors claim that the phase-dependent stimulation effects in SCI rats are immediate, but they don't say how long it takes for these effects to be expressed. Are these effects evident in the response to the first stimulus train, or does it take seconds or minutes for the effects to be expressed? After the initial expression of these effects, are there any gradual changes in the responses over time, e.g., habituation or potentiation?

      The effects are immediately expressed at the very first occurrence of stimulation. We never tested a rat completely naïve to stimuli, as each treadmill session involves prior cortical mapping to identify a suitable active site for involvement in locomotor experiments. Yet, as demonstrated in Supplementary Video 1 accompanying our 2021 publication on contralateral effects of cortical stimulation, "Bonizzato and Martinez, Science Translational Medicine," the impact of phase-dependent cortical stimulation on movement modulation is instantaneous and ceases promptly upon discontinuation of the stimulation. We did not quantify potential gradual changes in responsiveness over time, but we cannot exclude that for long stimulation sessions (e.g., 30 min or more), stimulus amplitude may need to be slightly increased over time to compensate for habituation.

      (5) Awake motor maps (lines 250-277): The analysis of the motor maps appears to be based on measurements of the percentage of channels in which a response can be detected. This analytic approach seems incomplete in that it only assesses the spatial aspect of the cortical drive to the musculature. One channel could have a just-above-threshold response, while another could have a large response; in either case, the two channels would be treated as the same positive result. An additional analysis that takes response intensity into account would add further insight into the data, and might even correlate with the measures of functional recovery. Also, a single stimulation intensity was used; the results may have been different at different stimulus intensities.

      We confirm that maps of cortical stimulation responsiveness may vary at different stimulus amplitudes. To establish an objective metric of excitability, we identified 100µA as a reliable stimulation amplitude across rats and used this value to build the ipsilateral motor representation results in Figure 6. This choice allows direct comparison with Figure 6 of our 2021 article, related to contralateral motor representation. The comparison reveals a lack of correlation with functional recovery metrics in the ipsilateral case, in contrast to the successful correlation achieved in the contralateral case.

      Regarding the incorporation of stimulation amplitudes into the analysis, as detailed in the Method section (lines 770-771), we systematically tested various stimulation amplitudes to determine the minimal threshold required for eliciting a muscle twitch, identified as the threshold value. This process was conducted for each electrode site.

      Upon reviewing these data, we considered the possibility of presenting an additional assessment of ipsilateral cortical motor representation based on stimulation thresholds. However, the representation depicted in the figure did not differ significantly from the data presented in Figure 6A. Furthermore, this representation introduced an additional weakness, as it was unclear how to represent the absence of a response in the threshold scale. We chose to arbitrarily designate it as zero on the inverse logarithmic scale, where, for reference, 100 µA is positioned at 0.2 and 50 µA at 0.5.

      In conclusion, we believe that the conclusions drawn from this analysis align substantially with those in the text. The addition of the threshold analysis, in our assessment, would not contribute significantly to improving the manuscript.

      Author response image 1.

      Threshold analysis

      Author response image 2.

      Occurrence probability analysis, for comparison.

      (6) Lines 858-860: The authors state that "All tests were one-sided because all hypotheses were strictly defined in the direction of motor improvement." By using the one-sided test, the authors are using a lower standard for assessing statistical significance than the overwhelming majority of studies in this field use. More importantly, ipsilateral stimulation of particular kinds or particular sites might conceivably impair function, and that is ignored if the analysis is confined to detecting improvement. Thus, a two-sided analysis or comparable method should be used. This appropriate change would not greatly modify the authors' current conclusions about improvements.

      Our original hypothesis, drawn from previous studies involving cortical stimulation in rats and cats, as well as other neurostimulation research for movement restoration, posited a favorable impact of neurostimulation on movement. Consistent with this hypothesis, we designed our experiments with a focus on enhancing movement, emphasizing a strict direction of improvement.

      It's important to note that a one-sided test is the appropriate match for a one-sided hypothesis, and it is not a lower standard in statistics. Each experiment we conducted was constructed around a strictly one-sided hypothesis: the inclusion of an extensor-inducing stimulus would enhance extension, and the inclusion of a flexion-inducing stimulus would enhance flexion. This rationale guided our choice of the appropriate statistical test.

      We acknowledge your concern regarding the potential for ipsilateral stimulation to have negative effects on locomotion, which might not be captured when designing experiments based on one-sided hypotheses. That is, when hypothesizing that an extensor stimulus would enhance extension (a one-sided hypothesis) in a functional task, and finding an opposite result (inhibition), statistical rigor would require that we not present that result as significant. This concern is valid, and we explicitly state our design choice in the Methods section, Quantification and statistical analyses:

      “All tests were one-sided, as our hypotheses were strictly defined to predict motor improvement. Specifically, we hypothesized that delivering an extension-inducing stimulus would enhance leg extension, and delivering a flexion-inducing stimulus would enhance leg flexion. Consequently, any potentially statistically significant result in the opposite direction (e.g., inhibition) would not be considered. However, no such occurrences were observed.”

      As a final note, even if such opposite observations were made, they could serve as the basis for triggering an ad-hoc follow-up study.
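
      The trade-off discussed in this exchange can be illustrated with a small sketch. Note the assumptions: the code uses an exact sign-flip permutation test on paired differences rather than the t-tests reported in the manuscript, and the data values are invented for illustration only. It shows that a one-sided test gives a smaller p-value when the effect lies in the hypothesized direction, and cannot flag an effect in the opposite direction.

```python
from itertools import product


def sign_flip_p_values(diffs):
    """Exact sign-flip test on paired differences (hypothetical helper).

    Returns (one_sided_p, two_sided_p). The one-sided alternative is that
    the mean difference is greater than zero, i.e. improvement.
    """
    observed = sum(diffs)
    tol = 1e-12  # guards against floating-point ties
    n_greater = 0
    n_extreme = 0
    n_total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        flipped = sum(s * d for s, d in zip(signs, diffs))
        n_total += 1
        if flipped >= observed - tol:
            n_greater += 1
        if abs(flipped) >= abs(observed) - tol:
            n_extreme += 1
    return n_greater / n_total, n_extreme / n_total


# Invented paired differences (stimulation minus no stimulation).
improvement = [0.8, 1.1, 0.4, 0.9, 0.6, 1.3]
impairment = [-d for d in improvement]  # same magnitude, opposite direction

p_one_imp, p_two_imp = sign_flip_p_values(improvement)
p_one_opp, p_two_opp = sign_flip_p_values(impairment)

print(f"improvement: one-sided p = {p_one_imp:.4f}, two-sided p = {p_two_imp:.4f}")
print(f"impairment:  one-sided p = {p_one_opp:.4f}, two-sided p = {p_two_opp:.4f}")
```

      In this toy case the one-sided p-value is half the two-sided one when the effect is in the predicted direction, and equals 1 when it is in the opposite direction, which is the behavior both the reviewer and the authors describe.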

      Reviewer #1 also provided several detailed suggestions in the section “Recommendations for the authors”. We estimated that each of them was beneficial for the correctness or for the readability of the text, and thus all were incorporated into the final version.

      Reviewer #2 (Public Review):

      Summary:

      The authors' long-term goals are to understand the utility of precisely phased cortex stimulation regimes on recovery of function after spinal cord injury (SCI). In prior work, the authors explored the effects of contralesion cortex stimulation. Here, they explore ipsilesion cortex stimulation in which the corticospinal fibers that cross at the pyramidal decussation are spared. The authors explore the effects of such stimulation in intact rats and rats with a hemisection lesion at the thoracic level ipsilateral to the stimulated cortex. The appropriately phased microstimulation enhances contralateral flexion and ipsilateral extension, presumably through lumbar spinal cord crossed-extension interneuron systems. This microstimulation improves weight bearing in the ipsilesion hindlimb soon after injury, before any normal recovery of function would be seen. The contralateral homologous cortex can be lesioned in intact rats without impacting the microstimulation effect on flexion and extension during gait. In two rats ipsilateral flexion responses are noted, but these are not clearly demonstrated to be independent of the contralateral homologous cortex remaining intact.

      Strengths:

      This paper adds to prior data on cortical microstimulation by the laboratory in interesting ways. First, the strong effects of the spared crossed fibers from the ipsi-lesional cortex in parts of the ipsi-lesion leg's step cycle and weight support function are solidly demonstrated. This raises the interesting possibility that stimulating the contra-lesion cortex as reported previously may execute some of its effects through callosal coordination with the ipsi-lesion cortex tested here. This is not fully discussed by the authors but may represent a significant aspect of these data. The authors demonstrate solidly that ablation of the contra-lesional cortex does not impede the effects reported here. I believe this has not been shown for the contra-lesional cortex microstimulation effects reported earlier, but I may be wrong. Effects and neuroprosthetic control of these effects are explored well in the ipsi-lesion cortex tests here.

      In the revised version of the manuscript, we incorporated various text improvements to address the points you have highlighted in your review. Additionally, we have integrated the suggested discussion topic on callosal coordination related to contralateral cortical stimulation. The discussion section now incorporates:

      “Since bi-cortical interactions in sculpting descending commands are known (Brus-Ramer et al., 2009), and in light of the changes we report in ipsilesional motor cortex excitability, the role of the ipsilateral cortex in mediating or supporting functional descending commands from the contralateral cortex, particularly the immediate increase in flexion of the affected hindlimb and long-term recovery of functional control (Bonizzato & Martinez, 2021), could be further explored.”

      The localization of the specific channels closest to the interhemispheric fissure (Fig. 7D) may suggest the involvement of transcallosal interactions in mediating the transmission of the cortical command generated in the ipsilateral motor cortex (Brus-Ramer, Carmel, & Martin, 2009). “While ablation experiments (Fig. 8) refute this hypothesis for ipsilateral extension control, they do not conclusively determine whether a different efferent pathway is involved in ipsilateral flexion control in this specific case."

      Weaknesses:

      Some data is based on very few rats. For example (N=2) for ipsilateral flexion effects of microstimulation. N=3 for homologous cortex ablation, and only ipsi extension is tested it seems. There is no explicit demonstration that the ipsilateral flexion effects in only 2 rats reported can survive the contra-lateral cortex ablation.

      We agree with this assessment. The ipsilateral flexion representation is here reported as a rare but consistent phenomenon, which we believe to have robustly described with Figure 7 experiments. We underlined in the text that the ablation experiment did not conclude on the unilateral-cortical nature of ipsilateral flexion effects, by replacing the sentence with the following:

      “While ablation experiments (Fig. 8) refute this hypothesis for ipsilateral extension control, they do not conclusively determine whether a different efferent pathway is involved in ipsilateral flexion control in this specific case."

      Some improvements in clarity and precision of descriptions are needed, as well as fuller definitions of terms and algorithms.

      Likely Impacts: This data adds in significant ways to prior work by the authors, and an understanding of how phased stimulation in cortical neuroprosthetics may aid in recovery of function after SCI, especially if a few ambiguities in writing and interpretation are fully resolved.

      The manuscript text has been revised in its final version, and we sought to eliminate all ambiguity in writing and data interpretation.

      In the section “Recommendations for the authors”, Reviewer #2 also suggested that we better define multiple terms throughout the manuscript. A clarification was added for each.

      The Reviewer pointed out that we might have overlooked a correlation between locomotor recovery and the increase in motor maps in Figure 6. We revisited this evaluation and found that the reviewer is correct. We had concluded that there was no correlation by looking “horizontally” at whether motor map size across rats would predict locomotor scores (as it did in the case of contralateral cortex mapping, Bonizzato and Martinez, 2021). However, we have now found a strong correlation between the changes that happen over time within each rat and locomotor recovery, a result that was only hinted at, with no appropriate quantification, in the previous version of the manuscript. We have now reformulated the results of Figure 6 on page 12 to include this result, and we would like to thank the reviewer for having noticed this opportunity.

      Finally, we have expanded the discussion to include the following points:

      The possibility that hemi-cortex coordination of contralesional microstimulation inputs may explain the Sci Transl Med results for contralesional cortex ICMS, which warrants further investigation.

      The recognition that the ablation experiments do not provide conclusive evidence regarding ipsilateral flexion control and whether an alternative efferent pathway might be involved in this specific case.

      Reviewer #3 (Public Review):

      Summary:

      This article aims to investigate the impact of neuroprosthesis (intracortical microstimulation) implanted unilaterally on the lesion side in the context of locomotor recovery following unilateral thoracic spinal cord injury.

      Strength:

      The study reveals that stimulating the left motor cortex, on the same side as the lesion, not only activates the expected right (contralateral) muscle activity but also influences unexpected muscle activity on the left (ipsilateral) side. These muscle activities resulted in a substantial enhancement in lift during the swing phase of the contralateral limb and improved trunk-limb support for the ipsilateral limb. They used different experimental and stimulation conditions to show the ipsilateral limb control evoked by the stimulation. This outcome holds significance, shedding light on the engagement of the "contralateral projecting" corticospinal tract in activating not only the contralateral but also the ipsilateral spinal network.

      The experimental design and findings align with the investigation of the stimulation effect of contralateral projecting corticospinal tracts. They carefully examined the recovery of ipsilateral limb control with motor maps. They also tested the effective sites of cortical stimulation. The study successfully demonstrates the impact of electrical stimulation on the contralateral projecting neurons on ipsilateral limb control during locomotion, as well as identifying important stimulation spots for such an effect. These results contribute to our understanding of how these neurons influence bilateral spinal circuitry. The study's findings contribute valuable insights to the broader neuroscience and rehabilitation communities.

      Thank you for your assessment of this manuscript. The final version of the manuscript incorporates your suggestions for improving term clarity, and we enhanced the discussion on the mechanisms of spinal network engagement, as outlined below.

      Weakness:

      The term "ipsilateral" lacks a clear definition in the title, abstract, introduction, and discussion, potentially causing confusion for the reader.

      [and later] However, in my opinion, readers can easily link the ipsilateral cortical network to the ipsilateral-projecting corticospinal tract, which is less likely to play a role in ipsilateral limb control in this study since this tract is disrupted by the thoracic spinal injury.

      In order to mitigate the risk of having readers linking the effects of ipsilateral cortical stimulation with ipsilateral-projecting corticospinal tract, we specified:

      In the abstract, we specify that our goal was: “to investigate the functional role of the ipsilateral motor cortex in rat movement through spared contralesional pathways.”

      In the introduction: “In most cases, this lesion also disrupts all spinal tracts descending on the same side as the cortex under investigation at the thoracic level, meaning that the transmission of cortical commands to the ipsilesional hindlimb must depend on crossed descending tracts (Fig. S1).”

      The unexpected ipsilateral (left) muscle activity is most likely due to the left corticospinal neurons recruiting not only the right spinal network but also the left spinal network. This is probably due to the joint efforts of the neuroprosthesis and activation of spinal motor networks which work bilaterally at the spinal level.

      We agree with your assessment and the discussion section now emphasizes the effects of supraspinal drive onto spinal circuits.

      In the section “Recommendations for the authors” Reviewer #3 suggested to provide an early reminder to the reader that the focus is on exploring the control of the ipsilateral limb through the corticospinal tract of the same side, projecting contralaterally. We did so in the abstract and introduction, as presented above.

      The reviewer also suggested that the discussion could be shorter. While we recognize it covers diverse subjects that may appeal to different readers, we believe omitting some sections could limit its overall scope. The manuscript underwent three revisions and a thorough dialogue with reviewers from diverse backgrounds, and we are hesitant to undo some of these improvements.

      Moreover, the section falls short of fully exploring the involvement of contralateral projecting corticospinal neurons in spinal networks for diverse motor behaviors. It could potentially delve into aspects like the potential impact of corticospinal inputs on gating the cross-extensor reflex loop and elucidating the mechanisms underlying the recruitment of the ipsilateral spinal network for generating ipsilateral limb movements. Is it a direct control on motor neurons or via existing spinal circuits?

      The discussion section now includes the potential spinal circuits through which corticospinal neurons may affect motor control and reflexes.

      Reviewer #3 also provided several detailed suggestions in the sub-section “Minor points”. We judged all of them to be beneficial for the correctness or readability of the text, and thus all were incorporated into the final version. Some of the questions raised were answered directly in the text (defining “% of chronic map” and rephrasing the original Line 479). We answer the two remaining questions below:

      Fig. 3C I wonder what is the average latency between stimulation onset and onset of right ankle flexor activity. Is the latency fixed, or variable (which probably indicates that the Cortical activation signal is integrated with spinal CPG activity.)

      ICMS trains, unfortunately, do not allow for precise dissection of transmission timing. Single pulses at 100 µA are insufficient to generate motoneuron responses and require multiple pulses to build up cortical transmission. Alstermark et al. (Journal of Neurophysiology, 2004) used two to four stimuli with higher amplitudes to investigate forelimb transmission timing. In our 2021 Science Translational Medicine paper, we employed single pulses at 1 mA to establish transmission delays from the contralateral cortex to the ankle flexor. However, the circuits recruited at 1 mA are not directly comparable to those activated by shorter trains.

      In this study, we used cortical trains of approximately 14 pulses, typical of ICMS protocols. Each pulse could potentially be the first to generate a response volley in the ankle flexor, with delays measured at 30 to 60 ms from ICMS train onset. While we believe that cortical commands are necessarily integrated with spinal CPG activity—as indicated in Figures 1B and 3D, where timing is crucial and descending commands can be gated out if delivered off-phase—the variability in latency that we recorded could be attributed to any of the following factors: cortical activation build-up, integration within reticular relay networks, or CPG integration.

      Fig. 4A. Why is the activity of the contralateral ankle flexor later under the intact condition than under the stimulation condition?

      We timed the stimulation to coincide with the contralateral leg lift and did not adjust its onset relative to spontaneous walking in SCI rats. Although stimulation could induce leg lift, as shown in Fig. 4A, SCI rats exhibited a slightly earlier and stronger activation of the right (contralateral) ankle flexor muscle even during spontaneous walking. This phenomenon is attributed to the deficits observed on the left side. The stronger right leg bears the body weight, as illustrated in Fig. 3, and thus, during body advancement, the right leg is engaged sooner and more rapidly (with a shorter swing phase) to provide support (right foot forward).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Although there are many citations acknowledging relevant previous work, there often isn't a very granular attribution of individual previous findings to their sources. In the results section, it's sometimes ambiguous when the paper is recapping established background and when it is breaking new ground. For example, around equation 8 in the results (sv = r - rho*t), it would be good to refer to previous places where versions of this equation have been presented. Offhand, McNamara 1982 (Theoretical Population Biology) is one early instance and Fawcett et al. 2012 (Behavioural Processes) is a later one. Line 922 of the discussion seems to imply this formulation is novel here.

      We would like to clarify that equation 8 of the original manuscript (sv = r - rho*t, as the reviewer writes it), as we derive it, is not new: it is similarly expressed in prior foundational work by McNamara (1982). We thank the reviewer for drawing our attention to the extension of this form by Fawcett, McNamara, and Houston (2012).

      We now properly acknowledge this foundational work and extension in the Results section…

      “This global reward-rate equivalent immediate reward (see Figure 4) is the subjective value of a pursuit, svPursuit (or simply, sv, when the referenced pursuit can be inferred), as similarly expressed in prior foundational work (McNamara 1982), and subsequent extensions (see Fawcett, McNamara, Houston (2012)).”

      …and in the Discussion section at the location referenced by the reviewer:

      “From it, we re-expressed the pursuit’s worth in terms of its global reward rate-equivalent immediate reward, i.e., its ‘subjective value’, reprising McNamara’s foundational formulation (McNamara 1982).”

      (2) The choice environments that are considered in detail in the paper are very simple. The simplicity facilitates concrete examples and visualizations, but it would be worth further consideration of whether and how the conclusions generalize to more complex environments. The paper considers a "forgo" scenario in which the agent can choose between sequences of pursuits like A-B-A-B (engaging with option B at all opportunities, which are interleaved with a default pursuit A) and A-A-A-A (forgoing option B). It considers "choice" scenarios where the agent can choose between sequences like A-B-A-B and A-C-A-C (where B and C are larger-later and smaller-sooner rewards, either of which can be interleaved with the default pursuit). Several forms of additional complexity would be valuable to consider. [A] One would be a greater number of unique pursuits, not repeated identically in a predictable sequence, akin to a prey-selection paradigm. It seems to me this would cause t_out and r_out (the time and reward outside of the focal prospect) to be policy-dependent, making the 'apportionment cost' more challenging to ascertain. Another relevant form of complexity would be if there were [B] variance or uncertainty in reward magnitudes or temporal durations or if [C] the agent had the ability to discontinue a pursuit such as in patch-departure scenarios.

      A) We would like to note that the section “Deriving Optimal Policy from Forgo Decision-making worlds” addresses the reviewer’s scenario of an n-number of pursuits, each occurring at its own frequency, as in prey selection, not repeating identically in a predictable sequence. Within our subsection “Parceling the world…”, we introduce the concept of dividing a world (such as that) into the considered pursuit type, and everything outside of it. ‘Outside’ would include any number of other pursuits currently part of any policy, as the reviewer intuits, thus making t<sup>out</sup> and r<sup>out</sup> policy-dependent. Nonetheless, a process of excluding (forgoing) pursuits by comparing the ‘in’ to the ‘out’ reward rate (section “Reward-rate optimizing forgo policy…”) or its equivalent sv (section “The forgo decision can also be made from subjective value”) would iteratively lead to the global reward rate maximizing policy. This manner of parceling into ‘in’ and ‘out’ thus simplifies visualization of what can be complex worlds. Simpler cases that resemble common experimental designs are given in the manuscript to enhance intuition.

      We thank the reviewer for this keen suggestion. We now include example figures (Supplemental 1 & 2) for multi-pursuit worlds which have the same (Supplemental 1) and different pursuit frequencies (Supplemental 2), which illustrate how this evaluation leads to reward-rate optimization. This addition demonstrates how an iterative policy would lead to reward rate maximization and emphasizes how parcellating a world into ‘in’ and ‘out’ of the pursuit type applies and is a useful device for understanding the worth of any given pursuit in more complex worlds. The policy achieving the greatest global reward rate can be realized through an iterative process where pursuits with lower reward rates than the reward rate obtained from everything other than the considered pursuit type are sequentially removed from the policy.
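
      The iterative exclusion process described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the manuscript's implementation: the world model (one mandatory default pursuit plus optional pursuit types, each with an average reward, average time, and frequency), all names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pursuit:
    name: str
    reward: float      # average reward per occurrence
    time: float        # average time per occurrence
    freq: float = 1.0  # occurrences per traversal of the world


def reward_rate(policy, default):
    """Global reward rate of a world made of the default pursuit plus `policy`."""
    total_reward = default.reward * default.freq + sum(p.reward * p.freq for p in policy)
    total_time = default.time * default.freq + sum(p.time * p.freq for p in policy)
    return total_reward / total_time


def iterative_forgo(options, default):
    """Repeatedly forgo the pursuit whose 'in' rate is furthest below its 'out' rate."""
    policy = list(options)
    while policy:
        candidates = []
        for p in policy:
            others = [q for q in policy if q is not p]
            out_rate = reward_rate(others, default)  # everything outside pursuit p
            in_rate = p.reward / p.time              # pursuit p on its own
            if in_rate < out_rate:
                candidates.append((in_rate, p))
        if not candidates:
            break  # every remaining pursuit beats its own 'outside'
        worst = min(candidates, key=lambda pair: pair[0])[1]
        policy.remove(worst)
    return policy


# Hypothetical world: a default pursuit with no reward and three optional types.
default = Pursuit("default", reward=0.0, time=10.0)
options = [
    Pursuit("A", reward=4.0, time=2.0),
    Pursuit("B", reward=3.0, time=6.0),
    Pursuit("C", reward=1.0, time=8.0),
]

best = iterative_forgo(options, default)
print([p.name for p in best], round(reward_rate(best, default), 4))
```

      In this toy world only pursuit C is forgone, and the resulting policy has a higher global reward rate than accepting all three pursuit types.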

      B) We would also emphasize that the formulation here contends with variance or uncertainty in the reward magnitudes or temporal durations. The ‘in’ pursuit is the average reward and the average time of the considered pursuit type, as is the ‘out’ the average reward and average time outside of the considered pursuit type.

      C) In this work, we consider the worth of initiating one-or-another pursuit (from having completed a prior one), and not the issue of continuing within a pursuit (having already engaged it), as in patch/give-up. Handling worlds in which the agent may depart from within a pursuit, which is to say ‘give-up’ (as in patch foraging), is outside the scope of this work.

      (3) I had a hard time arriving at a solid conceptual understanding of the 'apportionment cost' around Figure 5. I understand the arithmetic, but it would help if it were possible to formulate a more succinct verbal description of what makes the apportionment cost a useful and meaningful quality to focus on.

      We thank the reviewer for pressing for a succinct and intuitive verbal description.

      We added the following succinct verbal description of apportionment cost… “Apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration.” This definition appears in new paragraphs (as below) describing apportionment cost in the results section “Time’s cost: opportunity & apportionment costs determine a pursuit’s subjective value”, and is accompanied by equations for apportionment cost, and a figure giving its geometric depiction (Figure 5). We also expanded original figure 5 and its legend (so as to illustrate the apportionment scaling factor and the apportionment cost), and its accompanying main text, to further illustrate and clarify apportionment cost, and its relationship to opportunity cost, and time’s cost.

      “What, then, is the amount of reward by which the opportunity cost-subtracted reward is scaled down to equal the sv of the pursuit? This amount is the apportionment cost of time. The apportionment cost of time (height of the brown vertical bar, Figure 5F) is the global reward rate after taking into account the opportunity cost (slope of the magenta-gold dashed line in Figure 5F) times the time of the considered pursuit. Equally, the difference between the inside and outside reward rates, times the time of the pursuit, is the apportionment cost when scaled by the pursuit’s weight, i.e., the fraction that the considered pursuit is to the total time to traverse the world (Equation 9, right hand side). From the perspective of decision-making policies, apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration (Equation 9 center, Figure 5F).

      Equation 9. Apportionment Cost.

      While this difference is the apportionment cost of time, the opportunity cost of time is the amount that would be expected from a policy of not taking the considered pursuit over a time equal to the considered pursuit’s duration. Together, they sum to Time’s Cost (Figure 5G). Expressing a pursuit’s worth in terms of the global reward rate obtained under a policy of accepting the pursuit type (Figure 5 left column), or from the perspective of the outside reward and time (Figure 5 right column), are equivalent. However, the latter expresses sv in terms that are independent of one another, conveys the constituents giving rise to global reward rate, and provides the added insight that time’s cost comprises an apportionment as well as an opportunity cost.”
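      The relationships stated above can be checked numerically. The following is a minimal sketch, not taken from the manuscript: the function name `time_costs` and the example values are illustrative assumptions, and the formulas are transcribed from the verbal definitions given here (opportunity cost as the outside reward rate times the pursuit's time, apportionment cost as the difference between the global and outside reward rates times the pursuit's time).

```python
import math

def time_costs(r_in, t_in, r_out, t_out):
    """Return (opportunity_cost, apportionment_cost, sv) for one pursuit."""
    rho_out = r_out / t_out                      # outside reward rate
    rho_g = (r_in + r_out) / (t_in + t_out)      # global rate when the pursuit is taken
    opportunity = rho_out * t_in                 # expected from NOT taking, over t_in
    apportionment = (rho_g - rho_out) * t_in     # taking minus not taking, over t_in
    sv = r_in - (opportunity + apportionment)    # reward less time's cost
    return opportunity, apportionment, sv

# Hypothetical example values, chosen only for illustration
opp, app, sv = time_costs(r_in=4.0, t_in=2.0, r_out=1.0, t_out=8.0)

# Same apportionment cost via the pursuit's weight, t_in / (t_in + t_out)
weight_in = 2.0 / (2.0 + 8.0)
assert math.isclose(app, (4.0 / 2.0 - 1.0 / 8.0) * 2.0 * weight_in)

# Same sv via opportunity-cost subtraction followed by apportionment scaling
assert math.isclose(sv, (4.0 - opp) * 8.0 / (8.0 + 2.0))
```

      Under these assumed values the two costs sum to the global reward rate times the pursuit's time, which is the sense in which they jointly constitute time's cost.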

      The above definition of apportionment cost adds to other stated relationships of apportionment cost found throughout the paper (original lines 434,435,447,450).

      I think Figure 6C relates to this, but I had difficulty relating the axis labels to the points, lines, and patterned regions in the plot.

      We thank the reviewer for pointing out that this figure can be made to be more easily understood.

      We have done so by breaking its key features over a greater number of plots so that no single panel is overloaded. We have also changed text in the legend to clarify how apportionment and opportunity costs add to constitute time’s cost, and also correspondingly in the main text.

      I also was a bit confused by how the mathematical formulation was presented. As I understood it, the apportionment cost essentially involves scaling the rest of the SV expression by t<sup>out</sup>/(t<sup>in</sup> + t<sup>out</sup>).

      The reviewer’s understanding is correct: the amount of reward of the pursuit that remains after subtracting the opportunity cost, when so scaled, is equivalent to the subjective value of that pursuit. The amount by which that scaling decreases the rest of the SV expression is equal to the apportionment cost of time.

      The way this scaling factor is written in Figure 5C, as 1/(1 + (1/t<sup>out</sup>) t<sup>in</sup>), seems less clear than it could be.

      To be sure, we present the formula in original Figure 5C in this manner to emphasize the opportunity cost subtraction as separable from the apportionment rescaling, expressing the opportunity cost subtraction and the apportionment scaling component of the equation as their own terms in parentheses.

      But we understand the reviewer to be referring to the manner in which we chose to express the scaling term. We presented it this way in the original manuscript (rather than in the more elegant form recognized by the reviewer) to make a direct connection to the temporal discounting literature. In this literature, discounting commonly takes the same mathematical form as our apportionment cost scaling, but whereas the steepness of discounting in this literature is controlled by a free fit parameter, k, we show that for a reward-rate-maximizing agent, the equivalent k term isn’t a free fit parameter, but rather is the reciprocal of the time spent outside the considered pursuit type.

      We take the reviewer’s advice to heart, and now first express subjective value in the format that emphasizes opportunity cost subtraction followed by an apportionment downscaling, identifying the apportionment scaling term, t<sup>out</sup>/(t<sup>out</sup> + t<sup>in</sup>), i.e., the outside weight. Figure 5 now shows the geometric representation of apportionment scaling and apportionment cost. Only subsequently, in the discounting function section of the revised manuscript, do we rearrange this subjective value expression to resemble the standard discounting function form.
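      The equivalence between the outside weight and the standard hyperbolic discounting form can be illustrated with a short sketch. The function names below are illustrative assumptions; the only claim carried over from the text is that the equivalent k term is the reciprocal of the time spent outside the considered pursuit type.

```python
import math

def outside_weight(t_in, t_out):
    """Apportionment scaling term written as t_out / (t_out + t_in)."""
    return t_out / (t_out + t_in)

def hyperbolic_discount(t_in, k):
    """Standard hyperbolic discounting form, 1 / (1 + k * t_in)."""
    return 1.0 / (1.0 + k * t_in)

# With k fixed at 1 / t_out, the two forms agree for every pursuit time tested
t_out = 8.0
for t_in in (0.0, 0.5, 2.0, 10.0, 100.0):
    assert math.isclose(outside_weight(t_in, t_out),
                        hyperbolic_discount(t_in, k=1.0 / t_out))
```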

      Also, the apportionment cost is described in the text as being subtracted from sv rather than as a multiplicative scaling factor.

      What we describe in the original text is how apportionment cost is a component of time’s cost, and how sv is the reward less time’s cost. It would be correct to say that apportionment cost and opportunity cost are subtracted from the pursuit’s reward to yield the subjective value of the pursuit. This is what we show in the original Figure 5D graphically. Original Figure 5 and accompanying formulas at its bottom show the equivalence of expressing sv in terms of subtracting time’s cost as calculated from the global reward rate under a policy of accepting the considered pursuit, or, of subtracting opportunity cost and then scaling the opportunity cost subtracted reward by the apportionment scaling term, thereby accounting for the apportionment cost of time.

      The revision of original figure 5, its figure legend, and accompanying text now make clear the meaning of apportionment cost, how it can be considered a subtraction from the reward of a pursuit, or, equivalently, how it can be thought of as the result of scaling down of opportunity cost subtracted reward.

      It could be written as a subtraction, by subtracting a second copy of the rest of the SV expression scaled by t_in/(t_in + t_out). But that shows the apportionment cost to depend on the opportunity cost, which is odd because the original motivation on line 404 was to resolve the lack of independence between terms in the SV expression.

      On line 404 of the original manuscript, we point out that the simple equation―which is a reprise of McNamara’s insight―is problematic in that its terms on the RHS are not independent: the global reward rate is dependent on the considered pursuit’s reward (see Figure 5B). The alternative expression for subjective value that we derive expresses sv in terms that are all independent of one another. We may have unintentionally obscured that fact by having already defined rho<sup>in</sup> as r<sup>in</sup>/t<sup>in</sup> and rho<sup>out</sup> as r<sup>out</sup>/t<sup>out</sup> on lines 306 and 307.

      Therefore, in the revision, Ap 8 is expressed so as to keep clear that it uses terms that are all independent of one another, and only subsequently do we express this formula with the simplifying substitution, rho<sup>out</sup>.

      That all said, we understand the reviewer’s point to be that the parenthetical terms relating the opportunity cost and the apportionment rescaling both contain within them the parameter t<sup>out</sup>, and in this way these concepts we put forward to understand the alternative equation are non-independent. That is correct, but it isn’t at odds with our objective to express SV in terms that are independent of one another (which we do). Our motivation in introducing these concepts is to provide insight and intuition into the cost of time (especially now with a clear and simple definition of apportionment cost stated). We go to lengths to demonstrate their relationship to each other.

      (4) In the analysis of discounting functions (line 664 and beyond), the paper doesn't say much about the fact that many discounting studies take specific measures to distinguish true time preferences from opportunity costs and reward-rate maximization.

      We understand the reviewer’s comment to mean that temporal decision-making worlds in which delay time does not preclude reward from outside the current pursuit are a means to distinguish time preference from the impact of opportunity cost. One contribution of this work is to demonstrate that, from a reward-rate maximization framework, an accounting of opportunity cost is not sufficient to understand apparent time preferences as distinguishable from reward-rate maximization. The apportionment cost of time must also be considered to have a full appreciation of the cost of time. For instance, let us consider a temporal decision-making world in which there is no reward received outside the considered pursuit. In such a world, there is no opportunity cost of time, so apparent temporal discounting functions would appear as if purely hyperbolic as a consequence of the apportionment cost of time alone. Time preference, as revealed experimentally by the choices made between a SS and a LL reward, then seems confounding, as preference can reverse from a SS to a LL option as the displacement of those options (maintaining their difference in time) increases (Green, Fristoe, and Myerson 1994; Kirby and Herrnstein 1995). While this shift, the so-called “Delay effect”, could potentially arise as a consequence of some inherent time preference bias of an agent, we demonstrate that a reward-rate maximal agent exhibits hyperbolic discounting, and therefore it would also exhibit the Delay effect, even though it has no time preference.

      In the revision we now make reference to the Delay Effect (in abstract, results new section “The Delay Effect” with new figure 14, and in the discussion), which is taken as evidence of time preference in human and animal literature, and note explicitly how a reward-rate maximizing agent would also exhibit this behavior as a consequence of apparent hyperbolic discounting.
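      The Delay Effect in a reward-rate-maximizing agent can be illustrated with a small sketch. It assumes, for illustration only, that the added displacement simply lengthens each option's pursuit time, that no reward is received outside the pursuit, and that the option with the larger subjective value is chosen; the function names and the SS/LL values are hypothetical and not taken from the manuscript.

```python
def subjective_value(r_in, t_in, r_out, t_out):
    """Opportunity-cost subtraction followed by apportionment scaling."""
    rho_out = r_out / t_out
    return (r_in - rho_out * t_in) * t_out / (t_out + t_in)

def preferred(displacement, ss=(2.0, 1.0), ll=(5.0, 6.0), r_out=0.0, t_out=2.0):
    """Return 'SS' or 'LL' for (reward, delay) options shifted by a common displacement."""
    sv_ss = subjective_value(ss[0], ss[1] + displacement, r_out, t_out)
    sv_ll = subjective_value(ll[0], ll[1] + displacement, r_out, t_out)
    return "SS" if sv_ss > sv_ll else "LL"

# The same pair of options, shifted later in time by a common displacement
near_choice = preferred(displacement=0.0)
far_choice = preferred(displacement=10.0)
```

      With these assumed values the smaller-sooner option is chosen at zero displacement and the larger-later option at a displacement of 10, even though the agent applies the same reward-rate-maximizing valuation in both cases.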

      In many of the human studies, delay time doesn't preclude other activities.

      Our framework is generalizable to worlds in which being in pursuit does not preclude an agent from receiving reward during that time at the outside reward rate. Original Ap 13 solves for such a condition, and shows that in this context, the opportunity cost of time drops out of the SV equation, leaving only the consequences of the apportionment cost of time. We made reference to this case on lines 1032-1034 of the original manuscript: “In this way, such hyperbolic discounting models [models that do not make an accounting of opportunity cost] are only appropriate in worlds with no “outside” reward, or, where being in a pursuit does not exclude the agent from receiving rewards at the rate that occurs outside of it (Ap. 13).”

      The note and reference are fleeting in the original work. We take the reviewer’s suggestion and now add paragraphs in the discussion on the difference between humans and animals in apparent discounting, making specific note of human studies in which delay time doesn’t preclude receiving outside reward while engaged in a pursuit. Relatedly, hyperbolic discounting is often considered to be less steep in humans than in animals. As the reviewer points out, these assessments are frequently made under conditions in which being in a pursuit does not preclude receiving reward from outside the pursuit. When humans are tested under conditions in which outside rewards are precluded, they exhibit far steeper discounting. We now include citation to that observation (Jimura et al. 2009). We handle such conditions in original Ap 13, and show how, in such worlds, the opportunity cost of time drops out of the equation. The consequence of this is that the apparent discounting function would become less steep (the agent would appear as if more patient), consistent with reports.

      “Relating to the treatment of opportunity cost, we also note that many investigations into temporal discounting do not make an explicit distinction between situations in which 1) subjects continue to receive the usual rewards from the environment during the delay to a chosen pursuit, and 2) situations in which during a chosen pursuit’s delay no other rewards or opportunities will occur (Kable & Glimcher, 2007; Kirby & Maraković, 1996; McClure, Laibson, Loewenstein, & Cohen, 2004). Commonly, human subjects are asked to answer questions about their preferences between options for amounts they will not actually earn after delays they will not actually have to wait, during which it is unclear whether they are really investing time away from other options or not (Rosati et al., 2007). In contrast, in most animal experiments, subjects actually receive reward after different delays during which they do not receive new options or rewards. By our formulation, when a pursuit does not exclude the agent from receiving rewards at the rate that occurs outside, the opportunity cost of time drops out of the subjective value equation (Ap 12).

      Equation 10. The value of initiating a pursuit when pursuit does not exclude receiving rewards at the outside rate (Ap 12)

      Therefore, the reward-rate maximizing discounting function in these worlds is functionally equivalent to the situation in which the outside reward rate is zero, and will―lacking an opportunity cost―be less steep. This rationalizes why human discounting functions are often reported to be longer (gentler) than animal discounting functions: they are typically tested in conditions that negate opportunity cost, whereas animals are typically tested in conditions that enforce opportunity costs. Indeed, when humans are made to wait for actually received reward, their observed discounting functions are much steeper (Jimura et al. 2009).”
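      The difference in steepness can be sketched numerically. The non-exclusive form below is an assumption that follows the statement above that such worlds are functionally equivalent to an outside reward rate of zero; it is not a transcription of Ap 12, and the function names and values are illustrative.

```python
def sv_exclusive(r_in, t_in, r_out, t_out):
    """Pursuit excludes outside reward: opportunity cost is subtracted, then scaled."""
    rho_out = r_out / t_out
    return (r_in - rho_out * t_in) * t_out / (t_out + t_in)

def sv_non_exclusive(r_in, t_in, t_out):
    """Assumed form when outside reward continues during the pursuit:
    the opportunity cost drops out and only the apportionment scaling remains."""
    return r_in * t_out / (t_out + t_in)

# Hypothetical world: reward 4 in the pursuit, reward 1 over 8 time units outside
for delay in (1.0, 2.0, 4.0, 8.0):
    steep = sv_exclusive(4.0, delay, 1.0, 8.0)
    gentle = sv_non_exclusive(4.0, delay, 8.0)
    # Without an opportunity cost, value falls off more slowly with delay
    assert gentle > steep
```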

      In animal studies, rate maximization can serve as a baseline against which to measure additional effects of temporal discounting. This is an important caveat to claims about discounting anomalies being rational under rate maximization (e.g., line 1024).

      We agree that the purpose of this reward-rate maximizing framework is to serve as a point of comparison in which effects of temporal intervals and rewards that define the environment can be analyzed to better understand the manner in which animals and humans deviate from this ideal behavior. Our interest in this work is in part motivated by a desire to have a deeper understanding of what “true” time preference means. Using the reward-rate maximizing framework here provides a means to speak about time preferences (i.e., biases) in terms of deviation from optimality. From this perspective, a reward-rate maximal agent doesn’t exhibit time preference: its actions are guided solely by reward-rate optimizing valuation. Therefore, one contribution of this work is to show that purported signs of time preference (hyperbolic discounting, magnitude, sign, and (now) delay effect) can be explained without invoking time preference. Whatever deviations from optimality remain following a proper accounting of reward-rate maximizing behavior should then, and only then, be considered through the lens of time preference (bias).

      (5) The paper doesn't feature any very concrete engagement with empirical data sets. This is ok for a theoretical paper, but some of the characterizations of empirical results that the model aims to match seem oversimplified. An example is the contention that real decision-makers are optimal in accept/reject decisions (line 816 and elsewhere). This isn't always true; sometimes there is evidence of overharvesting, for example.

      We would like to note that the scope of this paper is limited to examining the value of initiating a pursuit, rather than the value of continuing within a pursuit. The issue of continuing within a pursuit constitutes a third fundamental topology, which could be called give-up or patch-foraging, and is complex and warrants its own paper. In Give-up topologies, which are distinct from Forgo and Choice topologies, the reviewer is correct in pointing out that the preponderance of evidence demonstrates that animals and humans behave as if overpatient, adopting a policy of investing more time within a pursuit than is warranted. In Forgo instances, however, the evidence supports near optimality.

      (6) Related to the point above, it would be helpful to discuss more concretely how some of this paper's theoretical proposals could be empirically evaluated in the future. Regarding the magnitude and sign effects of discounting, there is not a very thorough overview of the several other explanations that have been proposed in the literature. It would be helpful to engage more deeply with previous proposals and consider how the present hypothesis might make unique predictions and could be evaluated against them.

      We appreciate the reviewer’s point that there are many existing explanations for these various ‘anomalous’ effects. We hold that the point of this work is to demonstrate that these effects are consistent with a reward-rate maximizing framework and so do not require additional assumptions, like separate processes for small and large rewards, or the inclusion of a utility function.

      Nonetheless, there is a diversity of explanations for the sign and magnitude effects and (now with its explicit inclusion in the revision) the delay effect. Therefore, we now also include reference to additional work which proffers alternative explanations for the sign and magnitude effects (as reviewed by Kalenscher and Pennartz 2008; Frederick et al. 2002), as well as a scalar timing account of non-stationary time preference (Gibbon, 1977).

      With respect to predictions, this framework makes the following regarding the magnitude, sign, and (now in the revision) delay effects. In the Discussion, Magnitude effect subsection: “The Magnitude Effect should be observed, experimentally, to diminish when 1) increasing the outside time while holding the outside reward constant, (thus decreasing the outside reward rate), or when 2) decreasing the outside reward while holding the outside time constant (thus decreasing the outside reward rate). However, 3) the Magnitude Effect would exaggerate as the outside time increased while holding the outside reward rate constant.” In the Sign effect subsection: “…we then also predict that the size of the Sign effect would diminish as the outside reward rate decreases (and as the outside time increases), and in fact would invert should the outside reward rate turn negative (become net punishing), such that punishments would appear to discount more steeply than rewards.” In the Delay effect subsection: “...a sign of irrationality is that a preference reversal occurs at delays greater than what a reward-rate-maximizing agent would exhibit.”

      A similar point applies to the 'malapportionment hypothesis' although in this case there is a very helpful section on comparisons to prior models (line 1163). The idea being proposed here seems to have a lot in common conceptually with Blanchard et al. 2013, so it would be worth saying more about how data could be used to test or reconcile these proposals.

      We thank the reviewer for finding the section on model comparisons very helpful. We believe the text previously dedicated to this issue to be sufficient in this regard. We have, however, added substantively to the Malapportionment Hypothesis section (Discussion) and its accompanying figure, to make explicit a number of predictions from the Malapportionment Hypothesis as it relates to Hyperbolic discounting, the Delay Effect, and the Sign and Magnitude Effects.

      Reviewer #1 Recommendations

      (1) As a general note about the figures, it would be helpful to specify, either graphically or in the caption, what fixed values of reward sizes and time intervals are being assumed for each illustration.

      Thank you for the suggestion. We attempted to keep graphs as uncluttered as possible, but agree that for original figures 4, 5, 16, and 17, which didn’t have numbered axes, we should provide the amounts in the captions of the revised figures (4, 5, and now 17, 18). These figures did not have numerics because their shapes and display are meant to illustrate the form of the relationship between vectors, being general to the values they may take.

      We now include in the captions for these figures the parameter amounts used.

      (2) Should Equation 2 have t in the denominator instead of r?

      Indeed. We thank the reviewer for catching this typographical error.

      We have corrected it in the revision.

      (3) General recommendation:

      My view is that in order for the paper's eLife assessment to improve, it would be necessary to resolve points 1 through 4 listed under "weaknesses" in my public review, which pertain to clarity and acknowledgement of prior work. I think a lot hinges on whether the authors can respond to point #3 by making a more compelling case for the usefulness and generality of the 'apportionment cost' concept, since that idea is central to the paper's contribution.

      We believe these critical points (1-4) for improving the paper have now been addressed to the reviewer’s satisfaction.

      Reviewer #2 (Public review):

      While the details of the paper are compelling, the authors' presentation of their results is often unclear or incomplete:

      (1) The mathematical details of the paper are correct but contain numerous notation errors and are presented as a solid block of subtle equation manipulations. This makes the details of the authors' approach (the main contribution of the paper to the field) highly difficult to understand.

      We thank the reviewers for having detected typographical errors regarding three equations. They have been corrected. The first typographical error in the original main text (line 277) concerns Equation 2, which has been corrected so that it appears as

      The second typo concerns the definition of the considered pursuit’s reward rate, which appears in the original main text (line 306), and has been corrected to appear as rho<sup>in</sup> = r<sup>in</sup>/t<sup>in</sup>.

      The third typographical error occurred in conversion from Google Sheets to Microsoft Word appearing in the original main text (line 703) and regards the subjective value expression when no reward is received in an intertrial interval (ITI). It has been corrected to appear as

      (2) One of the main contributions of the paper is the notion that time’s cost in decision-making contains an apportionment cost that reflects the allocation of decision time relative to the world. The authors use this cost to pose a hypothesis as to why subjects exhibit sub-optimal behavior in choice decisions. However, the equation for the apportionment cost is never clearly defined in the paper, which is a significant oversight that hampers the effectiveness of the authors' claims.

      We thank the reviewer for pressing on this critical point. Reviewers commonly identified a need to provide a concise and intuitive definition of apportionment cost, and to explicitly solve and provide for its mathematical expression.

      We added the following succinct verbal description of apportionment cost… “Apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration.” This definition appears in new paragraphs (as below) describing apportionment cost in the results section “Time’s cost: opportunity & apportionment costs determine a pursuit’s subjective value”, and is accompanied by equations for apportionment cost, and a figure giving its geometric depiction (Figure 5). We also expanded original figure 5 and its legend (so as to illustrate the apportionment scaling factor and the apportionment cost), and its accompanying main text, to further illustrate and clarify apportionment cost, and its relationship to opportunity cost, and time’s cost.

      “What, then, is the amount of reward by which the opportunity cost-subtracted reward is scaled down to equal the sv of the pursuit? This amount is the apportionment cost of time. The apportionment cost of time (height of the brown vertical bar, Figure 5F) is the global reward rate after taking into account the opportunity cost (slope of the magenta-gold dashed line in Figure 5F) times the time of the considered pursuit. Equally, the difference between the inside and outside reward rates, times the time of the pursuit, is the apportionment cost when scaled by the pursuit’s weight, i.e., the fraction that the considered pursuit is to the total time to traverse the world (Equation 9, right hand side). From the perspective of decision-making policies, apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration (Equation 9 center, Figure 5F).

      Equation 9. Apportionment Cost.

      While this difference is the apportionment cost of time, the opportunity cost of time is the amount that would be expected from a policy of not taking the considered pursuit over a time equal to the considered pursuit’s duration. Together, they sum to Time’s Cost (Figure 5G). Expressing a pursuit’s worth in terms of the global reward rate obtained under a policy of accepting the pursuit type (Figure 5 left column), or from the perspective of the outside reward and time (Figure 5 right column), are equivalent. However, the latter expresses sv in terms that are independent of one another, conveys the constituents giving rise to global reward rate, and provides the added insight that time’s cost comprises an apportionment as well as an opportunity cost.”

      (3) Many of the paper's figures are visually busy and not clearly detailed in the captions (for example, Figures 6-8). Because of the geometric nature of the authors' approach, the figures should be as clean and intuitive as possible, as in their current state, they undercut the utility of a geometric argument.

      We endeavored to make our figures as simple as possible. In the revision, we have made changes to the figures that we believe improve their clarity. These include: 1) breaking some figures into more panels when more than one concept was being introduced (such as in revised Figures 5, 6, 7, and 8), 2) using the left hand y axis for the outside reward, and the right hand axis for the inside reward when plotting the “in” and “outside” reward, and indicating their respective numerics (which run in opposite directions), 3) adding a legend to the figures themselves where needed (revised figures 10, 11, 12, 14), 4) adding the values used to the figure captions, where needed, and 5) ensuring all symbols are indicated in legends.

      (4) The authors motivate their work by focusing on previously-observed behavior in decision experiments and tell the reader that their model is able to qualitatively replicate this data. This claim would be significantly strengthened by the inclusion of experimental data to directly compare to their model's behavior. Given the computational focus of the paper, I do not believe the authors need to conduct their own experiments to obtain this data; reproducing previously accepted data from the papers the authors' reference would be sufficient.

      Our objective was not to fit experimentally observed data, as is commonly the goal of implementation/computational models. Rather, as a theory, our objective is to rationalize the broad, curious, and well-established pattern of temporal decision-making behaviors under a deeper understanding of reward-rate maximization, and from that understanding, identify the nature of the error being committed by whatever learning algorithm and representational architecture is actually being used by humans and animals. In doing so, we make a number of important contributions. By identifying and analyzing reward-rate-maximizing equations, we 1) provide insight into what composes time’s cost and how the temporal structure of the world in which it is embedded (its ‘context’) impacts the value of a pursuit, 2) rationalize a diverse assortment of temporal decision-making behaviors (e.g., Hyperbolic discounting, the Magnitude Effect, the Sign Effect, and the Delay effect), explaining them with no assumed free-fit parameter, and then, by analyzing error in parameters enabling reward-rate maximization, 3) identify the likely source of error and propose the Malapportionment Hypothesis. The Malapportionment Hypothesis identifies the underweighting of a considered pursuit’s “outside”, and not error in pursuit’s reward rates, as the source of error committed by humans and animals. It explains why animals and humans can present as suboptimally ‘impatient’ in Choice, but as optimal in Forgo. At the same time, it concords with numerous and diverse observations in decision making regarding whether to initiate a pursuit. The nature of this error also, then, makes numerous predictions. These insights inform future computational and experimental work by providing strong constraints on the nature of the algorithm and representational architecture used to learn and represent the values of pursuits. Rigorous test of the Malapportionment Hypothesis will require wholly new experiments.

      In the revision, we also now emphasize and add predictions of the Malapportionment Hypothesis, and have updated its figure (Figure 21), its legend, and its paragraphs in the discussion.

      “We term this reckoning of the source of error committed by animals and humans the Malapportionment Hypothesis, which identifies the underweighting of the time spent outside versus inside a considered pursuit but not the misestimation of pursuit rates, as the source of error committed by animals and humans (Figure 21). This hypothesis therefore captures previously published behavioral observations (Figure 21A) showing that animals can make decisions to take or forgo reward options that optimize reward accumulation (Krebs et al., 1977; Stephens and Krebs, 1986; Blanchard and Hayden, 2014), but make suboptimal decisions when presented with simultaneous and mutually exclusive choices between rewards of different delays (Logue et al., 1985; Blanchard and Hayden, 2015; Carter and Redish, 2016; Kane et al., 2019). The Malapportionment Hypothesis further predicts that apparent discounting functions will present with greater curvature than what a reward-rate-maximizing agent would exhibit (Figure 21B). While experimentally observed temporal discounting would have greater curvature, the Malapportionment Hypothesis also predicts that the Magnitude (Figure 21C) and Sign effect (Figure 21D) would be less pronounced than what a reward-rate-maximizing agent would exhibit, with these effects becoming less pronounced the greater the underweighting. Finally, with regards to the Delay Effect (Figure 21E), the Malapportionment Hypothesis predicts that preference reversal would occur at delays greater than that exhibited by a reward-rate-maximizing agent, with the delay becoming more pronounced the greater the underweighting outside versus inside the considered pursuit by the agent.”

      (5) While the authors reference a good portion of the decision-making literature in their paper, they largely ignore the evidence-accumulation portion of the literature, which has been discussing time-based discounting functions for some years. Several papers that are both experimentally (Cisek et al. 2009, Thura et al. 2012, Holmes et al. 2016) and theoretically (Drugowitsch et al. 2012, Tajima et al. 2019, Barendregt et al. 2022) driven exist, and I would encourage the authors to discuss how their results relate to those in different areas of the field.

      In this manuscript, we consider the worth of initiating one or another pursuit having completed a prior one, and not the issue of continuing within a pursuit having already engaged in it. The worth of continuing a pursuit, as in patch-foraging/give-up tasks, constitutes a third fundamental time decision-making topology which is outside the scope of the current work. It engages a large and important literature, encompassing evidence accumulation, and requires a paper on the value of continuing a pursuit in temporal decision making, in its own right, that can use the concepts and framework developed here. The excellent works suggested by the reviewer will be most relevant to that future work concerning patch-foraging/give-up topologies.

      Reviewer #2 Recommendations:

      (1) In Equation 1, the term rho_d is referred to as the reward rate of the default pursuit, when it should be the reward of the default pursuit.

      Equation 1 is formulated to calculate the average reward received and average time spent per unit time spent in the default pursuit. Thus, f<sub>i</sub> is the encounter rate of pursuit i for one unit of time spent in the default pursuit (lines 259-262). Added to the summation in the numerator is the average reward obtained in the default pursuit per unit time (ρ<sub>d</sub>), and added to the summation in the denominator is the time spent in the default pursuit per unit time (1).

      We have added clarifying text to Appendix 1 to convey the meaning of the equation, and thank the reviewer for pointing out this need.
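
      A minimal sketch of this bookkeeping, assuming Equation 1 takes the form ρ<sub>g</sub> = (ρ<sub>d</sub> + Σ f<sub>i</sub>r<sub>i</sub>) / (1 + Σ f<sub>i</sub>t<sub>i</sub>), as the description above suggests. The function name, argument names, and numerical values are hypothetical and serve only as illustration. Note that ρ<sub>d</sub> enters the numerator as a reward because it is multiplied by one unit of default time, so the units of the two numerator terms agree.

```python
def global_reward_rate(rho_d, f, r, t):
    """Average reward divided by average time, both per unit time in the default pursuit.

    rho_d : reward rate of the default pursuit (reward per unit of default time)
    f[i]  : encounter rate of pursuit i per unit time spent in the default pursuit
    r[i]  : reward of pursuit i
    t[i]  : duration of pursuit i
    """
    # Reward: default reward over one unit of time (rho_d * 1) plus expected pursuit rewards.
    reward = rho_d * 1.0 + sum(fi * ri for fi, ri in zip(f, r))
    # Time: one unit of default time plus expected time spent in pursuits.
    time = 1.0 + sum(fi * ti for fi, ti in zip(f, t))
    return reward / time


# Hypothetical world with two pursuits reachable from the default pursuit.
rate = global_reward_rate(rho_d=0.1, f=[0.2, 0.05], r=[4.0, 10.0], t=[2.0, 6.0])
print(round(rate, 4))
```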

      (2) The notation for "in" and "out" of a considered pursuit type begins as being used to describe the contribution from a single pursuit (without inter-trial interval) towards global reward rate and the contribution of all other factors (other possible pursuits and inter-trial interval) towards global reward rate, respectively, but is then used to describe the pursuit's contribution and the inter-trial interval's contribution, respectively, to the global reward rate. This should be cleaned up to be consistent throughout, or at the very least, it should be addressed when this special case is considered the default.

      As understood by the reviewer, “in” and “out” of the considered pursuit type describes the general form by which a world can be cleaved into these two parts: the average time and reward received outside of the considered pursuit type for the average time and reward received within that pursuit type. A specific, simple, and common experimental instance would be a world composed of one or another pursuit and an intertrial interval.

      We now make clear that a world composed of a considered pursuit and an intertrial interval is but one special case. In example cases where t<sup>out</sup> represents the special case of an intertrial interval, this is now stated clearly. For instance, we do so when discussing how a purely hyperbolic discounting function would apply in worlds in which no reward is received in t<sup>out</sup>, stating that this is often the case in experimental designs where t<sup>out</sup> represents an intertrial interval with no reward. Importantly, by newly including in the revision illustrated worlds that have n pursuits that could occur from a default pursuit at 1) equal frequency (Supplemental 1) and 2) differing frequencies (Supplemental 2), we make clearer the generalizability and utility of the t<sup>out</sup>/t<sup>in</sup> concept.

      (3) Figure 5 should make clear the decomposition of time's cost both graphically and functionally. As it stands, the figure does not define the apportionment cost.

      In the revision of original Figure 5, we further decompose the figure to convey, both graphically and mathematically, 1) what the opportunity cost is, 2) what (especially) the apportionment cost is, 3) how time’s cost is composed of them, 4) how the apportionment scaling term scales the opportunity-cost-subtracted reward by time’s allocation to equal the subjective value, and 5) the equivalence between the expression of time’s cost using terms that are not independent of one another and its expression using terms that are independent of one another.

      (4) Figures 6-8 do not clearly define the dots and annuli used in panels B and C.

      We have further decomposed Figures 6-8 so that the functional forms of opportunity cost, apportionment cost, and time’s cost, and how they interrelate as outside reward and outside time change, can be more clearly appreciated. The symbols used are now clearly identified in the corresponding legends.

      (5) The meaning of a negative subjective value should be specifically stated. Is it the amount a subject would pay to avoid taking the considered pursuit?

      As the reviewer intuits, negative subjective value can be considered the amount an agent ought be willing to pay to avoid taking the considered pursuit.

      We now include the following lines in “The forgo decision can also be made from subjective value” section in reference to negative subjective value…

      “A negative subjective value thus indicates that a policy of taking the considered pursuit would result in a global reward rate that is less than a policy of forgoing the considered pursuit. Equivalently, a negative subjective value can be considered the amount an agent ought be willing to pay to avoid having to take the considered pursuit.”
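
      The link between the sign of subjective value and the forgo decision can be checked numerically. The sketch below assumes that subjective value has the form sv = (r<sub>in</sub> − ρ<sub>out</sub>t<sub>in</sub>) / (1 + t<sub>in</sub>/t<sub>out</sub>), that the global reward rate under a policy of taking the pursuit is (r<sub>in</sub> + r<sub>out</sub>) / (t<sub>in</sub> + t<sub>out</sub>), and that under a policy of forgoing it is ρ<sub>out</sub> = r<sub>out</sub>/t<sub>out</sub>. The function names and numerical values are hypothetical.

```python
def subjective_value(r_in, t_in, r_out, t_out):
    """Assumed form: opportunity-cost-subtracted reward, scaled by time's apportionment."""
    rho_out = r_out / t_out
    return (r_in - rho_out * t_in) / (1.0 + t_in / t_out)


def rate_if_taken(r_in, t_in, r_out, t_out):
    """Global reward rate under a policy of taking the considered pursuit."""
    return (r_in + r_out) / (t_in + t_out)


def rate_if_forgone(r_out, t_out):
    """Global reward rate under a policy of forgoing the considered pursuit."""
    return r_out / t_out


# A poor pursuit (reward 1 over 4 time units) in a world whose outside rate is 0.5.
sv_poor = subjective_value(1.0, 4.0, 5.0, 10.0)
# A rich pursuit (reward 4 over 4 time units) in the same world.
sv_rich = subjective_value(4.0, 4.0, 5.0, 10.0)

for r_in, sv in [(1.0, sv_poor), (4.0, sv_rich)]:
    take = rate_if_taken(r_in, 4.0, 5.0, 10.0)
    forgo = rate_if_forgone(5.0, 10.0)
    # The sign of sv agrees with whether taking beats forgoing.
    print(f"sv={sv:+.3f}  take={take:.3f}  forgo={forgo:.3f}")
```

      In this illustration the pursuit with negative subjective value is exactly the one for which taking lowers the global reward rate below that of forgoing.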

      (6) Why do you define the discounting function as the normalized subjective value? This choice should be justified, via literature citations or a well-described logical argument.

      The reward-magnitude-normalized subjective value-time function is commonly referred to as the temporal discounting function because it permits comparison of the discount rate isolated from differences in reward magnitude and/or sign, a usage deeply rooted in historical precedent. As the reviewer points out, however, the term is overloaded: investigations that do not need to compare the forms of subjective value-time functions tend to refer to those functions as temporal discounting functions as well.

      We make clear in the revised text in the introduction our meaning and use of the term, the justification in doing so, and its historical roots.

      “Historically, temporal decision-making has been examined using a temporal discounting function to describe how delays in rewards influence their valuation. Temporal discounting functions describe the subjective value of an offered reward as a function of when the offered reward is realized. To isolate the form of discount rate from any difference in reward magnitude and sign, subjective value is commonly normalized by the reward magnitude when comparing subjective value-time functions (Strotz, 1956, Jimura, 2009). Therefore, we use the convention that temporal discounting functions are the magnitude-normalized subjective value-time function (Strotz, 1956).”

      Special addition. In investigating the historical roots of the discounting function, as prompted by the reviewer, we learned (Grüne-Yanoff 2015) that it was Mazur who simply added the “1+k” in the denominator of the hyperbolic discounting function. Our derivation for the reward-rate optimal agent makes clear why apparent temporal discounting functions ought have this general form.

      Therefore, we add the following to the “Hyperbolic Temporal Discounting Functions” section in the discussion…

      “It was Ainslie (Ainslie, 1975) who first understood that the empirically observed “preference reversals” between SS and LL pursuits could be explained if temporal discounting took on a hyperbolic form, which he initially conjectured to arise simply from the ratio of reward to delay (Grüne-Yanoff 2015). This was problematic, however, on two fronts: 1) as the time nears zero, the value curve goes to infinity, and 2) there is no accommodation of differences observed within and between subjects regarding the steepness of discounting. Mazur (Mazur, 1987) addressed these issues by introducing 1 + k into the denominator, providing for the now standard hyperbolic discounting function, 1/(1 + kt). Introduction of “1” solved the first issue, though “it never became fully clear how to interpret this 1” (Grüne-Yanoff 2015; interviewing Ainslie). Introduction of the free-fit parameter, k, accommodated the variability observed across and within subjects by controlling the curvature of temporal discounting, and has become widely interpreted as a psychological trait, such as patience, or willingness to delay gratification (Frederick et al., 2002).”

      …continuing later in that section to explain why the reward-rate optimal agent would exhibit this general form…

      “Regarding form, our analysis reveals that the apparent discounting function of a reward-rate-maximizing agent is a hyperbolic function…

      …which resembles the standard hyperbolic discounting function, 1/(1 + kt), in the denominator, where k = 1/t<sub>out</sub>. Whereas Mazur introduced 1 + k to t in the denominator to 1) force the function to behave as t approaches zero, and 2) provide a means to accommodate differences observed within and between subjects, our derivation gives cause to the terms 1 and k, their relationship to one another, and to t in the denominator. First, from our derivation, “1” actually signifies taking t<sub>out</sub> amount of time expressed in units of t<sub>out</sub> (t<sub>out</sub>/t<sub>out</sub>=1) and adding it to t<sub>in</sub> amount of time expressed in units of t<sub>out</sub> (i.e., the total time to make a full pass through the world expressed in terms of how the agent apportions its time under a policy of accepting the considered pursuit).”
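
      The correspondence with the standard hyperbolic form can be illustrated with a short numerical check. This sketch assumes a world in which no reward is received outside the considered pursuit, so that the magnitude-normalized subjective value reduces to 1/(1 + t<sub>in</sub>/t<sub>out</sub>), and it assumes the identification k = 1/t<sub>out</sub>. The function names and values are hypothetical.

```python
def discount_reward_rate_maximizer(t_in, t_out):
    """Magnitude-normalized subjective value when no reward is earned outside the pursuit."""
    return 1.0 / (1.0 + t_in / t_out)


def discount_standard_hyperbolic(t, k):
    """Standard hyperbolic discounting function with free-fit parameter k."""
    return 1.0 / (1.0 + k * t)


t_out = 5.0                      # assumed outside time of the world
delays = [0.0, 1.0, 5.0, 20.0]   # durations of the considered pursuit

pairs = [
    (discount_reward_rate_maximizer(t, t_out),
     discount_standard_hyperbolic(t, k=1.0 / t_out))
    for t in delays
]

for t, (optimal, hyperbolic) in zip(delays, pairs):
    # With k = 1/t_out the two curves coincide at every delay.
    print(f"t={t:5.1f}  optimal={optimal:.4f}  hyperbolic={hyperbolic:.4f}")
```

      Under these assumptions, k is not a free parameter but is fixed by the temporal structure of the world: a longer outside time gives a smaller k and a shallower apparent discounting function.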

      Additional Correction. In revising the section, “Hyperbolic Temporal Discounting Functions” in the discussion, we also detected an error in our description of the meaning of suboptimal bias for SS. In the revision, the sentence now reads…

      “More precisely, what is meant by this suboptimal bias for SS is that the switch in preference from LL to SS occurs at an outside reward rate that is lower, and/or an outside time that is greater, than what an optimal agent would exhibit.”

      (7) Figure 15B should have negative axes defined for the pursuit's now negative reward.

      Yes, an excellent point.

      To remove ambiguity regarding the valence of inside and outside reward magnitudes, we have changed all such figures so that the left hand y-axis is used to signify the outside reward magnitude and sign, and so that the right hand y-axis is used to signify the inside reward magnitude and sign.

      With respect to the revision of original 15B, this change now makes clear that the inside reward label and numerics on the right hand side of the graph run from positive (top) to negative (bottom) values so that it can now be understood that the magnitude of the inside reward is negative in this figure (ie, a punishment). The left hand y-axis labeling the outside reward magnitude has numerics that run in the opposite direction, from negative (top) to positive (bottom). In this figure, the outside reward rate is positive whereas the inside reward rate is negative.

      (8) When comparing your discounting function to the TIMERR and Heuristic models, it would be useful to include a schematic plot illustrating the different obtainable behaviors from all models rather than just telling the reader the differences.

      We hold that the descriptions and references are sufficient to address these comparisons.

      (9) I would strongly suggest cleaning up all appendices for notation…

      The typographical errors that have been noted in these reviews have all been corrected. We believe the reviewer to be referring here to the manner that we had cross-referenced Equations in the appendices and main text which can lead to confusion between whether an equation number being referenced is in regard to its occurrence in the main text or its occurrence in the appendices.

      In the revision, we eliminate numbering of equations in the appendices except where an equation that occurs in an appendix is referenced within the main text. In the main text, important equations are numbered sequentially and note the appendix from which they derive. If an equation in an appendix is referenced in the main text, this is noted within the appendix from which it derives.

      …and replacing some of the small equation manipulations with written text describing the goal of each derivation.

      To increase clarity, we have taken the reviewer’s helpful suggestion, adding helper text in the appendices where needed, and have bolded the equations of importance within the appendices (rather than removing the equation manipulations that make the steps of each derivation clear).

      (10) I would suggest moving the table in Appendix 11 to the main text where misestimation is referenced.

      So moved. This appendix now appears in the main text as table 1 “Definitions of misestimating global reward rate-enabling parameters”.

      Reviewer #3 (Public review):

      One broad issue with the paper is readability. Admittedly, this is a complicated analysis involving many equations that are important to grasp to follow the analyses that subsequently build on top of previous analyses.

      But, what's missing is intuitive interpretations behind some of the terms introduced, especially the apportionment cost without referencing the equations in the definition so the reader gets a sense of how the decision-maker thinks of this time cost in contrast with the opportunity cost of time.

      We thank the reviewer for encouraging us to formulate a succinct and intuitive statement as to the nature of apportionment cost.

      We added the following succinct verbal description of apportionment cost… “Apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration.” This definition appears in a new paragraph (as below) describing apportionment cost in the results section “Time’s cost: opportunity & apportionment costs determine a pursuit’s subjective value”, and is accompanied by equations for apportionment cost, and a figure giving its geometric depiction (Figure 5). We also expanded original figure 5 and its legend (so as to illustrate the apportionment scaling factor and the apportionment cost), and its accompanying main text, to further illustrate and clarify apportionment cost, and its relationship to opportunity cost, and time’s cost.

      “What, then, is the amount of reward by which the opportunity cost-subtracted reward is scaled down to equal the sv of the pursuit? This amount is the apportionment cost of time. The apportionment cost of time (height of the brown vertical bar, Figure 5F) is the global reward rate after taking into account the opportunity cost (slope of the magenta-gold dashed line in Figure 5F) times the time of the considered pursuit. Equally, the difference between the inside and outside reward rates, times the time of the pursuit, is the apportionment cost when scaled by the pursuit’s weight, i.e., the fraction that the considered pursuit is to the total time to traverse the world (Equation 9, right hand side). From the perspective of decision-making policies, apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration (Equation 9 center, Figure 5F).

      Equation 9. Apportionment Cost.

      While this difference is the apportionment cost of time, the opportunity cost of time is the amount that would be expected from a policy of not taking the considered pursuit over a time equal to the considered pursuit’s duration. Together, they sum to Time’s Cost (Figure 5G). Expressing a pursuit’s worth in terms of the global reward rate obtained under a policy of accepting the pursuit type (Figure 5 left column), or from the perspective of the outside reward and time (Figure 5 right column), are equivalent. However, the latter expresses sv in terms that are independent of one another, conveys the constituents giving rise to global reward rate, and provides the added insight that time’s cost comprises an apportionment as well as an opportunity cost.”

      The above definition of apportionment cost adds to other stated relationships of apportionment cost found throughout the paper (original lines 434,435,447,450).
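
      The three verbal characterizations of apportionment cost given above can be checked against one another numerically. The sketch below assumes the following readings, which are interpretations rather than quotations of Equation 9: (a) the opportunity-cost-subtracted reward divided by the total time to traverse the world, times the pursuit’s duration; (b) the difference between inside and outside reward rates, times the pursuit’s duration, scaled by the pursuit’s weight t<sub>in</sub>/(t<sub>in</sub> + t<sub>out</sub>); and (c) the difference between the global reward rate under a policy of taking the pursuit and the outside reward rate, times the pursuit’s duration. The function name, dictionary keys, and values are hypothetical.

```python
def time_costs(r_in, t_in, r_out, t_out):
    """Return opportunity cost, three readings of apportionment cost, and subjective value."""
    rho_in = r_in / t_in                      # inside reward rate
    rho_out = r_out / t_out                   # outside reward rate (rate under forgoing)
    rho_g = (r_in + r_out) / (t_in + t_out)   # global reward rate under taking
    weight = t_in / (t_in + t_out)            # fraction of the world's time spent in the pursuit

    opportunity = rho_out * t_in
    apportionment_a = (r_in - opportunity) / (t_in + t_out) * t_in
    apportionment_b = (rho_in - rho_out) * t_in * weight
    apportionment_c = (rho_g - rho_out) * t_in

    times_cost = opportunity + apportionment_a
    return {
        "opportunity": opportunity,
        "apportionment": (apportionment_a, apportionment_b, apportionment_c),
        "times_cost": times_cost,
        "sv": r_in - times_cost,              # reward minus time's cost
    }


result = time_costs(r_in=4.0, t_in=4.0, r_out=5.0, t_out=10.0)
print(result)
```

      For these illustrative values all three readings give the same apportionment cost, and subtracting time’s cost from the reward reproduces the subjective value obtained by scaling the opportunity-cost-subtracted reward by 1/(1 + t<sub>in</sub>/t<sub>out</sub>).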

      Re-analysis of some existing empirical data through the lens of their presented objective functions, especially later when they describe sources of error in behavior.

      Our objective was not to fit experimentally observed data, as is commonly the goal of implementation/computational models. Rather, as a theory, our objective is to rationalize the broad, curious, and well-established pattern of temporal decision-making behaviors under a deeper understanding of reward-rate maximization, and from that understanding, identify the nature of the error being committed by whatever learning algorithm and representational architecture is actually being used by humans and animals. In doing so, we make a number of important contributions. By identifying and analyzing reward-rate-maximizing equations, we 1) provide insight into what composes time’s cost and how the temporal structure of the world in which it is embedded (its ‘context’) impacts the value of a pursuit, 2) rationalize a diverse assortment of temporal decision-making behaviors (e.g., Hyperbolic discounting, the Magnitude Effect, the Sign Effect, and the Delay effect), explaining them with no assumed free-fit parameter, and then, by analyzing error in parameters enabling reward-rate maximization, 3) identify the likely source of error and propose the Malapportionment Hypothesis. The Malapportionment Hypothesis identifies the underweighting of a considered pursuit’s “outside”, and not error in pursuit’s reward rates, as the source of error committed by humans and animals. It explains why animals and humans can present as suboptimally ‘impatient’ in Choice, but as optimal in Forgo. At the same time, it concords with numerous and diverse observations in decision making regarding whether to initiate a pursuit. The nature of this error also, then, makes numerous predictions. These insights inform future computational and experimental work by providing strong constraints on the nature of the algorithm and representational architecture used to learn and represent the values of pursuits. Rigorous test of the Malapportionment Hypothesis will require wholly new experiments.

      In the revision, we also now emphasize and add predictions of the Malapportionment Hypothesis, augmenting its figure (Figure 21), its legend, and its paragraphs in the discussion.

      “We term this reckoning of the source of error committed by animals and humans the Malapportionment Hypothesis, which identifies the underweighting of the time spent outside versus inside a considered pursuit but not the misestimation of pursuit rates, as the source of error committed by animals and humans (Figure 21). This hypothesis therefore captures previously published behavioral observations (Figure 21A) showing that animals can make decisions to take or forgo reward options that optimize reward accumulation (Krebs et al., 1977; Stephens and Krebs, 1986; Blanchard and Hayden, 2014), but make suboptimal decisions when presented with simultaneous and mutually exclusive choices between rewards of different delays (Logue et al., 1985; Blanchard and Hayden, 2015; Carter and Redish, 2016; Kane et al., 2019). The Malapportionment Hypothesis further predicts that apparent discounting functions will present with greater curvature than what a reward-rate-maximizing agent would exhibit (Figure 21B). While experimentally observed temporal discounting would have greater curvature, the Malapportionment Hypothesis also predicts that the Magnitude (Figure 21C) and Sign effect (Figure 21D) would be less pronounced than what a reward-rate-maximizing agent would exhibit, with these effects becoming less pronounced the greater the underweighting. Finally, with regards to the Delay Effect (Figure 21E), the Malapportionment Hypothesis predicts that preference reversal would occur at delays greater than that exhibited by a reward-rate-maximizing agent, with the delay becoming more pronounced the greater the underweighting outside versus inside the considered pursuit by the agent.”

      Reviewer #3 Recommendations:

      As mentioned above, the readability of this paper should be improved so that the readers can follow the derivations and your analyses better. To this end, careful numbering of equations, following consistent equation numbering formats, and differentiating between appendix referencing and equation numbering would have gone a long way in improving the readability of this paper. Some specific questions are noted below.

      To increase clarity, in the revision we eliminated numbering of equations in the appendices except where an equation that occurs in an appendix is referenced within the main text. In the main text, important equations are thus numbered sequentially as they appear and note the appendix from which they derive. If an equation in an appendix is referenced in the main text, this is noted within the appendix from which it derives.

      (1) In general, it is unclear what the default pursuit is. From the schematic on the left (forgo decision), it appears to be the time spent in between reward-giving pursuits. However, this schematic also allows for smaller rewards to be attained during the default pursuit as do subsequent equations that reference a default reward rate. Here is where an example would have really benefited the authors in getting their point across as to what the default pursuit is in practice in the forgo decisions and how the default reward rate could be modulated.

      (1) The description of the default pursuit has been modified in section “Forgo and Choice decision topologies” to now read… “After either the conclusion of the pursuit, if accepted, or immediately after rejection, the agent returns to a pursuit by default (the “default” pursuit). This default pursuit effectively can be a waiting period over which reward could be received, and reoccurs until the next pursuit opportunity becomes available.” (2) Additionally, helper text has been added to Appendix 1 regarding the meaning of time and reward spent in the default pursuit. Finally, (3) new figures concerning n pursuits occurring at the same (Supplement 1) or different (Supplement 2) frequencies from a default pursuit have been added, providing examples as suggested by the reviewer.

      (2) I want to clarify my understanding of the topologies in Figure 1. In the forgo, do they roam in the "gold" pursuit indefinitely before they are faced with the purple pursuit? In general, comparing the 2 topologies, it seems like in the forgo decision, they can roam indefinitely in the gold topology or choose the purple but must return to the gold.

      The reviewer’s understanding of the topology is correct. The agent loops across one unit time in the default gold pursuit indefinitely, though the purple pursuit (or any pursuit that might exist in that world) occurs on exit from gold at its frequency per unit time. The default gold pursuit will then itself have an average duration in units of time spent in gold. As the reviewer states, the agent can re-enter into gold from having exited gold, and can enter gold from having exited purple, but cannot re-enter purple from having exited purple; rather, it must enter into the default pursuit.

      …Another point here is that this topology is highly simplified (only one considered pursuit). So it may be helpful to either add a schematic for the full topology with multiple pursuits or alternatively, provide the corresponding equations (at least in appendix 1 and 2) for the simplified topology so you can drive home the intuition behind derived expressions in these equations.

      We understand the reviewer to be noting that, while the illustrated example is of the simple topology, the mathematical formulation handles the case of n pursuits, and that illustrating a world in which there are a greater number of pursuits, corresponding to original appendices 1 & 2, would assist readers in understanding the generality of these equations.

      An excellent suggestion. We have now added to the manuscript n-pursuit world illustrations in which each pursuit occurs at the same frequency (Supplemental Figure 1) and at different frequencies (Supplemental Figure 2), and have added text, in the main text and in the appendices, to assist in understanding the form of the equation and its relationship to unit time in the default pursuit.

      (3) In Equation and Appendix 1, there are a few things that are unclear. Particularly, why is the expected time of the default option E(t_default )= 1/(∑_(i=1)^n f_i )? Similarly, why is the E(r_default )= ρ_d/(∑_(i=1)^n f_i )? Looking at the expression for E(r_default ), it implies that across all pursuits 1 through n, the default option is encountered only once. Ultimately, in Equation 1.4, (and Equation 1), the units of the two terms in the numerator don't seem to match. One is a reward rate (ρ_d) and the other is a reward value. This is the most important equation of the paper since the next several equations build upon this. Therefore, the lack of clarity here makes the reader less likely to follow along with the analysis in rigorous detail. Better explanations of the terms and better formatting will help alleviate some of these issues.

      The equation is formulated to calculate the average reward received and average time spent per unit time spent in the default pursuit. Thus, f<sub>i</sub> is the encounter rate of pursuit i for one unit of time spent in the default pursuit. Added to the summation in the numerator is the average reward obtained in the default pursuit per unit time (ρ<sub>d</sub>), and added to the summation in the denominator is the time spent in the default pursuit per unit time (1).

      Text explaining the above equation has been added to Appendix 1.

      (4) In equation and appendix 2, I'm trying to relate the expressions for t_out and r_out to the definitions "average time spent outside the considered pursuit". If I understand the expression in Equation 2.4 on the right-hand side, the numerator is the total time spent in all of the pursuits in the environment and the denominator refers to the number of times the considered pursuit is encountered. It is unclear as to why this is the average time spent outside the considered pursuit. In my mind, the expression for average time spent outside the considered pursuit would look something like t_out=1+ ∑_(i≠in)〖p_i t_i 〗= 1+ ∑_(i≠in)〖f_i/(∑_(j=1)^n f_j ) * t_i 〗. It is unclear how these expressions are then equivalent.

      Regarding the following equation,

      t<sub>out</sub> = (1 + Σ<sub>i≠in</sub> f<sub>i</sub>t<sub>i</sub>) / f<sub>in</sub>

      f<sub>i</sub> is the probability that pursuit i will be encountered during a single unit of time spent in the default pursuit. The numerator of the expression is the average amount of time spent across all pursuits, excepting the considered pursuit, per unit time spent in the default pursuit. Note that the + 1 in the numerator accounts for the unit of time spent in the default pursuit and is added outside of the sum. Since f<sub>in</sub> is the probability that the considered pursuit will be encountered per unit of time spent in the default pursuit, 1/f<sub>in</sub> is the average amount of time spent in the default pursuit between encounters of the considered pursuit. By multiplying the average time spent across all outside pursuits per unit of time in the default pursuit by the average amount of time spent in the default pursuit between encounters of the considered pursuit, we get the average amount of time spent outside the considered pursuit per encounter of the considered pursuit. This is calculated as if the pursuit encounters are mutually exclusive within a single unit of time spent within the default pursuit, which holds as the length of our unit time (delta t) approaches zero.

      The above text explaining the equation has been added to Appendix 2.
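
      A minimal sketch of this calculation follows, assuming t<sub>out</sub> = (1 + Σ<sub>i≠in</sub> f<sub>i</sub>t<sub>i</sub>) / f<sub>in</sub> as described above and, by analogy (an assumption not stated in this response), r<sub>out</sub> = (ρ<sub>d</sub> + Σ<sub>i≠in</sub> f<sub>i</sub>r<sub>i</sub>) / f<sub>in</sub>. It then checks that cleaving the world into “in” and “out” parts leaves the global reward rate unchanged. All function names and values are hypothetical.

```python
def outside_time_and_reward(i_in, rho_d, f, r, t):
    """Average time and reward outside pursuit i_in, per encounter of pursuit i_in."""
    others = [i for i in range(len(f)) if i != i_in]
    # Per unit of default time: the unit itself (+1) plus expected time in the other pursuits.
    time_per_unit = 1.0 + sum(f[i] * t[i] for i in others)
    # Per unit of default time: default reward plus expected reward from the other pursuits.
    reward_per_unit = rho_d + sum(f[i] * r[i] for i in others)
    # 1/f_in is the average default time between encounters of the considered pursuit.
    default_time_between_encounters = 1.0 / f[i_in]
    t_out = time_per_unit * default_time_between_encounters
    r_out = reward_per_unit * default_time_between_encounters
    return t_out, r_out


rho_d, f, r, t = 0.1, [0.2, 0.05], [4.0, 10.0], [2.0, 6.0]

t_out, r_out = outside_time_and_reward(0, rho_d, f, r, t)

# Global reward rate computed from the in/out decomposition ...
rate_in_out = (r[0] + r_out) / (t[0] + t_out)
# ... and computed directly per unit time spent in the default pursuit.
rate_direct = (rho_d + sum(fi * ri for fi, ri in zip(f, r))) / (
    1.0 + sum(fi * ti for fi, ti in zip(f, t))
)
print(t_out, r_out, rate_in_out, rate_direct)
```

      In this illustration the two routes give the same global reward rate, which is the sense in which the outside time and reward summarize everything in the world other than the considered pursuit.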

      (5) In Figure 3, one huge advantage of this separation into in-pursuit and out-of-pursuit patches is that the optimal reward rate maximizing rule becomes one that compares ρ_in and ρ_out. This contrasts with an optimal foraging rule which requires comparing to the global reward rate and therefore a circularity in solution. In practice, however, it is unclear how ρ_out will be estimated by the agent.

      How, in practice, a human or animal estimates the reward rates―be they the outside and/or global reward rate under a policy of accepting a pursuit―is the crux of the matter. This work identifies equations that would enable a reward-rate maximizing agent to calculate and execute optimal policies and emphasizes that the effective reward rates and weights of pursuits must be accurately appreciated for global reward rate optimization. In so doing, it makes a reckoning of behaviors commonly but erroneously treated as suboptimal. Then, by examining the consequences of misestimation of these enabling parameters, it identifies mis-weighting pursuits as the nature of the error committed by whatever algorithm and representational architecture is being used by humans and animals (the Malapportionment Hypothesis). This curious pattern identified and analyzed in this work thus provides a clue into the nature of the learning algorithm and means of representing the temporal structure of the environment that is used by humans and animals―the subject of future work.

      We note, however, that we do discuss existing models that grapple with how, in practice, a human or animal may estimate the outside reward rate. Of particular importance is the TIMERR model, which estimates the outside reward rate from its past experience, and can make an accounting of many qualitative features widely observed. However, while appealing, it would mix prior ‘in’ and ‘outside’ experiences within that estimate, and so would fail to perform forgo tasks optimally. Something is still amiss, as this work demonstrates.

      (6) The apportionment time cost needs to be explained a little bit more intuitively. For instance, it is clear that the opportunity cost of time is the cost of not spending time in the rest of the environment relative to the current pursuit. But given the definition of apportionment cost here in lines 447- 448 "The apportionment cost relates to time's allocation in the world: the time spent within a pursuit type relative to the time spent outside that pursuit type, appearing in the denominator." The reference to the equation (setting aside the confusion regarding which equation) within the definition makes it a bit harder to form an intuitive interpretation of this cost. Please reference the equation being referred to in lines 447-448, and again, an example may help the authors communicate their point much better

      We thank the reviewer for pressing on this critical point.

      Action: We added the following succinct verbal description of apportionment cost… “Apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration.” This definition appears in a new paragraph (as below) describing apportionment cost in the results section “Time’s cost: opportunity & apportionment costs determine a pursuit’s subjective value”, and is accompanied by equations for apportionment cost, and a figure giving its geometric depiction (Figure 5).

      “What, then, is the amount of reward by which the opportunity cost-subtracted reward is scaled down to equal the sv of the pursuit? This amount is the apportionment cost of time. The apportionment cost of time (height of the brown vertical bar, Figure 5F) is the global reward rate after taking into account the opportunity cost (slope of the magenta-gold dashed line in Figure 5F) times the time of the considered pursuit. Equally, the difference between the inside and outside reward rates, times the time of the pursuit, is the apportionment cost when scaled by the pursuit’s weight, i.e., the fraction that the considered pursuit is to the total time to traverse the world (Equation 9, right hand side). From the perspective of decision-making policies, apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration (Equation 9 center, Figure 5F).

      Equation 9. Apportionment Cost.

      While this difference is the apportionment cost of time, the opportunity cost of time is the amount that would be expected from a policy of not taking the considered pursuit over a time equal to the considered pursuit’s duration. Together, they sum to Time’s Cost (Figure 5G). Expressing a pursuit’s worth in terms of the global reward rate obtained under a policy of accepting the pursuit type (Figure 5 left column), or from the perspective of the outside reward and time (Figure 5 right column), are equivalent. However, the latter expresses sv in terms that are independent of one another, conveys the constituents giving rise to global reward rate, and provides the added insight that time’s cost comprises an apportionment as well as an opportunity cost.”

      (7) The analyses in Figures 6 and 7 give a nice visual representation of how the time costs are distributed as a function of outside reward and time spent. However, without an expression for apportionment cost it is hard to intuitively understand these visualizations. This also relates to the previous point of requiring a more intuitive explanation of apportionment costs in relation to the opportunity cost of time. Based on my quick math, it seems that an expression for apportionment cost would be as follows: (r_in - ρ_out*t_in)*(t_in/t_out)/(t_in/t_out + 1). The condition described in Figure 7 seems like the perfect place to compute the value of just apportionment cost when the opportunity cost is zero. It would be helpful to introduce the equation here.

      We designed original figure 7, as the reviewer appreciates, to emphasize that time has a cost even when there is no opportunity cost, being due entirely to the apportionment cost of time.

      We now provide the mathematical expression of apportionment cost and apportionment scaling in Figure 5, the point in the main text of its first occurrence.

      …and have expanded original figure 5, its legend (so as to illustrate the apportionment scaling factor and the apportionment cost), and its accompanying main text, to further illustrate and clarify apportionment cost, and its relationship to opportunity cost, and time’s cost.
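
      As a numerical illustration of the verbal definitions quoted above, the following minimal Python sketch evaluates the apportionment cost in three ways: as the difference between the policies of taking and not taking the pursuit, as the inside-minus-outside rate difference scaled by the pursuit's weight, and with the expression offered by the reviewer in comment (7). The function name, the dictionary keys, and the example numbers are hypothetical and chosen only for illustration; this is not the authors' code, and it assumes that sv equals the pursuit's reward minus its opportunity and apportionment costs, as the quoted text describes.

```python
def time_costs(r_in, t_in, r_out, t_out):
    """Split a pursuit's reward into time's costs and subjective value (sv).

    r_in, t_in   : reward and duration of the considered pursuit
    r_out, t_out : reward and time accrued outside the considered pursuit
    (All names are illustrative assumptions, not taken from the manuscript.)
    """
    rho_in = r_in / t_in                      # inside reward rate
    rho_out = r_out / t_out                   # outside reward rate
    rho_g = (r_in + r_out) / (t_in + t_out)   # global rate when the pursuit is taken
    weight = t_in / (t_in + t_out)            # pursuit's share of the total time

    # Opportunity cost: reward expected from NOT taking the pursuit,
    # over a time equal to the pursuit's duration.
    opportunity = rho_out * t_in

    # Apportionment cost, verbal definition 1: taking vs. not taking the
    # pursuit, over a time equal to its duration.
    by_policy = (rho_g - rho_out) * t_in

    # Verbal definition 2: inside-minus-outside rate difference, times the
    # pursuit's duration, scaled by the pursuit's weight.
    by_weight = (rho_in - rho_out) * t_in * weight

    # Expression proposed by the reviewer in comment (7).
    ratio = t_in / t_out
    by_reviewer = (r_in - rho_out * t_in) * ratio / (ratio + 1)

    # sv: reward minus both components of time's cost.
    sv = r_in - opportunity - by_policy

    return {
        "opportunity": opportunity,
        "apportionment_policy": by_policy,
        "apportionment_weight": by_weight,
        "apportionment_reviewer": by_reviewer,
        "sv": sv,
    }


# Illustrative numbers only.
costs = time_costs(r_in=4.0, t_in=2.0, r_out=3.0, t_out=6.0)
for name, value in costs.items():
    print(f"{name}: {value:.4f}")

# The Figure 7 condition: no outside reward, hence no opportunity cost,
# yet the apportionment cost remains.
print(time_costs(r_in=4.0, t_in=2.0, r_out=0.0, t_out=6.0))
```

      With these example numbers the three forms of apportionment cost agree up to floating-point rounding, and setting the outside reward to zero leaves a nonzero apportionment cost even though the opportunity cost vanishes.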

      (8) The analysis regarding choice decisions is relatively straightforward, pending the concerns for the main equations listed above for the forgo decisions. Legends certainly would have helped me grasp Figures 10-12 better.

      We believe the reviewer is referring to the missing labels for the Sooner Smaller Pursuit and the Larger Later Pursuit in these figures. We used the same conventions as in Figure 9, but we see now that adding these labels to these figures would be helpful, and we add them in the revision.

      We have now added legends to the figures themselves indicating the Sooner Smaller Pursuit and the Larger Later Pursuit. We have also added to the main text to emphasize the points made in these figures regarding the impact of opportunity cost and apportionment cost.

      (9) The derivation of the temporal discounting function from subjective reward rate is much appreciated as it provides further evidence for potential equivalence between reward rate optimization and hyperbolic discounting, which is known to explain a slew of decision-making behaviors in the economics literature.

      We thank the reviewer and greatly appreciate this recognition.

      In response to the reviewer’s comment, we have added text that further relates reward rate optimization to hyperbolic discounting…

      (1) We add discussion of how our normative derivation explains Mazur’s ad hoc addition of 1 + k to Ainslie’s reward/time conception of hyperbolic discounting. See the new first paragraph under “Hyperbolic Temporal Discounting Functions” for the historical origins of the standard hyperbolic equation (which are decidedly not normatively derived), and our discussion (new second paragraph in the section “The apparent discounting function of global….”) of how our normative derivation explains “1”, “k”, and their relationship to each other.

      (2) We add explicit treatment of the Delay Effect in a new “The Delay Effect” section of the results along with a figure, and in its corresponding Discussion section.
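
      To make the link to hyperbolic discounting concrete, the short derivation below is a sketch under stated assumptions rather than a quotation from the manuscript. It assumes that sv is the pursuit's reward minus its opportunity and apportionment costs, and it uses the expression for apportionment cost offered by the reviewer in comment (7). Mazur's hyperbolic form is written in its standard notation (V for discounted value, A for amount, D for delay, k for the discounting parameter).

```latex
\begin{align*}
sv &= r_{in}
      - \underbrace{\rho_{out}\,t_{in}}_{\text{opportunity cost}}
      - \underbrace{\left(r_{in} - \rho_{out}\,t_{in}\right)\frac{t_{in}}{t_{in} + t_{out}}}_{\text{apportionment cost}} \\
   &= \left(r_{in} - \rho_{out}\,t_{in}\right)\frac{t_{out}}{t_{in} + t_{out}}
    = \frac{r_{in} - \rho_{out}\,t_{in}}{1 + t_{in}/t_{out}}, \\[4pt]
V  &= \frac{A}{1 + kD} \qquad \text{(Mazur's hyperbolic form)}.
\end{align*}
```

      On this reading, when the outside reward rate is zero the two expressions have the same shape, with r_in in the role of A, t_in in the role of D, and 1/t_out in the role of k. This correspondence is an inference from the definitions above and should be checked against the manuscript's own treatment.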

      Minor comments:

      (1) Typo in equation 2, should be t_i in the denominator within the summation, not r_i .

      We thank the reviewer for catching this typo, and have corrected it in the revision.

      (2) Before equation 6, typo when defining ρ_in = r_in/t_in. Should be t_in in the denominator, not r_out.

      We thank the reviewer for catching this typo, and have corrected it in the revision.

      (3) Please be consistent with equation numbers, placement of equation references, and the reason for placing appendix numbers. This will improve readability immensely.

      To increase clarity, in the revision we eliminated numbering of equations in the appendices except where an equation occurs in an appendix that is referenced within the main text. In the main text, important equations are thus numbered sequentially and note the appendix from which they derive. If an equation in an appendix is referenced in the main text, it is noted within the appendix from which it derives.

      (4) Line 505 - "dominants" should be dominates.

      Typo fixed as indicated.

      (5) Figures 10-12: add legends to the figures.

      Now so included.

      (6) Lines 701-703: please rewrite the equation separately. It is highly unclear what rt is here.

      We thank the reviewer for bringing attention to this error. The error arose in converting from Google Sheets to Microsoft Word.

      The equation has now been corrected.

      Additional citations noted in reply and appearing in Main text

      Ainslie, George. 1975. “Specious Reward: A Behavioral Theory of Impulsiveness and Impulse Control.” Psychological Bulletin 59: 257–72.

      Frederick, Shane, George Loewenstein, and Ted O’Donoghue. 2002. “Time Discounting and Time Preference: A Critical Review.” Journal of Economic Literature 40: 351–401.

      Gibbon, John. 1977. “Scalar Expectancy Theory and Weber’s Law in Animal Timing.” Psychological Review 84: 279–325.

      Green, Leonard, Nathanael Fristoe, and Joel Myerson. 1994. “Temporal Discounting and Preference Reversals in Choice between Delayed Outcomes.” Psychonomic Bulletin & Review 1: 383–89.

      Grüne-Yanoff, Till. 2015. “Models of Temporal Discounting 1937-2000: An Interdisciplinary Exchange between Economics and Psychology.” Science in Context 28 (4): 675–713.

      Jimura, Koji, Joel Myerson, Joseph Hilgard, Todd S. Braver, and Leonard Green. 2009. “Are People Really More Patient than Other Animals? Evidence from Human Discounting of Real Liquid Rewards.” Psychonomic Bulletin & Review 16: 1071–75.

      Kalenscher, Tobias, and Cyriel M. A. Pennartz. 2008. “Is a Bird in the Hand Worth Two in the Future? The Neuroeconomics of Intertemporal Decision-Making.” Progress in Neurobiology 84 (3): 284–315.

      Kirby, Kris N., and R. J. Herrnstein. 1995. “Preference Reversals Due to Myopic Discounting of Delayed Reward.” Psychological Science 6 (2): 83–89.

      Mazur, James E. 1987. “An Adjusting Procedure for Studying Delayed Reinforcement.” In The Effect of Delay and of Intervening Events on Reinforcement Value., 55–73. Quantitative Analyses of Behavior, Vol. 5. Hillsdale, NJ, US: Lawrence Erlbaum Associates, Inc.

      McNamara, John. 1982. “Optimal Patch Use in a Stochastic Environment.” Theoretical Population Biology 21 (2): 269–88.

      Rosati, Alexandra G., Jeffrey R. Stevens, Brian Hare, and Marc D. Hauser. 2007. “The Evolutionary Origins of Human Patience: Temporal Preferences in Chimpanzees, Bonobos, and Human Adults.” Current Biology: CB 17: 1663–68.

      Strotz, R. H. 1956. “Myopia and Inconsistency in Dynamic Utility Maximization.” The Review of Economic Studies 23: 165–80.

    1. Author response:

      The following is the authors’ response to the original reviews.

      This valuable study combines multidisciplinary approaches to examine the role of insulin-like growth factor 2 mRNA-binding protein 2 (IGF2BP2) as a potential novel host dependency factor for Zika virus. The main claims are partially supported by the data, but remain incomplete. The evidence would be strengthened by improving the immunofluorescence analyses, addressing the role of IGF2BP2 in "milder" infections, and elucidating the role of IGF2BP2 in the biogenesis of the viral replication organelle. With the experimental evidence strengthened, this work will be of interest to virologists working on flaviviruses.

      We thank the reviewers for their feedback and constructive suggestions. In this revised version of the manuscript, we have addressed the reviewers’ comments to the best of our ability as detailed below. We believe that the newly incorporated data strengthen our study and conclusions. We hope that this revised manuscript will satisfy the reviewers and will be of high interest to flavivirologists.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study investigated the co-option of IGF2BP2, an RNA-binding protein, by ZIKV proteins. Designed experiments evaluated whether IGF2BP2 co-localized to sites of viral RNA replication and interacted with ZIKV proteins, and how ZIKV infection changed the IGF2BP2 interactome.

      Strengths:

      The authors have used multiple interdisciplinary techniques to address several questions regarding the interaction of ZIKV proteins and IGF2BP2.

      The findings could be exciting, specifically regarding how ZIKV infection alters the interactome of IGF2BP2.

      We thank the reviewer for acknowledging the multidisciplinary approach of our study and its exciting potential.

      Weaknesses:

      Significant concerns regarding the current state of the figures, descriptions in the figure legends, and the quality of the immunofluorescence and electron microscopy exist.

      In this new version of the manuscript, we have improved the quality of the microscopy data and included the requested information in the figure legends as described below in the Recommendations section.

      Reviewer #2 (Public Review):

      Clément Mazeaud et al. identified the insulin-like growth factor 2 mRNA-binding protein 2 (IGF2BP2) as a proviral cellular protein that regulates Zika virus RNA replication by modulating the biogenesis of virus-induced replication organelles.

      The absence of IGF2BP2 specifically dampens ZIKV replication without having a major impact on DENV replication. The authors show that ZIKV infection changes IGF2BP2 cellular distribution, which relocates to the perinuclear viral replication compartment. These assays were conducted by infecting cells with an MOI of 10 for 48 hours. Considering the ZIKV life cycle, it is noteworthy that at this time there may be a cytopathic effect. One point of concern arises regarding how the authors can ascertain that the observed change in localization is a consequence of the infection rather than of the cytopathic effect. To address this concern, shorter infection periods (e.g., 24 hours post-infection) or additional controls, such as assessing cellular proteins that do not change their localization or infecting with another flavivirus lacking the IGF2BP2 effect, could be incorporated into their experiments.

      We thank the reviewer for these relevant comments regarding the specificity of IGF2BP2 relocalization to the ZIKV replication compartment.

      It is noteworthy that we chose the 2-day post-infection time point for our analyses because it corresponds to the peak of replication, with much higher titers than at 24 hours post-infection (generally ~10^6 PFU/mL vs. ~10^4 PFU/mL). Consistently, viral replication factories are more apparent at this time point. An MOI of 5-10 was chosen to maximize the percentage of infected cells. That said, as suggested by the reviewer, we have analyzed the distribution of IGF2BP2 in ZIKV-infected cells at one day post-infection, and we provide evidence in Figure S1 that IGF2BP2 relocalizes to the dsRNA-containing compartment at this time point.

      Importantly, we now show in Figure S5 that, in contrast to IGF2BP2, other host RNA-binding proteins such as LARP1 and DDX5 do not accumulate in the ZIKV replication compartment at 2 days post-infection. LARP1 actually seems to be excluded from it, while DDX5 remains nuclear. Of note, consistent with the ZIKV-induced decrease in expression observed in western blots (Fig 4A), the intensity of the DDX5 signal decreases in infected cells. Altogether, this demonstrates that the IGF2BP2 relocalization phenotype is specific and is not due to ZIKV-induced cell death.

      By performing co-immunoprecipitation assays on mock and infected cells that express HA-tagged IGF2BP2, the authors propose that the observed change in IGF2BP2 localization results from its recruitment to the replication compartment by the viral NS5 polymerase and its association with the viral RNA. Given that both IGF2BP2 and NS5 are RNA-binding proteins, it is plausible that their interaction is mediated indirectly through the RNA molecule. Notably, the authors do not address the treatment of lysates with RNase before the IP assay, leaving open the possibility of this indirect interaction between IGF2BP2 and NS5.

      We agree with the hypothesis of the reviewer. As suggested, we have performed co-immunoprecipitation assays following RNase A treatment of the cell lysates. As shown in new Fig S6, the abundance of ZIKV NS5 co-immunoprecipitating with IGF2BP2-HA is drastically decreased upon RNase A treatment compared to the untreated condition. This demonstrates that the IGF2BP2/NS5 interaction is mostly RNA-dependent, which is not surprising as RNA is often a structural component of ribonucleoprotein complexes. Of note, the same is observed with ATL2. This new set of data allows us to refine our model in Figure 11 and the discussion, as they strongly suggest that the direct binding of IGF2BP2 to viral RNA (evidenced in vitro; Fig 5D) is required for subsequent association with NS5 and the ER-shaping protein ATL2. This is in line with the fact that viral RNA is a co-factor in the biogenesis of ER-derived ZIKV vesicle packets (PMID: 32640225). However, we cannot exclude a contribution of cellular RNA in these processes, as discussed.

      In in vitro binding assays, the authors demonstrate that the RNA-recognition motifs of the IGF2BP2 protein specifically bind to the 3' nontranslated region (NTR) of the ZIKV genome, excluding binding to the 5' NTR. However, they cannot rule out the possibility of this host protein associating with other regions of the viral genome. Using a reporter ZIKV subgenomic replicon system in IGF2BP2 knock-down cells, they additionally demonstrate that IGF2BP2 enhances viral genome replication. Despite its proviral function, the authors note that the "overexpression of IGF2BP2 had no impact on total vRNA levels." However, the authors do not delve into a discussion of this latter statement.

      We agree with the reviewer’s comments. We now mention in the discussion that we cannot exclude the possibility that IGF2BP2 associates with RNA motifs within the coding region of the viral genomic RNA, especially considering that it contains N6A-methylated sequences (PMID: 27773535; 27773536; 29373715). Moreover, we discuss the observation that IGF2BP2 overexpression has no impact on vRNA levels (as well as titers). We believe that this is because endogenous IGF2BP2 is highly expressed in cancer cells such as the Huh7.5 and JEG-3 cells used here and is presumably not limiting for viral replication in our system (PMID: 38320625; 35111811; 34309973; 35023719; 37088822; 33224879; 35915142).

      In this study, the authors extend their findings by illustrating that ZIKV infection triggers a remodeling of IGF2BP2 ribonucleoprotein complex. They initially evaluate the impact of ZIKV infection on IGF2BP2's interaction with its endogenous mRNA ligands. Their results reveal that viral infection alters the binding of specific mRNA ligands, yet the physiological consequences of this loss of binding in the cell remain unexplored. 

      We acknowledge that it would be of interest to further study the physiological relevance of the modulation of the IGF2BP2 ribo-interactome. Since we have focused here on the role of IGF2BP2 in viral replication, we feel that this will be the focus of future studies, notably involving a larger omic-centered approach to identify the most impacted IGF2BP2 mRNA ligands. Of note, Gokhale and colleagues have already reported that the CIRBP, TNRC6A and PUM2 proteins regulate the replication of Flaviviridae (PMID: 31810760).

      Additionally, the authors demonstrate that ZIKV infection modifies the IGF2BP2 interactome. Through proteomic assays, they identified 62 altered partners of IGF2BP2 following ZIKV infection, with proteins associated with mRNA splicing and ribosome biogenesis being the most represented. In particular, the authors focused their research on the heightened interaction between IGF2BP2 and Atlastin 2, an ER-shaping protein reported to be involved in flavivirus vesicle packet formation. The validation of this interaction by Western blot assays prompted an analysis of the effect of ZIKV on organelle biogenesis using a newly described replication-independent vesicle packet induction system. Consequently, the authors demonstrate that IGF2BP2 plays a regulatory role in the biogenesis of ZIKV replication organelles.

      Based on these findings and previously published data, the authors propose a model outlining the role of IGF2BP2 in ZIKV infectious cycle, detailing the changes in IGF2BP2 interactions with both cellular and viral proteins and RNAs that occur during viral infection.

      The conclusions drawn in this paper are generally well substantiated by the data.

      We thank the reviewers for these encouraging general comments on our study.

      However, it is worth noting that the majority of infections were conducted at a high MOI for 48 hours, spanning more than one infectious cycle. To enhance the robustness of their findings and mitigate potential cell stress, it would be valuable to observe these effects at shorter time intervals, such as 24 hours post-infection.

      As explained above, IGF2BP2 relocalization to the (dsRNA-enriched) replication compartment was also observed in ZIKV infected cells at one day post-infection.

      Furthermore, the assertion regarding the association of IGF2BP2 with NS5 could be strengthened through additional immunoprecipitation (IP) assays. These assays, performed in the presence of RNAse treatment, would help exclude the possibility of an indirect interaction between IGF2BP2 and NS5 (both RNA-binding proteins) through viral RNA, thus providing more confidence in the observed association.

      See above for our answer and the description of the new data of Fig. S7.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Mazeaud and colleagues pursued a small-scale screen of a targeted RNAi library to identify novel players involved in Zika (ZIKV) and dengue (DENV) virus replication. Loss-of-function of IGF2BP2 resulted in reduced titers for ZIKV of the Asian and African lineages in hepatic Huh7.5 cells, but not for any of the four DENV serotypes or West Nile virus (WNV). The phenotype was further confirmed in two additional cell lines and using a ZIKV reporter virus. In addition, using immunoprecipitation assays, the interaction between IGF2BP2 and ZIKV NS5 protein and RNA genome was detected. The work addressed the role of IGF2BP2 in the infected cell by combining confocal microscopy imaging and proteomic analysis. The approach indicated an altered distribution of IGF2BP2 in infected cells and changes in the protein interactome, including disrupted association with partner mRNAs and modulation of the abundance of a specific set of protein partners in IGF2BP2 immunoprecipitated ribonucleoprotein (RNP) complexes. Finally, based on the changes in the IGF2BP2 interactome and specifically the increment in the abundance of Atlastin 2, the biogenesis of ZIKV replication organelles (vRO) is investigated using a genetic system that allows virus replication-independent assembly of vRO. Electron microscopy showed that knockdown of IGF2BP2 expression reduced the number of cells with vRO.

      Strengths:

      The role of IGF2BP2 as a proviral factor for ZIKV replication is novel. The study follows a logical flow of experiments that altogether support the assembly of a specialized RNP complex containing IGF2BP2 and ZIKV NS5 and RNA genome.

      We thank the reviewer for their positive feedback on our study and its novelty.

      Weaknesses:

      The statistical analysis should clearly indicate the number of biological replicates of experiments to support statistical significance.

      This information has been included in all figure legends.

      The claim that IGF2BP2 knockdown impairs de novo viral organelle biogenesis and viral RNA synthesis is built upon data that show a reduction in RNA synthesis <0.5-fold as assessed using a reporter replicon, thus suggesting a limited impact of the knockdown on RNA replication.

      We agree that a 50% decrease in the replication of our reporter replicon might be considered mild. However, we want to point out that in an infectious set-up, the phenotypes were stronger, as demonstrated by an 80% decrease in viral particle production even though IGF2BP2 levels were never depleted by more than 80% compared to endogenous levels. Moreover, our findings were validated through the analysis of de novo vRO biogenesis by electron microscopy in a replication-independent set-up. Together, these experiments provide compelling evidence for a role for IGF2BP2 in the early stages of viral genome replication.

      Validation of IGF2BP2 partners that are modulated upon ZIKV infection (i.e. virus yield in knocked down cells) can be relevant especially for partners such as Atlastin 2, as the hypothesis of a role for IGF2BP2 RNP in vRO biogenesis is based on the observed increase in the abundance of Atlastin 2 in the RNP complex precipitated from infected cells.

      First, we would like to emphasize that the proviral role of ATL2 in flavivirus replication, including links to vRO biogenesis, was already reported in two independent studies, notably by one of the co-authors (PMID: 31636417; 31534046). Therefore, we have chosen to discuss these previous studies in the manuscript rather than repeating published experiments. Second, we agree that it would be interesting to further interrogate the role of modulated IGF2BP2 protein partners in ZIKV replication. However, these experiments would constitute a new project per se, involving laborious RNAi-based phenotypic screening and subsequent functional characterization of the identified hits. Therefore, this will be the focus of follow-up studies.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      In all IFAs claimed to show co-localization, the co-localization is minimal; this needs to be addressed.

      We have performed colocalization analyses for relevant images in the revised manuscript (see below and Figs. 4B, 5A, S4A-C and S5A-D). Although this quantification increases confidence in our analysis, we were still cautious in our conclusions, stating that colocalization was partial and that IGF2BP2 accumulates in the replication compartment.

      Western blots and IPs need to be quantified.

      As requested, we have included WB quantification in Figs. 2A, 4A, 4D, 8B-D, S6C and S7D.

      Figure 1: What is the strain background for the ZIKV reporter virus?

      As indicated in the legend of Figure 1E of the primary submission, the Rluc-expressing ZIKV reporter virus (ZIKV-R2A) was based on the FSS13025 isolate (Asian lineage; PMID: 27198478). To clarify this, we have also indicated the strain background in the main text of the Results and Material & Methods sections.

      Figure 2A: If shIGF2BP2 reduces viral titer, the NS3 should show a reduction in 2A, but it doesn't.

      We agree with the reviewer. Although NS3 does not appear to be decreased upon IGF2BP2 knockdown in the experiment initially shown in Figure 2A, it should be noted that our homemade rat anti-NS3 antibody is highly sensitive, leading to signal saturation that makes it challenging to distinguish changes in NS3 expression without substantially diluting the lysate sample before SDS-PAGE. The initial reason for including Fig 2A was not to make a statement about viral protein expression but to validate IGF2BP2 knockdown efficiency. Conclusions about NS3 levels in the initial figure are further complicated by the high MOI of ZIKV that was used in Huh7.5 cells, conditions which are not quantitative for viral replication measurements. To address this issue, we assessed the impact of IGF2BP2 knockdown on viral protein abundance (as a read-out of overall viral replication) with a lower MOI of ZIKV. The results of the repeat experiment (seen in the new Fig. 2A) show that IGF2BP2 knockdown leads to a decrease in the abundance of NS4A, NS5 and NS3, which is consistent with the titer decrease phenotypes.

      Figure S3: The re-localization claimed is minimal and does not show overlap with NS3. The dsRNA is difficult to see here. Suggest improving the immunofluorescence images and reducing the claim for "strong" co-option of RNP complexes.

      In addition to replication complexes, NS3 labels convoluted membranes, which are devoid of dsRNA and IGF2BP2 and surround the cage-like replication compartment as large puncta (PMID: 27545046; 33432690; 28249158). The signal overlap is more obvious between IGF2BP2 and NS3/dsRNA-containing areas, which is reflected by the Manders’ coefficients that have been included in the revised version (Fig. S5C-D). We have also adjusted the text to conclude that the colocalization was partial and that IGF2BP2 accumulated in the replication compartment. We acknowledge that the dsRNA signal is weak, and we have updated the images (and others, when relevant) to better visualize this viral component. Moreover, we have rephrased the sentence to remove the word “strongly”.

      Figure 4A: Western blot needs quantification.

      This is now included in the figure.

      Figure 4B: As in many of the IFAs, the co-localization is only partial. Additionally, the dsRNA is not visible. So the images need to be improved. The colocalization should be quantified across the cell diameter.

      We changed the color and intensity of the dsRNA staining to make it more visible. Manders’ colocalization coefficients have been determined and included in Figures 4B and S5C-D.
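
      For readers unfamiliar with this metric, the sketch below shows one common way Manders' coefficients are computed from two fluorescence channels: M1 is the fraction of the first channel's total intensity found in pixels where the second channel is above a threshold, and M2 is the converse. The function name, the thresholds (zero by default), and the toy pixel values are assumptions for illustration only; the response does not state which software or thresholding scheme was used.

```python
def manders_coefficients(ch1, ch2, thr1=0.0, thr2=0.0):
    """Return (M1, M2) for two channels given as flattened pixel intensities.

    M1: fraction of channel-1 intensity located in pixels where channel 2 > thr2.
    M2: fraction of channel-2 intensity located in pixels where channel 1 > thr1.
    Thresholds are illustrative; real analyses usually estimate them from
    background or with an automated method.
    """
    if len(ch1) != len(ch2):
        raise ValueError("both channels must contain the same number of pixels")

    total1 = sum(ch1)
    total2 = sum(ch2)
    if total1 <= 0 or total2 <= 0:
        raise ValueError("each channel must contain some signal")

    # Sum channel-1 intensity only over pixels that are positive in channel 2.
    m1 = sum(a for a, b in zip(ch1, ch2) if b > thr2) / total1
    # Sum channel-2 intensity only over pixels that are positive in channel 1.
    m2 = sum(b for a, b in zip(ch1, ch2) if a > thr1) / total2
    return m1, m2


# Toy five-pixel "images" (hypothetical values, not data from the study).
igf2bp2 = [10, 20, 0, 30, 40]
dsrna = [0, 5, 5, 0, 10]
m1, m2 = manders_coefficients(igf2bp2, dsrna)
print(f"M1 = {m1:.2f}, M2 = {m2:.2f}")
```

      Because the two coefficients are asymmetric, a partial overlap can yield a high value in one direction and a lower value in the other, which is consistent with describing the colocalization as partial.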

      Figure 4C: It is difficult to understand what the +/- is on the blots for the cell extracts and the anti-HA IP samples. It is not described in the figure legend or the text.

      As already indicated on the right of the panel, the +/- indicates whether or not IGF2BP2-HA was overexpressed in the cells. In the revised version, this is clarified in the figure legend.

      Figure 5A: Once again similar to other IFAs, the co-localization is only minimal and thus difficult to claim as "co-localization" is actually happening. It would be good to either improve the images or discuss this observation in the text and reduce the claim of colocalization. Specifically, since the two proteins might be co-localizing in specific regions which would make it a very interesting observation. Also, quantification of co-localizing regions would be beneficial.

      We have included the requested colocalization analysis. We have been cautious to indicate that colocalization was only partial. It is noteworthy that, despite many efforts to optimize the cell permeabilization procedure, we noticed that the FISH probes were not very efficient in accessing the perinuclear area of the infected cells, where replication complexes accumulate. In that respect, it is likely that this imaging approach “misses” some of the IGF2BP2/vRNA complexes and that the determined colocalization factor is underestimated. This explains why the confirmation of the vRNA/IGF2BP2 complex with a biochemical approach (Fig. 5B) was very relevant.

      Figure 5D: It is unclear what the blue squares represent. Clearer figure legends and text would be beneficial.

      As stated in the initial figure, the blue squares indicate values obtained with the ZIKV 5’ UTR probe while the green circles involve a 3’ UTR probe. We have further emphasized this information in the figure legend to make it clearer.

      Figure 6B. The graph is missing the data and X-axis label for shIGF2BP2.

      We had initially omitted the values of the conditions with shIGF2BP2 and the replication-dead GAA replicon, since this viral system does not allow accumulation of viral genomes or proteins and was not relevant at the 48h time point. We thought that the inclusion of the shNT/GAA condition was a sufficient internal negative control of viral replication, since values for shIGF2BP2/GAA did not exceed background. Nevertheless, we have now included this condition in the revised figure.

      Figure 7D: It is unclear what the -/+ signs are in the cell extracts and the IP blots. Specifically, since there is an NS5 signal in the (-) lanes.

      As explained above, the +/- indicates whether IGF2BP2-HA was overexpressed. The meaning of these symbols is now further clarified in the figure legend.

      Figure 8C: The circles with the different colors are not clearly described. What does it mean?

      As indicated in the figure (left part), the red and green circles identify the partners of the STRING network whose association with IGF2BP2 is decreased and increased during infection, respectively. We have included this information in the figure legend.

      Figure 9: The electron microscopy to quantify vesicles should be carried out using whole-cell tomography in order to get the most accurate quantification of the vesicles following different treatments. This is because if you only look at one cell profile (slice), the number of vesicles might be less in that profile and more in another below or above it. It is unclear how many cell profiles were used for the quantification and how the calculations were carried out.

      We agree with the reviewer that ideally, one should perform 3D electron tomography to precisely assess the morphology of VPs. Regardless of the fact that we do not possess the imaging infrastructure to perform that type of analysis, such an approach would represent a tremendous amount of work if one would like to process at least 200-400 vesicles from > 50 cells and their whole cytoplasm (as we did). Despite not having 3D images, this number of data points is sufficient to see general changes in viral replication vesicle morphology, especially considering that Huh7-Lunet cells are relatively flat cells (PMID: 32640225; 36700643; 34696522; 31636417). Furthermore, since IGF2BP2 knockdown decreases the abundance of VPs and does not impact their diameter, we believe that the addition of sophisticated 3D analysis would not bring any new and relevant information and that the TEM data stand by themselves for the conclusion we made. A more refined morphological analysis to determine how IGF2BP2 is structurally involved in virus-mediated membrane reorganization could be the focus of a future study.

      We feel that we have already provided sufficient information about the quantification in the Material & Methods section of the first version of the manuscript: “Quantification was performed by systematically surveying cells and evaluating the presence of VPs. Only cells with >2 VPs were considered as positive. For each condition, >50 cells were surveyed over 4 biological replicas. All observed VPs were imaged, and VP diameters were determined using ImageJ by measuring the distance across two axes and averaging”.
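
      The quantification rule quoted above can be sketched in a few lines of Python. This is only an illustration of the stated criteria (a cell is positive when it contains more than 2 VPs; a VP diameter is the average of two axis measurements); the function names, units, and example values are hypothetical and are not taken from the study.

```python
def fraction_vp_positive(vp_counts_per_cell, threshold=2):
    """Fraction of surveyed cells scored positive, i.e. with more than `threshold` VPs."""
    if not vp_counts_per_cell:
        raise ValueError("at least one cell must be surveyed")
    positive = sum(1 for count in vp_counts_per_cell if count > threshold)
    return positive / len(vp_counts_per_cell)


def vp_diameter(axis_a, axis_b):
    """Diameter of one VP as the mean of the distances measured across two axes."""
    return (axis_a + axis_b) / 2


# Hypothetical survey of five cells and one vesicle measured along two axes.
counts = [0, 3, 5, 2, 1]
print(f"VP-positive cells: {fraction_vp_positive(counts):.0%}")
print(f"VP diameter: {vp_diameter(90.0, 100.0)} (same units as the inputs)")
```

      Note that a cell with exactly 2 VPs is scored negative under the ">2" criterion, which is why the default threshold is a strict inequality.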

      Reviewer #2 (Recommendations For The Authors):

      The inclusion of a control in the knock-down and infection assays with the reporter virus could enhance the validity of the findings. Introducing STAT2 knockdown, a recognized antiviral protein for ZIKV, as a control would provide a valuable benchmark to evaluate the extent of viral enhancement in the experiments. This additional control not only supports the proposed function of LARP1 in virus assembly/release but also strengthens the overall interpretation of the results.

      We agree that adding a positive control could have been relevant for assessing the extent of replication modulation, especially for increases such as that observed with shLARP1. However, finding such control proteins in our system was a challenge. Indeed, STAT2 would not have been a good control for these experiments since we used Huh7.5 cells for the RNAi mini-screening, which do not express a functional RIG-I protein, and generally do not produce type I and III interferons. Thus, STAT2 knockdown is not expected to result in an increase in replication. That said, we feel that it was unnecessary to include a control for replication inhibition here given that only a few statistically reliable candidates were obtained. Instead, we have opted for an extensive secondary validation approach by assessing the proviral role of IGF2BP2 for multiple viruses - DENV1-2-3-4, WNV and SARS-CoV-2, and 3 ZIKV strains in three relevant cell types.

      Additionally, in Figure S4, the authors employ an antibody against NS5 that specifically recognizes ZIKV NS5 but not DENV NS5. Given the objective of highlighting distinctions between these two viruses, it is advisable to use an antibody that detects DENV NS5 as well. This approach would contribute to a more comprehensive comparison, ensuring a balanced representation of both viruses in the experimental analysis.

      We thank the reviewer for this relevant suggestion. We have repeated the coimmunoprecipitation assays using antibodies specific to DENV NS5 (Author response image 1). While we specifically pulled down ZIKV NS5 with IGF2BP2-HA as expected, this was not the case for DENV NS5 when using extracts from DENV-infected cells despite our multiple attempts. Indeed, the amount of pulled-down DENV NS5 with IGF2BP2-HA was always comparable to that in the negative control (“empty” pWPI lentivirus-transduced cells, “-” condition), which corresponds to non-specific binding to the HA-resin. Thus, while the antibody was very efficient at detecting DENV NS5 in the cell extracts, no specific binding between DENV NS5 and IGF2BP2-HA could be evidenced. Consistent with our different replication phenotypes between DENV and ZIKV, this strongly supports that the NS5/IGF2BP2 interaction is specific to ZIKV. The specificity of the IGF2BP2 interaction with ZIKV NS5 compared to DENV NS5 is discussed in the updated manuscript.

      Author response image 1.

      DENV NS5 is not specifically co-immunoprecipitated with IGF2BP2-HA in contrast to ZIKV NS5. Huh7.5 cells stably expressing IGF2BP2-HA (+) and control cells (-) were infected with ZIKV H/PF/2013 at a MOI of 10 or left uninfected. Two days later, cell extracts were prepared and subjected to RNase A treatment (+) or not (-) before anti-HA immunoprecipitations. The resulting complexes were analyzed by western blotting for their abundance in the indicated proteins.

      Reviewer #3 (Recommendations For The Authors):

      (1) Statistical analysis. Please clearly indicate what columns and error bars represent for bar graphs such as those presented in Figures 1A-D and F, Figures 2B-C, and bottom panels in DE, Figure 3, Figure 5B, Figure 6B-C, and Figures 9B-D and F. For instance, the mean of n independent experiments and standard deviation.

      Information about the number of replicates, error bars, and statistical tests has been added for all figures in the legends. 

      (2) What is the scale in the Y-axis of Figure 2C? As shown, it is difficult to know what is the virus titer in knocked-down cells. Please use a linear scale or a log scale.

      This is a linear scale of viral titers, which we have modified to make it clearer for the reader.

      (3) Throughout the manuscript (e.g. Figures 1, 2, and 3) the fold reduction in titer is presented instead of the actual virus titers. I suggest showing the titer as it may be much more informative for the reader.

      We prefer showing the data as fold reduction as they better reflect the IGF2BP2 knockdown-induced phenotypes across the independent biological replicates. Indeed, from one experiment to another, the reference titers in the control condition sometimes vary because of the cell passage or the lentiviral transduction efficiency for instance, especially when low multiplicities of infection are used. However, the reduction phenotype in fold change observed upon IGF2BP2 knockdown was always consistent regardless of the titer value. Of note, all considered experiments had reference titers above 10<sup>5</sup> PFU/mL.
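
      The normalization argument above can be illustrated with a minimal sketch. The titers below are purely hypothetical numbers chosen for illustration (they are not data from the study), and the helper name `relative_titer` is an assumption: each knockdown titer is divided by the control titer of the same replicate, so the ratio can stay constant even when the absolute control titer drifts between experiments.

```python
def relative_titer(knockdown_titer, control_titer):
    """Knockdown titer expressed relative to the control of the same replicate."""
    if control_titer <= 0:
        raise ValueError("control titer must be positive")
    return knockdown_titer / control_titer

# HYPOTHETICAL (control, knockdown) titers in PFU/mL, one pair per replicate.
# These values are invented for illustration only.
replicates = [(2e5, 5e4), (8e5, 2e5), (4e5, 1e5)]

# The control titers differ fourfold, but every replicate gives the same ratio.
ratios = [relative_titer(kd, ctrl) for ctrl, kd in replicates]
```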

      (4) Is it possible to perform a colocalization analysis of confocal images showing overlapping signals?

      This has been done and the results of these analyses are included in the updated figures 4B, 5A, S4 and S5.

      (5)  Assessing the effect of Atlastin2 knockdown in virus yield and showing coimmunoprecipitation of Atlastin 2 with NS5 can add relevant information.

      As mentioned in the discussion and above, ATL2 was already reported to be required for DENV and ZIKV replication in two independent studies (including one by one of the co-authors) (PMID: 31636417; 31534046). We have not tested whether ATL2 associates with NS5. However, new Fig. S7 of the revised manuscript shows that the IGF2BP2/ATL2 association is RNA-dependent. This suggests that, as initially depicted in our model, IGF2BP2 associates with the ER (and thus, ATL2) after its binding to the viral RNA. Further interrogation into the role of atlastins in the flavivirus replication cycle is the focus of another ongoing IGF2BP2-unrelated study from one of the co-authors, which will be reported elsewhere.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors investigate ligand and protein-binding processes in GPCRs (including dimerization) by the multiple walker supervised molecular dynamics method. The paper is interesting and it is very well written.

      Strengths:

      The authors' method is a powerful tool to gain insight into the structural basis for the pharmacology of G protein-coupled receptors.

      Weaknesses:

      Cholesterol may play a fundamental role in GPCR dimerization (as cited by the authors, Prasanna et al, "Cholesterol-Dependent Conformational Plasticity in GPCR Dimers"). Yet they do not use cholesterol in their simulations of the dimerization.

      We thank Reviewer #1 for the positive comment on mwSuMD.

      In the revised version of the manuscript, the section about the A<sub>2A</sub>/D<sub>2</sub> receptor dimerization has been removed because it was largely speculative. We agree that the lack of cholesterol in those simulations added uncertainty to the presented results.

      Reviewer #2 (Public Review):

      The study by Deganutti and co-workers is a methodological report on an adaptive sampling approach, multiple walker supervised molecular dynamics (mwSuMD), which represents an improved version of the previous SuMD.

      Case-studies concern complex conformational transitions in a number of G protein Coupled Receptors (GPCRs) involving long time-scale motions such as binding-unbinding and collective motions of domains or portions. GPCRs are specialized GEFs (guanine nucleotide exchange factors) of heterotrimeric Gα proteins of the Ras GTPase superfamily. They constitute the largest superfamily of membrane proteins and are of central biomedical relevance as privileged targets of currently marketed drugs.

      MwSuMD was exploited to address:

      (1) Binding and unbinding of the arginine-vasopressin (AVP) cyclic peptide agonist to the V2 vasopressin receptor (V2R);

      (2) Molecular recognition of the β2-adrenergic receptor (β2-AR) and heterotrimeric GDP-bound Gs protein;

      (3) Molecular recognition of the A1-adenosine receptor (A1R) and palmitoylated and geranylgeranylated membrane-anchored heterotrimeric GDP-bound Gi protein;

      (4) The whole process of GDP release from membrane-anchored heterotrimeric Gs following interaction with the glucagon-like peptide 1 receptor (GLP1R), converted to the active state following interaction with the orthosteric non-peptide agonist danuglipron;

      (5) The heterodimerization of D2 dopamine and A2A adenosine receptors (D2R and A2AR, respectively) and binding to a bi-valent ligand.

      The mwSuMD method is solid and valuable, has wide applicability, and is compatible with the most world-widely used MD engines. It may be of interest to the computational structural biology community.

      The huge amount of high-resolution data on GPCRs makes those systems suitable, although challenging, for method validation and development.

      While the approach is less energy-biased than other enhanced sampling methods, knowledge, at the atomic detail, of binding sites/interfaces and conformational states is needed to define the supervised metrics, the higher the resolution of such metrics is the more accurate the outcome is expected to be. The definition of the metrics is a user- and system-dependent process.

      The too many and ambitious case-studies undermine the accuracy of the output and reduce the important details needed for a methodological report. In some cases, the available CryoEM structures could have been exploited better.

      The most consistent example concerns AVP binding/unbinding to V2R. The consistency with CryoEM data decreases with an increase in the complexity of the simulated process and involved molecular systems (e.g. receptor recognition by membrane-anchored G protein and the process of nucleotide exchange starting from agonist recognition by an inactive-state receptor). The last example, GPCR hetero-dimerization, and binding to a bi-valent ligand, is the most speculative one as it does not rely on high-resolution structural data for metrics supervision.

      We thank Reviewer #2 for the detailed comment on the manuscript. In this revised version, the hetero-dimerization between A<sub>2A</sub>R and D<sub>2</sub>R has been removed. Also, results about GPCR case studies other than GLP-1R have been reduced and downgraded in importance to focus on the fundamental key points of the adaptive sampling method. We agree that the consistency with cryoEM data tends to decrease with an increase in the complexity of the simulated process and involved molecular systems. While it is possible to approximate cryoEM results, our unbiased adaptive sampling technique finds its most interesting application in mechanistically unknown out-of-equilibrium processes rather than in reproducing known experimental data perfectly. The simulated case studies we present showcase the versatility, speed and consistency of our adaptive method to explore energetically unbiased transitions.

      Reviewer #3 (Public Review):

      Summary:

      In the present work, Deganutti et al. report a structural study on GPCR functional dynamics using a computational approach called supervised molecular dynamics.

      Strengths:

      The study has the potential to provide novel insight into GPCR functionality. An example is the interaction between loops of GPCR and G proteins, which are not resolved experimentally, or the interaction between D344 and R385 identified during the Gs coupling by GLP-1R. However, validation of the findings, even computationally through for instance in silico mutagenesis study, is advisable.

      Weaknesses:

      In its current form, the manuscript seems immature and in particular, the described results grasp only the surface of the complex molecular mechanisms underlying GPCR activation. No significant advance of the existing structural data on GPCR and GPCR/G protein coupling is provided. Most of the results are a reproduction of the previously reported structures.

      We thank Reviewer #3 for the positive comment on the work. The revised manuscript focuses more on the GLP-1R and Gs case studies. We believe it addresses the weaknesses raised by showing the behaviour of key structural motifs and providing new hypotheses about GDP release.  

      Reviewer #2 (Recommendations For The Authors):

      In this methodological report, Deganutti and co-workers propose an improved version of supervised molecular dynamics (SuMD), named multiple walker SuMD (mwSuMD). Such an adaptive sampling method was challenged in simulations of complex transitions involving GPCRs, which are out of reach by classical MD.

      Although less energy-biased than other enhanced sampling methods, mwSuMD requires knowledge of the atomic detail of the ligand-protein or protein-protein binding site/interfaces and the structural hallmarks of the states whose conversion the method is going to address. Such knowledge is, indeed, necessary to define the supervised metrics (e.g. distances, RMSD, etc), which is a user- and system-dependent process.

      We classify mwSuMD as an adaptive, rather than enhanced, sampling method as it does not use any energy bias. We agree with the Reviewer that some knowledge of the system is required to productively set up the simulations, but this is the case for almost any advanced MD method.

      The text requires improvement in the essential methodological details and removal of those parts that are not properly instrumental in method validation.

      While attempting to prove the widest possible applicability of the method, the authors exaggerated the number of examples, which, in spite of the increasing complexity were only summarily described. Please, limit the case studies to AVP binding/unbinding to V2R and the whole process of GDP release from membrane-anchored Gs following activation of GLP1R by danuglipron. The latter case, indeed, involves small ligand binding (danuglipron), small ligand dissociation (GDP), receptor activation, and activated receptor binding to membrane-anchored G protein and G protein conformational transition instrumental to nucleotide depletion, which is already too much. In this framework, the cases of Gs-β2AR and Gi-A2R recognition are redundant. Most importantly, the case of D2R-A2AR heterodimerization and binding to a bi-valent ligand must be eliminated. The reason is that the case is not entirely based on the mwSuMD and the biased protein-protein interface does not rely on high-resolution data (i.e. no structural model of D2R-A2AR dimer has been determined so far). Last but not least, the high intrinsic flexibility of the bi-valent ligand adds further indetermination to the computational experiment. Being too speculative, the case-study does not serve to model validation.

      We thank the Reviewer for the suggestion. In the current revised form, the manuscript focuses on AVP binding/unbinding to V2R and the GLP-1R activation, Gs recognition and GDP release.

      While eliminating the three case studies mentioned above, the remaining ones should be described more extensively and clearly, highlighting the most productive setup for each system. Incidentally, listing the performance parameters (e.g. distribution mode and minimum RMSD) of each simulation setting in Table S1 is worth doing.

      More accuracy in the methodological description is needed.

      As for the supervised metrics, the rationale behind the choice of a particular index and whether it is the outcome of a number of trials must be declared and the selected indices must be better defined. Here there are a few examples.

      AVP-V2R case. It is not clear why the AVP centroids were computed on residues C1-Q4 (I suppose the Cα-atoms) and not on the Cα-atoms of the whole cyclic part (C1-C6). Along the same line, the choice of the Cα-atoms of four amino acid residues to compute the receptor binding-site centroids requires justification.

      We have amended the text to clarify that all the heavy atoms of AVP residues C1-Q4, which are anticipated to bind deep into V<sub>2</sub>R, were considered alongside V<sub>2</sub>R residues part of the peptide binding site (Cα atoms only). From our experience, the choice of including side chains or not for the definition of centroids usually does not affect the supervision output. It should only affect the output of mwSuMD simulations based on the RMSD which considers the specific relative distance from the reference. However, a benchmark of the differences produced by divergent selections is beyond the scope of the present work.
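
      The centroid-based metric described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the coordinates are hypothetical, and the two selections simply stand in for "peptide heavy atoms" and "binding-site Cα atoms".

```python
import math

def centroid(coords):
    """Geometric centre (unweighted mean position) of a list of (x, y, z) tuples."""
    n = len(coords)
    if n == 0:
        raise ValueError("empty selection")
    return tuple(sum(atom[i] for atom in coords) / n for i in range(3))

def centroid_distance(selection_a, selection_b):
    """Distance between the centroids of two atom selections."""
    return math.dist(centroid(selection_a), centroid(selection_b))

# HYPOTHETICAL coordinates in Å, invented for illustration only.
peptide_heavy_atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)]
site_ca_atoms = [(1.0, 1.0, 4.0), (1.0, 1.0, 6.0)]

# Centroids are (1, 1, 0) and (1, 1, 5), so the supervised distance is 5 Å.
distance = centroid_distance(peptide_heavy_atoms, site_ca_atoms)
```

      Because the centroid is an average over the selection, adding or removing side-chain atoms shifts it only modestly, which is consistent with the remark that the choice of selection usually has little effect on a distance-based supervision.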

      GLP1R case. The statement: "Since the opening of TM1-ECL1 was observed in two replicas out of four, we placed the ligand in a favorable position for crossing that region of GLP-1R" is rather weak as a strategy to manually (?) define the input position of the ligand.

      As stated in the manuscript, placing the agonist in that position was driven by preliminary 8 μs of classic MD simulations that pointed out the possible path for binding. We agree with the Reviewer that there is still some degree of arbitrariness in it and for this reason, we have not presented structural details of the PF06882961 binding path.

      As for the supervised metrics, what does it mean "the distance between the ligand and GLP-1R TM7 residues L379<sup>7.34</sup>-F381<sup>7.36</sup>"? Was the distance computed between ligand and L379-F381 centroids? Also: "In the supervised stages, the distance between residues M386-L394<sup>Gαs</sup> of helix 5 (α5) and the GLP-1R intracellular residues R176<sup>2.46</sup>, R348<sup>6.37</sup>, S352<sup>6.41</sup>, and N405<sup>7.60</sup> was monitored" was it an inter-centroid distance? Furthermore, "supervising the distance between AHD residues G70-R199<sup>Gαs</sup> and K300-L394<sup>Gαs</sup>" was it the distance between the centroid of the AHD and the centroid of the C-terminal half of the Ras-like domain? In general, when more than two atoms are involved in distance calculation, please, specify if the distance is inter-centroid.

      Also: "During the third phase, the RMSD of PF06882961, as well as the RMSD of ECL3 (residues A368<sup>6.57</sup>-T378<sup>7.33</sup>, Cα atoms), were supervised" was the RMSD computed without superimposing the ligand to estimate its roto-translations?

      We have added details about the selections used for computing centroids throughout the methods section. For example, all the heavy atoms of PF06882961 and the Cα atoms of L379-F381 were considered. RMSD values during GLP-1R activation were computed after superimposition on TM2, ECL1, and TM3 residues 170-240 (Cα atoms). This has now been specified in the text.
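
      The distinction raised by the Reviewer, between an RMSD that includes the rigid-body motion of the ligand and one that removes it by fitting, can be sketched as below. This is a simplified illustration and not the authors' analysis code: it assumes each frame has already been superimposed on the chosen reference region of the receptor, so the ligand RMSD is computed with no further fitting. The function name and coordinates are hypothetical.

```python
import math

def rmsd_no_fit(coords, reference):
    """RMSD over matched atoms, with no additional superposition of the selection.

    Assumes both coordinate sets are already in the same frame, e.g. after
    fitting the whole system on a separate, rigid reference region.
    """
    if len(coords) != len(reference) or not coords:
        raise ValueError("selections must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(coords, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(coords))

# HYPOTHETICAL ligand coordinates in Å, invented for illustration only.
reference_ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# Same internal geometry, rigidly translated by (3, 4, 0): a 5 Å displacement.
shifted_ligand = [(x + 3.0, y + 4.0, z) for x, y, z in reference_ligand]

# Without fitting the ligand onto itself, the translation shows up in the RMSD.
ligand_rmsd = rmsd_no_fit(shifted_ligand, reference_ligand)
```

      Had the ligand been superimposed on its own reference first, the same pair of structures would give an RMSD of zero, which is why the choice of fitting selection needs to be reported alongside the value.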

      The authors considered the 7LCJ GLP1R-danuglipron complex as a fully active reference state instead of considering the receptor from a ternary complex with Gs. The ternary complex (7LCI) was indeed considered as a reference only in simulations of receptor-G protein recognition. 

      7LCJ and 7LCI are both fully active states. The main difference is that in 7LCJ, Gs coordinates were not deposited. Indeed, the RMSDs between the two structures computed on the TMD Cα atoms and on PF06882961 are 0.63 Å and 0.54 Å, respectively.

      Most importantly, the ternary complex chosen by the authors is not adequate as a reference for simulating the "opening" of the AHD because it bears a miniGs, hence, missing the AHD. In that framework, such an opening is rather vague and was not properly supervised by mwSuMD. The authors must repeat metrics supervisions by using, as a reference, the 6X1A ternary complex, which bears a displaced AHD. This would likely lead to a different path of GDP release.

      To the best of our knowledge, there is no evidence that a specific open conformation of the AHD is linked to GDP release. In support, we note that in GPCR ternary complexes, the AHD is usually not modelled because of its high flexibility. The only body of evidence we are aware of is that the AHD must open up to allow GDP release. For this reason, we decided to supervise the distance between the AHD and the Ras domain without using a reference.

      In the statement: "The AHD opening was simulated starting from the best GLP-1R:Gs binding mwSuMD replica" the definition "best binding" requires clarification.

      This has been amended, specifying that Replica 2 was considered the “best replica” due to its small deviation from the cryoEM structure.

      As for the case study on β2-AR-Gs recognition, I strongly suggest to eliminate it. However, I'd like to make some comments. The sentence: "the adrenergic β2 receptor (β2 AR) in an intermediate active state was downloaded from GPCRdb (https://gpcrdb.org/)" is vague as it does not indicate what intermediate active state structure was used. Since the goal of the case study was to probe the method in simulating receptor-G protein binding, it would have been better to start with a fully active state of the receptor like the 4LDO structure, employed by the authors only to extract epinephrine.

      mwSuMD is designed to provide insights into structural transitions. We started from an intermediate active state of β2-AR in complex with adrenaline because it resembles the most populated state stabilised by a full agonist according to NMR studies (DOI:10.1016/j.cell.2015.08.045); the fully-active β2-AR conformation is stabilized only after Gs binding. However, following the Reviewer’s suggestion, we have reduced the presented results for the β2-AR-Gs recognition.

      Also in this case, it is not clear if the supervised receptor-G protein distance is between the centroid of the whole 7-helix bundle and the centroid of Gs α5. It is not clear why the TM6 RMSD concerned only the cytosolic end of the helix and did not include the kink region. With that selection, to estimate the outward displacement, RMSD should have been computed without superimposing the considered portion (once all remaining Cα-atoms of the receptors are superimposed).

      As the Reviewer pointed out above, some knowledge of the system is required to set up mwSuMD. Using more generic metrics, as we did in this case (such as the distance between the whole TMD and Gs α5), represents a general approach applicable to other GPCRs that should allow orthogonal metrics to evolve independently of the supervision.

      As now specified in the text, the superimposition for RMSD calculation was performed on the Cα atoms of residues 40 to 140, hence not considering TM6.

      As for the A1R-Gi recognition, as already stated, I strongly suggest eliminating it. However, I'd like to add some comments. I would discourage the employment of an AlphaFold model for simulations deputed to model validation in general and, in particular, when high-resolution structures are available. In this case, the authors could have used the 1GP2 structure of heterotrimeric Gi no matter if from the rat species.

      Following the Reviewer’s suggestion, we have dramatically reduced the results presented for the A1R-Gi recognition. We considered 1GP2 for the simulations but H5 lacks the C-terminal six residues and therefore some extent of modelling was still necessary. However, we take the Reviewer’s comment on board and will consider it for future work.

      Also, the palmitoylation and geranylgeranylation process is quite tortuous and it is not clear why the NVT ensemble was employed in the second stage of equilibration. This is reflected also on the GLP1R case study.

      We have amended the text to clarify this passage. The second NVT stage is required for stabilizing the G protein and its orientation in the simulation box. The figure below shows that a plateau of the Cα RMSD during the NVT step was reached after 700 ns for both Gi (black) and Gs (orange).

      Author response image 1.

      Here, it is not clear if the RMSD of α5 of Gi was computed with or without superposition.

      The RMSD of α5 was computed after superimposing on the Cα atoms of A<sub>1</sub>R residues 40-140 (the less flexible region of the receptor). We have now amended the text to report this information.

      Reviewer #3 (Recommendations For The Authors):  

      Points to address:

      (1) Root Mean Square Deviation (RMSD) data are often reported as minimum values. It would be useful to provide the average value along the stable part of the trajectories. From the plots in Figure 2ab, it seems that the minimum values reported in the paper are very far from the average ones and thus represent special cases that are seldom reached during simulation. The authors should clarify this point;

      For the revised manuscript, we moved Figure 2 to the supplementary material and added average RMSD values for the most notable replicas in Figures 4e and S8a,b. As a reference, in the text, we now report RMSDs from our previous classic MD simulations (https://doi.org/10.1038/s41467-021-27760-0) of Gs:GLP-1R cryoEM structure (G<sub>α</sub> = 6.18 ± 2.40 Å; G<sub>β</sub> = 7.22 ± 3.12 Å; G<sub>γ</sub> = 9.30 ± 3.65 Å) which show how flexible G proteins bound to GPCRs are and give better context to the RMSD values we measured during mwSuMD simulations.
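
      The difference between reporting a minimum RMSD and reporting an average with its spread can be made concrete with a short sketch. The trace below is a hypothetical series of values invented for illustration (it is not simulation output), and `summarize_rmsd` is an assumed helper name.

```python
import statistics

def summarize_rmsd(values):
    """Minimum, mean and sample standard deviation of an RMSD time series."""
    return {
        "min": min(values),
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
    }

# HYPOTHETICAL RMSD values in Å along a trajectory, invented for illustration.
rmsd_trace = [6.0, 5.0, 7.0, 2.0, 6.0, 4.0]

# The single best frame (2 Å) sits well below the average (5 Å), so the
# minimum alone would overstate how close the trajectory stays to the reference.
summary = summarize_rmsd(rmsd_trace)
```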

      (2) The RMSD values reported in the paper always refer to single molecules or proteins. It would be useful to also report the RMSD computed over the whole complexes (ligand/GPCR or GPCR/G protein). It would provide a better metric for understanding the general distance between the results and the reference experimental structures;

      We have now removed the results sections for A<sub>1</sub>R and β<sub>2</sub> AR to focus on GLP-1R, whose RMSD is analyzed in detail in Figures 2, 3 and 4.

      (3) A number of computational works investigated the GPCR/G protein interaction and these studies should be cited and discussed. Examples are the works from Mafi et al. 2023 (doi: 10.1038/s41557-023-01238-6), Fleetwood et al. 2020 (doi: 10.1021/acs.biochem.9b00842), Calderon et al. 2023 and 2024 (doi: 10.1021/acs.jcim.3c00805 and doi: 10.1021/acs.jcim.3c01574), Maria-Solano and Choi 2023 (doi: 10.7554/eLife.90773.1), Mitrovic et al. 2023 (doi: 10.1021/acs.jpcb.3c04897), and D'Amore et al. 2023 (doi: 10.1101/2023.09.14.557711). Many of these works focused on the activation of B2AR and the interaction with its G protein. In addition, Maria-Solano and Choi 2023 and D'Amore et al. 2023 also characterized the rotation of TM6 during the A1R and A2AR activation. Therefore, the claim "To the best of our knowledge, this is the first time an MD simulation captures the TM6 rotation upon receptor activation as results reported so far are largely limited to the TM6 opening and kinking55." is untimely;

      We thank the Reviewer for the suggested references. We have added them to the introduction as examples of energy-biased (Calderon et al. 2023 and 2024, Maria-Solano and Choi, Mitrovic et al., D'Amore et al) or adaptive sampling (Fleetwood et al) approaches to GPCR. Since the above articles focus on β<sub>2</sub> AR and A<sub>1</sub>R, we do not discuss them in detail because the results sections for A<sub>1</sub>R and β<sub>2</sub> AR have been drastically reduced in the manuscript.

      We note that among the suggested references, only Mafi et al report on a simulated G protein (in a pre-formed complex) and none of the works sampled TM6 rotation without input of energy. However, we have removed the claim from the text.

      (4) In the discussion section, the authors claim that a distance-based approach can be employed when the structural data of the endpoints is limited. However, the results obtained from the distance-based protocol during the validation of the approach, which was done using V2R as a reference, are unsatisfying, as acknowledged by the authors themselves. For instance, the RMSD mode value reported for the AVP C alpha atoms with respect to 7DW9 is high, 0.7 nm, whereas the minimum value is 0.38 nm. In addition, some side chains are not oriented in the experimental conformation and might have a different interaction pattern with the receptor if compared with the experimental structure. Considering that in this case the endpoint is known, it is plausible that the performance of the method would degrade even further when data about the target structure is limited. In a real case scenario, the ligand binding mode is unknown and in such a case no RMSD matrix can be used. This represents the major concern of this study that is no prediction is provided, but only - rather inaccurate - reproduction of the known structural data;

      The goal of the first part of the work was to compare mwSuMD to SuMD to justify its application on ligand binding using a challenging case study like vasopressin. The general validation of the parent method SuMD as a predictive tool for ligand binding mode has been extensively reported over the years (a few examples: https://doi.org/10.1021/ci400766b ; https://doi.org/10.1021/acs.jcim.5b00702 ; https://doi.org/10.1038/s41598-020-77700-z) and fell beyond the scope of this work. 

      (5) In the discussion, the authors write "A complete characterization of the possible interfaces between GPCR monomers, which falls beyond the goal of the present work, should be achieved by preparing different initial unbound states characterized by divergent relative orientations between monomers to dynamically dock." It would be useful for the reader to refer to and cite here advanced computational approaches that allow a comprehensive sampling of GPCR dimerization independently from the starting conformation of the receptors. One example is coarse-grained metadynamics as shown in doi: 10.1038/s41467-023-42082-z;

      The A<sub>2A</sub>/D<sub>2</sub> receptor dimerization has been removed from the manuscript.

      (6) In many cases, it is not reported how residues missing from the experimental structures used to model the proteins were reconstructed. This information is important, considering that the authors comment on the results of their calculations on addressing these regions, such as in the case of B2AR. Furthermore, the authors did not report how their initial models were validated. The authors should also explain why they did not model the IC loops of A2AR and D2R;

      In the current version of the manuscript, for V2R ECL2 and GLP-1R, we specify that we produced 10 solutions with Modeller and considered the best one in terms of the DOPE score. 

      The only receptor model used, β<sub>2</sub> AR, is now presented as preliminary data focusing on Gs and avoiding any structural detail of the Gs recognition.

      As reported above the A2A-D2 dimerization has been removed from the manuscript.

      (7) In several cases, the authors state that residues never investigated before play an important role in the interaction between different proteins. An example is provided on page 6 for the B2AR/G protein association. Since this claim is quite significant, it would benefit from validation, at least for further calculations such as in silico mutagenesis studies. Another example is at the end of page 10 where the authors report a hidden interaction between D344 and R385 that is pivotal for Gs coupling by GLP-1R. Is there other evidence supporting this result (previously reported literature data, conservation rate of these residues, etc.)?;

      We have removed the supplementary table reporting B2AR/G protein interactions to reduce speculations and added a reference that reports GLP-1 EC50 reduction upon mutation of position 344 to Ala (https://doi.org/10.1021/acscentsci.3c00063).

      (8) The authors should provide a deeper discussion about the conformational rearrangement of GPCR and G protein during the coupling. In detail, the conformational changes of microswitch amino acids of GPCR (e.g., PIF, NPxxY, inactivating ionic lock) and alpha helix 5 of G proteins should be discussed in relation to the literature data and experimental structures;

      We have removed the A1R and b2 AR results to focus on GLP-1R. Key structural motifs in the polar central network and TM6 kink are analyzed more in detail in Figure 3.

      (9) The chronology of the conformational changes of GLP-1R is arbitrarily chosen. During the simulation, the RMSD values reported in Fig. 3 are high and do not demonstrate the full accomplishment of the simulation of the activation process of the receptor;

      We agree with the Reviewer that the GLP-1R inactive-to-active transition was not fully accomplished, compared to other work on class A GPCRs. Unlike class A, class B GPCRs represent a challenging system to work with in silico because inactive starting conformations (e.g., 6LN2) are extremely distant from the active ones (e.g., 7LCJ, 7LCI or 6X18), as demonstrated in Figure S6 for GLP-1R. Here we report the first attempt to model a class B GPCR activation mechanism starting from the inactive state, and even if the transition was not fully achieved, we believe it represents state-of-the-art simulations for this class of receptors.

      (10) It would be helpful for the reader not familiar with the employed technique that the authors explain in one sentence in the main text the pros and cons of using multiple walkers instead of single walker SuMD;

      We thank the Reviewer for the excellent suggestion. In the Discussion, we have now commented that: “more extensive sampling obtainable by seeding multiple parallel short simulations instead of a single simulation for batch”, while in the Methods we explain that “mwSuMD is designed to increase the sampling from a specific configuration by seeding user-decided parallel replicas (walkers) rather than one short simulation as per SuMD. Since one replica for each batch of walkers is always considered productive, mwSuMD gives more control than SuMD on the total wall-clock time used for a simulation. On the flip side, mwSuMD requires multiple GPUs to be the most effective, although any multi-threaded GPU can run more walkers on the same hardware keeping the sampling variety.”.
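
The walker-seeding idea described above can be illustrated with a toy sketch. It is not the authors' mwSuMD implementation: the "simulation" is a one-dimensional random walk standing in for a short MD run, the supervised metric is a hypothetical distance-like value, and the rule "keep the walker closest to zero" is an assumption made only for illustration.

```python
import random

def short_simulation(start, n_steps, rng):
    """Toy stand-in for a short MD run: a 1-D random walk from `start`."""
    x = start
    for _ in range(n_steps):
        x += rng.gauss(0.0, 1.0)
    return x

def mw_sumd_toy(start, n_batches, n_walkers, n_steps, seed=0):
    """Seed `n_walkers` short runs per batch and continue from the best one.

    "Best" is assumed to mean the smallest absolute value of the metric;
    the selection rule used by the real method is not given in the text.
    """
    rng = random.Random(seed)
    state = start
    history = [state]
    for _ in range(n_batches):
        # All walkers of a batch start from the same configuration.
        walkers = [short_simulation(state, n_steps, rng) for _ in range(n_walkers)]
        # Exactly one walker per batch is kept as the productive replica.
        state = min(walkers, key=abs)
        history.append(state)
    return history

if __name__ == "__main__":
    print(mw_sumd_toy(start=20.0, n_batches=5, n_walkers=4, n_steps=10))
```

Because one walker is always retained, the number of batches fixes the length of the final trajectory, which is the wall-clock control mentioned in the quoted Methods text. Setting `n_walkers=1` runs a single short simulation per batch.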

      Minor points to address:

      (11) Page 3: the following sentence is duplicated (also found on page 2) "GPCRs preferentially couple to very few G proteins out of 23 possible counterparts";

      (12) Page 20: Figure S13 refers to the QM validation of PF06882961 torsional angle, not to the image of the receptor conformational changes, which is instead Figure S14 (please correct figure caption).

      We thank the Reviewer for the accurate reading of the manuscript. These typos have been corrected.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study convincingly shows that the less common D-serine stereoisomer is transported in the kidney by the neutral amino acid transporter ASCT2 and that it is a noncanonical substrate for sodium-coupled monocarboxylate transporter SMCTs. With a multihierarchical approach, this important study further shows that Ischemia-Reperfusion Injury in the kidney causes a specific increment in renal reabsorption carried out, in part, by ASCT2.

      Public Reviews:

      Reviewer #1 (Public Review):

      Most amino acids are stereoisomers in the L-enantiomer, but natural D-serine has also been detected in mammals and its levels shown to be connected to a number of different pathologies. Here, the authors convincingly show that D-serine is transported in the kidney by the neutral amino acid transporter ASCT2 and as a non-canonical substrate for the sodium-coupled monocarboxylate transporter SMCTs. Although both transport D-serine, this important study further shows in a mouse model for acute kidney injury that ASCT2 has the dominant role.

      Strengths:

      The paper combines proteomics, animal models, ex vivo transport analyses, and in vitro transport assays using purified components. The exhaustive methods employed provide compelling evidence that both transporters can translocate D-serine in the kidney.

      Weakness:

      In the model for acute kidney injury, the SMCT proteins did not show a significant change in expression levels and were instead analysed based on other, circumstantial evidence. Although it is clear that SMCTs can transport D-serine, their physiological role is less obvious compared to ASCT2.

      We greatly value the reviewer's efforts and feedback in reviewing our manuscript. We acknowledge the reviewer's observation that the changes indicated by our proteomic results are not markedly pronounced. To reinforce our findings, we have incorporated an analysis of gene alterations at the single-cell level (snRNA-seq) from the publicly accessible IRI mouse model data (Figure supplement 7). The snRNA-seq data align with our proteomic data in terms of the general trend of gene/protein alterations, but reveal more substantial changes in both ASCT2 and SMCTs. These discrepancies might stem from the different quantification methods used, suggesting a possible underestimation in our label-free proteomic quantification. The differences we see between the functional changes in transporters and their quantification in proteomics can be explained by the unique challenges posed by membrane proteins. Post-translational modifications and the complex nature of multiple transmembrane domains often impact the accurate measurement of these proteins in proteomic studies. This complexity can lead to a mismatch between the actual functional changes occurring in the transporters and their perceived abundance or alterations as detected by proteomic methods (Figure 4A) (Schey KL et al. Biochemistry 2015, doi: 10.1021/bi301604j). However, this label-free quantitative proteomics approach is well-suited for our study, given its screening efficiency, compatibility with animal models, and the absence of a labeling requirement. We may consider incorporating alternative quantitative proteomic methods in future for a more thorough comparison. We have included these considerations in lines 351-356 of the revised manuscript.

      Manuscript lines 351-356

      “When evaluating the extent of gene/protein alterations between the control and IRI conditions, we observed that the gene alterations of both Asct2 and Smcts, as revealed by snRNA-sequencing, are more pronounced than the protein alteration ratios obtained from proteomics. This discrepancy may stem from difficulty in the quantification method, especially for membrane transport proteins in label-free quantitative proteomics.”

      Regarding the roles of ASCT2 and SMCTs in renal D-serine transport, snRNA-seq showed that ASCT2 expression in the controls is less than 10% of the cell population. We suggest that ASCT2 contributes to D-serine reabsorption because of its high affinity and SMCTs (SMCT1 and SMCT2) would play a role in D-serine reabsorption in the cells without ASCT2 expression. In addition, we included other factors (the turnover rate and the presence of local canonical substrates) that may determine the capability of D-serine reabsorption. We have included this suggestion in the Discussion lines 386-404.

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 × 10<sup>9</sup> AU) and Smct2 (1.6 × 10<sup>8</sup> AU) were significantly higher than that of Asct2 (1.5 × 10<sup>7</sup> AU) in control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”

      Reviewer #2 (Public Review):

      Summary:

      The manuscript "A multi-hierarchical approach reveals D-serine as a hidden substrate of sodium-coupled monocarboxylate transporters" by Wiriyasermkul et al. is a resubmission of a manuscript, which focused first on the proteomic analysis of apical membrane isolated from mouse kidney with early Ischemia-Reperfusion Injury (IRI), a well-known acute kidney injury (AKI) model. In the second part, the transport of D-serine by Asct2, Smct1, and Smct2 has been characterized in detail in different model systems, such as transfected cells and proteoliposomes.

      Strengths:

      A major problem with the first submission was the explanation of the link between the two parts of the manuscript: it was not very clear why the focus on Asct2, Smct1, and Smct2 was a consequence of the proteomic analysis. In the present version of the manuscript, the authors have focused on the expression of membrane transporters in the proteome analysis, thus making the reason for studying Asct2, Smct1, and Smct2 transporters more clear. In addition, the authors used 2D-HPLC to measure plasma and urinary enantiomers of 20 amino acids in plasma and urine samples from sham and Ischemia-Reperfusion Injury (IRI) mice. The results of this analysis demonstrated the value of D-serine as a potential marker of renal injury. These changes have greatly improved the manuscript and made it more convincing.

      We deeply appreciate the reviewer’s comments on the manuscript. We have responded to the recommendations one by one in the later section.

      Reviewer #3 (Public Review):

      Summary:

      The main objective of this work has been to delve into the mechanisms underlying the increment of D-serine in serum, as a marker of renal injury.

      Strengths:

      With a multi-hierarchical approach, the work shows that Ischemia-Reperfusion Injury in the kidney causes a specific increment in renal reabsorption of D-serine that, at least in part, is due to the increased expression of the apical transporter ASCT2. In this way, the authors revealed that SMCT1 also transports D-serine.

      The experimental approach and the identification of D-serine as a new substrate for SMCT1 merit publication in eLife.

      The manuscript also supports that increased expression of ASCT2, even together with the parallel decreased expression of SMCT1, in renal proximal tubules underlies the increased reabsorption of D-serine responsible for the increment of this enantiomer in serum in a murine model of Ischemia-Reperfusion Injury.

      Weaknesses:

      It remains to be clarified whether ASCT2 has substantial stereospecificity in favor of D- versus L-serine to sustain a ~10-fold decrease in the ratio D-serine/L-serine in the urine of mice under Ischemia-Reperfusion Injury (IRI).

      It is not clear how the increment in the expression of ASCT2, in parallel with the decreased expression of SMCT1, results in increased renal reabsorption of D-serine in IRI.

      We sincerely appreciate the reviewer’s comment on the manuscript. The alteration of D-/L-serine ratios depends on several factors, including protein expression levels at both apical and basolateral sides, properties of the transporters (e.g. transport affinities, substrate stereoselectivities), and the expression of DAAO (D-amino acid oxidase), which selectively degrades D-amino acids. Moreover, the mechanism becomes more complicated when the transport systems of L- and D-enantiomers are different and have distinct stereoselectivities, as in the case of serine. Future studies are required to complete the mechanism. However, we would like to explore the mechanism based on the current knowledge.

      From this study, we identified ASCT2 and SMCTs (SMCT1 and SMCT2) as D-serine transport systems. We showed that SMCT1 prefers D-serine. Although we did not analyze ASCT2 stereoselectivity, previous studies indicate that ASCT2 recognizes both D- and L-serine with high affinities and slightly prefers the L-enantiomer (Km of 18.4 µM for L-serine (Utsunomiya-Tate et al. J Biol Chem 1996) and 167 µM for D-serine (Foster et al. PLOS ONE 2016), both in oocyte expression systems; IC50 of 0.7 mM for L-serine and 4.9 mM for D-serine in HEK293 expression systems (Foster et al. PLOS ONE 2016)). The proteomics showed an increase of ASCT2 (1.6-fold increase) and a decrease of SMCTs (1.7-fold decrease in SMCT1, and 1.3-fold decrease in SMCT2) in IRI conditions. The table below summarizes D-serine transport by ASCT2 and SMCTs.

      In the case of L-serine, ASCT2 and B0ATs (in particular B0AT3) have been revealed as L-serine transport systems in the kidneys (Bröer et al. Physiol Rev 2008; Singer et al. J Biol Chem 2009). Proteomics showed that B0ATs have higher expression levels than ASCT2 supporting the idea that B0ATs are the main L-serine transport system (Table S1: Abundance of B0AT1 = 1.34E+09, B0AT3 = 2.13E+08, ASCT2 = 1.46E+07). In IRI conditions, B0AT3 decreased 1.8 fold and B0AT1 decreased 1.1 fold. From these results, we included the contribution of B0ATs in L-serine transport in Author response table 1.

      Author response table 1.

      Taken together, we suggest that high ratios of D-/L-serine in IRI conditions are a combined result of 1) increase of D-serine reabsorption by ASCT2 enhancement and SMCTs reduction and 2) decrease of L-serine reabsorption by B0ATs. We have included this suggestion in the Discussion lines 438-451.

      Manuscript lines 438-451

      “The enantiomeric profiles of serine revealed distinct plasma D/L-serine ratios, with low ratios in the normal control but elevated ratios in IRI, despite the weak stereoselectivity of ASCT2 (Figure 1B). This observation suggested differential renal handling of D-serine compared to L-serine. While we identified SMCTs as a D-serine transport system, it has been reported that L-serine reabsorption is mediated by B0AT3 (Singer et al., 2009). We propose that the alterations in plasma and urinary D/L-serine ratios are the combined outcomes of: 1) transport systems for L-serine, and 2) transport systems for D-serine. In normal kidneys, the low plasma D/L-serine ratios could result from the efficient reabsorption of L-serine by B0AT3, coupled with the DAAO activity that degrades intracellular D-serine reabsorbed by SMCTs. In IRI conditions, our enantiomeric amino acid profiling revealed low plasma L-serine and high urinary L-serine (Figure supplements 1B, 2B). Additionally, the proteomic analysis indicated a reduction in B0AT3 levels (4h IRI/sham = 0.56 fold; 8h IRI/sham = 0.65 fold; Table S1). These observations suggest that the low L-serine reabsorption in IRI is a result of B0AT3 reduction.”
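
The abundances and fold changes quoted in this response can be related by simple arithmetic. The sketch below only re-uses numbers stated above (Table S1 abundances and the IRI/sham ratios for B0AT3); the function names are illustrative, and label-free intensities are treated as directly comparable only as a rough indication, as the authors themselves caution.

```python
# Label-free abundances (arbitrary units) quoted from Table S1 in the response.
ABUNDANCE_AU = {"B0AT1": 1.34e9, "B0AT3": 2.13e8, "ASCT2": 1.46e7}

def relative_abundance(protein, reference="ASCT2"):
    """Abundance of `protein` expressed as a multiple of `reference`."""
    return ABUNDANCE_AU[protein] / ABUNDANCE_AU[reference]

def ratio_to_fold_decrease(ratio):
    """Convert an IRI/sham ratio below 1 into an n-fold decrease."""
    return 1.0 / ratio

if __name__ == "__main__":
    for name in ("B0AT1", "B0AT3"):
        print(f"{name} / ASCT2 = {relative_abundance(name):.1f}")
    for label, ratio in (("4h", 0.56), ("8h", 0.65)):
        print(f"B0AT3 {label}: {ratio_to_fold_decrease(ratio):.1f}-fold decrease")
```

The 4 h ratio of 0.56 corresponds to the 1.8-fold decrease quoted for B0AT3 earlier in the response.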

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is a thorough study that was reviewed previously under the old system. I think the authors have strengthened their findings and have no further suggestions.

      We appreciate reviewer 1 for his/her effort and comments, which greatly contributed to improving this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The experiments seem to me to have been well performed and the data are readily available.

      Weaknesses:

      More than weakness I would speak of discussion points: I have a few suggestions that may help to make the paper more accessible to a general audience.

      (1) In the Introduction, when the authors introduce the term "micromolecules", it would be beneficial to provide a precise definition or clarification of what they mean by this term. Adding a brief explanation may help the reader to better understand the context.

      Following the reviewer’s comment, we have included the explanation of the micromolecule and membrane transport proteins in lines 41-43.

      Manuscript lines 41-43

      “Membrane transport proteins function to transport micromolecules such as nutrients, ions, and metabolites across membranes, thereby playing a pivotal role in the regulation of micromolecular homeostasis.”

      (2) In line 91, I suggest specifying that this is a renal IRI model.

      Following the reviewer’s comment, we have added the information that it is a renal IRI model of AKI (lines 90-92).

      Manuscript lines 90-92

      “We applied 2D-HPLC to quantify the plasma and urinary enantiomers of 20 amino acids of renal ischemia-reperfusion injury (IRI) mice, a model of AKI and AKI-to-CKD transition (Sasabe et al., 2014; Fu et al., 2018).”

      (3) Lines 167-168 state that Asct2 is localised to the apical side of the renal proximal tubules. Is there any expression of Asct2 in other nephron segments?

      To our knowledge, there is no report of ASCT2 expression in other nephron segments. Our immunofluorescence data for ASCT2 staining of the whole kidney at low magnification and of another region of Figure 3 (below), as well as immunohistochemistry from the Human Protein Atlas (updated Jun 9th, 2023), did not show a strong signal of ASCT2 expression in regions other than the proximal tubules. Thus, we conclude that ASCT2 is mainly expressed in proximal tubules, but not in other nephron regions.

      Author response image 1.

      (4) Lines 225-226: Have the authors expressed the candidate genes in HEK293 cells with ASCT2 knockdown?

      This experiment was done by expressing the candidate genes in the presence of endogenous ASCT2. We have added the information in lines 225-227 to emphasize this process.

      Manuscript lines 225-227

      “Based on this finding, we utilized cell growth determination assay as the screening method even in the presence of endogenous ASCT2 expression. HEK293 cells were transfected with human candidate genes without ASCT2 knockdown.”

      (5) Lines 254-255: why was D-serine transport enhanced by ASCT2 knockdown in FlpInTRSMCT1 or 2 cells?

      We thank the reviewer for pointing this out, and we apologize for the confusion in the text. The total amount of D-serine uptake in the cells was not enhanced, but the net uptake (uptake after subtraction of the background) was increased. This enhancement is a result of the lower background caused by ASCT2 knockdown. We have revised the text and explained this result in more detail (lines 256-258).

      Manuscript lines 256-258

      “In the cells with ASCT2 knockdown, the background level was lower, thereby enhancing the D-[3H]serine transport contributed by both SMCT1 and SMCT2 (the net uptake after subtracted with background) (Figure 5C).”

      (6) Line 265: The low affinity of SMCT1 for D-serine alone makes it an unlikely transporter for urinary D-serine.

      We acknowledge the reviewer’s concern about the low affinity of SMCT1. However, a Km in the mM range is widely accepted for several low-affinity amino acid transporters such as proton-coupled amino acid transporter PAT1 (Km = 2 – 5 mM; Miyauchi et al. Biochem J 2010), cationic amino acid transporter CAT2A (Km = 3 – 4 mM; Closs et al. Biochem 1997), and large-neutral amino acid transporter LAT4 (Km = 17 mM; Bodoy et al. J Biol Chem 2005). In the kidneys, many compounds are well known to be reabsorbed by low-affinity but high-capacity (high-expression) transporters. Similarly, D-serine was reported to be reabsorbed by a low-affinity transporter (Kragh-Hansen and Sheikh, J Physiol 1984; Shimomura et al. BBA 1988; Silbernagl et al. Am J Physiol Renal Physiol 1999). Moreover, the amino acid profile showed urinary D-serine in the range of 100 – 200 µM (Figure supplement 2). This concentration range could drive SMCT1 function (Figure 5). Combined with the high and ubiquitous expression of SMCT1, we propose that SMCT1 is a low-affinity but high-capacity D-serine transporter in the kidneys.

      snRNA-seq is a method that can directly compare the expression levels between different genes within the same cells. From Figure supplement 7, expression of SMCT1 is much more abundant than that of ASCT2. ASCT2 was present in less than 10% of the cell population. It is possible that the 90% of cells that do not express ASCT2 use SMCT1 to reabsorb D-serine.
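
To make the affinity argument concrete, the sketch below evaluates the Michaelis-Menten saturation term S / (Km + S) at the urinary D-serine concentrations quoted above. It assumes simple single-substrate Michaelis-Menten kinetics and uses only the Km values cited in this response; Vmax (which depends on expression level and turnover rate) is not known here, so only the fraction of Vmax is computed.

```python
def fractional_saturation(substrate_um: float, km_um: float) -> float:
    """Fraction of Vmax reached at a substrate concentration (Michaelis-Menten)."""
    return substrate_um / (substrate_um + km_um)

# Km values quoted in the response, expressed in µM.
KM_ASCT2_UM = 167.0    # Foster et al., 2016
KM_SMCT1_UM = 3390.0   # 3.39 mM, Figure 5E

if __name__ == "__main__":
    # Urinary D-serine range cited from Figure supplement 2.
    for d_serine_um in (100.0, 200.0):
        asct2 = fractional_saturation(d_serine_um, KM_ASCT2_UM)
        smct1 = fractional_saturation(d_serine_um, KM_SMCT1_UM)
        print(f"{d_serine_um:.0f} µM: ASCT2 {asct2:.2f} of Vmax, SMCT1 {smct1:.3f} of Vmax")
```

Under these assumptions SMCT1 works at a few percent of its Vmax in this range, so its contribution to reabsorption rests on a large Vmax, that is, on the high expression the response emphasizes.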

      We have revised the Discussion regarding this comment (lines 386-404).

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 × 10<sup>9</sup> AU) and Smct2 (1.6 × 10<sup>8</sup> AU) were significantly higher than that of Asct2 (1.5 × 10<sup>7</sup> AU) in the control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”

      (7) Line 316: The authors state that there is a high tubular D-serine reabsorption in IRI and in line 424 there is an inactivation of DAAO during the pathology. This suggests that there is a reabsorption of D-serine mediated by a transport system in the basolateral membrane domain of proximal tubular cells. Do the authors have any information about this transporter?

      We agree with the reviewer that transporters at the basolateral membrane are important to complete the D-serine reabsorption in the kidney, and have included this issue in the original manuscript. We stated that transport systems at the basolateral side are necessary to be analyzed in order to complete the picture of D-serine transport systems in the kidney (lines 481-483 of the revised manuscript). However, we did not have any strong candidates for basolateral D-serine transport systems. Because we analyzed the proteome of BBMV, which concentrates on the apical membrane proteins, the analysis did not detect several transporters at the basolateral side.

      (8) In lines 462-463, the authors state: "It is suggested that PAT1 is less active at the apical membrane where the luminal pH is neutral". However, the pH of urine in the proximal tubules is normally acidic due to the high activity of NH3. I suggest rewording this sentence.

      Thank you for your comment. The proximal tubule (PT) is the first and main region that maintains acid-base homeostasis in the kidney. In PT cells, NH3 secretes H+ to titrate luminal HCO3- and creates CO2, which is absorbed into PT cells and produces "new intracellular HCO3-", which is subsequently reabsorbed into the blood. Although ion fluxes in the PT serve to maintain pH homeostasis, pH regulation in both the lumen and the interior of PT cells is highly dynamic. We agree with the reviewer and have revised the text to emphasize the pH around PT segments, rather than the final urine pH, leaving the discussion open to the possibility of PAT1 function in the PT of normal kidneys (lines 474-481).

      Manuscript lines 474-481

      “PAT1, a low-affinity proton-coupled amino acid transporter (Km in mM range), has been found at both sub-apical membranes of the S1 segment and inside of the epithelia (The Human Protein Atlas: https://www.proteinatlas.org; updated on Dec 7th, 2022) (Sagné et al., 2001; Vanslambrouck et al., 2010). PAT1 exhibits optimum function at pH 5 - 6 but very low activity at pH 7 (Miyauchi et al., 2005; Bröer, 2008b). Future research is required to address the significance of PAT1 on D-serine transport in the proximal tubule segments where pH regulation is known to be highly dynamic (Boron, 2006; Nakanishi et al., 2012; Bouchard and Mehta, 2022; Imenez Silva and Mohebbi, 2022).”

      Reviewer #3 (Recommendations For The Authors):

      The authors proposed that the increased expression of ASCT2, even together with the decreased expression of SMCT1/2, causes the increased renal reabsorption of D-serine that occurs in IRI. In the discussion, the main argument to sustain this hypothesis is the higher apparent affinity for D-serine of ASCT2 (<200 uM Km) versus SMCT1 (3.4 mM Km). In the Discussion section (page 18, 1st complete paragraph), the authors indicate that the Mass Spec intensities of SMCT1 and 2 are two and one order of magnitude higher, respectively, than that of ASCT2. This suggests that SMCT1 is clearly more expressed than ASCT2 in control conditions. IRI increases ASCT2 protein expression in brush-border membrane vesicles from kidney 1.6-fold and decreases that of SMCT1 to 0.6-fold. How would these fold changes, even taking into account the lower Km of ASCT2 versus SMCT1, explain the dramatic changes in the D-/L-serine ratios in plasma and urine in IRI? The authors might discuss whether other transport characteristics, even unknown (e.g., a higher turnover rate of ASCT2 vs SMCT1), would also contribute to the higher D-serine reabsorption in IRI.

      SMCT1 shows some enantiomer selectivity for D- vs L-serine. At 50 uM concentration the transport is almost double for D- vs L-serine, but is ASCT2 stereoselective between the two enantiomers of serine? Some of the authors of this manuscript showed in a previous paper that the basolateral transporter Asc1 also participates in the accumulation of D-serine in serum caused by renal tubular damage. (Serum D-serine accumulation after proximal renal tubular damage involves neutral amino acid transporter Asc-1. Suzuki M et al. Sci Rep. 2019 Nov 13;9(1):16705 (PMID: 31723194)). Asc1 shows no stereoselectivity between L- and D-serine. Can the authors discuss possible mechanisms resulting in greater renal reabsorption of D-serine than L-serine in IRI with the participation of transporters with modest stereoselectivity for D- vs L-serine?

      We appreciate the reviewer’s comments on the degree of protein alteration in proteomics, the functional contributions of ASCT2 and SMCTs, and the alteration of D/L ratios. We have included the possibilities of the technical concerns and the discussion on the roles of ASCT2 and SMCTs as follows.

      • Regarding the expression levels, proteomics and snRNA-seq showed the same tendency: ASCT2 increases and SMCTs decrease in IRI conditions. However, the degrees of alteration are more pronounced in snRNA-seq. This may be due to the difference in quantification methods and probably points to an underestimation of membrane transport proteins in label-free proteomics. The accuracy of protein quantification in label-free proteomics is often impacted by the presence of post-translational modifications and multiple transmembrane domains, as in the case of membrane transport proteins (Schey KL et al. Biochemistry 2015, doi: 10.1021/bi301604j). Alternative methods of quantitative proteomics may be added in the future for a more thorough comparison. We have added this issue in lines 351-356 of the revised version.

      Manuscript lines 351-356

      “When evaluating the extent of gene/protein alterations between the control and IRI conditions, we observed that the gene alterations of both Asct2 and Smcts, as revealed by snRNA-sequencing, are more pronounced than the protein alteration ratios obtained from proteomics. This discrepancy may stem from difficulty in the quantification method, especially for membrane transport proteins in label-free quantitative proteomics.”

      • For the functional contributions of ASCT2 and SMCTs in the kidney, we acknowledge the reviewer’s concern about the low affinity of SMCT1. Following the reviewer’s comment, we have included other factors besides transport affinities, e.g. expression levels and turnover rates of the transporters. From the results of both proteomics and snRNA-seq, ASCT2 expression is significantly lower than that of SMCTs in normal conditions. snRNA-seq showed that ASCT2 was present in less than 10% of the cell population (Figure supplement 7). We propose that most of the cells that do not express ASCT2 may use SMCT1 to reabsorb D-serine. This topic was included in the revised manuscript lines 386-404.

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 x 109 AU) and Smct2 (1.6 x 108 AU) were significantly higher than that of Asct2 (1.5 x 107 AU) in the control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”

      • As for the dramatic alterations of D/L-serine ratios juxtaposed with minimal changes in ASCT2 and SMCTs expression level, we cautiously refrain from drawing a definitive conclusion regarding the entire mechanism. This caution is grounded in the scientific understanding of a comprehensive elucidation of both L-serine transport systems and D-serine transport systems at both apical and basolateral membranes. Nevertheless, we would like to suggest a mechanism at the apical membrane based on the current knowledge.

      For D-serine transport systems, we found contributions of ASCT2 and SMCTs in this study. Meanwhile, L-serine transport was previously reported to be mediated mainly by neutral amino acid transporters, in particular B0AT3 (Bröer et al. Physiol Rev 2008; Singer et al. J Biol Chem 2009). Hence, the mechanism behind the alterations of D/L-serine ratios should include B0AT3 functions as well. In IRI conditions, B0AT3 decreased 1.8 fold. We suggest that high ratios of D-/L-serine in IRI conditions are a combined outcome of 1) increase of D-serine reabsorption by ASCT2 enhancement and SMCTs reduction, and 2) decrease of L-serine reabsorption by B0AT3. We have included this suggestion in the Discussion lines 438-451.

      Manuscript lines 438-451

      “The enantiomeric profiles of serine revealed distinct plasma D/L-serine ratios, with low ratios in the normal control but elevated ratios in IRI, despite the weak stereoselectivity of ASCT2 (Figure 1B). This observation suggested the differential renal handling of D-serine compared to L-serine. While we identified SMCTs as a D-serine transport system, it has been reported that L-serine reabsorption is mediated by B0AT3 (Singer et al., 2009). We propose that the alterations in plasma and urinary D/L-serine ratios are the combined outcomes of: 1) transport systems for L-serine, and 2) transport systems for D-serine. In normal kidneys, the low plasma D/L-serine ratios could result from the efficient reabsorption of L-serine by B0AT3, coupled with the DAAO activity that degrades intracellular D-serine reabsorbed by SMCTs. In IRI conditions, our enantiomeric amino acid profiling revealed low plasma L-serine and high urinary L-serine (Figure supplements 1B, 2B). Additionally, the proteomics analysis indicated a reduction in B0AT3 levels (4h IRI/sham = 0.56 fold; 8h IRI/sham = 0.65 fold; Table S1). These observations suggest that the low L-serine reabsorption in IRI is a result of B0AT3 reduction.”

      • In the case of Asc-1, it was reported to be a D-serine transporter in the brain (Rosenberg et al. J Neurosci 2013). Suzuki et al. 2019 showed the increase of Asc-1 in cisplatin-induced tubular injury. Notably, the mRNA of Asc-1 is predominantly found in Henle’s loop, distal tubules, and collecting ducts but not in proximal tubules, and its protein expression level is dramatically low in the kidney (Human Protein Atlas: update on Jun 19, 2023). Furthermore, in this study, Asc-1 expression was not detected in the brush border membrane proteome. Consequently, we have decided not to include Asc-1 in the Discussion of this study, which primarily focuses on the proximal tubules.
    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editor for their constructive criticism. Based on these suggestions, we have thoroughly reworked the manuscript. The main changes include, but are not limited to, the following:

      (1) We have corrected the mistakes mentioned by the reviewers on a point-by-point basis.

      (2) We have provided additional experimental evidence to explain the rationale behind selecting five miRNAs for q-PCR validation. Furthermore, we have elaborated on the reasons for focusing primarily on research related to cartilage.

      (3) In response to concerns regarding overinterpretation in the manuscript, we have revised the relevant descriptions to be more precise. Furthermore, we have added some details to our methods, including results showing the conservation of miR-199b-5p sequences between human and mouse.

      (4) We have provided additional details on the experiments, including the process for predicting target genes, timing of chondrocyte culture and other experimental operations.

      (5) Finally, we have made additional revisions to the details of the figures to avoid any distortions and enhance the precision of the language.

      Please find our point-by-point responses to the reviewers’ comments below. The changes can also be tracked in the modified manuscript. We believe that the manuscript has been substantially improved by this revision.

      eLife assessment

      The manuscript provides interesting evidence that miR-199b-5p regulates osteoarthritis and as such it may be considered as a potential therapeutic target. This finding may be useful to further advance the field.

      Thank you for your positive comments.

      Although the study is considered potentially clinically relevant, the evidence provided was deemed insufficient and incomplete to support the conclusions drawn by the authors.

      Thank you for your critical comments and constructive advice. We have responded point by point to the reviewers’ questions and thoroughly reworked our manuscript. We hope the revised manuscript meets the criteria for publication in eLife.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors observed that miR-199b-5p is elevated in osteoarthritis (OA) patients. They also found that overexpression of miR-199b-5p induced OA-like pathological changes in normal mice and inhibiting miR-199b-5p alleviated symptoms in knee OA mice. They concluded that miR-199b-5p is not only a potential micro-target for knee OA but also provides a potential strategy for the future identification of new molecular drugs.

      Thanks for your comment.

      Strengths:

      The data are generated from both human patients and animal models.

      Thanks for the positive comment.

      Weaknesses:

      The data presented in this manuscript is not solid enough to support their conclusions. There are several questions that need to be addressed to improve the quality of this study.

      The following questions need to be addressed to improve the quality of the study.

      (1) Exosomes were characterized by electron microscopy and western blot analysis (for CD9, CD63, and CD81). However, figure S1 only showed two sample WB results, there are no positive or negative controls, and the WB figure is unclear.

      Thank you for your suggestion. We acknowledge that a comprehensive identification of extracellular vesicles should include both positive and negative samples. However, in some of the initial studies we referenced, positive and negative controls were not mentioned (1, 2). In our study, we identified extracellular vesicles using a combination of electron microscopy, nanoparticle tracking analysis, and detection of exosome markers. We agree that having negative samples would make our results more convincing, and we will include a negative control group in our future experiments. Additionally, we have provided clearer images in the revised version (supplemental fig1 A).

      Reference

      (1) Ying W, Riopel M, Bandyopadhyay G, et al. Adipose Tissue Macrophage-Derived Exosomal miRNAs Can Modulate In Vivo and In Vitro Insulin Sensitivity. Cell. 2017;171(2).

      (2) Fang T, Lv H, Lv G, et al. Tumor-derived exosomal miR-1247-3p induces cancer-associated fibroblast activation to foster lung metastasis of liver cancer. Nature Communications. 2018;9(1):191.

      (2) The sequencing of miRNAs in serum exosomes showed that 88 miRNAs were upregulated and 89 miRNAs were downregulated in KOA patients compared with the control group based on fold change > 1.5 and p < 0.05. Figure 2 legend did not clearly elucidate what those represent and why the authors chose those five miRNAs to further validate although they did mention it with several words in line 108 'based on the p-value and exosomal'.

      In fact, our study included two additional groups: the acupuncture treatment group (4 weeks of continuous acupuncture treatment) and the waiting treatment group (no intervention, followed by acupuncture treatment after 4 weeks), in addition to the healthy control and knee osteoarthritis (OA) patient groups. After comparing these four groups, we found that 11 genes (hsa-miR-504-3p, hsa-miR-1915-3p, hsa-miR-103a-2-5p, hsa-miR-887-3p, hsa-miR-1228-5p, hsa-miR-34c-3p, hsa-miR-3168, hsa-miR-518e-3p, hsa-miR-1296-5p, hsa-miR-338-3p, and hsa-miR-199b-5p) were upregulated in KOA patients but downregulated after acupuncture treatment, with no change in the waiting treatment group. Additionally, 7 genes (hsa-miR-448, hsa-miR-514a-3p, hsa-miR-4440, hsa-let-7f-5p, hsa-let-7a-5p, hsa-let-7d-5p, and hsa-miR-15b-3p) were downregulated in KOA patients but upregulated after acupuncture treatment, with no change in the waiting treatment group. Considering the improvement in clinical symptoms of KOA patients after acupuncture treatment, we believe that these 18 genes are of significant value. Based on overall expression abundance and species specificity, we finally selected 5 genes, namely the 5 genes mentioned in this article. We have included this result in Supplementary Figure 5 (Fig. S5).

      Author response image 1.

      Venn diagram showing differentially expressed miRNAs in the OA group compared with healthy patients and patients who recovered after acupuncture treatment.
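
      The selection logic described in this response amounts to set operations across the group comparisons. A minimal sketch follows, assuming each comparison has already been reduced to a set of miRNA names; the function name and the example sets are hypothetical, and only some of the miRNA names are taken from the response.

```python
def treatment_responsive(changed_in_koa: set, reversed_by_acupuncture: set,
                         changed_in_waiting: set) -> set:
    """miRNAs altered in KOA and reversed by acupuncture, but unchanged while waiting."""
    return (changed_in_koa & reversed_by_acupuncture) - changed_in_waiting


# Hypothetical call sets for illustration; "placeholder-miR-*" names are made up.
up_in_koa = {"hsa-miR-199b-5p", "hsa-miR-338-3p", "placeholder-miR-A", "placeholder-miR-B"}
down_after_acupuncture = {"hsa-miR-199b-5p", "hsa-miR-338-3p", "placeholder-miR-B"}
changed_in_waiting_group = {"placeholder-miR-B"}

candidates = treatment_responsive(up_in_koa, down_after_acupuncture, changed_in_waiting_group)
print(sorted(candidates))
```

      The same function covers the opposite direction (downregulated in KOA, upregulated after acupuncture) by passing the corresponding sets.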

      (3) In Figure 3 legend and methods, the authors did not mention how they performed the cell viability assay. What cell had been used? How long were they treated and all the details? Other figure legends have the same problem without detailed information.

      Thank you for your suggestions. In Figure 3, cell viability was determined using the CCK-8 assay. We used second-generation chondrocytes for this analysis. The chondrocytes were obtained from mice aged 3-5 days after birth. The cartilage tissues were extracted, and the cells were cultured in complete medium after digestion with collagenase. The detailed description of the cell viability assay, cell culture procedures, specific timing, and treatment methods of the cells used can be found in our revised manuscript. (pages 14-15, lines 304-313)

      Besides, we have made thorough revisions to all figure legends to provide a clearer explanation of the relevant content.

      (4) The authors claimed that Gcnt2 and Fzd6 are two target genes of miR-199b-5p. However, there is no convincing evidence such as western blot to support their bioinformatics prediction.

      In the current study, we first identified six potential target genes by intersecting the predicted targets obtained from six bioinformatics websites. Subsequently, q-PCR was employed to test all six genes, revealing two genes with significant changes, namely Fzd6 and Gcnt2. We then predicted the binding sites of these genes and validated their existence through luciferase assays. Moreover, we examined the expression of these two potential targets in human KOA samples using a human database and found them to be expressed specifically in the samples. These results suggest that Fzd6 and Gcnt2 are potential target genes for KOA. However, we did not perform western blot assays to verify these results. Based on your suggestions, we have further discussed the limitations of our study in this regard and proposed future research strategies.

      (5) To verify the binding site on 3'UTR of two potential targets, the authors designed a mouse sequence for luciferase assay, but not sure if it is the same when using a human sequence.

      Thank you for your great advice. We carried out a comparative analysis of sequence conservation between human and mouse, and found that the binding site on the 3'UTR matches the human sequence very well. The sequence conservation between hsa_miR-199b-5p and mmu_miR-199b-5p was as high as 95.65%. We added the methods and results to the revised manuscript. (page 9, lines 181-184; page 17, lines 361-365) (supplemental fig6).

      In detail: First, the sequence information of mmu_miRNA-199b-5p was used to locate the human homologous sequence in the UCSC database. The homologous sequence was found to be located in the human genome at chr9:128244721-128244830 (supplemental fig6 A). Based on this positional information and the source gene, a further comparison was conducted in miRbase to identify the nearest miRNA at this position of the human genome. It was discovered that hsa_miR-199b-5p is positionally conserved and located at chr9:128244721-128244830 (supplemental fig6 B). The sequence of hsa_miR-199b-5p was obtained from the miRbase database (supplemental fig6 C), and a comparative analysis was performed between the human and mouse sequences (supplemental fig6 D). Besides being positionally conserved, the sequence conservation between hsa_miR-199b-5p and mmu_miR-199b-5p was as high as 95.65%, indicating good sequence conservation.

      Author response image 2.

      (A) By using the sequence information of mmu_miRNA-199b-5p, we located the position of its human homologous sequence in the UCSC database. (B) Based on the positional information and the source gene, we further aligned this position with the closest miRNA in miRbase. (C) We compared the sequences of hsa_miR-199b-5p and mmu_miR-199b-5p. (D) Conservation analysis was performed to compare the sequence conservation of miR-199b-5p.
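
      For orientation, a conservation figure of 95.65% is what a single mismatch across a 23-nucleotide sequence gives (22/23). The sketch below computes percent identity for two equal-length sequences. The human sequence is the one listed later in this response; the comparison sequence is a hypothetical single-substitution variant built for illustration, not the actual mouse entry from miRbase, and the response does not state how its figure was calculated.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions with identical nucleotides in two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length (no gaps handled here)")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


HSA_MIR_199B_5P = "CCCAGUGUUUAGACUAUCUGUUC"  # hsa-miR-199b-5p, as listed in this response
# Hypothetical variant with one substituted nucleotide (NOT the real mouse sequence).
ONE_MISMATCH_VARIANT = HSA_MIR_199B_5P[:-1] + "A"

identity = percent_identity(HSA_MIR_199B_5P, ONE_MISMATCH_VARIANT)
print(f"{identity:.2f}% identity over {len(HSA_MIR_199B_5P)} nt")
```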

      Reviewer #2 (Public Review):

      Summary:

      The authors identified miR-199b-5p as a potential OA target gene using serum exosomal small RNA-seq from human healthy and OA patients. Their RNA-seq results were further compared with publicly available datasets to validate their finding of miR-199b-5p. In vitro chondrocyte culture with miR-199b-5p mimic/inhibitor and in vivo animal models were used to evaluate the function of miR-199b-5p in OA. The possible genes that were potentially regulated by miR-199b-5p were also predicted (i.e., Fzd6 and Gcnt2) and then validated by using Luciferase assays.

      We greatly appreciate Reviewer #2's constructive comments.

      Strengths:

      (1) Strong in vivo animal models including pain tests.

      (2) Validates the binding of miR-199b-5p with Fzd6 and binding of miR-199b-5p with Gcnt2.

      Thanks for the positive comments.

      Weaknesses:

      (1) The authors may overinterpret their results. The current work shows the possible bindings between miR-199b-5p and Fzd6 as well as bindings between miR-199b-5p and Gcnt2. However, whether miR-199b-5p truly functions through Fzd6 and/or Gcnt2 requires genetic knockdown of Fzd6 and Gcnt2 in the presence of miR-199b-5p.

      In this study, we employed a comprehensive approach by integrating data from six bioinformatics databases to identify potential target genes for miR-199b-5p. Subsequent qPCR analysis revealed significant changes in two genes, Fzd6 and Gcnt2. We then utilized luciferase assays to validate the predicted binding sites and confirmed the interaction between miR-199b-5p and these genes. Additionally, we examined the expression profiles of these potential target genes in human KOA samples using a human database, which unveiled distinct expression patterns.

      While our findings suggest that Fzd6 and Gcnt2 may serve as potential target genes for miR-199b-5p, we acknowledge the necessity for further experimental validation and in-depth functional characterization. Building upon your insightful recommendations, we have thoroughly addressed the research limitations and proposed potential research strategies for future investigations in our discussion. (page11,line227-231)

      (2) In vitro chondrocyte experiments were conducted in a 2D manner, which led to chondrocyte de-differentiation and thus may not represent the chondrocyte response to the treatments.

      We agree that a 3D culture system would be more accurate and reliable. However, 2D culture systems have also been used successfully, for example by Liu et al. (3). In addition, the second-generation primary mouse chondrocytes used in the current study did not exhibit a markedly dedifferentiated morphology. Therefore, given the experimental conditions in our lab, we used second-generation primary mouse chondrocytes throughout the cell experiments. To demonstrate the reliability of the cells, we have provided additional images in Supplementary Figure 7 (Fig. S7). In future studies, we will adopt a 3D culture system. Thank you for your advice; we have added this limitation to the revised manuscript. (page 11, lines 237-240)

      Author response image 3.

      Primary mouse chondrocytes that we cultured (P1) and the second-generation cells (P2) used in the subsequent experiments.

      Reference using 2D culture:

      (3) Liu Q, Zhai L, Han M, et al. SH2 Domain-Containing Phosphatase 2 Inhibition Attenuates Osteoarthritis by Maintaining Homeostasis of Cartilage Metabolism via the Docking Protein 1/Uridine Phosphorylase 1/Uridine Cascade. Arthritis & Rheumatology (Hoboken, NJ). 2022;74(3):462-474.

      (3) There is a lack of description for bioinformatic analysis.

      We apologize for this oversight. We have added the relevant descriptions and details. (page 14, lines 299-303)

      (4) There are several errors in figure labeling.

      We have revised the figure labels. (Fig. 3, Fig. 4, Fig. 5 and Fig. 7)

      Recommendations for the authors:

      We appreciate the reviewers' feedback as we believe it has significantly contributed to the refinement of our manuscript. We are confident that our revisions have strengthened the quality and impact of our study, and we agree that the suggestions presented by the reviewers are valuable and appropriate for publication.

      Reviewer #2 (Recommendations For The Authors):

      I would like to thank the authors for investigating the functional role of miR-199b-5p in knee OA. While this study has the potential to provide valuable knowledge to the fields of miRNAs and joint diseases, significant improvements in several areas are required.

      We appreciate your constructive comments, and we have made a substantial improvement to the manuscript. We thank all the reviewers for their advice as well as their criticisms.

      Major concerns:

      (1) According to the Authors, miR-199b-5p is identified by the results from their own miRNA-sequencing as well as comparison with other publicly available datasets (both synovium and cartilage datasets). It is unclear to me why the synovium dataset was used here as it appears that the entire manuscript was mainly focused on chondrocytes.

      Thank you for your question. As we are aware, cartilage degradation is the initial pathological change in knee osteoarthritis (KOA), which subsequently leads to other pathological changes such as synovial inflammation (4). These factors are interrelated, and current research on KOA encompasses cartilage, synovium, and systemic inflammation, among others. Therefore, when we identified a large number of dysregulated miRNAs in extracellular vesicles isolated from serum, it was crucial to determine whether these dysregulated miRNAs were also altered in cartilage or synovium. To address this, we compared our findings with publicly available databases and found a higher overlap with the cartilage cell dataset, including miRNA-199b. Consequently, we decided to focus our subsequent investigations on cartilage-related research.

      Reference

      (4) Hunter D, Bierma-Zeinstra S. Osteoarthritis. Lancet (London, England). 2019;393(10182):1745-1759.

      (2) Also, 169 of 177 differentially expressed exosome miRNAs were intersected with differentially expressed miRNAs from OA cartilage datasets. It is surprising that in the 5 selected miRNAs for further qRT-PCR validation, 3 out of 5 were not in the exosome miRNA dataset (i.e., hsa-mir-1296-5p, hsa-mir-15b-3p, and hsa-mir-338-3p; page 5, line 109 and Fig. 1B). Wouldn't it be more essential to select for validation the miRNAs that are differentially expressed in both the exosome and cartilage datasets? Furthermore, from the Authors' exosome miRNA dataset, only 5 out of 15 KOA patients actually exhibited up-regulated miR-199b-5p vs. healthy controls. Please elaborate on how the target was determined.

      In fact, our study included two additional groups: the acupuncture treatment group (4 weeks of continuous acupuncture treatment) and the waiting treatment group (no intervention, followed by acupuncture treatment after 4 weeks), in addition to the healthy control and knee osteoarthritis (OA) patient groups. After comparing these four groups, we found that 11 genes (hsa-miR-504-3p, hsa-miR-1915-3p, hsa-miR-103a-2-5p, hsa-miR-887-3p, hsa-miR-1228-5p, hsa-miR-34c-3p, hsa-miR-3168, hsa-miR-518e-3p, hsa-miR-1296-5p, hsa-miR-338-3p, and hsa-miR-199b-5p) were upregulated in KOA patients but downregulated after acupuncture treatment, with no change in the waiting treatment group. Additionally, 7 genes (hsa-miR-448, hsa-miR-514a-3p, hsa-miR-4440, hsa-let-7f-5p, hsa-let-7a-5p, hsa-let-7d-5p, and hsa-miR-15b-3p) were downregulated in KOA patients but upregulated after acupuncture treatment, with no change in the waiting treatment group. Considering the improvement in clinical symptoms of KOA patients after acupuncture treatment, we believe that these 18 genes are of significant value. Based on overall expression abundance and species specificity, we finally selected 5 genes, namely the 5 genes mentioned in this article. We have included this result in Supplementary Figure 5 (Fig. S5).

      Author response image 4.

      Venn diagram showing differentially expressed miRNAs in the OA group compared with healthy patients and patients who recovered after acupuncture treatment.

      (3) There is also a lack of description for bioinformatic analysis regarding how miRNA sequencing datasets were analyzed. What R/python packages or algorithms were used? What were the QC criteria?

      We apologize for any confusion caused. We have now included a clear description of the method employed, and R was utilized for this data analysis (revised on page 14, lines 301-305). To ensure consistency, we compared our findings with publicly available human serum data from the database (GSE105027) using a fold change threshold of > 1.5 and a significance level of p < 0.05. In the cartilage data (GSE175961), we observed a list of miRNAs with shared expression patterns, yet the precise differential values could not be determined.
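
      The thresholds named in this response (fold change > 1.5, p < 0.05) can be sketched as a simple filter. This is an illustration in Python rather than the authors' R analysis. The symmetric cutoff for downregulation (fold change < 1/1.5) is an assumption, since the response does not state how downregulated miRNAs were thresholded, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MirnaResult:
    name: str
    fold_change: float  # KOA / control on a linear scale
    p_value: float


def classify(result: MirnaResult, fc_cutoff: float = 1.5, alpha: float = 0.05) -> str:
    """Label a miRNA as 'up', 'down', or 'unchanged' under the stated thresholds."""
    if result.p_value >= alpha:
        return "unchanged"
    if result.fold_change > fc_cutoff:
        return "up"
    if result.fold_change < 1.0 / fc_cutoff:  # assumed symmetric cutoff
        return "down"
    return "unchanged"


# Hypothetical values for illustration only; not data from the study.
results = [
    MirnaResult("example-miR-1", 2.4, 0.01),
    MirnaResult("example-miR-2", 0.4, 0.03),
    MirnaResult("example-miR-3", 1.8, 0.20),
    MirnaResult("example-miR-4", 1.2, 0.001),
]
labels = {r.name: classify(r) for r in results}
print(labels)
```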

      (4) Another major concern is the chondrocyte culture method. Chondrocytes should be cultured in a 3D manner (i.e., a 3D pellet culture system or a micro mass culture method). 2D cultured chondrocytes tend to de-differentiate into MSC-like cells and thus lose their chondrocyte phenotype. This is evident from Fig. 3B and C. Cells started to spread out and only a few cells were positive for COL2A1 with a deep brown staining color. Thus, the results from the in vitro studies may not be representative of chondrocyte response to the treatments.

      We agree that a 3D culture system would be more accurate and reliable. However, 2D culture systems have also been used successfully, for example by Liu et al. (3). In addition, the second-generation primary mouse chondrocytes used in the current study did not exhibit a markedly dedifferentiated morphology. Therefore, given the experimental conditions in our lab, we used second-generation primary mouse chondrocytes throughout the cell experiments. To demonstrate the reliability of the cells, we have provided additional images in Supplementary Figure 7 (Fig. S7). In future studies, we will adopt a 3D culture system. Thank you for your advice; we have added this limitation to the revised manuscript. (page 11, lines 237-240)

      Author response image 5.

      Primary mouse chondrocytes that we cultured (P1) and the second-generation cells (P2) used in the subsequent experiments.

      Reference using 2D culture:

      (3) Liu Q, Zhai L, Han M, et al. SH2 Domain-Containing Phosphatase 2 Inhibition Attenuates Osteoarthritis by Maintaining Homeostasis of Cartilage Metabolism via the Docking Protein 1/Uridine Phosphorylase 1/Uridine Cascade. Arthritis & Rheumatology (Hoboken, NJ). 2022;74(3):462-474.

      (5) Page 7, lines 148-149: "The cartilage of mice injected with the miR-199b-5p mimic was slightly degraded (p=0.02) (Fig. 4E, F)". However, there was no significance between the groups found in Fig. 4F. Also, from the histological images of Fig. 4E, it looks like mice with inhibitor injection had more cartilage damage than miR-199b-5p mimic.

      We apologize for any confusion caused. Figures 4E and 4F represent the Safranin Fast Green staining of the joint after the administration of miR-199b-5p inhibitor and mimic under physiological conditions. As you can see, there is minimal difference between these four images, and no statistically significant difference. However, in Figures 5E and 5F, the MIA-induced KOA model was utilized, and noticeable differences can be observed after the administration of the inhibitor and mimic. In the revised version, we have emphasized that Figures 4E and 4F represent the results under physiological conditions, not under the MIA-induced model. (page 7, lines 146-151)

      (6) Page 7, lines 149-150: "Additionally, the articular surface showed insect erosion (Fig. 4G)." It is also unclear how micro-CT analysis will be able to demonstrate the erosion of cartilage. Or the authors actually indicate the trochlear groove. However, this could also be observed in the control group and the results were not quantified. It is also unclear if the cross-section images of micro-CT shown here are helpful at all without any further explanation in the manuscript.

      Figure 4G represents the control, vehicle control, inhibitor, and mimic groups, while Figure 5G represents the model, model+vehicle control, model+inhibitor, and model+mimic groups. From Figure 4G, it can be observed that the mimic group showed the most obvious erosion appearance, while the inhibitor group did not exhibit this phenomenon (5). From Figure 5G, it can be seen that the model group and model+mimic group exhibited the most pronounced erosion appearance, while the model+inhibitor group showed the best recovery. To highlight the pathological changes in the erosion appearance, we marked the typical locations with red arrows in the images for easy comparison (Fig. 4G; Fig. 5G). We also made corresponding textual modifications in the original manuscript to address these findings (page 7, lines 150-151; page 8, lines 160-161). In addition, the 3D reconstruction of micro-CT is based on the synthesis of these cross-sectional images.

      References

      (5) Tao Y, Wang Z, Wang L, et al. Downregulation of miR-106b attenuates inflammatory responses and joint damage in collagen-induced arthritis. Rheumatology (Oxford, England). 2017;56(10):1804-1813.

      (7) Page 17, line 309-310: "Before model establishment and at 3, 7, 10, 14, 21, and 28 days after model establishment." Please re-write this as this is not clear regarding the experimental procedure.

      Thank you. We have rewritten the sentence as follows: Baseline testing of behavioral pain thresholds was conducted prior to model establishment, followed by behavioral pain threshold testing on days 3, 7, 10, 14, 21, and 28 after model establishment. (page 15, lines 322-324)

      (8) Fig. 5A. The M + inhibitor and Model images are not at the same plane as M + mimic and M + RNAnc images.

      Thank you. We have modified the figure accordingly.

      (9) Fig. 5B. There are two lines both with circle markers (Control and M+inhibitor). Please correct.

      We have corrected this.

      (10) Fig. 5F. Missing * sign.

      We have added the * sign.

      (11) Please elaborate on how the potential binding sites between miR-199b-5p and Gcnt2 and between miR-199b-5p and Fzd6 were predicted.

      We apologize for any lack of clarity in the original text. In fact, we utilized targets to predict potential binding sites. Specifically, for the mouse species, we predicted that the 3'UTR of Fzd6 binds with miR-199b-5p at positions 2483-2490, 3244-3251, 3303-3309, and 3854-3860, while the 3'UTR of Gcnt2 binds with miR-199b-5p at positions 2755-2762 and 4144-4151. In the revised version, we provide a detailed description of the methodology used for predicting these sites and offer an elaborate explanation of the results. (pages16, line352)

      Additionally, to demonstrate consistency with human binding sites, we not only predicted the binding sites of human miR with these two target genes but also found a high conservation of up to 95.65% between the human and mouse sequences of miR-199b-5p. We have included this information in the supplementary materials (Fig. S6). In Fig. 6E-F, we presented the potential binding sites between miR-199b-5p and Gcnt2, as well as between miR-199b-5p and Fzd6. In addition, we provide the predicted binding of human sequence to illustrate the binding sites. Furthermore, the predicted binding of human miR-199b-5p with fzd6 and gcnt2 showed a high degree of consistency. (The fluorescent labeling in the following text indicates the potential predicted binding sites.) (Supplement file 8)

      hsa-miR-199b-5p MIMAT0000263

      CCCAGUGUUUAGACUAUCUGUUC

      NCBI Gene ID 8323 GenBank Accession NM_001164615

      Gene Symbol FZD6 3' UTR Length 1368

      Gene Description frizzled class receptor 6

      3' UTR Sequence: agaacattttctctcgttactcagaagcaaatttgtgttacactggaagtgacctatgcactgttttgtaagaatcactgttacattcttcttttgcacttaaagttgcattgcctactgttatactggaaaaaatagagttcaagaataatatgactcatttcacacaaaggttaatgacaacaatatacctgaaaacagaaatgtgcaggttaataatatttttttaatagtgtgggaggacagagttagaggaatcttccttttctatttatgaagattctactcttggtaagagtattttaagatgtactatgctattttacttttttgatataaaatcaagatatttctttgctgaagtatttaaatcttatccttgtatctttttatacatatttgaaaataagcttatatgtatttgaacttttttgaaatcctattcaagtatttttatcatgctattgtgatattttagcactttggtagcttttacactgaatttctaagaaaattgtaaaatagtcttcttttatactgtaaaaaaagatataccaaaaagtcttataataggaatttaactttaaaaacccacttattgataccttaccatctaaaatgtgtgatttttatagtctcgttttaggaatttcacagatctaaattatgtaactgaaataaggtgcttactcaaagagtgtccactattgattgtattatgctgctcactgatccttctgcatatttaaaataaaatgtcctaaagggttagtagacaaaatgttagtcttttgtatattaggccaagtgcaattgacttcccttttttaatgtttcatgaccacccattgattgtattataaccacttacagttgcttatattttttgttttaacttttgttttttaacatttagaatattacattttgtattatacagtacctttctcagacattttgtagaattcatttcggcagctcactaggattttgctgaacattaaaaagtgtgatagcgatattagtgccaatcaaatggaaaaaaggtagttttaataaacaagacacaacgtttttatacaacatactttaaaatattaaggagttttcttaattttgtttcctattaagtattattctttgggcaagattttctgatgcttttgattttctctcaatttagcatttgcttttggtttttttctctatttagcattctgttaaggcacaaaaactatgtactgtatgggaaatgttgtaaatattaccttttccacattttaaacagacaactttgaatacaaaaactttgttttgtgtgatcttttcattaataaaattatctttgtataagaaaaaaaaaaaaaa

      hsa-miR-199b-5p MIMAT0000263

      CCCAGUGUUUAGACUAUCUGUUC

      NCBI Gene ID 2651 GenBank Accession NM_001491

      Gene Symbol GCNT2 3' UTR Length 2780

      Gene Description glucosaminyl (N-acetyl) transferase 2 (I blood group)

      3' UTR Sequence: gctattcatgagctactcatgactgaagggaaactgcagctgggaagaggagcctgtttttgtgagagacttttgccttcgtaatgttaaccgtttcaggaccacgtttatagcttcaggacctggctacgtaattatacttaaaatatccactggacactgtgaaatacactaacaggatggctgggtagagcaatctgggcactttggccaattttagtcttgctgtttcttgatgctcacctctatattagtttattgttaggatcaatgataaatttaaatgacctcagatctttgcaccagatactcatcatatacaaatgttttagtaaaaaagagaattgtagataatactgtctaggaaaataagaattaggtttctttgaagaaggaatcttttataacaccttaacagtcaccactgtgctcaaccagacagatagtgaaacagctttctgggtaattcaccaatttcctttaaaacataagctacctgaatggagaatacatcttgtttctgagtttcaacactagcatttttggcttactcatggacaaagttctgtatatagtataaagtcattaacaagaaacaggatatgctttaagacagaattcactgtctgttgcttcagtaaaaggacctcggggaataaaacatttctctcttatatgccagaatgtaggctggtccctatgtcatgtcttccattaagaacactaaaaagtccttgcaagaatggagatatgcattcaagagaggtgctatcacatagatctagtctgaagtctggaacactttcctcttctatgacccctctctccccagtattatcttacttgcaaaatggagaccaaattctatcctgtgaggcttttaattgcaccatagtatgctctgagtagctttacactgcctggtactgatagtagtggctcgatttttaagagccttcaattgtagatgaacatctctgttatttatccctcattcatccatccgttcattcattcagccttcaatcaacatctcttgagtgtctattatgtacaggacatgtactgagacaaaaaggaaacataagagctttttcactctaaaaatcttggcaataatgtcaacaccagaaagcctcctctggagaatcttacagagtgattgtagtttaatacaggaacacacagggctgtgtagcatgataccaggcccaggagatcagtaattacaaattaagggttaaatcagagattattcaacagagagggagaaaggaggagacagagggaggacctgttgtgttccagccattctggtattcctttatgtatctaatttcattcaaacctcacaacagtcttgtgaggcccttatataattactcccattttgcagatgaagtaactgaggcttagaaaggttaatagcaccggggaacaatttctctgggtgagaattgggactctgttgctggtcttctcagttcatttcctgaggtggatttactgagagaaggtgaaataaagccatatttagtataccagagaaggtagattttaagaatggtctcagtgttaatactgagaaaaagtcctgtcagttcagaaaaaatgtgaagtctactttagtattcctgtaatactaaaccgttgagtttctaaatatttatttattctaacaaaaagcaattactacaaatggatgacacatttaatgaacacaattttattttttttctgtaactgtgcttgttgaatgtcaatcatatttaaagggaatgactttgaagtaaaaccttttttcttgctactgaaaaaaatggagttgttttgggtggtaaagtgttaaggaatagggacagctggtcacacaaggaactcttgaaggccacatgtgaaaacctgtcacttgcacagaggccagtcccactaaggtgaccagagtgggctccaagcacaaactgccattggctatag
atgggactgtgtccccccaaaattcatgtgttggagccttaaccctcaatgtgatggtatttgagatggggcctttggtaagggaagtttagatgaggtcacgagggtaggaccctcatgatgggatgagtccccttacaagacctctggcttgggccgggcgtggtggctcacacctgtaatcccaacactttgggaggccaaggcaggtagatcacttgatgccaggagttccagaccaggctggccgacatggtgaaaccccatctctactaaaaaatataaaaattagccgggctttgtggcatgtgcctgtaatcccagctatttggcaggctgaggcatgagaatcgcttgaacccaggaggtggaggttacagtgagctgagagtgccccactgcactccagcctgggtgacagagcgagactttgtcccaaaacaaaataggtgaggggatagcgaatgcactcagggtcagcagtggagtttaaaaattgtctcttttcaacttatttaaatgacagcacctgagaagaggaaccgttttacactggatgtttctcatgtagaacaagaaatctttctggaattgatgtttacatgtctgttgttggtcatctctcctgtgtcttaaatactttaatgttggaagagcatagtgtttgggctagtgggtttctgacagcccatgggaatgccctgaaactactgtatctgatgtttgttttcgatgaggttccatgttttgttttcttgggaataaattaatatattgttttccaaaaaaaaaaaaaaaaaaaa

      (12) Page 10-11, Line 222-223: "Our findings indicate that miR-199b-5p plays a crucial role in KOA by targeting Fzd6 and Gcnt2". This is an overstatement. The current work shows the possible binding of miR-199b-5p to Fzd6 and of miR-199b-5p to Gcnt2. Whether miR-199b-5p truly functions through Fzd6 and/or Gcnt2 requires genetic knockdown of Fzd6 and Gcnt2 in the presence of miR-199b-5p. Thus, please tone down this statement and the title of the manuscript.

      We agree with your assessment of our conclusion. We have therefore deleted the overstated sentences and toned down the conclusions of the manuscript (the title; page 8, line 179; page 11, lines 227-228).

      (13) The Schematic figure (the last figure). Please remove osteophyte as this was not quantified in the study.

      We modified the schematic figure accordingly.

      Minor concerns:

      (1) Most figures were distorted.

      We have provided new versions of the figures to avoid distortion.

      (2) Providing GO term numbers in Fig. 1C is not very helpful. Maybe show the GO term and corresponding numbers in the manuscript (Page 4, lines 79 - 82).

      Thank you for your advice. We have added notes to the manuscript explaining the biological concept behind each GO term number (Page 4, lines 77-89; Page 22, lines 515-532).

      (3) What were M-0.5 and M-1 in Fig. 2D? Different MIA concentrations?

      Yes, these are different MIA concentrations, as we now state in the legend (Page 23, lines 535-536).

      (4) Please follow the nomenclature of the gene symbol. For example, Fig. 3E-P should be mouse genes (?).

      We have corrected the relevant gene symbols.

      (5) Page 3, line 59. Not all chondrocytes are pathogenic cells in OA.

      We apologize for the mistake; it has now been corrected (Page 3, line 59).

      (6) Typo. Page 3, line 55.

      We have corrected the typo.

      (7) Page 4, line 78. These are differentially expressed miRNAs, not genes.

      We have revised the incorrect wording (Page 4, lines 75-76).

      I wish the authors all the best with their continued work in this area.

      Thank you for your wishes.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Xia et al. investigated the mechanisms underlying Glucocorticoid-induced osteonecrosis of the femoral head (GONFH). The authors observed that abnormal osteogenesis and adipogenesis are associated with decreased β-catenin in the necrotic femoral head of GONFH patients, and that the inhibition of β-catenin signalling leads to abnormal osteogenesis and adipogenesis in GONFH rats. Of interest, the deletion of β-catenin in Col2-expressing cells rather than in Osx-expressing cells leads to a GONFH-like phenotype in the femoral head of mice.

      Strengths:

      A strength of the study is that it sets up a Col2-expressing cell-specific β-catenin knockout mouse model that mimics the full spectrum of osteonecrosis phenotype of GONFH. This is interesting and provides new insights into the understanding of GONFH. Overall, the data are solid and support their conclusions.

      Reviewer #1 (Recommendations For The Authors):

      1) Fig. 1I should be quantified and presented as bar graphs to make it consistent with other data, and the significance should be shown.

      Reply: Thanks for your comments. We have provided the quantitative bar graph in the new version.

      2) Fig. 2H, beta-catenin, ALP and FABP4 should be labeled below the X axis. Moreover, the pattern of Fig. 2H is different from other bar graphs and the dots for individual samples are missing, so I could not judge the N values for the experiments. N values should also be provided for Fig. 3.

      Reply: Thanks for your comments. We have added the labels for beta-catenin, ALP and FABP4 below the X axis in Fig. 2H. The style of the quantitative bar graphs has been changed to show the N values for each experiment.

      3) Fig. 4 shows the fate mapping of Col2+ cells and Osx+ cells in the femoral head. In this regard, the authors presented images for Col2-expressing cells at all the indicated time points, i.e. 1, 3, 6, and 9 months, but only presented images for Osx-expressing cells for 1 month while those for 3, 6, and 9 months are missing.

      Reply: Thanks for your comments. Here we show that the expression pattern of Osx+ cells in the femoral head was entirely different from that of Col2+ cells at 3 and 6 months of age, further indicating that they are two distinct progenitor lineages.

      Author response image 1.

      4) Some experiments may need to be described in more detail, e.g., ABH/Orange G staining, biomechanical testing, μCT analysis, etc.

      Reply: Thanks for your comments. We have provided more detail on the experimental procedures.

      5) This study proposed that Col2-expressing cells play a key role in the progression of GONFH, did the authors use Col2+ cells for the in vitro experiments?

      Reply: Because in vitro experiments cannot capture the location of Col2-expressing cells in the femoral head, we used an in vivo lineage-tracing approach instead. Over as long as 9 months of lineage tracing, we showed the self-renewal ability and osteogenic commitment of Col2+ cells, as well as how their spatial distribution in the femoral head changes with age. Conditional knockout of β-catenin caused Col2+ cells to trans-differentiate into adipogenic rather than osteogenic cells, which directly clarifies the mechanism by which Col2+ cells lead to a GONFH-like phenotype in mice.

      6) A few typo errors, such as Line 13, "contribute" should be "contributes"; Line 118, "reveled" should be "revealed".

      Reply: We have revised the grammar errors in the new manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors reported a study to uncover that β-catenin inhibition disrupting the homeostasis of osteogenic/adipogenic differentiation contributes to the development of Glucocorticoid-induced osteonecrosis of the femoral head (GONFH). In this study, they first observed abnormal osteogenesis and adipogenesis associated with decreased β-catenin in the necrotic femoral head of GONFH patients, but the exact pathological mechanisms of GONFH remain unknown. They then performed in vivo and in vitro studies to further reveal that glucocorticoid exposure disrupted osteogenic/adipogenic differentiation of bone marrow stromal cells (BMSCs) by inhibiting β-catenin signaling in glucocorticoid-induced GONFH rats, and specific deletion of β-catenin in Col2+ cells shifted BMSCs commitment from osteoblasts to adipocytes, leading to a full spectrum of disease phenotype of GONFH in adult mice.

      Strengths:

      This innovative study provides strong evidence supporting that β-catenin inhibition disrupts the homeostasis of osteogenic/adipogenic differentiation that contributes to the development of GONFH. This study also identifies an ideal genetically modified mouse model of GONFH. Overall, the experiment is logically designed, the figures are clear, and the data generated from humans and animals is abundant supporting their conclusions.

      Weaknesses:

      There is a lack of discussion to explain how the Wnt agonist 1 works. There are several types of Wnt ligands. It is not clear if this agonist only targets Wnt1 or other Wnts as well. Also, why Wnt agonist 1 couldn't rescue the GONFH-like phenotype in β-cateninCol2ER mice needs to be discussed.

      Reply: Thanks for your constructive comments. Wnt agonist 1 is a cell-permeating activator of the Wnt signaling pathway that induces β-catenin-dependent transcriptional activity (PMID: 25514428, 18624906). In the present study, we aimed to demonstrate that activation of β-catenin signaling could alleviate the phenotype of rat GONFH; thus, only the expression of β-catenin and its downstream targets (RUNX2, ALP, PPAR-γ, FABP4) was measured after Wnt agonist 1 intervention. Conditional knockout of β-catenin in Col2+ cells leads to a GONFH-like phenotype in mice. Wnt agonist 1 could not rescue this GONFH-like phenotype, as it did not activate β-catenin signaling. We have discussed these points in the new version.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors are trying to delineate the mechanism underlying the osteonecrosis of the femoral head.

      Strengths:

      The authors provided compelling in vivo and in vitro data to demonstrate Col2+ cells and Osx+ cells were differentially expressed in the femoral head. Moreover, inducible knockout of β-catenin in Col2+ cells but not Osx+ cells lead to a GONFH-like phenotype including fat accumulation, subchondral bone destruction, and femoral head collapse, indicating that imbalance of osteogenic/adipogenic differentiation of Col2+ cells plays an important role in GONFH pathogenesis. Therefore, this manuscript provided mechanistic insights into osteonecrosis as well as potential therapeutic targets for disease treatment.

      Weaknesses:

      However, additional in-depth discussion regarding the phenotype observed in mice is highly encouraged.

      Reply: Thanks for your comments. Inducible knockout of β-catenin in Col2+ cells, but not in Osx+ cells, led to a GONFH-like phenotype. Lineage tracing data showed that Col2+ cells and Osx+ cells are different cell populations, and we have discussed the potential mechanism underlying the different phenotypes of β-cateninCol2ER mice and β-cateninOsxER mice.

      1) Why did the authors use dexamethasone in the cellular experiments but methylprednisolone to induce the GONFH rat model?

      Reply: Thanks for the comments. Here, we used a dexamethasone (DEX)-treated BMSC model in vitro and a methylprednisolone (MPS)-induced rat model in vivo for the GONFH study, based on the published literature (PMID: 37317020, 29662787, 29512684, 35126710, 32835568).

      2) Both bone damage and fat accumulation were observed in 3-month-old and 6-month-old β-cateninCol2ER mice, but the femoral head collapse (the feature of GONFH at the late stage) only occurred in the older β-cateninCol2ER mice. This interesting observation needs to be discussed.

      Reply: Thanks for the comments. Bone damage that results in poor mechanical support is the key to femoral head collapse. Despite similar trabecular bone loss and fat accumulation in the 3-month-old and 6-month-old β-cateninCol2ER mice, the older mice also presented extensive subchondral bone destruction. Intact subchondral bone provides good mechanical support for femoral head morphology; therefore, femoral head collapse occurred only in the older β-cateninCol2ER mice.

      3) In the Materials and Methods, detailed information on the reagents should be provided.

      Reply: We have provided detailed information on the important reagents.

      4) As shown in Figure 4, β-cateninOsxER mice at 3 months of age did not show differences in lipid droplet area and empty lacunae rate, but there was a decrease in bone area. The authors should at least provide some necessary discussion of this phenomenon.

      Reply: Thanks for your comments. In the present study, we found few lipid droplets and empty lacunae but a significant decrease in bone mass in the femoral heads of β-cateninOsxER mice. Previous studies showed that specific knockout of β-catenin in Osx-expressing cells promoted osteoclast formation and activity, leading to bone loss (PMID: 29124436, 34973494). We have discussed this phenomenon in the new version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a potentially valuable discovery which indicates that activation of the P2RX7 pathway can reduce the lung fibrosis after its establishment by inflammatory damage. If confirmed, the study could clarify the role of specific immune networks in the establishment and progression of lung fibrosis. However, the presented data and analyses are incomplete as they primarily rely on limited pharmacological treatments with modest effect sizes.

      I hope you will be convinced of the validity of our approaches by the following explanation and information, and I remain at your disposal for further discussion.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this revised preprint the authors investigate whether a presumably allosteric P2RX7 activating compound that they previously discovered reduces fibrosis in a bleomycin mouse model. They chose this particular model as publicly available mRNA data indicate that the P2RX7 pathway is downregulated in idiopathic pulmonary fibrosis patients compared to control individuals. In their revised manuscript, the authors use three proxies of lung damage, Ashcroft score, collagen fibers, and CD140a+ cells, to assess lung damage following the administration of bleomycin. These metrics are significantly reduced on HEI3090 treatment. Additional data implicate specific immune cell infiltrates and cytokines, namely inflammatory macrophages and damped release of IL-17A, as potential mechanistic links between their compound and reduced fibrosis. Finally, the researchers transplant splenocytes from WT, NLRP3-KO, and IL-18-KO mice into animals lacking the P2RX7 receptor to specifically ascertain how the transplanted splenocytes, which are WT for P2RX7 receptor, respond to HEI3090 (a P2RX7 agonist). Based on these results, the authors conclude that HEI3090 enhanced IL-18 production through the P2RX7-NLRP3 inflammasome axis to dampen fibrosis.

      These findings could be interesting to the field, as there are conflicting results as to whether NLRP3 activation contributes to fibrosis and if so, at what stage(s) (e.g., acute damage phase versus progression). The revised manuscript is more convincing in that three orthogonal metrics for lung damage were quantified. However, major weaknesses of the study still include inconsistent and small effect sizes of HEI3090 treatment versus either batch effects from transplanted splenocytes or the effects of different genetic backgrounds. Moreover, the fundamental assumption that HEI3090 acts specifically and functionally through the P2RX7 pathway in this model cannot be directly tested, as the authors now provide results indicating that P2RX7 knockout mice do not establish lung fibrosis on bleomycin treatment.

      I am particularly concerned by Reviewer 1's assumption that P2RX7 knockout mice do not establish lung fibrosis on bleomycin treatment.

      Indeed, what we showed in the point-by-point response is that BLM induces fibrosis in both WT and P2RX7 KO mice, but that the intensity of the fibrosis is reduced in P2RX7 KO mice (panel A). Therefore, as discussed in our first response, our results confirm the previous publication of Riteau et al., which showed that P2RX7 participates in BLM-induced lung fibrosis (see panel B).

      Author response image 1.

      Bleomycin-induced lung fibrosis in WT versus p2rx7 KO mice. A: Lungs from BLM-treated mice were stained with H&E and fibrosis was quantified using the Ashcroft protocol. Results show that fibrosis induced by BLM is reduced in KO mice compared with WT mice. B: Representative images of lung sections at day 14 after BLM treatment stained with H&E, as published in Riteau et al., illustrating that fibrosis induced by BLM is reduced in KO mice compared with WT mice. WT mice vehicle (n=4) or p2rx7 KO (n=6) mice. Two-tailed Mann-Whitney test, p values: **p < 0.01.

      Importantly, this lower intensity of lung fibrosis in P2RX7 KO mice does not interfere with the capacity of our molecule to attenuate lung fibrosis, as demonstrated by the adoptive transfer of IL1B KO splenocytes into P2RX7 KO mice, in which HEI3090 decreases the Ashcroft score, the % of fibrosis and the collagen fibers (see below).

      Author response image 2.

      HEI3090 activity requires P2RX7-expressing immune cells: Experimental design. p2rx7-/- mice were given 3 × 10⁶ il1β-/- splenocytes i.v. one day prior to BLM delivery (i.n. 2.5 U/kg). Mice were treated daily i.p. with 1.5 mg/kg HEI3090 or vehicle for 14 days. (C) Representative images of lung sections at day 14 after treatment stained with H&E and Sirius Red with il1β-/- splenocytes, bar = 100 µm (left) and fibrosis score assessed by the Ashcroft method, the % of fibrosis and the content of collagen fibers (right). Each point represents one mouse (n=2 in WT and NLRP3 experiments, n=1 in IL18 and IL1B experiments), data represented as violin plot or mean±SEM, two-tailed Mann-Whitney test, *p < 0.05. WT: Wildtype, KO: P2RX7 knock-out.

      Importantly, in the same experimental setting, i.e. adoptive transfer of splenocytes from different genetic backgrounds, HEI3090 decreases the fibrosis intensity only with WT and IL1B KO splenocytes and not with NLRP3 KO and IL18 KO splenocytes.

      Author response image 3.

      HEI3090 activity requires P2RX7-expressing immune cells: Experimental design. p2rx7-/- mice were given 3 × 10⁶ WT, NLRP3-/-, IL18-/- or IL1β-/- splenocytes i.v. one day prior to BLM delivery (i.n. 2.5 U/kg). Mice were treated daily i.p. with 1.5 mg/kg HEI3090 or vehicle for 14 days. Fibrosis in whole lung was assessed by the % of fibrosis (upper panel) and the content of collagen fibers (lower panel). Each point represents one mouse (n=2 in WT and NLRP3 experiments, n=1 in IL18 and IL1B experiments). Data represented as violin plot or mean±SEM, two-tailed Mann-Whitney test, *p < 0.05. WT: Wildtype, KO: P2RX7 knock-out.

      In order to provide clear evidence that HEI3090 functions through P2RX7, a different lung fibrosis model that does not require P2RX7 would be necessary. For example, in such a system the authors could demonstrate a lack of HEI3090-mediated therapeutic effect on P2RX7 knockout.

      BLM induces lung fibrosis in P2RX7 KO mice, as we showed in this manuscript, as already published by Riteau in 2010, and as shown earlier in our response (first figure). In addition, HEI3090 is able to decrease the intensity of fibrosis in WT and IL1B-/- → P2RX7 KO mice but not in KO, NLRP3-/- → P2RX7 KO and IL18-/- → P2RX7 KO mice. We therefore believe that our data support the conclusions that:

      1. HEI3090 requires the expression of P2RX7 in immune cells to mediate its antifibrotic activity,

      2. IL1B is not a crucial effector mediating the antifibrotic effect of HEI3090.

      Molecularly, additional evidence on specificity, such as thermal proteome profiling and direct biophysical binding experiments, would also enhance the authors' argument that the compound indeed binds P2RX7 directly and specifically. Since all small molecules have some degree of promiscuity, the absence of an additional P2RX7 modulator, or direct recombinant IL-18 administration (as suggested by another reviewer), is needed to orthogonally validate the functional importance of this pathway. Another way the authors could probe pathway specificity would involve co-administering α-IL-18 with HEI3090 in several key experiments (similar to Figure 4L).

      At the moment we have no funds to do these experiments and given the high competition, we have decided to publish our story without these new data.

      Reviewer #2 (Public Review):

      In the study by Hreich et al, the potency of P2RX7-specific positive modulator HEI3090, developed by the authors, for the treatment of Idiopathic pulmonary fibrosis (IPF) was investigated. Recently, the authors have shown that HEI3090 can protect against lung cancer by stimulating dendritic cell P2RX7, resulting in IL-18 production that stimulates IFN-γ production by T and NK cells (DOI: 10.1038/s41467-021-20912-2). Interestingly, HEI3090 increases IL-18 levels only in the presence of high eATP. Since the treatment options for IPF are limited, new therapeutic strategies and targets are needed. The authors first show that P2RX7/IL-18/IFNG axis is downregulated in patients with IPF. Next, they used a bleomycin-induced lung fibrosis mouse model to show that the use of a positive modulator of P2RX7 leads to the activation of the P2RX7/IL-18 axis in immune cells that limits lung fibrosis onset or progression. Mechanistically, treatment with HEI3090 enhanced IL-18-dependent IFN-γ production by lung T cells leading to a decreased production of IL-17 and TGFβ, major drivers of IPF. The major novelty is the use of the small molecule HEI3090 to stimulate the immune system to limit lung fibrosis progression by targeting the P2RX7, which could be potentially combined with current therapies available. Overall, the study was well performed, and the manuscript is clear.

      We thank the reviewer for these very positive comments.

      However, there is need for more details on the description and interpretation of the adoptive transfer experiments, as well as the statistical analyses and number of replicate independent experiments.

      I am concerned by the reviewer’s comments, and I would like to provide additional information and explanation, which I hope will convince you of the validity of our approaches.

      Author response image 4.

      Adoptive transfer experiment. Adoptive transfer experiments are classically used to document which immune cells participate in immune responses (with more than 150 publications in PubMed for the key words "adoptive transfer" and "onco-immunology"), and intravenous administration is a common route to target the lungs (PMID: 23336716). To characterize the molecular effectors (P2RX7, NLRP3, IL18 and IL1B) accounting for the antifibrotic effect of HEI3090, we purified splenocytes from donor mice and administered them intravenously to P2RX7 KO mice. As shown in Author response image 4, HEI3090 has no antifibrotic activity when splenocytes isolated from p2rx7-deficient mice are injected i.v. into P2RX7 KO mice (KO in KO). By contrast, HEI3090 has antifibrotic activity when WT splenocytes expressing P2RX7 (isolated from WT mice) are transferred into P2RX7 KO mice (WT in KO).

      This experiment provides strong evidence that the adoptive transfer approach can identify the molecular effectors required to mediate the antifibrotic effect of HEI3090.

      Statistical analyses and number of replicate independent experiments

      We thank the reviewer for his comment, and we apologize for not having been sufficiently clear in our previous response, with the misphrased statement “the experiment was stopped when significantly statistical results were observed” when we should have written “the experiment was stopped when each experimental group contained at least 5 mice”.

      To define the size of the experimental groups we did a pilot experiment, with 4 WT mice (i.e. 4 biological replicates) in each group (as shown aside), and a statistical forecast based on the result of the pilot experiment (40% difference, standard error: 0.9, α risk: 0.05, power: 0.8). Since we focused on the effect of HEI3090, we based our statistical analysis on a one-way ANOVA comparing, in each experiment, the vehicle and the treated group.

      The pilot experiment and statistical forecast indicated that 4 mice per group are sufficient to characterize the effect of HEI3090 on BLM-induced lung fibrosis. Each experiment was started with 6 to 8 mice per group. Being aware that 30% of mice can unexpectedly die due to BLM treatment, we duplicated the experiment, when necessary, to include at least 5 mice in each group of each experiment, meaning 5 biological replicates, knowing that 4 mice are sufficient to statistically analyze the results. In each experiment we checked for the presence of outliers using the ROUT method and removed them when necessary.
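
      The group-size reasoning above (a target effect, a variability estimate, α = 0.05, power = 0.8, then extra animals to cover expected losses) can be sketched as follows. This is an illustrative sketch only, not the authors' calculation: it assumes a two-sided, two-group comparison of means under a normal approximation (the response mentions a one-way ANOVA), and the function names `mice_per_group` and `mice_to_enrol` are hypothetical. The normal approximation tends to slightly underestimate the group size needed for very small groups.

```python
import math
from statistics import NormalDist


def mice_per_group(delta, sd, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided, two-group comparison of means.

    Normal-approximation formula: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2.
    delta and sd must be expressed in the same units (hypothetical inputs).
    """
    if delta <= 0 or sd <= 0:
        raise ValueError("delta and sd must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the alpha risk
    z_power = NormalDist().inv_cdf(power)          # quantile matching the target power
    n = 2 * ((z_alpha + z_power) * sd / delta) ** 2
    return math.ceil(n)


def mice_to_enrol(n_required, expected_mortality):
    """Inflate the group size so that n_required animals are expected to survive."""
    if not 0 <= expected_mortality < 1:
        raise ValueError("expected_mortality must be in [0, 1)")
    return math.ceil(n_required / (1 - expected_mortality))


if __name__ == "__main__":
    # Hypothetical standardized effect of 2 SD between vehicle and treated groups.
    print(mice_per_group(delta=2.0, sd=1.0))
    # Animals to enrol so that 4 or 5 per group remain after 30% expected losses.
    print(mice_to_enrol(4, 0.30), mice_to_enrol(5, 0.30))
```

      Under these assumptions, inflating a required group size of 4 or 5 by a 30% expected loss gives starting groups of 6 or 8 animals, which is in line with the 6 to 8 mice per group quoted above.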

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Yang et al report a novel regulatory role of SIRT4 in the progression of kidney fibrosis. The authors showed that in the fibrotic kidney, SIRT4 exhibited an increased nuclear localization. Deletion of Sirt4 in renal tubule epithelium attenuated the extent of kidney fibrosis following injury, while overexpression of SIRT4 aggravates kidney fibrosis. Employing a battery of in vitro and in vivo experiments, the authors demonstrated that SIRT4 interacts with U2AF2 in the nucleus upon TGF-β1 stimulation or kidney injury and deacetylates U2AF2 at K413, resulting in elevated CCN2 expression through alternative splicing of Ccn2 gene to promote kidney fibrosis. The authors further showed that the translocation of SIRT4 is through the BAX/BAK pore complex and is dependent on the ERK1/2-mediated phosphorylation of SIRT4 at S36, and consequently the binding of SIRT4 to importin α1. This fundamental work substantially advances our understanding of the progression of kidney fibrosis and uncovers a novel SIRT4-U2AF2-CCN2 axis as a potential therapeutic target for kidney fibrosis.

      Strengths:

      Overall, this is an extensive, well-performed study. The results are convincing, and the conclusions are mostly well supported by the data. The message is interesting to a wider community working on kidney fibrosis, protein acetylation, and SIRT4 biology.

      Weaknesses:

      The manuscript could be further strengthened if the authors could address a few points listed below:

      (1) In the results part 3.9, an in vitro deacetylation assay employing recombinant SIRT4 and U2AF2 should be included to support the conclusion that SIRT4 is a deacetylase of U2AF2. Similarly, an in vitro binding assay can be included to confirm whether SIRT4 and U2AF2 interact directly.

      Thank you for your insightful comments and suggestions for improving our manuscript. We appreciate your recommendation to include an in vitro deacetylation assay employing recombinant SIRT4 and U2AF2 to support our conclusion regarding the deacetylase activity of SIRT4 on U2AF2.

      We would like to clarify that the data demonstrating the effect of SIRT4 on U2AF2 acetylation were already included in our original submission. Specifically, Figure 5C illustrates that the TGF-β1-induced decrease in U2AF2 acetylation is attenuated by Sirt4 knockdown. Conversely, overexpression of SIRT4 (SIRT4 OE) enhances the deacetylation of U2AF2 in the presence of TGF-β1. These results support the conclusion that SIRT4 is a deacetylase for U2AF2.

      Furthermore, we have already provided evidence of the direct interaction between SIRT4 and U2AF2 through a co-immunoprecipitation (CoIP) assay, which was shown in Figure 5B. This assay confirms the physical interaction between SIRT4 and U2AF2.

      We believe that the existing data sufficiently address the points raised in your comments. We are grateful for the opportunity to clarify these aspects of our study and hope that our response has adequately addressed your concerns.

      (2) In Figure 6D, the Western Blot data using U2AF2-K453Q is confusing and is quite disconnected from the rest of the data and not explained. This data can be removed or explained why U2AF2-K453Q is employed here.

      Thank you for your inquiry regarding the rationale behind the K453Q mutation in our study.

      In this study, we predicted several acetylation sites. U2AF2-K453Q is a mutation at another site that mimics a hyperacetylated state of U2AF2; our results indicated that U2AF2 acetylation at K453 had little effect on CCN2 expression. We therefore concluded that only U2AF2 acetylation at K413, and not at other sites, regulates CCN2 expression. To avoid ambiguity, we have removed the U2AF2-K453Q results from our revised manuscript.

      (3) Although ERK inhibitor U0126 blocked the nuclear translocation of SIRT4 in vivo, have the authors checked whether treatment with U0126 could affect the expression of kidney fibrosis markers in UUO mice?

      Thank you for your insightful question regarding the effects of the ERK inhibitor U0126 on the expression of kidney fibrosis markers in UUO mice.

      In our study, we indeed conducted in vivo experiments using U0126 and observed that it effectively ameliorated kidney fibrosis markers, which is consistent with its established role in inhibiting the fibrotic process. Specifically, U0126 treatment significantly suppressed the SIRT4-mediated renal fibrosis, which was evidenced by the reduced expression of fibrosis markers (Author response image 1).

      Author response image 1.

      U0126 treatment alleviates renal fibrosis in UUO mice.

      However, in the initial submission, we chose not to include these results in the main body of the manuscript for the following reasons: 1) highlighting the inhibitory effects of U0126 on ERK and its subsequent impact on kidney fibrosis might shift the focus of our study away from the central theme of SIRT4's role in renal fibrosis; 2) we aimed to maintain a clear narrative that emphasizes the novel findings related to SIRT4 and its regulation by the ERK pathway.

      Nonetheless, we recognize the importance of these findings and are willing to include the relevant data in the revised manuscript if it aligns with the journal's editorial direction and contributes to the broader understanding of renal fibrosis treatment strategies.

      We appreciate the opportunity to clarify this aspect of our research and are open to further suggestions from the editorial team.

      (4) The format of gene and protein abbreviations in the manuscript should be standardized.

      Thank you for your comment on the formatting of gene and protein abbreviations in our manuscript. We have carefully reviewed our formatting practices and confirmed that we have adhered to the standard conventions as follows:

      (1) Mouse gene names are presented with an initial capital letter and in italics.

      (2) Human gene names are written in uppercase and in italics.

      (3) Protein names are in all capital letters and not italicized.

      We understand the importance of consistency in scientific publications and have ensured that these standards are uniformly applied throughout the revised manuscript. If there were any discrepancies, we have corrected them to maintain the clarity and professionalism.

      We appreciate the opportunity to refine our work and are committed to upholding the standards of scientific communication.

      (5) There are a few grammar issues throughout the manuscript. The English/grammar could be stronger, thus improving the overall accessibility of the science to readers.

      Thank you for bringing the grammar issues to our attention. We have made diligent efforts to revise and improve the manuscript's English and grammar throughout. We have also enlisted the support of a professional language editing service to ensure the clarity and accuracy of our scientific communication.

      We are confident that these revisions have significantly enhanced the manuscript's accessibility to a broader readership and have addressed the language concerns raised.

      We appreciate your guidance and are committed to delivering a manuscript of the highest quality.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript presents a novel and significant investigation into the role of SIRT4 for CCN2 expression in response to TGF-β by modulating U2AF2-mediated alternative splicing and its impact on the development of kidney fibrosis.

      Strengths:

      The authors' main conclusion is that SIRT4 plays a role in kidney fibrosis by regulating CCN2 expression via pre-mRNA splicing. Additionally, the study reveals that SIRT4 translocates from the mitochondria to the cytoplasm through the BAX/BAK pore under TGF-β stimulation. In the cytoplasm, TGF-β activated the ERK pathway and induced the phosphorylation of SIRT4 at Ser36, further promoting its interaction with importin α1 and subsequent nuclear translocation. In the nucleus, SIRT4 was found to deacetylate U2AF2 at K413, facilitating the splicing of CCN2 pre-mRNA to promote CCN2 protein expression. Overall, the findings are fully convincing. The current study, to some extent, shows potential importance in this field.

      Weaknesses:

      (1) Exosomes containing anti-SIRT4 antibodies were found to effectively mitigate UUO-induced kidney fibrosis in mice. However, the protein loading capacity and loading methods were not mentioned.

      We appreciate your inquiry about the protein loading capacity and methods for the exosomes. As you have correctly noted, these details are indeed essential for a comprehensive understanding of our experimental approach. We have provided this information in the electronic supplementary material, specifically in Section 2.17, where we describe the methodology used for loading the anti-SIRT4 antibodies into the exosomes and the capacity at which this was achieved.

      We hope that this additional detail in the supplementary material addresses your concerns and enhances the clarity of our study's methodology.

      (2) The method section is incomplete, and many methods like cell culture, cell transfection, gene expression profiling analysis, and splicing analysis, were not introduced in detail.

      Thank you for your meticulous review and the feedback provided on our manuscript. We acknowledge your concern regarding the completeness of the methods section.

      We would like to clarify that in our initial submission, all text and figures were compiled into a single document, with the supplementary methods detailed at the end, separate from the main text methods. This format was chosen to adhere to submission guidelines that prioritize the concise presentation of core methods in the main text while providing additional details in the supplementary material for comprehensiveness.

      The detailed methodologies for cell culture, cell transfection, gene expression profiling analysis, and splicing analysis, which you inquired about, are now indeed included in the revised electronic supplementary material.

      We apologize for any misunderstanding caused by the initial structure of our submission and appreciate the opportunity to clarify the comprehensive nature of our methodological reporting.

      (3) The authors should compare their results with previous studies and mention clearly how their work is important in comparison to what has already been reported in the Discussion section.

      We appreciate the opportunity to discuss the significance of our findings in the broader context of renal fibrosis research. In response to your suggestion, we have further refined our discussion to explicitly compare our results with those of previous studies and to clearly articulate the importance of our work.

      (1) Novelty of SIRT4's Role in Renal Fibrosis: Our study introduces a novel concept in the field by demonstrating the nuclear translocation of SIRT4 as a key initiator of kidney fibrosis. This finding diverges from previous studies that have primarily focused on SIRT4's mitochondrial roles, highlighting a new dimension of SIRT4's function in renal pathophysiology.

      (2) Mechanistic Insights: We provide a detailed mechanistic pathway, from the release of SIRT4 from mitochondria through the BAX/BAK pore to its subsequent nuclear translocation and impact on U2AF2 deacetylation. This pathway has not been previously described, offering a fresh perspective on the regulation of fibrogenic gene expression.

      (3) Implications for Therapy: Our findings suggest potential therapeutic interventions targeting SIRT4 nuclear translocation, which could be a significant advancement over existing treatments that have shown limited efficacy in addressing the root causes of renal fibrosis.

      (4) Epigenetic Regulation: By elucidating the role of SIRT4 in regulating alternative splicing of CCN2 pre-mRNA through U2AF2 deacetylation, our study contributes to the growing understanding of epigenetic mechanisms in renal fibrosis, a field that has been understudied compared to genetic factors.

      (5) Differential Cellular Roles of SIRT4: Our work indicates that SIRT4 may have distinct roles in different cell types, which is a complex and nuanced aspect of CKD pathophysiology that has not been fully explored in previous research.

      (6) Integration with Previous Research: We have compared our findings with existing literature, noting where our work aligns with and diverges from previous studies. This comparison underscores the value of our research in expanding the current paradigm of renal fibrosis.

      In conclusion, we believe that our study provides critical insights into the pathogenesis of renal fibrosis and offers a potential therapeutic target. We have clarified these points in the discussion section of our manuscript to ensure that the significance of our work is clearly communicated to the readers.

      Reviewer #3 (Public Review):

      Summary:

      Yang et al. report in this paper that TGF-beta induces SIRT4 activation; TGF-beta-activated SIRT4 then modulates U2AF2-mediated alternative splicing, and U2AF2 in turn drives CCN2 expression. The mechanism is described as follows: mitochondrial SIRT4 is transported into the cytoplasm in response to TGF-β stimulation, is phosphorylated by ERK in the cytoplasm, and then undergoes nuclear translocation by forming a complex with importin α1. In the nucleus, SIRT4 can then deacetylate U2AF2 at K413 to facilitate the splicing of CCN2 pre-mRNA to promote CCN2 protein expression. Moreover, they used exosomes to deliver Sirt4 antibodies to mitigate renal fibrosis in a mouse model. TGF-beta has been widely reported for its role in fibrosis induction.

      Strengths:

      TGF-beta induction of SIRT4 translocation from mitochondria to nuclei for epigenetics or gene regulation remains largely unknown. The findings presented here that SIRT4 is involved in U2AF2 deacetylation and CCN2 expression are interesting.

      Weaknesses:

      SIRT4 plays a critical role in mitochondria involved in respiratory chain reaction. This role of SIRT4 is critically involved in many cell functions. It is hard to rule out such a mitochondrial activity of SIRT4 in renal fibrosis. Moreover, the major concern is what kind of message mitochondrial SIRT4 proteins receive from TGF-beta. Although nuclear SIRT4 is increased in response to TNF treatment, it is likely de novo synthesized SIRT4 proteins can also undergo nuclear translocation upon cytokine stimulation. TGF-beta-induced mitochondrial calcium uptake and acetyl-CoA should be evaluated for calcium and acetyl-CoA may contribute to the gene expression regulation in nuclei.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      (1) SIRT4 overall is a mitochondrial enzyme that indeed can undergo shuttling between mitochondria and cytoplasm. Renal fibrosis is a complex process; SIRT4 deacetylates U2AF2 at K413.

      Thank you for your comment highlighting the known mitochondrial localization of SIRT4 and its role in renal fibrosis.

      We concur with the literature that SIRT4 is predominantly a mitochondrial enzyme. However, our study expands upon this understanding by demonstrating a novel shuttling mechanism of SIRT4 between mitochondria and the nucleus in the context of renal fibrosis. Specifically, we observed that under conditions of obstructive nephropathy and renal ischemia reperfusion injury, SIRT4 significantly accumulates in the nucleus, which is a critical event in the fibrotic response.

      Our findings reveal that upon TGF-β stimulation, a known inducer of fibrosis, SIRT4 is released from the mitochondria through the BAX/BAK pore and subsequently translocates to the nucleus. This translocation is mediated by the ERK1/2-dependent phosphorylation of SIRT4 at serine 36, which enhances its interaction with importin α1, a key component in nuclear import processes.

      Once in the nucleus, SIRT4 exerts its effects on the alternative splicing of CCN2 pre-mRNA by deacetylating U2AF2 at lysine 413. This deacetylation event promotes the formation of the U2 small nuclear ribonucleoprotein (U2 snRNP) and facilitates the splicing of CCN2 pre-mRNA, leading to increased expression of the profibrotic protein CCN2.

      Our study, therefore, not only confirms the mitochondrial association of SIRT4 but also uncovers its nuclear function in the regulation of gene expression during renal fibrosis. These findings underscore the complexity of SIRT4's role in cellular processes and its potential as a therapeutic target for fibrotic diseases.

      (2) Figure 2 and Figure 3 should be combined.

      Thank you for your suggestion to combine Figures 2 and 3 for potential improvement in presentation.

      After careful consideration, we have found that merging these figures is not feasible due to space constraints on a standard A4 page, which is necessary to maintain the clarity and detail of the data presented in both figures. Each figure contains complex data that, when combined, would compromise the readability and the integrity of the individual elements.

      We believe that the current presentation of Figures 2 and 3 provides a clear and detailed visualization of the data, which is essential for the reader's understanding of our study's findings.

      (3) In Figure 4G, the mass spectrum of U2AF2 acetylation on K413 should be included rather than the alignment among species. Moreover, endogenous HAT1 on endogenous U2AF2 rather than exogenous FLAG-U2AF2 should be examined.

      Thank you for your thoughtful comments and for the suggestion to include the mass spectrum of U2AF2 acetylation on K413 in Figure 4G.

      We appreciate the value that the mass spectrometry data would add to our study, providing a direct and definitive assessment of the acetylation status at this specific residue. However, we regret to inform you that our current facilities do not have access to the necessary mass spectrometry equipment to perform these analyses.

      While we are unable to include this data in the present manuscript, we concur with the importance of such evidence and plan to undertake these studies in the future. We are in the process of establishing collaborations with laboratories that have the required facilities to perform mass spectrometry. Our intention is to incorporate these data into a follow-up study, which will further validate and expand upon the findings presented in this manuscript.

      We believe that our current findings, although lacking the mass spectrometry confirmation, still provide valuable insights into the role of U2AF2 acetylation in [insert relevant biological process]. We have taken care to present our data rigorously and transparently, and we are committed to pursuing the highest standards of experimental validation in our future work.

      We hope you will consider the merits of our study in the context of the current limitations and appreciate the opportunity to clarify our position.

      Furthermore, regarding the examination of endogenous HAT1's effect on endogenous U2AF2 acetylation levels, we have conducted the necessary experiments. Our results demonstrate that overexpression of HAT1 leads to a significant increase in the acetylation of endogenous U2AF2 (Figure R2). This new data set has been added to the revised manuscript and supports the role of HAT1 in the regulation of U2AF2 acetylation.

      We believe that these revisions address your concerns and provide a more comprehensive understanding of the molecular mechanisms underlying the regulation of U2AF2 acetylation.

      We appreciate the opportunity to improve our manuscript based on your constructive feedback and hope that our revisions meet with your satisfaction.

      Author response image 2.

      HAT1 OE reduces the acetylation of endogenous U2AF2

      (4) Figure 6F. Does portien mean protein?

      Thank you for your careful review and insightful comments on our manuscript. You are correct in pointing out the error regarding the term "portien" in Figure 6F. It was indeed a typographical oversight on our part, and we apologize for any confusion this may have caused.

      We have made the necessary correction to ensure that "protein" is accurately used in place of "portien" in Figure 6F. We appreciate the opportunity to enhance the clarity and accuracy of our presentation.

      (5) The authors should pay attention to their writing. There are many typos and other issues with the use of the English language and grammar.

      Thank you for bringing the grammar issues to our attention. We have made diligent efforts to revise and improve the manuscript's English and grammar throughout. We have also enlisted the support of a professional language editing service to ensure the clarity and accuracy of our scientific communication.

      We are confident that these revisions have significantly enhanced the manuscript's accessibility to a broader readership and have addressed the language concerns raised.

      We appreciate your guidance and are committed to delivering a manuscript of the highest quality.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Reviewer #1’s main concerns revolved around the evidential strength of the study’s conclusion that age-specific effects of birth weight on brain structure are more localized and less consistent across cohorts than age-uniform, stable effects. Specifically, the reviewer points out the evidence (or lack thereof) for age-specific effects. We have rearticulated the reviewer’s concerns as a bullet-point summary for a clearer response (please see the original reviewer’s response in the annexed document). We thank the reviewer for his/her comment.

      Concern #1: No direct statistical comparisons are conducted between samples (beyond the spin-tests).

      In the initial version of the manuscript, the spin-tests represented a key test since they compared the spatial distribution of birth weight effects across samples. In the revised manuscript, we additionally perform a replicability analysis across samples both for birth weight effects on brain characteristics and on brain change in a similar fashion as described for the within-sample analysis. The results of these analyses provide complementary evidence of robust associations of birth weight effects on cortical characteristics (for area and volume, less so for thickness) and of unreliable associations of birth weight on cortical change. These analyses are briefly mentioned in the main document and fully described as supplementary information. Briefly, the effects of birth weight on cortical area and cortical volume showed high (exploratory and confirmatory) replicability while replicability was almost nonexistent for the effects of birth weight on cortical change. See below, under Reviewer #1, concern #2, for a description of the changes in the revised manuscript.

      Concern #2: The differential composition of samples in terms of age distribution leads to the possibility that lack of results is explained by methodological differences.

      The revised version of the manuscript now provides a within-sample replicability analysis of the birth weight effects on cortical change. This analysis addresses the reviewer’s concern, as the lack of replicability in this analysis cannot be attributed to sample or methodological differences. We thank the reviewer for suggesting this analysis, which provides further quantification of the (lack of) robustness of the birth weight effects on cortical change. See below for changes in the revised version of the manuscript concerning additional replicability analyses which were carried out as a response to reviewer #1’s concerns #1 and #2.

      pp. 12-3. “Additionally, we performed replicability analyses both across and within samples to further investigate the robustness of the effects of birth weight on cortical characteristics and cortical change. Split-half analyses within datasets were performed, to investigate the replicability of significant effects 36,37 of BW on cortical characteristics within samples (refer to Figure 1). These analyses further confirmed that the significant effects were largely replicable for volume and area, but not for thickness (see Supplementary Figure 11). Split-half analyses of BW on cortical change (refer to Figure 2) showed, in general, a very low degree of replicability on the three different cortical measures. See Supplementary Table 3. Replicability across datasets showed a similar pattern, that is, replicability was high for the effect of birth weight on cortical characteristics but very low for the effects on cortical change. See Supplementary Table 4 for stats. See Supplementary statistical methods for a full description of the analyses. These analyses provide complementary evidence of robust associations of BW with cortical area and volume – but not cortical change - across and within samples.”

      p. 41. “For each dataset and cortical measure, we assessed the effects of birth weight on cortical structure and cortical change (…)”

      p. 42. “Across samples replicability was performed as described in the within-sample replicability analysis (i.e., we assessed the exploratory and confirmatory replicability) except that split-half was not performed - the three datasets were compared with each other - and the analyses were performed in the original fsaverage space.”

      pp. 54-55. “The exploratory replicability of birth weight on cortical change was negligible across datasets and measures [.00 (.00), .00 (.00), .00 (.00) for area, .02 (.09), .00 (.02), .01 (.03) for volume, and .01 (.05), .01 (.14), .00 (.01) for thickness] while confirmatory replicability was generally poor, except for the ABCD dataset [.02 (.05), .68 (.35), .00 (.00) for area, .08 (.14), .56 (.25), .00 (.02) for volume, and .37 (.26), .60 (.27), .01 (.03) for thickness] (see Supplementary Table 3).

      These results are not fully comparable to other studies assessing the replicability of brain phenotype associations due to analytical differences (e.g. sample size, multiple-comparison correction method)20,36, yet clearly show that the rate of replicability of BW associations with cortical area and volume are comparable to benchmark brain-phenotype associations such as body-mass index and age68. Lower levels of replicability in the LCBC subsample are likely attributable to higher sample variability (e.g. increased age span). Kinship may lead to inflated patterns of replicability within the ABCD cohort. Confirmatory replicability is, also, to some degree, affected by sample size, and thus the estimates of confirmatory replicability may be somewhat inflated in the ABCD dataset.

      Finally, the degree of across-sample replicability was high for the effects of birth weight on cortical area and volume (average confirmatory replicability = .96 and .93), low for thickness (.27), and negligible for the effects of birth weight on cortical change (.03, .06, and .06). See further information in Supplementary Table 4.”
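
      As an illustration of the split-half logic described in these passages, a minimal sketch follows. It is not the authors' pipeline: the function names, the simple per-vertex regression, the normal-approximation threshold (|t| > 1.96) and the single "replication rate" are all assumptions made for this example, and the manuscript's own definitions of exploratory and confirmatory replicability may differ.

```python
import math
import random
from statistics import mean


def slope_t(x, y):
    """t-statistic for the slope of the simple regression y ~ x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def split_half_replication(bw, cortex, n_splits=50, t_crit=1.96, seed=0):
    """Mean split-half replication rate (hypothetical helper).

    bw      : birth weight per subject
    cortex  : cortex[v] is the list of per-subject values at vertex v
    Returns the average, over random splits, of the fraction of vertices
    significant in one half that are also significant, with the same
    sign, in the other half.
    """
    rng = random.Random(seed)
    idx = list(range(len(bw)))
    rates = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        half_a, half_b = idx[: len(idx) // 2], idx[len(idx) // 2:]
        found = replicated = 0
        for values in cortex:
            t_a = slope_t([bw[i] for i in half_a], [values[i] for i in half_a])
            if abs(t_a) <= t_crit:
                continue  # not significant in the discovery half
            found += 1
            t_b = slope_t([bw[i] for i in half_b], [values[i] for i in half_b])
            if abs(t_b) > t_crit and (t_a > 0) == (t_b > 0):
                replicated += 1
        rates.append(replicated / found if found else 0.0)
    return mean(rates)
```

      A robust effect should yield a rate close to 1, whereas an unreliable one should yield a rate close to 0, which is the qualitative contrast reported above between cortical characteristics and cortical change.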

      Concern #3: Some datasets have a narrow age range precluding the detection of age-related effects.

      We do not believe concern #3 is a major problem since time*birth weight refers to a within-subject contrast, e.g., longitudinal-only-based contrast. Birth weight, even when self-reported, is a highly reliable measure and the sample sizes are relatively large (n = 635, 1759, and 3324 unique individuals). Note that the smaller dataset does have longer follow-up times and more observations per participant, increasing the reliability of estimations in individual change. Structural MRI measures have very high reliability. Clearly, longitudinal brain change is less reliable, yet the present sample size and the high reliability of birth weight should provide enough statistical power to capture even small time-varying effects of birth weight on brain structure. Note as well that in each model age is treated as a covariate. Rather, the consistency of time*birth weight (that is, the effects of birth weight on cortical change) is assessed with split-half replications within and across samples. In this methodological pipeline, a narrow age range for a given dataset, if anything, may constitute an advantage. We have clarified the statistical model (see changes in the revised manuscript, referred to in response to reviewer #1, concern #5).

      Concern #4: The modeling strategy does not allow for non-linear interaction between age and BW suggesting the use of spline models instead in a mega-analytical fashion.

      Indeed, we agree that some - if not most - brain structures follow non-linear trajectories throughout life. In the present study, age regressors are used only for accounting for variance in the data rather than capturing any effect of interest. Rather, it is the time*birth weight regressor that captures age-varying changes in brain structure. Time reflects within-subject follow-up time. We believe non-linear modeling of age will only account for additional variance (compared to linear models) in the LCBC dataset given the dataset’s wider age range, while it will not have any consequential effect in the ABCD and UKB datasets (as predicted in the provisional response). In any case, we recognize it as a valid concern. Consequently, we have rerun the main models in an ROI-based fashion using or not using spline models to fit age. Specifically, we have fitted the models in each of Desikan-Killiany’s ROIs using generalized additive mixed models (GAMM with age as a smooth term) or linear mixed models (LME with age as a linear regressor). The results are shown in Supplementary Figures 13 and 14. The Beta regressors are nearly identical. As expected, the differences are noticeable in the LCBC dataset while the effect of using - or not using- splines to fit age is almost null in the other two datasets. See also FDR-corrected maps below for both birth weight effects on brain structure and brain change (we opted to show Beta-maps as supplementary material as the multiple-comparisons correction in the ROI-based analysis is not fully comparable with the one used in the vertex-wise approach).

      p. 9: “Both birth weight effects on cortical characteristics and cortical change were rerun (ROIwise) using spline models that accounted for possible non-linear effects of age on cortical structure. The results were comparable to those reported above in Figures 1 and 2. See Supplementary Figures 13 and 14 for birth weight effects on cortical characteristics and cortical change, respectively.”

      Caption to Supplementary Figure 13. “Comparison between spline (GAMM) and linear (LME) models on the effect of birth weight on cortical characteristics. Age was fitted either as a smoothing spline using generalized additive mixed models (GAMM, mgcv r-package) or a linear regressor within a linear mixed model (LME, lmer r-package) framework. The analyses were performed ROI-wise using the Desikan-Killiany atlas. Significance was considered at a FDR corrected threshold of p < 0.04. All the remaining parameters were comparable to the main analyses shown in Figure 1. The viridis-yellow scale represents the lower-higher Beta regressors. Red contour displays regions showing significant effects of birth weight. Note the high correspondence with both fitting models. Differences are only noticeable in the LCBC sample due to the dataset’s wider age range (i.e., lifespan dataset).”

      Caption to Supplementary Figure 14. “Comparison between spline (GAMM) and linear (LME) models on the effect of birth weight on cortical change. Age was fitted either as a smoothing spline using generalized additive mixed models (GAMM, mgcv r-package) or a linear regressor within a linear mixed model (LME, lmer r-package) framework. The analyses were performed ROI-wise using the Desikan-Killiany atlas. Significance was considered at a FDR corrected threshold of p < 0.04. All the remaining parameters were comparable to the main analyses shown in Figure 1. The viridis-yellow scale represents the lower-higher Beta regressors. Red contour displays regions showing significant effects of birth weight. Note the high correspondence with both fitting models. Differences are only noticeable in the LCBC sample due to the dataset’s wider age range (i.e., lifespan dataset).”

      The figures below show the birth weight effects on brain characteristics (above) and change (below) using a GAMM or an LME approach; that is, using age as a smooth term or as a regressor. FDR-corrected p < 0.05 values are shown in a signed logarithmic scale. Red-yellow values represent positive associations between birth weight and brain while blue-lightblue values represent negative associations. The results are qualitatively comparable and quantitative differences exist only in the LCBC dataset. Please see Supplementary Figures 13 and 14 in the revised manuscript.
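
      For readers unfamiliar with the two display steps mentioned here, the sketch below shows one common way to obtain FDR-corrected values across ROIs and to map them onto a signed logarithmic scale. It assumes the Benjamini-Hochberg procedure, which the response does not name explicitly; the function names are hypothetical and introduced only for illustration.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def signed_log_p(q_value, beta):
    """-log10(q), carrying the sign of the regression coefficient beta.

    q_value must be strictly positive.
    """
    return math.copysign(-math.log10(q_value), beta)
```

      With this convention, an ROI with q = 0.01 and a negative Beta is displayed at -2, and only ROIs whose adjusted value falls below the chosen threshold would be outlined as significant.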

      Author response image 1.

      Concern #5: Greater clarity regarding the statistical models and the provision of effect-size maps.

      The revised manuscript provides additional information regarding the statistical model, especially in the results section, to avoid misunderstanding (see below examples of clarifications in the revised manuscript). We now provide Beta-maps, F-maps, unthresholded p-values maps, and degrees of freedom for the main univariate analyses. That is, we provide this information for both the whole sample and the twin analyses which correspond to Figures 1, 2, 4, and 5. We opted not to compute effect-size estimates (e.g. partial eta-squared, cohen’s d) due to the ambiguous interpretation of these maps in the context of linear mixed models.

      p.8. “To test the effect of birth weight on cortical change we rerun the analyses with BW x time and age x time interactions. Note BW x time (i.e., within-subject follow-up time) represents the contrasts of interest while age – and age interactions – are used to account for differences in age across individuals.”
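
      One way to write down the model described in this passage is sketched below. The notation is ours, not the manuscript's: a subject-level random intercept $b_i$ and a generic covariate vector $\mathbf{c}_{ij}$ are assumptions, since the response does not spell out the random-effects structure or the full covariate list.

```latex
% i indexes participants, j indexes scans; t_{ij} is within-subject follow-up time.
\[
y_{ij} = \beta_0
       + \beta_1\,\mathrm{BW}_i
       + \beta_2\,t_{ij}
       + \beta_3\,(\mathrm{BW}_i \times t_{ij})
       + \beta_4\,\mathrm{age}_i
       + \beta_5\,(\mathrm{age}_i \times t_{ij})
       + \mathbf{c}_{ij}^{\top}\boldsymbol{\gamma}
       + b_i + \varepsilon_{ij}
\]
```

      Under this reading, $\beta_3$ is the contrast of interest for cortical change (BW x time), while $\beta_4$ and $\beta_5$ only absorb between-subject differences in age.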

      p.11. “In contrast, the spatial correlation of the maps capturing BW-associated cortical change (i.e., BW x time contrast) …”

      p. 12. “Additionally, we performed replicability analysis both across and within samples to further investigate the robustness of the effects of birth weight on cortical characteristics and cortical change.”

      p. 14: “BW discordance analyses on twins specifically were run as described for the main analyses above, with the exception that twin scans were reconstructed using FS v6.0.1. for ABCD and the addition of the twin’s mean birth weight as a covariate.”

      p. 31. “Group-level unthresholded p-maps, F-maps, Beta-maps, and degrees of freedom for the univariate analyses accompany this manuscript as additional material.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      necessary clarifications on some of the reviewers' suggestions.

      Reviewer #1 (Public Review):

      Weaknesses:

      • This is a pilot study with only 24 cases and 24 controls. Because the human microbiota entails individual variability, this work should be confirmed with a higher sample size to achieve enough statistical power.

      Thank you for your suggestion. Unlike the highly sparse 16S rRNA data, metagenomic data have a higher data density. Based on the experience of previous research, the sample size used here basically meets the requirements. Nevertheless, your suggestion is very valuable: increasing the sample size would allow a more in-depth analysis. Owing to practical constraints, it is difficult for us to further increase the sample size in this study.

      • The authors do not report here the use of blank controls. The use of this type of control is important to "subtract" the potential background from plasticware, buffer or reagents from the real signal. Lack of controls may lead to microbiome artefacts in the results. This can be seen in the results presented where the authors report some bacterial contaminants (Agrobacterium tumefaciensis, Aequorivita lutea, Chitinophagaceae, Marinobacter vinifirmus, etc) as part of the most common bacteria found in cervical samples.

      Thank you for your suggestion. Applying blank controls in low biomass areas can effectively avoid contamination caused by the environment or kits. This opinion is consistent with that published by Raphael Eisenhofer et al. in Trends in Microbiology. When designing this study, we considered that it described a biomass-rich site and that the abundance of dominant species was much higher than that of the possible 'kitome', so we did not set a blank control. In addition, the main focus of our discussion in this study is high-abundance species, and the species filtering threshold for some analyses was raised to 50%. Therefore, we believe that the absence of the blank control has little effect on the conclusions of this study. However, your point is well taken. Failure to set up a negative control will affect our future research on rare species. We will add a description in the Limitations section of the Discussion section.

      • Samples used for this study were collected from the cervix. Why not collect samples from the uterine cavity and isthmocele fluid (for cases)? In their previous paper using samples from the same research protocol (IRB no. 2019ZSLYEC-005S) they used endometrial tissue from the patients, so access to the uterine cavity was guaranteed.

      Thank you for your suggestion. In Author response image 1 we show the approximate location of our cervical swab sampling. There are two main reasons for choosing cervical swabs:

      1) The adsorption of swabs allows us to obtain sufficient nucleic acid for high-depth sequencing, while the isthmocele fluid varies greatly among patients, which will introduce unnecessary batch effects.

      2) Since the female reproductive tract is a continuous whole, our sampling location is close to the lesion in the cervix, which can be effectively studied. On the other hand, the microbial biomass of the endometrium is probably two orders of magnitude lower than that of the cervix, and it is difficult to avoid contamination of the lower genital tract when sampling.

      Based on the above reasons, we selected cervical swabs for our microbial data.

      Author response image 1.

      • Through the use of shotgun genomics, results from all the genomes of the organisms present in the sample are obtained. However, the authors have only used the metagenomic data to infer the taxonomical annotation of fungi and bacteria.

      Thank you for your suggestion. The advantage of metagenomics is that it can obtain all the nucleic acid information of the entire environment. However, in the study of the female reproductive tract, the databases of viruses and archaea are still immature; to ensure the accuracy of the results, we did not analyze these groups. We look forward to the emergence of mature databases in the future.

      Reviewer #1 (Recommendations For The Authors):

      • It would be interesting to use another series of functional data coming from the metagenomic analyses (not only taxonomic) to expand and reinforce the results presented.

      Thank you for your suggestion. We have dissected the functional data of microbiota in the article.

      • The authors have previously published the 16S rRNA sequencing and transcriptomic analysis of the same set of patients. It would be nice to see the integration of all the datasets produced.

      Thank you for your suggestion. There is no doubt that integrating all the data would give a more multidimensional picture. In our previous study we focused on microbe-host interactions. However, one question remained unanswered: what are the characteristics of the regulatory network within the microbiota? We therefore addressed this question in this study, exploring the complex interaction processes within microbial communities. In addition to direct effects, interactions between microbiota may also occur through specific metabolites, so we introduced the analysis of the untargeted metabolome. However, 16S rRNA sequencing can only provide bacterial information, so we did not integrate those data. In addition, the transcriptome provides host information and is not the focus of this study. Nevertheless, your suggestion is very valuable, and we will integrate all the data in our next study exploring treatment methods.

      Reviewer #2 (Public Review):

      Weaknesses: Methodological descriptions are minimal.

      Some examples:

      • The CON group (line 147) has not been defined. I suppose it is the control group.

      • There are no statistics related to shotgun sequencing. How many reads have been sequenced? How many have been removed from the host? How many are left to study bacteria and fungi? Are these reads proportional among the 48 samples? If not, what method has been used to normalise the data?

      • ggClusterNet has numerous algorithms to better display the modules of the microbiome network. Which one has been used?

      Thank you for your suggestion. We have added details to the method.

      Reviewer #2 (Recommendations For The Authors):

      I think the author should take into account the points described in the "Weaknesses" section. The lack of detail extends to almost all the analyses that have been included in the manuscript. Although the results are sound, I think it is important to understand what has been analysed and how it has been analysed. It is important that all work is reproducible and this requires vital information.

      For example, what parameters were used for bowtie2? Was local or end-to-end alignment used? Some parameters, such as --very-sensitive, are important for this kind of analysis. You could also use dedicated programs such as kneaddata.

      The Raw data preprocessing section should be more detailed.

      The same applies to the "Taxa and functional annotation" section: how have the data been normalised? Has any zero-inflated Gamma probabilistic model been taken into account? How were the zeros (no species detected) in the shallow samples treated?

      Which algorithms have been used for LEfSe? Kruskal-Wallis -> (Wilcoxon) -> LDA?

      Which p-value has been used as the cut-off? Has this p-value been corrected for multiple testing?

      • Information on ggClusterNet should be included and explained.

      The first section of the results and Table 1 should be in the Materials and Methods.

      Thank you for your suggestion. We have added details to the method.

      In the fungi section, it is mentioned that 431 species have been found. They should be included in a supplementary table.

      How many bacteria were found? Please include them also in a supplementary table.

      Thank you for your suggestion. We have added the corresponding table.

      Reviewer #3 (Public Review):

      Major

      1. Smoke or drink conditions, as well as diseases like hypertension and diabetes are important factors that could influence the metabolism of the host, thus the authors should add them in the exclusion criteria in the Methods.

      Thanks to reviewer #3 for professional comments. We have made corresponding additions in the method section. We also followed this standard when recruiting subjects.

      2. The sample size of this study is not large enough to draw a convincing conclusion.

      Thank you for your suggestion. Unlike 16S rRNA data, which are highly sparse, metagenomic data have a higher data density. Based on the experience of previous research, the sample size used here basically meets the requirements. However, your suggestion is very valuable: increasing the sample size would allow better in-depth analysis. Owing to objective limitations, it is difficult for us to increase the sample size further in this study.

      Reviewer #3 (Recommendations For The Authors):

      Please recruit more samples.

      In addition, there are many formatting and grammatical mistakes in the manuscript.

      Minor

      1. In Line 24-25 of the "Composition and characteristics of fungal communities", the format of "Goyaglycoside A and Janthitrem E." shouldn't be italic.

      2. In Line 126 of the "Metabolite detection using liquid chromatography (LC) and mass spectrometry (MS)", the "10 ul" should be changed to "Ten ul". Beginning with arabic numerals in a sentence should be avoided.

      3. In Line 170 of the "Composition and characteristics of bacterial communities", the "162 differential species" should be "One hundred and sixty-two differential species".

      4. In Line 187 of the "Composition and characteristics of fungal communities", the "42 differential" should be "Forty-two differential".

      Thanks to reviewer #3 for professional comments. We have completely revised the language of the article.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors focused on genetic variability in relation to insulin resistance. They used genetically different lines of mice and exposed them to the same diet. They found that genetic predisposition impacts the overall outcome of metabolic disturbances. This work provides a fundamental novel view on the role of genetics and insulin resistance.

      Reviewer #2 (Public Review):

      Summary:

      In the present study, van Gerwen et al. perform deep phosphoproteomics on muscle from saline or insulin-injected mice from 5 distinct strains fed a chow or HF/HS diet. The authors follow these data by defining a variety of intriguing genetic, dietary, or gene-by-diet phosphosites that respond to insulin, accomplished through the application of correlation analyses, linear mixed models, and a module-based approach (WGCNA). These findings are supported by validation experiments by intersecting results with a previous profile of insulin-responsive sites (Humphrey et al., 2013) and, importantly, mechanistic validation of Pfkfb3 where overexpression in L6 myotubes was sufficient to alter fatty acid-induced impairments in insulin-stimulated glucose uptake. To my knowledge, this resource provides the most comprehensive quantification of muscle phospho-proteins which occur as a result of diet in strains of mice where genetic and dietary effects can be quantifiably attributed in an accurate manner. Utilization of this resource is strongly supported by the analyses provided highlighting the complexity of insulin signaling in muscle, exemplified by contrasts to the "classically-used" C57BL6/J strain. As it stands, I view this exceptional resource as comprehensive with compelling strength of evidence behind the mechanism explored. Therefore, most of my comments stem from curiosity about pathways within this resource, many of which are likely well beyond the scope of incorporation in the current manuscript. These include the integration of previous studies investigating these strains for changes in transcriptional or proteomic profiles and intersections with available human phospho-protein data, many of which have been generated by this group.

      Strengths:

      Generation of a novel resource to explore genetic and dietary interactions influencing the phospho-proteome in muscle. This is accompanied by the elegant application of in silico tools to highlight the utility.

      Weaknesses:

      Some specific aspects of integration with other data among the same fixed strains could be strengthened and/or discussed.

      Reviewer #3 (Public Review):

      Summary:

      The authors aimed to investigate how genetic and environmental factors influence the muscle insulin signaling network and its impact on metabolism. They utilized mass spectrometry-based phosphoproteomics to quantify phosphosites in the skeletal muscle of genetically distinct mouse strains in different dietary environments, with and without insulin stimulation. The results showed that genetic background and diet both affected insulin signaling, with almost half of the insulin-regulated phosphoproteome being modified by genetic background on an ordinary diet, and high-fat high-sugar feeding affecting insulin signaling in a strain-dependent manner.

      Strengths:

      The study uses state-of-the-art phosphoproteomics workflow allowing quantification of a large number of phosphosites in skeletal muscle, providing a comprehensive view of the muscle insulin signaling network. The study examined five genetically distinct mouse strains in two dietary environments, allowing for the investigation of the impact of genetic and environmental factors on insulin signaling. The identification of coregulated subnetworks within the insulin signaling pathway expanded our understanding of its organization and provided insights into potential regulatory mechanisms. The study associated diverse signaling responses with insulin-stimulated glucose uptake, uncovering regulators of muscle insulin responsiveness.

      Weaknesses:

      Different mouse strains have huge differences in body weight on normal and high-fat high-sugar diets, which makes comparison between the models challenging. The proteome of muscle across different strains is bound to be different but the changes in protein abundance on phosphosite changes were not assessed. Authors do get around this by calculating 'insulin response' because short insulin treatment should not affect protein abundance. The limitations acknowledged by the authors, such as the need for larger cohorts and the inclusion of female mice, suggest that further research is needed to validate and expand upon the findings.

      Reviewer #1 (Recommendations For The Authors):

      I would suggest further discussion of the potential differences between males and females of the various strains.

      In the revised manuscript we have included a more detailed discussion of the potential differences between male and female mice in the "Limitations of this study" section on lines 455-459. In particular, a landmark study of HFD-fed inbred mouse strains found that insulin sensitivity, as inferred from the proxy HOMA-IR, was affected by interactions between sex and strain despite generally being greater in female mice (10.1016/j.cmet.2015.01.002). Furthermore, a recent phosphoproteomics study of human induced pluripotent stem-cell derived myoblasts identified groups of insulin-regulated phosphosites affected by donor sex, and by interactions between sex and donor insulin sensitivity (10.1172/JCI151818). Based on these results, we anticipate that both soleus insulin sensitivity and phosphoproteomic insulin responses would differ between male and female mice through interactions with strain and diet, adding yet another layer of complexity to what we observed in this study. This will be an important avenue for future research to explore.

      Reviewer #2 (Recommendations For The Authors):

      The following are comments to authors - many, if not all are suggestions for extended discussion and beyond the scope of the current elegant study.

      In the discussion section (line 428) the authors make a key point in that the genetic, dietary, and interacting patterns of variation of phosphosites could be due to changes in total protein and/or transcript levels across strains. For example, given that increased expression of Pfkfb3 was sufficient to impact glucose uptake, the transcript levels of the gene might also show a similar correlation with insulin responsiveness as in Fig 6b. Undoubtedly, phosphoproteomics analyses will provide unique information on top of more classical omics layers and uncover what would be an important future direction. Therefore, I would suggest adding to the discussion some guidance on performing similar applications to datasets from at least some of the strains used where RNA-seq and proteomics are available.

      We thank the reviewer for this suggestion. To address this, we mined recently published total proteomics data collected from soleus muscles of seven CHOW or HFD-fed inbred mouse strains, three of which were in common with our study (C57Bl6J, BXH9, BXD34; 10.1016/j.cmet.2021.12.013). In this study ex vivo soleus glucose uptake was measured and correlation analysis was performed, so we directly extracted the resulting glucose uptake-protein associations and compared them to the glucose uptake-phosphoprotein associations identified in our study. Indeed, we found that only a minority of proteins correlated at both the phosphosite and total protein levels, highlighting the utility of phosphoproteomics to provide orthogonal information to more classical omics layers. We have included this analysis in lines 303-311.

      Relevant to this, the authors might want to consider depositing scripts to analyze some aspects of the data (ex. WGCNA on P-protein data or insulin-regulated anova) in a repository such as github so that these can be applied easily to other datasets.

      We refer the reviewer to the section "Code availability" on lines 511-513, where we deposited all code used to analyse the data on github.

      In contrast to the points above, I feel that the short time-course of insulin stimulation was one important aspect of the experimental design that was not emphasized enough as a strength. It was mentioned as a limitation in that other time points could provide more info, yes. But given that the total abundance of proteins and transcripts likely doesn't shift tremendously in this time frame, this provides an important appeal to the analysis of phosphoproteomic data. I would suggest highlighting the insulin-stimulated response analysis here as something that leverages the unique nature of phosphoproteomics.

      We are grateful for the reviewer's positivity regarding this aspect of our experimental design. We have reiterated the value of the 10min insulin stimulation - that it temporally segregates phosphoproteomic and total proteomic changes - in the "Limitations of this study" section on lines 477-481.

      While I recognize the WGCNA analysis as an instrumental way to highlight global patterns of phosphopeptide abundance co-regulation, the analysis currently seems somewhat underdeveloped. For example, Fig 5f-h shows a lot of overlap between kinase substrates and pathways among modules. Clearly, there are informative differences based on the intersection with Humphrey et al. 2013 and the correlation with Pfkfb3. To highlight the specific membership of these modules, most people rank-order module members by correlation with eigen-gene (or P-peptide) and then perform pathway enrichments on these. Alternatively, it looks like all data was used to generate modules across conditions. One consideration would be to perform WGCNA on relevant comparison data separately (ex. chow mice only and HFHS only) and then compare modules whose membership is retained or shifts between the two. Or even look at module representation for genes that show large correlations with insulin-responsiveness. This might also be a good opportunity to suggest readers intersect module members with muscle eQTLs which colocalize to glucose or insulin to prioritize some potential key drivers.

      We thank the reviewer for their helpful suggestions, which we feel have substantially improved the WGCNA analysis. To probe specific functional differences between subnetworks, we performed rank-based enrichment using phosphopeptide module membership scores. Interestingly, this did reveal pathways that were enriched only in certain modules. However, we found that after p-value adjustment, virtually all enriched pathways lost statistical significance, hence we interpret these results as suggestive only. We have made this analysis available to readers in Fig S4b-d and lines 263-265: "To further probe functional differences we analysed phosphopeptide subnetwork membership scores, which revealed additional pathways enriched in individual subnetworks. However, these results were not significant after p-value adjustment and hence are suggestive only (Fig. S4b-d)". We also visualised module representation for glucose-uptake correlated phosphopeptides. This agreed with our existing analysis in Fig. 6f, where the eigenpeptides of modules V and I were correlated with glucose uptake. We have incorporated this new analysis in Fig. S6b-c and lines 324-325: "Examining the subnetwork membership scores for glucose-uptake correlated phosphopeptides also revealed a preference for clusters V and I, supporting this analysis (Fig. S6b-c)." Finally, in the discussion we have presented the integration of genetic data, such as muscle-specific eQTLs, as a future direction (lines 398-401): "Alternatively, one could overlap subnetworks with genetic information, such as genes associated with glucose homeostasis and other metabolic traits in human GWAS studies, or muscle-specific eQTLs or pQTLs genetically colocalised with similar traits, to further prioritise subnetwork-associated phenotypes and identify potential drivers within subnetworks."
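
      To illustrate how nominally significant enrichment p-values can lose significance after adjustment, here is a minimal sketch. The response does not state which adjustment method was used; the Benjamini-Hochberg procedure below is an assumption chosen for illustration, and the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used here only as an example of p-value
    adjustment; the authors' actual method is not stated.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical enrichment p-values for four pathways in one module.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f}  adjusted p = {q:.3f}")
```

      In this toy example, the pathways with raw p-values of 0.03 and 0.04 pass a 0.05 threshold before adjustment but not after, which is the pattern described above.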

      Have the authors considered using their heritability and GxE estimated for module eigenpeptides? To my knowledge, this has never been performed and might provide some informative information as the co-regulated P-protein structure occurs as a result of relevant contexts.

      In the revised manuscript we have now analysed eigenpeptides with the same statistical tests used to identify Strain and Diet effects in insulin-regulated phosphopeptides. We have displayed the statistical results in Fig. S4a, and have explicitly mentioned examples of StrainxDiet effects on lines 245-247: "For example, HFD-feeding attenuated the insulin response of subnetwork I in CAST and C57Bl6J strains (t-test adjusted p = 0.0256, 0.0365), while subnetwork II was affected by HFD-feeding only in CAST and NOD (Fig. 5e, Fig. S4a, t-test adjusted p = 0.00258, 0.0256)."

      The integration of modules with adipocyte phosphoproteomic data from the authors' 2013 Cell Metab paper seems like an important way to highlight the integration of this resource to define critical cellular signaling mechanisms. To assess the conservation of signaling mechanisms and relationships to additional key contexts (ex. exercise), the intersection of the insulin-stimulated P-peptides with human datasets generated by this group (ex. Cell Metab 2015, Nature Biotech 2022) seems like an obvious future direction to prioritize targets. Figure S3B shows a starting point for these types of integrations.

      To demonstrate the value of integrating our results with related phosphoproteomics data, we have incorporated the reviewer's advice of comparing insulin-regulated phosphosites to exercise-regulated phosphosites from Needham et al. Nature Biotech 2022 and Hoffman et al. Cell Metabolism 2015. We identified a small subset of commonly regulated phosphosites (8 across all three studies). Given insulin and exercise both promote GLUT4 translocation, these sites may represent conserved regulatory mechanisms. This analysis is presented in Fig. S3d, Table S2, and lines 129-135: "In addition to insulin, exercise also promotes GLUT4 translocation in skeletal muscle. We identified a small subset of phosphosites regulated by insulin in this study that were also regulated by exercise in two separate human phosphoproteomics studies (Fig. S3d, Table S2, phosphosites: Eef2 T57 and T59, Mff S129 and S131, Larp1 S498, Tbc1d4 S324, Svil S300, Gys1 S645), providing a starting point for exploring conserved signalling regulators of GLUT4 translocation."

      For the Pfkfb3 overexpression system, are there specific P-peptides that are increased/decreased upon insulin stimulation? This might be an interesting future direction to mention in order to link signaling mechanisms.

      We assessed whether canonical insulin signalling was affected by Pfkfb3 overexpression by immunoblotting. Insulin-stimulated phosphorylation of Akt S473, Akt T308, Gsk3a/b S21/S9, and PRAS40 T246 differed little across conditions, with only a weak, statistically insignificant trend towards increased pT308 Akt, pS21/S9 Gsk3a/b, and pT246 PRAS40 in palmitate-treated Pfkfb3-overexpressing cells. Hence, as the reviewer has suggested, an interesting future direction will be to perform phosphoproteomics to characterise more deeply the effects of palmitate and Pfkfb3 overexpression on insulin signalling. We have modified the manuscript to reflect these findings and suggested future directions on lines 362-365: "immunoblotting of canonical insulin-responsive phosphosites on Akt and its substrates GSK3α/β and PRAS40 revealed minimal effect of palmitate treatment and Pfkfb3 overexpression (Fig. S7e-f), hence more detailed phosphoproteomics studies are needed to clarify whether Pfkfb3 overexpression restored insulin action by modulating insulin signalling."

      Reviewer #3 (Recommendations For The Authors):

      This remarkable contribution by the esteemed research group has significantly enriched the field of metabolism. The extensive dataset, intertwined with a sophisticated research design, promises to serve as an invaluable resource for the scientific community. I offer a series of suggestions aimed at potentially elevating the manuscript to an even higher standard.

      Mouse Weight Variation and Correlation Analysis: The pronounced variances in mouse body weights pose a challenge to meaningful comparisons (Fig S1). Could the disparities in the phosphoproteome between basal and insulin-stimulated conditions be attributed to differences in body weight? Consider performing a correlation analysis. Furthermore, does the phosphoproteome of these mouse strains evolve comparably over time? Do these mice age similarly? Kindly incorporate this information.

      We thank the reviewer for the suggested analysis. We found there was a significant correlation between the phosphopeptide insulin response and mouse body weight, either in CHOW-fed mice (Strain effects) or across both diets (Diet effects), for ~ 25% of phosphopeptides that exhibited a Strain or Diet effect. Hence, while there is a clear effect of body weight on insulin signalling, this influences only a small proportion of the entire insulin-responsive phosphoproteome. Notably, insulin was dosed according to mouse lean mass to ensure equivalent dosage received by the soleus muscle, hence any insulin signalling differences associated with body weight are unlikely due to differences in dosing. As the reviewer also alludes to, different strains could have different lifespans. This may result in mice having different biological ages at the time of experimentation, and this in turn could influence insulin signalling. This possibility is challenging to assess in a quantitative manner because lifespan data is not available for most strains used. However, it is worth noting that female CAST mice live 77% as long as C57Bl6J mice (median age of 671 vs 866 (10.1073/pnas.1121113109); data is not available for male mice nor the other three strains), and substantial differences in insulin signalling were observed between these two strains. Ultimately, regardless of whether body weight and/or lifespan altered insulin signalling, such differences would still have arisen solely from the distinct genetic backgrounds and diets of the mice, hence we believe they are meaningful results that should not be dismissed. We have added this analysis to the revised manuscript in the "Limitations of this study" section on lines 471-477: "We were also unable to determine the extent to which signalling changes arose from muscle-intrinsic or extrinsic factors. 
      For instance, body weight varied substantially across mice and correlated significantly with 25% of Strain and Diet-affected phosphopeptides (Fig. S8c), suggesting obesity-related systemic factors likely impact a subset of the muscle insulin signalling network. Furthermore, genetic differences in lifespan could alter the “biological age” of different strains and their phosphoproteomes, though we could not assess this possibility since lifespan data are not available for most strains used."
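
      The body-weight analysis described above can be sketched as a per-phosphopeptide correlation between insulin response and body weight. The response does not say which correlation coefficient was used; Pearson's r is assumed here, and all values and names below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length >= 3")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r != 0 with n - 2 degrees of freedom.

    Undefined for |r| == 1; a p-value would come from the t distribution.
    """
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical data: body weight (g) and the log2 insulin response
# of one phosphopeptide in six mice.
body_weight = [22.1, 25.4, 31.0, 36.8, 41.2, 45.9]
insulin_response = [1.9, 1.7, 1.4, 1.5, 1.0, 0.8]

r = pearson_r(body_weight, insulin_response)
print(f"r = {r:.3f}, t = {t_statistic(r, len(body_weight)):.3f}")
```

      Repeating this for every Strain- or Diet-affected phosphopeptide and counting those that pass a significance threshold would give a proportion comparable to the ~25% reported above.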

      Soleus Muscle Data and Bias Considerations: Were measurements taken for lean mass and soleus muscle weight? If so, please present the corresponding data.

      Measurements for lean mass and the mass of soleus muscle after grinding have been included in Supplementary Figure S1 (panels c-d).

      As outlined in the methods section, the variation in protein yield from the soleus muscle across each strain is substantial. Notably, the distinct peptide input for phospho enrichment introduces biases, given that muscles with lower input may exhibit reduced identification (Fig S2). This bias might also manifest in the PCA plot (S2C). Ideally, adopting a uniform protein/peptide input would have been advantageous. Address this concern and contemplate moving the PCA plot to the main figure. It's prudent to reconsider the sentence stating, "Samples from animals of the same strain and diet were highly correlated and generally clustered together, implying the data are highly reproducible (Fig. S2b-d)," particularly if the input and total IDs were not matched.

      The reviewer highlights an important point. As the reviewer comments, it would have been our preference to use the same amount of protein material for all samples. However, as there was a wide range in the mass of the soleus muscle across mouse strains (in particular much lower in CAST mice), it was not appropriate to use the same amount of material for all strains. This is indeed evident in the PCA plot (Figure S2c), whereby samples cluster in the second component (PC2) based on the amount of protein material. However, this clustering is not observed in the hierarchical clustering (Figure S2d), nor is the number of phosphopeptides quantified in each sample substantially impacted by these differences (Figure S2a), as implied by the reviewer. Indeed, the number of phosphopeptides quantified did not noticeably vary when comparing BXH9/BXD34 to C57Bl6J/NOD despite 32.3% less material used, and there were only 12.4% fewer phosphopeptides (average 13891.56 vs 15851.29) in CAST compared to C57Bl6J/NOD strains, despite 51.8% less material used. To further emphasise the minimal effect that input material had on phosphopeptide quantification, we have additionally plotted the number of phosphopeptides quantified in each sample following the filtering steps we employed prior to statistical analysis of the dataset (i.e. ANOVA). This plot (Author response image 1) shows that there is even less variation in the number of quantified phosphopeptides between strains, with only 9.12% fewer phosphopeptides quantified and filtered on average in CAST compared to C57Bl6J/NOD (average 9026.722 vs 9932.711). From a quantitative perspective, in both the PCA (Principal Component 1) and hierarchical clustering analyses, samples are additionally clustered by individual strains, and in the latter they also cluster generally by diet, implying that biological variation between samples remains the primary variation captured in our data.
      We have modified the manuscript so that these observations are at the forefront (lines 103-106): "Furthermore, while different strains clustered by the amount of protein material used in the second component of the PCA (Figure S2c), samples from animals of the same strain and diet were highly correlated and generally clustered together, indicating that our data are highly reproducible". To ensure that readers are aware of our decision to alter protein starting material and its implications, we have moved the description of this from the methods to the results, and we have highlighted the impact on phosphopeptide quantification in CAST mice (lines 99-103): "Due to the range in soleus mass across strains (Fig. S1D) we altered the protein material used for EasyPhos (C57Bl6J and NOD: 755 µg, BXH9 and BXD34: 511 µg, CAST: 364 µg), though phosphopeptide quantification was minimally affected, with only 12.4% fewer phosphopeptides quantified on average in CAST compared to C57Bl6J/NOD (average 13891.56 vs 15851.29, Fig. S2a)."
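
      The percentages quoted above follow directly from the reported input amounts and average phosphopeptide counts. The short check below recomputes them; the helper name `percent_less` is ours, not the authors'.

```python
def percent_less(value, reference):
    """Percentage by which `value` falls below `reference`."""
    return 100.0 * (1.0 - value / reference)


# Protein input for EasyPhos (µg), as reported in the response.
input_c57_nod = 755
input_bxh9_bxd34 = 511
input_cast = 364

# Average phosphopeptides quantified per sample, as reported.
ids_cast, ids_c57_nod = 13891.56, 15851.29
filtered_cast, filtered_c57_nod = 9026.722, 9932.711

print(round(percent_less(input_bxh9_bxd34, input_c57_nod), 1))  # 32.3
print(round(percent_less(input_cast, input_c57_nod), 1))        # 51.8
print(round(percent_less(ids_cast, ids_c57_nod), 1))            # 12.4
print(round(percent_less(filtered_cast, filtered_c57_nod), 2))  # 9.12
```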

      Author response image 1.

      Phosphopeptide quantification following filtering. a) The number of phosphopeptides quantified in each sample after filtering prior to statistical analysis.

      Phosphosite Quantification Filtering: The quantified phosphosites have been dropped from 23,000 to 10,000. Could you elucidate the criteria employed for filtering and provide a concise explanation in the main text?

      We thank the reviewer for drawing this ambiguity to our attention. Before testing for insulin regulation, we performed a filtering step requiring phosphopeptides to be quantified well enough for comparisons across strains and diets. Specifically, phosphopeptides were retained if they were quantified well enough to assess the effect of insulin in more than eight strain-diet combinations (≥ 3 insulin-stimulated values and ≥ 3 unstimulated values in each combination). We have now included this explanation of the filtering in the main text on lines 108-114.
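
      A minimal sketch of this filtering rule follows, assuming ten strain-diet combinations (five strains, two diets) and representing missing values as `None`. The data layout and function names are hypothetical, not the authors' code.

```python
def is_quantified(insulin_values, basal_values, min_values=3):
    """True if a strain-diet combination has enough non-missing values
    in both the insulin-stimulated and unstimulated groups."""
    n_insulin = sum(v is not None for v in insulin_values)
    n_basal = sum(v is not None for v in basal_values)
    return n_insulin >= min_values and n_basal >= min_values


def passes_filter(peptide, min_combinations=9, min_values=3):
    """Keep a phosphopeptide if it is quantified in more than eight
    (i.e. at least nine) strain-diet combinations.

    `peptide` maps (strain, diet) -> {"insulin": [...], "basal": [...]}.
    """
    n_ok = sum(
        is_quantified(group["insulin"], group["basal"], min_values)
        for group in peptide.values()
    )
    return n_ok >= min_combinations


# Hypothetical example: one combination lacks enough insulin values.
strains = ["C57Bl6J", "NOD", "BXH9", "BXD34", "CAST"]
diets = ["CHOW", "HFD"]
well_quantified = {"insulin": [1.2, 0.9, 1.1], "basal": [0.1, None, 0.2, 0.0]}
poorly_quantified = {"insulin": [1.0, None, None], "basal": [0.1, 0.2, 0.3]}

example_peptide = {(s, d): well_quantified for s in strains for d in diets}
example_peptide[("CAST", "HFD")] = poorly_quantified
print(passes_filter(example_peptide))
```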

      ANOVA Choice Clarification: In Figure 4, there's a transition from one-way ANOVA in B to two-way ANOVA in C. Could you expound on the rationale for selecting these distinct methods?

      In panel B, we first focussed on kinase regulation differences between strains in the absence of a dietary perturbation. Hence, we performed one-way ANOVAs only within the CHOW-fed mice. In panel C, we then consider the effect of perturbation with the HFD. We perform two-way ANOVAs, allowing us to identify effects of the HFD that are uniform across strains (Diet main effect) or variable across strains (Strain-by-diet interaction).
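
      The distinction between the two tests can be sketched as follows. This is a textbook calculation of F statistics for a balanced design, offered as an illustration rather than the authors' implementation; the data are hypothetical and p-values would additionally require the F distribution.

```python
def mean(values):
    return sum(values) / len(values)


def one_way_anova_f(groups):
    """F statistic for a single factor (e.g. Strain within CHOW-fed mice)."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def two_way_anova_f(cells):
    """F statistics for Strain, Diet and Strain-by-Diet interaction.

    `cells` maps (strain, diet) -> list of replicates. Assumes a balanced,
    fully crossed design (equal replicates in every cell).
    """
    strains = sorted({s for s, _ in cells})
    diets = sorted({d for _, d in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(strains) * len(diets) or any(len(v) != n for v in cells.values()):
        raise ValueError("design must be balanced and fully crossed")
    a, b = len(strains), len(diets)
    grand = mean([v for vals in cells.values() for v in vals])
    strain_mean = {s: mean([v for d in diets for v in cells[(s, d)]]) for s in strains}
    diet_mean = {d: mean([v for s in strains for v in cells[(s, d)]]) for d in diets}
    cell_mean = {key: mean(vals) for key, vals in cells.items()}
    ss_strain = b * n * sum((strain_mean[s] - grand) ** 2 for s in strains)
    ss_diet = a * n * sum((diet_mean[d] - grand) ** 2 for d in diets)
    ss_inter = n * sum(
        (cell_mean[(s, d)] - strain_mean[s] - diet_mean[d] + grand) ** 2
        for s in strains for d in diets
    )
    ss_error = sum((v - cell_mean[key]) ** 2 for key, vals in cells.items() for v in vals)
    ms_error = ss_error / (a * b * (n - 1))
    return {
        "strain": (ss_strain / (a - 1)) / ms_error,
        "diet": (ss_diet / (b - 1)) / ms_error,
        "strain_x_diet": (ss_inter / ((a - 1) * (b - 1))) / ms_error,
    }


# Hypothetical insulin responses for two strains on two diets.
toy = {
    ("A", "CHOW"): [1.0, 3.0], ("A", "HFD"): [3.0, 5.0],
    ("B", "CHOW"): [2.0, 4.0], ("B", "HFD"): [4.0, 6.0],
}
print(one_way_anova_f([toy[("A", "CHOW")], toy[("B", "CHOW")]]))
print(two_way_anova_f(toy))
```

      In the two-way case, a large diet F statistic with a small interaction term corresponds to a uniform Diet main effect, whereas a large interaction term corresponds to a Strain-by-diet interaction.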

      Cell Line Selection for Functional Experiments: Could you elucidate the rationale behind opting for L6 cells of rat origin over C2C12 mouse cells for functional experiments?

      We acknowledge that C2C12 cells have the benefit of being of mouse origin, which aligns with our mouse-derived phosphoproteomics data. However, they are unsuitable for glucose uptake experiments as they lack an insulin-responsive vesicular compartment even upon GLUT4 overexpression, and undergo spontaneous contraction when differentiated resulting in confounding non-insulin dependent glucose uptake (10.1152/ajpendo.00092.2002, 10.1007/s11626-999-0030-8). In contrast, L6 cells readily express insulin-responsive GLUT4, and cannot contract (doi.org/10.1113/JP281352, 10.1007/s11626-999-0030-8). Therefore they are a superior model for studying insulin-dependent glucose transport. We have added a justification of L6 cells over C2C12 cells in the revised manuscript, on lines 352-354: "While L6 cells are of rat origin, they are preferable to the popular C2C12 mouse cell line since the latter lack an insulin-responsive vesicular compartment and undergo spontaneous contraction, resulting in confounding non-insulin dependent glucose uptake."

      It's intriguing that while a phosphosite was modulated on Pfkfb2, functional assays were conducted on a different isoform (Pfkfb3) wherein the phosphosite was not detected.

      The correlation between Pfkfb2 S469 phosphorylation and insulin-stimulated glucose uptake suggests that F2,6BP production, and subsequent glycolytic activation, positively regulate insulin responsiveness. There are several ways of testing this: 1) Knock down endogenous Pfkfb2, and re-express either wild-type protein or a S469A phosphomutant. If S469 phosphorylation positively regulates insulin responsiveness, then knockdown should decrease insulin responsiveness and re-expression of wild-type Pfkfb2, but not S469A, should restore it. 2) Induce insulin resistance (e.g. through palmitate treatment), and overexpress phosphomimetic S469D or S469E Pfkfb2 to enhance F2,6BP production. Under our hypothesis, this should reverse insulin resistance. 3) There is some evidence that dual phosphorylation of S469 and S486, another activating phosphosite on Pfkfb2, enhances F2,6BP production through 14-3-3 binding (10.1093/emboj/cdg363). Hence, we may expect that introduction of an R18 sequence into Pfkfb2, which causes constitutive 14-3-3 binding (10.1074/jbc.M603274200), would increase Pfkfb2-driven F2,6BP production, and under our hypothesis this should reverse insulin resistance. 4) The paralog Pfkfb3 lacks Akt regulatory sites and has substantially higher basal activity than Pfkfb2. Thus, overexpression of Pfkfb3 should mimic the effect of phosphorylated Pfkfb2, and hence reverse insulin resistance under our hypothesis. While approaches 1), 2), and 3) directly target Pfkfb2, they have drawbacks. For example, 1) may not work if Pfkfb2 knockdown is compensated for by other Pfkfb isoforms, 2) may not work since D/E phosphomimetics often do not recapitulate the molecular effects of S/T phosphorylation (10.1091/mbc.E12-09-0677), and 3) may not work if S469 phosphorylation does not operate through 14-3-3 binding. Hence we performed 4) as it seemed to be the most robust and cleanest experiment to test our hypothesis. 
      We have revised the manuscript to further clarify the challenges of directly targeting Pfkfb2 and the benefits of targeting Pfkfb3 on lines 342-349: "Since Pfkfb2 requires phosphorylation by Akt to produce F2,6BP substantially, increasing F2,6BP production via Pfkfb2 would require enhanced activating site phosphorylation, which is difficult to achieve in a targeted fashion, or phosphomimetic mutation of activating sites to aspartate/glutamate, which often does not recapitulate the molecular effects of serine/threonine phosphorylation. By contrast, the paralog Pfkfb3 has high basal production rates and lacks an Akt motif at the corresponding phosphosites. We therefore rationalised that overexpressing Pfkfb3 would robustly increase F2,6BP production and enhance glycolysis regardless of insulin stimulation and Akt signalling."

      Insulin-Independent Action of Pfkfb3: The functionality of Pfkfb3 unfolds in an insulin-independent manner, yet it restores insulin action (Fig 6h). Could you shed light on the mechanism underpinning this phenomenon? Consider measuring F2,6BP concentrations or assessing kinase activity upon overexpression.

      Pfkfb3 overexpression increased the glycolytic capacity of L6 myotubes in the absence of insulin stimulation, as inferred by extracellular acidification rate (Fig. S7c). This is indeed consistent with Pfkfb3 enhancing glycolysis through increased F2,6BP concentration in an insulin-independent manner. To shed light on the mechanism connecting this to insulin action, we performed immunoblotting experiments to assess the kinase activity of Akt, a master regulator of the insulin response. Indeed, this experimental direction has precedent as we previously observed that Pfkfb3 overexpression enhanced insulin-stimulated Akt signalling in HEK293 cells, while small-molecule inhibition of Pfkfb kinase activity reduced Akt signalling in 3T3-L1 adipocytes (10.1074/jbc.M115.658815). However, insulin-stimulated phosphorylation of Akt S473, Akt T308, Gsk3a/b S21/S9, and PRAS40 T246 differed little across conditions, with only a weak, statistically non-significant trend towards increased pT308 Akt, pS21/S9 Gsk3a/b, and pT246 PRAS40 in palmitate-treated Pfkfb3-overexpressing cells. Hence, a more detailed phosphoproteomics study will be needed to assess whether Pfkfb3 restores insulin action by modulating insulin signalling. We have described these immunoblotting experiments in lines 361-365 and Fig. S7e-f. We also discussed potential mechanisms through which Pfkfb3-enhanced glycolysis could connect to insulin action in the discussion (lines 427-434).

      Figure 6h Statistical Analysis: For the 2DG uptake in Figure 6h, a conventional two-way ANOVA might be more appropriate than a repeated measures ANOVA.

      On reflection, we agree that a conventional ANOVA is more appropriate. Furthermore, for simplicity and conciseness we have decided to analyse and present only insulin-stimulated/unstimulated 2DG uptake fold change values in Figure 6h. We have presented all unstimulated and insulin-stimulated values in Figure S7d.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Janssens et al. addressed the challenge of mapping the location of transcriptionally unique cell types identified by single nuclei sequencing (snRNA-seq) data available through the Fly Cell Atlas. They identified 100 transcripts for head samples and 50 transcripts for fly body samples allowing the identification of every unique cell type discovered through the Fly Cell Atlas. To map all of these cell types, the authors divided the fly body into head and body samples and used the Molecular Cartography (Resolve Biosciences) method to visualize these transcripts. This approach allowed them to build spatial tissue atlases of the fly head and body, to identify the location of previously unknown cell types and the subcellular localization of different transcripts. By combining snRNA-seq data from the Fly Cell Atlas with their spatially resolved transcriptomics (SRT) data, they demonstrated an automated cell type annotation strategy to identify unknown clusters and infer their location in the fly body. This manuscript constitutes a proof-of-principle study to map the location of the cells identified by ever-growing single-cell transcriptomic datasets generated by others.

      Strengths:

      The authors used the Molecular Cartography (Resolve Biosciences) method to visualize 100 transcripts for head samples and 50 transcripts for fly body samples in high resolution. This method achieves high resolution by multiplexing a large number of transcript visualization steps and allows the authors to map the location of unique cell types identified by the Fly Cell Atlas. 

      We thank this reviewer for appreciating the quality of our spatial data. We do not know what caused the technical problem (grayscale version of PDF) for this reviewer (the PDF figures are in color on the eLife website and on bioRxiv). We are surprised that the eLife discussion session did not resolve this issue.

      Weaknesses:

      Combining single-nuclei sequencing (snRNA-seq) data with spatially resolved transcriptomics (SRT) data is challenging, and the methods used by the authors in this study cannot reliably distinguish between cells, especially in brain regions where the processes of different neurons are clustered, such as in neuropils. This means that a grid that the authors mark as a unique cell may actually be composed of processes from multiple cells. 

      The small size of an individual fly is one of the most challenging aspects of performing spatial transcriptomics. While the resolution of Molecular Cartography is rather high (< 200 nm), in the brain challenges remain as noted by the reviewer. Drosophila neuronal nuclei are notoriously small and cannot be easily resolved with the current imaging techniques. We agree that for a full atlas either expansion microscopy, 3D techniques or other super-resolution techniques will be required. 

      Reviewer #2 (Public Review):

      Summary:

      The landmark publication of the "Fly Atlas" in 2022 provided a single cell/nuclear transcriptomic dataset from 15 individually dissected tissues, the entire head, and the body of male and female flies. These data led to the annotation of more than 250 cell types. While certainly a powerful and data-rich approach, a significant step forward relies on mapping these data back to the organism in time and space. The goal of this manuscript is to map 150 transcripts defined by the Fly Atlas by FISH and in doing so, provide, for the first time, a spatial transcriptomic dataset of the adult fly. Using this approach (Molecular Cartography with Resolve Biosciences), the authors, furthermore, distinguish different RNA localizations within a cell type. In addition, they seek to use this approach to define previously unannotated clusters found in the Fly Atlas. As a resource for the community at large interested in the computational aspects of their pipeline, the authors compare the strengths and weaknesses of their approach to others currently being performed in the field.

      Strengths:

      (1) The authors use Resolve Biosciences and a novel bioinformatics approach to generate a FISH-based spatial transcriptomics map. To achieve this map, they selected 150 genes (50 body; 100 head) that were highly expressed in the single nuclear RNA sequencing dataset and were used in the 2022 paper to annotate specific cell types; moreover, the authors chose several highly expressed genes characteristic of unannotated cell types. Together, the approach and generated data are important next steps in translating the transcriptomic data to spatial data in the organism.

      We thank the reviewer for this comment, as it reminded us that we need to be clearer in the text about how we chose the genes to investigate. The statement that we selected “150 genes (50 body; 100 head) that were highly expressed in the single nuclear RNA sequencing dataset” is not correct. We have chosen genes with widely differing expression levels (log-scale range of 3.95 in body, 5.76 in head; we now show this in the new Figure 1 – figure supplement 1B, D). Many of the chosen genes are also transcription factors. In fact, the method introduced here is more sensitive than the single cell atlas: the tinman positive cells were readily located (even non-heart cells were found to express tinman), whereas in the single cell FCA data tinman expression is often not detected in the cardiomyocytes (tinman is detected in 273 cells in the entire FCA (mean expression of 1.44 UMI in positive cells), and in 71 cells out of 273 cardiac cells (26%)). 

      (2) Working with Resolve, the authors developed a relatively high throughput approach to analyze the location of transcripts in Drosophila adults. This approach confirmed the identification of particular cell types suggested by the FlyAtlas as well as revealed interesting subcellular locations of the transcripts within the cell/tissue type. In addition, the authors used co-expression of different RNAs to unbiasedly identify "new cell types". This pipeline and data provide a roadmap for additional analyses of other time points, female flies, specific mutants, etc.

      (3) The authors show that their approach reveals interesting patterns of mRNA distribution (e.g., alpha- and beta-Trypsin in apical and basal regions of gut enterocytes or striped patterns of different sarcomeric proteins in body muscle). These observations are novel and reveal unexpected patterns. Likewise, the authors use their more extensive head database to identify the location of cells in the brain. They report the resolution of 23 clusters suggested by the single-cell sequencing data, given their unsupervised clustering approach. This identification supports the use of spatial cell transcriptomics to characterize cell types (or cell states).

      (4) Lastly, the authors compare three different approaches --- their own described in this manuscript, Tangram, and SpaGE - which allow integration of single cell/nuclear RNA-seq data with spatial localization FISH. This was a very helpful section as the authors compared the advantages and disadvantages (including practical issues, like computational time).

      Weaknesses:

      (1) Experimental setup. It is not clear how many and, for some of the data, the sex of the flies that were analyzed. It appears that for the body data, only one male was analyzed. For the heads, methods say male and female heads, but nothing is annotated in the figures. As such, it remains unclear how robust these data are, given such a limited sample from one sex. As such, the claims of a spatial atlas of the entire fly body and its head ("a rosetta stone") are overstated. Also, the authors should clearly state in the main text and figure legends the sex, the age, how many flies, and how many replicates contributed to the data presented (not just the methods). What also adds to the confusion is the use of "n" in para 2 of the results. " ... we performed coronal sections at different depths in the head (n=13)..." 13 sections in total from 1 head or sections from 13 heads? Based on the body and what is shown in the figure, one assumes 13 sections from one head. Please clarify.

      While we agree that sex differences present indeed an interesting opportunity to study with spatial transcriptomics, our goal was not to define male/female differences but rather to establish the technology to go into this detail if wanted in the future. In the revised version, we have provided an additional supplementary table with a more detailed description of the head sections (Table S3). We have added the number of animals (12 for the head sections, mixed sex; and 1 male for the body sections) to the main text. We would like to point out that we verified the specificity of our MC method on all the 5 body sections (Figure 2A, TpnC4 & Act88F and text) and not only on one. Furthermore, we also would like to state that the idea of “a Rosetta stone” was mentioned as a future prospect that clearly goes beyond our presented work. We have rewritten the discussion to make this clearer and to avoid any overstatements.

      (2) Probes selected: Information from the methods section should be put into the main text so that it is clear what and why the gene lists were selected. The current main text is confusing. If the authors want others to use their approach, then some testing or, at the very least, some discussion of lower expressed genes should be added. How useful will this approach be if only highly expressed genes can be resolved? In addition, while it is understood that the company has a proprietary design algorithm for the probes, the authors should comment on whether the probes for individual genes detect all isoforms or subsets (exons and introns?), given the high level of splicing in tissues such as muscle.

      As stated above, while there is a slight bias to higher expressed genes (as expected for marker genes), we have also used low expressed genes like salm, CG32121, tinman (body) or sens (head). This is now shown in new Figure 1 – figure supplement 1B, D. This shows that our method is more sensitive than single-cell data, as all cardiomyocytes can be identified by tinman expression, whereas only some are positive in the FCA data. In fact, the method cannot resolve very highly expressed genes, because optical crowding of the signal leads to worse quantification. For this reason, ninaE was removed from the analysis (as mentioned in the section "Spatial transcriptomics allows the localization of cell types in the head and brain" and in Methods).

      As mentioned in the Methods, the probes are designed on gene level targeting all isoforms, but favoring principal isoforms (weighted by APPRIS level). The high level of splicing is indeed interesting and we expect that in the future spatial transcriptomics can help to generate more insight into this by designing isoform-specific probes.

      (3) Imaging: it isn't clear from the text whether the repeated rounds of imaging impacted data collection. In many of what appear to be "stitched" images, there are gradients of signal (e.g., Figure 2F); please comment. Also, since this is a new technique, could a before and after comparison of the original images and the segmented images be shown in the supplemental data so that the reader can better appreciate how the authors assessed/chose/thresholded their data? More discussion of the accuracy of spot detection would be helpful. 

      High-resolution imaging (pixel size = 138 nm) of a large field of view (>1 mm) for spatial transcriptomics uses a stitching method to combine several individual images to reconstruct a large field of view. This does not generate signal gradients, apart from lower signal at the extreme edges of each of the individual images, as seen in our images, too. The spot detection algorithm was written and used by Resolve Biosciences and benchmarked for human (HeLa) and mouse (NIH-3T3) cell lines in Groiss et al. 2021 (Highly resolved spatial transcriptomics for detection of rare events in cells, bioRxiv). The specificity of the decoded probes was found to lie between 99.45 and 99.9% there, matching the results we found for specific detection of TpnC4 and Act88F (99.4 and 99.8%).

      (4) The authors comment on how many RNAs they detected (first paragraph of results). How do these numbers compare to the total mRNA present as detected by single-cell or single-nuclear sequencing?

      We can compare the numbers, but the different methodologies make the interpretation of such a comparison difficult. FCA used single nucleus sequencing, so only nuclear pre-mRNAs are detected. The total amount of counts per single cell sample strongly depends on how many cells were sequenced in an experiment. MC detects all mRNAs present in the section. Here, the size of the sample and hence the size or the number of cells analyzed determines how many mRNAs are detected. In Author response image 1, we have compared our MC results versus FCA data, comparing the genes investigated here in MC per section vs per sequencing experiment. Numbers for MC are slightly lower for the brain (not all cell types are on all sections) and much higher for the larger body samples. However, we feel a direct comparison is questionable, so we prefer to not include this figure in our manuscript.

      Author response image 1:

      Barplots showing total number of mRNA molecules detected in Molecular Cartography (MC, Resolve, spatial spots) and in snRNA-seq data from the Fly Cell Atlas (10x Genomics, UMIs). Individual black dots show individual experiments, counts are only shown for the chosen gene panel for each sample. Bar shows the mean, with error bars representing the standard error.

      (5) Using this higher throughput method of spatial transcriptomics, the authors discern different cell types and different localization patterns within a tissue/cell type.

      a. The authors should comment on the resolution provided by this approach, in terms of the detection of populations of mRNAs detected by low throughput methods, for example, in glia, motor neuron axons, and trachea that populate muscle tissue. Are these found in the images? Please show.

      We did not add any markers for trachea in our gene panel, but we do detect sparse spots of repo (glia) and elav/VGlut in the muscle tissues (Gad1/VAChT are hardly detected in the muscle tissue). This is consistent with the glutamatergic nature of motor neurons in Drosophila as described previously (Schuster CM (2006), Glutamatergic synapses of Drosophila neuromuscular junctions: a high-resolution model for the analysis of experience-dependent potentiation. Cell Tissue Res 326: 287–299). We present these new data in new Figure 2 – figure supplement 1.

      b. The authors show interesting localization patterns in muscle tissue for different sarcomere protein-coding mRNAs, including enrichment of sls in muscle nuclei located near the muscle-tendon attachment sites. As this high throughput approach is newly being applied to the adult fly, it would increase confidence in these data, if the authors would confirm these data using a low throughput FISH technique. For example, do the authors detect such alternating "stripes" (Act88F, TpnC4, and Mhc) or enriched localization (sls) using FISH that doesn't rely on the repeated colorization, imaging, decolorization of the probes? 

      We thank the reviewer for the interest in the localization patterns in muscle tissue. We show that Act88F and TpnC4 are not detected outside of flight muscle cells (99.4% and 99.8% of the single-molecule signal in flight muscles only), giving us confidence in the specificity of the MC method. Following the suggestion of the reviewer, we have adapted an HCR-FISH method to Drosophila adult body sections for the revised version of the manuscript. Using this method, we were able to confirm the higher expression/localization of sls transcripts to and around the adult flight muscle nuclei, with an enrichment in nuclei close to the muscle-tendon attachment sites (new Figure 4D-F and new Figure 4 – figure supplement 1). We have also been able to confirm some complementarity in the localization patterns of Act88F and TpnC4 in longitudinal stripes in adult flight muscles; however, for Mhc we could not confirm this pattern with HCR-FISH (new Figure 5C-F and new Figure 5 – figure supplement 1). While we could confirm most of the patterns seen, we do not know the exact reason for the slight discrepancies. Thus, we now recommend that insights found with SRT should be confirmed with more classical FISH methods.
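
      A specificity figure of this kind (the share of a gene's spots that fall inside the expected tissue) can be computed along the following lines. This is a hedged sketch: the function name `fraction_in_region`, the pixel-coordinate spot format and the boolean mask are assumptions for illustration, and the actual analysis may have defined the flight muscle region differently.

```python
def fraction_in_region(spots, region_mask):
    """Return the fraction of spots whose pixel falls inside a boolean mask.

    spots: list of (row, col) integer pixel coordinates for one gene.
    region_mask: 2D list of booleans, True inside the annotated region
                 (for example, a hand-drawn flight muscle outline).
    """
    if not spots:
        raise ValueError("no spots to evaluate")
    inside = sum(1 for row, col in spots if region_mask[row][col])
    return inside / len(spots)
```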

      (6) The authors developed an unbiased method to identify "new cell types" which relies on coexpression of different transcripts. Are these new cell types or a cell state? While expression is a helpful first step, without any functional data, the significance of what the authors found is diminished. The authors need to soften their statements.

      The term “new cell types” only appeared in the old title. We agree that with the current spatial map we cannot be sure to have found “new cell types”, instead we show where unannotated/uncharacterized clusters from the scRNA-seq atlas are located, based on their gene expression. Therefore, we have updated the title in the revised version (Spatial transcriptomics in the adult Drosophila brain and body) and thank the reviewer for this valuable suggestion.

      Appraisal:

      The authors' goal is to map single cell/nuclear RNAseq data described in the 2022 Fly Atlas paper spatially within an organism to achieve a spatial transcriptomic map of the adult fly; no doubt, this is a critical next step in our use of 'omics approaches. While this manuscript does the hard work of trying to take this next step, including developing and testing a new pipeline for high throughput FISH and its analysis, it falls short, in its present form, in achieving this goal. The authors discuss creating a robust spatial map, based on one male fly. Moreover, they do not reveal principles of mRNA localization, as stated in the abstract; they show us patterns, but nothing about the logic or function of these patterns. This same criticism can be said of the identification of "new cell types", just based on RNA colocalization. In both cases (mRNA subcellular localization or cell type identification), further data in the form of validation with traditional low throughput FISH and genetic manipulations to assess the relation to cell function are required for the authors to make such claims. 

      We have indeed used one male fly for the adult male body data. This is mainly due to the cost of the sample processing. We used 12 individuals for the head samples (from 1 individual we acquired 2 sections, a total of 13 sections). We show that the body samples show a high correlation with each other, while the head samples cover multiple depths of the head. Still, even in the head, we find that sections at similar depths show a high similarity to each other in terms of gene-gene coexpression and expression patterns. Although obtaining sections from more animals would be valuable, we do not believe it to be necessary for our current goals. Additional replicates beyond the ones we already provide would require significant amounts of extra time and budget, while they would very likely produce similar results as we already show. Following the reviewer’s suggestion, we have tested several genes with HCR-FISH and could readily confirm the localization pattern of sls mRNA close to the terminal nuclei of the flight muscles. This pattern is likely due to a higher expression of sls in these nuclei, as a large amount of sls mRNA signal is detected within the nuclei (Figure 4). A detailed dissection of the mechanism that establishes this pattern is beyond the scope of this manuscript, which is the first one on applying spatial transcriptomics to adult Drosophila.
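
      One simple way to quantify how similar two sections are is to correlate their per-gene spot counts. The sketch below is an illustration only: the Pearson correlation on shared genes, the function names and the dictionary input format are assumptions, and the response does not specify which similarity measure was used for the section comparisons.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def section_similarity(counts_a, counts_b):
    """Correlate per-gene spot counts of two sections on their shared genes.

    counts_a, counts_b: dict mapping gene name -> total number of spots.
    """
    genes = sorted(set(counts_a) & set(counts_b))
    return pearson([counts_a[g] for g in genes], [counts_b[g] for g in genes])
```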

      The usage of the term “new cell types” was indeed ambiguous and we removed this from the revised version. We have now clarified that we map the spatial location of unannotated clusters in the brain. This may or may not include uncharacterized cell types. We now further specify that we have only inferred the location of the nuclei; thus, neuronal function or the location of their axonal processes are still unknown. As such, our data provides a starting point to identify uncharacterized cell types, since their marker genes and nuclear location are now determined. The next step to identify “new cell types” would indeed be to acquire genetic access to these cell types and characterize them in more detail. This is beyond the scope of this manuscript, and therefore we have toned down the title in the revised version and thank the reviewer for this valuable suggestion. 

      Discussion of likely impact:

      If revised, these data, and importantly the approach, would impact those working on Drosophila adults as well as those working in other model systems where single cell/nuclear sequencing is being translated to the spatial localization within the organism. The subcellular localization data - for example, the size of transcripts and how that relates to localization or the patterns of sarcomeric protein localization in muscle - are intriguing, and would likely impact our thinking on RNA localization, transport, etc if confirmed. Lastly, the authors compare their computational approaches to those available in the field; this is valuable as this is a rapidly evolving field and such considerations are critical for those wishing to use this type of approach.

      We thank this reviewer for appreciating the impact of our findings and approach to the Drosophila field and beyond. We here provide the groundwork for a full Drosophila adult spatial atlas, similar to how early scRNA-seq datasets provided a framework for the Fly Cell Atlas. In the manuscript we provide both experimental information on how to successfully perform spatial transcriptomics (treating slides for optimal attachment) and the data serves as a benchmark for future experiments to improve upon (similar to how early Drop-seq datasets were compared to later 10x datasets in single-cell transcriptomics). In addition, it also provides proof of principle methods on how to integrate the FCA data with these spatial data and it identifies localized mRNA species in large adult muscle cells, showing the complementarity of spatial techniques with single-cell RNA-seq. For a small number of genes, we have confirmed the mRNA patterns using HCR-FISH in the revised version of this manuscript. To conclude, this is the first spatial adult Drosophila transcriptomics paper, locating 150 mRNA species with easy data access in our user portal (https://spatialfly.aertslab.org/).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) All figures in the manuscript were in grayscale, which made it difficult to interpret the results because the data could only be interpreted by distinguishing different colors to visualize different transcripts. This is likely a technical problem. The manuscript must contain colored images.

      We apologize to the reviewer for this technical issue. The manuscript was uploaded in color to bioRxiv and to eLife. We therefore do not understand the reason for this problem. We are surprised that this issue was not resolved in the reviewers’ discussion since color is obviously essential to appreciate the beauty of this manuscript.

      (2) In Figure 2a, the authors comment on the subcellular localization of trypsin isoforms, but the figure does not indicate the cell borders or the apical and basal regions of the cell. These must be indicated in the figure to help readers understand the results. 

      We thank the reviewer for pointing this out; we have now indicated the outlines of the single-cell layer epithelium on the figure. While we have no marker for cell borders, we have a nuclear marker showing that it is a single cell layer. We hope this allows the reader to appreciate the subcellular localization of the trypsin isoforms.

      (3) All figures (including the data on the authors' website) contain background staining, which I assume is labeling nuclei. This is not indicated in the manuscript, and should be clarified.

      We again thank the reviewer for pointing this out; the background staining indeed labels nuclei (using DAPI). We have indicated this better in the revised version.

      (4) In Figure 5c, the authors claim that neuronal and muscular genes are grouped into the same cluster, but they do not indicate which transcripts are neuronal and which ones are muscular. This must be indicated in the figure.

      We thank the reviewer for this comment. Indeed, there was only one gene, acj6, present in the muscle cluster. So, we decided to delete this statement in the revised version.

      (5) The authors utilized and compared three different approaches to integrate single nuclei sequencing data from the Fly Cell Atlas to their spatially resolved transcriptomics (SRT) data. I was wondering if it is possible to generate a virtual expression explorer using this integrated data, similar to the dataset published in the 2017 Science article by Karaiskos et al., where they combined publicly available in situ hybridization data of fly embryos and their single-cell sequencing data. This virtual expression explorer would be useful to visualize the expression pattern of transcripts that the authors of this paper did not use for their SRT.

      We thank the reviewer for this interesting comment. Using Tangram, we indeed infer gene expression for all genes from the Fly Cell Atlas. To make this visible we have created a Scope session (https://scope.aertslab.org/#/Spatial_Fly/*/welcome), with which users can browse inferred gene expression levels (note that this is on a segmented cell level). We do notice that the inferred gene expression levels contain many false positives and should therefore be used with caution. The spatial data themselves can be browsed through the spatial portal at https://spatialfly.aertslab.org/.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses:

      The authors have used a new high throughput approach to examine the location of 150 RNAs in adult Drosophila heads or one body. It is unclear whether the fixation/repeated imaging etc is accurately reflecting the patterns of expression in vivo. The authors should confirm these data using low throughput established techniques for the RNA patterns in muscle for example.

      The authors should clarify their experimental approaches and include additional samples if they indeed want to establish the rosetta stone of fly adults. These data are from only a male fly (and as such is not a complete analysis of the adult fly). To be a map of the adult fly, data from both sexes need to be included.

      Unless functional data that complement the descriptive data shown here are included, the authors have to soften their conclusions. For example, while spatial transcriptomics has mapped RNA expression to a location, without some functional data, it is difficult to conclude that these are indeed "new cell types". Same with the RNA localization principles.

      Recommendations for improving the writing and presentation:

      (1) The manuscript should be heavily revised: in many places, important details are left out or should be moved from the methods to the main text. In addition, the authors often overstate their findings throughout the manuscript. As an example, it appears that the data presented is only from 1 fly, so this doesn't increase the reader's confidence in the data or the applicability of the approach. Also, it isn't clear how many flies were analyzed for the heads (one male fly too?) nor what variability is present from fly to fly. For the approach and data to be used by others, this is important to know.

      We moved some text from the methods section to the main text to be clearer. We now also state how many animals were used for the MC method. While the data for the body has been generated from 1 male only, the data for the head was generated from 12 flies; for both cases, similar slices show very similar gene expression patterns. Furthermore, in the body we used widely known and published marker genes that all showed expected expression patterns, indicating robustness. We agree that this is not a full spatial atlas of the fly; this was also not our goal, and we have removed such general statements from the revised version. We aimed to generate a spatial transcriptomics dataset covering the entire fly (head and body) as a proof-of-principle, tackling data generation and analysis, and highlighting challenges in both.

      (2) The grammar and word choice throughout are challenging often making the text difficult to follow. This reads like an early draft of the paper.

      We apologize to the reviewer for any difficulties. We have revised the text and hope it is now easier to read, while still being accurate on the technical details of the various methods used in our manuscript.

      Minor corrections to the text and figures.

      See the weaknesses mentioned above. Also:

      Figure S1 is unreadable.

      There is no simple way to describe the expression values of 100 genes in 100 cell types on a single page. The resolution of the PDF is high enough that after zooming in, all the information can be read easily.

      Figure S2, in a, please include the axes so that the reader can better understand the sections shown.

      In b, it is unclear what the pink boxes mean. In c, the labels are barely legible.

      In Figure 1 – figure supplement 2 (head sections), we have ordered the head sections from anterior to posterior. The boxes in (B) represent boxplots. We have updated this plot for clarity to better display the number of mRNA molecules detected for each gene. We have increased the font size in (C).

      Figure S3, in a, please include axes. In b, the meaning of the pink box

      In Figure 1 – figure supplement 3 (the body sections) we have added the anterior to posterior and dorso-ventral axis, and ordered the sections that stem from the same animal. The boxes in (B) represent boxplots. We have updated this plot for clarity to better display the number of mRNA molecules detected for each gene. We have added an explanation to the figure legend.  

      Figure S4, the text in the axes of the heatmap should have a darker typeface

      We have changed it to black, thanks.

      Figure S5c, are the colors in the dendrogram supposed to match the spatial location on the right?

      The purple of the muscles is barely visible.

      Yes, they do match. Colors were modified in the revised version for better visibility.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewer and the editor for carefully reading our manuscript, and acknowledging the strength of combining quantitative analysis with semi-naturalistic experiments on mice social behavior. Please find below our response to both the public review and the recommendation to the authors. As a summary, we have included additional figures and texts such as 

      - a new Results subsection “Choosing timescales for analysis” (page 6)

      - a new Materials and Methods subsection “Maximum entropy model with triplet interactions” (page 17)

      - new supplementary figures, which have current labels of:

      - Figure 2 - figure supplement 5

      - Figure 2 - figure supplement 6

      - Figure 2 - figure supplement 7

      - Figure 4 - figure supplement 1

      - Figure 4 - figure supplement 2    

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      In this manuscript, Chen et al. investigate the statistical structure of social interactions among mice living together in the ECO-Hab. They use maximum entropy models (MEM) from statistical physics that include individual preferences and pair-wise interactions among mice to describe their collective behavior. They also use this model to track the evolution of these preferences and interactions across time and in one group of mice injected with TIMP-1, an enzyme regulating synaptic plasticity. The main result is that they can explain group behavior (the probability of being together in one compartment) by a MEM that only includes pair-wise interactions. Moreover, the impact of TIMP-1 is to increase the variance of the couplings J_ij, the preference for the compartment containing food, as well as the dissatisfaction triplet index (DTI). 

      Strengths: 

      The ECO-Hab is a really nice system to ask questions about the sociability of mice and to tease apart sociability from individual preference. Moreover, combining the ECO-Hab with the use of MEM is a powerful and elegant approach that can help statistically characterize complex interactions between groups of mice -- an important question that requires fine quantitative analysis. 

      Weaknesses: 

      However, there is a risk in interpreting these models. In my view, several of the comparisons established in the current study would require finer and more in-depth analysis to be able to establish firmer conclusions (see below). Also, the current study, which closely resembles previous work by Shemesh et al., finds a different result but does not provide the same quantitative model comparison included there, nor a conclusive explanation of why their results are different. In total, I felt that some of the results required more solid statistical testing and that some of the conclusions of the paper were not entirely justified. In particular, the results from TIMP-1 require proper interaction tests (group x drug) which I couldn't find. This is particularly important when the control group has a smaller N than the drug groups.  

      We would like to thank the reviewer and the editor for carefully reading our manuscript, and acknowledging the strength of combining quantitative analysis with semi-naturalistic experiments on mice social behavior. Thanks to the reviewer’s suggestion, we have improved our manuscript by 

      (1) A proper comparison with Shemesh et al., especially to include maximum entropy models with triplet interactions. We show that triplet models overfit even given the entire 10 day dataset, which limits our study to look at pairwise interactions.

      (2) Results on cross-validation for both triplet interaction models and pairwise interaction models, completed on aggregates of various numbers of days. This analysis showed that pairwise models overfit for single-day data, and led us to learn pairwise models only on 5-day aggregation of data. We have updated the manuscript (both the text and the figures) to present these results.

      (3) New results that subsample the drug groups to the same size as the control group. The conclusions about TIMP-1 treated mice hordes hold when we compare groups of the same size. 

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors): 

      (1) COMPARISON WITH PREVIOUS WORK. The comparison with the cited previous work of Shemesh et al. 2013 rests novelty to the use of ME models in characterizing social interactions between groups of mice as well as sheds doubts on the main claim of the manuscript, namely that second-order correlations are sufficient to describe the joint distribution of occupancies of all mice (in particular triplets; there is no quantification of the variance explained by model in panel Fig. 2D). In my view, to make the claim "These results show that pairwise interaction among mice are sufficient to assess the observed collective behavior", the authors should compare models with 2nd and 3rd order interactions and quantify how much of the total correlation can be explained by pair-wise interactions, triplet interactions, and so on. Without a proper model comparison, it is unclear how the authors can make such a claim. One thing observed by Shemesh et al. is that, on average, J_ij are negative. This does not seem to be the case in the current study and the authors should discuss why. 

      Finally, the explanation provided in the Discussion about this discrepancy (spatial resolution and different group size) are not completely satisfactory. With more animals, one would imagine that the impact of higher order correlations would increase (and not decrease) as the number of terms of 3rd, 4th, ... order will be very big. I would also think that the same could be true for the spatial scale: assessing interactions with a coarser spatial grid (whole cages in the case of the ECO-Hab) would allow for simultaneous interactions among more mice to happen compared with a situation in which the spatial grid is so small that only a few animals can fit in each subdivision. 

      We thank the reviewer for the recommendation. In the updated version of the manuscript, we explicitly learn the triplet interaction model. We show that because the number of mice in our experiment is much larger than in Shemesh et al., a triplet model runs into the problem of overfitting.

      In particular, we found that the test set likelihood increases monotonically when the L2 regularization strength increases, which corresponds to a suppression of the triplet interaction strength (see additional supplementary figure, now Figure 2 - figure supplement 5). More specifically, for the range of regularization strength (β<sub>G</sub>) we tested (10<sup>-1</sup> < β<sub>G</sub> < 10<sup>1</sup>), the maximum test set likelihood is achieved at β<sub>G</sub> = 10<sup>1</sup>, which corresponds to . Notice that those learned triplet interactions are very close to zero. This means we should select a model with pairwise interactions over a model with triplet interactions.

      We have added the above reasoning on page 5, line 166-169 of the Results section with the sentence “Moreover, models with triplet interactions show signs of overfitting under cross-validation, which is mitigated when the triplet interactions are suppressed close to zero using L2 regularization”, and a new subsection “Maximum entropy model with triplet interactions” in Materials and Methods (page 16-17, line 548 - 563) to describe the protocols of learning and cross-validation for these triplet interaction models. 

      Furthermore, we extended the discussion about the difference between Shemesh et al. and our results in the Discussion section. In addition to the difference of spatial scales (chamber vs. location in the chamber), and the difference of group size and its impact on data analysis (N = 15 in our largest cohort and N = 4 in theirs), we added a discussion about the difference of experimental arena, which in Eco-HAB contains connected chambers that mimic the naturalistic environment, and in Shemesh et al. contains a single chamber. The change in the text is on page 12, between line 390 and line 394.

      We thank the reviewers for pointing out that the mean 2nd order interaction in Shemesh et al. is negative. One possibility is that the labeled areas in Shemesh et al. are much smaller than in our Eco-HAB setup, which could suggest that mice do have the space to stay in the same area, which will lead to a negative mean 2nd order interaction.

      (2) ASSESSMENT OF THE TEMPORAL EVOLUTION OF THE INTERACTIONS. The analysis of the stability of the social structure is not conclusive. First, I don't think the authors can conclude that "These results suggest that the structure of social interactions in a cohort as a whole is consistent across all days." If anything is preserved, they would be the statistics of that structure but not the structure itself (i.e., there is no evidence for that). The comparison of the stability of the mean <h_i> and the mean <J_ik> would also require a statistical test to be able to state that "Delta h_i changed more strongly from day to day (Fig. 3D, top panel) relative to the interaction measured as the Jij's." The same is true for the assessment of the TIMP: the differences found in the variability in J_ij and in the mean and variance of the h_i's, look noisy and would require a proper statistical test. The traces look quite variable across days in the control condition, so assessing differences may be difficult. Finally, it would be good to know if the variability in individual J_ij is because they truly vary from day to day or because estimating them within one day is difficult (statistical error). If the reason is the latter, one could decrease the temporal resolution to 2-3 days and see whether the estimated J_ijs are more stable. Perhaps, also for that reason, the summed interaction strength J_i is also more stable, simply because it aggregates more data and has a smaller statistical error. 

      We thank the reviewer for pointing out the necessity of assessing the temporal evolution of the interactions. The fact that shorter data durations lead to more noise in the estimation, together with the reviewer’s Comment 4 about the risk of overfitting, led us to add a new Results subsection “Choosing timescales for analysis” (page 6, line 171 to line 189). Specifically, we assess whether the pairwise maximum entropy model overfits using data from K-day aggregates, by computing the log-likelihood of both the training sets and the test sets, the latter chosen to be 1 hour from the 6-hour data window of each day. We found that for single-day data, the pairwise maximum entropy model overfits. In contrast, for aggregates of 4 or more days of data, the pairwise model does not overfit. This new result is supported by an additional supplementary figure, now Figure 2 - figure supplement 6.
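      The overfitting check described above can be sketched on a toy problem. The snippet below is a minimal illustration and not the authors' code: it assumes the pairwise model has the form P(s) ∝ exp(Σ_i h_{i,s_i} + Σ_{i<j} J_ij [s_i = s_j]), uses 3 simulated mice in 4 boxes so that all 64 states can be enumerated exactly, fits by plain gradient ascent, and holds out one sixth of the samples as a stand-in for the 1-hour test set. All function names, the random data and the fitting settings are hypothetical.

```python
import itertools
import math
import random

N_MICE, N_BOXES = 3, 4
PAIRS = list(itertools.combinations(range(N_MICE), 2))
STATES = list(itertools.product(range(N_BOXES), repeat=N_MICE))  # all 4**3 states


def log_weight(state, h, J):
    """Unnormalized log-probability of one joint location state."""
    total = sum(h[i][state[i]] for i in range(N_MICE))
    total += sum(J[p] for p in PAIRS if state[p[0]] == state[p[1]])
    return total


def model_probs(h, J):
    """Exact state probabilities by enumeration (feasible only for tiny systems)."""
    weights = [math.exp(log_weight(s, h, J)) for s in STATES]
    z = sum(weights)
    return {s: w / z for s, w in zip(STATES, weights)}


def mean_log_likelihood(data, h, J):
    probs = model_probs(h, J)
    return sum(math.log(probs[s]) for s in data) / len(data)


def fit(data, steps=500, lr=0.1):
    """Gradient ascent on the mean log-likelihood (moment matching)."""
    h = [[0.0] * N_BOXES for _ in range(N_MICE)]
    J = {p: 0.0 for p in PAIRS}
    n = len(data)
    # Empirical moments: box occupancy per mouse and co-occupancy per pair.
    occ_data = [[sum(s[i] == r for s in data) / n for r in range(N_BOXES)]
                for i in range(N_MICE)]
    co_data = {p: sum(s[p[0]] == s[p[1]] for s in data) / n for p in PAIRS}
    for _ in range(steps):
        probs = model_probs(h, J)  # computed once per step, then all parameters move
        for i in range(N_MICE):
            for r in range(N_BOXES):
                occ_model = sum(q for s, q in probs.items() if s[i] == r)
                h[i][r] += lr * (occ_data[i][r] - occ_model)
        for p in PAIRS:
            co_model = sum(q for s, q in probs.items() if s[p[0]] == s[p[1]])
            J[p] += lr * (co_data[p] - co_model)
    return h, J


random.seed(0)
data = [tuple(random.randrange(N_BOXES) for _ in range(N_MICE)) for _ in range(60)]
train, test = data[:50], data[50:]  # 5/6 for training, 1/6 held out
h, J = fit(train)
train_ll = mean_log_likelihood(train, h, J)
test_ll = mean_log_likelihood(test, h, J)
print(f"train: {train_ll:.3f}  test: {test_ll:.3f}  gap: {train_ll - test_ll:.3f}")
```

      A training log-likelihood that sits well above the test log-likelihood is the signature of overfitting; repeating the comparison with more data per fitted parameter shows whether the gap closes.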

      To be consistent with later approaches in the manuscript where we consider the effects of TIMP-1, we choose the analysis windows to be data aggregates from 5 days. This means for the experiment that collects a total of 10 days of data, there are only two time points, so a study of the temporal evolution is limited to comparison between the first 5 days and the last 5 days of the experiment. We describe these results in the Results subsection “Stability of sociability over time” (page 6, line 190 - 220). An additional supplementary figure, now Figure 2 - figure supplement 7, shows in detail the comparison of the inferred interaction strength J and the chamber preference between the first 5 days and the last 5 days for the 4 cohorts of male C57BL6/J mice, which shows the inferred interactions have a consistent variability across first and last 5 days, and across all cohorts. The small value of Pearson’s correlation coefficient shows that the exact structure (pair-specific J<sub>ij</sub>) is not stable. At the end of the Results subsection “Stability of sociability over time”, we explicitly say that “This implies that the maximum entropy model does not infer a social structure that is stable over time.”

      (3) EFFECT OF TIMP-1. The reported effects of TIMP-1 on the variance of the J_ij seem very small and possibly caused by a few outlier J_ijs (perhaps from one or two animals) which are not present in the control group which seems to have fewer animals (N = 9 minus two mice that died after the surgery vs. N = 14 in the drug group), so the lack of a significant difference in the sigma[J_ij] could simply be due to a smaller N (a test for the interaction group x drug was not done). 

      The clearest effect of TIMP-1 seems to be a change in place preference (h_i) and not the interaction terms (J_ij) (Fig. 3F bottom). But this could be explained by a number of factors that have nothing to do with sociability such as that recovery from surgery makes them eat more/less. The fact that it seems to be present, as recognized by the authors, in the control group with no TIMP-1 and that this effect was not observed in the female group F1, puts into question the specificity and reproducibility of the result. 

      Finally, the effect of TIMP-1 in the DTI would require more statistics (testing the interaction group x drug). The fact that the control group has fewer animals (N = 9 vs. 15 and 13 in the drug groups), and that there is a weaker trend in the DTI of the control group to start high and then decrease, makes this test necessary.  

      Now, after we select a proper timescale to learn the pairwise maximum entropy model, we update the manuscript to present results only on 5-day aggregation of data (see updated Figure 3, updated supplementary figures, Figure 3 - figure supplement 1 and 2). For the variance of the J<sub>ij</sub>, the F-test between different 5-day aggregates before and after TIMP for the male drug group now shows a nonsignificant p-value after applying the Bonferroni correction. For the female drug group, the difference of the J<sub>ij</sub> variance is still significant. 

      To test the effect of different group size on DTI, we subsampled the drug groups by 1) subsampling the inferred interactions learned from the original N = 15 or N = 13 data, or 2) subsampling the mice colocalization data and then inferring the pairwise interactions.  In both cases, the resulting DTI for the subsampled drug group still exhibits the same global pattern as before, i.e. after TIMP-1 injection, DTI significantly increases, which after 5 days falls back to the baseline level. The results are supported by two additional supplementary figures, Figure 4 - figure supplement 1 and 2. This result is referred to in the text in the Results subsection “Impaired neuronal plasticity in the PL affects the structure of social interactions” (page 10, line 333 - 336): “Notably, the difference of the DTI is not due to the control group M4 has less mice, as subsampling both on the level of the inferred interactions (Figure 4 - figure supplement 1) and on the level of the mice locations (Figure 4 - figure supplement 2) give the same DTI for cohorts M1 and F1.”

      (4) MODEL COMPARISON. Any quantitative measure of "goodness" of the model (i.e., comparison of the predictions of the model with triplet frequency as well as the distribution of p(K)) should be cross-validated. In particular, Fig. S2 needs to be cross-validated for the goodness of fit to be properly quantified. Is the analysis shown in Fig. 3F cross-validated? Because otherwise, there is an expected increase in the likelihood simply explained by an increase in the number of parameters of the model (i.e., adding the J_ij's). 

      As discussed in our responses to Comment 1 and 2, we have added results about cross-validation in the new supplementary figures, Figure 2 – figure supplement 5 and 6, for which we computed the test-set and training-set likelihood for maximum entropy models with pairwise interactions and also for models with triplet interactions. Figure 2 - figure supplement 6 shows the pairwise model does not overfit when we consider the aggregated data from 4 or more days. 

      (5) EFFECT OF SLEEP. The comparison of p(K) between the data and the model requires a bit more investigation: the model underestimates instances in which almost all mice were in the same compartment (i.e., for K >= 13. p(K)_data >> p(K)_MEM; btw where is the pairwise point p(15) in Fig. 2E and Fig. S4?). Could this be because there were still short periods during the dark cycle in which all mice were asleep in one of the cages? As explained by the authors, sleep introduces very strong higher order correlations between animals as they like sleeping altogether. Knowing whether removing light periods was enough to remove this "sleep contamination" or not, would be important in order to interpret discrepancies between the pairwise model and the data. 

      Figure 2E shows that the pairwise maximum entropy model (in black) overestimates the data (in blue circles) for P(K) at large K (and not underestimates). In the data, we never observe all 15 mice being in the same box; hence P<sub>data</sub>(15) = 0, and does not show up in the log-scaled figure (same for Figure 2 - figure supplement 3). A possible explanation for the pairwise model overestimating P(K) at large K is that the finite-sized box limits the total number of mice that are comfortably staying in the same box. It can also be due to the fact that the number of time points at which K >= 13 is small and hence causes an underestimation due to finite data. We have added this interpretation of the discrepancy of P(K) to Section “Pairwise interaction model explains the statistics of social behavior” in page 6, line 160. 
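      The occupancy distribution P(K) discussed above can be tabulated directly from location data. The sketch below assumes P(K) is the fraction of (time point, box) pairs in which exactly K mice share the box; the function name and the toy data are hypothetical. Values of K that never occur are simply absent from the result, which is why such points cannot appear on a log-scaled plot.

```python
from collections import Counter


def occupancy_distribution(locations, n_boxes):
    """Empirical P(K) from snapshots of mouse locations.

    locations: list of time points, each a list giving the box index of every mouse.
    Returns {K: fraction of (time point, box) pairs holding exactly K mice}.
    """
    counts = Counter()
    for snapshot in locations:
        per_box = Counter(snapshot)  # mice per box at this time point
        for box in range(n_boxes):
            counts[per_box[box]] += 1  # empty boxes contribute to K = 0
    total = len(locations) * n_boxes
    return {k: c / total for k, c in sorted(counts.items())}


# Toy data: 3 time points, 4 mice, 4 boxes (indices 0-3).
locations = [
    [0, 0, 1, 3],
    [1, 1, 1, 2],
    [0, 2, 2, 3],
]
p_k = occupancy_distribution(locations, n_boxes=4)
print(p_k)
```

      In this toy data set all four mice are never in the same box, so K = 4 has no entry, mirroring the P<sub>data</sub>(15) = 0 case described in the response.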

      We thank the Reviewer for raising the point of “sleep contamination”. Indeed, Eco-HAB data, as do data from other 24h-testing behavioral systems, demonstrate distinct differences in activity levels during the light and dark phases of the light-dark cycle (Rydzanicz et al., EMBO Mol. Med., 2024). During the light phases, mice primarily sleep and, as noted, they huddle, so many individuals within the cohort tend to remain in close proximity for extended periods. We acknowledge that including such periods in the analysis could potentially introduce confounding effects to the model due to limited movement and interactions, and this is why we decided not to use this data. However, during the dark phases, mice are highly active, with individuals rarely staying in the same compartment for long periods. Specifically, in the dark phases, while there are occasional instances where a few mice may remain in the same compartment for over 1 hour, the majority exhibit considerable mobility, actively exploring and transitioning between compartments. We see no compelling reason to exclude these periods from our analysis, as such activity aligns with the natural behavioral repertoire of the mice and provides robust data for our model. Furthermore, it is well-established that mammals, including nocturnal species such as mice, are most active shortly after waking, typically at the onset of their active phase (i.e., the beginning of the dark phase). To ensure a conservative approach, we specifically analyzed the first 6 hours of the dark phase when the cumulative number of box visits is at its peak, indicating heightened activity levels. In our view, this period offers an optimal window for studying natural behaviors, including social interactions.

      Additionally, prior studies using the Eco-HAB system have consistently demonstrated that mice engage in social interactions both within the compartments and in the connecting tubes during the dark phase (Puścian et al., eLife, 2016, Winiarski et al. in press). Given this evidence and the observed behavioral dynamics in our data, the likelihood of mice being asleep during the analyzed periods of the dark phase is very low.

      We hope this clarification addresses the reviewer’s concerns and highlights the rationale underpinning our analysis choices. Thank you for raising this important point, which allowed us to provide additional context for our approach.

      (6) COMPARTMENT PREFERENCES. The differences between p(K) across compartments also would require a bit more attention: if a MEM with non-spatially dependent pair-wise interactions shows differences across compartments, it must be because of the h_{i,r} terms which contain a compartment index, right? Wouldn't this imply that the independence model, which always underrepresents data events with large K, already contains the difference in goodness of fit between compartments (1, 3) and (2, 4)? In the plots, it does not look like the goodness of the independent model depends on the compartment (the authors could compare directly the models' predictions between compartments). Moreover, when looking at Fig. 2C, it does not look like the value of h_{i,r} in compartments (1,3) is higher than in (2,4) (if anything, it would be the other way around). How can this be explained? It would be good to know if the difference across compartments comes from differences in the empirical p(K) or in the models' prediction? If the difference is in the data p(K), could it be that the compartments 2-4 showing higher p(K=15) (i.e., larger difference with the pairwise MEM prediction) are those chosen by mice to sleep during the light cycle? If not, what could explain these differences across compartments? Could the presence of food and water explain this difference? 

      The reviewer is correct: in the pairwise MEM, the difference across compartments enters through the box preference h<sub>ir</sub>. Greater h<sub>ir</sub> means compartment r is more attractive to mouse i. Because boxes 2 and 4 contain food and water, we expect that mice are more attracted to boxes 2 and 4, and this is what we see in Figure 2C, bottom subpanels. To reduce the number of parameters to look at, we introduce an index Δh<sub>i</sub> = h<sub>i2</sub> + h<sub>i4</sub> - h<sub>i1</sub> - h<sub>i3</sub>. This index Δh<sub>i</sub> is found to be mostly positive (see updated Figure 3C), which makes sense because mice are attracted to food and water. 
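      As a small worked example of the index defined above, the snippet below evaluates Δh<sub>i</sub> for one mouse. The preference values are invented for illustration and the function name is hypothetical.

```python
def delta_h(h_i):
    """Preference index for one mouse.

    h_i maps the box number (1-4) to that mouse's preference for the box.
    Boxes 2 and 4 are the ones containing food and water.
    """
    return h_i[2] + h_i[4] - h_i[1] - h_i[3]


# Invented preferences for a single mouse.
example_h = {1: -0.2, 2: 0.5, 3: -0.1, 4: 0.3}
dh = delta_h(example_h)
print(f"delta_h = {dh:.2f}")
```

      A positive value means the mouse favors the two boxes with food and water over the other two.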

      Next we analyze the difference of P(K) across compartments (Figure 2 - figure supplement 3). There is already a difference in the P(K) calculated from empirical data. For example, P(K) in compartment 2 has a maximum at K = 5 while P(K) in compartment 1 has a maximum at K = 3.

      One interesting observation is that it seems from Figure 2 - figure supplement 3 that the pairwise model explains P(K) in compartment 1 and compartment 3 better than in compartment 2 and in compartment 4. In compartment 2 and 4, the pairwise MEM overestimates P(K) for large K. An alternative MEM could include compartment-specific interaction strength, but it will also introduce 315 new parameters for a mice cohort with size N = 15.
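      The parameter count quoted above can be checked with a short calculation. The sketch assumes that the alternative model replaces the single coupling per pair of mice with one coupling per pair per compartment, and that the Eco-HAB has four compartments; the function name is hypothetical.

```python
from math import comb


def extra_parameters(n_mice, n_boxes):
    """Additional couplings when each pair gets one J per box instead of a single J."""
    n_pairs = comb(n_mice, 2)  # number of distinct pairs of mice
    return n_pairs * n_boxes - n_pairs


extra = extra_parameters(n_mice=15, n_boxes=4)
print(extra)
```

      For N = 15 there are 105 pairs, so four compartment-specific couplings per pair give 420 parameters, which is 315 more than the 105 of the pairwise model.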

      MINOR

      (1) A more quantitative comparison between in-cohort sociability and couplings J_ij as well as mean rates and parameters h_i is required. The matrices in Fig. 2C do look similar. So it is not clear how the comparison between these values is contributing to characterizing the correlation structure of the data. 

      The comparison between in-cohort sociability and coupling J<sub>ij</sub> is given by supplementary Figure 2 - figure supplement 2.  The key point for the model with the learned J<sub>ij</sub> reproducing the in-cohort sociability is given by Figure 2 - figure supplement 1.

      (2) Analysis of "in-state" probability is not explained. To me, it wasn't obvious what Fig. S5 is showing. I was assuming that this analysis was comparing the prediction of the MEM about the position of each animal at each time point, given its preference (h), pairwise interactions (J_ij), and the position of all other animals and the true position of the animal. But it seems like it is comparing the shape of the distribution of this prob across time between the data and the model (I guess the data had to be temporally binned in coarser temporal periods to yield prob values other than 0s and 1s). Also, not clear whether this analysis was done for each compartment separately and then averaged. This needs explanation. 

      The in-state probability is comparing the prediction of the MEM about the position of each animal at each time point, given its preference (h), pairwise interactions (J<sub>ij</sub>), and the position of all other animals and the true position of the animal. To achieve values between 0s and 1s, we bin the data temporally according to the model-predicted in-state probability. 

      We have added the explanation of in-state probability on page 6, line 163-166. We have also improved the description of in-state probability in Materials and Methods (subsection “Comparing in-state probability between model prediction and data”, line 493 - 503, page 15), and added a pointer from the main text to it. 

      (3) Looks like Fig. S3 is not cited in the text. 

      We added a pointer to Fig. S3 (now Figure 2 - figure supplement 2) in line 154. 

      (4) The authors say that "TIMP-1 release from the TIMP-1-loaded nanoparticles diminishes after 5 days." Does that mean from the day of the injection (4-5 days before the "After Day 1") or five days after reintroduced in the ECO-Hab? 

      It means five days after the mice were re-introduced in the ECO-Hab. We have updated the text in Results/Effects of impairing neuronal plasticity in the PL on subterritory preferences and sociability (the end of the first paragraph of this subsection) to 

      “The choice of five-day aggregated data for analysis is in line both with the proper timescales needed for the pairwise maximum entropy model to not overfit, and with the literature that TIMP-1 release from the TIMP-1-loaded nanoparticles is stable for 7-10 days after injection (Chaturvedi et al., 2014)  (i.e. 2-5 days after the mice are reintroduced to Eco-HAB).” (line 272 - 276, page 9)

      (5) In Methods, the authors should report the final N of each of the three groups. 

      The final N is reported in Table 1 (page 13). In the updated version, we have added a pointer to Table 1 in Materials and Methods/Animals, and in Materials and Methods/Exclude inactive and dead mice from analysis. We have also expanded the caption of Table 1 to clarify the difference between final N and initial N, and added a pointer to Materials and Methods/Exclude inactive and dead mice from analysis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors attempt to fully characterize the immunoglobulin (Ig) heavy (H) chain repertoire of tumor-infiltrating B cells from three different cancer types by identifying the IgH repertoire overlap between these, their corresponding draining lymph nodes (DLNs), and peripheral B cells. The authors claim that B cells from tumors and DLNs have a closer IgH profile than those in peripheral blood and that DLNs are differentially involved with tumor B cells. The claim that tumor-resident B cells are more immature and less specific is made based on the characteristics of the CDR-H3 they express.

      Strengths:

      The authors show great expertise in developing in-house bioinformatics pipelines, as well as using tools developed by others, to explore the IgH repertoire expressed by B cells as a means of better characterizing tumor-associated B cells for the future generation of tumor-reactive antibodies as a therapy.

      Weaknesses:

      This paper needs major editing, both of the text and the figures, because as it stands it is convoluted and extremely difficult to follow. The conclusions reached are often not obvious from the figures themselves. Sufficient a priori details describing the framework for their analyses are not provided, making the outcome of their results questionable and leaving the reader wondering whether the findings are on solid ground.

      The authors are encouraged to explain in more detail the premises used in their algorithms, as well as the criteria they follow to define clonotypes, clonal groups, and clonal lineages, which are currently poorly defined and are crucial elements that may influence their results and conclusions.

      In response to this comment, we significantly expanded the paragraph dedicated to the tumor and non-tumor repertoire overlap and isotype composition. The following sections were added:

      First, we characterized the relative similarity of IGH repertoires derived from tumors, DLN, and PBMC on the individual CDR-H3 clonotype level. We define a clonotype as an instance with an identical CDR-H3 nucleotide sequence and identical V- and J-segment attribution (isotype attribution may differ). Unlike other authors, we do not pool similar CDR-H3 sequences together to account for hypermutation. (Hypermutation analysis is done separately and is referred to as clonal group analysis.)
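
      The clonotype definition above can be sketched as a grouping key. This is a minimal illustration, not the authors' pipeline; the record fields (`cdr3_nt`, `v`, `j`, `isotype`) and the example sequences are hypothetical.

```python
from collections import defaultdict


def collapse_to_clonotypes(reads):
    """Group reads into clonotypes keyed by (CDR-H3 nt sequence, V segment, J segment).

    Isotype is deliberately NOT part of the key, so one clonotype may carry
    several isotypes. Similar (hypermutated) CDR-H3 sequences are not merged.
    """
    clonotypes = defaultdict(lambda: {"count": 0, "isotypes": set()})
    for read in reads:
        key = (read["cdr3_nt"], read["v"], read["j"])
        clonotypes[key]["count"] += 1
        clonotypes[key]["isotypes"].add(read["isotype"])
    return dict(clonotypes)


# Hypothetical reads: the first two differ only in isotype, the third by one nucleotide.
reads = [
    {"cdr3_nt": "TGTGCGAGAGAT", "v": "IGHV3-23", "j": "IGHJ4", "isotype": "IGHG"},
    {"cdr3_nt": "TGTGCGAGAGAT", "v": "IGHV3-23", "j": "IGHJ4", "isotype": "IGHA"},
    {"cdr3_nt": "TGTGCGAGAGAC", "v": "IGHV3-23", "j": "IGHJ4", "isotype": "IGHG"},
]
clonotypes = collapse_to_clonotypes(reads)
```

      In this toy input the first two reads collapse into one clonotype with two isotypes, while the third read, which differs by a single nucleotide, remains a separate clonotype.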

      As overlap metrics are dependent on overall repertoire richness, we normalized the comparison using the same number of top most frequent clonotypes of each isotype from each sample (N = 109). Repertoire data for each sample were split according to the immunoglobulin isotype, and the F2 metric was calculated for each isotype separately and plotted as an individual point.

      We also analyzed D metric, which represents the relative overlap diversity uninfluenced by clonotype frequency ($D_{ij} = d_{ij}/(d_i \cdot d_j)$, where $d_{ij}$ is the number of clonotypes present in both samples, while $d_i$ and $d_j$ are the diversities of samples $i$ and $j$ respectively). The results for D metric are not shown, as they indicate a similar trend to that of F2 metric. This observation allows us to conclude that tumor IGH repertoires are more similar to the repertoires of lymph nodes than to those of peripheral blood, both if clonotype frequency is taken into account, and when it is not.
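
      As a rough sketch of the normalization and the D metric described above, the following keeps the same number of most frequent clonotypes per sample and then counts shared clonotypes. The function names and the toy frequencies are assumptions made for illustration; "diversity" is taken here to mean the number of distinct clonotypes in a sample.

```python
def top_clonotypes(freqs, n):
    """Keep the n most frequent clonotypes of a sample (freqs: clonotype -> frequency)."""
    ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n])


def d_metric(freqs_i, freqs_j):
    """D_ij = d_ij / (d_i * d_j): shared clonotypes over the product of diversities.

    Assumption: diversity d_i is the number of distinct clonotypes in sample i.
    Clonotype frequencies are ignored on purpose.
    """
    d_i, d_j = len(freqs_i), len(freqs_j)
    if d_i == 0 or d_j == 0:
        return 0.0
    d_ij = len(freqs_i.keys() & freqs_j.keys())
    return d_ij / (d_i * d_j)


# Hypothetical samples, normalized to the same number of top clonotypes.
tumor = top_clonotypes({"A": 0.5, "B": 0.3, "C": 0.15, "X": 0.05}, n=3)
lymph_node = top_clonotypes({"A": 0.6, "D": 0.3, "B": 0.1}, n=3)
overlap = d_metric(tumor, lymph_node)  # 2 shared clonotypes / (3 * 3)
```

      Because only presence or absence enters the calculation, two samples sharing a few very large clones and two samples sharing the same number of very small clones receive the same D value.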

      Having excluded the IGHD gene segment from some of their analyses (at least those related to clonal lineage inference and phylogenetic trees), it is not well explained which region of CDR-H3 is responsible for the charge, interaction strength, and Kidera factors, since in some cases the authors mention that the central part of CDR-H3 consists of five amino acids and in others of seven amino acids.

      We considered different ways of calculating the amino acid properties of CDR3 and used different parameters for sample-average and individual-sequence CDR3s. The plots in Fig S6 C are now updated for consistency, and the parameters depicted there are calculated using the 5 central amino acids, as in other sections.

      How can the authors justify that the threshold for CDR-H3 identity varies according to individual patient data? 

      The ideal similarity threshold may depend on several factors, such as sampling and sequencing depth. For example, imagine one sample that picks up 100% of the sequences of a clonal lineage, each differing by only 1 amino acid from the next, and a lower-quality sample or sequencing run that picks up only every other sequence. The minimal threshold required to merge these sequences into a cluster/clonal group would differ between the two cases (1 aa for the former and ~2 aa for the latter, for single-linkage clustering). In other words, the greater the sequencing depth, the denser the clusters will be. The method of individual threshold tailoring relies on the following: https://changeo.readthedocs.io/en/latest/examples/cloning.html
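
      The thought experiment above can be reproduced with a toy single-linkage clustering. This sketch only illustrates the reasoning; it is not the Change-O procedure, and the sequences and function names are invented for the example.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def count_single_linkage_clusters(seqs, threshold):
    """Count clusters when any pair within `threshold` mismatches is linked."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if hamming(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(seqs))})


# Hypothetical lineage: each sequence differs by 1 aa from the previous one.
full_sample = ["CARDYW", "CARDFW", "CAREFW", "CTREFW", "CTREFY"]
# A shallower sample that captured only every other sequence.
sparse_sample = full_sample[::2]

deep_at_1 = count_single_linkage_clusters(full_sample, threshold=1)
sparse_at_1 = count_single_linkage_clusters(sparse_sample, threshold=1)
sparse_at_2 = count_single_linkage_clusters(sparse_sample, threshold=2)
```

      With the complete lineage, a 1 aa threshold links everything into one group. With every other sequence missing, the same threshold splits the lineage into singletons, and a 2 aa threshold is needed to recover it.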

      Although the individual Kidera factors that are significant in the context of our analysis are described in the text one by one on their first appearance, we have now also added a sentence to describe Kidera factor analysis in general (page 8):

      Kidera factors are a set of scores which quantify physicochemical properties of protein sequences (Nakai et al. 1988). 188 physical properties of the 20 amino acids are encoded using dimension reduction techniques.

      Throughout the analyses, the reasons for choosing one type of cancer over another sometimes seem subjective and are not well justified in the text.

      Whenever possible, we pooled all patients with all cancer types together, because the number of available samples did not allow us to draw any significant conclusions comparing between individual cancer types. When analyzing and showing individual patient data, we also did not attempt to depict any cancer-type-specific findings, but it is inevitable that we name a specific cancer type when labelling a sample coming from a specific tumor.

      Overall, the narrative is fragmented. There is a lack of well-defined conclusions at the end of the results subheadings.

      In addition to the changes described above, a conclusion was added to the paragraph describing hypermutation analysis:

      IGHG clonotypes from lung cancer samples show a higher number of hypermutations, possibly reflecting the high mutational load found in lung cancer tissue. For melanoma, another cancer known for high mutational load, no statistically significant difference was found. This may be due to higher variance between melanoma samples, which hinders the analysis, or due to the small sample size.

      The exact same paragraph is repeated twice in the results section.

      Corrected.

      The authors have also failed to synchronise the actual number of main figures with the text, and some panels are included in the main figures that are neither described nor mentioned in the text (Venn diagram Fig. 2A and phylogenetic tree Fig. 5D). Overall, the manuscript appears to have been rushed and not thoroughly read before submission.

      Corrected.

      Reviewers are forced to wade through, unravel, and validate poorly explained algorithms in order to understand the authors' often bold conclusions.

      We hope that the aforementioned additions to the text, together with the additions to Figure 1, make the narrative easier to follow.

      Reviewer #2 (Public Review):

      Summary:

      The authors sampled the B cell receptor repertoires of Cancers, their draining lymph nodes, and blood. They characterized the clonal makeup of all B cells sampled and then analyzed these clones to identify clonal overlap between tissues and clonal activation as expressed by their mutation level and CDR3 amino acid characteristics and length. They conclude that B cell clones from the Tumor interact more with their draining lymph node than with the blood and that there is less mutation/expansion/activation of B cell clones in Tumors. These conclusions are interesting but hard to verify due to the under-sampling and short sequencing reads as well as confusion as to when analysis is across all individuals or of select individuals.

      Strengths:

      The main strength of their analysis is that they take into account multiple characteristics of clonal expansion and activation and their different modes of visualization, especially of clonal expansion and overlap. The triangle plots once one gets used to them are very nice.

      Weaknesses:

      The data used appears inadequate for the conclusions reached. The authors' sample size of B cells is small and they do not address how it could be sufficient. At such low sampling rates, compounded by the plasmablast bias they mention, it is unclear if the overlap trends they observe show real trends. Analyzing only top clones by size does not solve this issue, as it could be that the top 100 clones of one tissue are much bigger than those of another and that all overlap trends are simply because the clones are bigger in one tissue or the other, i.e. there is equal overlap of clones with blood but blood is not sufficiently sampled given its greater diversity and smaller clones.

      Regarding the number of clonotypes to be taken into account, we were limited by the B cell infiltration of tumor samples and our ability to capture their repertoire. However, we use technical replicates at the level of cell suspension to ensure that at least the top clonotypes are consistently sampled. The data should therefore be interpreted as describing the most abundant clones in the repertoire (which may also be considered the most functionally relevant in the case of tumor-infiltrating lymphocytes).

      To analyze the repertoire overlap, we generally use the F2 metric, which takes clone size into account, because we think that clone size is an important functional factor. However, we have now added a description of the D metric (which does not include clone frequency as a parameter); it shows the same trend as the F2 metric. So, both the F2 and D overlap metrics support our conclusion of higher overlap between tumor and LN.

      The following text was added:

      We also analyzed D metric, which represents the relative overlap diversity uninfluenced by clonotype frequency ($D_{ij} = d_{ij}/(d_i \cdot d_j)$, where $d_{ij}$ is the number of clonotypes present in both samples, while $d_i$ and $d_j$ are the diversities of samples $i$ and $j$ respectively). The results for D metric are not shown, as they indicate a similar trend to that of F2 metric. This observation allows us to conclude that tumor IGH repertoires are more similar to the repertoires of lymph nodes than to those of peripheral blood, both if clonotype frequency is taken into account, and when it is not.

      All in all, of course, the deeper the better, but given the data we were able to generate from the samples, this was the best approach to normalization that could be used.

      Similarly, the read length (150bp X2) is too short, missing FWR1 and CDR1 and often parts of FWR2 if CDR3 is long. As the authors themselves note (and as was shown in Zhang 2015, PMC4811607), this makes mutation analysis difficult.

      Indeed, we are aware of this problem, and therefore only a small part of the manuscript is dedicated to the hypermutation analysis. However, as the CDR-H3 region is the most mutated part, we can still capture significant diversity of mutations. To address the applicability of our data to the hypermutation phylogeny analysis, we compared the distribution of physicochemical properties along the hypermutation trees using the 150+150 and 300+300 data from the same donor and the same set of samples. The main conclusion is that no correlation between the physicochemical properties of the CDR-H3 region and the rank of the clonotype on the tree could be found for either the long or the short datasets.

      It also makes the identification of V genes and thus clonal identification ambiguous. This issue becomes especially egregious when clones are mutated.

      Again, this would be important for clonotype phylogeny analysis. However, for the simple questions that we address with our clonal group analysis, such as clonal group overlap between tissues, we consider these data acceptable, because any mislabelling of the V segment is a) rare and b) equally frequent in all types of samples. Therefore, the conclusions made are still valid despite this technical drawback.

      To directly address the question of mislabelling of V-genes in our data, we looked at the average number of different V-genes attributed to the same nucleotide sequence of the CDR-H3 region in the short (150+150) and long (300+300) datasets from the same donor. Indeed, some ambiguity of V-gene labelling is observed (see below), but we think that it is unlikely to influence any of our cautious conclusions.

      Author response image 1.
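
      The ambiguity measure described above (the average number of distinct V-genes attributed to the same CDR-H3 nucleotide sequence) could be computed along the following lines. This is a sketch under assumed inputs: the `(cdr3_nt, v_gene)` pairs and the toy datasets are hypothetical and do not reproduce the values in the image.

```python
from collections import defaultdict


def mean_v_genes_per_cdr3(records):
    """Average number of distinct V-gene labels per CDR-H3 nucleotide sequence.

    records: iterable of (cdr3_nt, v_gene) pairs. A value of 1.0 means every
    CDR-H3 sequence was always assigned the same V gene (no ambiguity).
    """
    v_by_cdr3 = defaultdict(set)
    for cdr3_nt, v_gene in records:
        v_by_cdr3[cdr3_nt].add(v_gene)
    if not v_by_cdr3:
        return 0.0
    return sum(len(v_set) for v_set in v_by_cdr3.values()) / len(v_by_cdr3)


# Hypothetical short-read data: one CDR-H3 receives two different V labels.
short_reads = [("AAA", "IGHV1-2"), ("AAA", "IGHV1-3"), ("CCC", "IGHV3-23")]
# Hypothetical long-read data: labels are unambiguous.
long_reads = [("AAA", "IGHV1-2"), ("AAA", "IGHV1-2"), ("CCC", "IGHV3-23")]

short_ambiguity = mean_v_genes_per_cdr3(short_reads)
long_ambiguity = mean_v_genes_per_cdr3(long_reads)
```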

      Finally, it is not completely clear when the analysis is of single individuals or across all individuals. If it is the former the authors did not explain how they chose the individuals analyzed and if the latter then it is not clear from the figures which measurements belong to which individual (i.e they are mixing measurements from different people).

      We addressed this issue by adding a comment to each figure caption, describing whether a particular figure or panel describes individual or pooled data, and also whether the analysis is done on individual clonotype or clonal group level.

      Also, in case pooled data were used, we added the number of patients that was pooled for a particular type of analysis. This number differs from one type of analysis to the other, because not all the patients had a complete set of tissues, and also not all samples passed a quality check for a particular analysis.

      Here are the numbers listed:

      Fig 2A: N=6 (we were only considering those who had all three tissues)

      Fig 2C: N=14 (all)

      Fig 2D: N=14 (all)

      Fig 2E: N=7 (have both tumor and PBMC)

      Fig 2F: N=9 (have both tumor and PBMC)

      Fig 2G: N=9 (have both tumor and PBMC)

      Fig 2H: N=7 (have both tumor and LN)

      Fig 3A: N=14 (all)

      Fig 3B: N=11 (only those with tumor)

      Fig 3E: N=14

      Fig 7F: N=11 (all that have tumor)

      Reviewer #3 (Public Review):

      In multiple cancers, the key roles of B cells are emerging in the tumor microenvironment (TME). The authors of this study appropriately introduce that B cells are relatively under-characterised in the TME and argue correctly that it is not known how the B cell receptor (BCR) repertoires across tumors, lymph nodes, and peripheral blood relate. The authors therefore supply a potentially useful study evaluating the tumor, lymph node, and peripheral blood BCR repertoires and site-to-site as well as intra-site relationships. The authors employ sophisticated analysis techniques, although the description of the methods is incomplete. Among other interesting observations, the authors argue that the tumor BCR repertoire is more closely related to that of draining lymph node (dLN) than the peripheral blood in terms of clonal and isotype composition. Furthermore, the author's findings suggest that tumor-infiltrating B cells (TIL-B) exhibit a less mature and less specific BCR repertoire compared with circulating B cells. Overall, this is a potentially useful work that would be of interest to both medical and computational biologists working across cancer. However, there are aspects of the work that would have benefitted from further analysis and areas of the manuscript that could be written more clearly and proofread in further detail.

      Major Strengths:

      (1) The authors provide a unique analysis of BCR repertoires across tumor, dLN, and peripheral blood. The work provides useful insights into inter- and intra-site BCR repertoire heterogeneity. While patient-to-patient variation is expected, the findings with regard to intra-tumor and intra-dLN heterogeneity with the use of fragments from the same tissue are of importance, contribute to the understanding of the TME, and will inform future study design.

      (2) A particular strength of the study is the detailed CDR3 physicochemical properties analysis which leads the authors to observations that suggest a less-specific BCR repertoire of TIL-B compared to circulating B cells.

      Major Weaknesses:

      The study would have benefitted from a deeper biological interpretation of the data. While given the low number of patients one can plausibly understand a reluctance to speculate about clinical details, there is limited discussion about what may contribute to observed heterogeneity.

      We indeed do not want to overinterpret our data, especially when it comes to the differences between types of cancer. On the other hand, extracting similar patterns across different cancer types allows us to pinpoint mechanisms that are more general and do not depend on cancer type. As for the potential source of the intratumoral heterogeneity that we observe, we think that it may come from the selective sampling of tertiary lymphoid structures (TLS). We include IHC data for TLS detection in Supplementary Fig. 5. Also, tumor mutation clonality may correlate with a differential antibody response (i.e. different IGH clonotypes developing to recognize different antigens), as has been previously described for TCRs by the lab of B. Chain in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6890490/.

      For example, for the analysis of three lymph nodes taken per patient which were examined for inter-LN heterogeneity, there is a lack of information regarding these lymph nodes.

      Unfortunately, no clinical information about the lymph nodes was available.

      'LN3' is deemed as exhibiting the most repertoire overlap with the tumor but there is no discussion as to why this may be the case.

      The following sentences address this in the “LN-to-LN heterogeneity in colorectal cancer” paragraph:

      Similarly, an unequal interaction of tumors with DLNs was observed at the level of hypermutating clonal groups.

      Functionally, this may again indicate that within a group of DLNs, nodes are unequal in terms of access to tumor antigens, and this inequality shapes the BCR repertoires within these lymph nodes.

      (2) At times the manuscript is difficult to follow. In particular, the 'Intra-LN heterogeneity' section follows the 'LN-LN heterogeneity in colorectal cancer' section and compares the overlap of LN fragments (LN11, LN21, LN31) with the tumor in two separate patients (Fig 6A). In the previous section (LN-LN), LN11, LN21, LN31 are names given to separate lymph nodes from the same patient. The fragments are referred to as 'LN2' and the nodes in the previous section are referred to similarly. This conflation of naming for nodes and fragments is confusing.

      We corrected this.

      (3) There is a duplicated paragraph in 'Short vs long trees' and the following section 'Productive involvement in hypermutation lineages depends on CDR3 characteristics'.

      Corrected.

      Reviewer #1 (Recommendations For The Authors):

      - Figures:

      Figure 1A lacks resolution

      Corrected

      Figure 2A, Venn diagram: What do the colors indicate?

      Corrected

      Figure 5D, why include this tree when there is no mention of it in the text?

      Described

      Figures 8, 9, and 10 are not to be found. One should not have to figure out that they became supplementary in the end.

      Corrected

      Regarding the physicochemical properties of CDR-H3, what do the authors mean by "the central part"? Do the authors refer to the CDR-H3 loop, and if so, how is that defined when the IGHD gene segment is excluded from the analyses? Is it 5 amino acids (Productive involvement in hypermutating lineages depends on CDR3 characteristics, Page 21/39 in merged document) and (CDR3 properties, Page 8/39 in merged document), or 7 amino acids (Short vs long trees phylogeny analysis, Page 19/39 in merged document)? Please clarify.  

      We considered different ways of calculating amino acid properties of CDR3 and used different parameters for sample-average and individual-sequence CDR3s. Now plots for Fig S6 C are updated for consistency. IGHD segment was not excluded from the analysis. The reviewer might be confused by our description of phylogenetic inference, when an artificial outgroup with D segment deleted is added to the clonal group to facilitate the inference process. All other sequences were analyzed in their original form with the D segment. This way, we could avoid biases in phylogeny introduced by misassignment of D gene germline to the outgroup.

      What was the threshold for CDR-H3 identity in their analyses? How can the authors justify that this value changes according to individual patient datasets? (Materials & methods, Clonal lineage inference Page 29/39 in merged document).

      As described earlier, the ideal similarity threshold may depend on several factors, such as sampling and sequencing depth. For example, imagine one sample that picks up 100% of the sequences of a clonal lineage, each differing by only 1 amino acid from the next, and a lower-quality sample or sequencing run that picks up only every other sequence. The minimal threshold required to merge these sequences into a clonal group would differ between the two cases (1 aa for the former and ~2 aa for the latter, for single-linkage clustering). The method of individual threshold tailoring relies on this: https://changeo.readthedocs.io/en/latest/examples/cloning.html

      What is the difference between tumor-induced and tumor-infiltrating B cells? How can the authors discriminate between the two? Page 6/39 in the merged document.

      Corrected to “tumor-infiltrating”.

      "Added nucleotides" meaning N additions? Page 3/39 in the merged document.

      Yes.

      How many cancer patients were enrolled? 17 or 14 (Materials & methods page 27/39 in the merged document)? Please clarify.

      In the current project 14 patients were enrolled. The appropriate changes have been introduced in the final text. Supplementary table 2 has been added with the patient data.

      Abbreviations are used without full descriptions.

      Following the reviewer’s recommendation, a list of abbreviations was added to the manuscript, and full descriptions were added in the text upon the first mention of each term.

      Use either CDR3 or CDR-H3

      We corrected the text to use CDR-H3 abbreviation throughout the text.

      Reviewer #2 (Recommendations For The Authors):

      I would like to start by apologizing for the time it took me to review.

      As I mentioned above there are issues with the clonal sampling of the sequencing length and the statistics in this paper. From reading the paper I am not sure if they are fixable but there are some things that could be tried.

      (1) The authors mention the diversity of their individual analysis - 17 individuals across 3 cancer types, but do not then systematically show us how the different things they measure track across the different individuals and cancer types. it is possible that some trends would be more convincing if we saw them happening again and again across all individuals. But, as I said above, the authors do not identify individuals clearly across all their types of analysis nor do they explain why sometimes they show analysis of specific individuals.

      For overlap analysis (Fig. 2 except panel B), CDR3 properties analysis (Fig. 3, Fig. S7), and clonal group analysis (Fig. 4), we used pooled data on all cancers, unless indicated otherwise on the panel. For overlap analysis, we used a Cytoscape graph (Fig. 2B) for one patient, mp3, to illustrate the findings that were made on pooled data. For other types of analysis, such as overlap between individual lymph nodes or tumor fragments (Fig. 5, 6, 7 except panel F), pooled analysis is not possible due to the individual nature of the processes in question.

      (2) The authors do not address how lacking their sampling is nor the distribution of clone sizes in different tissues/ individuals/ subsets. Without such a discussion it is not clear how tenuous or convincing their conclusions are.

      (3) The short sequencing lengths limit the ability to exactly identify V and thus the germline root of clones, whose positions are mutated and clonal association of sequences. The authors appear to be aware of this as they often use the most common ancestor as the start of their analysis... however, again there are inconsistencies that are not clearly described in the text. in creating trees with change they defined roots as the putative germline and at least in most cases also in clone association although in some analyses potentially similar clones were collapsed into clonotypes. Again it is not clear when one method was used or the other and how the choice was made what to choose.

      Here we can only state that we consistently used the approach described in the Methods section, which was the following:

      First, the repertoires were clustered into clonal lineages using the criteria described in “Methods: Clonal lineage inference”. Assuming that each clonotype sequence in the clonal lineage originated from the same ancestor, we try to recover the phylogeny. Please note that we refer to the individual BCR sequences as “clonotypes”, and to a group of clonotypes that presumably share a common ancestor as a “clonal lineage” or “clonal group”.

      The phylogeny of B-cell hypermutations was inferred for each clonal lineage of size five or more using the maximum likelihood method and the GTR GAMMA nucleotide substitution model. To find the most recent common ancestor (MRCA) or “root” of the tree, we used an artificial outgroup constructed as a conjugate of germline segments V and J defined by MiXCR and added it to the clonal lineage. The D segment was excluded from the outgroup formation, as there was insufficient confidence in the germline annotations due to its short length and high level of mutations. The rest of the clonotypes were still analyzed in their original form with the D segment in place. Deleting the D segment from the outgroup simply eliminates the risk of biasing the phylogeny by misassigning the D segment germline sequence to the outgroup. The MUSCLE tool was used for multiple sequence alignment and RAxML software was used to build and root phylogenetic trees.
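
      The input preparation described above (lineages of size five or more, plus an artificial outgroup joined from the V and J germline segments without D) might look roughly like this. It is a simplified sketch: the function names, record labels, and placeholder sequences are hypothetical, and the alignment and tree-building steps performed with MUSCLE and RAxML are not shown.

```python
def build_outgroup(v_germline, j_germline):
    """Artificial outgroup: V and J germline segments joined, with no D segment."""
    return v_germline + j_germline


def lineage_to_fasta(lineage, v_germline, j_germline, min_size=5):
    """Return FASTA text for one clonal lineage plus its outgroup.

    Lineages smaller than `min_size` are skipped (None is returned). The
    observed clonotypes are written unchanged, i.e. with their D segment.
    """
    if len(lineage) < min_size:
        return None
    records = [("outgroup", build_outgroup(v_germline, j_germline))]
    records += [(f"clonotype_{i}", seq) for i, seq in enumerate(lineage, start=1)]
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Placeholder sequences, far shorter than real V(D)J regions.
small_lineage = ["ACGGATTT", "ACGGACTT", "ACGCATTT", "ACGGATTA"]
large_lineage = small_lineage + ["ACGGGTTT"]

skipped = lineage_to_fasta(small_lineage, "ACG", "TTT")
fasta = lineage_to_fasta(large_lineage, "ACG", "TTT")
```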

      (4) Beyond the statistical issues mentioned above: the unclear selection of individual examples for comparison and significance testing, the mixing of individuals and cancer types without clear identification, etc. there is in general a lack of coherence in the statistical analysis performed. specifically:

      (a) the authors should choose one cutoff for significance (0.01 for instance) and then just mention when things are significant and when not. There is no need and it is confusing to add the p-value for every comparison. P-values are not good measures of effect size.

      We corrected the figures and left p-values only where they are below the significance threshold.

      (b) the Bonferroni correction used is not well characterized. For an alpha of 0.01 in Figures 3 C and D how many tests were performed?

      The number of tests used for the Bonferroni-Holm correction equals the number of comparisons on the heatmap: 39 for each heatmap in Fig 3C and 13 for Fig 3D.
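
      For readers unfamiliar with the procedure, a generic Bonferroni-Holm (step-down) correction for a family of 39 tests can be sketched as follows. The p-values are invented, and alpha = 0.01 is taken from the reviewer's question rather than from the manuscript's code.

```python
def holm_bonferroni(p_values, alpha=0.01):
    """Holm step-down procedure: return a list of booleans (True = rejected).

    The k-th smallest p-value (k = 0, 1, ...) is compared with alpha / (m - k);
    testing stops at the first p-value that fails its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break
    return rejected


# Hypothetical family of 39 comparisons, as on one heatmap.
p_values = [0.0001, 0.0003, 0.004] + [0.5] * 36
rejected = holm_bonferroni(p_values, alpha=0.01)
```

      Here only the smallest p-value passes: 0.0001 is below 0.01/39, but 0.0003 exceeds the next threshold of 0.01/38, so the procedure stops there.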

      Finally some minor issues -

      (1) Not all acronyms are described, for instance, TME and TIL. The first time any acronym is used it should be spelled out. -> Katya B: list of abbreviations

      (2) The figure captions are not all there...

      (a) there is no caption for Figure 3E.

      Corrected

      (b) there are Figure 7 F and G panels but no Figure 7E panel and Figure F is described after Figure G.

      Corrected

      (3) A few problems with wording -

      (a) bottom paragraph of page 3 - instead of :

      "different lymph nodes from one draining lymph node pool may be more or less involved"

      Corrected to "different lymph nodes from one draining lymph node pool may be differentially involved"

      (b) figure caption for figure 3a: instead of:

      "CDR3 are on average significantly higher in tumor"

      Corrected to "CDR3 are on average significantly longer in tumor"

      Reviewer #3 (Recommendations For The Authors):

      - FIG1A - Suggest expanding the legend to include more information on the computational analyses.

      Added

      - PAGE SIX: Suggest adding a table or some text on patient characteristics. Numbers of unique clonotypes per sample etc. Are there differences in age/sex that need to be considered? Some clonotype information is available in S1 but some summary and statistics would be appreciated.

      Added patient information as Supplementary table 2.

      - PAGE SIX: F2 Metric, suggestion to explain why this was used vs. other metrics.

      We expanded the following paragraph to include information about F2 metric and D metric, and the reason why we are using F2.

      Repertoire data for each sample were split according to the immunoglobulin isotype, and the F2 metric was calculated for each isotype separately and plotted as an individual point. We used the repertoire overlap metric F2 (clonotype-wise sum of geometric mean frequencies of overlapping clonotypes), which accounts for both the number and frequency of overlapping clonotypes (Fig. 2A). As expected, significantly lower overlaps were observed between the IGH repertoires of peripheral blood and tumors compared to LN/tumor overlaps. The LN/PBMC overlap also tended to be lower, but the difference was not statistically significant. We also analyzed D metric, which represents the relative overlap diversity uninfluenced by clonotype frequency ($D_{ij} = d_{ij}/(d_i \cdot d_j)$, where $d_{ij}$ is the number of clonotypes present in both samples, while $d_i$ and $d_j$ are the diversities of samples $i$ and $j$ respectively). The results for D metric are not shown, as they indicate a similar trend to that of F2 metric. This observation allows us to conclude that tumor IGH repertoires are more similar to the repertoires of tumor-draining LNs than to those of peripheral blood, both if clonotype frequency is taken into account, and when it is not.
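
      To make the F2 definition quoted above concrete, the sketch below sums the geometric mean of the two frequencies over every shared clonotype. The function name and the toy repertoires are assumptions for illustration only.

```python
import math


def f2_metric(freqs_i, freqs_j):
    """F2: sum over shared clonotypes of sqrt(frequency in i * frequency in j)."""
    shared = freqs_i.keys() & freqs_j.keys()
    return sum(math.sqrt(freqs_i[c] * freqs_j[c]) for c in shared)


# Hypothetical repertoires (clonotype -> frequency within the sample).
tumor = {"A": 0.5, "B": 0.3, "C": 0.2}
lymph_node = {"A": 0.5, "B": 0.1, "D": 0.4}
blood = {"A": 0.01, "B": 0.01, "E": 0.98}

f2_tumor_ln = f2_metric(tumor, lymph_node)
f2_tumor_blood = f2_metric(tumor, blood)
```

      In this toy case the tumor shares the same two clonotypes with both other samples, so a frequency-independent count would not separate them, whereas F2 is higher for the tumor/LN pair because the shared clonotypes are abundant in both.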

      - PAGE SIX: Make clear in the text that mp3 is a patient.

      Added “melanoma patient mp3”

      - PAGE EIGHT: Suggest explaining kidera factors at first use - not all readers will know what they are.

      We expanded the following paragraph to add more information about Kidera factors:

      To explore CDR-H3 physicochemical properties, we calculated the mean charge, hydropathy, predicted interaction strength, and Kidera factors 1 - 9 (kf1-kf9) for five central amino acids of the CDR-H3 region for the 100 most frequent clonotypes of each sample using VDJtools. Kidera factors are a set of scores which quantify physicochemical properties of protein sequences (ref. 61). 188 physical properties of the 20 amino acids are encoded using dimension reduction techniques, to yield 9 factors which are used to quantitatively characterize physicochemical properties of amino acid sequences.
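
      A simplified illustration of a sample-average property computed over the five central amino acids is given below. It is not the VDJtools implementation: the way the central window is chosen, the per-residue charge table, and the example sequences are all assumptions made for this sketch.

```python
# Assumed, simplified residue charges: K/R = +1, D/E = -1, everything else 0.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def central_residues(cdr3_aa, width=5):
    """Return the `width` central residues (left-biased when centering is uneven)."""
    if len(cdr3_aa) <= width:
        return cdr3_aa
    start = (len(cdr3_aa) - width) // 2
    return cdr3_aa[start:start + width]


def mean_central_charge(cdr3_list, width=5):
    """Average, over clonotypes, of the summed charge of the central residues."""
    charges = [
        sum(CHARGE.get(aa, 0) for aa in central_residues(seq, width))
        for seq in cdr3_list
    ]
    return sum(charges) / len(charges)


# Hypothetical CDR-H3 amino acid sequences standing in for a sample's top clonotypes.
top_clonotypes = ["CARDRGYSSGWYFDVW", "CAKDEKRDYW"]
sample_charge = mean_central_charge(top_clonotypes)
```

      The same windowing could be reused for any per-residue score, including a table of Kidera factor values, by swapping the lookup dictionary.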

      - Fig 5D is not referred to.

      Corrected

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife assessment 

      This valuable study aims to present a mathematical theory for why the periodicity of the hexagonal pattern of grid cell firing would be helpful for encoding 2D spatial trajectories. The idea is supported by solid evidence, but some of the comparisons of theory to the experimental data seem incomplete, and the reasoning supporting some of the assumptions made should be strengthened. The work would be of interest to neuroscientists studying neural mechanisms of spatial navigation. 

      We thank the reviewers for this assessment. We have addressed the comments made by reviewers and believe that the revised manuscript has theoretical and practical implications beyond the subfield of neuroscience concerned with mechanisms underpinning spatial memory and spatial navigation. Specifically, the demonstration that four simple axioms beget the spatial firing pattern of grid cells is highly relevant for the field of artificial intelligence and neuromorphic computing. This relevance stems from the fact that the four axioms define a set of four simple computational algorithms that can be implemented in future work in grid cell-inspired computational algorithms. Such algorithms will be impactful because they can perform path integration, a function that is independent of an animal’s or agent’s location and therefore generalizable. Moreover, because of the functional organization of grid cells into modules, the algorithm is also scalable. Generalizability and scalability are two highly sought-after properties of brain-inspired computational frameworks. We also believe that the question why grid cells emerge in the brain is a fundamental one. This manuscript is, to our knowledge, the first one that provides an interpretable and intuitive answer to why grid cells are observed in the brain. 

      Before addressing each comment, we would like to point out that the first sentence of the assessment appears misphrased. The study does not aim to present a theory for why the periodicity in grid cell firing would be helpful for encoding 2D spatial trajectories. To present a theory “for why grid cell firing would be helpful for encoding 2D trajectories”, one assumes the existence of grid cells a priori. Instead of assuming the existence of grid cells and deriving a computational function from grid cells, our study derives grid cells from a computational function, as correctly summarized by reviewers #1 and #3 in their individual statements. In contrast to previous normative models, we prove mathematically that spatial periodicity in grid cell firing is implied by a sequence code of trajectories. If the brain uses cell sequences to code for trajectories, spatially periodic firing must emerge. As correctly pointed out by reviewer #1, the underlying assumptions of this study are that the brain codes for trajectories and that it does so using cell sequences. In response to comments by reviewer #1, we now discuss these two assumptions more rigorously.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Rebecca R.G. et al. set to determine the function of grid cells. They present an interesting case claiming that the spatial periodicity seen in the grid pattern provides a parsimonious solution to the task of coding 2D trajectories using sequential cell activation. Thus, this work defines a probable function grid cells may serve (here, the function is coding 2D trajectories), and proves that the grid pattern is a solution to that function. This approach is somewhat reminiscent in concept to previous works that defined a probable function of grid cells (e.g., path integration) and constructed normative models for that function that yield a grid pattern. However, the model presented here gives clear geometric reasoning to its case. 

      Stemming from 4 axioms, the authors present a concise demonstration of the mathematical reasoning underlying their case. The argument is interesting and the reasoning is valid, and this work is a valuable addition to the ongoing body of work discussing the function of grid cells. 

      However, the case uses several assumptions that need to be clearly stated as assumptions, clarified, and elaborated on: Most importantly, the choice of grid function is grounded in two assumptions: 

      (1) that the grid function relies on the activation of cell sequences, and 

      (2) that the grid function is related to the coding of trajectories. While these are interesting and valid suggestions, since they are used as the basis of the argument, the current justification could be strengthened (references 28-30 deal with the hippocampus, reference 31 is interesting but cannot hold the whole case). 

      We thank this reviewer for the overall positive and constructive criticism. We agree with this reviewer that our study rests on two premises, namely that 1) a code for trajectories exists, and 2) this code is implemented by cell sequences. We now discuss and elaborate on the data in the literature supporting the two premises.

      In addition to the work by Zutshi et al. (reference 31 in the original manuscript), we have now cited additional work presenting experimental evidence for sequential activity of neurons in the medial entorhinal cortex, including sequential activity of grid cells.

      We have added the following paragraph to the Discussion section:

      “Recent studies provided compelling evidence for sequential activity of neurons representing spatial trajectories. In particular, Gardner et al. (2022) demonstrated that the sequential activity of hundreds of simultaneously recorded grid cells in freely foraging rats represented spatial trajectories. Complementary preliminary results indicate that grid cells exhibit left-right-alternating “theta sweeps,” characterized by temporally compressed sequences of spiking activity that encode outwardly oriented trajectories from the current location (Vollan et al., 2024).

      The concept of sequential grid cell activity extends beyond spatial coding. In various experimental contexts, grid cells have been shown to encode non-spatial variables. For instance, in a stationary auditory task, grid cells fired at specific sounds along a continuous frequency axis (Aronov et al., 2017). Further studies revealed that grid cell sequences also represent elapsed time and distance traversed, such as during a delay period in a spatial alternation task (Kraus et al., 2015). Similar findings were reported for elapsed time encoded by grid cell sequences in mice performing a virtual “Door Stop” task (Heys and Dombeck, 2018).

      Additionally, spatial trajectories represented by temporally compressed grid cell sequences have been observed during sleep as replay events (Ólafsdóttir et al., 2016; O’Neill et al., 2017). Collectively, these studies demonstrate that sequential activity of neurons within the MEC, particularly grid cells, consistently encodes ordered experiences, suggesting a fundamental role for temporal structure in neuronal representations.

      The theoretical underpinnings of grid cell activity coding for ordered experiences have been explored previously by Rueckemann et al. (2021) who argued that the temporal order in grid cell activation allows for the construction of topologically meaningful representations, or neural codes, grounded in the sequential experience of events or spatial locations. However, while Rueckemann et al. argue that the MEC supports temporally ordered representations through grid cell activity, our findings suggest an inverse relationship: namely, that grid cell activity emerges from temporally ordered spatial experiences. Additional studies demonstrate that hippocampal place cells may derive their spatial coding properties from higher-order sequence learning that integrates sensory and motor inputs (Raju et al., 2024) and that hexagonal grids, if assumed a priori, optimally encode transitions in spatiotemporal sequences (Waniek, 2018).

      Together, experimental and theoretical evidence demonstrate the significance of sequential neuronal activity within the hippocampus and entorhinal cortex as a core mechanism for representing both spatial and temporal information and experiences.”

      The work further leans on the assumption that sequences in the same direction should be similar regardless of their position in space, it is not clear why that should necessarily be the case, and how the position is extracted for similar sequences in different positions. 

      We thank this reviewer for giving us the opportunity to clarify this point. We define a trajectory as a path taken in space (Definition 6). By this definition, a code for trajectories is independent of the animal’s spatial location. This is consistent with the definition of path integration, which is also independent of an animal’s spatial location. If the number of neurons is finite (Axiom #4) and the space is large, sequences must eventually repeat in different locations. This results in neural sequences coding for the same directions being identical at different locations. We have clarified this point under the new Remark 6.1 in the Results section of the revised manuscript:

      “Remark 6.1. Note that a code for trajectories is independent of the animal’s spatial location, consistent with the definition of path integration. This implies that, if the number of neurons is finite (Axiom #4) and the space is large, sequences must eventually repeat in different locations, resulting in neural sequences coding for the same trajectories at different locations.”

      The formal proof was already included in the original manuscript: “Generally speaking, starting in a firing field of element i and going along any set of firing fields, some element must eventually become active again since the total number of elements is finite by axiom 4. Once there is a repeat of one element’s firing field, the whole sequence of firing fields of all elements must repeat by axiom 1. More specifically, if we had a sequence 1,2, … , k, 1, t of elements, then 1,2 and 1, t both would code for traveling in the same direction from element 1, contradicting axiom 1.”

      Further: “More explicitly, assuming axioms 1 and 4, the firing fields of trajectory-coding elements must be spatially periodic, in the sense that starting at any point and continuing in a single direction, the initial sequence of locally active elements must eventually repeat with a repeat length of at least 3”.
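
      The repetition argument quoted above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that axiom 1 can be modeled as a lookup table giving each element exactly one successor along a fixed direction of travel. The function name `repeat_length` and the seven-element example are hypothetical and are not taken from the manuscript.

```python
def repeat_length(successor, start):
    """Return how many steps pass before the sequence of active elements repeats.

    `successor` maps each element to the single element that fires next when
    moving in one fixed direction (an assumed stand-in for axiom 1).
    """
    first_seen = {}  # element -> step at which it first became active
    current, step = start, 0
    while current not in first_seen:
        first_seen[current] = step
        current = successor[current]  # unique next element in this direction
        step += 1
    # With finitely many elements (axiom 4) the loop must terminate:
    # some element is revisited after at most len(successor) steps.
    return step - first_seen[current]


# Hypothetical module of seven elements visited cyclically along one direction.
seven_elements = {i: (i + 1) % 7 for i in range(7)}
print(repeat_length(seven_elements, start=0))
```

      The loop can only end by revisiting an element, which mirrors the pigeonhole step of the proof; the sketch says nothing about the two-dimensional arrangement of the fields.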

      Regarding the question of how an animal’s position is extracted for similar sequences in different positions, we agree with this reviewer that this is an important question when investigating the contributions of grid cells to the coding of space. However, since a code for trajectories is independent of spatial location, the question of how to extract an animal’s position from a trajectory code is irrelevant for this study.

      While a trajectory code by neural sequences begets grid cells, a spatial code by neural sequences does not. Nevertheless, grid cells could contribute to the coding of space (in addition to providing a trajectory code). However, while experimental evidence from studies with rodents and human subjects and theoretical work demonstrated the importance of grid cells for path integration (Fuhs and Touretzky, 2006; McNaughton et al., 2006; Moser et al., 2017), experimental studies have shown that grid cells contribute little to the coding of space by place cells (Hales et al., 2014). Yet, theoretical work (Mathis et al., 2012) showed that coherent activity of grid cells across different modules can provide a code for spatial location that is more accurate than spatial coding by place cells in the hippocampus. Importantly, such a spatial code by coherent activity across grid cell modules does not require location-dependent differences in neural sequences.

      The authors also strengthen their model with the requirement that grid cells should code for infinite space. However, the grid pattern anchors to borders and might be used to code navigated areas locally. Finally, referencing ref. 14, the authors claim that no existing theory for the emergence of grid cell firing that unifies the experimental observations on periodic firing patterns and their distortions under a single framework. However, that same reference presents exactly that - a mathematical model of pairwise interactions that unifies experimental observations. The authors should clarify this point. 

      We thank this reviewer for this valuable feedback. We agree that grid cells anchor to borders and may be used to code navigated areas locally. In fact, the trajectory code performs a local function, namely path integration, and the global grid pattern can only emerge from performing this local computation if the activity of at least one grid unit or element (we changed the wording from unit to element based on feedback from reviewer #3) is anchored to either a spatial location or a border. Yet, the trajectory code itself does not require anchoring to a reference frame to perform local path integration. Because of the local nature of the trajectory code, path integration can be performed locally without the emergence of a global grid pattern. This has been shown experimentally in mice performing a path integration task where changes in the location of a task-relevant object resulted in translations of grid patterns in single trials. Although no global grid pattern was observed, grid cells performed path integration locally within the multiple reference frames defined by the task-relevant object, and grid patterns were visible when the changes in the reference frames were accounted for in computing the rate maps (Peng et al., 2023). The data by Peng et al. (2023) confirm that the anchoring of the grid pattern to borders and the emergence of the global pattern are not required for local coding of trajectories. The global pattern emerges only when the reference frame does not change. However, this global pattern itself might not serve any function. According to the trajectory code model, the beguiling grid pattern is merely a byproduct of a local path integration function that is independent of the animal’s current location (which makes the code generalizable across space).
      The reviewer is correct that, if the reference frame used to anchor the grid pattern did not change in infinite space, the trajectory code model of grid cell firing would predict an infinite global pattern. But does the proof implicitly assume that space is infinite? The trajectory code model makes the quantitative prediction that the field size increases linearly with an increase in grid spacing (the distance between two fields). If the field size remains fixed, periodicity will emerge in finite spaces that are larger than the grid spacing. We have clarified these points in the revised manuscript:

      “Notably, the trajectory code itself does not require anchoring to a reference frame to perform local path integration. Because of the local nature of the trajectory code, path integration can be performed locally without the emergence of a global grid pattern. This has been shown experimentally in mice performing a path integration task where changes in the location of a task-relevant object resulted in translations of grid patterns in single trials (Peng et al., 2023). Although no global grid pattern was observed because the reference frame was not fixed in space, grid cells performed path integration locally within the reference frame defined by the moving task-relevant object, and grid patterns were visible when the changes in the reference frames were accounted for in computing the rate maps”.

      Regarding how the emergence of grid cells from a trajectory code relates to the theory of a local code by grid cells brought forward by Ginosar et al. (ref. 14), we argue that the local computational function suggested by Ginosar et al. is to provide a code for trajectories. The perspective article by Ginosar et al. provides an excellent review of the experimental data on grid cells that point to grid cells performing a local function (see also Kate Jeffery’s excellent review article (Jeffery, 2024) on the mosaic structure of the mammalian cognitive map.) Assuming the existence of grid cells a priori, Ginosar et al. then propose three possible functions of grid cells, all of which are consistent with the trajectory code model of grid cell firing. Yet, the perspective article remains agnostic, in our opinion, on the exact nature of the local computation that is carried out by grid cells. But without knowing the local computation underlying grid cell function, a unifying theory explaining the emergence of grid cells cannot be considered complete. In contrast, our manuscript identifies the local computational function as a trajectory code by cell sequences. We have clarified these points in the revised manuscript:

      “The influential hypothesis that grid cells provide a universal map for space is challenged by experimental data suggesting a yet to be identified local computational function of grid cells (Ginosar et al., 2023; Jeffery, 2024). Here, we identify this local computational function as a trajectory code.”

      The mathematical model of pairwise interactions described by Ginosar et al. is fundamentally different from the mathematical framework developed in our manuscript. The mathematical model by Ginosar et al. describes how pairwise interactions between already existent grid fields can explain distortions in the grid pattern caused by the environment’s geometry, reward zones, and dimensionality. However, the model does not explain why there is a grid pattern in the first place. In contrast, our trajectory model provides an explanation for why grid cells may exist by demonstrating that a grid pattern emerges from a trajectory code by cell sequences. We stand by our assessment that a unifying theory of grid cells is not complete if it takes the existence of the grid pattern for granted.

      Reviewer #2 (Public Review): 

      Summary: 

      In this work, the authors consider why grid cells might exhibit hexagonal symmetry - i.e., for what behavioral function might this hexagonal pattern be uniquely suited? The authors propose that this function is the encoding of spatial trajectories in 2D space. To support their argument, the authors first introduce a set of definitions and axioms, which then lead to their conclusion that a hexagonal pattern is the most efficient or parsimonious pattern one could use to uniquely label different 2D trajectories using sequences of cells. The authors then go through a set of classic experimental results in the grid cell literature - e.g. that the grid modules exhibit a multiplicative scaling, that the grid pattern expands with novelty or is warped by reward, etc. - and describe how these results are either consistent with or predicted by their theory. Overall, this paper asks a very interesting question and provides an intriguing answer. However, the theory appears to be extremely flexible and very similar to ideas that have been previously proposed regarding grid cell function. 

      We thank this reviewer for carefully reading the manuscript and their valuable feedback which helps us clarify major points of the study. One major clarification is that the theoretical/axiomatic framework we put forward does not assume grid cells a priori. In contrast, we start by hypothesizing a computational function that a brain region shown to be important for path integration likely needs to solve, namely coding for spatial trajectories. We go on to show that this computational function begets spatially periodic firing (grid maps). By doing so, we provide mathematical proof that grid maps emerge from solving a local computational function, namely spatial coding of trajectories. Showing the emergence of grid maps from solving a local computational function is fundamentally different from many previous studies on grid cell function, which assign potential functions to the existing grid pattern. As we discuss in the manuscript, our work is similar to using normative models of grid cell function. However, in contrast to normative models, we provide a rigorous and interpretable mathematical framework which provides geometric reasoning to its case.

      Major strengths: 

      The general idea behind the paper is very interesting - why *does* the grid pattern take the form of a hexagonal grid? This is a question that has been raised many times; finding a truly satisfying answer is difficult but of great interest to many in the field. The authors' main assertion that the answer to this question has to do with the ability of a hexagonal arrangement of neurons to uniquely encode 2D trajectories is an intriguing suggestion. It is also impressive that the authors considered such a wide range of experimental results in relation to their theory.  

      We thank this reviewer for pointing out the significance of the question addressed by our manuscript.

      Major weaknesses: 

      One major weakness I perceive is that the paper overstates what it delivers, to an extent that I think it can be a bit confusing to determine what the contributions of the paper are. In the introduction, the authors claim to provide "mathematical proof that ... the nature of the problem being solved by grid cells is coding of trajectories in 2-D space using cell sequences. By doing so, we offer a specific answer to the question of why grid cell firing patterns are observed in the mammalian brain." This paper does not provide proof of what grid cells are doing to support behavior or provide the true answer as to why grid patterns are found in the brain. The authors offer some intriguing suggestions or proposals as to why this might be based on what hexagonal patterns could be good for, but I believe that the language should be clarified to be more in line with what the authors present and what the strength of their evidence is. 

      We thank this reviewer for this assessment. While there is ample experimental evidence demonstrating the importance of grid cells for path integration, we agree with this reviewer that there may be other computational functions that require or largely benefit from the existence of grid cells. We now acknowledge that we have provided a likely teleological cause for the emergence of grid cells and that there might be other causes. We have changed the wording in the abstract and discussion sections accordingly. We choose “likely” because the computational function – trajectory coding – from which grid maps emerge is very closely associated with path integration, which numerous experimental and theoretical studies associate with grid cell function.

      Relatedly, the authors claim that they find a teleological reason for the existence of grid cells - that is, discover the function that they are used for. However, in the paper, they seem to instead assume a function based on what is known and generally predicted for grid cells (encode position), and then show that for this specific function, grid cells have several attractive properties. 

      We agree with this reviewer that we leveraged what is known about grid cells, in particular their importance for path integration, in finding a likely teleological cause. However, the major significance of our work is that we demonstrate that coding for spatial trajectories requires spatially periodic firing (grid cells). This is very different from assuming the existence of grid cells a priori and then showing that grid cells have attractive, if not optimal, properties for this function. If we had shown that grid cells optimized a code for trajectories, this reviewer would be correct: we would have suggested just another potential function of grid cells. Instead, we provide both proof and intuition that trajectory coding by cell sequences begets grid cells (not the other way around), thereby providing a likely teleological cause for the emergence of grid cells. As stated above, we clarified in the revised manuscript that we provide a likely teleological cause which requires additional experimental verification.

      There is also some other work that seems very relevant, as it discusses specific computational advantages of a grid cell code but was not cited here: https://www.nature.com/articles/nn.2901

      We thank this reviewer for pointing us toward this article by Sreenivasan and Fiete (2011). The revised manuscript now cites this article in the Introduction and Discussion sections. We agree that the article discusses a specific computational advantage of a population code by grid cells, namely unprecedented robustness to noise in estimating the location from the spiking information of noisy neurons. However, the work by Sreenivasan and Fiete (2011) differs from our work in that the authors assume the existence of grid cells a priori.

      In addition, we now discuss other relevant work, namely work on the conformal isometry hypothesis by Schøyen et al. (2024) and Xu et al. (2024), published as pre-prints after publication of the first version of our manuscript, as well as work on transition scale-spaces by Nicolai Waniek. Xu et al. (2024) and Schøyen et al. (2024) investigate conformal isometry in the coding of space by grid cells. Conformal isometry means that trajectories in neural space map trajectories in physical space. Xu et al. (2024) show that the conformal isometry hypothesis can explain the spatially periodic firing pattern of grid cells. Schøyen et al. (2024) further show that a module of seven grid cells emerges if space is encoded as a conformal isometry, ensuring equal representation in all directions. While the work by Xu et al. (2024) and Schøyen et al. (2024) arrives at conclusions very similar to those stated in the current manuscript, the conformal isometry hypothesis provides only a partial answer to why grid cells exist because it does not explain why conformal isometry is important or required. In contrast, a sequence code of trajectories provides an intuitive answer to why such a code is important for animal behavior. Furthermore, we included in the Discussion the work by Nicolai Waniek (2018, 2020), who demonstrated that the hexagonal arrangement of grid fields is optimal for coding transitions in space.

      The paragraph added to the Discussion reads as follows:

      “As part of the proof that a trajectory code by cell sequences begets spatially periodic firing fields, we proved that the centers of the firing fields must be arranged in a hexagonal lattice. This arrangement implies that the neural space is a conformally isometric embedding of physical space, so that local displacements in neural space are proportional to local displacements of an animal or agent in physical space, as illustrated in Figure 5. This property has recently been introduced in the grid cell literature as the conformal isometry hypothesis (Schøyen et al., 2024; Xu et al., 2024). Strikingly, Schøyen et al. (2024) arrive at similar if not identical conclusions regarding the geometric principles in the neural representations of space by grid cells.”

      A second major weakness was that some of the claims in the section in which they compared their theory to data seemed either confusing or a bit weak. I am not a mathematician, so I was not able to follow all of the logic of the various axioms, remarks, or definitions to understand how the authors got to their final conclusion, so perhaps that is part of the problem. But below I list some specific examples where I could not follow why their theory predicted the experimental result, or how their theory ultimately operated any differently from the conventional understanding of grid cell coding. In some cases, it also seemed that the general idea was so flexible that it perhaps didn't hold much predictive power, as extra details seemed to be added as necessary to make the theory fit with the data. 

      I don't quite follow how, for at least some of their model predictions, the 'sequence code of trajectories' theory differs from the general attractor network theory. It seems from the introduction that these theories are meant to serve different purposes, but the section of the paper in which the authors claim that various experimental results are predicted by their theory makes this comparison difficult for me to understand. For example, in the section describing the effect of environmental manipulations in a familiar environment, the authors state that the experimental results make sense if one assumes that sequences are anchored to landmarks. But this sounds just like the classic attractor network interpretation of grid cell activity - that it's a spatial metric that becomes anchored to landmarks. 

      We thank this reviewer for giving us the opportunity to clarify in what aspects the ‘sequence code of trajectories’ theory of grid cell firing differs from the classic attractor network models, in particular the continuous attractor network (CAN) model. First of all, the CAN model is a mechanistic model of grid cell firing that is specifically designed to simulate spatially periodic firing of grid cells in response to velocity inputs. In contrast, the sequence code of trajectories theory of grid cell firing resembles a normative model showing that grid cells emerge from performing a specific function. However, in contrast to previous normative models, the sequence code of trajectories model grounds the emergence of grid cell firing in a mathematical proof and both geometric reasoning and intuition. The proof demonstrates that the emergence of grid cells is the only solution to coding for trajectories using cell sequences. The sequence code of trajectories model of grid cell firing is agnostic about the neural mechanisms that implement the sequence code in a population of neurons. One plausible implementation of the sequence code of trajectories is a CAN. Indeed, the sequence code of trajectories theory predicts conformal isometry in the CAN, i.e., a trajectory in neural space is proportional to a trajectory of an animal in physical space. However, other mechanistic implementations are possible. We have clarified how the sequence code of trajectories theory of grid cells relates to the mechanistic CAN models of grid cells. 

      We added the following text to the Discussion section:

      “While the sequence code of trajectories-model of grid cell firing is agnostic about the neural mechanisms that implement the sequence code, one plausible implementation is a continuous attractor network (McNaughton et al., 2006; Burak and Fiete, 2009). Interestingly, a sequence code of trajectories begets conformal isometry in the attractor network, i.e., a trajectory in neural space is proportional to a trajectory of an animal in physical space.”

      It was not clear to me why their theory predicted the field size/spacing ratio or the orientation of the grid pattern to the wall. 

      We thank this reviewer for bringing to our attention that we lacked a proper explanation for why the sequence code of trajectories theory predicts the field size/spacing ratio in grid maps. We have modified/added the following text to the Results section of the manuscript to clarify this point:

      “Because the sequence code of trajectories model of grid cell firing implies a dense packing of firing fields, the spacing between two adjacent grid fields must change linearly with a change in field size. It follows that the ratio between grid spacing and field size is fixed. When using the distance between the centers of two adjacent grid fields to measure grid spacing and a diameter-like metric to measure grid field size, we can compute the ratio of grid spacing to grid field size as √7≈2.65 (see Methods).”
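
      One way to see how a value of √7 can arise is sketched below in Python. This is an illustration under stated assumptions, not the derivation given in the Methods: it assumes a module of seven elements whose fields densely tile a hexagonal lattice so that every field and its six neighbors belong to seven different elements, and it takes the center-to-center distance of adjacent fields as the diameter-like field size. The assignment rule in `element_id` and all function names are hypothetical.

```python
import math
from itertools import product

N_ELEMENTS = 7  # assumed module size for this illustration

# The six nearest neighbours of a field in axial hexagonal coordinates (q, r).
NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def element_id(q, r):
    """Assign the field at lattice site (q, r) to one of seven elements."""
    return (q + 3 * r) % N_ELEMENTS


def center_distance(q, r):
    """Distance of site (q, r) from the origin, in units of adjacent-field distance."""
    return math.sqrt(q * q + q * r + r * r)


def neighbours_are_distinct():
    """Check that a field and its six neighbours belong to seven different elements."""
    ids = {element_id(0, 0)} | {element_id(q, r) for q, r in NEIGHBOURS}
    return len(ids) == N_ELEMENTS


def spacing_to_field_size_ratio(search_radius=6):
    """Smallest distance between two fields of the same element, per unit field size."""
    same_element = [
        center_distance(q, r)
        for q, r in product(range(-search_radius, search_radius + 1), repeat=2)
        if (q, r) != (0, 0) and element_id(q, r) == element_id(0, 0)
    ]
    return min(same_element)


print(neighbours_are_distinct(), round(spacing_to_field_size_ratio(), 2))
```

      Under these assumptions the nearest field of the same element lies at lattice offset (1, 2), whose squared distance is 1 + 2 + 4 = 7, so the ratio does not depend on the absolute field size.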

      We are also grateful to this reviewer for correctly pointing out that the explanation as to why the sequence code of trajectories predicts a rotation of the grid pattern relative to a set of parallel walls in a rectangular environment was unclear. We have now made explicit the underlying premise that a sequence of firing fields from multiple grid cells is aligned in parallel to a nearby wall of the environment. We cite additional experimental evidence supporting this premise. Concretely, we quote Stensola and Moser summarizing results reported in (Stensola et al., 2015): “A surprising observation, however, was that modules typically assumed one of only four distinct orientation configurations relative to the environment” (Stensola and Moser, 2016). Importantly, all of the four distinct orientations show the characteristic angular rotation. Intriguingly, this is predicted by the sequence code of trajectories-model under the premise that a sequence of firing fields aligns with one of the geometric boundaries of the environment, as shown in Author response image 1 below.

      Author response image 1.

      Under the premise that a sequence of firing fields aligns with one of the geometric boundaries (walls) of a square arena, there are precisely four possible distinct configurations of orientations. This is precisely what has been observed in experiments (Stensola et al., 2015; Stensola and Moser, 2016).

      We added clarifying language to the Results section: “Under the premise that a sequence of firing fields aligns with one of the geometric boundaries of the environment, the sequence code model explains that the grid pattern typically assumes one of only four distinct orientation configurations relative to the environment (refs. 41, 46). Concretely, the four orientation configurations arise when one row of grid fields aligns with one of the two sets of parallel walls in a rectangular environment, and each arrangement can result in two distinct orientations (Figure 3B).”

      I don't understand how repeated advancement of one unit to the next, as shown in Figure 4E, would cause the change in grid spacing near a reward. 

      In familiar environments, spatial firing fields of place cells in hippocampal CA1 and CA3 tend to shift backwards with experience (Mehta et al., 2000; Lee et al., 2004; Roth et al., 2012; Geiller et al., 2017; Dong et al., 2021). This implies that the centers of place fields move closer to each other. A potential mechanism has been suggested, namely NMDA receptor-dependent long-term synaptic plasticity (Ekstrom et al., 2001). When we apply the same principle observed for place fields on a linear track to grid fields anchored to a reward zone, grid fields will “gravitate” towards the reward side. A similar idea has been presented by (Ginosar et al., 2023) who use the analogy of reward locations as “black holes”. In contrast to (Ginosar et al., 2023), whom we cite multiple times, our idea unifies observations on place cells and grid cells in 1-D and 2-D environments and suggests a potential mechanism. We changed the wording in the revised manuscript and clarified the underlying premises.

      I don't follow how this theory predicts the finding that the grid pattern expands with novelty. The authors propose that this occurs because the animals are not paying attention to fine spatial details, and thus only need a low-resolution spatial map that eventually turns into a higher-resolution one. But it's not clear to me why one needs to invoke the sequence coding hypothesis to make this point. 

      We agree with this reviewer that this point needs clarification. The sequence code model adds explanatory power to the hypothesis that the grid pattern in a novel environment reflects a low-resolution mapping of space or spatial trajectories because it directly links spatial resolution to both field size and spacing of a grid map. Concretely, the spatial resolution of the trajectory code is equivalent to the spacing between two adjacent spatial fields, and the spatial resolution is directly proportional to the grid spacing and field size. If one did not invoke the sequence coding hypothesis, one would need to explain how and why both spacing and field size are related to the spatial resolution of the grid map. Lastly, as written in the manuscript text, we point out that, while the experimentally observed expansion of grid maps is consistent with the sequence code of trajectories, it is not predicted by the theory without making further assumptions.

      The last section, which describes that the grid spacing of different modules is scaled by the square root of 2, says that this is predicted if the resolution is doubled or halved. I am not sure if this is specifically a prediction of the sequence coding theory the authors put forth though since it's unclear why the resolution should be doubled or halved across modules (as opposed to changed by another factor). 

      We agree with reviewer #2 that the exact value of the scaling factor is not predicted by the sequence coding theory. E.g., the sequence code theory does not explain why the spatial resolution doesn’t change by a factor 3 or 1.5 (resulting in changes in grid spacing by square root of 3 or square root of 1.5, respectively). We have changed the wording to reflect this important point. We further clarified in the revised manuscript that future work on multiscale representations using modules of grid cells needs to show why changing the spatial resolution across modules by a factor of 2 is optimal. Interestingly, a scale ratio of 2 is commonly used in computer vision, specifically in the context of mipmapping and Gaussian pyramids, to render images across different scales. Literature in the computer vision field describes why a scaling factor of 2 and the use of Gaussian filter kernels (compare with Gaussian firing fields) is useful in allowing a smooth and balanced transition between successive levels of an image pyramid (Burt and Adelson, 1983; Lindeberg, 2008). Briefly, larger factors (like 3) could result in excessive loss of detail between levels, while smaller factors (like 1.5) would not reduce the image size enough to justify additional levels of computation (that would come with the structural cost of having more grid cell modules in the brain). We have clarified these points in the Discussion section.

      Reviewer #3 (Public Review): 

      The manuscript presents an intriguing explanation for why grid cell firing fields do not lie on a lattice whose axes are aligned to the walls of a square arena. This observation, by itself, merits the manuscript's dissemination to eLife's audience.

      We thank this reviewer for their positive assessment.

      The presentation is quirky (but keep the quirkiness!). 

      We kept the quirkiness.

      But let me recast the problem presented by the authors as one of combinatorics. Given repeating, spatially separated firing fields across cells, one obtains temporal sequences of grid cells firing. Label these cells by integers from $[n]$. Any two cells firing in succession should uniquely identify one of six directions (from the hexagonal lattice) in which the agent is currently moving. 

      Now, take the symmetric group $\Sigma$ of cyclic permutations on $n$ elements.  We ask whether there are cyclic permutations of $[n]$ such that 

      $$\left(\pi_{i+1} - \pi_i\right) \not\equiv \pm 1 \pmod{n}, \quad \forall i.$$

      So, for instance, $(4,2,3,1)$ would not be counted as a valid permutation of $(1,2,3,4)$, as $(2,3)$ and $(1,4)$ are adjacent. 

      Furthermore, given $[n]$, are there two distinct cyclic permutations such that no adjacencies are preserved when considering any pair of permutations (among the triple of the original ordered sequence and the two permutations)? In other words, if we consider the permutation required to take the first permutation into the second, that permutation should not preserve any adjacencies.

      Key question: is there any difference between the solution to the combinatorics problem sketched above and the result in the manuscript? Specifically, the text argues that for $n=7$ there is only one solution.

      Ideally, one would strive to obtain a closed-form solution for the number of such permutations as a function of $n$.  
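The first condition stated by the reviewer can be explored by brute force for small $n$. The following is a minimal sketch under stated assumptions, not the authors' construction: it checks only the reviewer's non-adjacency condition (not the manuscript's axioms), treats each ordering as cyclic, and fixes the first entry to 1 so that rotations of the same cycle are counted once while the two traversal directions are counted separately. The function names `is_valid` and `count_valid` are hypothetical.

```python
from itertools import permutations


def is_valid(perm):
    """Return True if no two cyclically consecutive entries differ by +/-1 modulo n."""
    n = len(perm)
    for i in range(n):
        # Difference between neighbouring entries, wrapping around at the end.
        diff = (perm[(i + 1) % n] - perm[i]) % n
        if diff in (1, n - 1):
            return False
    return True


def count_valid(n):
    """Count valid cyclic orderings of 1..n with the first entry fixed to 1.

    Fixing the first entry removes rotations of the same cycle;
    reversed orderings are still counted as distinct.
    """
    return sum(
        1 for rest in permutations(range(2, n + 1)) if is_valid((1,) + rest)
    )
```

For instance, `is_valid((4, 2, 3, 1))` returns `False`, in line with the reviewer's example. The enumeration grows factorially, so it is only practical for small $n$ and does not replace a closed-form count.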

      This is a great question! We currently have a student working on describing all possible arrangements of firing fields (essentially labelings of the hexagonal lattice) that satisfy the axioms in 2D, and we expect that results on the number of such arrangements will come out of his work. We plan to publish those results separately, possibly targeting a more mathematical audience.   

      The argument above appears to only apply in the case that every row (and every diagonal) contains all of the elements 1,...,n. However, when n is not prime, there are often arrangements where rows and/or diagonals do not contain every element from 1,...,n. For example, some admissible patterns with 9 neurons have a repeat length of 3 in all directions (horizontally and both diagonals). As a result the construction listed here will not give a full count of all possible arrangements. 

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors): 

      I think the concise style of mathematical proof is both a curse and a blessing. While it delivers the message, I think the fluency and readability of the mathematical proof could be improved with longer paragraphs and some more editing. 

      We have added some clarifications in the text that we hope improve the readability.

      Reviewer #3 (Recommendations For The Authors): 

      A minor qualm I have with the nomenclature: 

      On page 7: 

      “To prove this statement, suppose that row A consists of units $1, \dots , k$ repeating in this order. Then any row that contains any unit from $1, \dots, k$ must contain the full repeat $1, \dots , k$ by axiom 1. So any row containing any unit from $1,\dots , k$ is a translation of row A, and any unit that does not contain them is disjoint from row A.”

      The last use of 'unit' at the end of this paragraph instead of 'row' is confusing. Technically, the authors have given themselves license to use this term by defining a unit to be “either to a single cell or a cell assembly”. Yet modern algebra tends to use 'unit' as meaning a ring element that has an inverse.

      We have renamed “unit” to “element” to avoid confusion with the terminology in modern algebra.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors examine how probabilistic reversal learning is affected by dopamine by studying the effects of methamphetamine (MA) administration. Based on prior evidence that the effects of pharmacological manipulation depend on baseline neurotransmitter levels, they hypothesized that MA would improve learning in people with low baseline performance. They found this effect, and specifically found that MA administration improved learning in noisy blocks, by reducing learning from misleading performance, in participants with lower baseline performance. The authors then fit participants' behavior to a computational learning model and found that an eta parameter, responsible for scaling learning rate based on previously surprising outcomes, differed in participants with low baseline performance on and off MA.

      Questions:

      (1) It would be helpful to confirm that the observed effect of MA on the eta parameter is responsible for better performance in low baseline performers. If performance on the task is simulated for parameters estimated for high and low baseline performers on and off MA, does the simulated behavior capture the main behavioral differences shown in Figure 3?

      We thank the reviewer for this suggestion. We agree that the additional simulation provides valuable confirmation of the effect of methamphetamine (MA) on the eta parameter and subsequent choice behavior. Using individual maximum likelihood parameter estimates, we simulated task performance and confirmed that the simulated behavior reflects the observed mean behavioral differences. Specifically, the simulation demonstrates that MA increases performance later in learning for stimuli with less predictable reward probabilities, particularly in subjects with low baseline performance (mean ± SD: simPL low performance: 0.69 ± 0.01 vs. simMA low performance: 0.72 ± 0.01; t(46) = -2.00, p = 0.03, d = 0.23).

      We have incorporated this analysis into the manuscript. Specifically, we added a new figure to illustrate these findings and updated the text accordingly. Below, we detail the changes made to the manuscript.

      From the manuscript page 12, line 25:

      “Sufficiency of the model was evaluated through posterior predictive checks that matched behavioral choice data (see Figure 4D-F and Figure 5) and model validation analyses (see Supplementary Figure 2). Specifically, using individual maximum likelihood parameter estimates, we simulated task performance and confirmed that MA increases performance later in learning for stimuli with less predictable reward probabilities, particularly in subjects with low baseline performance (Figure 5A; mean ± SD: simPL low performance: 0.69 ± 0.01 vs. simMA low performance: 0.72 ± 0.01; t(46) = -2.00, p = 0.03, d = 0.23).”

      (2) In Figure 4C, it appears that the main parameter difference between low and high baseline performance is inverse temperature, not eta. If MA is effective in people with lower baseline DA, why is the effect of MA on eta and not IT?

      Thank you for raising this important point. It is correct that the primary difference between the low and high baseline performance groups in the placebo session lies in the inverse temperature (mean(SD); low baseline performance: 2.07 (0.11) vs. high baseline performance: 2.95 (0.07); t(46) = -5.79, p = 5.8442e-07, d = 1.37). However, there is also a significant difference in the eta parameter between these groups during the placebo session (low baseline performance: 0.33 (0.02) vs. high baseline performance: 0.25 (0.02); t(46) = 2.59, p = 0.01, d = 0.53).

      Interestingly, the difference in eta is resolved by MA (mean(SD); low baseline performance: 0.24 (0.02) vs. high baseline performance: 0.23 (0.02); t(46) = 0.39, p = 0.70, d = 0.08), while the difference in inverse temperature remains unaffected (mean(SD); low baseline performance: 2.16 (0.11) vs. high baseline performance: 2.99 (0.08); t(46) = -5.38, p < .001, d = 1.29). Moreover, we checked the distribution of the inverse temperature estimates on/off drug to ensure the absent drug effect is not driven by outliers. Here, we do not observe any descriptive drug effect (see Author response image 1). Additionally, non-parametric tests indicate no drug effect (Wilcoxon signed-rank test; across groups: zval = -0.59; p = 0.55; low baseline performance: zval = -0.54; p = 0.58; high baseline performance: zval = -0.21; p = 0.83).

      Author response image 1.

      The inverse temperature distribution on/off drug suggests that this parameter is not affected by the drug. Inverse temperature for low (blue points) and high (yellow points) baseline performers tended not to be affected by the drug (Wilcoxon signed-rank test; across groups: zval = -0.59; p = 0.55; low baseline performance: zval = -0.54; p = 0.58; high baseline performance: zval = -0.21; p = 0.83).

      This pattern of results might suggest that MA specifically affects eta but not other parameters like the inverse temperature, pointing to a selective influence on a single computational mechanism. To verify this conclusion, we extended the winning model by allowing each parameter in turn to be differentially estimated for MA and placebo, while keeping other parameters fixed to the group (low and high baseline performance) mean estimates of the winning model fit to choice behaviour of the placebo session.

      These control analyses confirmed that MA does not affect inverse temperature in either the low baseline performance group or the high baseline performance group. Similarly, MA did not affect the play bias or learning rate intercept parameter. Yet, it did affect eta in the low performer group (see supplementary table 1 reproduced below).

      Taken together, our data suggest that only the parameter controlling dynamic adjustments of the learning rate based on recent prediction errors, eta, was affected by our pharmacological manipulation and that the parameters of our models did not trade off. A similar effect has been observed in a previous study investigating the effects of catecholaminergic drug administration in a probabilistic reversal learning task (Rostami Kandroodi et al., 2021). In that study, the authors demonstrated that methylphenidate influenced the inverse learning rate parameter as a function of working memory span, assessed through a baseline cognitive task. Similar to our findings, they did not observe drug effects on other parameters in their model including the inverse temperature.

      We have updated the section of the manuscript where we discuss the difference in inverse temperature between low and high performers in the task. From the manuscript (page 19, line 13):

      “While eta seemed to account for the differences in the effects of MA on performance in our low and high performance groups, it did not fully explain all performance differences across the two groups (see Figure 1C and Figure 7A/B). When comparing other model parameters between low and high baseline performers across drug sessions, we found that high baseline performers displayed higher overall inverse temperatures (2.97(0.05) vs. 2.11 (0.08); t(93) = 7.94, p < .001, d = 1.33). This suggests that high baseline performers displayed higher transfer of stimulus values to actions leading to better performance (as also indicated by the positive contribution of this parameter to overall performance in the GLM). Moreover, they tended to show a reduced play bias (-0.01 (0.01) vs. 0.04 (0.03); t(93) = -1.77, p = 0.08, d = 0.26) and increased intercepts in their learning rate term (-2.38 (0.364) vs. -6.48 (0.70); t(93) = 5.03, p < .001, d = 0.76). Both of these parameters have been associated with overall performance (see Figure 6A). Thus, overall performance difference between high and low baseline performers can be attributed to differences in model parameters other than eta. However, as described in the previous paragraph, differential effects of MA on performance on the two groups were driven by eta.

      This pattern of results suggests that MA specifically affects the eta parameter while leaving other parameters, such as the inverse temperature, unaffected. This points to a selective influence on a single computational mechanism. To verify this conclusion, we extended the winning model by allowing each parameter, in turn, to be differentially estimated for MA and PL, while keeping the other parameters fixed at the group (low and high baseline performance) mean estimates of the winning model for the placebo session. These control analyses confirmed that MA affects only the eta parameter in the low-performer group and that there is no parameter trade-off in our model (see Supplementary Table 1). A similar effect was observed in a previous study investigating the effects of catecholaminergic drug administration on a probabilistic reversal learning task (Rostami Kandroodi et al., 2021). In that study, methylphenidate was shown to influence the inverse learning rate parameter (i.e., decay factor for previous payoffs) as a function of working memory span, assessed through a baseline cognitive task. Consistent with our findings, no drug effects were observed on other parameters in their model, including the inverse temperature.”

      Additionally, we summarized the results in a supplementary table:

      Also, this parameter is noted as temperature but appears to be inverse temperature as higher values are related to better performance. The exact model for the choice function is not described in the methods.

      We thank the reviewer for bringing this to our attention. The reviewer is correct that we intended to refer to the inverse temperature. We have corrected this mistake throughout the manuscript and added information about the choice function to the methods section.

      From the manuscript (page 37, line 3):

      On each trial, this value term was transferred into a “biased” value term (𝑉<sub>𝐵</sub>(𝑋<sub>𝑡</sub>) = 𝐵<sub>𝑝𝑙𝑎𝑦</sub> + 𝑄<sub>𝑡</sub>(𝑋<sub>𝑡</sub>), where 𝐵<sub>𝑝𝑙𝑎𝑦</sub> is the play bias term) and converted into action probabilities (P(play|𝑉<sub>𝐵 play</sub>(𝑡)(𝑋<sub>𝑡</sub>)); P(pass|𝑉<sub>𝐵 pass</sub>(𝑡)(𝑋<sub>𝑡</sub>))) using a softmax function with an inverse temperature (𝛽):
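The exact choice equation is not reproduced in this response. As a hedged illustration of what a two-option softmax with a play bias and an inverse temperature could look like, the sketch below assumes that the value of passing is fixed at zero; that assumption, the function name `choice_probabilities`, and the parameter names are hypothetical and not taken from the manuscript.

```python
import math


def choice_probabilities(q_value, play_bias, beta, pass_value=0.0):
    """Two-option softmax over 'play' and 'pass'.

    q_value    : learned value Q_t(X_t) of the current stimulus
    play_bias  : additive bias B_play towards playing
    beta       : inverse temperature (higher = more deterministic choices)
    pass_value : value assigned to passing (assumed 0 here, hypothetical)
    """
    biased_value = play_bias + q_value  # V_B(X_t) = B_play + Q_t(X_t)
    z_play = beta * biased_value
    z_pass = beta * pass_value
    # Subtract the maximum before exponentiating for numerical stability.
    z_max = max(z_play, z_pass)
    e_play = math.exp(z_play - z_max)
    e_pass = math.exp(z_pass - z_max)
    total = e_play + e_pass
    return e_play / total, e_pass / total
```

Under these assumptions, a larger `beta` makes the probability of playing track the biased value more sharply, which is the sense in which a higher inverse temperature reflects a stronger transfer of stimulus values to actions.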

      Reviewer #1 (Recommendations for the authors):

      (1) Given that the task was quite long (700+ trials), were there any fatigue effects or changes in behavior over the course of the task?

      To address the reviewer's comment, we regressed each participant's single-trial log-scaled RT and accuracy (binary variable reflecting whether a participant displayed stimulus-appropriate behavior on each trial) onto the trial number as a proxy of time on task. Individual participants’ t-values for the time on task regressor were then tested at the group level via two-sided t-tests against zero and compared across sessions and baseline performance groups. The results of these two regression models are shown in the supplementary table 2 and raw data splits in supplementary figure S7. Results demonstrate that the choice behavior was not systematically affected over the course of the task. This effect was not different between low and high baseline performers and not affected by the drug. In contrast, participants’ reaction time decreased over the course of the task and this speeding was enhanced by MA, particularly in the low performance group.

      We added the following section to the supplementary materials and refer to this information in the task description section of the manuscript (page 35, line 26):

      “Time-on-Task Effects

      Given the length of our task, we investigated whether fatigue effects or changes in behavior occurred over time. Specifically, we regressed each participant's single-trial log-scaled reaction times (RT) and accuracy (a binary variable reflecting whether participants displayed stimulus-appropriate behavior on each trial) onto trial number, which served as a proxy for time on task. The resulting t-values for the time-on-task regressor were analyzed at the group level using two-sided t-tests against zero and compared across sessions and baseline performance groups. The results of these regression models are presented in Supplementary Table S2, with raw data splits shown in Supplementary Figure S3.

      Our findings indicate that choice behavior was not systematically affected over the course of the task. This effect did not differ between low and high baseline performers and was not influenced by the drug. In contrast, reaction times decreased over the course of the task, with this speeding effect being enhanced by MA, particularly in the low-performance group.”
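The two-stage logic of the time-on-task analysis (a per-participant regression onto trial number, followed by a group-level test of the resulting t-values against zero) can be sketched as follows. This is a minimal illustration on simulated data, not the authors' analysis code: it covers only the log-RT regression, uses ordinary least squares, and all names (`slope_t_value`, `one_sample_t`, `simulate_log_rt`) and simulation settings are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def slope_t_value(y):
    """t-value of the slope from an OLS regression of y on trial number 1..len(y)."""
    n = len(y)
    x = list(range(1, n + 1))
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(rss / (n - 2) / sxx)
    return slope / standard_error


def one_sample_t(values):
    """t-statistic for testing whether the mean of values differs from zero."""
    return mean(values) / (stdev(values) / math.sqrt(len(values)))


def simulate_log_rt(n_trials, slope, rng):
    """Hypothetical log-RT series with a linear time-on-task trend plus noise."""
    return [6.5 + slope * t + rng.gauss(0.0, 0.2) for t in range(1, n_trials + 1)]


rng = random.Random(0)
# First level: one slope t-value per simulated participant.
t_values = [slope_t_value(simulate_log_rt(300, -0.001, rng)) for _ in range(20)]
# Second level: test the participant-wise t-values against zero.
group_t = one_sample_t(t_values)
```

A negative `group_t` in this simulation corresponds to speeding over the course of the task. Converting it to a two-sided p-value would require the t distribution with `len(t_values) - 1` degrees of freedom, which is omitted here.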

      (2) Figure 5J is hard to understand given the lack of axis labels on some of the plots. Also, the scatter plot is on the left, not the right, as stated in the legend.

      We agree that this part of the figure was difficult to understand. To address this issue, we have separated it from Figure 5, added axis labels for clarity, and reworked the figure caption.

      (3) The data and code were not available for review.

      Thank you for pointing this out. The data and code are now made publicly available on GitHub: https://github.com/HansKirschner/REFIT_Chicago_public.git

      We updated the respective section in the manuscript:

      Data Availability Statement: All raw data and analysis scripts can be accessed at: https://github.com/HansKirschner/REFIT_Chicago_public.git

      Reviewer #2 (Public review):

      Summary:

      Kirschner and colleagues test whether methamphetamine (MA) alters learning rate dynamics in a validated reversal learning task. They find evidence that MA can enhance performance for low-performers and that the enhancement reflects a reduction in the degree to which these low-performers dynamically up-regulate their learning rates when they encounter unexpected outcomes. The net effect is that poor performers show more volatile learning rates (e.g. jumping up when they receive misleading feedback), when the environment is actually stable, undermining their performance over trials.

      Strengths:

      The study has multiple strengths including large sample size, placebo control, double-blind randomized design, and rigorous computational modeling of a validated task.

      Weaknesses:

      The limitations, which are acknowledged, include that the drug they use, methamphetamine, can influence multiple neuromodulatory systems including catecholamines and acetylcholine, all of which have been implicated in learning rate dynamics. They also do not have any independent measures of any of these systems, so it is impossible to know which is having an effect.

      Another limitation that the authors should acknowledge is that the fact that participants were aware of having different experiences in the drug sessions means that their blinding was effectively single-blind (to the experimenters) and not double-blind. Relatedly, it is difficult to know whether subjective effects of drugs (e.g. arousal, mood, etc.) might have driven differences in attention, causing performance enhancements in the low-performing group. Do the authors have measures of these subjective effects that they could include as covariates of no interest in their analyses?

      We thank the reviewer for highlighting this complex issue. ‘Double blind’ may refer to masking the identity of the drug before administration, or to the subjects’ stated identifications after any effects have been experienced. In our study, the participants were told that they might receive a stimulant, sedative or placebo on any session, so before the sessions their expectations were blinded. After receiving the drug, most participants reported feeling stimulant-like effects on the drug session, but not all of them correctly identified the substance as a stimulant. We note that many subjects identified placebo as ‘sedative’. Author response image 2 indicates how the participants identified the substance they received.

      Author response image 2.

      Substance identification.

      We share the reviewer’s interest in the extent to which mood effects of drugs are correlated with the drugs’ other effects, including cognitive function. To address this in the present study, we compared the subjective responses to the drug in participants who were low- or high-performers at baseline on the task. The low- and high baseline performers did not differ in their subjective drug effects, including ‘feel drug’ or stimulant-like effects (see Figure 1 from the manuscript reproduced below; peak change from baseline scores for feel drug ratings on drug: low baseline performer: 48.36(4.29) vs. high baseline performer: 47.21 (4.44); t(91) = 0.18, p = 0.85, d = 0.03; ARCI-A score: low baseline performer: 4.87 (0.43) vs. high baseline performer: 4.00 (0.418); t(91) = 1.43, p = 0.15, d = 0.30). Moreover, task performance in the drug session was not correlated with the subjective effects (peak “feel drug” effect: r(94) = 0.09, p = 0.41; peak “stimulant like” effect: r(94) = -0.18, p = 0.07).

      We have added details of these additional analyses to the manuscript. Since there were no significant differences in subjective drug effects between low- and high-baseline performers, and these effects were not systematically associated with task performance, we did not include these measurements as covariates in our analyses. Furthermore, as both subjective measurements indicate a similar pattern, we have chosen not to report the ARCI-A effects in the manuscript.

      From the manuscript (page 6, line 5ff):

      “Subjective drug effects

      MA administration significantly increased ‘feel drug effect’ ratings compared to PL, at 30, 50, 135, 180, and 210 min post-capsule administration (see Figure 1; Drug x Time interaction F(5,555) = 38.46, p < 0.001). In the MA session, no differences in the ‘feel drug effect’ were observed between low and high baseline performers, including peak change-from-baseline ratings (rating at 50 min post-capsule: low baseline performer: 48.36(4.29) vs. high baseline performer: 47.21 (4.44); t(91) = 0.18, p = 0.85, d = 0.03; rating at 135 min post-capsule: low baseline performer: 37.27 (4.15) vs. high baseline performer: 45.38 (3.84); t(91) = 1.42, p = 0.15, d = 0.29).”

      Reviewer #2 (Recommendations for the authors):

      I was also concerned about the distinctions between the low- and high-performing groups. It is unclear why, except for simplicity of presentation, they chose to binarize the sample into high and low performers. I would like to know if the effects held up if they analyzed interactions with individual differences in performance and not just a binarized high/low group membership. If the individual difference interactions do not hold up, I would like to know the authors' thoughts on why they do not.

      Thank you for raising this important issue. We chose a binary discretization of baseline performance to simplify the analysis and presentation. However, we acknowledge that this simplification may limit the interpretability of the results.

      To address the reviewer’s concern, we conducted additional linear mixed-effects model (LMM) analyses, focusing on the key findings reported in the manuscript. See supplementary materials section “Linear mixed effects model analyses for key findings”

      From the manuscript (page 30, line 4ff):

      “Methamphetamine performance enhancement depends on initial task performance

      Another key finding of the current study is that the benefits of MA on performance depend on the baseline task performance. Specifically, we found that MA selectively improved performance in participants that performed poorly in the baseline session. However, it should be noted that all the drug x baseline performance interactions, including for the key computational eta parameter, did not reach the statistical threshold, and only tended towards significance. We used a binary discretization of baseline performance to simplify the analysis and presentation. To parse out the relationship between methamphetamine effects and baseline performance in finer detail, we conducted additional linear mixed-effects model (LMM) analyses using a sliding window regression approach (see supplementary results and supplementary figure S4 and S5). A key thing to notice in the sliding regression results is that, while each regression reveals that drug effects depend on baseline performance, they do so non-linearly, with most variables of interest showing a saturating effect at low baseline performance levels and the strongest slope (dependence on baseline) at or near the median level of baseline performance, explaining why our median splits were able to successfully pick up on these baseline-dependent effects. Together, these results suggest that methamphetamine primarily affects moderately low baseline performers. It is noteworthy to highlight again that we had a separate baseline measurement from the placebo session, allowing us to investigate baseline-dependent changes while avoiding typical concerns in such analyses like regression to the mean (Barnett et al., 2004). This design enhances the robustness of our baseline-dependent effects.”

      See supplementary materials section “Linear mixed effects model analyses for key findings”

      Perhaps relatedly, in multiple analyses, the authors point out that there are drug effects for the low-performance group, but not the high-performance group. This could reflect the well-documented baseline-dependency effect of catecholaminergic drugs. However, it might also reflect the fact that the high-performance group is closer to their ceiling. So, a performance-enhancement drug might not have any room to make them better. Note that their results are not consistent with inverted-U-like effects, previously described, where high performers actually get worse on catecholaminergic drugs.

      Given that the authors have the capacity to simulate performance as a function of parameter values, they could specifically simulate how much better performance could get if their high-performance group all moved proportionally closer to optimal levels of the parameter eta. On the basis of that analysis do they have any evidence that they had the power to detect an effect in the high performance group? If not, they should just acknowledge that ceiling effects might have played a role for high performers.

      We agree with the reviewer's interpretation of the results. First, when plotting overall task performance and the probability of correct choices in the high outcome noise condition—the condition where we observe the strongest drug-induced performance enhancement—we find minimal performance variation among high baseline performers. In both testing sessions, high baseline performers cluster around optimal performance, with little evidence of drug-induced changes (see Supplementary Figure 6).

      Furthermore, performance simulations using (a) optimal eta values and (b) observed eta values from the high baseline performance group reveal only a small, non-significant performance difference (points optimal eta: 701.91 (21.66) vs. points high performer: 694.47 (21.71); t(46) = 2.84, p = 0.07, d = 0.059).

      These results suggest that high baseline performers are already near optimal performance, limiting the potential for drug-related performance improvements. We have incorporated this information into the manuscript (page 30, line 24ff).

      “It is important to note that MA did not bring performance of low baseline performers to the level of performance of high baseline performers. We speculate that high performers gained a good representation of the task structure during the orientation practice session, taking specific features of the task into account (change point probabilities, noise in the reward probabilities). This is reflected in a large signal to noise ratio between real reversals and misleading feedback. Because the high performers already perform the task at a near-optimal level, MA may not further enhance performance (see Supplementary Figure S6 for additional evidence for this claim). Intriguingly, the data do not support an inverted-U-shaped effect of catecholaminergic action (Durstewitz & Seamans, 2008; Goschke & Bolte, 2018) given that performance of high performers did not decrease with MA. One could speculate that catecholamines are not the only factor determining eta and performance. Perhaps high performers have a generally more robust/resilient decision-making system which cannot be perturbed easily. Probably one would need even higher doses of MA (with higher side effects) to impair their performance.”

      Finally, I am confused about why participants are choosing correctly at higher than 50% on the first trial after a reversal (see Figure 3)? How could that be right? If it is not, does this mean that there is a pervasive error in the analysis pipeline?

      Thank you for pointing this out. The observed pattern is an artifact of the smoothing (±2 trials) applied to the learning curves in Figure 3. Below, we reproduce the figure without smoothing.
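
      The smoothing artifact described above can be illustrated with a minimal sketch. The accuracy values are hypothetical, and the handling of the window at the edge of the curve (truncating it to the available trials) is an assumption, not a description of the authors' pipeline.

```python
def centered_moving_average(values, half_width=2):
    """Smooth a curve with a centered window of +/- half_width trials.

    At the edges the window is truncated to the trials that exist
    (an assumption made for this sketch).
    """
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical post-reversal learning curve: chance level on the first trial,
# followed by gradual recovery.
raw = [0.50, 0.58, 0.66, 0.72, 0.76, 0.78]
smoothed = centered_moving_average(raw, half_width=2)
```

      In this toy curve the first smoothed point averages the first trial with the two later, more accurate trials, so it sits above the unsmoothed chance-level value even though raw first-trial accuracy is exactly 0.50.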

      Additionally, we confirm that the probability of choosing the correct response is not above chance level (t-test against chance):

      • All reversals: t(93) = 1.64, p = 0.10, d = 0.17, 99% CI [0.49, 0.55]
      • Reversal to low outcome noise: t(93) = 1.67, p = 0.10, d = 0.17, 99% CI [0.49, 0.56]
      • Reversal to high outcome noise: t(93) = 0.87, p = 0.38, d = 0.09, 99% CI [0.47, 0.56]
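
      For readers who want to see how a test against chance of this kind is computed, here is a minimal sketch of a one-sample t statistic and Cohen's d against a chance level of 0.5. The function name and the example accuracies are hypothetical; the p-value and confidence interval are omitted because they require the t distribution.

```python
import math
import statistics


def one_sample_t(values, chance=0.5):
    """Return (t, degrees of freedom, Cohen's d) for a test against `chance`."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)           # sample SD (n - 1 in the denominator)
    t = (mean - chance) / (sd / math.sqrt(n))
    d = (mean - chance) / sd                # standardized distance from chance
    return t, n - 1, d


# Hypothetical per-participant accuracies on the first trial after a reversal.
accuracies = [0.4, 0.5, 0.6, 0.5, 0.6]
t_value, df, cohen_d = one_sample_t(accuracies)
```

      With this definition t = d · √n. For n = 94 (df = 93) and d = 0.17 this gives roughly 1.65, in line with the reported t(93) = 1.64.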

      We have amended the caption of Figure 3 accordingly. Moreover, we included an additional figure in this revision letter (Author response image 4) showing a clear performance drop to approximately 50% correct choices across all sessions, indicating random-choice behavior at the point of reversal. Notably, this performance is slightly better than expected (i.e., the inverse of pre-reversal performance). One possible explanation is that participants developed an expectation of the reversal, leading to increased reversal behaviour around reversals.

      Author response image 3.

      Learning curves after reversals suggest that methamphetamine improves learning performance in phases of less predictable reward contingencies in low baseline performers. Top panel of the Figure shows learning curves after all reversals (A), reversals to stimuli with less predictable reward contingencies (B), and reversals to stimuli with high reward probability certainty (C). Bottom panel displays the learning curves stratified by baseline performance for all reversals (D), reversals to stimuli with less predictable reward probabilities (E), and reversals to stimuli with high reward probability certainty (F). Vertical black lines divide learning into early and late stages as suggested by the Bai-Perron multiple break point test. Results suggest no clear differences in the initial learning between MA and PL. However, learning curves diverged later in the learning, particularly for stimuli with less predictable rewards (B) and in subjects with low baseline performance (E). Note. PL = Placebo; MA = methamphetamine; Mean/SEM = line/shading.

      Author response image 4.

      Adaptive behavior following reversals. Each graph shows participants' performance (i.e., stimulus-appropriate behavior: playing good stimuli with 70/80% reward probability and passing on bad stimuli with 20/30% reward probability) around reversals for the (A) orientation session, (B) placebo session, and (C) methamphetamine session. Trial 0 corresponds to the trial when reversals occurred, unbeknownst to participants. Participants' performance exhibited a fast initial adaptation to reversals, followed by a slower, late-stage adjustment to the new stimulus-reward contingencies, eventually reaching a performance plateau. Notably, we observe a clear performance drop to approximately 50% correct choices across all sessions, indicating random-choice behavior at the point of reversal. This performance is slightly better than expected (i.e., the inverse of pre-reversal performance). One possible explanation is that participants developed an expectation of the reversal, leading to increased reversal behaviour around reversals.

      Minor comments:

      (1) I'm unclear on what the analysis in 6E tells us. What does it mean that the marginal effect of eta on performance predicts changes in performance? Also, if multiple parameters besides eta (e.g. learning rate) are strongly related to actual performance, why should it be that only marginal adjustments to eta in the model anticipate actual performance improvements when marginal adjustments to other model parameters do not?

      We agree that these simulations are somewhat difficult to interpret and have therefore decided to omit these analyses from the manuscript. Our key point was that individuals who benefited the most from methamphetamine were those who exhibited the most advantageous eta adjustments in response to it. We believe this is effectively illustrated by the example individual shown in Figure 8D.

      (2) Does the vertical black line in Figure 1 show when the tasks were completed, as it says in the caption, or when the task starts, as it indicates in the figure itself?

      Apologies for the confusion. There was a mistake in the figure caption—the vertical line indicates the time when the task started (60 minutes post-capsule intake). We have corrected this in the figure caption.

      (3) The marginally significant drug x baseline performance group interaction does not support strong inferences about differences in drug effects on eta between groups...

      We agree and have added information on this limitation to the Discussion. Additionally, we have addressed the complex relationship between drug effects and baseline performance in the supplementary analyses, as detailed in our previous response regarding the binary discretization of baseline performance.

      (4) Should lines 10-11 on page 12 say "We did not find drug-related differences in any other model parameters..."?

      Thank you for bringing this grammatical error to our attention. We have corrected it.

      (5) It would be good to confirm that the effect of MA on p(Correct after single MFB) does not have an opposite sign from the effect of MA on p(Correct after double MFB). I'm guessing the effect after single is just weak, but it would be good to confirm they are in the same direction so that we can be confident the result is not picking up on spurious relationships after two misleading instances of feedback.

      We confirm that the direction of the effect between eta and p(Correct after single MFB) is similar to p(Correct after double MFB). First, we see a similar negative association between p(Correct after single MFB) and eta (r(94) = -.26, p = 0.01). Similarly, there was a descriptive increase in p(Correct after single MFB) for low baseline performers on- vs. off-drug (p(Correct after single MFB): low baseline performance PL: 0.71 (0.02) vs. low baseline performance MA: 0.73 (0.02); t(46) = 1.27, p = 0.20, d = 0.17).

      (6) "implemented equipped" seems like a typo on page 16, line 26

      Thank you for bringing this typo to our attention. We have corrected it.

      Reviewing Editor (Public Review):

      Summary:

      In this well-written paper, a pharmacological experiment is described in which a large group of volunteers is tested on a novel probabilistic reversal learning task with different levels of noise, once after intake of methamphetamine and once after intake of placebo. The design includes a separate baseline session, during which performance is measured. The key result is that drug effects on learning rate variability depend on performance in this separate baseline session.

      The approach and research question are important, the results will have an impact, and the study is executed according to current standards in the field. Strengths include the interventional pharmacological design, the large sample size, the computational modeling, and the use of a reversal-learning task with different levels of noise.

      (i) One novel and valuable feature of the task is the variation of noise (having 70-30 and 80-20 conditions). This nice feature is currently not fully exploited in the modeling of the task and the data. For example, recently reported new modeling approaches for disentangling two types of uncertainty (stochasticity vs volatility) could be usefully leveraged here (by Piray and Daw, 2021, Nat Comm). The current 'signal to noise ratio' analysis that is targeting this issue relies on separately assessing learning rates on true reversals and learning rates after misleading feedback, in a way that is experimenter-driven. As a result, this analysis cannot capture a latent characteristic of the subject's computational capacity.

      We thank the reviewing editor for the positive evaluation of our work and the suggestion to leverage new modeling approaches. In light of the Piray and Daw paper, it is noteworthy that the choice behavior of the low performance group in our sample mimics the behavior of their lesioned model, in which stochasticity is assumed to be small and constant. Specifically, low performers displayed higher learning rates, particularly in high outcome noise phases in our task. One possible interpretation of this choice pattern is that they have difficulty distinguishing volatility from noise. Consistently, surprising outcomes may get misattributed to volatility instead of stochasticity, resulting in increased learning rates and overadjustments to misleading outcomes. This issue particularly surfaces in phases of high stochasticity in our task. Interestingly, methamphetamine seems to reduce this misattribution. In an exploratory analysis, we fit two models to our task structure using modified code provided by the Piray and Daw paper. The control model made inferences about both volatility and stochasticity. A key assumption of the model is that the optimal learning rate increases with volatility and decreases with stochasticity. This is because greater volatility raises the likelihood that the underlying reward probability has changed since the last observation, increasing the necessity of relying on new information. In contrast, higher stochasticity reduces the relative informativeness of the new observation compared to prior beliefs about the underlying reward probability. The lesioned model assumed stochasticity to be small and constant. We show the results of these analyses in Figure 9 and Supplementary Figure S5 and S6. Interestingly, we found that the inability to make inferences about stochasticity leads to misestimation of volatility, particularly for high outcome noise phases (Figure 9A-B). Consistently, this led to reduced sensitivity of the learning rate to volatility (i.e., the first ten trials after reversals). The model shows similar behaviour to our low performer group, with reduced accuracy in later learning stages for stimuli with high outcome noise (Figure 9D). Finally, when we fit simulated data from the two models to our model, we see increased eta parameter estimates for the lesioned model. Together, these results may hint towards an overinterpretation of stochasticity in low performers of our task and that methamphetamine has beneficial effects for those individuals as it reduced the oversensitivity to volatility. It should be noted, however, that we did not fit these models to our choice behaviour directly, as this implementation is beyond the scope of our current study. Yet, our exploratory analyses make testable predictions for future research into the effect of catecholamines on the inference of volatility and stochasticity.
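
      The stated relationship between learning rate, volatility, and stochasticity can be illustrated with a scalar Kalman filter, in which the steady-state gain plays the role of the learning rate. This is a simplified stand-in chosen for illustration, assuming a Gaussian random-walk state with Gaussian outcome noise; it is not the model fitted in the exploratory analysis, and the function and variable names are hypothetical.

```python
import math


def steady_state_learning_rate(volatility, stochasticity):
    """Steady-state Kalman gain for a random-walk state observed with noise.

    volatility:    variance of the random-walk step (process noise), > 0
    stochasticity: variance of the outcome (observation) noise, > 0
    """
    # The steady-state prior variance p solves
    #   p^2 - volatility * p - volatility * stochasticity = 0.
    p = (volatility + math.sqrt(volatility ** 2 + 4 * volatility * stochasticity)) / 2
    # The gain weights the prediction error, so it acts as the learning rate.
    return p / (p + stochasticity)


baseline = steady_state_learning_rate(volatility=1.0, stochasticity=1.0)
more_volatile = steady_state_learning_rate(volatility=2.0, stochasticity=1.0)
more_noisy = steady_state_learning_rate(volatility=1.0, stochasticity=2.0)
```

      Under these assumptions the gain rises when volatility is doubled and falls when stochasticity is doubled, which is the qualitative pattern the response describes.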

      We incorporated information on these exploratory analyses into the manuscript and supplementary material.

      From the Results section (page 23, line 12ff):

      “Methamphetamine may reduce misinterpretation of high outcome noise in low performers

      In our task, outcomes are influenced by two distinct sources of noise: process noise (volatility) and outcome noise (stochasticity). Optimal learning rate should increase with volatility and decrease with stochasticity. Volatility was fairly constant in our task (change points around every 30-35 trials). However, misleading feedback (i.e., outcome noise) could be misinterpreted as indicating another change point because participants don’t know the volatility beforehand. Strongly overinterpreting outcome noise as change points will hinder building a correct estimate of volatility and understanding the true structure of the task. Simultaneously estimating volatility and stochasticity poses a challenge, as both contribute to greater outcome variance, making outcomes more surprising. A critical distinction, however, lies in their impact on generated outcomes: volatility increases the autocorrelation between consecutive outcomes, whereas stochasticity reduces it. Recent computational approaches have successfully utilised this fundamental difference to formulate a model of learning based on the joint estimation of stochasticity and volatility (Piray & Daw, 2021; Piray & Daw, 2024). They report evidence that humans successfully dissociate between volatility and stochasticity with contrasting and adaptive effects on learning rates, albeit to varying degrees. Interestingly, they show that hypersensitivity to outcome noise, often observed in anxiety disorders, might arise from a misattribution of the outcome noise to volatility instead of stochasticity, resulting in increased learning rates and overadjustments to misleading outcomes. It is noteworthy that we observed a similar hypersensitivity to high outcome noise in low performers in our task that is partly reduced by MA. In an exploratory analysis, we fit two models to our task structure using modified code provided by Piray and Daw (2021) (see Methods for a formal description of the model). The control model inferred both the volatility and stochasticity. The lesioned model assumed stochasticity to be small and constant. We show the results of these analyses in Figure 9 and Supplementary Figure S7 and S8. We found that the inability to make inferences about stochasticity leads to misestimation of volatility, particularly for high outcome noise phases (Figure 9A-B). Consistently, this led to reduced sensitivity of the learning rate to volatility (i.e., the first ten trials after reversals). The model shows similar behaviour to our low performer group, with reduced accuracy in later learning stages for stimuli with high outcome noise (Figure 9D). Finally, when we fit simulated data from the two models to our model, we see increased eta parameter estimates for the lesioned model. Together, these results may hint towards an overinterpretation of stochasticity in low performers of our task and that MA has beneficial effects for those individuals as it reduced the oversensitivity to volatility. It should be noted, however, that we did not fit these models to our choice behaviour directly, as this implementation is beyond the scope of our current study. Yet, our exploratory analyses make testable predictions for future research into the effect of catecholamines on the inference of volatility and stochasticity.”
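
      The contrast between volatility and stochasticity in terms of outcome autocorrelation can be checked with a small simulation. This is a toy sketch that assumes a Gaussian random-walk state with Gaussian outcome noise (the task itself uses probabilistic binary feedback); the function names, trial count, and variance values are hypothetical.

```python
import random


def simulate_outcomes(volatility, stochasticity, n_trials=300, seed=0):
    """Outcomes = random-walk state (step variance `volatility`)
    plus independent noise (variance `stochasticity`)."""
    rng = random.Random(seed)
    state, outcomes = 0.0, []
    for _ in range(n_trials):
        state += rng.gauss(0.0, volatility ** 0.5)
        outcomes.append(state + rng.gauss(0.0, stochasticity ** 0.5))
    return outcomes


def lag1_autocorrelation(series):
    """Sample autocorrelation between consecutive outcomes."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in series)
    return num / den


acf_volatile = lag1_autocorrelation(simulate_outcomes(volatility=1.0, stochasticity=0.01))
acf_noisy = lag1_autocorrelation(simulate_outcomes(volatility=0.01, stochasticity=1.0))
```

      In this toy setting, the series dominated by volatility shows the higher lag-1 autocorrelation, while the series dominated by stochasticity shows the lower one.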

      From the discussion (page 28, line 15ff):

      “Exploratory simulation studies using a model that jointly estimates stochasticity and volatility (Piray & Daw, 2021; Piray & Daw, 2024), revealed that MA might reduce the oversensitivity to volatility.”

      See Methods section “Description of the joint estimation of stochasticity and volatility model”

      (ii) An important caveat is that all the drug x baseline performance interactions, including for the key computational eta parameter did not reach the statistical threshold, and only tended towards significance.

      We agree and have added additional analyses on the issue. See also our response to reviewer 2. There is a consistent effect for low-medium baseline performance. We toned down the reference to low baseline performance but still see strong evidence for a baseline dependency of the drug effect.

      From the manuscript (page 30, line 4ff):

      “Methamphetamine performance enhancement depends on initial task performance

      Another key finding of the current study is that the benefits of MA on performance depend on the baseline task performance. Specifically, we found that MA selectively improved performance in participants that performed poorly in the baseline session. However, it should be noted that all the drug x baseline performance interactions, including for the key computational eta parameter, did not reach the statistical threshold, and only tended towards significance. We used a binary discretization of baseline performance to simplify the analysis and presentation. To parse out the relationship between methamphetamine effects and baseline performance in finer detail, we conducted additional linear mixed-effects model (LMM) analyses using a sliding window regression approach (see supplementary results and supplementary figure S4 and S5). A key thing to notice in the sliding regression results is that, while each regression reveals that drug effects depend on baseline performance, they do so non-linearly, with most variables of interest showing a saturating effect at low baseline performance levels and the strongest slope (dependence on baseline) at or near the median level of baseline performance, explaining why our median splits were able to successfully pick up on these baseline-dependent effects. Together, these results suggest that methamphetamine primarily affects moderately low baseline performers. It is noteworthy to highlight again that we had a separate baseline measurement from the placebo session, allowing us to investigate baseline-dependent changes while avoiding typical concerns in such analyses like regression to the mean (Barnett et al., 2004). This design enhances the robustness of our baseline-dependent effects.”
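
      The idea behind a sliding window regression can be sketched as follows. This is a deliberately simplified illustration that uses an ordinary least-squares slope within each window instead of the linear mixed-effects models reported in the supplement; the function names, window size, and data are hypothetical, and each window is assumed to contain at least two distinct baseline values.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def sliding_window_slopes(baseline, drug_effect, window):
    """Sort participants by baseline, slide a fixed-size window across them,
    and return (mean baseline, slope of drug effect on baseline) per window."""
    pairs = sorted(zip(baseline, drug_effect))
    results = []
    for start in range(len(pairs) - window + 1):
        chunk = pairs[start:start + window]
        xs = [b for b, _ in chunk]
        ys = [e for _, e in chunk]
        results.append((sum(xs) / window, ols_slope(xs, ys)))
    return results


# Hypothetical data: the drug effect (MA minus PL) saturates at low baseline
# and falls off most steeply around the middle of the baseline range.
baseline = [1, 2, 3, 4, 5, 6]
drug_effect = [2, 2, 2, 1, 0, 0]
windows = sliding_window_slopes(baseline, drug_effect, window=3)
```

      In this toy data set the slope is flat in the lowest window and steepest in the window centered near the median, the kind of non-linear pattern that a median split can pick up.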

      (iii) Both the overlap and the differences between the current study and previous relevant work (that is, how this goes beyond prior studies in particular Rostami Kandroodi et al, which also assessed effects of catecholaminergic drug administration as a function of baseline task performance using a probabilistic reversal learning task) are not made explicit, particularly in the introduction.

      Thank you for raising this point. We have added information on the overlap and differences between our paper and the Rostami Kandroodi et al. paper to the introduction and discussion.

      In the introduction we added a sentence to highlight the Rostami Kandroodi et al. findings (page 3, line 24ff).

      “For example, Rostami Kandroodi et al. (2021) reported that the re-uptake blocker methylphenidate did not alter reversal learning overall, but preferentially improved performance in participants with higher working memory capacity.”

      In our Discussion, we go back to this paper, and say how our findings are and are not consistent with their findings (page 32, line 16ff).

      “Our findings can be contrasted to those of Rostami Kandroodi et al. (2021), who examined effects of methylphenidate on a reversal learning task, in relation to baseline differences on a cognitive task. Whereas Rostami Kandroodi et al. (2021) found that methylphenidate improved performance mainly in participants with higher baseline working memory performance, we found that methamphetamine improved the ability to dynamically adjust learning from prediction errors to a greater extent in participants who performed poorly-to-medium at baseline. There are several possible reasons for these apparently different findings. First, MA and methylphenidate (MPH) differ in their primary mechanisms of action: MPH acts mainly as a reuptake blocker whereas MA increases synaptic levels of catecholamines by inhibiting the vesicular monoamine transporter 2 (VMAT2) and inhibiting the enzyme monoamine oxidase (MAO). These differences in action could account for differential effects on cognitive tasks. Second, the tasks used by Rostami Kandroodi et al. (2021) and the present study differ in several ways. The Rostami Kandroodi et al. (2021) task assessed responses to a single reversal event during the session whereas the present study used repeated reversals with probabilistic outcomes. Third, the measures of baseline function differed in the two studies: Rostami Kandroodi et al. (2021) used a working memory task that was not used in the drug sessions, whereas we used the probabilistic learning task as both the baseline measure and the measure of drug effects. Further research is needed to determine which of these factors influenced the outcomes.”

      performance effects, but this is not true in the general sense, given that an accumulating number of studies have shown that the effects of drugs like MA depend on baseline performance on working memory tasks, which often but certainly not always correlates positively with performance on the task under study.

      We recognize that there is a large body of research reporting that the effects of stimulant drugs are related to baseline performance, and we have adjusted our wording in the Discussion accordingly. At the same time, numerous published studies report acute effects of drugs without considering individual differences in responses, including baseline differences in task performance.

      Reviewing Editor (Recommendations for the Authors):

      (i) Recently reported new modeling approaches for disentangling two types of uncertainty (stochasticity vs volatility) might be usefully leveraged (Piray and Daw, 2021, Nat Comm) to help overcome the shortcomings of the 'signal-to-noise ratio' analysis performed here (learning rates on true reversals minus learning rates after misleading feedback), which is experimenter-driven and thus cannot capture a latent characteristic of the subject's computational capacity.

      Please see our previous response.

      (ii) To highlight more explicitly the fact that various of the key drug x baseline performance interactions did not reach the statistical threshold.

      Please see our previous responses to this issue.

      (iii) To make more explicit, in the introduction, both the overlap and the differences between the current study and previous relevant work (that is, how this goes beyond prior study in particular Rostami Kandroodi et al, which also assessed effects of catecholaminergic drug administration as a function of baseline task performance using a probabilistic reversal learning task).

      Please see our previous response.

      (iv) To revise and tone down, in the discussion section, the statement about novelty, that the existing literature has, to date, overlooked baseline performance effects.

      Please see our previous response.

      (v) It is unclear why the data from the 4th session (under some other sedative drug, which is not mentioned) are not reported. I recommend justifying the details of this manipulation and the decision to omit the report of those results. By analogy 4 other tasks were administered in the current study, but not described. Is there a protocol paper, describing the full procedure?

      Thank you for pointing this out. We added additional information to the Methods section. We are analysing the other cognitive measures in relation to the brain imaging data obtained on sessions 3 and 4. We therefore argue that these are beyond the scope of the present paper. We did not administer any sedative drug. However, participants were informed during orientation that they might receive a stimulant, sedative, or placebo on any testing session to maintain blinding of their expectations before each session.

      “Design. The results presented here were obtained from the first two sessions of a larger four-session study (clinicaltrials.gov ID number NCT04642820). During the latter two sessions of the larger study, not reported here, participants participated in two fMRI scans. During the two 4-h laboratory sessions presented here, healthy adults received methamphetamine (20 mg oral; MA) or placebo (PL), in mixed order under double-blind conditions. One hour after ingesting the capsule they completed the 30-min reinforcement reversal learning task. The primary comparisons were on acquisition and reversal learning parameters of reinforcement learning after MA vs PL. Secondary measures included subjective and cardiovascular responses to the drug.”

      “Orientation session. Participants attended an initial orientation session to provide informed consent, and to complete personality questionnaires. They were told that the purpose of the study was to investigate the effects of psychoactive drugs on mood, brain, and behavior. To reduce expectancies, they were told that they might receive a placebo, stimulant, or sedative/tranquilizer. However, participants only received methamphetamine and placebo. They agreed not to use any drugs except for their normal amounts of caffeine for 24 hours before and 6 hours following each session. Women who were not on oral contraceptives were tested only during the follicular phase (1-12 days from menstruation) because responses to stimulant drugs are dampened during the luteal phase of the cycle (White et al., 2002). Most participants (N=97 out of 113) completed the reinforcement learning task during the orientation session as a baseline measurement. This measure was added after the study began. Participants who did not complete the baseline measurement were omitted from the analyses presented in the main text. We ran the key analyses on the full sample (n=109). This sample included participants who completed the task only on the drug sessions. When controlling for session order and number (two vs. three sessions) effects, we see no drug effect on overall performance and learning. Yet, we found that eta was also reduced under MA in the full sample, which also resulted in reduced variability in the learning rate (see supplementary results for more details).”

      “Drug sessions. The two drug sessions were conducted in a comfortable laboratory environment, from 9 am to 1 pm, at least 72 hours apart. Upon arrival, participants provided breath and urine samples to test for recent alcohol or drug use and pregnancy (CLIAwaived Inc, Carlsbad, CA; Alcosensor III, Intoximeters; AimStickPBD, hCG professional, Craig Medical Distribution). Positive tests led to rescheduling or dismissal from the study. After drug testing, subjects completed baseline mood measures, and heart rate and blood pressure were measured. At 9:30 am they ingested capsules (PL or MA 20 mg, in color-coded capsules) under double-blind conditions. Oral MA (Desoxyn, 5 mg per tablet) was placed in opaque size 00 capsules with dextrose filler. PL capsules contained only dextrose. Subjects completed the reinforcement learning task 60 minutes after capsule ingestion. Drug effects questionnaires were obtained at multiple intervals during the session. They completed other cognitive tasks not reported here. Participants were tested individually and were permitted to relax, read or watch neutral movies when they were not completing study measures.”

      (vi) Some features of the model including the play bias parameter require justification, at least by referring to prior work exploring these features.

      We have added information to justify the features of the model.

      From the Methods section:

      “The base model (M1) was a standard Q-learning model with three parameters: (1) an inverse temperature parameter of the softmax function used to convert trial expected values to action probabilities, (2) a play bias term that indicates a tendency to attribute higher value to gambling behavior (Jang et al., 2019), ….

      The two additional learning rate terms—feedback confirmation and modality—were added to the model set, as these factors have been shown to influence learning in similar tasks (Kirschner et al., 2023; Schüller et al., 2020).”
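
      To make the role of the inverse temperature and the play bias concrete, here is a minimal sketch of a softmax choice rule for a play-versus-pass decision. The exact functional form used in the manuscript is not given in this letter, so several points are assumptions: passing is treated as having value zero, the play bias is added to the value of playing, and the learning rate `alpha` is a generic Q-learning ingredient included for completeness, not a reconstruction of the elided parameter list. All names are hypothetical.

```python
import math


def probability_of_playing(q_value, inverse_temperature, play_bias):
    """Two-option softmax (logistic) over 'play' versus 'pass'.

    Assumes passing has value 0 and that the play bias shifts the value
    of playing upward (assumptions made for this sketch).
    """
    return 1.0 / (1.0 + math.exp(-inverse_temperature * (q_value + play_bias)))


def update_q(q_value, outcome, alpha):
    """Standard delta-rule update: move the value toward the outcome."""
    prediction_error = outcome - q_value
    return q_value + alpha * prediction_error


p_neutral = probability_of_playing(q_value=0.0, inverse_temperature=3.0, play_bias=0.0)
p_biased = probability_of_playing(q_value=0.0, inverse_temperature=3.0, play_bias=0.2)
q_after_win = update_q(q_value=0.0, outcome=1.0, alpha=0.5)
```

      With no learned value and no bias the sketch is indifferent between playing and passing; a positive play bias tilts choices towards playing even before any reward has been observed.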

      Literature

      Doucet, A., & Johansen, A. M. (2011). A tutorial on particle filtering and smoothing: fifteen years later. Oxford University Press.

      Durstewitz, D., & Seamans, J. K. (2008). The dual-state theory of prefrontal cortex dopamine function with relevance to catechol-o-methyltransferase genotypes and schizophrenia. Biol Psychiatry, 64(9), 739-749. https://doi.org/10.1016/j.biopsych.2008.05.015

      Gamerman, D., dos Santos, T. R., & Franco, G. C. (2013). A non-Gaussian family of state-space models with exact marginal likelihood. Journal of Time Series Analysis, 34(6), 625-645. https://doi.org/10.1111/jtsa.12039

      Goschke, T., & Bolte, A. (2018). A dynamic perspective on intention, conflict, and volition: Adaptive regulation and emotional modulation of cognitive control dilemmas. In Why people do the things they do: Building on Julius Kuhl’s contributions to the psychology of motivation and volition. (pp. 111-129). Hogrefe. https://doi.org/10.1027/00540-000

      Jang, A. I., Nassar, M. R., Dillon, D. G., & Frank, M. J. (2019). Positive reward prediction errors during decision-making strengthen memory encoding. Nature Human Behaviour, 3(7), 719-732. https://doi.org/10.1038/s41562-019-0597-3

      Jenkins, D. G., & Quintana-Ascencio, P. F. (2020). A solution to minimum sample size for regressions. PLoS One, 15(2), e0229345. https://doi.org/10.1371/journal.pone.0229345

      Kirschner, H., Nassar, M. R., Fischer, A. G., Frodl, T., Meyer-Lotz, G., Froböse, S., Seidenbecher, S., Klein, T. A., & Ullsperger, M. (2023). Transdiagnostic inflexible learning dynamics explain deficits in depression and schizophrenia. Brain, 147(1), 201-214. https://doi.org/10.1093/brain/awad362

      Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177-190. https://doi.org/10.1016/j.jneumeth.2007.03.024

      Morean, M. E., de Wit, H., King, A. C., Sofuoglu, M., Rueger, S. Y., & O'Malley, S. S. (2013). The drug effects questionnaire: psychometric support across three drug types. Psychopharmacology (Berl), 227(1), 177-192. https://doi.org/10.1007/s00213-012-2954-z

      Murphy, K., & Russell, S. (2001). Rao-Blackwellised particle filtering for dynamic Bayesian networks. In Sequential Monte Carlo methods in practice (pp. 499-515). Springer.

      Piray, P., & Daw, N. D. (2020). A simple model for learning in volatile environments. PLoS Comput Biol, 16(7), e1007963. https://doi.org/10.1371/journal.pcbi.1007963

      Piray, P., & Daw, N. D. (2021). A model for learning based on the joint estimation of stochasticity and volatility. Nature Communications, 12(1), 6587. https://doi.org/10.1038/s41467-021-26731-9

      Piray, P., & Daw, N. D. (2024). Computational processes of simultaneous learning of stochasticity and volatility in humans. Nat Commun, 15(1), 9073. https://doi.org/10.1038/s41467-024-53459-z

      Rostami Kandroodi, M., Cook, J. L., Swart, J. C., Froböse, M. I., Geurts, D. E. M., Vahabie, A. H., Nili Ahmadabadi, M., Cools, R., & den Ouden, H. E. M. (2021). Effects of methylphenidate on reinforcement learning depend on working memory capacity. Psychopharmacology (Berl), 238(12), 3569-3584. https://doi.org/10.1007/s00213-021-05974-w

      Schüller, T., Fischer, A. G., Gruendler, T. O. J., Baldermann, J. C., Huys, D., Ullsperger, M., & Kuhn, J. (2020). Decreased transfer of value to action in Tourette syndrome. Cortex, 126, 39-48. https://doi.org/10.1016/j.cortex.2019.12.027

      West, M. (1987). On scale mixtures of normal distributions. Biometrika, 74(3), 646-648. https://doi.org/10.1093/biomet/74.3.646

      White, T. L., Justice, A. J., & de Wit, H. (2002). Differential subjective effects of D-amphetamine by gender, hormone levels and menstrual cycle phase. Pharmacol Biochem Behav, 73(4), 729-741.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Editor’s summary:

      This paper by Castello-Serrano et al. addresses the role of lipid rafts in trafficking in the secretory pathway. By performing carefully controlled experiments with synthetic membrane proteins derived from the transmembrane region of LAT, the authors describe, model and quantify the importance of transmembrane domains in the kinetics of trafficking of a protein through the cell. Their data suggest affinity for ordered domains influences the kinetics of exit from the Golgi. Additional microscopy data suggest that lipid-driven partitioning might segregate Golgi membranes into domains. However, the relationship between the partitioning of the synthetic membrane proteins into ordered domains visualised ex vivo in GPMVs, and the domains in the TGN, remain at best correlative. Additional experiments that relate to the existence and nature of domains at the TGN are necessary to provide a direct connection between the phase partitioning capability of the transmembrane regions of membrane proteins and the sorting potential of this phenomenon.

      The authors have used the RUSH system to study the traffic of model secretory proteins containing single-pass transmembrane domains that confer defined affinities for liquid ordered (lo) phases in Giant Plasma Membrane derived Vesicles (GPMVs), out of the ER and Golgi. A native protein termed LAT partitioned into these lo-domains, unlike a synthetic model protein termed LAT-allL, which had a substituted transmembrane domain. The authors' experiments provide support for the idea that ER exit relies on motifs in the cytosolic tails, but that accelerated Golgi exit is correlated with lo domain partitioning.

      Additional experiments provided evidence for segregation of Golgi membranes into coexisting lipid-driven domains that potentially concentrate different proteins. Their inference is that lipid rafts play an important role in Golgi exit. While this is an attractive idea, the experiments described in this manuscript do not provide a convincing argument one way or the other. It does however revive the discussion about the relationship between the potential for phase partitioning and its influence on membrane traffic.

      We thank the editors and scientific reviewers for their thorough evaluation of our manuscript and for the positive feedback. While we agree that our experimental findings present a correlation between trafficking rates and raft affinity, in our view, the synthetic, minimal nature of the transmembrane protein constructs in question makes a strong argument for involvement of membrane domains in their trafficking. These constructs have no known sorting determinants and are unlikely to interact directly with trafficking proteins in cells, since they contain almost no extramembrane amino acids. Yet, the LAT-TMD traffics through the Golgi similarly to the full-length LAT protein, but quite differently from mutants with lower raft phase affinity. We suggest that these observations can be best rationalized by involvement of raft domains in the trafficking fates and rates of these constructs, providing strong evidence (beyond a simple correlation) for the existence and relevance of such domains.

      We have substantially revised the manuscript to address all reviewer comments, including several new experiments and analyses. These revisions have substantially improved the manuscript without changing any of the core conclusions and we are pleased to have this version considered as the “version of record” in eLife.

      Below is our point-by-point response to all reviewer comments.

      ER exit:

      The experiments conducted to identify an ER exit motif in the C-terminal domain of LAT are straightforward and convincing. This is also consistent with available literature. The authors should comment on whether the conservation of the putative COPII association motif (detailed in Fig. 2A) is significantly higher than that of other parts of the C-terminal domain.

      Thank you for this suggestion; this information has now been included as Supp Fig 2B. While there are other well-conserved residues in the LAT C-terminus, many regions have relatively low conservation. In contrast, the essential residues of the COPII association motif (P148 and A150) are completely conserved in LAT across all species analyzed.

      One cause of concern is that addition of a short cytoplasmic domain from LAT is sufficient to drive ER exit, and in its absence the synthetic constructs are all very slow. However, the argument presented that specific lo phase partitioning behaviour of the TMDs do not have a significant effect on exit from the ER is a little confusing. This is related to the choice of the allL-TMD as the 'non-lo domain' partitioning comparator. Previous data has shown that longer TMDs (23+) promote ER export (eg. Munro 91, Munro 95, Sharpe 2005). The mechanism for this is not, to my knowledge, known. One could postulate that it has something to do with the very subject of this manuscript- lipid phase partitioning. If this is the case, then a TMD length of 22 might be a poor choice of comparison. A TMD 17 Ls' long would be a more appropriate 'non-raft' cargo. It would be interesting to see a couple of experiments with a cargo like this.

      The basis for the claim that raft affinity has relatively minor influence on ER exit kinetics, especially in comparison to the effect of the putative COPII interaction motif, is in Fig 1G. We do observe some differences between constructs and they may be related to raft affinity; however, we considered these relatively minor compared to the nearly 4-fold increase in ER efflux induced by COPII motifs.

      We have modified the wording in the manuscript to avoid the impression that we have ruled out an effect of raft affinity on ER exit.

      We believe that our observations are broadly consistent with those of Munro and colleagues. In both their work and ours, long TMDs were able to exit the ER. In our experiments, this was true for several proteins with long TMDs, either as full-length or as TMD-only versions (see Fig 1G). We intentionally did not measure shorter synthetic TMDs because these would not have been comparable with the raft-preferring variants, which all require relatively long TMDs, as demonstrated in our previous work (1,2). Thus, because our manuscript does not make any claims about the influence of TMD length on trafficking, we did not feel that experiments with shorter non-raft constructs would substantively influence our conclusions.

      However, to address reviewer interest, we did complete one set of experiments to test the effect of shortening the TMD on ER exit. We truncated the native LAT TMD by removing 6 residues from the C-terminal end of the TMD (LAT-TMDd6aa). This construct exited the ER similarly to all others we measured, revealing that for this set of constructs, short TMDs did not accumulate in the ER. ER exit of the truncated variant was slightly slower than the full-length LAT-TMD, but somewhat faster than the allL-TMD. These effects are consistent with our previous measurements, which showed that this shortened construct has slightly lower raft phase partitioning than the LAT-TMD but higher than allL (2). While these are interesting observations, a more thorough exploration of the effect of TMD length would be required to make any strong conclusion, so we did not include these data in the final manuscript.

      Author response image 1.

      Golgi exit:

      For the LAT constructs, the kinetics of Golgi exit as shown in Fig. 3B are surprisingly slow. About half of the protein remains in the Golgi at 1 h after biotin addition. Most secretory cargo proteins would have almost completely exited the Golgi by that time, as illustrated by VSVG in Fig. S3. There is a concern that LAT may have some tendency to linger in the Golgi, presumably due to a factor independent of the transmembrane domain, and therefore cannot be viewed as a good model protein. For kinetic modeling in particular, the existence of such an additional factor would be far from ideal. A valuable control would be to examine the Golgi exit kinetics of at least one additional secretory cargo.

      We disagree that LAT is an unusual protein with respect to Golgi efflux kinetics. In our experiments, Golgi efflux of VSVG was similar to full-length LAT (t1/2 ~ 45 min), and both of these were similar to previously reported values (3). Especially for the truncated (i.e. TMD) constructs, it is very unlikely that some factor independent of their TMDs affects Golgi exit, as they contain almost no amino acids outside the membrane-embedded TMD.

      Practically, it has proven somewhat challenging to produce functional RUSH-Golgi constructs. We attempted the experiment suggested by the reviewer by constructing SBP-tagged versions of several model cargo proteins, but all failed to trap in the Golgi. We speculate that the Golgin84 hook is much more sensitive to the location of the SBP on the cargo, being an integral membrane protein rather than the lumenal KDEL-streptavidin hook. This limitation can likely be overcome by engineering the cargo, but we did not feel that another control cargo protein was essential for the conclusions we presented, thus we did not pursue this direction further.

      Comments about the trafficking model

      (1) In Figure 1E, the export of LAT-TMD from the ER is fitted to a single-exponential fit that the authors say is "well described". This is unclear and there is perhaps something more complex going on. It appears that there is an initial lag phase and then similar kinetics after that - perhaps the authors can comment on this?

      This is a good observation. This effect is explainable by the mechanics of the measurement: in Figs 1 and 2, we measure not ‘fraction of protein in ER’ but ‘fraction of cells positive for ER fluorescence’. This is because the very slow ER exit of the TMD-only constructs presents a major challenge for live-cell imaging, so ER exit was quantified on a population level, by fixing cells at various time points after biotin addition and quantifying the fraction of cells with observable ER localization (rather than tracking a single cell over time).

      For fitting to the kinetic model (which attempts to describe ‘fraction in ER/Golgi’) we re-measured all constructs by live-cell imaging (see Supp Fig 5) to directly quantify relative construct abundance in the ER or Golgi. These data did not have the plateau in Fig 1E, suggesting that this is an artifact of counting “ER positive cells”, which would be expected to have a longer lag than “fraction of protein in ER”. Notably, however, t1/2 measured by both methods was similar, suggesting that the population measurement agrees well with single-cell live imaging.
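
      The lag argument above can be illustrated with a minimal sketch. It is not the analysis used in the manuscript: it assumes, purely for illustration, that each cell loses ER protein as a single exponential with its own rate, and that a fixed cell is scored "ER positive" while its remaining ER fraction exceeds a hypothetical detection threshold. The rate values, the threshold and all function names are assumptions.

```python
import math

def fraction_in_er(t_min, k_per_min):
    """Fraction of protein still in the ER of one cell (assumed single exponential)."""
    return math.exp(-k_per_min * t_min)

def mean_fraction_in_er(t_min, rates):
    """Average ER fraction across cells: the quantity a live-cell measurement tracks."""
    return sum(fraction_in_er(t_min, k) for k in rates) / len(rates)

def fraction_er_positive_cells(t_min, rates, threshold=0.2):
    """Fraction of cells scored 'ER positive' (ER fraction above a hypothetical threshold)."""
    positive = sum(1 for k in rates if fraction_in_er(t_min, k) > threshold)
    return positive / len(rates)

# Hypothetical cell-to-cell spread of exit rates, 0.010 to 0.030 per minute.
rates = [0.010 + 0.001 * i for i in range(21)]

for t in (0, 30, 60, 120, 240):
    print(t,
          round(mean_fraction_in_er(t, rates), 2),
          round(fraction_er_positive_cells(t, rates), 2))
```

      Under these assumptions the population score stays at 1 until the fastest cells drop below the threshold, while the mean ER fraction starts to fall immediately, which is the kind of initial plateau described for Fig 1E.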

      We have included all these explanations and caveats in the manuscript. We have also changed the wording from “well described” to “reasonably approximated”.

      (2) The model for Golgi sorting is also complicated and controversial, and while the authors' intention not to overinterpret their data in this regard must be respected, these data are in support of the two-phase Golgi export model (Patterson et al PMID:18555781).

      The reviewers are correct: our observations and model are consistent with Patterson et al, and it was a major oversight that a reference to this foundational work was not included. We have now added a discussion regarding the “two phase model” of Patterson and Lippincott-Schwartz.

      Furthermore, contrary to the statement in lines 200-202, the kinetics of VSVG exit from the Golgi (Fig. S3) are roughly linear and so are NOT consistent with the previous report by Hirschberg et al.

      Regarding kinetics of VSVG, our intention was to claim that the timescale of VSVG efflux from the Golgi was similar to that previously reported by Hirschberg et al., i.e. t1/2 roughly between 30 and 60 minutes. We have clarified this in the text. Minor differences in the details between our observations and those of Hirschberg et al. are likely attributable to temperature, as those measurements were done at 32°C for the tsVSVG mutant.

      Moreover, the kinetics of LAT export from the Golgi (Fig. 3B) appear quite different, more closely approximating exponential decay of the signal. These points should be described accurately and discussed.

      Regarding linear versus exponential fits, we agree that the reality of Golgi sorting and efflux is far more complicated than accounted for by either the phenomenological curve fitting in Figs 1-3 or the modeling in Fig 4. In addition to the possibility of lateral domains within Golgi stacks, there is transport between stacks, retrograde traffic, etc. The fits in Figs 1-3 are not intended to model specifics of transport, but rather to be phenomenological descriptors that allowed us to describe efflux kinetics with one parameter (i.e. t1/2). In contrast, the more refined kinetic modeling presented in Figure 4 is designed to test a mechanistic hypothesis (i.e. coexisting membrane domains in Golgi) and describes well the key features of the trafficking data.
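
      As a minimal sketch of what such a one-parameter phenomenological descriptor looks like, assume the simplest case of a single-exponential decay, f(t) = exp(-k t), for which t1/2 = ln(2)/k. The code below is not the fitting routine used for Figs 1-3; the function name, the log-linear least-squares approach and the synthetic data are illustrative assumptions.

```python
import math

def fit_half_life(times_min, fractions):
    """Fit f(t) = exp(-k t) by least squares on -ln(f) through the origin.

    Returns (k, t_half) with t_half = ln(2) / k. Fractions must lie in (0, 1].
    """
    num = sum(t * -math.log(f) for t, f in zip(times_min, fractions))
    den = sum(t * t for t in times_min)
    k = num / den
    return k, math.log(2) / k

# Synthetic, noise-free example generated with a half-time of 45 min.
true_k = math.log(2) / 45.0
times = [5, 15, 30, 45, 60, 90]
fracs = [math.exp(-true_k * t) for t in times]

k, t_half = fit_half_life(times, fracs)
print(round(t_half, 1))
```

      A fit of this kind summarizes an efflux curve with one number without asserting any mechanism, which is the role t1/2 plays in the phenomenological fits, in contrast to the mechanistic model of Fig 4.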

      Relationship between membrane traffic and domain partitioning:

      (1) Phase segregation in the GPMV is dictated by thermodynamics given its composition and the measurement temperature (at low temperatures, 4 °C). However, at physiological temperatures (32-37 °C) at which membrane trafficking is taking place, these GPMVs are not phase separated. Hence it is difficult to argue that a sorting mechanism based solely on the partitioning of the synthetic LAT-TMD constructs into lo domains detected at low temperatures in GPMVs provides a basis (or its lack) for the differential kinetics of traffic out of the Golgi (or ER). The mechanism in a living cell to form any lipid-based sorting platforms naturally requires further elaboration, and by definition cannot resemble the lo domains generated in GPMVs at low temperatures.

      We thank the reviewers for bringing up this important point. GPMVs are a useful tool because they allow direct, quantitative measurements of protein partitioning between coexisting ordered and disordered phases in complex, cell-derived membranes. However, we entirely agree that GPMVs do not fully represent the native organization of the living cell plasma membrane, and we have previously discussed some of the relevant differences (4,5). Despite these caveats, many studies have supported the cellular relevance of phase separation in GPMVs and the partitioning of proteins to raft domains therein (6-9). Most notably, elegant experiments from several independent labs have shown that fluorescent lipid analogs that partition to Lo domains in GPMVs also show distinct diffusive behaviors in live cells (6,7), strongly suggesting the presence of nanoscopic Lo domains in live cells. Similarly, our recent collaborative work with the lab of Sarah Veatch showed excellent agreement between raft preference in GPMVs and protein organization in living immune cells imaged by super-resolution microscopy (10). Further, several labs (6,7), including ours (11), have reported good correlations between raft partitioning in GPMVs and detergent resistance, which is a classical (though controversial) assay for raft association.

      Based on these points, we feel that GPMVs are a useful tool for quantifying protein preference for ordered (raft) membrane domains and that this preference is a useful proxy for the raft-associated behavior of these probes in living cells. We propose that this approach allows us to overcome a major reason for the historical controversy surrounding the raft field: nonquantitative and unreliable methodologies that prevented consistent definition of which proteins are supposed to be present in lipid rafts and why. Our work directly addresses this limitation by relating quantitative raft affinity measurements in a biological membrane with a relevant and measurable cellular outcome, specifically inter-organelle trafficking rates.

      Addressing the point about phase transition temperatures in GPMVs: this is the temperature at which macroscopic domains are observed. Based on physical models of phase separation, it has been proposed that macroscopic phase separation at lower temperatures is consistent with sub-microscopic, nanoscale domains at higher temperatures (8,12). These smaller domains can potentially be stabilized / functionalized by protein-protein interactions in cells (13) that may not be present in GPMVs (e.g. because of lack of ATP).

      (2) The lipid compositions of each of these membranes - PM, ER and Golgi are drastically different. Each is likely to phase separate at different phase transition temperatures (if at all). The transition temperature is probably even lower for Golgi and the ER membranes compared to the PM. Hence, if the reported compositions of these compartments are to be taken at face value, the propensity to form phase separated domains at a physiological temperature will be very low. Are ordered domains even formed at the Golgi at physiological temperatures?

      It is a good point that the membrane compositions and the resulting physical properties (including any potential phase behavior) will be very different in the PM, ER, and Golgi. Whether ordered domains are present in any of these membranes in living cells remains difficult to directly visualize, especially for non-PM membranes which are not easily accessible by probes, are nanoscopic, and have complex morphologies. However, the fact that raft-preferring probes / proteins share some trafficking characteristics, while very similar non-raft mutants behave differently argues that raft affinity plays a role in subcellular traffic.

      (3) The hypothesis of 'lipid rafts' is a very specific idea, related to functional segregation, and the underlying basis for domain formation has been also hotly debated. In this article the authors conflate thermodynamic phase separation mechanisms with the potential formation of functional sorting domains, further adding to the confusion in the literature. To conclude that this segregation is indeed based on lipid environments of varying degrees of lipid order, it would probably be best to look at the heterogeneity of the various membranes directly using probes designed to measure lipid packing, and then look for colocalization of domains of different cargo with these domains.

      This is a very good suggestion, and a direction we are currently following. Unfortunately, due to the dynamic nature and small size of putative lateral membrane domains, combined with the interior of a cell being filled with lipophilic environments that overlay each other, directly imaging domains in organellar membranes with lipid packing probes remains extremely difficult with current technology (or at least with the technology available to us). We argue that the TMD probes used in this manuscript are a reasonable alternative, as they are fluorescent probes with validated selectivity for membrane compartments with different physical properties.

      Ultimately, the features of membrane domains suggested by a variety of techniques – i.e. nanometric, dynamic, relatively similar in composition to the surrounding membrane, potentially diverse/heterogeneous – make them inherently difficult to microscopically visualize. This is one reason why we believe studies like ours, which use a natural model system to directly quantify raft-associated behaviors and relate them to cellular effects (in our case, protein sorting), are a useful direction for this field.

      We believe we have been careful in our manuscript to avoid confusing language surrounding lipid rafts, phase separation, etc. Our experiments clearly show that mammalian membranes have the capacity to phase separate, that some proteins preferentially interact with more ordered domains, and that this preference is related to the subcellular trafficking fates and rates of these proteins. We have edited the manuscript to emphasize these claims and avoid the historical controversies and confusions.

      (4) In the super-resolution experiments (by SIM, where the enhancement of resolution is around two-fold or less compared to optical), the authors are able to discern a segregation of the two types of Golgi-resident cargo that have different preferences for the lo-domains in GPMVs. It should be noted that TMD-allL and the LAT-allL end up in the late endosome after exit from the Golgi. Previous work from the Bonifacino laboratory (PMID: 28978644) has shown that proteins (such as M6PR) destined to go to the late endosome bud from a different part of the Golgi in vesicular carriers, while those that are destined for the cell surface first (including TfR) bud with tubular vesicular carriers. Thus at the resolution depicted in Fig 5, the segregation seen by the authors could be due to an alternative explanation, that these molecules are present in different areas of the Golgi for reasons different from phase partitioning. The relatively high colocalization of TfR with the GPI probe in Fig 5E is consistent with this explanation. TfR and GPI prefer different domains in the GPMV assays yet they show a high degree of colocalization and also traffic to the cell surface.

      This is a good point. Even at microscopic resolutions beyond the optical diffraction limit, we cannot make any strong claims that the segregation we observe is due to lateral lipid domains and not several reasonable alternatives, including separation between cisternae (rather than within), cargo vesicles moving between cisternae, or lateral domains that are mediated by protein assemblies rather than lipids. We have explicitly included this point in the Discussion: “Our SIM imaging suggests segregation of raft from nonraft cargo in the Golgi shortly (5 min) after RUSH release (Fig 5B), but at this level of resolution, we can only report reduced colocalization, not intra-Golgi protein distributions. Moreover, segregation within a Golgi cisterna would be very difficult to distinguish from cargo moving between cisternae at different rates or exiting via Golgi-proximal vesicles.”

      We have also added a similar caveat in the Results section of the manuscript: “These observations support the hypothesis that proteins can segregate in Golgi based on their affinity for distinct membrane domains; however, it is important to emphasize that this segregation does not necessarily imply lateral lipid-driven domains within a Golgi cisterna. Reasonable alternative possibilities include separation between cisternae (rather than within), cargo vesicles moving between cisternae, or lateral domains that are mediated by protein assemblies rather than lipids.”

      Finally, while probes with allL TMD do eventually end up in late endosomes (consistent with the Bonifacino lab’s findings, which we include), they do so while initially transiting the PM (2,11).

      Minor concerns:

      (1) Generally, the quantitation is high quality from difficult experimental data. Although a lot appears to be manual, it appears appropriately performed and interpreted. There are some claims that are made based on this quantitation, however, where there are no statistics performed. For example, figure 1B. Any quantitation with an accompanying conclusion should be subject to a statistical test. I think this is particularly important for the quality of the model fits.

      We appreciate the thoughtful feedback. The quantifications and fits were not trivial, but we believe they are important. We have added statistical significance to Figure 1B and others where it was missing.

      (2) Modulation of lipid levels in Fig 4E shows a significant change for the trafficking rate for the LAT-TMD construct and a not so significant change for the allL-TMD construct. However, these data are not convincing and appear to depend on a singular data point that appears to lower the mean value. In general, the experiment with the MZA inhibitor (Fig. 4D-F) is hard to interpret because cells will likely be sick after inhibition of sphingolipid and cholesterol synthesis. Moreover, the difference in effects for LAT-TMD and allL-TMD is marginal.

      We disagree with this interpretation. Fig 4E shows the average of three experiments and demonstrates clearly that the inhibitors change the Golgi efflux rate of LAT-TMD but not allL-TMD. This is summarized in the t1/2 quantifications of Fig 4F, which show a statistically significant change for LAT-TMD but not allL-TMD. This is not an effect of a singular data point, but rather the trend across the dataset.

      Further, the inhibitor conditions were tuned carefully to avoid cells becoming “sick”: at higher concentrations, cells did adopt unusual morphologies and began to detach from the plates. We pursued only lower concentrations, which cells survived for at least 48 hrs and without major morphological changes.

      (3) Line 173: 146-AAPSA-152 should read either 146-AAPSA-150 or 146-AAPSAPA-152, depending on what the authors intended.

      Thanks for the careful reading; we intended the former and it has been fixed.

      (4) What is the actual statistical significance in Fig. 3C and Fig. 3E? There is a single asterisk in each panel of the figure but two asterisks in the legend.

      Apologies, a single asterisk representing p<0.05 was intended. It has been fixed.

      (5) The code used to calculate the model is not accessible. It is standard practice to host well-annotated code on Github or similar, and it would be good to have this publicly available.

      We have deposited the code on a public repository (doi: 10.5281/zenodo.10478607) and added a note to the Methods.

      (1) Lorent, J. H. et al. Structural determinants and functional consequences of protein affinity for membrane rafts. Nature communications 8, 1219 (2017). PMC5663905

      (2) Diaz-Rohrer, B. B., Levental, K. R., Simons, K. & Levental, I. Membrane raft association is a determinant of plasma membrane localization. Proc Natl Acad Sci U S A 111, 8500-8505 (2014). PMC4060687

      (3) Hirschberg, K. et al. Kinetic analysis of secretory protein traffic and characterization of golgi to plasma membrane transport intermediates in living cells. J Cell Biol 143, 1485-1503 (1998). PMC2132993

      (4) Levental, K. R. & Levental, I. Giant plasma membrane vesicles: models for understanding membrane organization. Current topics in membranes 75, 25-57 (2015)

      (5) Sezgin, E. et al. Elucidating membrane structure and protein behavior using giant plasma membrane vesicles. Nat Protoc 7, 1042-1051 (2012)

      (6) Komura, N. et al. Raft-based interactions of gangliosides with a GPI-anchored receptor. Nat Chem Biol 12, 402-410 (2016)

      (7) Kinoshita, M. et al. Raft-based sphingomyelin interactions revealed by new fluorescent sphingomyelin analogs. J Cell Biol 216, 1183-1204 (2017). PMC5379944

      (8) Stone, M. B., Shelby, S. A., Nunez, M. F., Wisser, K. & Veatch, S. L. Protein sorting by lipid phase-like domains supports emergent signaling function in B lymphocyte plasma membranes. eLife 6 (2017). PMC5373823

      (9) Machta, B. B. et al. Conditions that Stabilize Membrane Domains Also Antagonize n-Alcohol Anesthesia. Biophys J 111, 537-545 (2016)

      (10) Shelby, S. A., Castello-Serrano, I., Wisser, I., Levental, I. & S., V. Membrane phase separation drives protein organization at BCR clusters. Nat Chem Biol in press (2023)

      (11) Diaz-Rohrer, B. et al. Rab3 mediates a pathway for endocytic sorting and plasma membrane recycling of ordered microdomains. Proc Natl Acad Sci U S A 120, e2207461120 (2023)

      (12) Veatch, S. L. et al. Critical fluctuations in plasma membrane vesicles. ACS Chem Biol 3, 287-293 (2008)

      (13) Wang, H. Y. et al. Coupling of protein condensates to ordered lipid domains determines functional membrane organization. Science advances 9, eadf6205 (2023). PMC10132753

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The main hypothesis/conclusion is summarized in the abstract: "Our study presents an intriguing model of cilia length regulation via controlling IFT speed through the modulation of the size of the IFT complex." The data clearly document the remarkable correlation between IFT velocity and ciliary length in the different cells/tissues/organs analyzed. The experimental test of this idea, i.e., the knock-down of GFP-IFT88, further supports the conclusion but needs to be interpreted more carefully. While IFT particle size and train velocity were reduced in the IFT88 morphants, the number of IFT particles is even more decreased. Thus, the contributions of the reduction in train size and velocity to ciliary length are, in my opinion, not unambiguous. Also, the concept that larger trains move faster, likely because they dock more motors and/or better coordinating kinesin-2 and that faster IFT causes cilia to be longer, is to my knowledge, not further supported by observations in other systems (see below).

      Thank you for your comments. We agree with the reviewer that the final section on IFT train size, velocity, and ciliary length regulation requires additional evidence. The purpose of the knockdown experiments was to investigate the potential relationship between IFT speed and IFT train size. We hypothesize that a deficiency in IFT88 proteins may disrupt the regular assembly of IFT particles, leading to the formation of shorter IFT trains. Indeed, we observed shorter IFT particles and a slight reduction in the transport speed of IFT particles in the morphants. Certainly, it would be more convincing to distinguish these IFT trains through ultrastructural analysis. However, with current techniques, performing such analysis on the zebrafish model would be very difficult due to the limited sample size. In the revised version, we have tempered the conclusions in these sections, as suggested by other reviewers as well.

      (2) I think the manuscript would be strengthened if the IFT frequency would also be analyzed in the five types of cilia. This could be done based on the existing kymographs from the spinning disk videos. As mentioned above, transport frequency in addition to train size and velocity is an important part of estimating the total number of IFT particles, which bind the actual cargoes, entering/moving in cilia.

      Thank you. We have analyzed the entry frequency of IFT in five types of cilia, both anterior and posterior. The analysis indicates that longer cilia also exhibit a higher frequency of fluorescent particles entering the cilia. These results are presented in Figure 3J.

      (3) Here, the variation in IFT velocity in cilia of different lengths within one species is documented - the results document a remarkable correlation between IFT velocity and ciliary length. These data need to be compared to observations from the literature. For example, the velocity of IFT in the quite long (~ 100 um) olfactory cilia of mice is similar to that observed in the rather short cilia of fibroblasts (~0.6 um/s). In Chlamydomonas, IFT velocity is not different in long flagella mutants compared to controls. Probably data are also available for C. elegans or other systems. Discussing these data would provide a broader perspective on the applicability of the model outside of zebrafish.

      Thank you for your suggestions. We believe the most significant novelty of our manuscript is the discovery that IFT velocities are closely related to cilia length in an in vivo model system. Our data suggest that longer cilia may require faster IFT transport to maintain their stable length, powered by larger IFT trains. We did observe substantial variability in IFT velocities across different studies. For example, anterograde IFT transport ranges from 0.2 µm/s in mouse olfactory neurons (Williams et al, 2014) to 0.8 µm/s in 293T cells (See et al, 2016) and 0.4 µm/s in IMCD-3 cells (Broekhuis et al, 2014). Even in NIH-3T3 cells, two studies report significant differences, despite using the same IFT reporters: 0.3 µm/s versus 0.9 µm/s (Kunova Bosakova et al, 2018; Luo et al, 2017). These findings suggest that cell types and culture conditions can influence IFT velocities in vitro, which may not accurately represent in vivo conditions. Interestingly, research on mouse olfactory neurons showed a strong correlation between anterograde and retrograde IFT velocities. Additionally, IFT velocity is closely related to the cell types within the olfactory neuron population, consistent with our results (Williams et al., 2014). 

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors study intraflagellar transport (IFT) in cilia of diverse organs in zebrafish. They elucidate that IFT88-GFP (an IFT-B core complex protein) can substitute for endogenous IFT88 in promoting ciliogenesis and use it as a reporter to visualize IFT dynamics in living zebrafish embryos. They observe striking differences in cilia lengths and velocity of IFT trains in different cilia types, with smaller cilia lengths correlating with lower IFT speed. They generate several mutants and show that disrupting the function of different kinesin-2 motors and BBSome or altering post-translational modifications of tubulin does not have a significant impact on IFT velocity. They however observe that when the amount of IFT88 is reduced it impacts the cilia length, IFT velocity as well as the number and size of IFT trains. They also show that the IFT train size is slightly smaller in one of the organs with shorter cilia (spinal cord). Based on their observations they propose that IFT velocity determines cilia length and go one step further to propose that IFT velocity is regulated by the size of IFT trains.

      Strengths:

      The main highlight of this study is the direct visualization of IFT dynamics in multiple organs of a living complex multi-cellular organism, zebrafish. The quality of the imaging is really good. Further, the authors have developed phenomenal resources to study IFT in zebrafish which would allow us to explore several mechanisms involved in IFT regulation in future studies. They make some interesting findings in mutants with disrupted function of kinesin-2, BBSome, and tubulin modifying enzymes which are interesting to compare with cilia studies in other model organisms. Also, their observation of a possible link between cilia length and IFT speed is potentially fascinating.

      Weaknesses:

      The manuscript as it stands, has several issues.

      (1) The study does not provide a qualitative description of cilia organization in different cell types, the cilia length variation within the same organ, and IFT dynamics. The methodology is also described minimally and must be detailed with more care such that similar studies can be done in other laboratories.

      Thank you for your comments. We found that cilia length is generally consistent within the same cell types we examined, including those in the pronephric duct, spinal cord, and epidermal cells. However, we observed variability in cilia length within ear crista cilia. Upon comparing IFT velocities, we found no differences among these cilia, further confirming our conclusion that IFT velocity is directly related to cell type rather than cilia length. These new results are presented in Figure S4 of the revised version.

      We apologize for the lack of methodological details in the original manuscript. Following the reviewer's suggestion, we have added a detailed description of the methods used to generate the transgenic line and to perform IFT velocity analysis. These details are included in Figure S2 and are thoroughly described in the methods section of the revised manuscript.

      (2) They provide remarkable new observations for all the mutants. However, discussion regarding what the findings imply and how these observations align (or contradict) with what has been observed in cilia studies in other organisms is incomprehensive.

      Thank you for this suggestion. We initially submitted this paper as a report, which has a word limit. We believe the main finding of our work is that IFT velocity is directly associated with cell type, with longer cilia requiring higher velocities to maintain their length. This association of IFT velocity with cell type has also been observed in mouse olfactory neurons (Williams et al., 2014). We have included a discussion of our findings, along with related data published in other organisms, in the revised version.

      (3) The analysis of IFT velocities, the main parameter they compare between experiments, is not described at all. The IFT velocities appear variable in several kymographs (and movies) and are visually difficult to see in shorter cilia. It is unclear how they make sure that the velocity readout is robust. Perhaps, a more automated approach is necessary to obtain more precise velocity estimates.

      Thank you for these comments. To measure the IFT velocities, we first used ImageJ software to generate a kymograph, where moving particles appear as oblique lines. The velocity of these particles can be calculated based on the slope of the lines (Zhou et al, 2001). In the initial version, most of the lines were drawn manually. To eliminate potential artifacts, we also used KymographDirect software to automatically trace the particle paths. The velocities obtained with this method were similar to those calculated manually. These new data are now shown in Figure S2 B-D. For shorter cilia, we only used particles with clear moving paths for our calculations. In the revised version, we have included a detailed description of the velocity analysis methods.
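
      As an illustration only (not the analysis code used for the manuscript), the conversion from the slope of a line on a kymograph to a velocity can be sketched as below. The function name, the pixel size, and the frame interval are hypothetical placeholders; the real calibration values depend on the microscope settings.

```python
def kymograph_velocity(x0_px, t0_frame, x1_px, t1_frame, um_per_px, s_per_frame):
    """Velocity in um/s from the two endpoints of a line drawn on a kymograph.

    x is the position along the cilium in pixels (measured from the base),
    t is the frame index. A positive result means movement away from the base
    (anterograde); a negative result means movement towards it (retrograde).
    """
    dt_s = (t1_frame - t0_frame) * s_per_frame
    if dt_s <= 0:
        raise ValueError("the line must advance in time")
    dx_um = (x1_px - x0_px) * um_per_px
    return dx_um / dt_s


# Hypothetical calibration: 0.1 um per pixel, 0.25 s per frame.
anterograde = kymograph_velocity(0, 0, 50, 20, um_per_px=0.1, s_per_frame=0.25)
retrograde = kymograph_velocity(50, 0, 0, 20, um_per_px=0.1, s_per_frame=0.25)
```

      With these placeholder values, a line spanning 50 pixels over 20 frames corresponds to 5 µm in 5 s, that is, 1 µm/s in either direction.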

      (4) They claim that IFT speeds are determined by the size of IFT trains, based on their observations in samples with a reduced amount of IFT88. If this was indeed the case, the velocity of a brighter IFT train (larger train) would be higher than the velocity of a dimmer IFT train (smaller train) within the same cilia. This is not apparent from the movies and such a correlation should be verified to make their claim stronger.

      Thank you for these excellent suggestions. We measured the particle size and fluorescence intensity of 3 dpf crista cilia using high-resolution images acquired with Abberior STEDYCON. The results showed a positive correlation between the two. These data have been added to the revised version in Figure 5I, which includes both control and ift88 morphant data.

      (5) They make an even larger claim that the cilia length (and IFT velocity) in different organs is different due to differences in the sizes of IFT trains. This is based on a marginal difference they observe between the cilia of crista and the spinal cord in immunofluorescence experiments (Figure 5C). Inferring that this minor difference is key to the striking difference in cilia length and IFT velocity is incorrect in my opinion.

      Impact:

      Overall, I think this work develops an exciting new multicellular model organism to study IFT mechanisms. Zebrafish is a vertebrate where we can perform genetic modifications with relative ease. This could be an ideal model to study not just the role of IFT in connection with ciliary function but also ciliopathies. Further, from an evolutionary perspective, it is fascinating to compare IFT mechanisms in zebrafish with unicellular protists like Chlamydomonas, simple multicellular organisms like C elegans, and primary mammalian cell cultures. Having said that, the underlying storyline of this study is flawed in my opinion and I would recommend the authors to report the striking findings and methodology in more detail while significantly toning down their proposed hypothesis on ciliary length regulation. Given the technological advancements made in this study, I think it is fine if it is a descriptive manuscript and doesn't necessarily need a breakthrough hypothesis based on preliminary evidence.

      Thank you for these comments. We agree with the reviewer that more evidence is required to explain why IFT is transported faster in longer cilia. In the revised version, we have modified and softened this section, focusing primarily on the novel findings of IFT velocity differences between cilia of varying lengths.

      Reviewer #3 (Public Review):

      Summary:

      A known feature of cilia in vertebrates and many, if not all, invertebrates is the striking heterogeneity of their lengths among different cell types. The underlying mechanisms, however, remain largely elusive. In the manuscript, the authors addressed this question from the angle of intraflagellar transport (IFT), a cilia-specific bidirectional transportation machinery essential to biogenesis, homeostasis, and functions of cilia, by using zebrafish as a model organism. They conducted a series of experiments and proposed an interesting mechanism. Furthermore, they achieved in situ live imaging of IFT in zebrafish larvae, which is a technical advance in the field.

      Strengths:

      The authors initially demonstrated that ectopically expressed Ift88-GFP through a certain heatshock induction protocol fully sustained the normal development of mutant zebrafish that would otherwise be dead by 7 dpf due to the lack of this critical component of IFT-B complex.

      Accordingly, cilia formations were also fully restored in the tissues examined. By imaging the IFT using Ift88-GFP in the mutant fish as a marker, they unexpectedly found that both anterograde and retrograde velocities of IFT trains varied among cilia of different cell types and appeared to be positively correlated with the length of the cilia.

      For insights into the possible cause(s) of the heterogeneity in IFT velocities, the authors assessed the effects of IFT kinesin Kif3b and Kif17, BBSome, and glycylation or glutamylation of axonemal tubulin on IFT and excluded their contributions. They also used a cilia-localized ATP reporter to exclude the possibility of different ciliary ATP concentrations. When they compared the size of Ift88-GFP puncta in crista cilia, which are long, and spinal cord cilia, which are relatively short, by imaging with a cutting-edge super-resolution microscope, they noticed a positive correlation between the puncta size, which presumably reflected the size of IFT trains, and the length of the cilia.

      Finally, they investigated whether it is the size of IFT trains that dictates the ciliary length. They injected a low dose (0.5 ng/embryo) of ift88 MO and showed that, although such a dosage did not induce the body curvature of the zebrafish larvae, crista cilia were shorter and contained less Ift88-GFP puncta. The particle size was also reduced. These data collectively suggested mildly downregulated expression levels of Ift88-GFP. Surprisingly, they observed significant reductions in both retrograde and anterograde IFT velocities. Therefore, they proposed that longer IFT trains would facilitate faster IFT and result in longer cilia.

      Weaknesses:

      The current manuscript, however, contains serious flaws that markedly limit the credibility of major results and findings. Firstly, important experimental information is frequently missing, including (but not limited to) developmental stages of zebrafish larvae assayed (Figures 1, 3, and 5), how the embryos or larvae were treated to express Ift88-GFP (Figures 3-5), and descriptions on sample sizes and the number of independent experiments or larvae examined in statistical results (Figures 3-5, S3, S6). For instance, although Figure 1B appears to be the standard experimental scheme, the authors provided results from 30-hpf larvae (Figure 3) that, according to Figure 1B, are supposed to neither express Ift88-GFP nor be genotyped because both the first round of heat shock treatment and the genotyping were arranged at 48 hpf. Similarly, the results that ovl larvae containing Tg(hsp70l:ift88 GFP) (again, because the genotype is not disclosed in the manuscript, one can only deduce) display normal body curvature at 2 dpf after the injection of 0.5 ng of ift88 MO (Fig 5D) is quite confusing because the larvae should also have been negative for Ift88-GFP and thus displayed body curvature. Secondly, some inferences are more or less logically flawed. The authors tend to use negative results on specific assays to exclude all possibilities. For instance, the negative results in Figures 4A-B are not sufficient to "suggest that the variability in IFT speeds among different cilia cannot be attributed to the use of different motor proteins" because the authors have not checked dynein-2 and other IFT kinesins. In fact, in their previous publication (Zhao et al., 2012), the authors actually demonstrated that different IFT kinesins have different effects on ciliogenesis and ciliary length in different tissues. Furthermore, instead of also examining cilia affected by Kif3b or Kif17 mutation, they only examined crista cilia, which are not sensitive to the mutations. Similarly, their results in Figures 4C-G only excluded the importance of tubulin glycylation or glutamylation in IFT. Thirdly, the conclusive model is based on certain assumptions, e.g., constant IFT velocities in a given cell type. The authors, however, do not discuss other possibilities.

      Thank you for pointing out the flaws in our experiments. We apologize for any confusion caused by the lack of detail in our descriptions. Regarding Figure 2B, we want to clarify that it depicts the procedure for heat shock experiments conducted for the ovl mutants' rescue assay, not the experimental procedure for IFT imaging. In the revised version, we have included detailed methods on how to induce the expression of Ift88-GFP via heat shock and the subsequent image processing. The procedure for heat induction is also shown in Figure S2A. We have also added the sample sizes for each experiment and descriptions of the statistical tests used in the appropriate sections of the revised version.

      Regarding the comments on the relationship between IFT speed variability and motor proteins, we completely agree with the reviewer. We have revised our description of this part accordingly.

      Lastly, the results shown in Figure 5D are from a wild-type background, not ovl mutants. We aimed to demonstrate that a lower dose of ift88 morpholino (0.5 ng) can partially knock down Ift88, allowing embryos to maintain a generally normal body axis, while the cilia in the ear crista became significantly shorter.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor

      (I recommend adding page numbers and probably line numbers. This makes commenting easier)

      We have added page numbers and line numbers in the revised manuscript.

      Intro: Furthermore, ultra-high-resolution microscopy showed a close association between cilia length in different organs and the size of IFT fluorescent particles, indicating the presence of larger IFT trains in longer cilia.

      This correlation is not that strong and data are only available for 2 types of cilia.

      Thanks. We have modified this part.

      P5) cilia (Fig. 1D) -> (Fig. S1)

      Thanks. We have corrected this.

      P5) "These movies provide a great opportunity to compare IFT across different cilia." Rewrite: "This approach allows one to determine the velocity and frequency of IFT based on kymographs" or similar.

      Thank you for your correction, we have changed it in the revised manuscript.

      This observation suggests that cargo and motor proteins are more effectively coordinated in transporting materials, resulting in increased IFT velocity-a novel regulatory mechanism governing IFT speed in vertebrate cilia.

      This is a somewhat cryptic phrase, rewrite?

      We have modified this sentence.

      P6 and elsewhere: "IFT in the absence of Kif17 or Bbs proteins" I wonder if it would be better to provide subheadings summarizing the main observation instead of descriptive titles. This includes the title of the manuscript.

      Thanks for this suggestion. We have changed the subheading titles in the revised manuscript. We prefer to keep the current title of the manuscript, as we think this paper mainly describes IFT in different types of cilia.

      Is it known whether IFT protein and motors are alternatively spliced in the various ciliated cells of zebrafish? In this context, is it known whether the cells express IFT proteins at different levels?

      We analyzed the transcript isoforms of several ciliary genes, including ift88, ift52, ift70, ift172, and kif3a. Most of these IFT genes possess only a single transcript isoform. Kif3a has two isoforms (long and short); however, the shorter isoform contains only the motor domain and is presumed to be nonfunctional for IFT. While we cannot completely rule out this possibility, we consider it unlikely that the variation in IFT speed is due to alternative splicing in ciliary tissues.

      P6) The relation between osm-3 and Kif17 needs to be introduced briefly.  

      Thank you for pointing this out. We have added this at the appropriate place in the revised manuscript.

      P6) "IFT was driven by kinesin or dynein motor proteins along the ciliary axoneme." "is driven"?

      Delete phrase and IFT to the next sentence?

      We have deleted this sentence.

      P7) "Moreover, the mutants were able to survive to adulthood and there is no difference in the fertility or sperm motility between mutants and control siblings, which is slightly different from those observed in mouse mutants(Gadadhar et al., 2021)." Could some of these data be shown? 

      Thanks for this suggestion. When crossed with wild-type females, all homozygous mutants showed no difference in fertility compared to controls. The fertilization rate in mutants was 90.5% (n = 7), similar to that of wild-type (87.2%, n = 7). We determined the trajectories of free-swimming sperm by high-speed video microscopy. The vast majority of sperm in ttll3 mutants, like wild-type sperm, swim almost entirely along a straight path, which differs from what was observed in the mouse mutant (where 86% of TTLL3-/-TTLL8-/- sperm rotate in situ). We assessed cilia motility in the pronephric ducts of 5 dpf embryos using high-speed video microscopy. The ttll3 mutant exhibited a rhythmic sinusoidal wave pattern similar to the control, and there was no significant difference in ciliary beating frequency. These new data are now included in Figure S7C-H.

      P7) "which has been shown early to reduce" earlier

      We have changed it. Thanks.

      Maybe the authors could speculate how the cells ensure the assembly of larger/faster trains in certain cells. Are the relative expression levels known or worth exploring?

      Thank you for these suggestions. We believe that longer cilia may maintain larger IFT particle pools in the basal body region, facilitating the assembly of large IFT trains. The higher frequency of IFT injection in longer cilia further supports this hypothesis. It is likely that cells with longer cilia have higher expression levels of IFT proteins. However, due to the lack of proper antibodies for IFT proteins in zebrafish, it is currently unfeasible to compare this. This experiment is certainly worth investigating in the future. We have added this discussion in the revised manuscript.

      Reviewer #2 (Recommendations for The Authors):

      Here are detailed comments for the authors:

      (1) The authors need to describe their methodology of imaging and what they observe in much greater detail. How were the different cilia types organized? Approximately how many were observed in every organ? How were they oriented? Were there length variations between cilia in the same organ? While imaging, were individual cilium mostly lying in a single focal plane of imaging or the authors often performed z-scans over multiple planes. Velocity measurement is highly variable if individual cilia are spanning over a large volume, with only part of it in focus in single plane acquisition.

      Thank you for your comments. We apologize for the lack of details in the methodology. We have added a detailed description in the 'Materials and Methods' section and illustrated the experimental paradigm in Figure S2A of the revised manuscript. In most tissues we examined, the length of cilia was relatively uniform, except in the crista. The cilia in the crista were significantly longer, with lengths varying between 5 and 30 μm, compared to those in other tissues. We categorized the cilia lengths in the crista into three groups at intervals of 10 μm and measured the anterograde and retrograde velocities of IFT in each group. The results, shown in Figure S4, revealed no significant difference in IFT velocity among the different cilia lengths within the same tissue.  Regarding the imaging, all IFT movies were captured in a single focal plane. In most cases, we did not observe significant velocity variability within the same cilium.
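
      To make the grouping step concrete, the following is a minimal sketch of how per-cilium measurements could be sorted into 10 µm length classes and averaged. It is not the code used for Figure S4; the function name, the bin edges (0-10, 10-20, 20-30 µm) and the numbers in the example are illustrative assumptions only.

```python
from collections import defaultdict
from statistics import mean


def mean_velocity_by_length(records, bin_width_um=10.0):
    """Group (cilium_length_um, velocity_um_per_s) pairs into length bins.

    Each bin is half-open, [lower, lower + bin_width_um), so a cilium of
    exactly 10 um falls into the 10-20 um group. Returns a dict mapping
    (lower, upper) to the mean velocity of that group.
    """
    groups = defaultdict(list)
    for length_um, velocity in records:
        lower = int(length_um // bin_width_um) * bin_width_um
        groups[(lower, lower + bin_width_um)].append(velocity)
    return {edges: mean(values) for edges, values in sorted(groups.items())}


# Placeholder values, not measured data.
example = [(6.0, 1.0), (9.0, 3.0), (15.0, 2.0), (25.0, 4.0)]
summary = mean_velocity_by_length(example)
```

      The per-group velocity distributions, rather than only their means, would then be passed to whichever statistical test is used to compare the groups.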

      (2) It is very difficult to directly observe the large differences in IFT velocity from the kymographs, especially in the case of shorter cilia and retrograde motion in them. The quality of the example kymographs could be improved and more zoomed in several cases.

      Thank you for this suggestion. We have modified this.

      (3) The authors do not describe at all, how velocity analysis was done on the kymographs? Were lines drawn manually on the kymographs? From the movies and the kymographs it is visible that the IFT motion is often variable and sometimes gets stuck. How did the authors determine the velocities of such trains? A single slope through the entire train or part of the train? Were they consistent with this? Such variable motion is not so easy to discern in the case of really short cilia. The authors could use a more automatic way of extracting velocities from kymographs using tools such as kymodirect or kymobutler. Keeping in mind that IFT velocity is the main parameter studied in this work, it is important that the analysis is robust.

      We apologize for the previous lack of detailed description. We utilized ImageJ software to generate kymographs, where particles appear as lines. For a moving particle, this line appears oblique. We manually drew lines on the kymographs, and the velocity of particles was calculated based on the slope (Zhou et al., 2001). We only analyzed particles that tracked the full length of the cilia. Following the reviewer's suggestions, we also used the automatic software KymographDirect to calculate the velocity of IFT particles. The results were similar to those calculated using the previous method. These new data are now shown in Figure S2B-D. For shorter cilia, we only used particles with clear moving paths for our calculations. In the revised version, we have included a detailed description of the velocity analysis methods.

      (4) In line with the previous point, as visible from the kymographs the velocity is significantly slower near the transition zone. Did the authors make sure they are not including the region around the transition zone while measuring the IFT velocity, especially in the case of shorter cilia?

      Thank you for the comment. In the revised manuscript, we automatically extracted the path of each particle using KymographDirect software. Quantification of each particle's velocity versus position in the crista reveals that anterograde IFT proceeds from the base to the tip at a relatively constant speed, whereas retrograde IFT undergoes a slight acceleration when returning to the base (Fig. S2E). This finding differs from observations in C. elegans, in which dynein-2 first accelerates and then decelerates back to 1.2 μm/s adjacent to the ciliary base (Yi et al, 2017). We believe it is very unlikely that the slow IFT velocity is due to the calculation of IFT only in the transition zone of shorter cilia.
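
      For readers who want to reproduce a velocity-versus-position profile from a traced particle path, a simple finite-difference sketch is given below. It assumes the path has already been exported as (time, position) pairs; it does not use or imitate the KymographDirect interface, and the function name and example track are hypothetical.

```python
def velocity_profile(track):
    """Local velocity along a traced particle path.

    track: list of (time_s, position_um) pairs sorted by time, with the
    position measured from the ciliary base. Returns a list of
    (midpoint_position_um, velocity_um_per_s), one entry per consecutive
    pair of points. Negative velocities point towards the base.
    """
    profile = []
    for (t0, x0), (t1, x1) in zip(track, track[1:]):
        if t1 <= t0:
            raise ValueError("time points must be strictly increasing")
        profile.append(((x0 + x1) / 2, (x1 - x0) / (t1 - t0)))
    return profile


# Placeholder retrograde track that speeds up towards the base.
retrograde_track = [(0.0, 10.0), (1.0, 9.0), (2.0, 7.0)]
profile = velocity_profile(retrograde_track)
```

      In practice the raw differences are noisy, so some smoothing or averaging over many particles would normally be applied before plotting.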

      (5) There are several fascinating findings in this work that the authors do not discuss properly. Firstly, do the authors have a hypothesis as to why IFT speeds are so radically different in different cilia types, given that they are driven by the same motor proteins and have the same ATP levels? They make a big claim in this paper that IFT train sizes correlate with train velocities. IFT trains have a highly ordered structure with regular binding sites for motor proteins. So, a smaller train would have a proportional number of motors attached to them. Why (and how) are the motors moving trains so slowly in some cilia and not in others? If there is no clear answer, the authors must put forward the open question with greater clarity.

      Thank you for the comment. We hypothesize that if multiple motors drive the movement of cargoes synergistically, it could increase the speed of IFT transport. An example supporting this hypothesis is the principle of multiple-unit high-speed trains, which use multiple motors in each individual car to achieve high speeds. Of course, this is just one hypothesis, and we cannot exclude other possibilities, such as the use of different adaptors in different cell types. We have revised our conclusions accordingly in the updated manuscript.

      (6) They find that IFT speeds do not change in kif17 mutants. Are the cilia length also similar (does not appear to be the case in Figure 4 and Figure S3)? Cilia length needs to be quantified. Further, they mention that in C elegans, heterotrimeric kinesin-2 and homodimeric kinesin-2 coordinate IFT. However, from several previous studies, we know that in Chlamydomonas and in mammalian cilia IFT is driven primarily by heterotrimeric kinesin-2 with no evidence that homodimeric kinesin-2 is linked with driving IFT. It appears to be the same in zebrafish. This is an interesting finding and needs to be discussed far more comprehensively.

      Thank you for your comments. We have previously shown that the number and length of crista cilia were grossly normal in kif17 mutants (Zhao et al, 2012). The length of crista cilia displayed slight variability even in wild-type larvae. We quantified the length of cilia in both the crista and neuromast within different mutants, and our analysis revealed no significant difference (see Author response image 1). We agree with the reviewer that Kif17 may play a minor role in driving IFT in cilia. However, previous studies have shown that KIF17 exhibits robust, processive particle movement in both the anterograde and retrograde directions along the entire olfactory sensory neuron cilia in mice. This suggests that, although not essential, KIF17 may also be involved in IFT (Williams et al., 2014). We have added more discussion about Kif17 and heterotrimeric kinesin in the appropriate section of the revised manuscript.

      Author response image 1.

      Statistical significance is based on Kruskal-Wallis statistic, Dunn's multiple comparisons test. n.s., not significant, p>0.05.
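
      For reference, the Kruskal-Wallis H statistic named in the legend can be computed as in the sketch below. This is a plain illustration of the rank-based formula, assuming average ranks for ties and omitting the tie correction, the p-value, and Dunn's post hoc comparisons; it is not the software used for the analysis, and the example groups are placeholders.

```python
def average_ranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), without tie correction."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)


# Placeholder groups, not measured cilia lengths.
h_separated = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
h_identical = kruskal_wallis_h([[1, 2], [1, 2]])
```

      Fully separated groups give a large H, whereas groups with identical rank sums give H = 0, which corresponds to the "n.s." outcome.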

      (7) Again, they find that IFT speeds do not change in BBS-4 mutants. I have the same comment about the cilia length as for kif17 mutants. Further, the discussion for this finding is lacking. The authors mention that IFT is disrupted in BBSome mutants of C elegans. Is this the case in other organisms as well? Structural studies on IFT trains reveal that BBSomes are not part of the core structure, while other studies reveal that BBSomes are not essential for IFT. So perhaps the results here are not too surprising.

      We agree with the reviewer that BBSome is possibly not essential for IFT in most cilia. However, in the cilia of olfactory sensory neurons, BBSome is involved in IFT in both mice and nematodes (Ou et al, 2005; Williams et al., 2014). We have added more discussion about BBSome in the appropriate section of the revised manuscript.

      (8) No change in IFT velocities in kif3b mutants is rather surprising. The authors suggest that Kif3C homodimerizes to carry out IFT in the absence of Kif3B. Even if that is the case, the individual homodimer constituents of heterotrimeric kinesin-2 have been shown in previous studies to have different motor properties when homodimerized artificially. Why is IFT not affected in these mutants? This should be discussed. Also, the cilia lengths should be quantified.

      We think the presence of the Kif3A/Kif3C/KAP3 trimeric kinesin may substitute for the Kif3A/Kif3B/KAP3 motors in kif3b mutants, which show normal length of cristae cilia. The Kif3A/Kif3C/KAP3 trimeric kinesin may have similar transport speeds as the Kif3A/Kif3B/KAP3 motors. We did not propose that the Kif3C homodimer can drive the cargoes alone. We apologize for this misunderstanding. Additionally, we have reevaluated the IFT velocities among different lengths of cristae cilia and found no difference between longer and shorter cilia within the same cell types.

      (9) The findings with tubulin modifications should also be discussed in comparison to what has been observed in other organisms.

      We have added further discussion about this result in the revised manuscript.

      (10) The authors find that IFT velocity is lower in ift88 morphants. They also find that the cilia length is shorter (in which cilia type?). Immunofluorescence experiments show that the IFT particle number and size are lower in the ift88 morphants. How many organisms did they look at for this data? What is the experimental variability in intensity measurements in immunofluorescence experiments? Wouldn't the authors expect much higher variability in ift88 morphants (between individual organisms) due to different amounts of IFT88 than for wildtype?

      Thank you for your comments. We apologize for the lack of information regarding the number of organisms observed in Figure 5. These numbers have been added to the figure legends in the revised manuscript. When a low dose of ift88 morpholino was injected, we observed significant shortening of cilia in the ear crista, along with reduced IFT speed. We measured the fluorescence intensity of different IFT particles and found a positive correlation between IFT particle size and fluorescence intensity (Fig 5I). Moreover, the variability of cilia length in cristae is slightly higher in ift88 morphants. These new data have been included in the revised version.

      (11) From their observations they make the claim that IFT velocity is directly proportional to IFT train size. Now within every cilium, IFT trains have large size variations, given the variable intensities for different IFT trains. The authors themselves show that they resolve far more trains when imaging with STED (possibly because they are able to visualize the smaller trains). Is the IFT velocity within the same cilium directly correlated with the intensity of the train, both for wildtype and ift88 morphants? That is the most direct way the authors can test that their hypothesis is true. Higher intensity (larger train size) results in faster velocity. From a qualitative look at their movies, I do not see any strong evidence for that.

      Thank you for your comments. We have measured the particle size and fluorescence intensity of 3dpf crista cilia using high-resolution images acquired with Abberior STEDYCON. The results, shown in Figure 5I, demonstrate a positive correlation between particle size and fluorescence intensity.
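
      A correlation of this kind is commonly summarized with a Pearson coefficient. The sketch below shows the calculation for paired size and intensity values; it is an illustrative assumption about how such a correlation could be quantified, not a description of how Figure 5I was produced, and the input values are placeholders.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Placeholder particle sizes and intensities, not measured data.
r_positive = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_negative = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

      Note that this relates particle size to intensity only; testing whether brighter trains also move faster would require pairing intensity with velocity for individual trains in live imaging.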

      (12) Are the sizes of both anterograde and retrograde trains lower in ift88 morphants? It's not clear from the data. It should be clearly stated that the authors speculate this and this is not directly evident from the data.

      Because the size of IFT fluorescence particles is based on immunostaining results, not live imaging, we cannot determine whether they are anterograde or retrograde IFT particles. Therefore, we can only speculate that both anterograde and retrograde trains may be reduced in ift88 morphants.

      (13) The biggest claim in this paper is that the cilia lengths in different organs are different due to differences in IFT train sizes. This is based on highly preliminary data shown in Figure 5C (how many organisms did they measure?). The difference is marginal and the dataset for spinal cord cilia is really small. The internal variability within the same cilia type is larger than the difference. How is this tiny difference resulting in such a large difference in IFT speeds? I believe their conclusions based on this data are incorrect.

      From our results, we believe that IFT velocity is related to cell type rather than to cilia length (Fig. S4), as has also been noted in previous studies (Williams et al., 2014). We agree with the reviewer that the evidence that larger train size leads to faster IFT is not very solid. We have accordingly softened our conclusion and mentioned other possibilities in the revised version.

      Minor comments:

      (1) The authors only mention the number of IFT particles for their data. They should provide the number of cilia and the number of organisms as well.

      Thank you for your suggestion. We added the number of cilia and organisms next to the number of particles in Figure 3, Figure S2-S5 and Table S1 of the revised manuscript.

      (2) Cilia and flagella are similar structurally but not the same. The authors should change the following sentence: In contrast to the localization of most organelles within cells, cilia (also known as flagellar) are microtubule-based structures that extend from the cell surface, facilitating a more straightforward quantification of their size.  

      Thank you for the detailed review. We have changed it in our revised manuscript. 

      (3) The authors should provide references here. For example, Chlamydomonas has two flagella with lengths ranging from 10 to 14 μm, while sensory cilia in C. elegans vary from approximately 1.5 μm to 7.5 μm. In most mammalian cells, the primary cilium typically measures between 3 and 10 μm.  

      We have added it in our revised manuscript. 

      (4) They should mention ovl mutants are IFT88 mutants when they introduce it in the main text.

      We have added it in our revised manuscript. 

      (5) Correct the grammar here: The velocity of IFT within different cilia also seems unchanged (Figure 4F, Movie S9, Table S1).  

      We have changed it. 

      (6) Correct the grammar here: Similarly, the IFT speeds also exhibited only slight changes in ccp5 morphants, which decreased the deglutamylase activities of Ccp5 and resulted in a hyperglutamylated tubulin

      We have changed it. 

      Reviewer #3 (Recommendations For The Authors):

      Introduction:

      1st paragraph, "flagellar" should be "flagella"; 2nd paragraph, "result a wide range of" should be "result in a...".  

      We have changed it. 

      Results and discussion:

      "...certain specialized cell types, including olfactory epithelia and pronephric duct, ...": olfactory epithelia and pronephric duct are tissues, not cells.  

      "...the GFP fluorescence of the transgene was prominently enriched in the cilia (Fig 1D)" : Fig 2D?  

      "The velocity of IFT within different cilia was also seems unchanged (Fig. 4 F, Movie S9, Table S1)": "was" and "seems" cannot be used together.  

      "...driven by b-actin2 promotor":    -actin2? 

      "...each dynein motor protein might propel multiple IFT complexes": The "protein" should be deleted.  

      Thanks. We have corrected all of these mistakes.  

      Figures:

      Figure 1: Dyes and antibodies used other than the anti-acetylated tubulin antibody should be mentioned. The developmental stages of zebrafish used for the imaging are mostly missing.

      Thanks. In the revised version, we have updated the figure legends to include descriptions of the antibodies, developmental stages, as well as N numbers.

      Figure 2B: What "hphs" means should be explained somewhere.  

      Thanks. We have added the full names of these abbreviations.

      Figures 3A-E: For clarity, the cilia whose IFT kymographs are shown should be marked. "Representative particle traces are marked with white lines in panels D and E" (legend): they are actually black lines. The authors should also clearly disclose the developmental stages of zebrafish used for the imaging.  

      Thank you for your comments. In the revised manuscript, the cilia used to generate the kymograph are marked by yellow arrows. We have updated the legend to change "white" to "black." Additionally, we have included the developmental stages of zebrafish used for imaging in Figure 3A.

      Figures 3G-K: The authors used quantification results from 4-dpf larvae and 30-hpf embryos for comparisons. Nevertheless, according to their experimental scheme in Figure 2B, 30-hpf embryos were not subjected to heat-shock treatment and genotyping. How could they express Ift88-GFP for the imaging? How could the authors choose larvae of the right genotypes? In addition, even if the authors heat-shocked them in time but forgot to mention, there are issues that need to be clarified experimentally and/or through citations, at least through discussions. Firstly, at 30 hpf, those motile cilia are probably still elongating. If this is the case, their final lengths would be longer than those presented (H; the authors need to disclose whether the lengths were measured from ciliary Ift88-GFP or another marker). In other words, the correlation with IFT velocities (H and I) might no longer exist when mature cilia were measured. Similarly, cilia undergo gradual disassembly during the cell cycle. Epidermal cells at 30-hpf are likely proliferating actively, and the average length of their cilia (H) would be shorter than that measured from quiescent epidermal cells in later stages.

      Thank you for these comments. First, we want to clarify that Figure 2B depicts the procedure for heat shock experiments conducted for the ovl mutants' rescue assay, not the experimental procedure for IFT imaging. We visualized IFT in five types of cilia using Tg (hsp70l: ift88-GFP) embryos without the ovl mutant background. In the revised manuscript, we have provided a detailed description of embryo treatment in the 'Materials and Methods' section and illustrated the experimental paradigm in Figure S2A. 

      Regarding the ciliary length differences between different developmental stages, we quantified cilia length in epidermal cells at 30 hpf versus 4 dpf, and in pronephric duct cilia at 30 hpf versus 48 hpf. Our analysis found no significant difference in length between earlier and later stages. Additionally, IFT velocities were comparable between these stages. These findings suggest that slower IFT velocities may not be attributed to the selection of different embryonic stages. Furthermore, we demonstrated that longer and shorter cilia maintain similar IFT velocities in crista cilia, indicating that elongated cilia within the same cell type exhibit comparable IFT velocities. These new results are presented in Figures S4 and S5 in the revised version.

      Secondly, do IFT velocities differ between elongating and mature cilia or remain relatively constant for a given cell type? The authors apparently take the latter for granted without even discussing the possibility of the former. In addition, whether the quantification results were from cilia of one or multiple fish, an important parameter to reflect the reproducibility, and sample sizes for the length data are not disclosed. The lack of descriptions on sample sizes and the number of independent experiments or larvae examined are actually common for statistical results in this manuscript.

      Thank you for your comments. We apologize for omitting the basic description of sample sizes and the number of cilia analyzed. We have addressed these issues in the revised manuscript. The length of crista cilia at 4 dpf is variable, with longer cilia reaching up to 30 µm and shorter cilia measuring only around 5 µm within the same crista. We sorted crista cilia by length into three groups at 10 µm intervals and measured the anterograde and retrograde IFT velocities in each group. The results revealed no significant difference in IFT velocity between elongating and mature cilia within the crista. These supplementary data are now included in Figure S4.
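
      The grouping by cilium length described above can be sketched as follows. This is a minimal illustration under stated assumptions: the measurement pairs are hypothetical, the helper `length_bin` is not from the manuscript, and bins are taken as half-open intervals [0, 10), [10, 20), [20, 30) µm, which is one possible reading of "intervals of 10 µm".

```python
from collections import defaultdict
from statistics import mean

BIN_WIDTH_UM = 10.0  # the 10 µm interval mentioned in the response


def length_bin(length_um, bin_width=BIN_WIDTH_UM):
    """Return the (lower, upper) bounds in µm of the bin containing length_um."""
    lower = (length_um // bin_width) * bin_width
    return (lower, lower + bin_width)


# Hypothetical (cilium length in µm, anterograde IFT velocity in µm/s) pairs,
# for illustration only; these are not measurements from the study.
measurements = [
    (5.2, 1.10), (8.9, 1.05),
    (14.0, 1.12), (19.5, 1.08),
    (24.3, 1.09), (29.8, 1.11),
]

# Collect the velocities that fall into each length bin.
velocities_by_bin = defaultdict(list)
for length_um, velocity in measurements:
    velocities_by_bin[length_bin(length_um)].append(velocity)

# Mean velocity per bin, ordered from shortest to longest cilia.
mean_velocity_by_bin = {
    bounds: mean(values) for bounds, values in sorted(velocities_by_bin.items())
}
for (lower, upper), v in mean_velocity_by_bin.items():
    print(f"{lower:.0f}-{upper:.0f} µm: mean velocity {v:.3f} µm/s")
```

      Comparing the per-bin means (with an appropriate statistical test) is what would support or refute a dependence of IFT velocity on cilium length within one cell type.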

      Figures 4A-B: When mutating neither Kif17 nor Kif3b affected the IFT of crista cilia, the data unlikely "suggest that the variability in IFT speeds among different cilia cannot be attributed to the use of different motor proteins". In fact, in the cited publication (Zhao et al., 2012), the authors used the same and additional mutants (Kif3c and Kif3cl) to demonstrate that different IFT-related kinesin motors have different effects on ciliogenesis and ciliary length in different tissues, results actually implying tissue-specific contributions of different kinesin motors to IFT. Furthermore, although likely only cytoplasmic dynein-2 is involved in the retrograde IFT, the authors cannot exclude the possibility that different combinations or isoforms of its many subunits and regulators contribute to the velocity regulation. Therefore, the authors need to reconsider their wording. This reviewer would suggest that the authors examine the IFT status of cilia that were previously reported to be shortened in the Kif3b mutant to see whether the correlation between ciliary length and IFT velocities still stands. This would actually be a critical assay to assess whether the proposed correlation is only a coincidence or indeed has a certain causality.

      Thank you for your comments. The shortened cilia observed in Kif3b mutants may be attributed to the presence of maternal Kif3b proteins, making it challenging to exclude the involvement of Kif3b motor. Regarding the relationship between IFT speed variability and motor proteins, we agree with the reviewer that we cannot entirely dismiss the possibility of different motors or adaptors being involved. We have revised our description of this aspect accordingly.

      Figures 4C-G: Similarly, when the authors found that tubulin glycylation or glutamylation has little effect on IFT, they cannot use these observations to exclude possible influences of other types of tubulin modifications on IFT. They should only stick to their observations.

      Yes, we agree. We have changed the description in the revised manuscript.

      Figure 5:

      A-C: When the authors only compared immotile cilia of crista with motile cilia of the spinal cord, it is hard to say whether the difference in particle size is correlated with ciliary length or motility. Cilia from more tissues should be included to strengthen their point, especially when the authors want to make this point the central one.

      D: The authors showed that ovl larvae containing Tg(hsp70l:ift88 GFP) (as they do not indicate the genotype, this reviewer can only deduce) display normal body curvature at 2 dpf after the injection of 0.5 ng of ift88 MO. Such a result, however, is quite confusing. According to their experimental scheme in Figure 2B, these larvae were not subjected to heat shock induction for Ift88-GFP. Do ovl larvae containing Tg(hsp70l:ift88 GFP) naturally display normal body curvature at 2 dpf? 

      Thank you for your comments. Due to technical limitations, comparing IFT particle size across different cilia using STED is challenging. We agree with this reviewer that the evidence supporting this aspect is relatively weak. Accordingly, we have modified and softened our conclusion in the revised version.

      Regarding the injection of ift88 morpholino, we want to clarify that we injected it into wild-type embryos, not ovl mutants. The lower dose of ift88 morpholino (0.5 ng) partially knocked down Ift88, allowing embryos to maintain a grossly normal body axis while resulting in shorter cilia in the ear crista.

      E: The authors need to indicate the developmental stage of the larvae examined. One piece of missing data is global expression levels of both endogenous (maternal) Ift88 and exogenous Ift88-GFP in zebrafish larvae that are either uninjected, 8-ng-ift88 MO-injected, or 0.5-ng-ift88 MO-injected, preferably at multiple time points up to 3 dpf. The results will clarify (1) the total levels of Ift88 over time; (2) the extent of downregulation the MO injections achieved at different developmental stages; and importantly (3) whether the low MO dosage (0.5 ng) indeed allowed a persistent downregulation to affect IFT trains at 3 dpf, a time the authors made the assays for Figures 5F-J to reach the model (K). It will be great to include wild-type larvae for comparison.

      Thank you for these valuable suggestions. The ift88 morpholino (MO) was designed to block the splicing of ift88 transcripts and has been used in multiple studies. This morpholino specifically blocks the expression of endogenous ift88, while the expression of the Ift88-GFP transgene remains unaffected. It would be beneficial to titrate the expression level of Ift88 in the morphants at different stages. Unfortunately, we do not have access to a zebrafish Ift88 antibody. We assessed the effects of a lower amount of MO based on our observation that the fish maintained a normal body axis while exhibiting shorter cilia. Ideally, the amount of Ift88 should be lower in the morphants, considering the presence of ciliogenesis defects. We have included additional comments regarding this limitation in the revised version.

      Movies:

      Movies 1-5: Elapsed time is not provided. Furthermore, cilia in the pronephric duct and spinal cord are known to beat rapidly. Their motilities, however, appear to be largely compromised in Movies 3 and 4. Although the quantification results in Fig 3G imply that the authors imaged 30hpf embryos for such cilia, there is no statement on real conditions.

      Thank you for your comments. We apologize for omitting the elapsed time from our movies. We have addressed this issue in the revised manuscript. Motile cilia are difficult to image due to their fast beating. To immobilize the moving cilia and enable the capture of IFT movement within them, we gently pressed the embryo with a round cover glass to inhibit ciliary beating. Data from each embryo were collected within 5 minutes to avoid the impact of embryo death on the results. We have added a detailed description to the 'Materials and Methods' section.

      Materials:

      The sequence of morpholino oligonucleotide against ift88 is missing.  

      We have added the sequence of ift88 morpholino in the revised manuscript.

      References:

      Important references are missing, including (1) the paper by Leventea et al., 2016 (PMID: 27263414), which shows cilia morphologies in various zebrafish tissues with more detailed descriptions of tissue anatomies and experimental techniques; (2) papers documenting that dynein motors "move faster than Kinesin motors" in IFT of C. reinhardtii and C. elegans cilia; and (3) the paper by Li et al., 2020 (PMID: 33112235), in which the authors constructed a hybrid IFT kinesin to markedly reduce anterograde IFT velocity (~2.8-fold) and IFT injection rate in C. reinhardtii cilia and found only a mild reduction (~15%) in ciliary length. This paper is important because it is a pioneering study that elegantly investigated the relationship between IFT velocity and ciliary length. The findings, however, do not necessarily contradict the current manuscript due to differences in, e.g., model organisms and methodology.

      Thank you for the detailed review. We have cited these papers at the appropriate places in the revised manuscript.

      References

      Broekhuis JR, Verhey KJ, Jansen G (2014) Regulation of cilium length and intraflagellar transport by the RCK-kinases ICK and MOK in renal epithelial cells. PLoS One 9: e108470

      Kunova Bosakova M, Varecha M, Hampl M, Duran I, Nita A, Buchtova M, Dosedelova H, Machat R, Xie Y, Ni Z et al (2018) Regulation of ciliary function by fibroblast growth factor signaling identifies FGFR3-related disorders achondroplasia and thanatophoric dysplasia as ciliopathies. Hum Mol Genet 27: 1093-1105

      Luo W, Ruba A, Takao D, Zweifel LP, Lim RYH, Verhey KJ, Yang W (2017) Axonemal Lumen Dominates Cytosolic Protein Diffusion inside the Primary Cilium. Sci Rep 7: 15793

      Ou G, Blacque OE, Snow JJ, Leroux MR, Scholey JM (2005) Functional coordination of intraflagellar transport motors. Nature 436: 583-587

      See SK, Hoogendoorn S, Chung AH, Ye F, Steinman JB, Sakata-Kato T, Miller RM, Cupido T, Zalyte R, Carter AP et al (2016) Cytoplasmic Dynein Antagonists with Improved Potency and Isoform Selectivity. ACS Chem Biol 11: 53-60

      Williams CL, McIntyre JC, Norris SR, Jenkins PM, Zhang L, Pei Q, Verhey K, Martens JR (2014) Direct evidence for BBSome-associated intraflagellar transport reveals distinct properties of native mammalian cilia. Nat Commun 5: 5813

      Yi P, Li WJ, Dong MQ, Ou G (2017) Dynein-Driven Retrograde Intraflagellar Transport Is Triphasic in C. elegans Sensory Cilia. Curr Biol 27: 1448-1461 e1447

      Zhao C, Omori Y, Brodowska K, Kovach P, Malicki J (2012) Kinesin-2 family in vertebrate ciliogenesis. Proceedings of the National Academy of Sciences 109: 2388 - 2393

      Zhou HM, Brust-Mascher I, Scholey JM (2001) Direct visualization of the movement of the monomeric axonal transport motor UNC-104 along neuronal processes in living Caenorhabditis elegans. J Neurosci 21: 3749-3755

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Kim et al. investigated the mechanism by which uremic toxin indoxyl sulfate (IS) induces trained immunity, resulting in augmented pro-inflammatory cytokine production such as TNF and IL6. The authors claim that IS treatment induced epigenetic and metabolic reprogramming, and the aryl hydrocarbon receptor (AhR)-mediated arachidonic acid pathway is required for establishing trained immunity in human monocytes. They also demonstrated that uremic sera from end-stage renal disease (ESRD) patients can generate trained immunity in healthy control-derived monocytes.

      These are interesting results that introduce the important new concept of trained immunity and its importance in showing endogenous inflammatory stimuli-induced innate immune memory. Additional evidence proposing that IS plays a critical role in the initiation of inflammatory immune responses in patients with CKD is also interesting and a potential advance of the field. This study is in large part well done, but some components of the study are still incomplete and additional efforts are required to nail down the main conclusions.

      Thank you very much for your positive feedback.

      Specific comments:

      (1) Of greatest concern, there are concerns about the rigor of these experiments, whether the interpretation and conclusions are fully supported by the data. (1) Although many experiments have been sporadically conducted in many fields such as epigenetic, metabolic regulation, and AhR signaling, the causal relationship between each mechanism is not clear. (2) Throughout the manuscript, no distinction was made between the group treated with IS for 6 days and the group treated with the second LPS (addressed below). (3) Besides experiments using non-specific inhibitors, genetic experiments including siRNA or KO mice should be examined to strengthen and justify central suggestions.

      We are grateful for the invaluable constructive feedback provided. 

      (1) In response to the reviewer's feedback, we conducted additional experiments employing appropriate inhibitors to investigate the causal relationship among the AhR pathway, epigenetic modifications, and metabolic rewiring in IS-induced trained immunity. Notably, metabolic rewiring, particularly the upregulation of aerobic glycolysis via the mTORC1 signaling pathway, stands as a pivotal mechanism underlying the induction of trained immunity through the modulation of epigenetic modifications (Riksen NP et al. Figure 1). Initially, we assessed the enrichment of H3K4me3 on day 6 at the promoters of the TNFA and IL6 loci after treatment with zileuton, an inhibitor of ALOX5, or 2-DG, a glycolysis inhibitor. Additionally, we evaluated the alteration in the activity of S6K, a downstream molecule of mTORC1, following zileuton treatment. Our findings indicate that AhR-dependent arachidonic acid (AA) signaling induces epigenetic modifications, albeit without inducing metabolic rewiring, in IS-induced trained immunity (Author response image 1). However, IS stimulation promotes mTORC1-mediated glycolysis in an AhR-independent manner. Notably, inhibition of glycolysis with 2-DG impacts epigenetic modifications. We have updated Figure 7 of the revised manuscript to incorporate these additional experimental findings, elucidating the correlation between the diverse mechanisms implicated in IS-induced innate immune memory. These data have been integrated into the revised manuscript as Figures 3D and 5I and Supplementary Figure 5I.

      (2) We apologize for any confusion arising from the unclear description regarding the distinction between the group treated with IS for 6 days and the group subjected to secondary lipopolysaccharide (LPS) stimulation. It is imperative to clarify that induction of trained immunity necessitates 1 day of IS stimulation followed by 5 days of rest, rendering the 6th day sample representative of a trained state. Subsequent to this, a 24-hour LPS stimulation is applied, designating the 7th day sample as a secondary LPS-stimulated cell. This clarification is now explicitly indicated throughout the entirety of Figure 1A and Figure 3A in the revised manuscript.

      (3) In accordance with your feedback, we performed siRNA knockdown of AhR and ALOX5 in primary human monocytes. AhR knockdown markedly attenuated the mRNA expression of TNF-α and IL-6, which are augmented in IS-trained macrophages. Similarly, knockdown of ALOX5 using ALOX5 siRNA abrogated the increase in TNF-α and IL-6 levels upon LPS stimulation in IS-trained macrophages (Author response image 2). Our experiments utilizing AhR siRNA corroborate the involvement of AhR in the expression of AA pathway-related molecules, such as ALOX5, ALOX5AP, and LTB4R1, in IS-induced trained immunity. These data have been incorporated into the revised manuscript as Figure 4E and 5G, and supplementary Figure 5H.  

      Author response image 1.

      Epigenetic modification is regulated by arachidonic acid (AA) pathway and metabolic rewiring, but metabolic rewiring is not affected by the AA pathway. A-B. Monocytes were pre-treated with zileuton (ZLT), an inhibitor of ALOX5, or 2DG, a glycolysis inhibitor, followed by stimulation with IS for 24 hours. After a resting period of 5 days, the enrichment of H3K4me3 on the promoters of TNFA and IL6 loci was assessed. Normalization was performed using 2% input. C. Monocytes were pre-treated with zileuton (ZLT) and stimulated with IS for 24 hr. Cell lysates were immunoblotted for phosphorylated S6 Kinase, with β-actin serving as a normalization control. Band intensities in the immunoblots were quantified using densitometry. D. A schematic representation of the mechanistic framework underlying IS-trained immunity. Bar graphs show the mean ± SEM. * = p < 0.05, ** = p < 0.01, and *** = p < 0.001 by two-tailed paired t-test.
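
      For readers unfamiliar with normalization to a 2% input, the sketch below shows one common percent-input convention for ChIP-qPCR. It is an assumption that a calculation of this form applies here: the legend does not give the exact formula, the function name `percent_input` and the Ct values are hypothetical, and the calculation assumes a doubling of product per PCR cycle.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction=0.02):
    """Percent-input value for one ChIP-qPCR sample.

    ct_ip          -- Ct of the immunoprecipitated sample
    ct_input       -- Ct of the input sample
    input_fraction -- fraction of chromatin saved as input (0.02 for 2%)

    Assumes ~100% amplification efficiency (product doubles each cycle).
    """
    # Shift the input Ct to the value expected for 100% of the chromatin:
    # a 2% input is diluted 50-fold, i.e. log2(50) cycles later.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


# Hypothetical Ct values for illustration only.
value = percent_input(ct_ip=27.0, ct_input=25.0)
print(f"{value:.3f}% of input")
```

      Under this convention, an IP signal appearing two cycles later than the 2% input corresponds to one quarter of 2%, that is, 0.5% of input.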

      Author response image 2.

      Inhibition of IS-trained immunity by knockdown of AhR or ALOX5 in human monocytes. A-C. Human monocytes were transfected with siRNA targeting AhR (siAhR), ALOX5 (siALOX5), or negative control (siNC) for 1 day, followed by stimulation with IS for 24 hours. After a resting period of 5 days, cells were re-stimulated with LPS for 24 hours. mRNA expression levels of AhR and ALOX5 at 1 day after transfection, and TNF-α and IL-6 at 1 day after LPS treatment, were assessed using RT-qPCR. D. Human monocytes were transfected with AhR siRNA or negative control (NC) siRNA for 1 day, followed by stimulation with IS for 24 hours. After resting for 5 days, mRNA expression levels of ALOX5, ALOX5AP, and LTB4R1 were analyzed using RT-qPCR. Bar graphs show the mean ± SEM. * = p < 0.05, ** = p < 0.01, and *** = p < 0.001 by two-tailed paired t-test.  
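
      The legends above report the mean ± SEM and a two-tailed paired t-test. The sketch below shows how these quantities are obtained from paired measurements. The donor values and the helper names (`mean_sem`, `paired_t_statistic`) are hypothetical, and the code stops at the t statistic and degrees of freedom; the two-tailed p-value would then come from the t distribution, for example via `scipy.stats.ttest_rel`.

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample SD."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def paired_t_statistic(control, treated):
    """Return (t, degrees of freedom) for a paired t-test.

    Each position in `control` and `treated` must come from the same donor.
    """
    diffs = [t - c for c, t in zip(control, treated)]
    mean_diff, sem_diff = mean_sem(diffs)
    return mean_diff / sem_diff, len(diffs) - 1


# Hypothetical relative expression values from four donors, illustration only.
control = [1.0, 1.2, 0.9, 1.1]
trained = [1.8, 2.1, 1.5, 2.0]

m, sem = mean_sem(trained)
t_stat, dof = paired_t_statistic(control, trained)
print(f"trained: {m:.2f} ± {sem:.2f} (mean ± SEM)")
print(f"paired t = {t_stat:.2f}, df = {dof}")
```

      The pairing matters because monocytes from different donors vary widely at baseline. The test is run on within-donor differences rather than on the two group means.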

      (2) The authors showed that IS-trained monocytes showed no change in TNF or IL-6, but increased the expression levels of TNF and IL-6 in response to the second LPS (Fig. 1B). This suggests that the different LPS responsiveness in IS-trained monocytes caused altered gene expression of TNF and IL6. However, the authors also showed that IS-trained monocytes without LPS stimulation showed increased levels of H3K4me3 at the TNF and IL-6 loci, as well as highly elevated ECAR and OCR, leading to no changes in TNF and IL-6. Therefore, it is unclear why or how the epigenetic and metabolic states of IS-trained monocytes induce different LPS responses. For example, increased H3K4me3 in HK2 and PFKP is important for metabolic rewiring, but why increased H3K4me3 in TNF and IL6 does not affect gene expression needs to be explained.

      We acknowledge the constructive critiques provided by the reviewer. While epigenetic modifications in the promoters of TNF-α, IL-6, HK2, and PFKP (Figure 3B and Supplementary Figure 3C in the revised manuscript), and metabolic rewiring (Figure 2A-D in the revised manuscript) were observed in IS-trained macrophages at 6 days prior to LPS stimulation, these macrophages do not exhibit an increase in TNF-α and IL-6 mRNA and protein levels before LPS stimulation. This lack of response is attributed to a 5-day resting period, allowing the macrophages to revert to a non-activated state, as depicted in Author response image 3 and 4. This phenomenon aligns with the concept of typical trained immunity.

      Trained immunity is characterized by the long-term functional reprogramming of innate immune cells, which is evoked by various primary insults and which leads to an altered response towards a second challenge after the return to a non-activated state. Metabolic and epigenetic reprogramming events during the primary immune response persist partially even after the initial stimulus is removed. Upon a secondary challenge, trained innate immune cells exhibit a more robust and more prompt response than the initial response (Netea MG et al. Defining trained immunity and its role in health and disease. Nat Rev Immunol. 2020 Jun;20(6):375-388).

      Numerous studies have demonstrated the observation of epigenetic modifications in the promoters of TNF-α and IL-6, and metabolic rewiring prior to LPS stimulation as a secondary challenge. However, cytokine production is contingent on LPS stimulation (Arts RJ et al. Glutaminolysis and Fumarate Accumulation Integrate Immunometabolic and Epigenetic Programs in Trained Immunity. Cell Metab. 2016 Dec 13;24(6):807-819; Arts RJW et al. Immunometabolic Pathways in BCG-Induced Trained Immunity. Cell Rep. 2016 Dec 6;17(10):2562-2571; Ochando J et al. Trained immunity - basic concepts and contributions to immunopathology. Nat Rev Nephrol. 2023 Jan;19(1):23-37). The prolonged presence of higher levels of H3K4me3 on immune gene promoters, even after returning to baseline, is associated with open chromatin and results in a more rapid and stronger response, such as cytokine production, upon a secondary insult (Netea MG et al. Defining trained immunity and its role in health and disease. Nat Rev Immunol. 2020 Jun;20(6):375-388).

      The results in Figure 1B may be interpreted as indicating different LPS responsiveness in IS-trained monocytes caused altered gene expression of TNF and IL-6. However, it is plausible that trained immune cells respond more robustly even to low concentrations of LPS. In fact, the aim of this experiment was to determine the appropriate LPS concentration.

      Author response image 3.

      The changes in mRNA and protein levels of TNF-α and IL-6 during induction of IS-trained immunity. Human monocytes were treated with or without IS (1 mM) for 24 hrs, followed by a 5-day resting period to induce trained immunity. Cells were stimulated with LPS for 24 hrs. Protein and mRNA levels were assessed by ELISA and RT-qPCR, respectively. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01, by two-tailed paired t-test.

      Author response image 4.

      The changes in mRNA levels of HK2 and PFKP induced by IS during induction of IS-trained immunity. Human monocytes were treated with or without IS (1 mM) for 24 hrs, followed by a 5-day resting period to induce trained immunity. mRNA levels were assessed by RT-qPCR. Bar graphs show the mean ± SEM. * = p < 0.05 by two-tailed paired t-test.

      (3) The authors used human monocytes cultured in human serum without growth factors such as MCSF for 5-6 days. When we consider the short lifespan of monocytes (1-3 days), the authors need to explain the validity of the experimental model.

      We appreciate the reviewer’s constructive critiques. As pointed out by the reviewer, human circulating CD14+ monocytes exhibit a relatively short lifespan (1-3 days) when cultured in the absence of growth factors (Patel AA et al. The fate and lifespan of human monocyte subsets in steady state and systemic inflammation. J Exp Med. 2017 Jul 3;214(7):1913-1923). In this study, purified CD14+ monocytes were subjected to adherent culture for a duration of 7 days in RPMI1640 media supplemented with 10% human AB serum, a standard in vitro culture protocol widely employed in studies focusing on trained immunity (Domínguez-Andrés J et al. In vitro induction of trained immunity in adherent human monocytes. STAR Protoc. 2021 Feb 24;2(1):100365). In response to the reviewer's suggestions, we assessed cell viability on days 0, 1, 4, and 6, utilizing the WST assay. Despite a marginal reduction in cell viability observed at day 1, attributed to detachment from the culture plate, the cultured monocytes exhibited a notable enhancement in cell viability on days 4 and 6 when compared to days 0 or 1 (Author response image 5).

      It has been demonstrated that the adhesion of human monocytes to a cell culture dish leads to their activation and induces the synthesis of substantial amounts of IL-1β mRNA as observed in monocytes adherent to extracellular matrix components such as fibronectin and collagen.

      Morphologically, human adherent monocytes cultured with 10% human serum appear to undergo partial differentiation into macrophages by day 6, potentially explaining the observed lack of decrease in monocyte viability. Notably, Safi et al. have reported that adherent monocytes cultured with 10% human serum exhibit no significant difference in cell viability over a 7-day period when compared to cultures supplemented with growth factors such as M-CSF and IL-3 (Safi W et al. Differentiation of human CD14+ monocytes: an experimental investigation of the optimal culture medium and evidence of a lack of differentiation along the endothelial line. Exp Mol Med. 2016 Apr 15;48(4):e227).

      Author response image 5.

      Viability of human monocytes during the induction of trained immunity. Purified human monocytes were seeded on plates with RPMI1640 media supplemented with 10% human AB serum. Cell viability was assessed on days 0, 1, 4, and 6 utilizing the WST assay (Left panel). Cell morphology was examined under a light-inverted microscope at the indicated times (Right panel).

      (4) The authors' ELISA results clearly showed increased levels of TNF and IL-6 proteins, but it is well established that LPS-induced gene expression of TNF and IL-6 in monocytes peaked within 1-4 hours and returned to baseline by 24 hours. Therefore, authors need to investigate gene expression at appropriate time points.

      We appreciate the valuable constructive feedback provided by the reviewer. As indicated by the reviewer, the LPS-induced gene expression of TNF-α and IL-6 in IS-trained monocytes exhibited a peak within the initial 1 to 4 hours, followed by a decrease by the 24-hour time point, as illustrated in Author response image 6. Nevertheless, the mRNA expression levels of TNF-α and IL-6 were still elevated at the 24-hour mark. Furthermore, the protein levels of both TNF-α and IL-6 apparently increased 24 hours after LPS stimulation. Due to technical constraints, sample collection had to be conducted at a single time point, and the 24-hour post-stimulation interval was deemed optimal for this purpose.

      Author response image 6.

      Kinetics of protein and mRNA expression of TNF-α and IL-6 after treatment of LPS as secondary insult in IS-trained monocytes. IS-trained cells were re-stimulated by LPS (10 ng/ml) for the indicated time. The supernatant and lysates were collected for ELISA assay and RT-qPCR analysis, respectively. Bar graphs show the mean ± SEM. * = p <0.05 and **= p < 0.01, by two-tailed paired t-test.

      (5) It is a highly interesting finding that IS induces trained immunity via the AhR pathway. The authors also showed that the pretreatment of FICZ, an AhR agonist, was good enough to induce trained immunity in terms of the expression of TNF and IL-6. However, from this point of view, the authors need to discuss why trained immunity was not affected by kynurenic acid (KA), which is a well-known AhR ligand accumulated in CKD and has been reported to be involved in innate immune memory mechanisms (Fig. S1A).

      We appreciate the constructive criticism provided by the reviewer, and we comprehend the raised points. In our initial experiments, we hypothesized that kynurenic acid (KA), an aryl hydrocarbon receptor (AhR) ligand, might instigate trained immunity in monocytes, despite KA not being our primary target uremic toxin. However, our findings, as depicted in Fig. S1A, demonstrated that KA did not induce trained immunity. Notably, KA-treated monocytes exhibited induction of CYP1B1, an AhR-responsive gene, and elevated levels of TNF-α and IL-6 mRNA at 24 hours post-treatment, comparable to FICZ-treated monocytes. This observation underscores KA's role as an AhR ligand in human monocytes, as emphasized by the reviewer. 

      Of particular interest, proteins associated with the arachidonic acid pathway, such as ALOX5 and ALOX5AP - integral to the mechanisms underlying IS-induced trained immunity - did not exhibit an increase at day 6 following KA treatment, in contrast to the significant elevation observed with IS and FICZ treatments (Author response image 7). The rationale behind this disparity remains unknown, necessitating further investigation to elucidate the underlying factors. These data have been incorporated into the revised manuscript as Supplementary Figure 5C.

      Author response image 7.

      Divergent impact of AhR agonists, namely IS, FICZ, and KA, on the AhR-ALOX5 pathway. Purified monocytes underwent treatment with IS (1 mM), FICZ (100 nM), or KA (0.5 mM) for 1 day, followed by a 5-day resting period to induce trained immunity. Activation of AhR through ligand binding was assessed by examining the induction of CYP1B1, an AhR-responsive gene, and cytokines one day post-treatment. The expression of genes related to the arachidonic acid pathway, such as ALOX5, ALOX5AP, and LTB4R1, was analyzed via RT-qPCR six days after inducing trained immunity. Bar graphs show the mean ± SEM. * = p < 0.05, ** = p < 0.01, and *** = p < 0.001 by two-tailed paired t-test.

      Indeed, it has been demonstrated that FICZ and TCDD, two high-affinity AhR ligands, exert opposite effects on T-cell differentiation, with TCDD inducing regulatory T cells and FICZ inducing Th17 cells. This dichotomy has been attributed to ligand-intrinsic differences in AhR activation (Ho PP et al. The aryl hydrocarbon receptor: a regulator of Th17 and Treg cell development in disease. Cell Res. 2008 Jun;18(6):605-8; Ehrlich AK et al. TCDD, FICZ, and Other High Affinity AhR Ligands Dose-Dependently Determine the Fate of CD4+ T Cell Differentiation. Toxicol Sci. 2018 Feb 1;161(2):310-320). These outcomes imply the involvement of an intricate interplay involving metabolic rewiring, epigenetic reprogramming, and the AhR-ALOX5 pathway in IS-induced trained immunity within monocytes.

      (6) The authors need to clarify the role of IL-10 in IS-trained monocytes. IL-10, an anti-inflammatory cytokine that can be modulated by AhR, whose expression (Fig. 1E, Fig. 4D) may explain the inflammatory cytokine expression of IS-trained monocytes.

      We appreciate the reviewer’s valuable comment, recognizing its significant importance. IL-10, characterized by potent anti-inflammatory attributes, assumes a pivotal role in constraining the host immune response against pathogens. This function serves to mitigate potential harm to the host and uphold normal tissue homeostasis. In the context of atherosclerosis (Mallat Z et al. Protective role of interleukin-10 in atherosclerosis. Circ Res. 1999 Oct 15;85(8):e17-24.) and kidney disease (Wei W et al. The role of IL-10 in kidney disease. Int Immunopharmacol. 2022 Jul;108:108917), IL-10 exerts potent deactivating effects on macrophages and T cells, influencing various cellular processes that could impact the development and stability of atherosclerotic plaques. Additionally, it is noteworthy that IL-10-deficient macrophages exhibit an augmentation in the proinflammatory cytokine TNF-α (Smallie T et al. IL-10 inhibits transcription elongation of the human TNF gene in primary macrophages. J Exp Med. 2010 Sep 27;207(10):2081-8; Couper KN et al. IL-10: the master regulator of immunity to infection. J Immunol. 2008 May 1;180(9):5771-7). As emphasized by the reviewer, the reduced gene expression of IL-10 by IS-trained monocytes may contribute to the heightened expression of proinflammatory cytokines. We have thoroughly addressed and discussed this specific point in response to the reviewer's comment (Line 394-399 of page 18 in the revised manuscript).

      (7) The authors need to show H3K4me3 levels in TNF and IL6 genes in all conditions in one figure. (Fig. 2B). Comparing Fig. 2B and Fig. S2B, H3K4me3 does not appear to be increased at all by LPS in the IL6 region. 

      We are grateful for the constructive criticism provided by the reviewer. In response to the reviewer's comment, we endeavored to conduct an experiment demonstrating H3K4me3 enrichment on the promoters of TNF-α and IL-6 across all experimental conditions. However, due to limitations in the availability of purified human monocytes, we conducted an additional three independent experiments for ChIP-qPCR across all conditions. Despite encountering a notable variability among individuals, even within the healthy donor cohort, our results demonstrated an increase in H3K4me3 enrichment on the TNF-α and IL-6 promoters in IS-trained groups, irrespective of subsequent LPS treatment (Author response image 8).

      Author response image 8.

      Analysis of H3K4me3 enrichment on the promoters of the TNFA and IL6 loci in IS-trained macrophages. ChIP-qPCR was employed to assess the enrichment of H3K4me3 on the promoters of the TNFA and IL6 loci before (day 6) and after LPS stimulation (day 7) in IS-trained macrophages. Values were normalized to 2% input. Bar graphs show the mean ± SEM. The data presented are derived from three independent experiments utilizing samples from different donors.
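
For readers unfamiliar with normalization to input in ChIP-qPCR, the following is a minimal sketch of the commonly used percent-input calculation. It is an illustration only: the response does not state the exact formula used, the function name `percent_input` and the Ct values are hypothetical, and 100% amplification efficiency is assumed.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction=0.02):
    """Percent-input value for one ChIP-qPCR reaction (illustrative sketch).

    ct_ip          -- Ct of the immunoprecipitated sample
    ct_input       -- Ct of the input sample (here assumed to be 2% of chromatin)
    input_fraction -- fraction of chromatin saved as input (assumption: 0.02)
    Assumes a doubling of product per cycle (100% efficiency).
    """
    # Shift the input Ct to what a 100% input sample would have given.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    # Each cycle of difference corresponds to a two-fold difference in template.
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


if __name__ == "__main__":
    # Hypothetical Ct values, not taken from the study.
    print(percent_input(ct_ip=26.0, ct_input=25.0))
```

An IP reaction with the same Ct as the 2% input sample corresponds to 2% of input, and each additional cycle halves that value.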

      (8) The authors need to address the changes of H3K4me3 in the presence of MTA.

      We appreciate the constructive criticism provided by the reviewer. In response to the reviewer's feedback, we conducted an analysis of the changes in H3K4me3 in the presence of MTA, a general methyltransferase inhibitor, using identical conditions as depicted in Figure 2C of the original manuscript. Our findings revealed that MTA exerted inhibitory effects on the levels of H3K4me3, as isolated through the acid histone extraction method, which were otherwise increased by IS-training, as illustrated in Author response image 9. 

      Author response image 9.

      The reduction of H3K4me3 by MTA treatment in IS-trained macrophages. IS-trained cells were restimulated by LPS (10 ng/ml) as a secondary challenge for 24 hrs, followed by isolation of histone and WB analysis for H3K4me3, Histone 3 (H3), and β-actin. The blot data from two independent experiments with different donors are shown.

      (9) Interpretation of ChIP-seq results is not entirely convincing due to doubts about the quality of sequencing results. First, authors need to provide information on the quality of ChIP-seq data in reliable criteria such as Encode Pipeline. It should also provide representative tracks of H3K4me3 in the TNF and IL-6 genes (Fig. 2F). And in Fig. 2F, the author showed the H3K4me3 track of replicates, but the results between replicates were very different, so there are concerns about reproducibility. Finally, the authors need to show the correlation between ChIP-seq (Fig. 2) and RNA-seq (Fig. 5).

      We appreciate the constructive criticism provided by the reviewer. 

      As indicated by the reviewer, sample read quality was evaluated against the histone ChIP-seq standards of the ENCODE project, focusing on read depth, PCR bottleneck coefficients 1 and 2 (PBC1 and PBC2), and the non-redundant fraction (NRF). Five of the samples displayed moderate bottlenecking (0.5 ≤ PBC1 < 0.8, 1 ≤ PBC2 < 3) with acceptable complexity (0.5 ≤ NRF < 0.8). One sample showed mild bottlenecking (0.8 ≤ PBC1 < 0.9, 3 ≤ PBC2 < 10) with compliant complexity (0.8 ≤ NRF < 0.9). These quality metrics indicate that the ChIP-seq data meet at least the standards required for downstream analysis according to ENCODE project criteria (Author response image 10A).
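
The library-complexity metrics named above can be sketched as follows. This is a minimal illustration based on the usual ENCODE definitions (NRF = distinct positions / total reads, PBC1 = positions with exactly one read / distinct positions, PBC2 = positions with exactly one read / positions with exactly two reads); the function names and the input format are hypothetical, and the labels cover only the two ranges quoted in this response.

```python
from collections import Counter


def library_complexity(read_positions):
    """Return (NRF, PBC1, PBC2) for an iterable of mapped read positions.

    Each element is a hashable key such as (chrom, start, strand), one per
    uniquely mapped read. This input format is an assumption for the sketch.
    """
    counts = Counter(read_positions)
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("no reads supplied")
    distinct = len(counts)                            # positions with >= 1 read
    m1 = sum(1 for c in counts.values() if c == 1)    # positions with exactly 1 read
    m2 = sum(1 for c in counts.values() if c == 2)    # positions with exactly 2 reads
    nrf = distinct / total_reads
    pbc1 = m1 / distinct
    pbc2 = m1 / m2 if m2 else float("inf")
    return nrf, pbc1, pbc2


def bottleneck_label(pbc1, pbc2):
    """Label using only the two ranges quoted in the response."""
    if 0.5 <= pbc1 < 0.8 and 1 <= pbc2 < 3:
        return "moderate"
    if 0.8 <= pbc1 < 0.9 and 3 <= pbc2 < 10:
        return "mild"
    return "outside the quoted ranges"


def complexity_label(nrf):
    """Label using only the two NRF ranges quoted in the response."""
    if 0.5 <= nrf < 0.8:
        return "acceptable"
    if 0.8 <= nrf < 0.9:
        return "compliant"
    return "outside the quoted ranges"


if __name__ == "__main__":
    # Toy data: 6 positions seen once, 3 seen twice, 1 seen three times.
    reads = [("chr1", i, "+") for i in range(6)]
    reads += [("chr2", i, "+") for i in range(3) for _ in range(2)]
    reads += [("chr3", 0, "-")] * 3
    nrf, pbc1, pbc2 = library_complexity(reads)
    print(nrf, pbc1, pbc2, bottleneck_label(pbc1, pbc2), complexity_label(nrf))
```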

      To examine the differences in H3K4me3 enrichment patterns between the two groups, we normalized the read counts within ±2 kb of the TSS of human genes to CPM. Subsequently, we compared the average values of IS-treated macrophages with those of controls and displayed the results in waterfall plots. In addition, we marked genes of interest in red, including those related to the phenotypes of IS-trained macrophages (TNF and IL6), the activation of innate immune responses (XRCC5, IFI16, PQBP1), and the regulation of ornithine decarboxylase (OAZ3, PSMA3, PSMA1) (Author response image 10B and C). Also, H3K4me3 peak tracks of the TNF and IL6 loci and the H3K4me3 enrichment pattern were added as Supplementary Figure 3D and 3F in the revised manuscript.
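
The normalization and ranking steps described above can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: CPM is computed as plain count / library size × 10^6 (edgeR offers further options), and the log2 ratio with a pseudocount used to order genes for a waterfall plot is an assumption, as are all function names and the toy counts.

```python
import math


def cpm(counts):
    """Scale one sample's raw counts (gene -> count) to counts per million."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


def mean_profile(samples):
    """Average CPM per gene across replicate samples of one group."""
    genes = samples[0].keys()
    return {g: sum(s[g] for s in samples) / len(samples) for g in genes}


def waterfall_order(control, treated, pseudocount=1.0):
    """Return (gene, log2 ratio) pairs sorted from most gained to most lost.

    The log2 ratio with a pseudocount is an assumed difference metric.
    """
    ratios = {
        g: math.log2((treated[g] + pseudocount) / (control[g] + pseudocount))
        for g in control
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical TSS-window counts for three genes in one sample per group.
    control = mean_profile([cpm({"A": 100, "B": 300, "C": 600})])
    treated = mean_profile([cpm({"A": 400, "B": 300, "C": 300})])
    for gene, ratio in waterfall_order(control, treated):
        print(gene, round(ratio, 3))
```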

      Next, to evaluate the consistency among replicates within a group, we analyzed enrichment values, expressed as counts per million (CPM) using the edgeR R package, by calculating Spearman's correlation coefficients. We analyzed two sets of regions: the total of 7,136 H3K4me3 peaks, as described in Figure 3E in the revised manuscript, and the regions 2 kbp around transcription start sites (TSS) from the hg19 human genome. The resulting Spearman's correlation coefficients and associated P-values demonstrated concordance between replicates, confirming reproducibility and consistent performance (Author response image 10D).
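
As an illustration of the replicate comparison described above, Spearman's coefficient can be computed as the Pearson correlation of tie-averaged ranks. The sketch below uses only the Python standard library; the function names and CPM values are hypothetical, and the P-value calculation is omitted.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical CPM values for the same regions in two replicates.
    replicate_1 = [12.0, 3.5, 40.1, 7.7, 0.9]
    replicate_2 = [10.2, 4.1, 55.0, 6.0, 1.3]
    print(spearman(replicate_1, replicate_2))
```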

      Finally, the correlation between gene expression and H3K4me3 enrichment around transcription start sites (TSS) has been reported in previous research (Reshetnikov VV et al. Data of correlation analysis between the density of H3K4me3 in promoters of genes and gene expression: Data from RNA-seq and ChIP-seq analyses of the murine prefrontal cortex. Data Brief. 2020 Oct 2;33:106365). To verify this association in our study, we applied Spearman's correlation for comparative analysis and conducted linear regression to determine if a consistent global trend in RNA expression existed. In our analysis, count values from regions extending 2 kbp around the TSSs in H3K4me3 ChIP-seq data were converted to counts per million (CPM) using the edgeR R package. These were then compared with the transcripts per million (TPM) values of the corresponding genes. Our results revealed a significant positive correlation, reinforcing the consistent relationship between H3K4me3 enrichment and gene expression (Author response image 10E and Supplementary Fig. 6D in the revised manuscript).
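
The regression step can be illustrated with an ordinary least-squares fit of expression on promoter enrichment. This sketch is not the authors' code: the log2(x + 1) transform of CPM and TPM is an assumption made here to tame the wide dynamic range, and the function names and values are hypothetical.

```python
import math


def log2p1(values):
    """Apply log2(x + 1) to each value (assumed transform for this sketch)."""
    return [math.log2(v + 1.0) for v in values]


def least_squares(x, y):
    """Fit y = intercept + slope * x; return (slope, intercept, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical per-gene values: H3K4me3 CPM at the TSS and RNA-seq TPM.
    h3k4me3_cpm = [0.5, 2.0, 8.0, 15.0, 40.0]
    rna_tpm = [0.1, 1.5, 10.0, 22.0, 90.0]
    slope, intercept, r2 = least_squares(log2p1(h3k4me3_cpm), log2p1(rna_tpm))
    print(slope, intercept, r2)
```

A positive slope with a high coefficient of determination would correspond to the global trend described in the response.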

      Author response image 10.

      The information on quality of ChIP-seq data and correlation between ChIP-seq and RNA-seq. A, information on quality of ChIP-seq data. B, H3K4me3 peak of promoter region on TNFA and IL6. C, The differences in H3K4me3 enrichment patterns between control group and IS-training group. D, The consistency among replicates within a group. E, Correlation between ChIP-seq and RNA-seq in IS-induced trained immunity.

      (10) AhR changes in the cell nucleus should be provided (Fig. 4A).

      We appreciate the constructive feedback from the reviewer. In response to the reviewer's suggestions, we investigated the nuclear translocation of AhR 6 days after the induction of IS-mediated trained immunity, as illustrated in Author response image 11. For this purpose, the lysate from IS-trained monocytes was fractionated into the nucleus and cytosol, and AhR protein was subsequently immunoblotted. The results demonstrate that IS-trained monocytes exhibited a higher level of AhR protein in the nucleus compared to non-trained monocytes. Notably, the nuclear translocation of AhR was significantly attenuated in IS-trained monocytes treated with GNF351. These findings imply that the activation of AhR, facilitated by the binding of IS, persisted partially up to 6 days, indicating that IS-mediated degradation of AhR was not fully recovered even on day 6 after the induction of IS training. Consequently, we have replaced Figure 4A in the revised manuscript.

      Author response image 11.

      The activation of AhR, facilitated by IS binding, persists partially for up to 6 days during the induction of trained immunity. Lysates of IS-trained cells treated with or without GNF351 were separated into nuclear and cytosolic fractions, followed by WB analysis for AhR protein (Left panel). Band intensity in immunoblots was quantified by densitometry (Right panel). β-actin was used as a normalization control. Bar graphs show the mean ± SEM. * = p < 0.05, by two-tailed paired t-test.

      (11) Do other protein-bound uremic toxins (PBUTs), such as PCS, HA, IAA, and KA, change the mRNA expression of ALOX5, ALOX5AP, and LTB4R1? In the absence of genetic studies, it is difficult to be certain of the ALOX5-related mechanism claimed by the authors.

      We are grateful for the constructive criticism provided by the reviewer. In response to the reviewer's comment, we investigated whether uremic toxins, specifically PBUTs such as PCS, HA, IAA, and KA, induce changes in the mRNA expression of ALOX5, ALOX5AP, and LTB4R1 in trained monocytes. Intriguingly, the examination revealed no discernible induction in the mRNA expression of these genes by PBUTs, with the exception of IS, as depicted in Author response image 12 of the letter. These findings once again underscore the implication of the AhR-ALOX5 pathway in the induction of trained immunity in monocytes by IS.

      Author response image 12.

      No obvious impact of PBUTs except IS on the expression of arachidonic acid pathway-related genes 6 days after treatment with PBUTs. Purified monocytes were treated with several PBUTs including IS, PCS, HA, IAA, and KA for 24 hrs, followed by a 5-day resting period to induce trained immunity. The mRNA expression of ALOX5, ALOX5AP, and LTB4R1 was quantified using RT-qPCR. Bar graphs show the mean ± SEM. * = p < 0.05, by two-tailed paired t-test.

      (12) Fig.6 is based on the correlated expression of inflammatory genes or AA pathway genes. It does not clarify any mechanisms the authors claimed in the previous figures. 

      We express our sincere appreciation for the constructive criticism provided by the reviewer, and we have taken careful note of the points raised. In response to the reviewer's feedback, we adopted two distinct approaches utilizing samples obtained from ESRD patients and IS-trained mice. Initially, we investigated the correlation between ALOX5 protein expression in monocytes and IS concentration in the plasma of ESRD patients presented in Figure 6E of the original manuscript. Despite the limited number of samples, our analysis revealed a non-significant correlation between IS concentration and ALOX5 expression; however, it demonstrated a positive trend (Author response image 13A). Subsequently, we examined the potential inhibitory effects of zileuton, an ALOX5 inhibitor, on the production of TNF-α and IL-6 in LPS-stimulated splenic myeloid cells derived from IS-trained mice. Our findings indicate that zileuton significantly inhibits the production of TNF-α and IL-6 induced by LPS in splenic myeloid cells from IS-trained mice (Author response image 13B). These data were added to Figure 6N of the revised manuscript (Line 350-354 of page 16 in the revised manuscript).

      Author response image 13.

      Assessment of the correlation between ALOX5 and the concentration of IS in ESRD patients, and investigation of ALOX5 effects in mouse splenic myeloid cells in IS-trained mice. A. Examination of the correlation between ALOX5 protein expression in monocytes and IS concentration in the plasma of ESRD patients. B. C57BL/6 mice were administered daily injections of 200 mg/kg IS for 5 days, followed by a resting period of another 5 days. Subsequently, IS-trained mice were sacrificed, and spleens were mechanically dissociated. Isolated splenic myeloid cells were subjected to ex vivo treatment with LPS (10 ng/ml), along with zileuton (100 µM). The levels of TNF-α and IL-6 in the supernatants were quantified using ELISA. The graphs show the mean ± SEM. * = p < 0.05, by two-tailed paired t-test between zileuton treatment group and no-treatment group.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor corrections to the figures

      (1) No indicators for the control group in Fig. 1B.

      We thank the reviewer for the comment. In accordance with the reviewer’s comment, the control group is now indicated with (-).

      (2) The same paper is listed twice in the references section. (No. 19 and 28)

      We thank the reviewer for the comment. We deleted reference No. 28.

      Reviewer #2 (Public Review):

      Manuscript entitled "Uremic toxin indoxyl sulfate (IS) induces trained immunity via the AhR-dependent arachidonic acid pathway in ESRD" presented some interesting findings. The manuscript strengths included use of H3K4me3-CHIP-Seq, AhR antagonist, IS treated cell RNA-Seq, ALOX5 inhibitor, MTA inhibitor to determine the roles of IS-AhR in trained immunity related to ESRD inflammation and trained immunity.

      Thank you very much for your positive feedback.

      Reviewer #2 (Recommendations For The Authors):

      However, the manuscript needs to be improved by fixing the following concerns.

      There are concerns:

      (1) The experiments in Figs. 1G, 1H and 1I need to have AhR siRNA, and siRNA control to demonstrate that the results in uremic toxins-containing serum-treated experiments were related to IS;

      We extend our gratitude to the reviewer for their invaluable comment, acknowledging its significant relevance to our study. In accordance with the reviewer's suggestion, we endeavored to conduct additional experiments utilizing AhR siRNA to elucidate the direct impact of IS present in the serum of end-stage renal disease (ESRD) patients on the induction of IS-mediated trained immunity. 

      Regrettably, owing to limitations in the availability of monocytes post-siRNA transfection, we were unable to establish a direct relationship between the observed outcomes in experiments utilizing uremic toxins-containing serum and IS in AhR siRNA knockdown monocytes. However, treatment with GNF351, an AhR antagonist, resulted in the inhibition of TNF-α production in trained monocytes exposed to uremic toxins-containing serum (Author response image 14).

      In our previous studies, we have already reported that uremic serum-induced TNF-α production in human monocytes is dependent on the AhR pathway, using GNF351 (Kim HY et al. Indoxyl sulfate (IS)-mediated immune dysfunction provokes endothelial damage in patients with end-stage renal disease (ESRD). Sci Rep. 2017 Jun 8;7(1):3057). Additionally, we have provided evidence demonstrating an augmentation in the activity of the AhR pathway within monocytes derived from ESRD patients, indicative of a significant reduction in AhR protein levels (Kim HY et al. Indoxyl sulfate-induced TNF-α is regulated by crosstalk between the aryl hydrocarbon receptor, NF-κB, and SOCS2 in human macrophages. FASEB J. 2019 Oct;33(10):10844-10858). It is noteworthy that other major protein-bound uremic toxins (PBUTs), such as PCS, HA, IAA, and KA, failed to induce trained immunity in human monocytes (Supplementary Figure 1A in the revised manuscript). Nevertheless, knockdown of AhR via siRNA effectively impeded the induction of IS-mediated trained immunity in human monocytes (Figure 4E in the revised manuscript). 

      Taken collectively, our findings suggest a critical role for IS present in the serum of ESRD patients in the induction of trained immunity in human monocytes. 

      Author response image 14.

      Inhibition of uremic serum (US)-induced trained immunity by AhR antagonist, GNF351. Monocytes were pre-treated with or without GNF351 (AhR antagonist; 10 µM) for 1 hour, followed by treatment with pooled normal serum (NS) or uremic serum (US) at a concentration of 30% (v/v) for 24 hours. After a resting period of 5 days, cells were stimulated with LPS for 24 hours. The production of TNF-α and IL-6 in the supernatants was quantified using ELISA. The data presented are derived from three independent experiments utilizing samples from different donors.

      (2) Fig. 3 needs to be moved as Fig. 2

      We express appreciation for the constructive suggestion provided by the reviewer. In response to the reviewer's comment, the sequence of Figure 3 and Figure 2 was adjusted in the revised manuscript.

      (3, 4) The connection between bioenergetic metabolism pathways and H3K4me3 was missing; The connection between bioenergetic metabolism pathways and ALOX5 was missing;

      We appreciate the reviewer’s constructive criticism and fully understand the reviewer's points. In response to the reviewer's feedback, we conducted additional experiments employing appropriate inhibitors to elucidate the interrelation between bioenergetic metabolism and H3K4me3 and between bioenergetic metabolism and ALOX5. Initially, we assessed the enrichment of H3K4me3 on day 6 on the promoters of the TNFA and IL6 loci after treatment with 2-DG, a glycolysis inhibitor. Additionally, we evaluated the alteration in the activity of S6K, a downstream molecule of mTORC1, following treatment with zileuton, an inhibitor of ALOX5. Our findings indicate that AhR-dependent arachidonic acid (AA) signaling induces epigenetic modifications, albeit without inducing metabolic rewiring, in IS-induced trained immunity (Author response image 15). However, IS stimulation promotes mTORC1-mediated glycolysis in an AhR-independent manner. Notably, inhibition of glycolysis with 2-DG impacts epigenetic modifications. We have updated Figure 7 of the revised manuscript to incorporate these additional experimental findings, elucidating the correlation between the diverse mechanisms implicated in IS-induced innate immune memory (Fig. 7 in the revised manuscript).

      Author response image 15.

      Epigenetic modification is regulated by the arachidonic acid (AA) pathway and metabolic rewiring, but metabolic rewiring is not affected by the AA pathway. A-B. Monocytes were pre-treated with zileuton (ZLT), an inhibitor of ALOX5, or 2-DG, a glycolysis inhibitor, followed by stimulation with IS for 24 hours. After a resting period of 5 days, the enrichment of H3K4me3 on the promoters of the TNFA and IL6 loci was assessed. Normalization was performed using 2% input. C. Monocytes were pre-treated with zileuton (ZLT) and stimulated with IS for 24 hr. Cell lysates were immunoblotted for phosphorylated S6 Kinase, with β-actin serving as a normalization control. Band intensities in the immunoblots were quantified using densitometry. D. A schematic representation of the mechanistic framework underlying IS-trained immunity. Bar graphs show the mean ± SEM. * = p < 0.05, ** = p < 0.01, and *** = p < 0.001 by two-tailed paired t-test.

      (5) It was unclear whether histone acetylations such as H3K27acetylation and H3K14 acetylation are involved in IS-induced epigenetic reprogramming or IS-induced trained immunity is highly histone methylation-specific.

      We appreciate the constructive comment provided by the reviewer. As highlighted by the reviewer, alterations in epigenetic histone markers, specifically H3K4me3 or H3K27ac, have been recognized as the underlying molecular mechanism in trained immunity. Due to limitations in the availability of trained cells, this study primarily focused on histone methylation. In response to the reviewer's inquiry, we briefly investigated the impact of histone acetylation using C646, a histone acetyltransferase inhibitor, on IS-induced trained immunity (Author response image 16). Our experiments revealed that C646 treatment effectively hinders the production of TNF-α and IL-6 by IS-trained monocytes in response to LPS stimulation, comparable to the effects observed with MTA (5’-methylthioadenosine), a non-selective methyltransferase inhibitor. This suggests that histone acetylation also contributes to the epigenetic modifications associated with IS-induced trained immunity. We sincerely appreciate the valuable input from the reviewer.

      Author response image 16.

      The role of histone acetylation in epigenetic modifications in IS-induced trained immunity. Monocytes were pretreated with MTA (methylthioadenosine, methyltransferase inhibitor) or C646 (histone acetyltransferase p300 inhibitor), followed by treatment with 1 mM IS for 24 hrs. After resting for 5 days, trained cells were re-stimulated with 10 ng/ml LPS as a secondary insult. TNF-α and IL-6 in supernatants were quantified by ELISA. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01 by two-tailed paired t-test.

      Reviewer #3 (Public Review):

      The manuscript entitled, "Uremic toxin indoxyl sulfate induces trained immunity via the AhR-dependent arachidonic acid pathway in ESRD" demonstrates that indoxyl sulfate (IS) induces trained immunity in monocytes via epigenetic and metabolic reprogramming, resulting in augmented cytokine production. The authors conducted well-designed experiments to show that the aryl hydrocarbon receptor (AhR) contributes to IS-trained immunity by enhancing the expression of arachidonic acid (AA) metabolism-related genes such as arachidonate 5-lipoxygenase (ALOX5) and ALOX5 activating protein (ALOX5AP). Overall, this is a very interesting study that highlights that IS-mediated trained immunity may have deleterious outcomes in augmented immune responses to the secondary insult in ESRD. Key findings would help to understand accelerated inflammation in CKD or ESRD.

      We greatly appreciate your positive feedback.

      Reviewer #3 (Recommendations for The Authors):

      This reviewer, however, has the following concerns.

      Major comments:

      (1) Figure 1B: IS is known to induce the expression of TNF-a and IL-6. This reviewer wonders why these molecules were not detected in the IS (+) LPS (-) condition.

      We appreciate the constructive comment provided by the reviewer. In our prior investigation, it was observed that the expression of TNF-α and IL-6 was induced 24 hours after IS treatment in human monocytes and macrophages (Couper KN et al. IL-10: the master regulator of immunity to infection. J Immunol. 2008 May 1;180(9):5771-7). In adherence to the trained immunity protocol, the medium was replaced at 24 hours post-IS treatment to eliminate IS, with a subsequent change after a 5-day resting period. TNF-α and IL-6 would probably have accumulated and been detected in the IS (+) LPS (-) culture supernatant had the medium not been changed at these specific time points. Our primary objective, however, was to ascertain the role of IS in the induction of trained immunity, prompting an investigation into whether IS contributes to an increase in the production of TNF-α and IL-6 in response to LPS stimulation as a secondary insult.

      (2) 1' stimulus is IS followed by 2' stimulus LPS/Pam3. It would be interesting to know what the immune profile is when other uremic toxin is used for secondary insult, this would be more relevant in clinical context of ESRD.

      The reviewer's insightful comment is greatly appreciated. To address this feedback, IS-trained macrophages were subjected to additional stimulation using protein-bound uremic toxins (PBUTs) as a secondary challenge. As illustrated in Author response image 17, the examined uremic toxins, namely p-cresyl sulfate (PCS), hippuric acid (HA), indole 3-acetic acid (IAA), and kynurenic acid (KA), failed to elicit the production of proinflammatory cytokines, specifically TNF-α and IL-6, by IS-trained monocytes.

      Author response image 17.

      No obvious effect of protein-bound uremic toxins (PBUTs) as secondary insults on the production of proinflammatory cytokines in IS-trained monocytes. IS-trained monocytes were re-stimulated with several PBUTs, such as IS (1 mM), PCS (1 mM), HA (2 mM), IAA (0.5 mM), and KA (0.5 mM) as a secondary challenge for 24 hrs. TNF-α and IL-6 in supernatants were quantified by ELISA. The data from two independent experiments with different donors are shown. ND indicates ‘not detected’.

      (3) The authors need to explain a rationale why RNA and protein data used different markers.

      We appreciate the constructive input provided by the reviewer. Given that TNF-α and IL-6 represent prototypical cytokines synthesized by trained monocytes in humans, we conducted a comprehensive analysis of their mRNA and protein levels. In human macrophages, the release of active IL-1β necessitates a second priming event, such as the presence of ATP. Consequently, we posited that assessing the mRNA levels of IL-1β would suffice to demonstrate the induction of trained immunity in our experimental protocol. Nevertheless, in response to the reviewer's comment, we proceeded to assess the protein levels of IL-1β, IL-10, and MCP-1 as illustrated in Author response image 18. These data have been incorporated into the revised manuscript as Supplementary Figure 1E.

      Author response image 18.

      Modulation of cytokine levels in IS-trained macrophages in response to secondary stimulation with LPS. Human monocytes were stimulated with IS for 24 hr, followed by a resting period of 5 days. On day 6, the cells were re-stimulated with LPS for 24 hr. The levels of each cytokine in the supernatants were quantified using ELISA. Bar graphs show the mean ± SEM. ** = p < 0.01 and *** = p < 0.001 by two-tailed paired t-test.

      (4) Epigenetic modification primarily involves histone modification and DNA methylation. The authors presented convincing data on histone modification (Figure 2), but did not provide any insights in the promoter DNA methylation status.

      We express our gratitude to the reviewer for providing valuable comments, which highlight a crucial aspect of our study. Despite the well-established primary role of DNA methylation in epigenetic modifications, recent suggestions propose that histone modifications, particularly H3K4me3 or H3K27ac, play a predominant role in the induction of trained immunity. In this context, our primary inquiry was focused on determining whether IS, as an endogenous insult, induces trained immunity in monocytes, and if so, whether IS-trained immunity is mediated through metabolic and epigenetic modifications - recognized as the major mechanisms underlying the generation of trained immunity. It is imperative to note that our study's primary objective did not encompass the identification of various epigenetic changes. In response to the reviewer's inquiry, we conducted a brief examination of the impact of DNA methylation using ZdCyd (5-aza-2’-deoxycytidine), a DNA methylation inhibitor, on IS-induced trained immunity. Our experimental findings indicate that ZdCyd treatment exerts no discernible effect on the production of TNF-α and IL-6 by IS-trained monocytes upon stimulation with LPS, as illustrated in Author response image 19. However, a recent study has shed light on the role of DNA methylation in BCG vaccine-induced trained immunity in human monocytes (Bannister S et al. Neonatal BCG vaccination is associated with a long-term DNA methylation signature in circulating monocytes. Sci Adv. 2022 Aug 5;8(31):eabn4002). Consequently, further investigations utilizing DNA methylation sequencing are warranted to elucidate whether DNA methylation is implicated in the induction of IS-trained immunity.

      Author response image 19.

      The effect of DNA methylation on IS-induced trained immunity. Monocytes were pretreated with ZdCyd (5-aza-2’-deoxycytidine, DNA methylation inhibitor), followed by treatment with IS 1 mM for 24 hrs. After resting for 5 days, cells were re-stimulated by LPS 10 ng/ml as secondary insult. TNF-α and IL-6 in supernatants were quantified by ELISA. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01 by two-tailed paired t-test.

                     

      (5) Trained immune cells undergo metabolic rewiring, which involves intertwined pathways of glucose and cholesterol metabolism. The authors presented nice data on the glucose pathway (Figure 3) but failed to show any changes related to cholesterol metabolism.

      We express our gratitude to the reviewer for providing valuable comments, which underscore a noteworthy observation. In the current investigation, our primary emphasis has been on glycolytic reprogramming, recognized as a principal mechanism for inducing trained immunity in monocytes. This focus stems from preliminary experiments wherein Fluvastatin, a cholesterol synthesis inhibitor, demonstrated no discernible impact on TNF-α production by IS-trained monocytes, as illustrated in Author response image 20. Intriguingly, Fluvastatin treatment exhibited a partial inhibitory effect on the production of IL-6 by IS-trained monocytes. Subsequent investigations are imperative to elucidate the role of cholesterol metabolism in the induction of IS-trained immunity.

      Author response image 20.

      The effect of cholesterol metabolism on IS-induced trained immunity. Monocytes were pretreated with Fluvastatin (cholesterol synthesis inhibitor, HMG-CoA reductase inhibitor), followed by treatment with IS 1 mM for 24 hrs. After resting for 5 days, cells were re-stimulated by LPS 10 ng/ml as secondary insult. TNF-α and IL-6 in supernatants were quantified by ELISA. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01 by two-tailed paired t-test.

      (6) Trained immunity involves neutrophils in addition to monocyte/macrophages. It is evident from the RNAseq data that neutrophil degranulation (Figure 5B) is the top enriched pathway. This reviewer wonders why the authors did not perform any assays on neutrophils.

      We thank the reviewer for this valuable comment. IS represents a major uremic toxin that accumulates in the serum of patients with chronic kidney disease (CKD), correlating with CKD progression and the onset of CKD-related complications, including cardiovascular diseases (CVD). Our prior investigations have demonstrated that IS promotes the production of TNF-α and IL-1β by human monocytes and macrophages. Additionally, macrophages pre-treated with IS exhibit a significant augmentation in TNF-α production when exposed to a low dose of lipopolysaccharide (LPS). Considering the pivotal role of proinflammatory macrophages and TNF-α, a principal cardiotoxic cytokine, in CVD pathogenesis, this study has primarily focused on elucidating the trained immunity of monocytes/macrophages. Consequently, all experiments were meticulously conducted using highly purified monocytes and monocyte-derived macrophages from both healthy controls and end-stage renal disease (ESRD) patients. The reviewer's observation regarding the potential involvement of neutrophils in trained immunity has been duly noted. Subsequent investigations will be imperative to explore the conceivable role of IS-trained neutrophils in the pathogenesis of CVD. Once again, we thank the reviewer for their valuable comment.

      (7) Figure 5C (GSEA plots): This reviewer is not sure if one can present the plots assigned with groups (eg. IS(T) vs Control). More details are required in the Methods related to this.

      We apologize for any ambiguity resulting from the previously unclear description of methods concerning Gene Set Enrichment Analysis (GSEA) plots. To provide clarification, additional details pertaining to this aspect have been added to the Methods section of the revised manuscript.

      (8) In vivo data (Figure 6 I-M): Instead of serum profile and whole set of spleen myeloid cells, it would be interesting to see changes of markers on peritoneal macrophages or bone marrow-derived macrophages since the in vitro findings are on monocyte-derived macrophages.

      We appreciate the reviewer's comment and insightful suggestion. In response to the reviewer's feedback, we conducted additional in vivo experiments to examine the production of TNF-α and IL-6 in bone marrow-derived macrophages (BMDMs) derived from IS-trained mice. Upon LPS stimulation, we observed an increase in the production of TNF-α and IL-6 in spleen myeloid cells from IS-trained mice. However, no such increase in these cytokines was noted in BMDMs derived from the same mice (Author response image 21, A and B). In fact, we had already observed that the expression of ALOX5 was not elevated in BM cells derived from IS-trained mice, as presented in Figure 6L and M of the original manuscript (Author response image 21C).

      Recent studies have indicated that trained immunity can be induced in circulating immune cells, such as monocytes or resident macrophages (peripheral trained immunity), as well as in hematopoietic stem and progenitor cells (HSPCs) within the bone marrow (central trained immunity) (Kaufmann E et al. BCG Educates Hematopoietic Stem Cells to Generate Protective Innate Immunity against Tuberculosis. Cell. 2018 Jan 11;172(1-2):176-190.e19; Riksen NP et al. Trained immunity in atherosclerotic cardiovascular disease. Nat Rev Cardiol. 2023 Dec;20(12):799-811). It is plausible that central trained immunity in BM progenitor cells may not be elicited in our mouse model, which is relatively acute in nature. Further investigations are warranted to explore the role of IS in inducing central trained immunity, utilizing appropriate chronic disease models.

      We have included this additional data as supplementary figures in the revised manuscript (Suppl. Fig. 7, D and E, and line 355-362 of page 16 in the revised manuscript).

      Author response image 21.

      Absence of trained immunity in bone marrow-derived macrophages (BMDMs) derived from IS-trained mice. A-B, IS was intraperitoneally injected daily for 5 days, followed by training for another 5 days. Isolated BM progenitor cells and spleen myeloid cells were differentiated or treated with LPS for 24 hr. The supernatants were collected for ELISA. C, The level of ALOX5 protein in BM cells isolated from IS-trained or control mice was analyzed by western blot. The graph illustrates the band intensity quantified by densitometry. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01, by unpaired t-test.

      (9) Figure 7: There are no data on signaling pathway(s) that links IS and epigenetic changes, the authors therefore may want to add "?" to the proposed mechanism.

      We extend our sincere appreciation to the reviewer for providing valuable feedback. In light of the constructive comments provided by three reviewers, we have undertaken a series of additional experiments. These efforts have enabled us to propose a more elucidating schematic representation of the proposed mechanism, free of any ambiguous elements (Figure 7 in the revised manuscript). We are grateful for your insightful input.

      (10) Demographic data (Table S2): ESRD patients have co-morbidities including diabetes (33% of subjects), CAD (28%). How did the authors factor out the co-morbidities in the overall context of their findings?

      We express gratitude to the reviewer for providing valuable comments, particularly on a noteworthy and significant aspect. The investigation employed an End-Stage Renal Disease (ESRD) Cohort involving approximately 60 subjects undergoing maintenance hemodialysis at Severance Hospital in Seoul, Korea. The subset of participants subjected to analysis consisted of stable individuals who provided informed consent and had not undergone hospitalization for reasons related to infection or acute events within the preceding three months.

      (11) There are no data on the purity of IS.

      According to the reviewer's suggestion, we have included information regarding the purity (99%) of IS in the Methods section.

      (12) Figure 6L: Immunoblots on β-actin were merged. This reviewer wonders how the authors analyzed these blots.

      We express gratitude for the constructive criticism provided by the reviewer, and we acknowledge and comprehend the concerns raised. In response to the reviewer's comments, a reanalysis of the ALOX5 expression level in Figure 6M was conducted, employing immunoblot analysis on β-actin, as depicted in Figure 6L, with a short exposure time (Author response image 22).

      Author response image 22.

      ALOX5 protein exhibited an elevation in splenic myeloid cells obtained from IS-trained mice.

      (13) qPCR data throughout the manuscript have control group with no error bar. The authors may not set all controls arbitrarily equal to 1 (Example Figure 1H and I). Data should be normalized in a test standard way. The average of a single datapoint may be scaled to 1, but variation must remain within the control groups.

      We express gratitude to the reviewer for their valuable feedback, acknowledging a comprehensive understanding of their perspectives. Our qPCR assays predominantly investigated the impact of various treatments on the expression of specific target genes (e.g., TNF-α, IL-6, Alox5) within monocytes/macrophages obtained from the same donors.

      Subsequently, normalization of gene expression levels occurred relative to ACTINB expression, followed by relative fold-increase determination using the comparative CT method (ΔΔCT).

      Statistical significance was assessed through a two-tailed paired analysis in these instances. Additionally, a substantial portion of the qPCR data was validated at the protein level through ELISA and immunoblotting techniques.
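
      The comparative CT calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name and all Ct values are hypothetical. It also shows why a control normalized within each donor equals 1 by construction and therefore carries no error bar.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression by the comparative CT (2^-ddCt) method."""
    # Step 1: normalize the target gene to the reference gene in each sample.
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    # Step 2: express the treated sample relative to the matched control.
    dd_ct = d_ct_treated - d_ct_control
    # Step 3: convert cycles to fold change, assuming ~100% PCR efficiency.
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values for three donors:
    # (target treated, reference treated, target control, reference control)
    donors = [
        (24.0, 18.0, 26.0, 18.0),
        (23.5, 17.5, 25.0, 17.6),
        (25.1, 18.2, 26.4, 18.1),
    ]
    for i, cts in enumerate(donors, start=1):
        print(f"donor {i}: fold change = {fold_change_ddct(*cts):.2f}")
    # Each donor's control relative to itself is 1 by construction.
    print(fold_change_ddct(26.0, 18.0, 26.0, 18.0))
```

      With this pairing, between-donor variation appears in the treated fold changes rather than in the control, which is consistent with the use of a paired test.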

      Minor Comments:

      (1) Molecular weight markers are missing in immunoblots throughout the manuscript.

      According to the reviewer's comment, molecular weight markers have been added to the immunoblots.

      (2) ESRD should be spelled out in the title.

      According to the reviewer's comment, we spelled out ESRD in the title.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      The experiment is interesting and well executed and describes in high detail fish behaviour in thermally stratified waters. The evidence is strong but the experimental design cannot distinguish between temperature and vertical position of the treatments.

      Strengths:

      High statistical power, solid quantification of behaviour.

      Weaknesses:

      A major issue with the experimental design is the vertical component of the experiment. Many thermal preference and avoidance experiments are run using horizontal division in shuttlebox systems or in annular choice flumes. These remove the vertical stratification component so that hot and cold can be compared equally, without the vertical layering as a confounding factor. The method chosen, with its vertical stratification, is inherently unable to control for this effect because warm water is always above, and cold water is always below. This complicates the interpretations and makes firm conclusions about thermal behaviour difficult.

      We highly appreciate this evaluation and have addressed the reviewer’s specific comments below.

      The sentence "Further, the metabolic performance (and thus functions including growth, reproduction, and locomotion) of ectotherms takes the form of a bell-shaped curve as a function of temperature6, peaking within a range of optimal temperatures (the 'preferendum') and going to zero at lower and upper temperature limits7." contains several over-simplifications and misconceptions:

      (1) Thermal performance curves are never bell-shaped.

      (2) The optimum for various traits often shows different TPCs.

      (3) The preferendum rarely lines up with the thermal optimum for various trait TPCs.

      (4) Performance for various traits rarely reaches zero at upper or lower limits, instead they can reach zero at less extreme temperatures (e.g. growth) or maintain high function all the way up to and sometimes beyond thermal limits (e.g. aerobic scope, heart rate).

      We highly appreciate this input. We have replaced that sentence with: L69-71: “Because temperature influences the rates of most physiological processes, rapid warming or cooling can affect fish performance traits, including metabolic rates, swimming ability, and thermal tolerance (Jutfelt et al. 2024).”

      The use of adaptation instead of acclimation is confusing. Adaptation should be reserved for evolutionary change. This is an issue in several parts of the manuscript.

      Thanks for this input, we have replaced the word adapt with acclimate in two instances: L79 and L398.

      It is not true that "very few quantitative studies of thermotaxis have been conducted in fish". There exists an extensive literature on thermal preference and avoidance in fish that the manuscript downplays.

      Thanks a lot for this input. We understand that thermal preference is ultimately driven by mechanistic responses to thermal gradients, and that thermotaxis and thermokinesis are the two mechanisms used by fish to navigate heterothermal environments. Our study and analysis are focused on understanding these mechanisms in vertically stratified conditions, not to understand thermal preferences per se. We have modified our text to clarify this aspect. Our literature review was focused on the behavioral mechanisms and our understanding is that the establishment of thermal preferences has a different goal compared to understanding how fish respond to rapid changes in water temperature. We have deleted that sentence and replaced it by (L107-110): “While the thermal preference of fish is a well-established field of research, very few quantitative studies of the behavioral mechanisms allowing fish to seek their preferendum (i.e. thermotaxis) have been conducted in fish.”

      (Methods) It is unclear why the blue dye was used in all experiments. The fish can see the differently coloured water layer and that may have affected their choices. Five control trials without dye were run but finding no difference there could also be due to low statistical power.

      We appreciate this comment. The blue dye was used to visualize the precise location of the thermal interface and was therefore necessary in all experiments (see Methods section ‘Visualization and evolution of the thermal interface’). We acknowledge that fish can perceive the colored water layer, but since the dye concentration and resulting color intensity were consistent across all treatments, we do not see how it could have acted as a confounding variable. While we recognize the possibility of some behavioral influence from the dye, the clear behavioral differences across treatments indicate that it was not a determining factor. To emphasize this we have added the following to the manuscript (L701-703): “Furthermore, because the dye concentration and resulting color intensity were consistent across all treatments, the dye did not act as a confounding variable in our statistical comparisons.”

      Regarding statistical power, our control experiment without dye (N = 16 fish, 4 replicates; see Fig. S34 and S35) provides sufficient statistical power to assess whether the dye influenced behavior. The reviewer indicated that the high statistical power was a strength of the paper, which aligns with our view that our study design enables robust statistical comparisons. It seems contradictory that statistical power is a concern for the control trials, given that our main experiments were conducted with a similar sample size. Indeed, the number of replicates used is consistent with similar studies and balances statistical rigor with the ethical goal of reducing the number of animals used in experimentation. To emphasize this, we have added the following to the manuscript (L865-868): “The number of replicates used in this study reflects a balance between statistical rigor and the ethical imperative to minimize the use of animals in experimentation. Regarding statistical power, our design (five replicates with groups of four fish each) is consistent with similar studies and represents an adequate sample size.”

      A major issue with the experimental design is the vertical component of the experiment. Many thermal preference and avoidance experiments are run using horizontal division in shuttlebox systems or in annular choice flumes. These remove the vertical stratification component so that hot and cold can be compared equally, without the vertical layering as a confounding factor. The method chosen, with its vertical stratification, is inherently unable to control for this effect because warm water is always above, and cold water is always below. This complicates the interpretations and makes firm conclusions about thermal behaviour difficult. This issue should be thoroughly discussed.

      Thank you very much for this comment. We revised the manuscript accordingly, to clearly indicate that our goal was to assess the response of fish to vertically thermally stratified water, a scenario that occurs frequently in nature. We have added the following paragraph to the discussion (L523-530): “However, a generalization of our observations to horizontally oriented thermal gradients remains elusive. Our results are inherently tied to the vertical stratification created in our experiments. As warm water was always positioned above and cold water below, we could not control for the effect of vertical position (i.e., we could not do cold over warm layer experiments). This limits our ability to directly compare our findings to those obtained from horizontally oriented thermal gradients. On the other hand, the case we addressed is of direct environmental relevance, as natural waters often experience vertical thermal stratification.”

      It is unclear why the authors assume an "optimal temperature" (undefined for which trait) of 12°C for brown trout parr, and why they assume the preference temperature would match that "optimal" temperature. The thermal biology for any fish species is more complex than a single perfect temperature, with various traits showing differing optima and often a mismatch with the preferred temperature. The literature suggests brown trout growth optima between 13 and 16°C, and preference temperature has even been suggested to be as high as 21°C. In light of this, the authors' conclusion that brown trout avoid cold and don't avoid warm water is possibly misguided. It is possible that the brown trout had a preference temperature higher than 12°C, which should be acknowledged and discussed.

      This is indeed a very important aspect, which was partly (but indeed not fully) already addressed in the discussion. To reflect these considerations, we have expanded the existing paragraph in the discussion (additions are in yellow). (L422 - L439): “We conclude from the behavior of fish when warmer water was available that their acute thermal preferendum exceeded 12 °C, departing from the acclimation temperature we had chosen based on the thermal preferendum for trout reported in literature[33]. Indeed, the thermal biology for any fish species is more complex than a single, static thermal preferendum: Many internal and external factors, such as hypoxia, satiation, time of day, and life stage[5], can influence the temperature preference of fish. For example, the level of satiation can have an impact because when fish are well fed, their growth rate increases with body temperature as metabolic performance increases[40]. This modifies the preferred temperature, as observed in Bear Lake sculpin (Cottus extensus) that ascend into warmer water after feeding to stimulate digestion and thereby achieve a three-fold higher growth rate[41]. In contrast, field studies with adult fish have observed movement from warm to cold water in summer[42,43], allowing fish to lower their metabolic rate, likely in effort to conserve energy[2,44]. We propose that the behavior of trout parr upon exposure to warmer water in our experiments served to achieve a higher body temperature to ultimately increase growth rate, which is critical for this life stage[45,46]. Indeed, growth experiments on brown trout populations have shown that optimal growth temperatures can range between 15 and 19 °C, depending on the stream of origin[46].”

      The figures are unnecessarily complex and introduce a long list of abbreviations and Greek characters for no apparent reason. There are many simpler ways for showing the results so unclear why they are so opaque.

      We appreciate the reviewer’s feedback and agree on the importance of clarity. However, in the absence of specific suggestions, we did not make changes to the figures or to the use of Greek characters (which align with convention), as we believe they effectively convey the results. We highlight that the data themselves are very rich (multiple fish, multiple phases, multiple treatments, etc.) and we wanted to convey this richness in a compact and transparent manner.

      Reviewer #2:

      This paper investigates an interesting question: how do fish react to and avoid thermal disturbances from the optimum that occur on fast timescales? Previous work has identified potential strategies for warm avoidance in fish on short timescales while strategies for cold avoidance are far more elusive. The work combines a clever experimental paradigm with careful analysis to show that trout parr avoid cold water by limiting excursions across a warm-cold thermal interface. While I found the paper interesting and convincing overall, there are a few omissions and choices in the presentation that limit interpretability and clarity.

      A main question concerns the thermal interface itself. The authors track this interface using a blue dye that is mixed in with either colder or warmer water before a gate is opened that leads to gravitational flow overlaying the two water temperatures. The dye likely allows to identify convective currents which could lead to rapid mixing of water temperatures. However, it is less clear whether it accurately reflects thermal diffusion. This is problematic as the authors identify upward turning behavior around the interface which appears to be the behavioral strategy for avoiding cold water but not warm water. Without knowing the extent of the gradient across the interface, it is hard to know what the fish are sensing. The authors appear to treat the interface as essentially static, leading them to the conclusion that turning away before the interface is reached is likely related to associative learning. However, thermal diffusion could very likely create a gradient across centimeters which is used as a cue by the fish to initiate the turn. In an ideal world, the authors would use a thermal camera to track the relationship between temperature and the dye interface. Absent that, the simulation that is mentioned in passing in the methods section should be discussed in detail in the main text, and results should be displayed in Figure 1. Error metrics on the parameters used in the simulation could then be used to identify turns in subsequent figures that likely are or aren't affected by a gradient formed across the interface.

      The authors assume that the thermal interface triggers the upward-turning behavior. However, an alternative explanation, which should be discussed, is that cold water increases the tendency for upward turns. This could be an adaptive strategy since, for temperatures > 4 °C, turning to swim upwards is likely a good strategy to reach warmer water.

      The paper currently also suffers from a lack of clarity which is largely created by figure organization. Four main and 38 supplemental figures are very unusual. I give some specific recommendations below but the authors should decide which data is truly supplemental, versus supporting important points made in the paper itself. There also appear to be supplemental figures that are never referenced in the text which makes traversing the supplements unnecessarily tedious.

      The N that was used as the basis for statistical tests and plots should be identified in the figures to improve interpretability. To improve rigor, the experimental procedures should be expanded.

      Specifically, the paper uses two thermal models which are not detailed at all in the methods section.

      We appreciate these crucial comments to our paper. We have addressed these points in detail below.

      As stated above, a characterization of the thermal interface is critical. Ideally via measurement or at least by expanding on the simulation.

      We appreciate the idea of using thermal cameras and, indeed, we had initially tried to use them. However, thermal cameras generally cannot see through plexiglass or glass-like material due to the way infrared radiation interacts with these materials. While thin plastics can transmit some infrared, thicker plastics and reflective materials like glass tend to block or reflect infrared light.

      We have attempted to better characterize the thermal interface thickness, namely the spatial extent of the thermal gradient over the time period of our experiments (20 min). Indeed, our simulations in the original SI were conducted precisely to estimate the thermal interface thickness, though based on thermal diffusion in still water, while turbulence generated by the moving gravity current can smear out the interface, particularly in the initial phase. To account for this in the revised manuscript, we adopted a phenomenological approach to estimate the initial increase in thickness of the thermal interface due to turbulence and present this refined simulation in our manuscript.

      Our analysis suggests that, rather than assuming an initial interface thickness of zero (as in the original version of the manuscript), the thermal diffusion simulations should begin with an initial thickness of 2.8 mm in TR1. To incorporate this adjustment, we set the initial interface thickness to 2.8 mm and ran the simulation forward for t = 20 min, assuming diffusion. This approach resulted in a final interface thickness ranging between 4 and 6 cm (see Fig. S29 in the Supplementary Information).
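
      As a rough plausibility check of the figures quoted above, the growth of a diffusing thermal interface can be sketched with the textbook one-dimensional error-function solution. This is not the simulation used in the manuscript. The thermal diffusivity value, the error-function profile and the definition of "thickness" as the layer containing a chosen central fraction of the temperature jump are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

KAPPA_WATER = 1.4e-7  # m^2/s, assumed textbook thermal diffusivity of water


def erfinv(y: float) -> float:
    """Inverse error function, obtained from the standard normal quantile."""
    return NormalDist().inv_cdf((1.0 + y) / 2.0) / math.sqrt(2.0)


def interface_thickness(t_s, delta0_m, frac=0.9, kappa=KAPPA_WATER):
    """Thickness (m) of a purely diffusive interface after t_s seconds.

    Assumes T(z, t) = T_mid + (dT / 2) * erf(z / (2 * sqrt(kappa * (t + t0)))),
    with t0 chosen so that the thickness at t = 0 equals delta0_m. The
    thickness is the layer holding the central `frac` of the jump dT, which
    gives delta(t)^2 = delta0^2 + (4 * c)^2 * kappa * t with erf(c) = frac.
    """
    c = erfinv(frac)
    return math.sqrt(delta0_m ** 2 + (4.0 * c) ** 2 * kappa * t_s)


if __name__ == "__main__":
    delta0 = 2.8e-3   # initial thickness from the response (2.8 mm)
    t_end = 20 * 60   # 20 min in seconds
    for frac in (0.8, 0.9):  # two assumed thickness definitions
        d = interface_thickness(t_end, delta0, frac=frac)
        print(f"central {frac:.0%} of the jump: {100 * d:.1f} cm")
```

      Under these assumptions the thickness after 20 min comes out at a few centimeters and depends noticeably on how the edge of the gradient is defined, which may be one reason a range rather than a single value is reported.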

      To reflect this refinement, we have added a new paragraph (L717-758: "Characterization of the thermal gradient") to the Methods section. Additionally, we have updated Fig. S29 in the Supplementary Information and included an average (over time and across treatments) gradient thickness of 5 cm in Figs. 2 and 3 of the manuscript. The revised Figs. 2 and 3 now explicitly indicate the estimated vertical extent of the thermal gradient, with an extended caption detailing these changes.

      The simulation should be detailed in the methods so that its validity can be evaluated and ideally, it should involve curved interfaces as encountered in the experiment.

      To account for the effect of turbulence during the initial, inertia-dominated phase after the gate removal, we have provided a correction for the initial thickness of the interface (see the addition to the Methods section). Thank you for your suggestion regarding the incorporation of curved interfaces in the simulations. We believe that including curved interfaces in the simulations would not significantly affect the results. As shown in the manuscript, the interface is curved primarily during the initial phase of the process (first 2 min where the flow is inertia-dominated), which is currently not included in our data analysis (phase 1 begins 2 min after the gate removal).

      In that vein, distances from the interface rather than height above the interface should be reported for the fish.

      We acknowledge the reviewer’s suggestion to report distances from the interface rather than height above or below it. However, beyond the initial phase, we do not see a strong justification for using the orthogonal distance over the vertical distance, as the choice is inherently arbitrary (e.g., one could also measure the distance along the fish’s orientation vector). We have therefore kept our assessment based on the vertical distance.

      Absent measurements, the paragraph on associative learning should be struck from the discussion as it is purely speculative.

      We agree that the original paragraph on associative learning may have sounded overly speculative. However, after updating our manuscript with additional simulations of the thermal gradient's vertical extent, we found that fish perform upward turns not only above the thermal interface, but also before entering the thermal gradient itself. This observation makes us hesitant to attribute the response solely to thermotaxis. We believe it is essential to provide a plausible, albeit speculative, explanation for how fish initiate these turns before directly encountering the cold-water gradient. To support this, we have extended the discussion in this paragraph and added Supplementary Fig. 39. The new text now reads (additions in yellow) (L487-499): “Our findings show that fish were able to perform upward turns while still located above the thermal interface, that is, before actually sampling the cold water below the interface. In fact, our simulation of the vertical extent of the thermal gradient revealed that a substantial fraction of upward turns occurred before fish encountered the gradient itself, that is, prior to any sensory detection of the temperature change (Supplementary Fig. 39). This finding may be evidence of associative learning, whereby fish used information regarding the presence of colder water at depth obtained at prior times. While the current data do not provide conclusive evidence in this regard, they prompt the possibility that, rather than responding solely to immediate thermal cues, fish use spatial memory or associative learning to anticipate the location of colder water based on prior experience. Indeed, fish are able to perform associative learning based on non-visual cues[53], create mental maps of their surroundings[54] and retain memory for hours[55], days[56] and months[57,58].”

      The body-temperature simulations need to be detailed in the methods.

      Thanks for this comment. We have removed the supplementary text section and have included the paragraph “Body cooling during cold-water excursions” into the methods section of our manuscript (L804 - L829).

      Constant temperature experiments could be helpful in addressing the importance of a gradient/interface for triggering upward turning

      We agree; however, we were limited (for ethical reasons) to a maximum number of fish we could use in the experiments. Hence, we focused on getting approval to run experiments focused on the responses to thermal gradients. However, occupancy during the acclimation phase at 12 °C showed that fish were much more stationary and primarily occupied the lower half of the tank.

      A lot of ease of reading could be gained by labeling the conditions according to either the second temperature or perhaps even better the delta temperature (i.e. TR[-2C] instead of TR1).

      We agree that labeling conditions by the second temperature or delta temperature could in principle improve readability. However, since T_bottom and T_top are explicitly mentioned in each main figure at least once, they can be directly associated with the respective treatment. Therefore, we have opted to retain the current labeling for consistency.

      The figure legends are often short and do not accurately label all figure elements. This is especially true for supplemental figure legends which often appear rushed (e.g., the legend for Figure S2 stops mid-sentence, the legend of Figure S3 does not indicate what Ttop or Tbottom are).

      We appreciate the reviewer’s comment and have carefully revised all figure legends to ensure clarity and completeness. Specifically, we have corrected figure labels, expanded the descriptions for supplemental figures, and ensured that all elements are accurately defined. For instance, we have completed the legend for Figure S2 and clarified the definitions of T_top and T_bottom in Figure S3. Additionally, we have systematically reviewed all figure legends to prevent inconsistencies and omissions.

      For Figure S3, to improve clarity, plotting the standard deviation at different points in the tank across the phases could be more informative than the hard-to-distinguish multi-line plots in different shades of red.

      We appreciate the reviewer’s suggestion regarding Figure S3. However, the primary goal of this figure is to illustrate how the thermal interface moves over time. While plotting the standard deviation at different points in the tank could provide additional statistical insights, it would detract from the intended visualization of the interface dynamics. For this reason, we have opted to retain the current multi-line representation. Nevertheless, we have ensured that the figure is as clear as possible by refining the color contrast and improving the legend for better readability.

      There is an inconsistency in in-text citation styles (mixture of superscript and numbers in brackets).

      Thank you for pointing this out. We have carefully reviewed the manuscript and corrected any inconsistencies in the in-text citation style to ensure uniform formatting throughout.

      While the statement in the introduction, that increases in movement frequency could be purely metabolic in nature is correct, at least for larval zebrafish it has been shown that sensory neural activity is predictive of motor neuron activity and swim rates (Haesemeyer, 2018, cited by the authors).

      This is an interesting finding. It is however unclear to us why this information is crucial in our context of brown trout parr.

      Examples of summary results from Supplementary Figures 8-10 should be bundled in a main text figure since this appears to be important information supporting the conclusions.

      We agree that Supplementary Figures 8–10 contain important information (i.e., boxplots) on vertical occupancy and the time individuals spent in different water temperatures. However, this information is already integrated into Figure 2C, D, F, and G, which display the vertical distributions of fish across treatments and over time. Given the current length of the manuscript, adding another main-text figure could dilute rather than enhance clarity. For this reason, we have opted to keep these details in the Supplementary Materials while ensuring they are appropriately referenced in the main text.

      The distributions of excursion length for all treatments should be graphed in a main figure to support the point made in the third paragraph of the "Trout parr... do not avoid warm water" section of the results.

      We appreciate the reviewer’s suggestion. However, we do not believe that plotting excursion length is necessary to support this statement, as the key finding is already well represented in the manuscript. Specifically, the transition to bimodal depth occupancy, with fish spending comparable time above and below the interface in warm-water treatments (TR6–TR9), is clearly conveyed in Figure 2F and Supplementary Figure 8B. Additionally, this information is explicitly stated in the results section (L235): "Fish did not avoid warmer water in any of the warm-water treatments (TR6–TR9). Instead, fish transitioned to a bimodal depth occupancy, with comparable time spent above and below the interface (Fig. 2F; Supplementary Fig. 8B)." Given this, we believe that adding an additional figure would not enhance clarity but may instead introduce redundancy.

      There should be a main figure panel that statistically compares the turn biases around the interface for the different conditions and the +/- 5cm interface line mentioned in the text should be visualized in the appropriate figures - incidentally, this length scale is on par with the diffusion seen in simulations further suggesting that fish in fact sense a gradient here rather than remembering an interface.

      To address the reviewer’s comment, we have made the following updates:

      • Extended and incorporated simulations of the thermal interface thickness (see Methods and Supplementary Fig. 29).

      • Plotted the vertical locations of up-turning events relative to the phase-averaged position of the thermal interface (see Supplementary Fig. 39), which includes the estimated 5 cm vertical extent of the thermal gradient.

      • Added the thermal interface thickness to the main figures (Fig. 3F,G and Fig. 2E,H) where applicable.

      While we do not claim that memory alone explains cold-water avoidance, our data still suggest that it may contribute to the observed behavior, particularly since a substantial number of upturns occurred before the fish entered the thermal gradient (see also Author response image 1 below). Our aim is not to statistically disentangle the relative contributions of thermotaxis and associative learning, but to propose, with due caution, a plausible interpretation of this anticipatory behavior and to make clear that it is only a possibility.

      Given that the thermal gradient is now visualized and characterized in detail, we respectfully suggest that an additional statistical comparison of turn biases would not add further clarity. We believe there is evidence that vertical turning, away from the cold, occurred within and above the thermal gradient. However, we welcome the reviewer’s perspective, and to demonstrate that turning points occur outside and above the thermal interface, we have plotted them against gradient growth over time (see Author response image 1 below).

      Author response image 1.

      The colored area indicates the temporal growth of thermal interface thickness.

      Reviewer #3:

      In this study, the authors measured the behavioural responses of brown trout to the sudden availability of a choice between thermal environments. The data clearly show that these fish avoid colder temperatures than the acclimation condition, but generally have no preference between the acclimation condition or warmer water (though I think the speculation that the fish are slowly warming up is interesting). Further, the evidence is compelling that avoidance of cold water is a combination of thermotaxis and thermokinesis. This is a clever experimental approach and the results are novel, interesting, and have clear biological implications as the authors discuss. I also commend the team for an extremely robust, transparent, and clear explanation of the experimental design and analytical decisions. The supplemental material is very helpful for understanding many of the methodological nuances, though I admit that I found it overwhelming at times and wonder if it could be pruned slightly to increase readability. Overall, I think the conclusions are generally well-supported by the data, and I have no major concerns.

      Minor comments

      P2 intro paragraphs 1/3 - it is not clear that thermal preference generally reflects the thermal optimum, partly because it is not clear what trait is being optimized (fitness?). Some nuance here would be helpful, and would also link nicely to the discussion on p10.

      Thank you for this comment. We have now refined this section as follows (L67–71): "As most fish species are ectotherms, their body temperature fluctuates with the surrounding water temperature. Because temperature influences the rates of most physiological processes, rapid warming or cooling can affect fish performance traits, including metabolic rates, swimming ability, and thermal tolerance[6]."

      To further clarify how thermal preference relates to thermal optimum and what trait is being optimized, we have incorporated additional nuance in this section. Specifically, we now acknowledge that thermal preference may not always align with the thermal optimum for performance or fitness.

      P2 intro paragraph 2 - "adapt physiologically" implies evolution, but here you are referring to plasticity. Suggest saving the word "adapt/adaptation" for evolutionary changes (see also p9).

      Thank you for this comment. We have revised the wording to "acclimate physiologically" (L79) to more accurately reflect plastic responses rather than evolutionary adaptation.

      P7 - "This difference in probabilities (ρup - ρdown) was particularly large in the region immediately above and below the interface (-5 cm < D < 5 cm; Fig. 3F) and is a hallmark of a thermotactic behavior." I agree that the result provides compelling evidence for thermotaxis, but would it be possible to bolster this case by statistically testing for a difference in probabilities among the treatment groups here?

      In addition to Fig. 3F, we present statistical evidence that for colder water temperatures, fish penetrate less deeply into the cold lower water. The decreasing trend was statistically significant (Mann–Kendall test: p < 0.001; Supplementary Table 6) and is presented in Fig. 4C. The depth reached during each cold-water excursion is determined by the location of the vertical turning point, which redirects the fish upward toward the surface. We think this is sufficient evidence for thermotaxis.
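
      As a rough illustration of the kind of monotonic-trend test cited above, the sketch below implements a basic two-sided Mann–Kendall test using the normal approximation with a tie correction. It is not the authors' analysis code: the function name `mann_kendall` and the example depth values are hypothetical and chosen only to show how a decreasing trend across ordered treatments would be assessed.

```python
import math
from collections import Counter


def mann_kendall(values):
    """Two-sided Mann-Kendall trend test (normal approximation, tie-corrected).

    Returns (tau, z, p): Kendall's tau-a, the continuity-corrected Z score,
    and the two-sided p-value.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # S counts later-minus-earlier increases minus decreases over all pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under the null hypothesis, corrected for tied values.
    var_s = n * (n - 1) * (2 * n + 5)
    for t in Counter(values).values():
        var_s -= t * (t - 1) * (2 * t + 5)
    var_s /= 18.0

    # Continuity correction; z stays 0 when S is 0 or all values are tied.
    if var_s == 0 or s == 0:
        z = 0.0
    else:
        z = (s - math.copysign(1, s)) / math.sqrt(var_s)

    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    tau = s / (0.5 * n * (n - 1))
    return tau, z, p


# Hypothetical example: excursion depth (cm) for treatments ordered from the
# mildest to the coldest lower-layer temperature. These values are made up.
depths = [18.0, 16.5, 15.2, 13.9, 12.1, 10.4, 9.8, 8.3, 7.1, 6.0]
tau, z, p = mann_kendall(depths)
print(f"tau = {tau:.2f}, z = {z:.2f}, p = {p:.2g}")
```

      A negative tau with a small p-value would indicate that excursion depth decreases monotonically across the ordered treatments. For real data with many ties or small samples, an exact or permutation-based variant may be preferable to the normal approximation assumed here.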

      P9 paragraph 3 = "recent studies suggest that fish may instead respond to temporal changes of their internal body temperature." It seems like a citation is missing here. Would be useful to briefly summarize the evidence for internal temperature sensing that is the basis of this modelling exercise.

      Thanks, we have added that citation (L385).

      P10 "Our findings provide the first experimental evidence for this mode of behavioral thermoregulation in which fish navigate their heterothermal environment to achieve gradual body warming."

      I think this statement overreaches given the presented data. While there may be a trend towards fish in the warm treatment spending increasing amounts of time in the upper half of the tank, I do not see this pattern supported statistically. There is also no evidence of gradual body warming, and even if there was I disagree that this would constitute experimental evidence that this was happening "intentionally". By this reasoning, any shuttlebox experiment in which fish actively shuttle between relatively warm and cool sides to end up with a preference that is above the starting condition would also constitute evidence for gradual warming. Overall, this is an interesting pattern, but I do not think there is sufficient evidence to conclude that fish are strategically warming.

      We appreciate the reviewer’s comment and acknowledge that our original wording may have overstated the evidence. We have revised the sentence to better reflect the evidence presented (L411-415): “Our observations resemble this mode of behavioral thermoregulation, in which fish progressively favor warmer regions within a heterothermal environment. However, additional experimental evidence is required to determine the mechanisms underlying this behavior.”

      P11 "Despite the avoidance response of cold water, fish engaged in repeated cold-water excursions..."

      This is an interesting speculation, but I think it would be helpful to also point out that these fish are biased towards the bottom of the tank (based on control measurements) and this pattern may therefore simply reflect a desire to be lower in the water column.

      Thank you for this helpful comment. We have now added this point to the revised text, which reads (L475-477): “Despite the avoidance response to cold water, fish engaged in repeated cold-water excursions, potentially reflecting a behavioral strategy to map the thermal environment. This pattern may also reflect an inherent tendency to occupy the lower part of the tank, as observed at a homogeneous temperature of 12 °C during the acclimation phase.”

      P13 - why was the dye always added to the right side of the tank, instead of being assigned to a side randomly? I think the control experiment is good evidence that the dye did not substantially affect behaviour, but it seems like it would have been nice to separate dye and novel temperature exposure.

      We agree that randomizing the side of dye application would have been ideal. The dye was consistently added to the right side to maintain procedural consistency, ensuring that the “incoming” or “novel” temperature was always dyed. That said, our control experiment provides strong evidence that the dye itself did not influence behavior (as discussed above and in the manuscript).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The major result in the manuscript is the observation of higher-order structures in a cryoET reconstruction that could be used for understanding the assembly of toroid structures. The crosslinking ability of ZapD dimers results in bending of FtsZ filaments to a constant curvature. Many such short filaments are stitched together to form a toroid-like structure. The geometry of assembly of filaments - whether they form straight bundles or toroid-like structures - depends on the relative concentrations of FtsZ and ZapD.

      Strengths:

      In addition to a clear picture of the FtsZ assembly into ring-like structures, the authors have carried out basic biochemistry and biophysical techniques to assay the GTPase activity, the kinetics of assembly, and the ZapD to FtsZ ratio.

      Weaknesses:

      The discussion does not provide an overall perspective that correlates the cryoET structural organisation of filaments with the biophysical data.

      The crosslinking nature of ZapD is already established in the field. The work carried out is important to understand the ring assembly of FtsZ. However, the availability of the cryoET observations can be further analysed in detail to derive many measurements that will help validate the model, and obtain new insights.

      We thank the reviewer for these insightful comments on our work. We have edited the manuscript to resolve and clarify most of the issues raised during the review process.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors set out to better understand the mechanism by which the FtsZ-associated protein ZapD crosslinks FtsZ filaments to assemble a large-scale cytoskeletal assembly. For this aim, they use purified proteins in solution and a combination of biochemical, biophysical experiments and cryo-EM. The most significant finding of this study is the observation of FtsZ toroids that form at equimolar concentrations of the two proteins.

      Strengths:

      Many experiments in this paper confirm previous knowledge about ZapD. For example, it shows that ZapD promotes the assembly of FtsZ polymers, that ZapD bundles FtsZ filaments, that ZapD forms dimers and that it reduces FtsZ's GTPase activity. The most novel discovery is the observation of different assemblies as a function of ZapD:FtsZ ratio. In addition, using CryoEM to describe the structure of toroids and bundles, the paper provides some information about the orientation of ZapD in relation to FtsZ filaments. For example, they found that the organization of ZapD in relation to FtsZ filaments is "intrinsic heterogeneous" and that FtsZ filaments were crosslinked by ZapD dimers pointing in all directions. The authors conclude that it is this plasticity that allows for the formation of toroids and its stabilization. Unfortunately, a high-resolution structure of the protein organization was not possible. These are interesting findings that in principle deserve publication.

      We thank the reviewer for this valuable assessment. We have made several changes to the manuscript to improve its readability and comprehensibility. In addition, we have addressed the reviewer’s main concerns in the point-by-point response below.

      Weaknesses:

      While the data is convincing, their interpretation has some substantial weaknesses that the authors should address for the final version of this paper.

      We have addressed most of the aspects highlighted by the reviewer to improve the quality and comprehensibility of our results.

      For example, as the authors are the first to describe FtsZ-ZapD toroids, a discussion why this has not been observed in previous studies would be very interesting, i.e. is it due to buffer conditions, sample preparation?

      Several factors may explain the absence of observed toroidal structures in other studies. FtsZ is a highly dynamic protein, and its behavior varies significantly with different environmental conditions, as detailed in the literature. These environmental factors include pH, salt concentration, protein type, GTP levels, and the purification strategy used. Previous research has employed negative stain electron microscopy (EM) to visualize ZapD-FtsZ structures. It is important to note that FtsZ is sensitive to surface effects when it is bound to or adsorbed onto membranes (Mateos-Gil et al. 2019 FEMS Microbiol Rev - DOI: 10.1093/femsre/fuy039). Therefore, the adsorption of FtsZ and ZapD onto the EM grid may influence the formation of higher order structures. In this study, we used cryo-electron microscopy (cryo-EM) and cryo-electron tomography (cryo-ET) to visualize the 3D organization of ZapD-mediated structures. This approach allows us to avoid staining artifacts and the distortion of structures caused by adsorption or drying of the grid. In addition, we can resolve single filaments. Our buffer conditions also differ slightly from those in previous studies, which may significantly impact the behavior of FtsZ, as illustrated in Supplementary Fig. 3.

      In parts of the manuscript, the authors try a bit too hard to argue for the physiological significance of these toroids. This, however, is at least very questionable, because: The typical diameter is in the range of 0.25-1.0 μm, which requires some flexibility of the filaments to be able to accommodate this. It's difficult to see how a FtsZ-ZapD toroid, which appears to be quite rigid with a narrow size distribution of 502 nm ± 55 nm, could support cell division rather than stalling it at that cell diameter, which the authors say is similar to that of the E. coli cell.

      The toroidal structures formed by FtsZ and ZapD, with their characteristics similar to those of the bacterial division system, are significant in physiological contexts and warrant further study. The connections mediated by Zaps are expected to play a crucial role in filament organization, which is vital for the machinery enabling cellular constriction. Therefore, characterizing these structures in vitro can provide insight into divisome stabilization, assembly and constriction mechanisms. While we acknowledge the limitations of in vitro systems and do not expect to see the same toroidal structures in vivo, the way ZapD decorates and connects FtsZ filaments in vitro may resemble the processes that occur in the division ring formed inside the cell. This study represents an initial effort to characterize these toroidal structures, which could inspire further research and potentially reveal their physiological relevance.

      Regarding flexibility, it has been previously reported that an arrangement of loosely connected filaments forms the FtsZ ring. Our model is consistent with this observation despite the heterogeneity and density observed in the toroidal structures. We anticipate differences in vivo due to the high complexity of the cytoplasm, interactions with other cellular components, and attachment to the cell membrane, all of which would influence structural outcomes. However, our novel in vitro approach, which allows us to study FtsZ filament organization and connectivity – features that are challenging to explore in vivo and have not been thoroughly investigated before – has the potential to significantly advance our understanding of these structures. Consequently, these structures can aid our understanding of complex macrostructures in vivo, even if we have merely begun to scratch the surface of their characterization.

      Regarding the size of the toroids, we hypothesize that it reflects an optimal condition based on our experimental setup in solution. In vivo, these conditions are altered by interactions with various division partners, attachment to the plasma membrane, and system contraction. 

      We have reformulated and edited the manuscript to better discuss the potential physiological relevance of our toroidal structures.

      For cell division, FtsZ filaments are recruited to the membrane surface via an interaction of FtsA or ZipA with the C-terminal peptide of FtsZ. As ZapD also binds to this peptide, the question arises who wins this competition or where is ZapD when FtsZ is recruited to the membrane surface? Can such a toroidal structure of FtsZ filaments form on the membrane surface? Additional experiments would be helpful, but a more detailed discussion on how the authors think ZapD could act on membrane-bound filaments would be essential.

      We appreciate this comment, which was indeed one of our main questions. The complexity of the division system raises many questions about the interaction of FtsZ with the plasma membrane. The competition between division components to interact with FtsZ and thus modulate its behavior is still largely unknown. FtsA and ZipA appear to have a greater affinity for the C-terminal domain (CTD) of FtsZ than ZapD. However, considering all FtsZ monomers forming a filament, we expect FtsZ filaments to interact with many different division partners. The ability of FtsZ to interact with many components is necessary to explain the current model of the system. According to this model, FtsZ filaments would be decorated by many different proteins, anchoring them to the membrane while crosslinking or promoting their disassembly in a spatiotemporally controlled manner. 

      We attempted experiments combining FtsA, ZipA, and ZapD on supported lipid membranes and liposomes, but they proved difficult to perform. We expect results similar to those observed for ZapA (Caldas et al. 2019 Nat Commun - DOI: 10.1038/s41467-019-13702-4). Competition between proteins for interaction with the CTD of FtsZ adds an extra layer of complexity, which makes this issue attractive to explore in the future. Nevertheless, as Reviewer 3 insightfully pointed out, our cryo-ET data of straight bundles provide new insights into how ZapD-FtsZ structures can bind to the plasma membrane. In these straight bundles, the CTDs of two parallel FtsZ filaments are oriented upwards. They can bind the plasma membrane directly or the ZapDs, which decorate the FtsZ filaments from above instead of from the side, as suggested previously (Schumacher et al. 2017 J Biol Chem - DOI: 10.1074/jbc.M116.773192), allowing ZapDs to interact with the membrane.

      The authors conclude that the FtsZ filaments are dynamic, which is essential for cell division. But the evidence for dynamic FtsZ filaments within these toroids seems rather weak, as it is solely the partial reassembly after addition of GTP. As ZapD significantly slows down GTP hydrolysis, I am not sure it's obvious to make this conclusion.

      FtsZ filaments are dynamic, as they can reassemble into macrostructures relatively quickly. Decreased GTPase activity is a good indicator of the formation of lateral interactions between filaments. For instance, under crowding conditions, FtsZ also reduces its GTPase activity, although the bundles disassemble very slowly over time (González et al. 2003 J. Biol. Chem - DOI: 10.1074/jbc.M305230200). We measured the GTPase activity during the first 5 minutes after GTP addition, conditions under which toroidal structures and bundles remain fully assembled. However, we expect GTPase activity to recover as the macrostructures disassemble, considering the reassembly of macrostructures after GTP resupply, which suggests that FtsZ filaments remain active and dynamic.

      On a similar note, on page 5 the authors claim that ZapD would transiently interact with FtsZ filaments. What is the evidence for this? They also say that this transient interaction could have a "mechanistic role in the functionality of FtsZ macrostructures." Could they elaborate?

      We have rephrased the whole paragraph in the revised version to clarify matters (page 10, lines 24-34):

      “These results are consistent with the observation that ZapD interacts with FtsZ through its central hub, which provides additional spatial freedom to connect other filaments in different conformations. This flexibility allows different filament organizations and contributes to structural heterogeneity. In addition, these results suggest that these crosslinkers can act as modulators of the dynamics of the ring structure, spacing filaments apart and allowing them to slide in an organized manner. The ability of FtsZ to treadmill directionally, together with the parallel or antiparallel arrangement of short, transiently crosslinked filaments, is considered essential for the functionality of the Z ring and its ability to exert constrictive force34,36–38,50. Thus, Zap proteins can play a critical role in ensuring correct filament placement and stabilization, which is consistent with the toroidal structure formed by ZapD.”

      The authors should also do more to put their findings into the context of existing knowledge. For example:

      The authors observe a straightening of filament bundles with increasing ZapD concentration. This seems consistent with what was found for ZapA, but this is not explicitly discussed (Caldas et al 2019)

      We have discussed this similarity in the revised version of this manuscript (page 12, line 40 - page 13, line 8):

      “Understanding how the associative states of ZapA (as tetramers) and ZapD (as dimers), together with membrane tethering, influence the predominant structures formed in both systems is essential. The complexity of the division system raises important questions about the interaction dynamics between FtsZ and the plasma membrane. The competitive nature of the division components to engage with FtsZ and modulate its functionality remains to be thoroughly elucidated. It is important to note that FtsA and ZipA have a greater affinity for the C-terminal domain of FtsZ than ZapD. Our cryo-ET data on straight bundles provide new perspectives on how ZapD-FtsZ structures can effectively bind to the plasma membrane; in particular, the C-terminal domains of parallel FtsZ filaments are oriented upward, allowing direct membrane binding or interaction with ZapDs that reinforce these filaments from above, rather than from the side, as previously suggested.”

      A paragraph summarizing what is known about the properties of ZapD in vivo would be essential: i.e., what has been found regarding its intracellular copy number, location and dynamics?

      We thank the reviewer for this valuable suggestion. We describe the role of Zap proteins in vivo and the previous studies of ZapD in the introduction (page 2, line 34 - page 3, line 17). Additionally, we added the estimated number of ZapD copies in the cell in the discussion (page 11, lines 2-7).

      In the introduction, the authors write that "GTP binding and hydrolysis induce a conformational change in each monomer that modifies its binding potential, enabling them to follow a treadmilling behavior". This seems inaccurate: as shown by Wagstaff et al. 2022, the conformational change of FtsZ is not associated with the nucleotide state. In addition, they write that FtsZ polymerization depends on the GTPase activity. It would be more accurate to write that polymerization depends on GTP, and disassembly on GTPase activity.

      Following the reviewer's suggestions, we have adapted and corrected these text elements as follows (page 2, lines 7-9): 

      “FtsZ undergoes treadmilling due to polymerization-dependent GTP hydrolysis, allowing the ring to exhibit its dynamic behavior.”

      On page 2 they also write that "the mechanism underlying bundling of FtsZ filaments is unknown". I would disagree, the underlying mechanism is very well known (see for example Schumacher, MA JBC 2017), but how this relates to the large-scale organization of FtsZ filaments was not clear.

      We thank the reviewer for this comment. We have corrected and clarified the related text accordingly (page 3, lines 11-12):

      “…the link between FtsZ bundling, promoted by ZapD, and the large-scale organization of FtsZ filaments remains unresolved.”

      The authors describe the toroid as a dense 3D mesh; how would this be compatible with the Z-ring and its role in cell division? I don't think this corresponds to the current model of the Z-ring (McQuillen & Xiao, 2020). Apart from the fact it's a ring, I don't think the organization of FtsZ is obviously similar to the current model of the Z-ring in the bacterial cell, in particular because it's not obvious how FtsZ filaments can bind ZapD and membrane anchors simultaneously.

      We consider that the intrinsic characteristics of toroidal structures and the bacterial division ring have points in common. As indicated in the answer above, despite the differences and limitations that might result from an in vitro approach, the structures shown after ZapD crosslinking of FtsZ filaments can demonstrate intrinsic features occurring in vivo. The current model of the division ring consists of an arrangement of filaments loosely connected by crosslinkers in the center of the cell, forming a ring. This model is compatible with our findings, although many questions remain about the structural organization of the Z-ring in the cell.

      Reviewer 3 has brought a compelling new perspective to interpreting our cryo-ET data: ZapD decorates FtsZ from above, allowing ZapD or FtsZ to bind to the plasma membrane. We have discussed this point in more detail below. In the case of straight bundles, this favors the stacking of straight FtsZ filaments, whereas in the case of toroids, ZapD can also bind FtsZ filaments laterally and diagonally, and it is this less compact arrangement that could enable FtsZ bending and toroid size adjustment. 

      We have revised the text accordingly to incorporate the interpretation proposed by Reviewer 3 (page 12, lines 24-31):

      “The current model of the division ring consists of an array of filaments loosely connected by crosslinkers at the center of the cell, forming a ring. This model is consistent with our findings, although many questions remain regarding the structural organization of the Z ring within the cell. ZapD binds to FtsZ from above, allowing either ZapD or FtsZ to interact with the plasma membrane. In straight bundles, this facilitates the stacking of straight FtsZ filaments, while for toroids, ZapD can also bind FtsZ filaments diagonally. This less compact arrangement could allow bending of the FtsZ filaments and adjustment of toroid size.”

      The authors write that "most of these modulators" interact with FtsZ's CTP, but then later that ZapD is the only Zap protein that binds CTP. This seems to be inconsistent. Why not write that membrane anchors usually bind the CTP, most Zaps do not, but ZapD is the exception?

      We thank the reviewer for this pertinent suggestion, which we have followed in the revised version of the manuscript (page 2, lines 19-22):

      “Most of these modulators interact with FtsZ through its carboxy-terminal end, which modulates division assembly as a central hub.  ZapD is the only Zap protein known to crosslink FtsZ by binding its C-terminal domain, suggesting a critical Z ring structure stabilizing function.”

      I also have some comments regarding the experiments and their analysis:

      Regarding cryoET: the filaments appear like flat bands, even in the absence of ZapD, which further elongates these bands. Is this due to an anisotropic resolution? This distortion makes the conclusion that ZapD forms bi-spherical dimers unconvincing.

      The missing wedge caused by the limited angular range of the tomography data generates an elongation of the structures by a factor of 2 along the Z axis. This feature is visible in the undecorated FtsZ filament data (Supplementary Fig. 10). The more pronounced elongation along the Z-axis observed in the presence of ZapD indicates that ZapD connects two parallel FtsZ filaments along the Z-axis (see Supplementary Figs. 8, 9 and 10). We do not have sufficient resolution to precisely resolve ZapD proteins from the FtsZ filaments in the Z-axis, but we also observed bispherical ZapDs in the XY plane (Fig. 4b-d). Unfortunately, our data do not allow for a more detailed characterization.

      The authors say that the cryoET visualization provides crucial information on the length of the filaments within this toroid. How long are they? Could the authors measure it?

      Measuring the length of single filaments is not trivial, given the dense, heterogeneous mesh promoted by ZapD crosslinking. We tried to identify and track them, but the density of filaments and connections made precise measurement very difficult. Nevertheless, we could identify the formation of these toroids by an arrangement of short filaments (Supplementary Fig. 11) instead of continuous circular filaments.

      We have removed the following sentence from the revised manuscript: “Visualization of ZapD-mediated FtsZ toroidal structures by cryo-ET provided crucial information on the 3D organization, connectivity and length of filaments within the toroid.”

      Regarding the dimerization mutant of ZapD: there is actually no direct confirmation that mZapD is monomeric. Did the authors try SEC MALS or AUC? Accordingly, the statement that dimerization is "essential" seems exaggerated (although likely true).

      Unlike the wild-type ZapD protein, the mZapD mutant exists as a mixture of monomers (~15%) and dimers, as AUC assays performed at similar protein concentrations revealed. These results demonstrate that the mutant protein has a lower tendency to form dimers than the native ZapD protein. We have included the AUC data for mZapD in the supplementary material (Supp. Fig. 15a).

      What do the authors mean that toroid formation is compatible with robust persistence length? I.e. What does robust mean? It was recently shown that FtsZ filaments are actually surprisingly flexible, which matches well the fact that the diameter of the Z-ring must continuously decrease during cell division (Dunajova et al Nature Physics 2023).

      We have corrected this sentence in the revised version of the manuscript to improve clarity (page 11, lines 9-10): 

      “The persistence length and curvature of FtsZ filaments are optimized for forming bacterial-sized ring structures.”

      The authors claim that their observations suggest "that crosslinkers ... allows filament sliding in an organized fashion". As far as I know there is no evidence of filament sliding, as FtsZ monomers in living cells and in vitro are static.

      Filament sliding may be one of the factors contributing to the force generation mechanisms involved in cell division (Nguyen et al. 2021 J Bacteriol - DOI: 10.1128/JB.00576-20). Our results indicate that ZapD can separate filaments, creating space between them and facilitating their organization.

      Although the molecular dynamics of cell constriction are not yet fully understood, it is possible that filament sliding plays a role. If this is the case, the crosslinking of short FtsZ filaments in multiple directions by ZapD could provide the necessary flexibility to adjust the diameter of the constriction ring during bacterial division.

      What is the "proto-ring FtsA protein"?

      The proto-ring denotes the first molecular assembly of the Z-ring, which in E. coli consists of FtsZ, FtsA and ZipA (see, for example, Ortiz et al. 2016 FEMS Microbiol Rev - DOI: 10.1093/femsre/fuv040). To simplify matters, we have deleted the term “proto-ring” in the revised version of the MS.

      The authors refer to "increasing evidence" for "alternative network remodeling mechanisms that do not rely on chemical energy consumption as those in which entropic forces act through diffusible crosslinkers, similar to ZapD and FtsZ polymers." A reference should be given; I assume the authors refer to the study by Lansky et al 2015 of PRC on microtubules. However, I am not sure how the authors made the conclusion that this applies to FtsZ and ZapD; on which evidence is this assumption based?

      We refer to cytoskeletal network remodeling mechanisms independent of chemical energy consumption (Braun et al. 2016 Bioessays - DOI: 10.1002/bies.201500183) driven by entropic forces induced by macromolecular crowding agents or diffusible crosslinkers. The latter mechanism leads to an increase in filament overlap length and the contraction of filament networks. These mechanisms complement and act in synergy with energy-consuming processes (such as those involving nucleotide hydrolysis) to modulate actin- and microtubule-based cytoskeleton remodeling. Similarly, crosslinking proteins such as ZapD may contribute to remodeling the FtsZ division ring in the cell. 

      We have revised the corresponding text of the manuscript accordingly (page 13, lines 16-24):  “In addition, our findings could greatly enhance the understanding of how polymeric cytoskeletal networks are remodeled during essential cellular processes such as cell motility and morphogenesis. Although conventional wisdom points to molecular motors as the primary drivers of filament remodeling through energy consumption, there is increasing evidence that there are alternative mechanisms that do not rely on such energy, instead harnessing entropic forces via diffusible crosslinkers. This approach may also be applicable to ZapD and FtsZ polymers, suggesting a promising avenue for optimizing conditions in the reverse engineering of the division ring to enhance force generation in minimally reconstituted systems aimed at achieving autonomous cell division.”

      Some inconsistencies in supplementary figure 3: The normalized absorbances in panel a do not seem to agree with the absolute absorbance shown in panel e, i.e. compare maximum intensity for ZapD = 20 µM and 5 µM in both panels.

      We have corrected these inconsistencies in the revised version.

      It's not obvious to me why the structure formed by ZapD and FtsZ disassembles after some time even before GTP is exhausted; can the authors explain? As the structures disassemble, how is the "steady-state turbidity" defined? Do the structures also disassemble when they use a non-hydrolyzable analog of GTP?

      In the presence of ZapD, FtsZ rapidly forms higher order polymers after the addition of GTP, as shown by turbidity assays at 320 nm (the formation of single- or double-stranded FtsZ filaments in the absence of ZapD does not produce a significant increase in turbidity). Macrostructures formed by FtsZ in the presence of ZapD, while more stable than FtsZ filaments (which rapidly disassemble following GTP consumption), are also dynamic. These assembly reactions are GTP-dependent and considerably modify polymer dynamics. In agreement with our results, previous studies have shown that high concentrations of macromolecular crowders (such as Ficoll or dextran) promote the formation of dynamic FtsZ polymer networks (González et al. 2003 J. Biol. Chem - DOI: 10.1074/jbc.M305230200). In this case, FtsZ GTPase activity was significantly retarded compared with FtsZ filaments, resulting in a decrease in GTPase turnover. Similar mechanisms may apply to assembly reactions in the presence of ZapD.

      Parallel assembly studies replacing GTP with a slowly hydrolyzable GTP analog remain pending. We expect ZapD-containing FtsZ macrostructures to remain assembled for longer but still to disassemble upon GTP consumption, as occurs with the crowding-induced FtsZ polymer networks formed in the presence of nucleotide analogs.

      Accordingly, we have revised the corresponding text to clarify matters (page 4, line 37 – page 5 line 7). 

      Conclusion: Despite some weaknesses in the interpretation of their findings, I think this paper will likely motivate other structural studies on large scale assemblies of FtsZ filaments and its associated proteins. A systematic comparison of the effects of ZapA, ZapC and ZapD and how their different modes of filament crosslinking can result in different filament networks will be very useful to understand their individual roles and possible synergistic behavior.

      We appreciate the reviewer's remarks and comments, which provided us with valuable information and helped us considerably improve the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The authors provide the first image analysis by cryoET of toroids assembled by FtsZ crosslinked by ZapD. Previously toroids of FtsZ alone have been imaged only in projection by negative stain EM. The authors attempt to distinguish ZapD crosslinks from the underlying FtsZ filaments. I did not find this distinction convincing, especially because it seems inconsistent with the 1:1 stoichiometry demonstrated by pelleting. I was intrigued by one image showing straight filament pairs, which may suggest a new model for how ZapD crosslinks FtsZ filaments.

      We thank the reviewer for these valuable comments, to which we have responded in detail below. 

      Strengths:

      (1) The first image analysis of FtsZ toroids by cryoET.

      (2) The images are accompanied by pelleting assays that convincingly establish a 1:1 stoichiometry of FtsZ:ZapD subunits.

      (3) Fig. 5 shows an image of a pair of FtsZ filaments crosslinked by ZapD. This seems to have higher resolution than the toroids. Importantly, it suggests a new model for the structure of FtsZ-ZapD that resolves previously unrecognized conflicts. (This is discussed below under weaknesses, because it is so far only supported by a single image.)

      We thank the reviewer for this assessment and, in particular, for raising point 3, which provided a new perspective on the interpretation of our data. We have also included a new example of a straight bundle in Supplementary Fig. 13.

      Weaknesses:

      This paper reports a study by cryoEM of polymers and bundles assembled from FtsZ plus ZapD. Although previous studies by other labs have focused on straight bundles of filaments, the present study found toroids mixed with these straight bundles, and they focused most of their study on the toroids. In the toroids they attempt to delineate FtsZ filaments and ZapD crosslinks. A major problem here is with the stoichiometry. Their pelleting assays convincingly established a stoichiometry of 1:1, while the mass densities identified as ZapD are sparse and apparently well below the number of FtsZ (FtsZ subunits are not resolved in the reconstructions, but the continuous sheets or belts seem to have a lot more mass than the identified crosslinks.)  

      Apart from the stoichiometry I don't find the identification of crosslinks to be convincing. It is missing an important control - cryoET of toroids assembled from pure FtsZ, without ZapD.

      However, if I ignore these and jump to Fig. 5, I think there is an important discovery that resolves controversies in the present study as well as previous ones, controversies that were not even recognized. The controversy is illustrated by the Schumacher 2017 model (their Fig. 7), which is repeated in a simplified version in Fig. 1a of the present ms. That model has two FtsZ filaments in a plane facing ZapD dimers which bridge them. In this planar model the C-terminal linker and the ctd of FtsZ that binds ZapD face each other, with the ZapD in the middle. The contradiction arises because the C-terminus needs to face the membrane in order to attach and generate a bending force. The two FtsZ filaments in the planar model are facing 90° away from the membrane. A related contradiction is that Houseman et al 2016 showed that curved FtsZ filaments have the C terminus on the outside of the curve. In a toroid the C termini should all be facing the outside. If the paired filaments had the C termini facing each other, they could not form a toroid because the two FtsZ filaments would be bending in opposite directions.

      Fig. 5 of the present ms seems to resolve this by showing that the two FtsZ filaments and ZapD are not planar, but stacked. The two FtsZ filaments have their C termini facing the same direction, let's say up, toward the membrane, and ZapD binds on top, bridging the two. The spacing of the ctd binding sites on the Zap D dimer is 6.5 nm, which would fit the ~8 nm width of the paired filament complex observed in the present cryoEM (Fig S13). In the Schumacher model the width would be about 20 nm. Importantly, the stack model has the ctd of each filament facing the same direction, so the paired filaments could attach to the membrane and bend together (using ctd's not bound by ZapD). Finally, the new arrangement would also provide an easy way for the complex to extend from a pair of filaments to a sheet of three or four or more. A problem with this new model from Fig. 5 is that it is supported by only a single example of the paired FtsZ-ZapD complex. If this is to be the basis of the interpretation, more examples should be shown. Maybe examples could be found with three or four FtsZ filaments in a sheet.

      We thank the reviewer for asking interesting questions and suggesting a compelling model for how ZapD could bind FtsZ filaments. Cryo-ET of straight bundles revealed that high ZapD density promotes vertical stacking of FtsZ filaments and decoration of FtsZ filaments by ZapD from above. In toroids, FtsZ filaments are vertically decorated by ZapD, which explains the high elongation of the filament structures observed, consisting of FtsZ-ZapD(-FtsZ) units. In addition, we observed a high abundance of diagonal connections between FtsZ filaments of different heights, revealing a certain flexibility/malleability of ZapD to link filaments that are not perfectly aligned vertically. This configuration could give rise to curved filaments and the overall toroid structure.

      The manuscript proposes that ZapD can bind FtsZ filaments in different directions. However, it seems to have a certain tendency to bind to the upper part of FtsZ filaments, stacking them vertically or vertically with a lateral shift (Supplementary Fig. 9). We also observe lateral connections, although the features of the toroidal structures limit their visualization. This enables both the binding to the membrane by ZapD or FtsZ and the formation of higher order FtsZ polymer structures. 

      In summary, ZapD is capable of linking FtsZ filaments in multiple directions, including from the upper part of the filaments as well as laterally or diagonally. At high concentrations of ZapD, the filaments become more compactly arranged, primarily stacking vertically, which results in the loss of curvature. In contrast, at lower concentrations of ZapD, the FtsZ filaments are less tightly packed, leading to curved filaments and an overall toroidal structure that may resemble the in vivo ring structures.

      We have edited our manuscript to accommodate this hypothesis, including the abstract and the cryoET section (page 7, lines 5-16): 

      “The isosurface confirmed the presence of extended structures along the Z-axis, well beyond the elongation expected from the missing wedge effect for single FtsZ filaments (for comparison, see Supplementary Fig. 10). The vertically extended structures appeared to correspond to filaments that were connected or decorated by additional densities along the Z-axis (Supplementary Fig. 9b). Importantly, these densities were only observed in the presence of ZapD (Supplementary Fig. 10b), suggesting that they represent ZapD connections (Fig. 3e and Supplementary Figs. 8e and 9b). We note that the resolution of the data is not sufficient to precisely resolve ZapD proteins from the FtsZ filaments in the Z-axis.

      These results suggest that the toroids are constructed and stabilized by interactions between ZapD and FtsZ, which are mainly formed along the Z-axis but also laterally and diagonally.”

      Page 7, lines 40-42: 

      “Cryo-ET imaging of ZapD-mediated FtsZ toroidal structures revealed a preferential vertical stacking and crosslinking of short FtsZ filaments, which are also crosslinked laterally and diagonally, allowing for filament curvature.”

      And in the discussion (page 12, lines 27-31): 

      “ZapD binds to FtsZ from above, allowing either ZapD or FtsZ to interact with the plasma membrane. In straight bundles, this facilitates the stacking of straight FtsZ filaments, while for toroids, ZapD can also bind FtsZ filaments diagonally. This less compact arrangement could allow bending of the FtsZ filaments and adjustment of the toroid size.”

      What then should be done with the toroids? I am not convinced by the identification of ZapD as "connectors." I think it is likely that the ZapD is part of the belts that I discuss below, although the relative location of ZapD in the belts is not resolved. It is likely that the resolution in the toroid reconstructions of Fig. 4, S8,9 is less than that of the isolated pf pair in Fig. 5c.

      We agree with the reviewer's interpretation that ZapD can attach to FtsZ filaments from both above and laterally. The data from the straight bundles, which are more clearly resolved due to their thinner structure, demonstrate that ZapD can decorate FtsZ filaments vertically. Additionally, the toroidal data supports the notion that ZapD can act as a crosslinker between filaments that are not perfectly vertical, allowing for lateral offsets (see, for example, Fig. 4d) or lateral connections (Fig. 4b). 

      We recognize that the resolution and high density of structures in our cryo-ET data make it challenging to accurately annotate proteins or connectors. Despite this difficulty, we have made efforts to label and identify the ZapD proteins and connectors. We employed an arbitrary labeling method to assist with visual interpretation. However, we acknowledge that some errors may exist and that some ZapD proteins may not have been labeled, particularly along the Z-axis, where the missing wedge limits our ability to distinguish between ZapD and FtsZ proteins (page 7, lines 8-13):

      “The vertically extended structures appeared to correspond to filaments that were connected or decorated by additional densities along the Z-axis (Supplementary Fig. 9b). Importantly, these densities were only observed in the presence of ZapD (Supplementary Fig. 10b), suggesting that they represent ZapD connections (Fig. 3e and Supplementary Figs. 8e and 9b). We note that the resolution of the data is not sufficient to precisely resolve ZapD proteins from the FtsZ filaments in the Z-axis.”

      We draw attention to the limitation of our manual segmentation in the text as follows (page 7, lines 20-24):

      “We manually labeled the connecting densities in the toroid isosurfaces to analyze their arrangement and connectivity with the FtsZ filaments. The high density of the toroids and the wide variety of conformations of these densities prevented the use of subtomogram averaging to resolve their structure and spatial arrangement within the toroids.”

      Importantly, if the authors want to pursue the location of ZapD in toroids, I suggest they need to compare their ZapD-containing toroids with toroids lacking ZapD. Popp et al 2009 have determined a variety of solution conditions that favor the assembly of toroids by FtsZ with no added protein crosslinker. It would be very interesting to investigate the structure of these toroids by the present cryoEM methods, and compare them to the FtsZ-ZapD toroids. I suspect that the belts seen in the ZapD toroids will not be found in the pure FtsZ toroids, confirming that their structure is generated by ZapD.

      The only toroidal structures of E. coli FtsZ reported in the literature are those described by Popp et al. (2009 Biopolymers – DOI: 10.1002/bip.21136). It is important to note that methylcellulose (MC) must be added to the working solution to induce the formation of these structures, as FtsZ toroids do not form in the absence of MC. The mechanisms by which MC promotes this assembly process go beyond mere excluded volume effects due to crowding, as the concentration of MC used is very low (less than 1 mg/ml), which is below the typical crowding regime. This suggests that there are additional interactions between MC and FtsZ. Such complexities and secondary interactions prevent the use of this system as a reliable control for the FtsZ toroidal structures reported here. Alternatively, we also considered the toroidal structures of FtsZ from Bacillus subtilis (Huecas et al. 2017 Biophys J - DOI: 10.1016/j.bpj.2017.08.046) and Cyanobacterium synechocystis (Wang et al. 2019 J Biol Chem – DOI: 10.1074/jbc.RA118.005200). However, these structures do not serve as appropriate controls due to the structural and molecular differences between these FtsZ proteins.

      Recommendations for the authors:  

      Reviewing Editor:

      While the three referees recognize and appreciate the importance of this work, several technical and interpretational questions have been raised. There was a prolonged discussion amongst the three expert referees, and it was felt that the current version suffers from a number of problems that the authors need to consider. These are to do with (1) the stoichiometry of ZapD-FtsZ; (2) the evidence for crosslinks; (3) how the cryo-ET data correlate with the biophysical data; and (4) the physiological relevance of the elucidated structures. Please take note of the public reviews (strengths and weaknesses) as well as "Recommendations to the authors" sections below, if you choose to prepare a revision.

      In reading the reviews very carefully (as well as while following the ensuing robust discussion between the referees) I noticed that all points raised are extremely important to be addressed / reconciled (with experiments and / or discussion) for this study to become an outstanding contribution to the bacterial cell biology field. I would therefore urge you to consider these carefully and revise the manuscript accordingly.

      We thank the editorial board and reviewers for their excellent work evaluating and reviewing our manuscript. Their constructive suggestions and comments have been taken into account in preparing the revised version. We have paid particular attention to the four points mentioned above by the reviewing editor. We hope that the new version and this point-by-point rebuttal letter will answer most of the questions and weaknesses raised by the reviewers.

      Reviewer #1 (Recommendations for the authors):

      Suggestions for improvement of the manuscript:

      (1) ZapD to FtsZ ratio:

      i) Page 3: Results section, paragraph 1:

      FtsZ to ZapD shows a 1:2 ratio. How does this explain cross linking by a dimeric species, as this will be equivalent to a 1:1 ratio of FtsZ and ZapD? The crystal structure in the reference cited has FtsZ peptide bound only to one side of the dimer, however a crosslinking effect can happen only if FtsZ binds to both protomers of ZapD dimer. If the decoration is not uniform as given in the toroid model based on cryoET, this should lead to a model with excess of FtsZ in the toroid?

      On page 3 of the original manuscript, we stated that the binding stoichiometry of ZapD to FtsZ was 2:1, based on estimates derived from sedimentation velocity experiments involving the unassembled GDP form of FtsZ. However, upon reanalyzing these experiments, we found that the previous characterization of the association mode was overly simplistic. We determined that there are two predominant molecular species of ZapD:FtsZ complexes in solution, which correspond to ZapD dimers bound to either one or two FtsZ monomers, resulting in stoichiometries of 2:1 and 1:1, respectively. The revised binding stoichiometry data for ZapD and GDP-FtsZ suggest the presence of 1:1 ZapD-FtsZ complexes, which aligns with the idea that FtsZ polymers can be crosslinked by dimeric ZapD species. In mixtures where ZapD is present in excess over FtsZ, the crosslinking corresponds to 1:1 binding stoichiometries, leading to the formation of straight macrostructures. Conversely, when the concentration of ZapD is reduced in the reaction mixture, the resulting macrostructures take the form of toroids. In this scenario, there is an excess of FtsZ because only some of the FtsZ molecules within the polymers are crosslinked by ZapD dimers, resulting in a binding stoichiometry of approximately 0.4 ZapD molecules per FtsZ, as quantified by differential sedimentation experiments.

      We have rewritten the corresponding texts in the revised version to explain these matters (page 4 lines 14-18):

      “Sedimentation velocity analysis of mixtures of the two proteins revealed the presence of two predominant molecular species of ZapD:FtsZ complexes in solution. These complexes are compatible with ZapD dimers bound to one or two FtsZ monomers, corresponding to ZapD:FtsZ stoichiometries of 2:1 and 1:1, respectively (Supplementary Fig. 1a (III-IV)). This observation is consistent with the proposed interaction model.”
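      The stoichiometry arithmetic in this answer can be sketched as follows. This is a minimal illustration, not part of the manuscript's analysis: the function names are hypothetical, and the occupancy estimate assumes that every bound ZapD monomer engages exactly one FtsZ C-terminal domain, which the sedimentation data alone do not establish.

```python
from fractions import Fraction

ZAPD_PER_DIMER = 2  # a ZapD dimer contains two ZapD monomers


def zapd_to_ftsz_ratio(ftsz_bound_per_dimer: int) -> Fraction:
    """ZapD:FtsZ molar ratio for one ZapD dimer bound to 1 or 2 FtsZ monomers."""
    if ftsz_bound_per_dimer not in (1, 2):
        raise ValueError("a ZapD dimer is taken to bind one or two FtsZ monomers")
    return Fraction(ZAPD_PER_DIMER, ftsz_bound_per_dimer)


def dimers_per_ftsz(zapd_per_ftsz: float) -> float:
    """Convert a ZapD-monomer-per-FtsZ ratio into ZapD dimers per FtsZ."""
    return zapd_per_ftsz / ZAPD_PER_DIMER


def occupied_ftsz_fraction(zapd_per_ftsz: float) -> float:
    """Fraction of FtsZ subunits carrying a ZapD monomer.

    Assumption (hypothetical): each bound ZapD monomer occupies exactly one
    FtsZ C-terminal domain, so occupancy saturates at 1.0.
    """
    return min(zapd_per_ftsz, 1.0)


if __name__ == "__main__":
    print("dimer + 1 FtsZ ->", zapd_to_ftsz_ratio(1), "ZapD per FtsZ")
    print("dimer + 2 FtsZ ->", zapd_to_ftsz_ratio(2), "ZapD per FtsZ")
    print("toroids: dimers per FtsZ =", dimers_per_ftsz(0.4))
    print("toroids: occupied FtsZ fraction =", occupied_ftsz_fraction(0.4))
```

      Under that assumption, about 0.4 ZapD molecules per FtsZ corresponds to about 0.2 ZapD dimers per FtsZ, so most FtsZ subunits in the toroid would remain uncrosslinked, in line with the FtsZ excess described above.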

      ii) How does 40 - 80 uM of ZapD correspond to a molar ratio of approximately 6?

      It was a typo from previous versions. We have corrected it in the revised version. 

      iii) The ratios of ZapD to FtsZ are different when described later in page 4 in the context of the toroid. Are these ratios relevant compared to the contradicting ratios mentioned later in page 4?

      To clarify issues related to the binding of ZapD to FtsZ, we have rewritten the sections on ZapD binding stoichiometries to both FtsZ-GDP and FtsZ polymers in the presence of GTP (see page 4 lines 14-18 and page 5 lines 15-26).

      iv) Supplementary Figure 5:

      In the representative gel shown, the amount of ZapD in the pellet does not appear to be double compared to 10 and 30 uM concentrations. However, the estimated amount in the plot shown in panel (c) appears to indicate that ZapD has approximately doubled at 30 uM compared to 10 uM. Please re-check the quantification.

      Without prior staining calibration of the gels, there is no simple quantitative relationship between gel band intensities after Coomassie staining and the amount of protein in a band (Darawshe et al. 1993 Anal Biochem - DOI: 10.1006/abio.1993.1581). The latter point precludes a quantitative comparison of pelleting / SDS-PAGE data and analytical sedimentation measurements.

      v) How can a consistent ratio being maintained be explained in an irregular structure of the toroid? The number of ZapD should be much less compared to FtsZ according to the model.

      See answers to points i) and iii)

      (2) GTPase activity and assembly/disassembly of toroids:

      i) Page 3, Results section: last paragraph:

      What is the explanation or hypothesis for decrease in GTPase activity upon ZapD binding? Given that FtsZ core is not involved in the interaction of the higher order assemblies, what is the probable reason on decrease in GTPase activity upon ZapA binding?

      Excluded volume effects caused by macromolecular crowding, such as high concentrations of Ficoll or dextran, promote the formation of dynamic FtsZ polymer networks (González et al. 2003 J. Biol. Chem - DOI: 10.1074/jbc.M305230200). In these conditions, FtsZ GTPase activity is significantly slowed down compared to the activity observed in FtsZ filaments formed without crowding, leading to a decreased GTPase turnover rate. Similar mechanisms may also apply to assembly reactions in the presence of ZapD (see, for example, Durand-Heredia et al. 2012 J Bacteriol - DOI: 10.1128/JB.00176-12).

      ii) How is the decrease in GTPase activity compatible with dynamics of disassembly? Please substantiate on why disassembly is linked to transient interaction with ZapD. Shouldn't disassembly and transient interaction be linked to recovery of GTPase activity rates? 

      iii) Does the decrease in GTPase activity imply a reduced turnover of disassembly of FtsZ to monomers? Hence, how is the reduction in turbidity related to the decrease in GTPase activity? How does the GTPase activity change with time?

      iv) How can the decrease in GTPase activity with increasing ZapD be explained?

      We conducted GTPase activity assays within the first two minutes following GTP addition, a timeframe that promotes bundle formation. Previous studies, such as those by Durand-Heredia et al. (2012 J Bacteriol - DOI: 10.1128/JB.00176-12), have also indicated a reduction in GTPase activity during the initial moments of bundling. The reviewer’s suggestion that GTPase activity should recover after the disassembly of toroids is valid and warrants further investigation. To test this hypothesis, measuring GTPase activity over extended periods would be necessary. Compared with FtsZ filaments observed in vitro, ZapD-containing FtsZ bundles exhibit decreased GTPase activity. Although we did not measure it directly, we anticipate a reduction in the rate of GTP exchange within the polymer, similar to the behavior of FtsZ bundles formed in the presence of crowders (González et al. 2003 J. Biol. Chem - DOI: 10.1074/jbc.M305230200), which also display a delay in GTPase activity. High levels of ZapD enhance bundling, which may explain the decrease in GTPase activity as ZapD levels increase.

      (3) Treadmilling and FtsZ filament organisation:

      If the FtsZ filaments are crosslinked antiparallel, how can treadmilling behaviour be explained? Doesn't treadmilling imply a directionality of filament orientations in the FtsZ bundles?

      Our model can only suggest filament alignment. The latter is compatible with parallel and antiparallel filament organization.

      The correlation between observed effects on GTPase activity, treadmilling and ZapD interaction will provide an interesting insight to the model.

      Establishing a detailed correlation among these three factors could yield valuable insights into the mechanisms and potential physiological implications of the structural organization of FtsZ polymers influenced by crosslinking proteins such as ZapD. To precisely characterize these interactions, further time-resolved assays in solution and reconstituted systems would be necessary, which is beyond the scope of this study.

      (4) Toroid dimensions and intrinsic curvature:

      i) Page 4: What is the correlation between the toroid dimensions and the intrinsic curvature of the FtsZ filaments? Given the thickness of ~ 127 nm, please provide an explanation of how the intrinsic curvature of FtsZ is compatible with both the outer and inner diameters of 500 nm and 380 nm.

      We added a paragraph for clarification (page 6, lines 20-24):

      “Previous studies have shown different FtsZ structures at different concentrations and buffer conditions. FtsZ filaments are flexible and can generate different curvatures ranging from mini rings of ~24 nm to intermediate circular filaments of ~300 nm or toroids of ~500 nm in diameter (reviewed in Erickson and Osawa 2017 Subcell Biochem - DOI: 10.1007/978-3-319-53047-5_5, and Wang et al. 2019 J Biol Chem - DOI: 10.1074/jbc.RA119.009621). It is reasonable to assume that FtsZ filaments can accommodate the toroid shape promoted by ZapD crosslinking.”

      ii) For the curvature of FtsZ filaments to be similar, the length of the filaments in the inner circles of the toroid have to be smaller than those in the outer circles? Is this true? Or are the FtsZ filaments of uniform length throughout?

      Due to the limitations in the resolution of the toroidal structure, we could not accurately measure the length or curvature of the filaments. Considering the FtsZ flexibility, these filaments may exhibit various curvatures and lengths, as previously mentioned.

      iii) Is the ZapD density uniform throughout the inner and outer regions of the toroid?

      The heterogeneity found in the structures suggests a difference in ZapD binding densities; however, we lack quantitative data to confirm this. The outer regions are likely more exposed to the attachment of free ZapDs in the surrounding environment, which leads to the recruitment of more ZapDs and the formation of straight bundles. Supplementary Fig. 7b (right) features a zoomed-in image of a toroid adorned with globular densities in the outer areas, which may correspond to ZapD oligomers. Similar characteristics appear in the straight filaments illustrated in the panels of this figure. However, these features are absent or present in significantly lower quantities in toroids with a 1:1 ratio and toroids formed under a 1:6 ratio, suggesting that the external decoration is due to ZapD saturation. Unfortunately, we cannot provide further details on the characteristics of these protein associations.

      (5) Regular arrangement and toroid structure:

      i) Page 4: last section, first sentence: What is meant by 'regular' arrangement here? The word regular will imply a periodicity, which is not a feature of the bundles.

      We have rephrased the sentence in the revised manuscript as follows (page 5, lines 35-36): “Previous studies have visualized bundles with similar features using negative-stain transmission electron microscopy.”

      ii) Similarly, page 6 first sentence mentions about a conserved toroid structure. Which aspects of the toroid structure are conserved and what are the other toroids that are compared with?

      We noted several features that are conserved in the ZapD-mediated toroidal structures, including their diameter, thickness, height, and roundness, as shown in Fig. 2d-e and Supplementary Fig. 6b-c. However, the internal organization of the toroid does not exhibit a periodic or regular structure. We have rephrased this to say: “…resulting in a toroidal structure observed for the first time following the interaction between FtsZ and one of its natural partners in vitro.” (page 7, lines 42-43).

      iii) Discussion, para 1, last sentence: How is the toroid structure correlated with the bacterial cell FtsZ ring? What do the authors mean by 'structural compatibility' with the ring?

      The toroidal structures described in this work are consistent with the intermediate curved conformation of FtsZ polymers observed more generally across bacterial species and are likely to be part of the FtsZ structure responsible for constriction-force generation (Erickson and Osawa 2017 Subcell Biochem - DOI: 10.1007/978-3-319-53047-5_5). In the case of E. coli, if we assume an average of around 5000 FtsZ monomers in the polymeric form (two-thirds of the total found in dividing cells), this number of FtsZ molecules would be enough to encircle the cell around 6-8 times (considering the axial spacing between FtsZ monomers and the cell perimeter), which would be compatible with the structure adopting the form of a discontinuous toroidal assembly. 
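
      The estimate above can be checked with a short calculation. This is only a sketch: the figure of ~5000 polymerized monomers comes from the response, whereas the axial spacing of 4.3 nm per FtsZ monomer and the cell diameters of 0.9-1.1 µm are illustrative assumptions that are not stated in the text.

```python
import math


def ring_turns(n_monomers, spacing_nm, cell_diameter_nm):
    """Number of times a single-file FtsZ filament could wrap the cell perimeter."""
    filament_length_nm = n_monomers * spacing_nm
    perimeter_nm = math.pi * cell_diameter_nm
    return filament_length_nm / perimeter_nm


# 5000 monomers: value quoted in the response.
# 4.3 nm spacing and the diameters below: assumed for illustration only.
for diameter_nm in (900.0, 1000.0, 1100.0):
    turns = ring_turns(5000, 4.3, diameter_nm)
    print(f"cell diameter {diameter_nm:.0f} nm: {turns:.1f} turns")
```

      Under these assumptions the filament length corresponds to roughly 6 to 8 turns around the cell, in line with the range given above. The result scales inversely with the assumed diameter, so narrower cells would give more turns.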

      The term “structural compatibility” could be confusing, so we have removed it from the revised text. 

      iv) Discussion, para 2:

      Resemblance with the division ring in bacterial cells is mentioned in paragraph 2, however the features that are compared to claim resemblance comes later in the discussion. It will be helpful to rearrange the sections so that these are presented together.

      We have reorganized the sections following the reviewer’s suggestion.

      (6) CryoET of toroid and interpretation of the tomogram:

      i) Supplementary figure 10: It is not convincing that the indicated densities correspond to ZapD. Is the resolution and the quality of the tomogram sufficient to comment on the localisation of ZapD? It is challenging to see any interpretable difference between FtsZ filament dimers in 10a vs FtsZ+ZapD in panel (b).

      We acknowledge that localizing ZapDs in the structure is a challenge due to the limited resolution of the cryo-ET data (page 7, lines 11-13, 21-24). We have manually labeled putative ZapDs in the data and have done our best to identify the structures reasonably while recognizing the limitations of the segmentation. We use different colors to guide the eye without clearly stating what is or is not a ZapD. However, filaments found in 1:1 and 1:6 ratio toroids have a clear difference in thickness to those observed in the absence of ZapD. The filaments in 1:0 ratio toroids provide a reasonable control for elongation due to the missing wedge and allow us to attribute the extra filament thickness to ZapD densities confidently (page 7, lines 5-12).

      ii) How is it quantified that the elongation in Z is beyond the missing wedge effect? Please include the explanation for this in the methods or the relevant data as Supplementary figure panels.

      The missing wedge effect causes an elongation by a factor of 2 along the Z-axis. This elongation is evident in the filaments of the 1:0 ratio toroids, which serve as the baseline. The elongation in the filaments of the 1:1 and 1:6 ratio toroids exceeds this baseline and therefore cannot be attributed to the missing wedge effect alone. We have also added this information to the methods section (page 17, lines 31-33).

      iii) Segmentation analysis of the tomogram and many method details of analysis and interpretation of the tomography data has not been described. This is essential to understand the reliability of the interpretation of the tomography data.

      We provided thresholds for volume extraction as isosurfaces and clarified how the putative ZapDs are colored in the revised methods section (page 17, lines 24-30). However, we could not perform quantitative analysis of the segmented structures.

      (7) Quantification of structural features of the toroid:

      i) Page 5 last sentence mentions that it provides crucial information on the connectivity and length of the filaments. Is it possible to show a quantification of these features in the toroid models?

      Based on our data, we hypothesize that ZapD crosslinks filaments by creating a network of short filaments rather than long ones. These short filaments assemble to form a complete ring. However, the current resolution of the data precludes precise quantification of this process.

      In the revised version, we have changed this last sentence to put the emphasis on the crosslinking geometry instead (page 7, lines 40-43):

      “Cryo-ET imaging of ZapD-mediated FtsZ toroidal structures revealed a preferential vertical stacking and crosslinking of short ZapD filaments, which are also crosslinked laterally and diagonally, allowing for filament curvature and resulting in a toroidal structure observed for the first time following the interaction between FtsZ and one of its natural partners in vitro.”

      ii) In toroids with increasing concentrations, will it be possible to quantify the number of blobs which have been interpreted as ZapD? Is this consistent with the data of FtsZ to ZapD ratios?

      These quantifications would assist in interpreting the data. However, due to the limited resolution of the data, we are reluctant to provide estimates.

      iii) What is the average length of the filaments in the toroid? Can this be quantified from the tomography data? Similarly, can there be an estimation of curvature of the filaments from the data?

      Unfortunately, the complexity of the toroidal structure and the limited resolution we achieved prevent us from providing accurate quantification. We attempted to track and measure the length of the filaments, but this proved challenging due to the high concentration of connections. Regarding curvature, the arrangement of the filaments into toroids makes it difficult to measure the curvature of each filament. Additionally, the filaments are not perfectly aligned, which suggests that there may be various curvatures present.

      iv) What is the average distance between the FtsZ filaments in the toroid? Does this correlate with the ZapD dimensions, when a model has been interpreted as ZapD?

      We measured the spacing (not the center-to-center distance) between filaments in the toroids and showed this in Supplementary Fig. 14b (sky blue). We observed that the distances are very similar to those found for straight bundles (light blue), with a slightly greater variability. We should point out here that the distances were measured in the XY plane to simplify the measurements.

      v) What is the estimate of average inter-filament distances within the toroid? (Similar data as in Figure 13 for bundles?) When the distance between filaments is less, is the angle between ZapD and FtsZ filament axis different from 90 degrees? This might help in validation of interpretation of some of the blobs as ZapD.

      The distances between the filaments presented in Supplementary Figure 14b include those for toroids (1:1 ratio, represented in sky blue) and straight bundles (1:6 ratio, shown in light blue). We focused solely on the distance between filaments in the XY plane and did not differentiate based on the connection angle. Although the distance may vary with changes in the angles between filaments, our data does not permit us to make any quantitative measurements regarding these variations.

      vi) How does the inter filament distance in the toroids compare with the dimensions of ZapD dimers, in the toroids and bundles? Is there a role played by the FtsZ linker in deciding the spacing?

      The dimension of a ZapD dimer is ~7 nm along the longest axis. Huecas et al. (2017 Biophys J - DOI: 10.1016/j.bpj.2017.08.046) estimated an interfilament distance of ~6.5-6.7 nm for toroids of FtsZ from Bacillus subtilis. These authors also observed a difference in this spacing as a function of the linker, assuming that linker length would modulate FtsZ-FtsZ interactions. We observe a similar spacing for double filaments (5.9 ± 0.8 nm) and a longer spacing in the presence of ZapD (7.88 ± 2.1 nm). Previous studies with ZapD did not measure the distance between filaments but hypothesized that distances of 6-12 nm are allowed based on the structure of the protein (Schumacher M. 2017 J Biol Chem - DOI: 10.1074/jbc.M116.773192). Longer linkers may also provide additional freedom to spread the filaments further apart and facilitate a higher degree of variability in the connections by ZapD. This discussion has been included in the revised text (page 6, lines 10-18).

      (8) Crosslinking by ZapD and toroid reorganisation by transient interactions:

      i) Page 5, paragraph 2: Presence of putative ZapD decorating a single FtsZ': When ZapD is interacting with 2 FtsZ monomers within the same protofilament, it does not have any more valency to crosslink filaments. How do the authors propose that this can connect nearby filaments?

      We thank the reviewer for raising this interesting question. We see examples of ZapD dimers binding a filament through only one of the monomers, occupying one valency of the interaction and leaving one of the monomers available for another binding. We expect to see higher densities of ZapD in the outer regions of toroids simply because there are no longer (or not as frequent) FtsZ filaments available to be attached and join the overall toroid structure. Assuming that a ZapD dimer could bind the same FtsZ filament, this region would not be able to connect to other nearby filaments via these interactions.

      ii) Page 5: How are the authors coming up with the proposal of a reorganisation of toroid structures to a bundle? Given the extensive cross linking, a transition from a toroid to a bundle has to be a cooperative process and may not be driven by transient interactions. I would imagine that the higher concentration of ZapD will directly result in straight bundles because of the increased binding events of a dimer to one filament.

      Theoretically, this is correct. A certain degree of cooperativity linked to multivalent interactions would also favor the establishment of other ZapD connections. Furthermore, the formation of these structures occurs relatively quickly, within the first two minutes following the addition of GTP. We observed various intermediate structures, ranging from sparse filament bundles to toroids and straight filaments. However, the limited data prevents us from proposing a model that eventually explains the formation of higher-order structures over time.

      iii) Given such a highly cross-linked mesh, how can you justify transient interactions and loss of ZapD leading to disassembly? The possibility that ZapD can diffuse out of such a network seems impossible. Hence, what is the significance of a transient interaction? What is the basis of calling the interactions transient?

      We have noted that the term “transient” used to define the interaction between ZapD and FtsZ seems to generate confusion. Therefore, we have decided to replace this term to improve the readability of our manuscript, which has been edited accordingly.

      iv) Does the spacing between ZapD connections decide the curvature of the toroid?

      The FtsZ linker connected to ZapD molecules could modulate filament spacing and curvature, as previously suggested (Huecas et al. 2017 Biophys J - DOI: 10.1016/j.bpj.2017.08.046; Sundararajan and Goley 2017 J Biol Chem - DOI: 10.1074/jbc.M117.809939, and Sundararajan et al. 2018 Mol Microbiol - DOI: 10.1111/mmi.14081). In our structures, we observe a mixture of curvatures in the internal organization of the toroid. Despite the flexibility of FtsZ, filaments have a preferred curvature that FtsZ would initially determine. However, the amount of ZapD connections will eventually force the filament structure to adapt and align with neighboring filaments, facilitating connections with more ZapDs. Thus, the binding density of ZapD molecules significantly impacts FtsZ curvature rather than the ZapD connections themselves. However, the molecular mechanism describing the link between ZapD binding and polymer curvature remains unsolved.

      v) What is the difference in conditions between supplementary figure 6 and 12? Why is it that toroids are not observed in 12, for the same ratios?

      Both figures show images of samples under the same conditions. At high ZapD concentrations in the sample, we observe a mixture of structures ranging from single filaments, bundles, toroids, and straight bundles. In Supplementary Fig. 6, we have selected images of toroids, while in Supplementary Fig. 12, we have focused on single and double filaments. We aim to compare similar structures at different ZapD concentrations.

      (9) Correlation with in vivo observations:

      What is the approximate ratio of ZapD to FtsZ concentrations in the cell? In this context, within a cell which one - a toroid or bundle - will be preferred?

      Previous studies have estimated that E. coli cells contain approximately 5,000 to 15,000 FtsZ protein molecules, resulting in a concentration of around 3 to 10 µM (Rueda et al. 2003 J Bacteriol - DOI: 10.1128/JB.185.11.3344-3351.2003). Furthermore, only about two-thirds of these FtsZ molecules participate in forming the division ring (Stricker et al. 2002 PNAS - DOI: 10.1073/pnas.052595099). In contrast, ZapD is a low-abundance protein, with only around 500 molecules per cell (Durand-Heredia et al. 2012 J Bacteriol - DOI: 10.1128/JB.00176-12), making it a relatively small fraction compared to the FtsZ molecules. Under these circumstances, toroidal structures are more likely to form than straight bundles, as the latter would require significantly higher concentrations of ZapD for proper assembly. We have added these considerations in the revised text (page 11, lines 1-7).
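
      The copy numbers and concentrations quoted above can be related through a simple conversion. The sketch below is illustrative only: the molecule counts and the two-thirds ring fraction come from the response, while the cell volume of 2.6 fL is an assumption chosen because it reproduces the quoted 3 to 10 µM range; it is not a value reported in the text.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_to_micromolar(n_molecules, cell_volume_fl):
    """Convert a per-cell copy number to a micromolar concentration."""
    volume_l = cell_volume_fl * 1e-15  # 1 fL = 1e-15 L
    return n_molecules / (AVOGADRO * volume_l) * 1e6


def zapd_per_ring_ftsz(n_zapd, n_ftsz_total, ring_fraction=2 / 3):
    """ZapD molecules available per FtsZ molecule taking part in the ring."""
    return n_zapd / (n_ftsz_total * ring_fraction)


CELL_VOLUME_FL = 2.6  # assumed volume, not taken from the response

for label, count in (("FtsZ (low)", 5000), ("FtsZ (high)", 15000), ("ZapD", 500)):
    conc = molecules_to_micromolar(count, CELL_VOLUME_FL)
    print(f"{label}: {count} molecules = {conc:.2f} uM")

for total in (5000, 15000):
    ratio = zapd_per_ring_ftsz(500, total)
    print(f"ZapD per ring FtsZ at {total} total FtsZ: {ratio:.2f}")
```

      With these inputs ZapD comes out at well below one molecule per ring-associated FtsZ, which is the substoichiometric regime the response associates with toroids rather than straight bundles.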

      (10) Interpretation of mZapD results:

      i) What is the experimental proof for weakened stability of the dimer? Rather than weakened stability, does this form a population of only monomeric ZapD or a proportion of non-functional or unfolded dimer? This requires to be shown by AUC or SEC to substantiate the claim of a weakened interface.

      We have provided new AUC results indicating that mZapD is partially monomeric, which suggests a weakened dimerization interface (page 9, lines 15-16 and Supp. Fig. 15a). The assays revealed no signs of protein aggregation.

      ii) How does a weaker dimer result in thinner bundles and not toroids? A weaker dimer would imply that the number of ZapD linked to FtsZ will be less than the wild type, leading to less cross linking, which should lead to toroid formation rather than thinner bundles.

      This observation provides the most plausible explanation. However, we did not detect any toroidal structures, even at high concentrations of mZapD. This finding indicates that a more potent dimerization interface is essential for promoting the formation of toroidal structures rather than merely the number of ZapD-FtsZ connections. mZapD presumably has a reduced affinity for FtsZ, which, along with a weaker binding interface, may explain mZapD's inability to facilitate toroid formation.

      iii) This observation would imply that the geometry of the dimeric interaction plays a role in the bending of the FtsZ filaments into toroids? Please comment.

      Our data suggest that the binding density of ZapD to FtsZ polymers is a crucial factor governing the transition from toroidal structures to straight bundles. Toroids form when the polymers have excess free FtsZ (that ZapD does not crosslink). Additional factors, such as the orientation of the interactions, the length of the flexible linker, and the strength of the ZapD dimerization interface, are likely to contribute to these structural reorganizations. However, our current data do not allow for further analysis, and future experiments will be necessary to address these questions.

      (11) Curvature and plasticity of toroid:

      i) What are the factors that stabilise curved protofilaments/toroid structures in the absence of a cross linker, based on earlier studies from B. subtilis. A comparison will be insightful. ii) What is the effect of the linker length between FtsZ globular domain and CTP in the toroid spacing?

      Huecas et al. 2017 (Biophys J - DOI: 10.1016/j.bpj.2017.08.046) concluded that the disordered CTL of FtsZ serves as a spacer that modulates the self-organization of FtsZ polymers. They proposed that this intrinsically disordered CTL, which spans the gap between protofilament cores, provides approximately 70 Å of lateral spacing between the curved Bacillus subtilis FtsZ (BsFtsZ), forming toroidal structures. In contrast, the parallel filaments of tailless BsFtsZ mutants, which have a reduced spacing of 50 Å, will likely stick together, resulting in the straight bundles observed. In the full-length BsFtsZ filament, the flexibility allowed by the lateral association favors the coalescence of these curved protofilaments, leading to the formation of toroidal structures. 

      The role of the C-terminal tail of FtsZ in E. coli is critical for its functionality (Buske and Levin 2012 J Biol Chem - DOI: 10.1074/jbc.M111.330324). However, its structural involvement in complex formations remains unclear. Research indicates that any disordered peptide between 43 and 95 amino acids in length can function as a viable linker, while peptides that are significantly shorter or longer impede cell division (Gardner et al. 2013 Mol Microbiol - DOI: 10.1111/mmi.12279). Studies in E. coli and B. subtilis suggest that intrinsically disordered CTLs play a role in determining FtsZ assembly and function in vivo, and this role is dependent on the length, flexibility, and disorder of the tails. These aspects still require further exploration.

      iii) How is it concluded that the concentration of ZapD is modulating the behaviour of the toroid structure? ZapD as a molecule does not have much room for conformational flexibility beyond a few angstroms, in the absence of long flexible regions. Rather, shouldn't the linker length of FtsZ to the CTP decide the plasticity of the toroid?

      The length and flexibility of the linker can significantly influence structural interactions. As previously mentioned, a longer linker will likely enhance the range of interaction distances and orientations. However, specific interaction of ZapD and FtsZ is stronger than non-specific electrostatic FtsZ-FtsZ interactions, and this is not solely due to the flexibility of the linker. Instead, it can modulate the formation of either a toroidal structure or straight bundles.

      iv) "a minor free energy perturbation to bring about significant changes in the geometry of the fibers due to modifications in environmental conditions" - this sentence is not clear to me. How did the data described in the paper relate to minor free energy perturbations and how do environmental conditions affect this?

      This sentence aimed to convey the notion of polymorphism in FtsZ polymers. We acknowledge that the original version may have been unclear, so we have removed it in the new version of the manuscript (page 12, lines 1-2).

      (12) Missing controls:

      i) Supplementary Figure 2a: Interaction between ZapD and FtsZ: what was the negative control used in this experiment? Use of FtsZ with the CTP deletion or ZapD specific mutations will help in confirming that the Kd estimation is indeed driven by a specific interaction.

      Negative controls correspond to FtsZ and ZapD alone.

      ii) In a turbidity measurement, how will you distinguish between ZapD mediated bundling, ZapD independent bundling and FtsZ filaments alone? Here again, having a data with non-interacting mutational partners will make the data more reliable.

      The turbidity signal of individual proteins in the absence and presence of GTP is indistinguishable from that of the buffer. We have indicated this in the figure legend.

      iii) Control experiments to show that mZapD is folded (see point below) and to indeed prove that it is monomeric is missing.

      We have included the missing AUC data in the supplementary information (Supp Fig 15a).

      Minor points:

      -  Page 2, para 4: beta-sheet domain (instead of beta-strand)

      Done.

      -  Fig 2a and b: Why is a ratio mentioned in Figure 2a legend? I understood these images as individual proteins at 10 uM concentrations.

      That was a typing error; it corresponds to two individual proteins at 10 µM concentrations. 

      -  Fig 2. Y-axis - spelling of frequency (change in all figures where applicable)

      Corrected.

      -  Supplementary Figure 5: FtsZ 5 uM - change u to micro symbol. FtsZ - t is missing

      Corrected. 

      -  Molecular weight marker is xx. What does xx stand for?

      Corrected. 

      -  Fig 1: Units for GTPase activity on the y-axis is missing.

      Done.

      -  Suppl Fig 3: How was the normalisation carried out for the turbidity data?

      We have explained it in the revised methods section. 

      -  Page 4, line 5: p missing in ZapD

      Done. 

      -  Page 5: paragraph 1, last sentence: stabilised or established?

      Done.

      -  Page 6: 3rd sentence from last: correct the sentence (one ZapD two FtsZ)

      Corrected. 

      -  Page 14: Fluorescence microscopy and FRAP experiments have not been described in the manuscript. Hence, these are not required in the methods.

      Corrected. 

      -  Please include representative gels of purified protein samples used in the assay for sample quality control.

      Controls for each protein are shown in Supplementary Fig. 5a as “control samples” corresponding to 5 µM of each protein before centrifugation.

      Reviewer #3 (Recommendations for the authors):

      Fig. S2a confirms and quantitates the interaction of ZapD with FtsZ-GDP monomers by F.A. It shows a surprisingly high Kd of ~10 µM. This seems important but it is ignored in the overall interpretation. Fig. S2b (FCS) suggests an even weaker interaction, but this may reflect higher order aggregates.

      As the reviewer points out, the interaction between ZapD and FtsZ in the GDP form is weak, consistent with the need for high concentrations of ZapD to form FtsZ macrostructures in the presence of GTP.

      We did not observe the formation of ZapD aggregates, even at higher protein (Author response image 1A) and salt (Author response image 1B) concentrations.

      Author response image 1.

      A) Sedimentation velocity (SV) profiles of ZapD over a concentration range of 2 to 30 µM in 50 mM KCl, 5 mM MgCl2, Tris-HCl pH 7. B) SV profiles of ZapD at 10 µM at different ionic strengths (50-500 mM KCl) in 5 mM MgCl2, 50 mM Tris-HCl pH 7. Abs280 measurements were collected at 48,000 rpm and 20 ºC. 

      Describing their assembly of toroids the authors state "Upon adding equimolar amounts of ZapD, corresponding to the subsaturating ZapD binding densities described in the previous section". My reading of Fig. 1b and S5 is that FtsZ is almost fully saturated at 1:1 concentration; In S5a at 5:5 µM about 25% of each is in the pellet, which is near 1:1 saturation. It is certainly >50% saturated. Shouldn't this be clarified to read "slightly substoichiometric"? Of course, that undermines the identification of ZapD as such a substoichiometric number.

      We have rephrased the sentence following the reviewer’s suggestions to clarify matters (page 5, lines 39-40).

      The cryoET images in Fig. 3 are an average of five slices with a total thickness of 32 nm. The circular "short filaments..almost parallel" are therefore not single 5 nm diameter FtsZ filaments but must be alignment of filaments axially into sheets (or belts, the axial structure shown in Fig. S8e, discussed next). Importantly, the authors indicate "connections between filaments" by red arrows. This seems wrong for two reasons. (1) The "connections" are very sparse, and therefore not consistent with the near saturation of FtsZ by ZapD. (2) To show up in the 32 nm averaged slice, connections from multiple filaments would have to be aligned. Fig. 3e is a "view of the segmented toroidal structure." I think it shows sheets of filaments as noted above, and the suggested "crosslinks" are again very sparse and no more convincing.

      We thank the reviewer for pointing this out. This was an error on our part, which we have corrected in the figure legend of the revised version of the manuscript. The tomographic slice shown in Fig. 3a is an average of 5 slices, each with a pixel size of 0.86 nm, corresponding to a total thickness of 4.31 nm. It therefore corresponds to the thickness of a single FtsZ filament. The few red arrows indicate lateral connections between filaments, and as discussed earlier, ZapDs also crosslink FtsZ filaments vertically, giving rise to the elongated structures observed in the Z-direction.
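
      The thickness of an averaged tomographic view follows from the number of slices and the pixel size, as the short sketch below illustrates. The slice count and pixel size are those given in the reply; the 5 nm filament diameter is the value the reviewer uses, and the function names are hypothetical.

```python
def slab_thickness_nm(n_slices, pixel_size_nm):
    """Thickness of a view made by averaging n_slices consecutive slices."""
    return n_slices * pixel_size_nm


def is_single_filament_slab(thickness_nm, filament_diameter_nm=5.0):
    """True if the averaged slab is no thicker than one filament diameter."""
    return thickness_nm <= filament_diameter_nm


corrected = slab_thickness_nm(5, 0.86)  # values from the corrected legend
print(f"corrected slab: {corrected:.2f} nm, "
      f"single filament: {is_single_filament_slab(corrected)}")

original = 32.0  # thickness stated in the original legend
print(f"original slab: {original:.2f} nm, "
      f"single filament: {is_single_filament_slab(original)}")
```

      The small difference between this product and the 4.31 nm quoted in the reply presumably reflects rounding of the pixel size.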

      All 3-D reconstructions and segmented renditions should have a scale bar. The axial cylindrical sheets seem to be confirmed and qualified in Fig. S8e. The cylindrical sheets are not continuous, but seem to consist of belt-like filaments that are ~8-10 nm wide in the axial direction. Adjacent belts are separated axially by ~5 nm gaps, and radially by 4-20 nm. The densest filaments in the projection image Fig. 3b are probably an axial superposition of 2-3 belts, while the lighter filaments may be individual belts.

      Fig. 4 shows a higher number of crosslinks but nowhere near a 1:1 stoichiometry. Most importantly to me, the identification of crosslinks vs filaments seems completely arbitrary. For example, if one colored grey all of the densities in the 4a right panel, I would have no way to duplicate the distinctions shown in red and blue. Even if we accept the authors' distinction, it does not provide much structural insight. Continuous bands or sheets are identified as FtsZ, without any resolution of substructure, and any density outside these bands is ZapD. The spots identified as ZapD seem randomly dispersed and much too sparse to include all the ~1:1 ZapD.

      We appreciate the reviewer's comments. Scale bars are present in the tomographic slices but not in the 3D views, as these are perspective views, and it would be inappropriate to include scale bars. To provide context for the images, we added the dimensions of the toroids and toroid sections to the figure legends. 

      As previously mentioned, the resolution of our data limits our ability to accurately segment ZapD densities, especially in the Z direction. In Fig. 4, we have done our best to segment the ZapD densities at the top and sides of the FtsZ filaments, but many densities have been missed. We have clarified this point in both the text and the figure legends. This preliminary annotated view is meant to help illustrate the formation of the toroids. In Fig. 3, we have labeled only a few arrows to highlight the lateral connections between the FtsZ filaments; however, there are many more connections than those indicated.

      Fig. S12 explores the effect of increasing ZapD to 1:6, and the authors conclude "the high concentration of ZapD molecules increased the number of links between filaments and ultimately promoted the formation of straight bundles." However, the binding sites on FtsZ are already nearly saturated at 10:10.

      We cannot assume that all FtsZ binding sites are present at a 1:1 ratio. Our pelleting assay confirms the presence of both proteins in the pellet, but we should be cautious about quantification due to the limitations of this technique. Based on our cryo-EM experiments, the amount of ZapD associated with these structures is much lower. We hypothesize that ZapD proteins sediment with the large FtsZ structures, acting as an external decoration for the toroids. A single ZapD monomer may be bound to multiple outer filaments of the structures, which could effectively increase the total µM concentration observed in the pelleting assay. This situation may explain the enrichment of ZapD in the pellet at high concentrations, when theoretically only a 1:1 ratio should be possible. We have observed external decorations of ZapD at high concentrations (see Supplementary Fig. 6). We believe that the pelleting assay simplifies the system and should be used to complement the cryo-EM images.

      Minor points.

      In the Intro "..to follow a treadmilling behavior, similar to that of actin filaments.9-13." These refs have little to do with treadmilling. I suggest: Wagstaff..Lowe mBio 2017; Du..Lutkenhaus PNAS 2018; Corbin Erickson BJ 2020; Ruis..Fernandez-Tornero Plos Biol 2022.

      Following the reviewer’s suggestions, we have modified the references in the revised version. 

      The authors responded to a query during review stating that the concentration of ZapD always refers to the monomer subunit. That seems certainly the case for Fig. S1, but the caption to Fig. 1a confuses the stoichiometry issue: "expecting (sic) at around 2:1 FtsZ:ZapD." Perhaps it could be clarified by stating that the Fig. shows only half the FtsZ's occupied. But in Fig. 1b the absorbance reaches its maximum at equimolar FtsZ and ZapD. That means that all FtsZ's are bound to a ZapD monomer. Why not draw the model in 1A show that? Fig. S5 is also consistent with this 1:1 stoichiometry. And this might be the place to contrast the planar model with the stacked model suggested by Fig. 5 where the two FtsZ filaments are ~8 nm apart, and the ZapD bridging them is on top.

      We have revised the legend for Fig. 1a to improve its readability. In Fig. 1b, the absorbance data indicate that most FtsZ proteins form macrostructures; however, this does not imply that all FtsZ proteins are bound to ZapDs. Our findings demonstrate that this binding only occurs in the case of straight bundles.

      It may help to note that some previous studies have expressed the concentration of ZapD as the dimer. E.g., Roach..Khursigara 2016 found maximal pelleting at FtsZ:ZapD(dimer) of 2:1 (their Fig. 3), completely consistent with the 1:1 FtsZ:ZapD(monomer) in the present study.

      We recognize this discrepancy in the literature. Therefore, throughout the manuscript, the molar concentrations of both proteins are expressed in terms of the FtsZ and ZapD monomer species.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Lodhiya et al. demonstrate that antibiotics with distinct mechanisms of action, norfloxacin, and streptomycin, cause similar metabolic dysfunction in the model organism Mycobacterium smegmatis. This includes enhanced flux through the TCA cycle and respiration as well as a build-up of reactive oxygen species (ROS) and ATP. Genetic and/or pharmacologic depression of ROS or ATP levels protect M. smegmatis from norfloxacin and streptomycin killing. Because ATP depression is protective, but in some cases does not depress ROS, the authors surmise that excessive ATP is the primary mechanism by which norfloxacin and streptomycin kill M. smegmatis. In general, the experiments are carefully executed; alternative hypotheses are discussed and considered; the data are contextualized within the existing literature. Clarification of the effect of 1) ROS depression on ATP levels and 2) ADP vs. ATP on divalent metal chelation would strengthen the paper, as would discussion of points of difference with the existing literature. The authors might also consider removing Figures 9 and 10A-B as they distract from the main point of the paper and appear to be the beginning of a new story rather than the end of the current one. Finally, statistics need some attention.

      Strengths:

      The authors tackle a problem that is both biologically interesting and medically impactful, namely, the mechanism of antibiotic-induced cell death.

      Experiments are carefully executed, for example, numerous dose- and time-dependency studies; multiple, orthogonal readouts for ROS; and several methods for pharmacological and genetic depletion of ATP.

      There has been a lot of excitement and controversy in the field, and the authors do a nice job of situating their work in this larger context.

      Inherent limitations to some of their approaches are acknowledged and discussed e.g., normalizing ATP levels to viable counts of bacteria.

      We sincerely appreciate the reviewer’s encouraging feedback.

      Weaknesses:

      The authors have shown that treatments that depress ATP do not necessarily repress ROS, and therefore conclude that ATP is the primary cause of norfloxacin and streptomycin lethality for M. smegmatis. Indeed, this is the most impactful claim of the paper. However, GSH and dipyridyl beautifully rescue viability. Do these and other ROS-repressing treatments impact ATP levels? If not, the authors should consider a more nuanced model and revise the title, abstract, and text accordingly.

      We thank the reviewer for asking this question. In the revised version of the manuscript, we have included data on the impact of the antioxidant GSH on antibiotic-induced ATP levels as Supplementary Figure S9C.

      Does ADP chelate divalent metal ions to the same extent as ATP? If so, it is difficult to understand how conversion of ADP to ATP by ATP synthase would alter metal sequestration without concomitant burst in ADP levels.

      We sincerely thank the reviewer for raising this insightful question. Indeed, ADP and AMP can also form complexes with divalent metal ions; however, these complexes tend to be less stable. According to the existing literature, ATP-metal ion complexes exhibit a higher formation constant than ADP or AMP complexes. This has been attributed to the polyphosphate chain of ATP, which acts as an active site, forming a highly stable tridentate structure (Khan et al., 1962; Distefano et al., 1953). An antibiotic-induced increase in ATP levels, irrespective of any changes in ADP levels or in the total pool size of purine nucleotides, could still result in the formation of more stable complexes with metal ions, potentially leading to metal ion depletion. Recent studies indicate that antibiotic treatment stimulates purine biosynthesis (Lobritz MA et al., 2022; Yang JH et al., 2019), thereby imposing energy demands and enhancing ATP production. A corresponding increase in total purine nucleotide levels (ADP+ATP) is therefore possible, as mentioned in the Discussion section; however, this hypothesis requires further investigation.

      Khan MMT, Martell AE. Metal Chelates of Adenosine Triphosphate. Journal of Physical Chemistry. 1962 Jan 1;66(1):10–5.

      Distefano V, Neuman WF. Calcium complexes of adenosinetriphosphate and adenosinediphosphate and their significance in calcification in vitro. Journal of Biological Chemistry. 1953 Feb 1;200(2):759–63.

      Lobritz MA, Andrews IW, Braff D, Porter CBM, Gutierrez A, Furuta Y, et al. Increased energy demand from anabolic-catabolic processes drives β-lactam antibiotic lethality. Cell Chem Biol [Internet]. 2022 Feb 17.

      Yang JH, Wright SN, Hamblin M, McCloskey D, Alcantar MA, Schrübbers L, et al. A White-Box Machine Learning Approach for Revealing Antibiotic Mechanisms of Action. Cell [Internet]. 2019 May 30

      Reviewer #1 (Recommendations for the authors):

      (1) Some of the results in the paper diverge from what has been previously reported by some of the referenced literature. These discrepancies should be clarified.

      We apologize for any confusion, but we are uncertain about the specific discrepancies the reviewer is referring to. In the discussion section, we have addressed and analysed our results within the broader context of the existing literature, regardless of whether our findings align with or differ from previous studies.

      (a) CCCP, nigericin, BDQ, and the atpD mutant all appear to affect M. smegmatis growth (Figures S6C, S7C, S7D-E, and Figure 1B from reference 41). Could depressed growth contribute to the rescue effects of these compounds?

      We concur with the reviewer that the reagents we used (CCCP, Nigericin, and BDQ) to suppress the ATP burst in the presence of antibiotics do affect bacterial growth. This sub-inhibitory effect on growth is expected given their roles in either uncoupling the electron transport chain from oxidative phosphorylation or directly inhibiting ATP synthase, leading to reduced ATP production compared to the untreated control. However, we chose concentrations that reduce the antibiotic-induced surge in ATP levels without significantly depriving the bacteria of the ATP essential for their survival, thereby avoiding cell death.

      Consequently, all three reagents (as shown in Figures S6C, S7C, and S7D-E) were employed at non-lethal concentrations. We would like to emphasize, however, that it was not feasible to select a reagent concentration that had no impact on growth yet still suppressed the antibiotic-induced ATP burst. We recognize the possibility that growth retardation may have contributed to the observed rescue effects. To address this concern, we used multiple orthogonal methods (CCCP, Nigericin, and BDQ), each with distinct mechanisms having a common effect of reducing the ATP surge, to minimize off-target effects and support our findings.

      Also, the authors report no growth phenotype for atpD mutant (Figure S8) but only carry out the growth curve to an OD of 2, which is approximately where the growth curve from ref 41 begins to diverge.

      Additionally, to further confirm that bacterial rescue was not due to growth retardation caused by these reagents, we utilized the atpD mutant. All experiments, including those involving the atpD mutant, were conducted when the OD600nm reached 0.8 (during the exponential phase). We specifically ensured that the growth of the atpD mutant was not compromised during this phase (Figure S8) and restricted our growth curve to the early stationary phase (OD600 between 1.5 and 2). While it is possible that the atpD mutant exhibits slower growth than wild-type bacteria in stationary phase at an OD600nm of 4 (as shown in ref 41), this does not impact our observations.

      (b) Reference 41 also reports that the atpD mutant is more sensitive to some antibiotics  (Figure 6). This includes isoniazid, which references 34 and 35 have both reported caused an ATP burst.

      We acknowledge the reviewer’s query regarding the phenotype of the atpD mutant against isoniazid (Reference 41). However, the cited reference does not provide clarity on why the M. smegmatis atpD mutant exhibits increased sensitivity to isoniazid and other antibiotics, nor does it explain whether this sensitivity is due to reduced ATP levels or altered cell wall properties, such as enhanced drug uptake, as observed with Nile red and ethidium bromide.

      While references 34 and 35 reported an ATP burst following isoniazid treatment in slow-growing M. bovis BCG and M. tuberculosis, it remains to be tested whether isoniazid acts similarly in the fast-growing M. smegmatis, where it is bacteriostatic rather than being bactericidal as observed in M. bovis BCG and M. tuberculosis.  

      (2) The statistics require some attention. First, the wording for almost all of the figures is something like "data points represent the mean of at least three independent replicates," is that correct? CFUs are notoriously messy so it is surprising (impressive?) that the variability between replicates is so low. Second, t-tests are not appropriate for multiple comparisons.

      We thank the reviewer for raising this important query. It is correct that all our experiments included at least three independent replicates, and many of our results exhibit a high degree of variability, as indicated by the large error bars. We would like to clarify that we did not perform multiple comparisons on our results. For all analyses, an unpaired t-test was conducted between the control group and one experimental group at a time. Consequently, statistical data were generated for each pair of results, and the comparisons were displayed on the graph relative to the control data points, as mentioned in the Methods section under the heading “Statistical analysis”.

      (3) Figures 9 and 10A-B seem tangential to the main point of the paper and, in the case of Figure 10A-B, preliminary.

      In this study, our aim was to comprehensively investigate the nature of antibiotic-induced stresses (i.e., mechanisms of action from T = 15 hrs) and leverage these insights to enhance our understanding of bacterial adaptation mechanisms, particularly antibiotic tolerance (from T = 25 hrs). While a significant portion of the manuscript focuses on the secondary consequences of antibiotic exposure, we also sought to assess the bacteria's ability to counteract these stresses, contributing to our understanding of how antibiotic tolerance phenotypes develop.

      The results presented in Figure 9 clearly demonstrate that bacteria attempt to reduce respiration by decreasing flux through the complete TCA cycle, thereby mitigating ROS and ATP production in response to antibiotics. These findings not only uncover potential metabolic pathways to downregulate respiration but also validate our observations regarding the role of increased respiration, ROS generation, and subsequent ATP production in antibiotic action.

      Importantly, bacterial responses to antibiotics were not limited to metabolic adaptations. They also included the upregulation of the intrinsic drug resistance determinant Eis (Figure 10A) and an increase in mutation frequency (Figure 10B), both of which indicate a greater likelihood of these bacteria developing antibiotic tolerance and resistance. Therefore, the data presented in Figures 9 and 10A-B are not peripheral to the central theme of the paper. Rather, they complement and strengthen it by providing a comprehensive understanding of the consequences of antibiotic exposure, which aligns with the primary objectives of our study.

      Do the various perturbations used here (especially streptomycin) affect expression and/or turnover of the genetically-encoded sensors Mrx1-roGFP2 or Peredox-mCherry?

      We appreciate the reviewer for raising this query. Since streptomycin treatment leads to mistranslation and eventually inhibits protein synthesis, it is possible that such treatment could impact the expression and/or turnover of the genetically encoded biosensors, Mrx1-roGFP2 (1) or Peredox-mCherry (2). However, we do not anticipate any effects on the readout as both biosensors provide ratiometric measurements of redox potential and NADH levels, respectively, which eliminates errors due to variations in protein abundance. Nevertheless, in our experiments with both drugs, we employed multiple time- and dose-dependent responses, ensuring that all meaningful conclusions were drawn from the overall trends seen in the data rather than an individual data point.

      (1) Bhaskar A, Chawla M, Mehta M, Parikh P, Chandra P, Bhave D, et al. (2014) Reengineering Redox Sensitive GFP to Measure Mycothiol Redox Potential of Mycobacterium tuberculosis during Infection. PLoS Pathog 10(1): e1003902. https://doi.org/10.1371/journal.ppat.1003902

      (2) Shabir A. Bhat, Iram K. Iqbal, and Ashwani Kumar*. Imaging the NADH:NAD+ Homeostasis for Understanding the Metabolic Response of Mycobacterium to Physiologically Relevant Stresses. Front Cell Infect Microbiol. 2016; 6: 145. doi: 10.3389/fcimb.2016.00145
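
      The ratiometric argument in the response above can be illustrated with a toy model. The sketch below is not the authors' analysis: the two-channel sensor, the function names, and the per-molecule brightness values are all hypothetical, chosen only to show that a ratio of two signals that both scale with sensor abundance does not change when abundance changes.

```python
# Toy model (an assumption, not the authors' method): a ratiometric sensor is
# read in two channels, and both channel intensities scale linearly with the
# amount of sensor protein present in the cell.

def channel_intensities(abundance, oxidized_fraction):
    """Return hypothetical (channel_a, channel_b) intensities."""
    # Arbitrary illustrative brightness values per sensor molecule.
    bright_a_ox, bright_a_red = 1.0, 0.2   # channel A is brighter when oxidized
    bright_b_ox, bright_b_red = 0.3, 1.0   # channel B is brighter when reduced
    reduced_fraction = 1.0 - oxidized_fraction
    a = abundance * (oxidized_fraction * bright_a_ox + reduced_fraction * bright_a_red)
    b = abundance * (oxidized_fraction * bright_b_ox + reduced_fraction * bright_b_red)
    return a, b


def sensor_ratio(abundance, oxidized_fraction):
    """Ratio of the two channels; the abundance term cancels out."""
    a, b = channel_intensities(abundance, oxidized_fraction)
    return a / b


# Same redox state, four-fold less sensor: the ratio is unchanged.
full_expression = sensor_ratio(abundance=1.0, oxidized_fraction=0.4)
reduced_expression = sensor_ratio(abundance=0.25, oxidized_fraction=0.4)
```

      In this toy model the ratio responds to the oxidized fraction but not to abundance. It does not address effects that the linear-scaling assumption excludes, such as signals falling below the detection limit.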

      (4) Do the antibiotics affect permeability? Especially relevant to CellROX experiments.

      Antibiotics can impact, or even increase, bacterial membrane permeability, a phenomenon noted in the case of self-promoted uptake of aminoglycosides. When aminoglycosides bind to ribosomes, they induce mistranslation, including of membrane proteins, leading to the formation of membrane pores, which in turn enhances antibiotic uptake and lethality (1-2). However, whether the antibiotics used in our study (norfloxacin and streptomycin) at the concentrations applied altered membrane permeability is not known.

      Experiments involving the CellROX dye are unlikely to be influenced by changes in membrane permeability, as the dye is freely permeable to the mycomembrane.

      References:

      (1) Davis BD Chen LL Tai PC (1986) Misread protein creates membrane channels: an essential step in the bactericidal action of aminoglycosides PNAS 83:6164–6168.

      (2) Ezraty B Vergnes A Banzhaf M Duverger Y Huguenot A Brochado AR Su SY Espinosa L Loiseau L Py B Typas A Barras F (2013) Fe-S cluster biosynthesis controls uptake of aminoglycosides in a ROS-less death pathway Science 340:1583–1587.

      (5) Figures 4E-H does GSH affect bacterial growth/viability on its own i.e. in the absence of a drug?

      We thank the reviewer for raising this query. Indeed, the 10 mM GSH used in our experiments to mitigate and rescue cells from antibiotic-induced ROS does impact bacterial growth on its own, though it does not affect viability, likely due to GSH inducing reductive stress on bacterial physiology. For clarification, we have included the viability measurement data in the presence of 10 mM GSH alone in the revised version of the manuscript, as supplementary figure (S4E).

      (6) p. 2 "...antibiotic resistance involves more complex mechanisms and manifests as genotypic resistance, antibiotic tolerance, and persistence." This reads as tolerance and persistence being a subset of resistance, which is not quite accurate. There is at least one other example of similar wording in the text.

      We thank the reviewer for highlighting this point. Our intention was to convey that resistance to antibiotics can manifest in two forms: permanent or genetic resistance, and transient resilience through antibiotic tolerance and persistence.

      (7) p. 3 "...and showing no visible differences in the growth rate...". It is hard to say this as all the values appear to be 0 - possible to zoom in on the CFU counts in this region? Same comment for p. 5 "...the unaffected growth rate in the early response phase...".

      We apologize for the lack of clarity regarding the resolution of the early time points in the growth curve. Unfortunately, it was not feasible for us to zoom in on the initial time points due to the significant difference in cell viability between T=0 and T=25 hours (i.e., spanning 8 generations). For clarification in the growth phenotype at early time points, please refer to Author response image 1, where CFU counts are plotted on a logarithmic scale. The y-axis spans 6-8 orders of magnitude across different conditions, making it difficult to visualize early time points on a linear scale.

      Author response image 1.

      (8) p. 5 "...data for each condition were subjected to rigorous quality control analysis (S2B)." I believe that this is the case, but how Figure S2B demonstrates this fact is not clear.

      Figures S2A and S2B present the quality assessment data for all six proteomics datasets. Figure S2A illustrates the consistency in the number of proteins identified across 10 samples (5 independent replicates for both control and drug treatment). The minimal variation in the number of identified proteins indicates reproducibility across the different runs. Similarly, Figure S2B displays the variability in Pearson correlation coefficient values of protein abundance (LFQ intensities) across the 10 samples. The closer and more consistent the Pearson correlation values, the greater the reproducibility of the quantitative data acquisition.
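
      The reproducibility check described for Figure S2B can be sketched as a pairwise Pearson correlation between samples. The example below is only an illustration: the sample names and intensity values are made up, and the authors' actual software and settings are not stated in this response.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def pairwise_correlations(samples):
    """Map each unordered pair of sample names to its Pearson r."""
    names = sorted(samples)
    return {
        (first, second): pearson(samples[first], samples[second])
        for i, first in enumerate(names)
        for second in names[i + 1:]
    }


# Hypothetical log2 LFQ intensities for five proteins in three samples.
samples = {
    "control_1": [20.1, 22.4, 18.9, 25.0, 21.3],
    "control_2": [20.3, 22.1, 19.2, 24.8, 21.0],
    "drug_1": [19.8, 22.9, 18.5, 25.3, 21.9],
}
correlations = pairwise_correlations(samples)
```

      Coefficients that are uniformly close to 1, with little spread between pairs, are what the response describes as evidence of reproducible quantification.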

      (9) p. 7 "To look for a shared mechanism of antibiotic action..." The wording implies an assumption. Perhaps "to test whether" would be more appropriate? Same comment for p. 12 "To further confirm whether enhanced respiration ...".

      We appreciate the reviewer’s suggestions for both sentences and have made the necessary changes in the revised version. Thank you for bringing this to our attention.

      (10) Figure S1A-B figure legend. How was this assay performed?

      The experiment for Figures S1A-B was conducted using a standard REMA assay, as described in the methods section. Cells were harvested at the 25th-hour time point, and drug MICs were compared between cells grown with and without 1/4x MBC99 of the drugs. This was done to determine whether the growth recovery observed during the recovery phase was due to the presence of drug-resistant bacteria.

      (11) p. 14 "...(CCCP), a protonophore, at non-toxic levels..." Figure S6C implies an effect on growth.

      As clarified earlier in response to query 1(a), the CCCP reagent was used at concentrations that effectively minimize the antibiotic-induced surge in ATP levels. However, at these concentrations, CCCP reduces cellular ATP production (Figure S6A), leading to bacterial growth delay (Figure S6C). By "non-toxic levels," we intended to convey that these concentrations of CCCP are non-lethal to the bacteria, as evidenced in Figure S6C.

      (12) Figure 8A y axis is this CFU/mL or OD/mL?

      The y-axis of Figure 8A depicts CFU/ml, as it measures cell survival in response to increasing concentrations of bipyridyl.

      Reviewer #2 (Public review):

      Summary:

      The authors are trying to test the hypothesis that ATP bursts are the predominant driver of antibiotic lethality of Mycobacteria.

      Strengths:

      This reviewer has not identified any significant strengths of the paper in its current form.

      Weaknesses:

      A major weakness is that M. smegmatis has a doubling time of three hours and the authors are trying to conclude that their data would reflect the physiology of M. tuberculosis which has a doubling time of 24 hours. Moreover, the authors try to compare OD measurements with CFU counts and thus observe great variabilities.

      If the authors had evidence to support the conclusion that ATP burst is the predominant driver of antibiotic lethality in mycobacteria then this paper would be highly significant. However, with the way the paper is written, it is impossible to make this conclusion.

      We have identified a new mechanism of antibiotic action in Mycobacterium smegmatis. However, as discussed extensively in the manuscript's discussion section, whether and to what extent this mechanism applies to other organisms still needs to be tested.

      We have always drawn our inferences from CFU counts, as reported in all of our experiments, because OD600nm is not a reliable method.

      Reviewer #2 (Recommendations for the authors):

      Figure 1 needs to have an x-axis that has intervals that have 10E5 CFU to 4 x 10E8. But even 4 x 10E8 CFU/ml is a late log and not exponentially growing cells.

      Figure 1 illustrates the growth curve. We hope the reviewer meant the Y axis, which represents CFU/ml on a linear scale. As mentioned in response to reviewer #1’s query no. 7, it was not feasible to include the viability (CFU/ml) values at T=0 and a few subsequent time points. Naturally, the starting cell count was not zero; we began with approximately 600,000 CFU/ml, corresponding to an OD600nm of 0.0025/ml. For clarification, we have mentioned the initial OD as well as the CFU/ml at T=0 hr in the figure legend.
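
      The starting values quoted above imply a conversion factor between OD600nm and viable counts, which the short sketch below works out. The function names are hypothetical, and the extrapolation to other OD values assumes that CFU/ml stays proportional to OD600nm, which is an assumption made here for illustration and is not stated in the response.

```python
# Values quoted in the response for T = 0.
INITIAL_CFU_PER_ML = 600_000
INITIAL_OD600 = 0.0025


def cfu_per_od_unit(cfu_per_ml, od600):
    """CFU/ml corresponding to one OD600 unit, from a single paired reading."""
    return cfu_per_ml / od600


def estimated_cfu_per_ml(od600, factor):
    """Linear extrapolation; only valid if CFU/ml scales with OD600 (assumed)."""
    return od600 * factor


factor = cfu_per_od_unit(INITIAL_CFU_PER_ML, INITIAL_OD600)  # about 2.4e8 CFU/ml per OD unit
# Under the linearity assumption, an OD600 of 0.05 would correspond to about 1.2e7 CFU/ml.
estimate_at_low_od = estimated_cfu_per_ml(0.05, factor)
```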

      Carefully look at Figure 1, what were you trying to show? Your x-axis goes from 0 to 10E8, of course you did not inoculate 0 cells, but if you had measured CFUs, you might not have gotten the great variability you reported in your graph.

      We assume that the reviewer is suggesting that "if we had measured OD600nm/ml instead of CFU/ml, we might not have observed the high variability we reported." While we agree with the reviewer's comment, our decision to use CFU/ml for growth measurement was to obtain more resolved and detectable data points, as an OD600nm of 0.0025/ml cannot be reliably measured with a spectrophotometer. Additionally, at around T=15 hours, where we observed an extended lag phase (referred to as the stress phase), the OD600nm was approximately 0.05, which is barely detectable. Therefore, the significant differences between the control group and the ¼ x MBC99 drug-treated group might not have been observed if we had relied on OD-based measurements. Despite the presence of high error bars and variability in the data points, we were still able to demonstrate clear differences in bacterial growth between treated and untreated samples at sub-lethal drug doses. This ultimately allowed us to capture the nature of antibiotic-induced stresses.

      There is no doubt that sublethal concentrations of antibiotics will have an effect on the bacterial cells. But it is not clear how you are concluding that ATP burst is the dominant driver of lethality. M. smegmatis can be very different from Mtb.

      Using a series of time- and dose-dependent experiments with plasmid and kit-based approaches, we demonstrated that both antibiotics generate and rely on ROS and ATP bursts to induce lethality in M. smegmatis. Careful monitoring of oxidative stress in cells, following specific quenching of the antibiotic-induced ATP burst (Figure 7, S9A-B), revealed that the ATP burst is the dominant driver of antibiotic lethality. In all tested experiments, surviving bacteria exhibited elevated levels of oxidative stress but were able to maintain their viability, suggesting that oxidative stress alone is not the dominant factor in antibiotic-induced lethality. Furthermore, quenching of ROS by glutathione also suppressed the antibiotic-induced surge in ATP levels, thus supporting the notion that ROS alone is not the dominant driver of antibiotic action as previously understood.

      All experiments reported were conducted using fast-growing M. smegmatis, and we have acknowledged the need for similar experiments in other bacterial systems, including M. tuberculosis, to assess whether our findings are applicable to other systems.

      Another point, the use of a mutant in the ATP synthase is an interesting idea, but would it be better to use something where you knock out the ATP synthase activity with siRNA or a temperature-sensitive allele?

      We appreciate the reviewer’s encouraging comment. Knocking out ATP synthase would completely halt oxidative phosphorylation and shut down aerobic respiration, leading to severe metabolic and growth defects. Such stressful and non-growing conditions are not suitable for testing the efficacy of antibiotics, as it is widely accepted that antibiotics are more effective against metabolically active bacteria.

      Lastly, the conclusion is that norfloxacin and streptomycin have common mechanisms of action, but the authors do not explain how a DNA gyrase inhibitor shows the same mechanisms of action as a ribosome inhibitor.

      The connection between antibiotic target corruption (DNA gyrase or ribosome) and the activation of respiration is indeed unclear, intriguing, and represents one of the most exciting questions in the field of antibiotic mechanisms of action. In the discussion section, we have speculated on potential pathways for this connection, including the possibility that the inhibition of cell division by both drugs may create a perception of resource scarcity (energy and biosynthetic precursors), which could subsequently trigger increased metabolism, respiration, ROS production, and ATP synthesis. However, the precise mechanisms underlying this connection require further investigation and are beyond the scope of the present study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript highlights single-stranded DNA exo- and endo-nuclease activities of ExoIII as a potential caveat and an underestimated source of decreased efficiency in its use in biosensor assays. The data present convincing evidence for the ssDNA nuclease activity of ExoIII and identifies residues that contribute to it. The findings are useful, but the study remains incomplete as the effect on biosensor assays was not established.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors show compelling data indicating that ExoIII has significant ssDNA nuclease activity that is posited to interfere with biosensor assays. This does not come as a surprise as other published works have indeed shown the same, but in this work, the authors provide a deeper analysis of this underestimated activity.

      Response: Thank you so much for reviewing and summarizing our work.

      Strengths:

      The authors used a variety of assays to examine the ssDNA nuclease activity of ExoIII and its origin. Fluorescence-based assays and native gel electrophoresis, combined with MS analysis clearly indicate that both commercial and laboratory purified ExoIII contain ssDNA nuclease activity. Mutational analysis identifies the residues responsible for this activity. Of note is the observation in this submitted work that the sites of ssDNA and dsDNA exonuclease activity overlap, suggesting that it may be difficult to identify mutations that affect one activity but not the other. In this regard, it is of interest the observation by the authors that the ssDNA nuclease activity depends on the sequence composition of the ssDNA, and this may be used as a strategy to suppress this activity when necessary. For example, the authors point out that a 3′ A4-protruding ssDNA could be employed in ExoIII-based assays due to its resistance to digestion. However, this remains an interesting suggestion that the authors do not test, but that would have strengthened their conclusion.

      Response: Thank you so much for the positive evaluation and insightful comments on our manuscript. In the revised version, we have modified the manuscript to address the reviewer’s concerns by providing point-by-point responses to all the comments.

      Weaknesses:

      The authors provide a wealth of experimental data showing that E. coli ExoIII has ssDNA nuclease activities, both exo- and endo-, however this work falls short in showing that indeed this activity practically interferes with ExoIII-driven biosensor assays, as suggested by the authors. Furthermore, it is not clear what new information is gained compared to the one already gathered in previously published works (e.g. references 20 and 21). Also, the authors show that ssDNA nuclease activity has sequence dependence, but in the context of the observation that this activity is driven by the same site as dsDNA Exo, how does this differ from similar sequence effects observed for the dsDNA Exo? (e.g. see Linxweiler, W. and Horz, W. (1982). Nucl. Acids Res. 10, 4845-4859).

      Response: We agree with the reviewer regarding the limitations in showing the practical influence of the ssDNase activity in the commercial detection kit. Different from the biosensor in reference 20, our results showed a potential impact of ExoⅢ on another frequently used detection system, as the primer and probe required for the detection kit could be digested by ExoⅢ, leading to a lower detection efficiency. Since the activities of ExoⅢ on ssDNA and dsDNA share the same active center, we reason that the difference in sequence specificity of ExoⅢ on these two types of substrates might arise from two aspects. On the nuclease side, some unidentified residues of ExoⅢ might exist that play an auxiliary role in digesting ssDNA but not dsDNA, and these could contribute to the difference we observed. On the substrate side, without the base-pairing of a complementary sequence, the structure of ssDNA is more flexible (changeable with environmental factors such as ions and temperature) than that of dsDNA. The two aspects may collectively result in the difference in sequence specificity of ExoⅢ on ssDNA and dsDNA. We believe that cryo-electron microscopy-based structure analysis of the ExoⅢ-ssDNA complex would provide more comprehensive and direct evidence.

      Because of the claim that the underestimated ssDNA nuclease activity can interfere with commercially available assays, it would have been appropriate to test this. The authors only show that ssDNA activity can be identified in commercial ExoIII-based kits, but they do not assess how this affects the efficiency of a full reaction of the kit. This could have been achieved by exploiting the observed ssDNA sequence dependence of the nuclease activity. In this regard, the work cited in Ref. 20 showed that indeed ExoIII has ssDNA nuclease activity at concentrations as low as 50-fold less than what is tested in this work. Ref 20 also tested the effect of the ssDNA nuclease activity in Targeted Recycle Assays, rather than just testing for its presence in a kit.

      Response: Thanks so much for your comments. Logically, to evaluate the practical influence, we need to compare the current and improved detection kits. Our result suggested that raising the temperature or using the mutant may minimize the ssDNase activity of ExoⅢ. But the RAA or RPA-ExoⅢ detection kit is a multiple-component system consisting of recombinase T4 UvsX, loading factor T4 UvsY, ssDNA binding protein T4 gp32, polymerase Bsu and ExoⅢ (Analyst. 2018 Dec 17;144(1):31-67. doi: 10.1039/c8an01621f), which collectively decide the performance of the kit. By increasing the temperature, the activities or functions of other proteins contained in the detection kit would also be affected, and the resultant change in detection efficiency would not reflect the real practical influence of the ssDNase activity of ExoⅢ; by replacing the wild type with the mutant, the other four proteins need to be prepared and combined with an optimized ratio for rebuilding the detection system, which is challenging. The targeted recycle assay in Ref 20 is a simple system composed of ExoⅢ and corresponding nucleic acid adapters, which could be easily simulated by the researchers for evaluation. Being a much more complex system, the RAA or RPA-ExoⅢ detection kit is difficult to manipulate for displaying the practical influence. Thank you again for your insightful suggestions; we may conduct a systematic investigation to improve the detection kit in future studies.

      Because of the implication that the presence of ssDNA exonuclease activity may have in reactions that are supposed to only use ExoIII dsDNA exonuclease, it is surprising that in this submitted work no direct comparison of these two activities is done. Please provide an experimental determination of how different the specific activities for ssDNA and dsDNA are.

      Response: As suggested, we have compared the digestion rates of the two activities by using an equal amount of the commercial ExoⅢ (10 U/µL) on the two types of substrates (10 µM). The results below revealed that ExoⅢ required 10 minutes to digest the 30-nt single-stranded DNA (ssDNA) (A), whereas it could digest the same sequence on double-stranded DNA (dsDNA) within 1 minute (B) (in a newly produced Supplementary Figure S1). This indicated that ExoⅢ digested the dsDNA at a rate at least ten times faster than ssDNA. In conjunction with these results, a recent study has shown that the ssDNase activity of ExoⅢ surpasses that of the conventional ssDNA-specific nuclease ExoI (Biosensors (Basel), 2023, May 26; 13(6):581, doi: 10.3390/bios13060581), suggesting a potential biological significance of ExoⅢ in bacteria related to ssDNA, even though the digestion rate is not as rapid as that on dsDNA. The corresponding text has been added to the Results (Lines 200-207).

      Author response image 1.

      Reviewer #2 (Public Review):

      Summary:

      This paper describes some experiments addressing 3' exonuclease and 3' trimming activity of bacterial exonuclease III. The quantitative activity is in fact very low, despite claims to the contrary. The work is of low interest with regard to biology, but possibly of use for methods development. Thus the paper seems better suited to a methods forum.

      Response: We thank you for your time and effort in improving our work. In the following, we have revised the manuscript by providing point-to-point responses to your comments.

      Strengths:

      Technical approaches.

      Response: Thanks for your evaluation.

      Weaknesses:

      The purity of the recombinant proteins is critical, but no information on that is provided. The minimum would be silver-stained SDS-PAGE gels, with some samples overloaded in order to detect contaminants.

      Response: As suggested, we have performed the silver-stained SDS-PAGE on the purified proteins. The result below indicated that no significant contaminant was found, except for a minor contaminant in S217A (in a newly produced Supplementary Figure S4).

      Author response image 2.

      Lines 74-76: What is the evidence that BER in E. coli generates multinucleotide repair patches in vivo? In principle, there is no need for the nick to be widened to a gap, as DNA Pol I acts efficiently from a nick. And what would control the extent of the 3' excision?

      Response: Thank you for the insightful questions. The team of Gwangrog Lee lab has found that ExoⅢ is capable of creating a single-stranded DNA (ssDNA) gap on dsDNA during base excision repair, followed by the repair of DNA polymerase I. The gap size is decided by the rigidity of the generated ssDNA loop and the duplex stability of the dsDNA (Sci Adv. 2021 Jul 14;7(29):eabg0076. doi: 10.1126/sciadv.abg0076).

      Figure 1: The substrates all report only the first phosphodiester cleavage near the 3' end, which is quite a limitation. Do the reported values reflect only the single phosphodiester cleavage? Including the several other nucleotides likely inflates that activity value. And how much is a unit of activity in terms of actual protein concentration? Without that, it's hard to compare the observed activities to the many published studies. As best I know, Exo III was already known to remove a single-nucleotide 3'-overhang, albeit more slowly than the digestion of a duplex, but not zero! We need to be able to calculate an actual specific activity: pmol/min per µg of protein.

      Response: Yes, once even one nucleotide or phosphodiester bond is digested off the FQ reporter, fluorescence will be generated, and the value reflects the minimum number of phosphodiester bonds that have been cleaved during the period, based on which the digestion rate or efficiency of the nuclease on ssDNA could be calculated. Figures 2 and 3 showed that ExoⅢ could digest the ssDNA from the 3’ end, not just a single nucleotide. Since the “unit” has been widely used in numerous studies (Nature. 2015 Sep 10;525(7568):274-7; Cell. 2021 Aug 19;184(17):4392-4400.e4; Nat Nanotechnol. 2018 Jan;13(1):34-40.), its inclusion here aids in facilitating comparisons and evaluations of the activity in these studies. The actual activity of ExoⅢ was calculated in Figure 4D.

      Figures 2 & 3: These address the possible issue of 1-nt excision noted above. However, the question of efficiency is still not addressed in the absence of a more quantitative approach, not just "units" from the supplier's label. Moreover, it is quite common that commercial enzyme preparations contain a lot of inactive material.

      Response: Thanks for your comments. In fact, numerous studies have used the commercial ExoⅢ (Nature. 2015 Sep 10;525(7568):274-7; Cell. 2021 Aug 19;184(17):4392-4400.e4; Nat Nanotechnol. 2018 Jan;13(1):34-40.). Using this universal label of “units” helps researchers easily compare or evaluate the activity and its influence. The commercial ExoⅢ is developed by New England Biolabs Co., Ltd., and its quality has been widely examined in a wide range of scientific investigations.

      Figure 4D: This gets to the quantitative point. In this panel, we see that around 0.5 pmol/min of product is produced by 0.025 µmol = 25,000 pmol of the enzyme. That is certainly not very efficient, compared to the digestion of dsDNA or cleavage of an abasic site. It's hard to see that as significant.

      Response: Thanks for your comments; the possible confusion could have arisen due to the arrangement of the figure. Please note that based on Figure 4D, the digestion rate of 0.025 µM ExoⅢ on the substrate is approximately 5 pmol/min (as shown on the right vertical axis), rather than 0.5 pmol/min. Given that the reaction contained ExoⅢ at a concentration of 0.025 µM in a total volume of 10 µL, the quantity of ExoⅢ was 0.25 pmol (0.025 µmol/L × 10 µL, rather than 25,000 pmol), and this quantity produced the digestion rate of 5 pmol/min. This suggests that each molecule of ExoⅢ could digest one nucleotide in 3 seconds (5 pmol nucleotides / 0.25 pmol ExoⅢ / 60 seconds = 0.33 nucleotides/molecule/second). While it may not be as rapid as the digestion of ExoⅢ on dsDNA, a recent study has shown that the ssDNase activity of ExoⅢ surpasses that of the conventional ssDNA-specific nuclease ExoI (Biosensors (Basel), 2023, May 26; 13(6):581, doi: 10.3390/bios13060581), suggesting a potential biological significance of ExoⅢ in bacteria related to ssDNA.
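
      As a minimal sketch of the unit conversion in this response, the arithmetic can be checked in a few lines of Python. The input values are the ones quoted above; the function and variable names are illustrative only and do not come from the manuscript.

```python
def enzyme_amount_pmol(conc_umol_per_l, volume_ul):
    """Amount of enzyme in pmol from a concentration in µM and a volume in µL.

    µmol/L x µL = 1e-6 µmol = 1 pmol, so the product is already in pmol.
    """
    return conc_umol_per_l * volume_ul


def turnover_per_second(rate_pmol_per_min, enzyme_pmol):
    """Nucleotides released per enzyme molecule per second."""
    return rate_pmol_per_min / enzyme_pmol / 60.0


# Values quoted in the response (rate as read from Figure 4D, per the authors).
enzyme_pmol = enzyme_amount_pmol(0.025, 10.0)
turnover = turnover_per_second(5.0, enzyme_pmol)
seconds_per_nt = 1.0 / turnover

print(f"enzyme: {enzyme_pmol:.2f} pmol")
print(f"turnover: {turnover:.2f} nt per molecule per second")
print(f"time per nucleotide: {seconds_per_nt:.1f} s")
```

      The sketch only reproduces the stated conversion; it assumes that all of the enzyme in the reaction is active.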

      Line 459 and elsewhere: as noted above, the activity is not "highly efficient". I would say that it is not efficient at all.

      Response: We respectfully disagree with this point. Supported by the outcomes from fluorescence monitoring of FQ reporters, gel analysis of the ssDNA probe, and mass spectrometry findings, the conclusion is convincing, and more importantly, our findings align with a recent study (Biosensors 2023, 13(6), 581; https://doi.org/10.3390/bios13060581).

      Reviewer #3 (Public Review):

      Overall:

      ExoIII has been described and commercialized as a dsDNA-specific nuclease. Several lines of evidence, albeit incomplete, have indicated this may not be entirely true. Therefore, Wang et al comprehensively characterize the endonuclease and exonuclease enzymatic activities of ExoIII on ssDNA. A strength of the manuscript is the testing of popular kits that utilize ExoIII and coming up with and testing practical solutions (e.g. the addition of SSB proteins, ExoIII variants such as K121A, and varied assay conditions).

      Response: We really appreciate the reviewer for pointing out the significance and strength of our work. Additionally, we have responded point-by-point to the comments and suggestions.

      Comments:

      (1) The footprint of ExoIII on DNA is expected to be quite a bit larger than 5-nt, see structure in manuscript reference #5. Therefore, the substrate design in Figure 1A seems inappropriate for studying the enzymatic activity and it seems likely that ExoIII would be interacting with the FAM and/or BHQ1 ends as well as the DNA. Could this cause quenching? Would this represent real ssDNA activity? Is this figure/data necessary for the manuscript?

      Response: Thanks so much for your questions. The footprint of ExoⅢ on the dsDNA appears to exceed 5 nucleotides based on the structural analysis in reference #5. However, the footprint may vary when targeting ssDNA. Mass spectrometry analysis in our study demonstrated that ExoⅢ degraded a ~20-nucleotide single-stranded DNA substrate to mononucleotides (Figure 3), suggesting its capability to digest a 5-nt single-stranded DNA into mononucleotides as well. Otherwise, the reaction product left would only be 5-nt ssDNA fragment. Thus, the 5-nt FQ reporter is also a substrate for ExoⅢ. ExoⅢ possibly interacts with BHQ1 and affects its quenching efficiency on FAM to trigger the fluorescence release, as shown in Figure 1A, but this possibility has already been ruled out by the development of the RPA-ExoⅢ detection kit. As pointed out in the introduction part, the kit requires a probe labeled with fluorophore and quencher. If ExoⅢ could affect the fluorophore and quencher causing fluorescence release, the detection kit would yield a false-positive result regardless of the presence of the target, rendering the detection system ineffective. Thus, ExoⅢ does not interfere with the fluorophore and quencher. The digestion of ExoⅢ on the ssDNA within the FQ reporter was the sole cause of fluorescence release, and the emitted fluorescence represented the ssDNA activity. The result suggested that the FQ reporter might offer an effective approach to sensitively detect or quantitatively study the ssDNase activity of a protein that has not been characterized.

      (2) Based on the descriptions in the text, it seems there is activity with some of the other nucleases in 1C, 1F, and 1I other than ExoIII and Cas12a. Can this be plotted on a scale that allows the reader to see them relative to one another?

      Response: Thanks so much for your suggestions. We attempted to adjust the figure, but due to most of the values being less than or around 0.005, it was challenging to re-arrange for presentation.

      (3) The sequence alignment in Figure 2N and the corresponding text indicates a region of ExoIII lacking in APE1 that may be responsible for their differences in substrate specificity in regards to ssDNA. Does the mutational analysis support this hypothesis?

      Response: Our results indicated that mutation of R170, located in this region (αM helix), resulted in lower digestion efficiency on ssDNA than the wild type. This showed that R170 is an important residue for the ssDNase activity, which partially supports the hypothesis. Further investigation is needed to determine whether the structure of the αM helix accounts for the distinctions observed between ExoⅢ and APE1. Future research may require more residue mutations in this area for validation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • A significant fraction of amplitude is missing in the presented fluorescence time courses reporting on ssDNA nuclease activity (Figs 1 B, E, and H). Please indicate the dead time of mixing in these experiments, and if necessary include additional points in this time scale. It is unacceptable for the authors to simply connect the zero-time point and the first experimental point with a dashed line.

      Response: We thank the reviewer for pointing out this critical detail. We agree that simply connecting the points with a dashed line is an inappropriate way of indicating the real fluorescence generated in the initial stage. The fluorescence monitor needs about two minutes to start recording from the moment we place the reaction tube into the machine. But ExoⅢ can induce significant fluorescence immediately, reaching the peak within ~40 seconds, as shown in the video data. Therefore, it is difficult to record the initial real-time fluorescence generated. To avoid misleading readers, we have added a description in the legend as follows: “The dashed line used in the figure does not indicate the real-time fluorescence generated in the reaction but only represents a trend in the period for the monitor machine to initiate (~2 minutes).” The text was added in Lines 836-838.

      • The authors chose to utilize a 6% agarose electrophoresis to analyze digestion products. However, while this approach clearly shows that the substrates are being digested, it does not allow us to clearly estimate the extent. It would be appropriate to include control denaturing PAGE assays to test the extent of reaction, especially for dsDNA that contains a ssDNA extension, as in Figure 8, or for selected mutants to test whether exo activity may be limited to just a few nts, that may not be resolved with the lower resolution agarose gels.

      Response: We agree with the reviewer that denaturing PAGE assays usually is the choice for high-resolution analysis. And we performed this experiment on the short ssDNA, but observed that the bands of digestion products frequently shifted more or less in the gel. Of note, the other independent study also showed a similar phenomenon (Nucleic Acids Res. 2007;35(9):3118-27. doi: 10.1093/nar/gkm168). Even slight band shifting would significantly interfere with our analysis of the results, especially on the short ssDNA utilized in the study. After numerous attempts, we discovered that 6% agarose gel electrophoresis could detect the digested ssDNA bands with lower resolution than PAGE, but less shifting was observed. Considering all the factors, the 6% agarose gel was finally selected to analyze the digestion process.

      Reviewer #2 (Recommendations For The Authors):

      Line 158: tipycal should be typical

      Response: Thanks so much. As the reviewer pointed out, we have corrected the typo.

      Lines 299-300: "ssD-NA" should not be hyphenated, i.e., it should be ssDNA.

      Response: Thank you for pointing this out. We have rectified the error and thoroughly reviewed the entire paper for any necessary corrections.

      Reviewer #3 (Recommendations For The Authors):

      Figure 2A should indicate the length of the substrate. The legend says omitted nucleotides - I assume they were present in the substrate and just not in the figure? The authors should be very clear about this. Moreover, the text and figure do not describe the design differences between the three probes well. Are they the same except just 23, 21, and 20 nt in length? Are the sequences selected at random?

      Response: Thank you for your questions. The lengths of probes were described in the figure (23, 21, and 20 nt). The legend has been reworded in Line 843 as “The squiggle line represents the ~20 nucleotides of the ssDNA oligo.” And the sequences of three ssDNA substrates were randomly selected, and all the detailed information was provided in Supplementary Table S4.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1 (Public Review):

      Summary:

      The authors propose that the energy landscape of animals can be thought of in the same way as the fundamental versus realized niche concept in ecology. Namely, animals will use a subset of the fundamental energy landscape due to a variety of factors. The authors then show that the realized energy landscape of eagles increases with age as the animals are better able to use the energy landscape.

      Strengths:

      This is a very interesting idea that adds significantly to the energy landscape framework. They provide convincing evidence that the available regions used by birds increase with size.

      Weaknesses:

      Some of the measures used in the manuscript are difficult to follow and there is no mention of the morphometrics of birds or how these change with age (other than that they don’t change which seems odd as surely they grow). Also, there may need to be more discussion of other ontogenetic changes such as foraging strategies, home range size etc.

      We thank reviewer 1 for their interest in our study and for their constructive recommendations. We have included further discussions of these points in the manuscript and outline these changes in our responses to the detailed recommendations below.

      Reviewer 2 (Public Review):

      Summary:

      With this work, the authors tried to expand and integrate the concept of realized niche in the context of movement ecology by using fine-scale GPS data of 55 juvenile Golden eagles in the Alps. Authors found that ontogenic changes influence the percentage of area flyable to the eagles as individuals exploit better geographic uplifts that allow them to reduce the cost of transport.

      Strengths:

      Authors made insightful work linking changes in ontogeny and energy landscapes in large soaring birds. It may not only advance the understanding of how changes in the life cycle affect the exploitability of aerial space but also offer valuable tools for the management and conservation of large soaring species in the changing world.

      Weaknesses:

      Future research may test the applicability of the present work by including more individuals and/or other species from other study areas.

      We are thankful to reviewer 2 for their encouragement and positive assessment of our work. We have addressed their specific recommendations below.

      Recommendations for the authors:

      Reviewer 1 (Recommendations For The Authors):

      I found this to be a very interesting paper which adds some great concepts and ideas to the energy landscape framework. The paper is also concise and well-written. While I am enthusiastic about the paper there are areas that need clarifying or need to be made clearer. Specific comments below:

      Line 64: I disagree that competition is the fundamental driver of the realized niche. In some cases, it may be but in others, predation may be far more important (as an example).

      We agree with this point and have now clarified that competition is an example of a driver of the realized niche. We have also included predation as another example:

      "However, just as animals do not occupy the entirety of their fundamental Hutchinsonian niche in reality [1], for example due to competition or predation risk, various factors can contribute to an animal not having access to the entirety of its fundamental movement niche."

      Intro: I think the authors should emphasize that morphological changes with ontogeny will change the energy landscape for many animals. It may not be the case specifically with eagles but that won’t be true for other animals. For example, in many sharks, buoyancy increases with age.

      We agree and have now clarified that the developmental processes that we are interested in happen in addition to morphological changes:

      "In addition to morphological changes, as young animals progress through their developmental stages, their movement proficiency [2] and cognitive capabilities [3] improve and memory manifests [4]."

      Line 91-93: The idea that birds fine-tune motor performance to take advantage of updrafts is a very important one to the manuscript and should be discussed in a bit more detail. How? At the moment there is a single sentence and it doesn’t even have a citation yet this is the main crux of the changes in realized energy landscape with age. This point should be emphasized because, by the end of the introduction, it is not clear to me why the landscape should be cheaper as the birds age?

      Thank you for pointing out this missing information. We have now added examples to clarify how soaring birds fine-tune their motor performance when soaring. These include for example adopting high bank angles in narrow and weak thermals [5] and reducing gliding airspeed when the next thermal has not been detected [6]:

      "Soaring flight is a learned and acquired behavior [7, 8], requiring advanced cognitive skills to locate uplifts as well as fine-tuned locomotor skills for optimal adjustment of the body and wings to extract the most energy from them, for example by adopting high bank angles in narrow and weak thermals [5] and reducing gliding airspeed when the next thermal has not been detected [6]."

      Results:

      Line 106: explain the basics of the life history of the birds in the introduction. I have no idea what emigration refers to or the life history of these animals.

      Thank you for pointing out the missing background information. We have now added this information to the introduction:

      "We analyzed 46,000 hours of flight data collected from bio-logging devices attached to 55 wild-ranging golden eagles in the Central European Alps. These data covered the transience phase of natal dispersal (hereafter post-emigration). In this population, juveniles typically achieve independence by emigrating from the parental territory within 4-10 months after fledging. However, due to the high density of eagles and consequently the scarcity of available territories, the transience phase between emigration and settling by eventually winning over a territory is exceptionally long at well over 4 years. Our hypothesis posited that the realized energy landscape during this transience phase gradually expands as the birds age."

      What I still am having a hard time understanding is the flyability index. Is this just a measure of the area animals actively select and then the assumption that it’s a good region to fly within?

      We have modified our description of the flyability index for more clarity. In short, we built a step-selection model and made predictions using this model. The predictions estimate the probability of use of an area based on the predictors of the model. For the purpose of our study and what our predictors were (proxies for uplift + movement capacity), we interpreted the predicted values as the "flyability index". We have now clarified this in the methods section:

      "We made the predictions on the scale of the link function and converted them to values between 0 and 1 using the inverse logit function [9]. These predicted values estimated the probability of use of an area for flying based on the model. We interpreted these predicted values as the flyability index, representing the potential energy available in the landscape to support flight, based on the uplift proxies (TRI and distance to ridge line) and the movement capacity (step length) of the birds included in the model."

      It might also be useful to simply show the changes in the area the animals use with age as well (i.e. a simple utilization distribution). This should increase in age for many animals but would also be a reflection of the resources animals need to acquire as they get older.

      We have now added the figure S2 to the supplementary material. This plot was created by calculating the cumulative area used by the birds in each week after emigration. This was done by extracting the commuting flights for each week, converting these to line objects, overlapping the lines with a raster of 100*100 m cell size, counting the number of overlapping cells and calculating the area that they covered. We did not calculate UDs or MCPs because the eagles seem to be responding to linear features of the landscape, e.g. preferring ridgelines and avoiding valleys. Using polygons to estimate used areas would have made it difficult to ensure that decision-making with regards to these linear features was captured.
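
      A rough sketch of the cell-counting idea described above is given here, under stated assumptions: coordinates are in a projected system with units of metres, flight lines are approximated by densely sampled points rather than an exact line-raster overlay, and the area accumulates across weeks. All names are hypothetical; the actual analysis may differ.

```python
import math

CELL_SIZE_M = 100.0  # raster resolution stated in the response (100 x 100 m)


def cells_touched(track, cell_size=CELL_SIZE_M, step_fraction=0.25):
    """Return the set of (column, row) raster cells crossed by a polyline.

    Each segment is sampled every `step_fraction` of a cell width, which
    approximates a true line-raster overlay (cells clipped only at a
    corner can be missed).
    """
    cells = set()
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        n_steps = max(1, math.ceil(length / (cell_size * step_fraction)))
        for i in range(n_steps + 1):
            t = i / n_steps
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            cells.add((math.floor(x / cell_size), math.floor(y / cell_size)))
    return cells


def cumulative_area_km2(weekly_tracks, cell_size=CELL_SIZE_M):
    """Cumulative area (km^2) of distinct cells used up to each week."""
    seen = set()
    areas = []
    for tracks in weekly_tracks:
        for track in tracks:
            seen |= cells_touched(track, cell_size)
        areas.append(len(seen) * cell_size ** 2 / 1e6)
    return areas


# Two made-up weeks of commuting flights (coordinates in metres).
week_1 = [[(50.0, 50.0), (450.0, 50.0)]]
week_2 = [[(50.0, 50.0), (450.0, 50.0)], [(50.0, 150.0), (50.0, 350.0)]]
print(cumulative_area_km2([week_1, week_2]))
```

      Counting cells along the lines, instead of drawing a polygon around them, keeps the estimate tied to the linear features the birds follow, which is the motivation given in the response.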

      In a follow-up project, a PhD student in the golden eagle consortium is exploring the individuals’ space use after emigration considering different environmental and social factors. The outcome of that study will further complete our understanding of the post-emigration behavior of juvenile golden eagles in the Alps.

      How much do the birds change in size over the ontogeny measured? This is never discussed.

      Thank you for bringing up this question. The morphometrics of juvenile golden eagles are not significantly different from the adults, except in the size of culmen and claws [10]. Body mass changes after fledging, because of the development of the pectoral muscles as the birds start flying. Golden eagles typically achieve adult-like size and mass within their natal territory before emigration, at which time we started quantifying the changes in energy landscape. Given our focus on post-emigration flight behavior, we do not expect any significant changes in size and body mass during our study period. We now cover this in the discussion:

      "Juvenile golden eagles complete their morphological development before gaining independence from their parents, with their size and wing morphology remaining stable during the post-emigration phase [10, 11]. Consequently, variations in flyability of the landscape for these birds predominantly reflect their improved mastery of soaring flight, rather than changes in their morphology."

      Discussion:

      Line 154: Could the increase in step length also be due to changes in search strategies with age? e.g. from more Brownian motion when scavenging to Levy search patterns when actively hunting?

      This is a very good point and we tried to look for evidence of this transition in the tracking data. We explored the first passage time for two individuals with a radius of 50 km to see if there is a clear transition from a Brownian to a Levy motion. The patterns that emerge are inconclusive and seem to point to seasonality rather than a clear transition in foraging strategy (Author response image 1). We have modified our statement in the discussion about the change in preference of step lengths indicating improved flight ability, to clarify that it is speculative:

      Author response image 1.

      First passage times using a 50 km radius for two randomly selected individuals.

      "Our findings also reveal that as the eagles aged, they adopted longer step lengths, which could indicate an increasing ability to sustain longer uninterrupted flight bouts."

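      For readers unfamiliar with first passage time, a simplified sketch is given here. It assumes time-sorted fixes in projected coordinates and measures only the forward time until the track is first recorded outside a circle of the given radius, without interpolating the exact crossing time; dedicated movement-ecology packages typically do more than this. The function name and the example track are hypothetical.

```python
import math


def first_passage_times(points, radius):
    """Forward first passage time for each fix of a track.

    points: list of (time, x, y) tuples sorted by time.
    radius: circle radius in the same units as x and y.
    Returns one value per fix: the elapsed time until the first later fix
    lying outside the circle centred on that fix, or None if the track
    never leaves the circle.
    """
    result = []
    for i, (t0, x0, y0) in enumerate(points):
        fpt = None
        for t1, x1, y1 in points[i + 1:]:
            if math.hypot(x1 - x0, y1 - y0) > radius:
                fpt = t1 - t0
                break
        result.append(fpt)
    return result


# Made-up track: time in hours, coordinates in km, 50 km radius.
track = [(0, 0.0, 0.0), (1, 30.0, 0.0), (2, 60.0, 0.0), (3, 90.0, 0.0)]
print(first_passage_times(track, 50.0))
```

      Long first passage times indicate area-restricted movement around a location, while short ones indicate fast, directed travel.
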
      Methods:

      Line 229: What is the cutoff for high altitude or high speed?

      We used the Expectation-maximization binary clustering (EMbC) method to identify commuting flights. The EMbC method does not use hard cutoffs to cluster the data. Each data point was assigned to the distribution to which it most likely belonged based on the final probabilities after multiple iterations of the algorithm. Author response image 2 shows the distribution of points that were either used or not used based on the EMbC classification.

      Author response image 2.

      Golden eagle tracking points were either retained (used) or discarded (not used) for further data analysis based on the EMbC algorithm. The points were clustered based on ground speed and height above ground.

      Figure 1: The figure captions should stand on their own but in this case there is no information as to what the tests are actually showing.

      We have now updated the caption to provide information about the model:

      "Coefficient estimates of the step selection function predicting probability of use as a function of uplift proxies, week since emigration, and step length. All variables were z-transformed prior to modeling. The error bars show 95% confidence intervals."
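
      The z-transformation named in this caption can be illustrated with a short sketch. It assumes the common convention of centring on the mean and dividing by the sample standard deviation (n - 1 denominator); the caption does not say which convention was used, and the function name and values are made up.

```python
import statistics


def z_transform(values):
    """Centre values on their mean and scale by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator (assumed)
    if sd == 0:
        raise ValueError("cannot z-transform a constant variable")
    return [(v - mean) / sd for v in values]


# Made-up step lengths in metres (illustration only).
step_lengths = [100.0, 200.0, 300.0]
print(z_transform(step_lengths))
```

      Putting predictors on a common scale in this way makes the coefficient estimates in the figure directly comparable with one another.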

      Reviewer 2 (Recommendations For The Authors):

      First, I want to congratulate you on this fantastic work. I enjoyed reading it. The manuscript is clear and well-written, and the findings are sound and relevant to the field of movement ecology. Also, the figures are neatly presented and easy to follow.

      I particularly liked expanding the old concept of fundamental vs realized niche into a movement ecology context. I believe that adds a fresh view into these widely accepted ecological assumptions on species niche, which may help other researchers build upon them to better understand movement "realms" on highly mobile animals in a rapidly changing world.

      I made some minor comments to the manuscript since it was hard to find important weaknesses in it, given the quality of your work. However, there was a point in the discussion that I feel deserves your attention (or rather a reflection) on how major biological events such as moulting could also influence birds to master the flying and exploitation of the energy landscape. You may find my suggestion quite subjective, but I think it may help expand your idea for future works and, what is more, link concepts such as energy landscapes, ontogeny, and important life cycle events such as moulting in large soaring birds. I consider this relevant from a mechanistic perspective to understand better how individuals negotiate all three concepts to thrive and persist in changing environments and to maximise their fitness.

      Once again, congratulations on this excellent piece of research.

      We thank the reviewer for their enthusiasm about our work and for bringing up important points about the biology of the species. Our detailed responses are below.

      MINOR COMMENTS:

      (Note: Line numbers refer to those in the PDF version provided by the journal).

      Line 110: Distinguished (?)

      corrected

      Line 131: Overall, I agree with the authors’ discussion and very much liked how they addressed crucial points. However, I have a point about some missing non-discussed aspects of bird ecology that had not been mentioned.

      The authors argue that morphological traits are less important in explaining birds’ mastery of flight (thus exploiting all available options in the landscape). However, I think the authors are missing some fundamental aspects of bird biology that are known to affect birds’ flying skills, such as moult.

      The moulting process affects species’ flying capacity. Although previous works have not assessed moults’ impact on movement capacity, I think it is worth including the influence of flyability on this ecologically relevant process.

      For instance, golden eagles change their juvenile plumage to intermediate, sub-adult plumage in two or three moult cycles. During this process, the moulting process is incomplete and affects the birds’ aerodynamics, flying capacity, and performance (see Tomotani et al. 2018; Hedenström 2023). Thus, one could expect this process to be somewhat indirectly linked to the extent to which birds can exploit available resources.

      Hedenström, A. (2023). Effects of wing damage and moult gaps on vertebrate flight performance. Journal of Experimental Biology, 226(9), jeb227355.

      Tomotani, B. M., Muijres, F. T., Koelman, J., Casagrande, S., & Visser, M. E. (2018). Simulated moult reduces flight performance, but overlap with breeding does not affect breeding success in a long-distance migrant. Functional Ecology, 32(2), 389-401.

      We thank the reviewer for bringing up this relevant topic. We explored the literature listed by the reviewer as well as other sources, and concluded that moulting does not impact our findings. In our study, we included data for eagles that had emigrated from the natal territories, with their fully grown feathers in juvenile plumage. The moulting schedule in juvenile birds is similar to that of adults: the timing, intensity, and sequence of feathers being replaced are consistent every year (Author response image 3). For these reasons, we do not believe that moulting stage noticeably impacts flight performance at the scale of our study (hourly flights). Fine details of soaring flight performance (aerodynamics within and between thermals) could differ during moulting of different primary and secondary feathers, but this would occur every time the eagle replaces these feathers and we do not expect it to be any different for juveniles. Such fine-scale investigations are outside the scope of this study.

      Author response image 3.

      Moulting schedule of golden eagles [12]

      Lines 181-182: I don’t think trophic transitions rely only on individual flying skill changes. Furthermore, despite its predominant role, scavenging does not mean it is the primary source of food acquisition in golden eagles. This also depends on prey availability, and scavenging is an auxiliary font of easy-to-catch food.

      Scavenging implies detecting carcasses. Should this carcass appearance occur in highly rugged areas, the likelihood of detection also reduces notably. This is not to say that there are not more specialized carrion consumers, such as vultures, that may outcompete eagles in searching for such resources more efficiently.

      In summary, I don’t think such a transition relies only on flying skills but also on other non-discussed factors such as knowledge accumulation of the area or even the presence of conspecifics.

      Line 183: This is precisely what I meant with my earlier comment.

      Thank you for the discussion on the interaction between flight development and foraging strategy. We explored the transition from scavenging to hunting above as a response to Reviewer 1, but did not find a clear transition. This is in line with your comment that the birds probably use both scavenging and hunting methods opportunistically.

      Lines 193-195: I will locate this sentence somewhere in this paragraph. As it is now, it seems a bit out of context. It could be a better fit at the end of the first point in line 203.

      Thank you for pointing out the issue with the flow. We have now added a transitional sentence before this one to improve the paragraph. The beginning of the conclusion now reads as follows, with the new sentence shown in boldface.

      "Spatial maps serve as valuable tools in informing conservation and management strategies by showing the general distribution and movement patterns of animals. These tools are crucial for understanding how animals interact with their environment, including human-made structures. Within this context, energy landscapes play an important role in identifying potential areas of conflict between animals and anthropogenic infrastructures such as wind farms. The predictability of environmental factors that shape the energy landscape has facilitated the development of these conservation tools, which have been extrapolated to animals belonging to the same ecological guild traversing similar environments."

      References

      (1) Colwell, R. K. & Rangel, T. F. Hutchinson’s duality: The once and future niche. Proceedings of the National Academy of Sciences 106, 19651–19658. doi:10.1073/pnas.0901650106 (2009).

      (2) Corbeau, A., Prudor, A., Kato, A. & Weimerskirch, H. Development of flight and foraging behaviour in a juvenile seabird with extreme soaring capacities. Journal of Animal Ecology 89, 20–28. doi:10.1111/1365-2656.13121 (2020).

      (3) Fuster, J. M. Frontal lobe and cognitive development. Journal of neurocytology 31, 373–385. doi:10.1023/A:1024190429920 (2002).

      (4) Ramsaran, A. I., Schlichting, M. L. & Frankland, P. W. The ontogeny of memory persistence and specificity. Developmental Cognitive Neuroscience 36, 100591. doi:10.1016/j.dcn.2018.09.002 (2019).

      (5) Williams, H. J., Duriez, O., Holton, M. D., Dell’Omo, G., Wilson, R. P. & Shepard, E. L. C. Vultures respond to challenges of near-ground thermal soaring by varying bank angle. Journal of Experimental Biology 221, jeb174995. doi:10.1242/jeb.174995 (Dec. 2018).

      (6) Williams, H. J., King, A. J., Duriez, O., Börger, L. & Shepard, E. L. C. Social eavesdropping allows for a more risky gliding strategy by thermal-soaring birds. Journal of The Royal Society Interface 15, 20180578. doi:10.1098/rsif.2018.0578 (2018).

      (7) Harel, R., Horvitz, N. & Nathan, R. Adult vultures outperform juveniles in challenging thermal soaring conditions. Scientific reports 6, 27865. doi:10.1038/srep27865 (2016).

      (8) Ruaux, G., Lumineau, S. & de Margerie, E. The development of flight behaviours in birds. Proceedings of the Royal Society B: Biological Sciences 287, 20200668. doi:10.1098/rspb.2020.0668 (2020).

      (9) Bolker, B., Warnes, G. R. & Lumley, T. Package gtools. R Package "gtools" version 3.9.4 (2022).

      (10) Bortolotti, G. R. Age and sex size variation in Golden Eagles. Journal of Field Ornithology 55, 54–66 (1984).

      (11) Katzner, T. E., Kochert, M. N., Steenhof, K., McIntyre, C. L., Craig, E. H. & Miller, T. A. Birds of the World (eds Rodewald, P. G. & Keeney, B. K.) chap. Golden Eagle (Aquila chrysaetos), version 2.0. doi:10.2173/bow.goleag.02 (Cornell Lab of Ornithology, Ithaca, NY, USA, 2020).

      (12) Bloom, P. H. & Clark, W. S. Molt and sequence of plumages of Golden Eagles and a technique for in-hand ageing. North American Bird Bander 26, 2 (2001).

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the three reviewers and the reviewing editor for their positive evaluation of our manuscript. We particularly appreciate that they unanimously consider our work as “important contributions to the understanding of how the CAF-1 complex works”, “The large amounts of data provided in the paper support the authors' conclusion very well” and “The paper effectively addresses its primary objective and is strong”. We also thank them for their careful reading and useful comments. We have built on these comments to provide an improved version of the manuscript, and address them point by point below.

      Reviewer #1 (Public Review):

      Summary:

      This paper makes important contributions to the structural analysis of the DNA replication-linked nucleosome assembly machine termed Chromatin Assembly Factor-1 (CAF-1). The authors focus on the interplay of domains that bind DNA, histones, and replication clamp protein PCNA.

      Strengths:

      The authors analyze soluble complexes containing full-length versions of all three fission yeast CAF-1 subunits, an important accomplishment given that many previous structural and biophysical studies have focused on truncated complexes. New data here supports previous experiments indicating that the KER domain is a long alpha helix that binds DNA. Via NMR, the authors discover structural changes at the histone binding site, defined here with high resolution. Most strikingly, the experiments here show that for the S. pombe CAF-1 complex, the WHD domain at the C-terminus of the large subunit lacks DNA binding activity observed in the human and budding yeast homologs, indicating a surprising divergence in the evolution of this complex. Together, these are important contributions to the understanding of how the CAF-1 complex works.

      Weaknesses:

      1. There are some aspects of the experimentation that are incompletely described: In the SEC data (Fig. S1C) it appears that Pcf1 in the absence of other proteins forms three major peaks. Two are labeled as "1a" (eluting at ~8 mL) and "1b" (~10-11 mL). It appears that Pcf1 alone or in complex with either or both of the other two subunits forms two different high molecular weight complexes (e.g. 4a/4b, 5a/5b, 6a/6b). There is also a third peak in the analysis of Pcf1 alone, which isn't named here, eluting at ~14 mL, overlapping the peaks labeled 2a, 4c, and 5c. The text describing these different macromolecular complexes seems incomplete (p. 3, lines 32-33): "When isolated, both Pcf2 and Pcf3 are monomeric while Pcf1 forms large soluble oligomers". Which of the three Pcf1-alone peaks are oligomers, and how do we know? What is the third peak? The gel analysis across these chromatograms should be shown.

      We thank the reviewer for his/her careful reading of the manuscript. Indeed, we plotted two curves in Figure S1C in a color that does not match the legend, leading to confusion. Curve 1, Pcf1 alone, depicted in red, should appear in pink as indicated in the legend and in the SDS-PAGE analysis below. Curve 1 exhibits two peaks, labeled as 1a and 1b. With an elution volume of 8.5mL close to the dead volume of the column, peak 1a corresponds to soluble oligomers, while peak 1b (10.4mL) likely corresponds to monomeric Pcf1. Curve 5 (Pcf1 + Pcf2 mixture) was in pink instead of purple as indicated in the legend. This curve consists of three distinct peaks (5a, 5b, and 5c). The SDS-PAGE analysis revealed the presence of oligomers of Pcf1-Pcf2 (5a, 8.3mL), the Pcf1-Pcf2 complex (5b, 9.8mL), and Pcf2 alone (5c, 13.6 mL).

      The color has now been corrected in the revised manuscript.

      More importantly, was a particular SEC peak of the three-subunit CAF-1 complex (i.e. 4a or 4b) characterized in the further experimentation, or were the data obtained from the input material prior to the separation of the different peaks? If the latter, how might this have affected the results? Do the forms inter-convert spontaneously?

      We conducted all structural analyses and DNA/PCNA interaction experiments (Figures 1-4, S1-S4) with freshly SEC-purified samples corresponding to the 4b peak (9.7mL). Aliquots were flash-frozen with 50% glycerol for in vitro histone assembly assays (Figure 5).

      2. Given the strong structural prediction about the roles of residues L359 and F380 (Fig. 2f), these should be mutated to determine effects on histone binding.

      We are pleased that our structural predictions are considered strong. We agree that investigating the role of the L359 and F380 residues will be critical to further refine the binding interface between histone H3-H4 and CAF-1. An in vitro and in vivo analysis of such mutated forms, alongside the current Pcf1-ED mutant characterized in this article and additional potential mutated forms, has the potential to provide a better understanding of the dynamics of histone deposition by CAF-1. However, these additional approaches constitute a further step toward deciphering this still enigmatic process, beyond what is presented here.

      3. Could it be that the apparent lack of histone deposition by the delta-WHD mutant complex occurs because this mutant complex is unstable when added to the Xenopus extract?

      We cannot formally exclude this possibility, and this could potentially apply to all mutated forms tested. However, in the absence of available antibodies against the fission yeast CAF-1 complex, we cannot test this hypothesis for technical reasons. Nevertheless, we feel reassured by the fact that the in vitro assays of nucleosome assembly are overall consistent with the in vivo assays. Indeed, all mutated forms tested that abolished or weakened nucleosome assembly also exhibited synthetic lethality/growth defect in the absence of a functional HIRA pathway, including the delta WHD mutated form. This genetic synergy, which reflects defective histone deposition by CAF-1, is not specific to the fission yeast S. pombe and was previously reported in S. cerevisiae (Kaufman et al. MCB 1998; Krawitz et al. MCB 2002). This further supports the evolutionary conservation of this genetic assay as a readout for defective histone deposition by CAF-1.

      Reviewer #1 (Recommendations For The Authors):

      • p. 4: "An experimental molecular weight of 179 kDa was calculated using Small Angle X-ray Scattering (SAXS), consistent with a 1:1:1 stoichiometry (Figure S1e). These data are in agreement with a globular complex with a significant flexibility (Figure S1f)." There needs to be more description of the precision of the molecular weight measurement, and what aspects of these data indicate the flexibility.

      The molecular weight was estimated using the correlation volume (Vc) defined by (Rambo & Tainer, Nature 2013, 496, 477-481). The estimated error with this method is around 10%. We added this information together with supporting arguments for the existence of flexibility: “An experimental molecular weight of 179 kDa was calculated using Small Angle X-ray Scattering (SAXS). Assuming an accuracy of around 10% with this method (Rambo and Tainer 2013), this value is consistent with a 1:1:1 stoichiometry for the CAF-1 complex (calculated MW 167kDa) (Figure S1e). In addition, the position of the maximum for the dimensionless Kratky plot was slightly shifted to higher values in the y and x axis compared to the position of the expected maximum of the curve for a fully globular protein (Figure S1f).

      This shows that the complex was globular with a significant flexibility.”
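
      The consistency argument above is simple arithmetic and can be checked with a minimal sketch. The function names below are hypothetical, and the 10% tolerance is the approximate accuracy quoted above for the correlation-volume method, not an exact error bar.

```python
def relative_deviation(measured_kda: float, expected_kda: float) -> float:
    """Deviation of the expected mass from the measured one, as a fraction of the measured mass."""
    return abs(measured_kda - expected_kda) / measured_kda


def within_tolerance(measured_kda: float, expected_kda: float, tolerance: float = 0.10) -> bool:
    """True if the expected mass lies inside the +/- tolerance band around the measurement."""
    return relative_deviation(measured_kda, expected_kda) <= tolerance


SAXS_MW_KDA = 179.0        # experimental molecular weight from SAXS (quoted above)
CALCULATED_MW_KDA = 167.0  # calculated mass of a 1:1:1 complex (quoted above)

if __name__ == "__main__":
    dev = relative_deviation(SAXS_MW_KDA, CALCULATED_MW_KDA)
    print(f"relative deviation: {dev:.1%}")
    print("consistent with 1:1:1:", within_tolerance(SAXS_MW_KDA, CALCULATED_MW_KDA))
    # A hypothetical doubled assembly (2 x 167 kDa) is shown only for contrast.
    print("consistent with 2:2:2:", within_tolerance(SAXS_MW_KDA, 2 * CALCULATED_MW_KDA))
```

      Under these assumptions the calculated 1:1:1 mass falls inside the band around the SAXS estimate, whereas a doubled assembly would not.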

      • p. 6, lines 21-22: "In contrast, a large part of signals (338-396) did not vanish anymore upon addition of a histone complex preformed with two other histone chaperones known to compete with CAF-1 for histone binding..." Given the contrast made later with the 338-351 region which is insensitive to Asf1/Mcm2, it would be clearer for the reader to describe the Asf1/Mcm2-competed regions as residues 325-338 plus 352-396. Note that the numerical scale of residues doesn't line up perfectly with the data points in Figure 2d, and this should be fixed as well.

      We thank the reviewer for spotting this typographical error; we intended to write “In contrast, a large part of signals (348-396) did not vanish anymore…”. We modified the paragraph as suggested by the reviewer because we agree it is clearer for the reader: “In contrast, only a shorter fragment (338-347) vanished upon addition of Asf1-H3-H4-Mcm2(69-138), a histone complex preformed with two other histone chaperones, Asf1 and Mcm2, known to compete with CAF-1 for histone binding (Sauer et al. 2017) and whose histone binding modes are well established (Figure 2e) (Huang et al. 2015, Richet et al. 2015). This finding underscores a direct competition between residues (325-338) and (349-396) within the ED domain and Asf1/Mcm2 for histone binding.”

      The slight shift in the numerical scale Figure 2d was also corrected.

      • p. 8. Lines 22-24: "EMSAs with a double-stranded 40bp DNA fragment confirmed the homogeneity of the bound complex. When increasing the SpCAF-1 concentration, additional mobility shifts suggest, a cooperative DNA binding (Figure 3a)." I agree that the migration of the population is further retarded upon the addition of more protein. However, doesn't this negate the first sentence? That is, if multiple CAF-1 complexes can bind each dsDNA molecule, can these complexes be described as homogeneous?

      We fully agree with the reviewer's comment and have removed the notion of homogeneity from the first sentence. “EMSAs with a double-stranded 40bp DNA fragment showed the formation of a bound complex.”

      • Figure S2b Legend: "1H-15N HSQC spectra of Pcf1_ED (425-496)." The residue numbers should read 325-396.

      The typo has been corrected.

      • Is the title for Figure 5 correct?: "Figure 5: Rescue using Y340 and W348 in the ED domain, the intact KER DNA binding domain and the C-terminal WHD of Pcf1 in SpCAF-1 mediated nucleosome assembly." I don't see that any point mutation rescue experiments are done here.

      The title of Figure 5 has been changed to “Efficient nucleosome assembly by SpCAF-1 in vitro requires interactions with H3-H4, DNA and PCNA, and the C-terminal WHD domain”.

      • Figure S6C. I assume the top strain lacks the Pcf2-GFP but this should be stated explicitly.

      The following sentence has been added to the figure legend: “The top strain corresponds to a strain expressing wild-type and untagged Pcf2 as a negative control of GFP fluorescence”. Figure S6C has been modified accordingly and now labels this strain explicitly as “Pcf2 (untagged)”.

      • Regarding point #3 in the public review, a simple initial test of this idea would be to determine if similar amounts of wt and mutant complexes can be immunoprecipitated at the endpoint of the assembly reactions.

      In the absence of available antibodies against the fission yeast CAF-1 complex, we cannot test this hypothesis for technical reasons. However, the in vitro assays of nucleosome assembly are overall consistent with the in vivo assays. Indeed, all mutated forms tested that abolished or weakened nucleosome assembly also exhibited synthetic lethality/growth defect in the absence of a functional HIRA pathway, including the delta WHD mutated form. This genetic synergy, reflecting defective histone deposition by CAF-1, is not specific to the fission yeast S. pombe, as it was previously reported in S. cerevisiae (Kaufman et al. MCB 1998; Krawitz et al. MCB 2002), further supporting the evolutionary conservation of this genetic assay as a readout for defective histone deposition by CAF-1.

      • Foundational findings that should be cited: The role of PCNA in CAF-1 activity was first recognized by pioneering studies in the Stillman laboratory (PMID: 10052459, 11089978). The earliest recombinant studies of CAF-1 showed that the large subunit is the binding platform for the other two, showed that the KER and ED domains were required for histone deposition activity, and roughly mapped the p60-binding site on the large subunit (PMID: 7600578). Another early study roughly mapped the binding site for the third subunit and showed that biological effects of impairing the PCNA binding synergized with defects in the HIR pathway (PMID: 11756556), a genetic synergy first demonstrated in budding yeast (PMID: 9671489).

      We thank the reviewer for providing these important references, which are now cited in the manuscript: PMID: 10052459 and 11089978 on page 2, lines 18 and 19; PMID: 7600578 on page 19, line 5; and PMID: 11756556 and 9671489 on page 18, line 2.

      Reviewer #2 (Public Review):

      Summary:

      The authors describe the structure-functional relationship of domains in S. pombe CAF-1, which promotes DNA replication-coupled deposition of histone H3-H4 dimer. The authors nicely showed that the ED domain with an intrinsically disordered structure binds to histone H3-H4, that the KER domain binds to DNA, and that, in addition to a PIP box, the KER domain also contributes to the PCNA binding. The ED and KER domains as well as the WHD domain are essential for nucleosome assembly in vitro. The ED, KER domains, and the PIP box are important for the maintenance of heterochromatin.

      Strengths:

      The combination of structural analysis using NMR and Alphafold2 modeling with biophysical and biochemical analysis provided strong evidence on the role of the different domain structures of the large subunit of SpCAF-1, spPCF-1 in the binding to histone H3-H4, DNA as well as PCNA. The conclusion was further supported by genetic analysis of the various pcf1 mutants. The large amounts of data provided in the paper support the authors' conclusion very well.

      Reviewer #2 (Recommendations For The Authors):

      The paper by Ochesenbein describes the structural and functional analysis of S. pombe CAF-1 complex critical for DNA replication-coupled histone H3/H4 deposition. By using structural, biophysical, and biochemical analyses combined with genetic methods, the authors nicely showed that a large subunit of SpCAF1, SpPCF-1, consists of 5 structured domains with four connecting IDR domains. The ED domain with IDR nature binds to histone H3-H4 dimer with the conformational change of the other domain(s). SpCAF-1 binds to dsDNA by using the KER domain, but not the WHD domain. The experiments have been done with great care and a large amount of the data are highly reliable. Moreover, the results are clearly presented and convincingly written. The conclusion in the paper is very solid and will be useful for researchers who work in the field of chromosome biology.

      Major points:

      1. DNA binding of the KER mutant shown in Figures S3h and S3i, which was measured by the EMSA, looks similar to that of wild-type control in Figure S3f, which is different from the data in Figures 3b and 3e measured by the MST. The authors need a more precise description of the EMSA result of the KER mutant shown in Figures 3 and S3. The quantification of the EMSA result would resolve the point (should be provided).

      As proposed by this reviewer, we quantified all EMSAs presented in Figure 3 and Figure S3. We quantified the signal of the free DNA band to calculate a percentage of bound DNA in each condition. All EMSA experiments were conducted in duplicate, allowing us to calculate an average value and standard deviation for each interaction. Representative curves and fitted values are reported below in the figure provided for the reviewer (panel a: data for the Pcf1_KER domain with two fitting models; panel b: the entire CAF-1 complexes and mutants; panel c: the isolated Pcf1_KER domains; panel d: all fitted values). Importantly, as illustrated in panel a, the complete model for a single interaction (complete KD model, dashed line curve) does not adequately fit the data. In contrast, a function incorporating cooperativity (Hill model) better accounts for the measured data (solid line curve). Consistently, we also used the Hill model to fit the binding curves measured with the MST technique. As now also specified in the text, the Hill model allows the determination of an EC50 value (concentration of protein resulting in the disappearance of half of the free DNA band intensity) and a Hill coefficient value (representing cooperativity during the interaction) for each curve.
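
      The quantification logic described above can be illustrated with a minimal sketch. It is not the fitting procedure or software used for the figure, which is not specified here; the function names are hypothetical, the titration is synthetic and noise-free, and the coarse grid search merely stands in for a proper nonlinear least-squares fit.

```python
def fraction_bound(free_band: float, free_band_no_protein: float) -> float:
    """Fraction of DNA bound, inferred from the loss of free-DNA band intensity."""
    return 1.0 - free_band / free_band_no_protein


def hill(conc: float, ec50: float, h: float) -> float:
    """Hill model: fraction of DNA bound at total protein concentration `conc`."""
    return conc**h / (ec50**h + conc**h)


def fit_hill_grid(concs, fractions):
    """Least-squares grid search over EC50 (0.1 to 10.0) and h (0.5 to 5.0), step 0.1."""
    best = None
    for i in range(1, 101):
        ec50 = i / 10
        for j in range(5, 51):
            h = j / 10
            sse = sum((hill(c, ec50, h) - f) ** 2 for c, f in zip(concs, fractions))
            if best is None or sse < best[0]:
                best = (sse, ec50, h)
    return best[1], best[2]


# Synthetic titration (NOT measured data): generated from the Hill model itself,
# using an EC50 of 3.4 uM and a Hill coefficient of 3 as illustrative parameters.
concs_um = [0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 12.0]
synthetic_fractions = [hill(c, 3.4, 3.0) for c in concs_um]

ec50_fit, h_fit = fit_hill_grid(concs_um, synthetic_fractions)

if __name__ == "__main__":
    print(f"fitted EC50 = {ec50_fit:.1f} uM, Hill coefficient = {h_fit:.1f}")
```

      By construction, the protein concentration equal to the EC50 gives a bound fraction of one half, and a Hill coefficient above 1 steepens the transition around that point, which is the signature of cooperativity referred to above.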

      We measure a value of 3.4 ± 0.4 μM for the EC50 of SpCAF-1 WT, which is higher than the value measured by MST (0.7 ± 0.1 μM). Higher values were also calculated for all mutants and isolated Pcf1_KER domains compared to MST. These discrepancies could arise from the fact that the DNA concentrations used in the two techniques were very different (20nM for MST experiments and 1μM for EMSA). Unlike the complete KD model, which includes the DNA concentration (considered here as the "receptor") in the calculation, the Hill model is fitted independently of this value. This model assumes that the “receptor” concentration is low compared to the KD. Here we calculate EC50 values on the same order of magnitude as the DNA concentration (low micromolar). The quantification obtained by EMSA is thus challenging to interpret. In contrast, values fitted from the MST measurements are more reliable, since the assumption of a low “receptor” concentration holds in that case.
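
      The effect of the “receptor” concentration can be made concrete with a minimal sketch based on the complete KD model for a single, non-cooperative 1:1 interaction. This is a deliberate simplification (the measured binding is cooperative), the function names are hypothetical, and the KD of 0.7 μM is an illustrative value only. For such a model, the total protein concentration giving 50% bound DNA is KD + [DNA]/2, so it tracks the KD only when the DNA concentration is small compared to the KD.

```python
import math


def fraction_dna_bound(protein_total: float, dna_total: float, kd: float) -> float:
    """Complete KD model for 1:1 binding (quadratic solution); same unit for all inputs."""
    s = protein_total + dna_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * dna_total)) / 2.0
    return complex_conc / dna_total


def apparent_ec50(dna_total: float, kd: float) -> float:
    """Total protein concentration at which half of the DNA is bound: KD + [DNA]/2."""
    return kd + dna_total / 2.0


KD_UM = 0.7  # illustrative KD in uM (assumption, not a fitted value)

if __name__ == "__main__":
    # DNA concentrations quoted above: 20 nM for MST and 1 uM for EMSA.
    for label, dna_um in (("MST", 0.02), ("EMSA", 1.0)):
        ec50 = apparent_ec50(dna_um, KD_UM)
        check = fraction_dna_bound(ec50, dna_um, KD_UM)
        print(f"{label}: apparent EC50 = {ec50:.2f} uM (fraction bound there = {check:.2f})")
```

      In this simplified picture the apparent EC50 is shifted upward at the EMSA DNA concentration and barely shifted at the MST DNA concentration. The sketch only illustrates the direction of the effect and is not meant to reproduce the measured values.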

      Therefore, although measurements of EC50 and Hill coefficient from EMSA are reproducible, they may be misleading when used to quantify apparent affinities through the EC50. Nevertheless, this quantitative analysis of EMSA, requested by the reviewer, has highlighted an interesting characteristic of the KER* mutant that is consistent across both methods: even though the EMSAs pointed out by the reviewer (Figures S3h and S3i compared to the wild-type control in Figure 3d and Figure S3f) show similar EC50 values, the binding cooperativity is different. Binding curves for the KER* mutants are no longer cooperative (Hill coefficient ~1), and this is observed for all KER* curves (isolated Pcf1_KER domain and the entire SpCAF-1 complex) with both methods, EMSA and MST. We thus decided to emphasize this characteristic of the KER* mutant in the text (page 9, lines 30-32): “Importantly, this mutant also shows a lower binding cooperativity for DNA binding, as estimated by the Hill coefficient value close to 1, compared to values around 3 for the WT and other mutants.”

      Since EMSA quantifications, contrary to MST measurements, did not show a loss of “affinity” (as measured by the EC50 value) for the KER* mutants compared to the WT, and because the DNA concentration was close to the measured EC50, we consider that EC50 values calculated by EMSA do not represent a KD value. If we were to add this quantification, we would have to discuss this point in detail. Thus, for the sake of clarity, we prefer to present the EMSA measurements in the manuscript as illustrations and qualitative validations of the interaction, without including the quantification.

      Author response image 1.

      Quantitative analysis of interaction with DNA by EMSA. a: quantification of the amount of bound DNA for the Pcf1_KER domain (blue points with error bars). The fit with a KD model is shown as a dashed line, and the fit with a Hill model with a solid line. b: Examples of quantifications and fits (Hill model) for reconstituted SpCAF-1 WT and mutants. c: Examples of quantifications and fits (Hill model) for Pcf1_KER domains WT and mutant. d: EC50 values and Hill coefficients obtained for all EMSA experiments presented in Figure 3 and S3.

      2. As with the cooperative DNA binding of CAF-1, it is very important to show the stoichiometry of CAF-1 to the DNA or the site size. Given a long alpha-helix of the KER domain with biased charges, it is also interesting to show a model of how the dsDNA binds to the long helix with a cooperative binding property (this is not essential but would be helpful if the authors discuss it).

      We agree that having a molecular model for the binding of the KER helix to DNA would be especially interesting, but at this point, considering the accuracy of the tools currently at our disposal for predicting DNA-protein interactions, such a model would remain highly speculative.

      3. Figure 5 shows nucleosome assembly by SpCAF-1. SpCAF-1-PIP* mutant produced a product with faster mobility than the control at 2 h incubation. The amount of SpCAF-1 added in the reaction seems to be critical. At least a few different concentrations of proteins should be tested.

      The slightly faster migration of the SpCAF-1-PIP* product is not systematically reproduced: in several experiments, the band corresponding to supercoiled DNA migrated slightly above or below the one obtained by complementation with SpCAF-1-WT (see Author response image 2 below). This indicates that after 2 hours of incubation, the supercoiling achieved with the SpCAF-1-PIP* mutant is comparable to that achieved with SpCAF-1-WT. To further document whether the WT and the PIP* mutant differ, we compared their nucleosome assembly efficiency by testing their ability to produce supercoiled DNA over a shorter time, after 45 minutes of incubation. Under these conditions, we reproducibly detected supercoiled forms at earlier times with SpCAF-1-WT than with SpCAF-1-PIP* (see Figure 5 and Author response image 2). These observations indicate that mutation of the PIP motif of Pcf1 affects the rate of supercoiling, in a manner distinct from the other mutations, which dramatically impair the capacity of SpCAF-1 to promote supercoiling.

      Author response image 2.

      Minor points:

      1. Page 8, line 26 or Table 1 legend: Please explain what "EC50" is.

      The definition of EC50, together with a reference paper for the Hill model, has been added in the text on page 8, lines 23-26 (“The curves were fitted with a Hill model (Tso et al. 2018) with an EC50 value of 0.7 ± 0.1 µM (effective concentration at which a 50% signal is observed) and a cooperativity (Hill coefficient, h) of 2.7 ± 0.2, in line with a cooperative DNA binding of SpCAF-1.”), in the Table 1 legend and in the method section (page 26).

      2. Page 13, lines 9, 11: "Xenopus" should be italicized.

      This has been corrected.

      3. Page 14, second half: In S. pombe, the pcf1 deletion mutant is not lethal. It is helpful to mention the phenotype of the deletion mutant a bit more when the authors described the genetic analysis of various pcf1 mutants.

      This point has been added on page 15, line 1.

      4. Figure 1d and Figure S2a: Captions and labels on the X and Y axes are overlapped or misplaced.

      This has been corrected.

      5. Figure 5: Please add a schematic figure of the assay to explain how one can check the nucleosome assembly by looking at the form I, supercoiled DNAs.

      A new panel has been added to Figure 5. This scheme depicts the supercoiling assay where supercoiled DNA (form I) is used as an indication of efficient nucleosome assembly. The figure legend has also been modified accordingly.

      Reviewer #3 (Public Review):

      Summary:

      The study conducted by Ouasti et al. is an elegant investigation of fission yeast CAF-1, employing a diverse array of technologies to dissect its functions and their interdependence. These functions play a critical role in specifying interactions vital for DNA replication, heterochromatin maintenance, and DNA damage repair, and their dynamics involve multiple interactions. The authors have extensively utilized various in vitro and in vivo tools to validate their model and emphasize the dynamic nature of this complex.

      Strengths:

      Their work is supported by robust experimental data from multiple techniques, including NMR and SAXS, which validate their molecular model. They conducted in vitro interactions using EMSA and isothermal microcalorimetry, in vitro histone deposition using Xenopus high-speed egg extract, and systematically generated and tested various genetic mutants for functionality in in vivo assays. They successfully delineated domain-specific functions using in vitro assays and could validate their roles to a large extent using genetic mutants. One significant revelation from this study is the unfolded nature of the acidic domain, observed to fold when binding to histones. Additionally, the authors also elucidated the role of the long KER helix in mediating DNA binding and enhancing the association of CAF-1 with PCNA. The paper effectively addresses its primary objective and is strong.

      Weaknesses:

      A few relatively minor unresolved aspects persist, which, if clarified or experimentally addressed by the authors, could further bolster the study.

      1. The precise function of the WHD domain remains elusive. Its deletion does not result in DNA damage accumulation or defects in heterochromatin maintenance. This raises questions about the biological significance of this domain and whether it is dispensable. While in vitro assays revealed defects in chromatin assembly using this mutant (Figure 5), confirming these phenotypes through in vivo assays would provide additional assurance that the lack of function is not simply due to the in vitro system lacking PTMs or other regulatory factors.

      Our work demonstrates that the WHD domain is important for CAF-1 function during DNA replication. Indeed, deletion of this domain leads to synthetic lethality when combined with mutation of the HIRA complex, as observed for a null pcf1 mutant, indicating a severe loss of function in the absence of the WHD domain. We propose that these genetic interactions, previously reported in S. cerevisiae (Kaufman et al. MCB 1998; Krawitz et al. MCB 2002), are indicative of defective histone deposition by CAF-1. Moreover, our work establishes that this domain is dispensable for preventing DNA damage accumulation and for maintaining silencing at centromeric heterochromatin, indicating that the WHD domain specifies CAF-1 functions. Our work further demonstrates that, in contrast to the S. cerevisiae and human WHD domains, the S. pombe counterpart exhibits no DNA binding activity. We thus agree that the WHD domain may contribute to nucleosome assembly in vivo via PTMs or interactions with regulatory factors that may be lacking in in vitro systems. However, addressing these aspects deserves further investigation beyond the scope of this article.

      2. The observation of increased Pcf2-gfp foci in pcf1-ED cells, particularly in mono-nucleated (G2-phase) and bi-nucleated cells with septum marks (S-phase), might suggest the presence of replication stress. This could imply incomplete replication in specific regions, leading to the persistence of Caf1-ED-PCNA factories throughout the cell cycle. To further confirm this, detecting accumulated single-stranded DNA (ssDNA) regions outside of S-phase using RPA as an ssDNA marker could be informative.

      We cannot formally exclude that cells expressing the Pcf1-ED mutated form exhibit incomplete replication in specific regions, an aspect that would require careful investigation. However, the microscopy analysis (Fig. 6c and S6c) of this mutant showed no alteration in cell morphology, including the absence of elongated cells compared to wild type, a hallmark of checkpoint activation caused by ssDNA (Enoch et al. Genes & Dev 1992). Therefore, investigating how the interplay between the binding of CAF-1 to PCNA and to histones affects the dynamics of DNA replication is of particular interest but beyond the scope of the current manuscript.

      3. Moreover, considering the authors' strong assertion of histone binding defects in ED through in vitro assays (Figure 2d and S2a), these claims could be further substantiated, especially considering that some degree of histone deposition might still persist in vivo in the ED mutant (Figure 7d, viable though growth defective double ED*+hip1D mutants). For example, an approach akin to the one employed in Fig. 6a (FLAG-IPs of various Pcf1-FLAG-tagged mutants) could also enable a comparison of the association of different mutants with histones and PCNA, providing a more thorough validation of their findings.

      In the current manuscript, we have provided data establishing how Pcf1 mutated forms interact with PCNA (Fig. 6a, 6b). Regarding the interactions with histone H3-H4, the approach based on immunoprecipitation using various Pcf1-FLAG tagged mutants has been unsuccessful in our hands. Indeed, we were unable to obtain robust and reproducible interactions between Pcf1 or its various mutated forms and H3-H4. This is likely because Co-IP approaches do not probe for direct interactions. Indirect interactions between Pcf1 and H3-H4 are potentially bridged by additional factors, including the two other subunits of CAF-1, Pcf2 and Pcf3, or Asf1. Therefore, we are not in a position to address in vivo the direct interactions between Pcf1 and histone H3-H4.

      4. It would be valuable for the authors to speculate on the necessity of having disordered regions in CAF-1. Specifically, exploring the overall distribution of these domains within disordered/unfolded structures could provide insightful perspectives. Additionally, it's intriguing to note that the significant disparities observed among mutants (ED, PIP, and KER*) in in vitro assays seem to become more generic in vivo, except for the indispensability of the WHD-domain. Could these disordered regions potentially play a crucial role in the phase separation of replication factories? Considering these questions could offer valuable insights into the underlying mechanisms at play.

      We agree that the potential mechanistic role of partial disorder in CAF-1 is particularly interesting. Disordered regions of human CAF-1 have been reported to form nuclear bodies with liquid-liquid phase separation properties to maintain HIV latency (Ma et al EMBO J. 2021). As suggested, this raises the question of how disordered domains of Pcf1 could promote phase separation for replication factories, if such phenomenon happens in vivo. Moreover, numerous factors of the replisome also harbor disordered regions (Bedina, A. et al, 2013. Intrinsically Disordered Proteins in Replication Process. InTech. doi: 10.5772/51673), adding complexity in disentangling experimentally such questions. We have added these elements at the end of the discussion in the revised manuscript (page 20, lines 23-29). “Such plasticity and cross-talks provided by structurally disordered domains might be key for the multivalent CAF-1 functions. Human CAF-1 has been reported to form nuclear bodies with liquid-liquid phase separation properties to maintain HIV latency (Ma et al. 2021). This raises the question of a potential role of the disordered domains of Pcf1, together with other replisome factor harbouring such disordered regions (Bedina 2013), in promoting phase separation of replication factories, if such phenomenon happens in vivo. Further studies will be needed to tackle these questions.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript by Tie et al., the authors couple the methodology they have developed to measure the LQ (localization quotient) of proteins within the Golgi apparatus with RUSH-based cargo release to quantify the speed of different cargos traveling through Golgi stacks in nocodazole-induced Golgi ministacks, in order to differentiate between the cisternal progression and stable compartment models of the Golgi apparatus. The debate between the cisternal progression model and the stable compartment model has been intense and ongoing for decades, and it is important for understanding the basic function and organization of the Golgi apparatus. As per the stable compartment model, cisternae are stable structures and cargo moves along the Golgi apparatus in vesicular carriers. As per the cisternal progression model, in contrast, Golgi cisternae themselves mature, acquiring a new identity from the cis face to the trans face, and act as transport carriers themselves. In this work, the authors provide a missing part regarding the intra-Golgi speed for transport of different cargoes as well as the speed of TGN exit and, based on the differences in the transport velocities for the different cargoes tested, favor a stable compartment model. The argument the authors make is that if there is cisternal progression, all the cargoes should have a similar intra-Golgi transport speed, which is essentially the rate at which the Golgi cisternae mature. Furthermore, using a combination of BFA and nocodazole treatments, the authors show that the compartments remain stable in cells for at least 30-60 minutes after BFA treatment.

      Strengths:

      The method to accurately measure localization of a protein within the Golgi stack is rigorously tested in the previous publications from the same authors and in combination with pulse chase approaches has been used to quantify transport velocities of cargoes through the Golgi. This is a novel aspect in this paper and differences in intra-Golgi velocities for different cargoes tested makes a case for a stable compartment model.

      Weaknesses:

      Experiments are only tested in one cell line (HeLa cells) and predominantly derived from experimental paradigm using RUSH assays where a secretory cargo is released in a wave (not the most physiological condition) and therefore additional approaches would make a more compelling case for the model.

      We have added datasets from 293T cells in the revamped manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript describes the use of quantitative imaging approaches, which have been a key element of the lab's work over the past years, to address one of the major unresolved discussions in trafficking: intra-Golgi transport. The approach used has been clearly described in the lab's previous papers, and is thus clearly described. The authors clearly address the weaknesses in this manuscript and do not overstate the conclusions drawn from the data. The only weakness not addressed is the concept of blocking COPI transport with BFA, which is a strong inhibitor and causes general disruption of the system. This is an interesting element of the paper, which I think could be improved upon by using more specific COPI inhibitors instead, although I understand that this is not necessarily straightforward.

      I commend the authors on their clear and precise presentation of this body of work, incorporating mathematical modelling with a fundamental question in cell biology. In all, I think that this is a very robust body of work, that provides a sound conclusion in support of the stable compartment model for the Golgi.

      General points:

      The manuscript contains a lot of background in its results sections, and the authors may wish to consider rebalancing the text: The section beginning at Line 175 is about 90% background and 10% data. Could some data currently in supplementary be included here to redress this balance, or this part combined with another?

      In the revamped manuscript, we have moved the background information on rapid partitioning and rim progression models to the Introduction.

      Reviewer #3 (Public Review):

      The manuscript by Tie et al. provides a quantitative assessment of intra-Golgi transport of diverse cargos. Quantitative approaches using fluorescence microscopy of RUSH synchronized cargos, namely GLIM and measurement of Golgi residence time, previously developed by the authors' team (publications from 2016 to 2022), are being used here.

      Most of the results have been already published by the same team in 2016, 2017, 2020 and 2021. In this manuscript, very few new data have been added. The authors have put together measurements of intra-Golgi transport kinetics and Golgi residence time of many cargos. The quantitative results are supported by a large number of Golgi mini-stacks/cells analyzed. They are discussed with regard to the intra-Golgi transport models being debated in the field, namely the cisternal maturation/progression model and the stable compartments model. However, over the past decades, the cisternal progression model has been mostly accepted thanks to many experimental data.

      The authors show that different cargos have distinct intra-Golgi transport kinetics and that the Golgi residence time of glycosyltransferases is high. From this and the experiment using brefeldinA, the authors suggest that the rim progression model, adapted from the stable compartments model, fits with their experimental data.

      Strengths:

      The major strength of this manuscript is to put together many quantitative results that the authors previously obtained and to discuss them to give food for thought about the intra-Golgi transport mechanism.

      The analysis by fluorescence microscopy of intra-Golgi transport is tough and is a tour de force of the authors even if their approach shows limitations, which are clearly stated. Their work is remarkable with regard to the number of Golgi markers and secretory cargos which have been analyzed.

      Weaknesses:

      As previously mentioned, most of the data provided here were already published and thus accessible to the community. Is there a need to publish them again?

      The authors' discussion about the intra-Golgi transport model is rather simplistic. In the introduction, there is no mention of the most recent models, namely the rapid partitioning and the rim progression models. In my opinion, the tubular connections between cisternae and the diffusion/biochemical properties of cargos are not sufficiently taken into account to interpret the results. Indeed, tubular connections and biochemical properties of the cargos may affect their transit through the Golgi and the kinetics with which they reach the TGN for Golgi exit.

      Nocodazole is being used to form Golgi mini-stacks, which are necessary to allow intra-Golgi measurement. The use of nocodazole might affect cellular homeostasis but this is clearly stated by the authors and is acceptable as we need to perturb the system to conduct this analysis. However, the manual selection of the Golgi mini-stack being analyzed raises a major concern. As far as I understood, the authors select the mini-stacks where the cargo and the Golgi reference markers are clearly detectable and separated, which might introduce a bias in the analysis.

      The term 'Golgi residence time' is being used, but it corresponds to the residence time in the trans-cisterna only, as the cargo has been accumulated in the trans-Golgi thanks to a 20 °C block. The kinetics of disappearance of the protein of interest is then monitored after the 20 °C to 37 °C switch.

      Another concern also lies in the differences that would be introduced by different expression levels of the cargo on the kinetics of their intra-Golgi transport and of their packaging into post-Golgi carriers.

      Please see below for our replies to intra-Golgi transport models, the Golgi residence time, and different expression levels of cargos.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The data shown by the authors to measure differential intra Golgi velocities based on previously established methodology make a case for a stable compartment model, however more data is needed to make a complete story and the clarity of presentation can be improved.

      We sincerely appreciate the reviewer's insightful, detailed, and constructive feedback. Your thoughtful comments have helped us refine our analyses, clarify key points, and strengthen the overall quality of our manuscript. We are grateful for the time and effort you have dedicated to reviewing our work and providing valuable suggestions. Your input has been instrumental in improving both the scientific rigor and presentation of our findings. Thank you for your thorough and thoughtful review.

      Main points:

      (1) Along with the studies in yeast, which the authors describe in this paper, the main evidence for the cisternal maturation model in mammalian cells comes from Bonfanti et al. (https://doi.org/10.1016/S0092-8674(00)81723-7), which used EM to visualize a wave of collagen through Golgi stacks. It is therefore important that this work include collagen as one of the cargos tested. Can the authors use the RUSH-Col1AGFP (see: https://doi.org/10.1083/jcb.202005166) as a cargo to monitor intra-Golgi velocities?

      I understand that HeLa cells are not professional collagen-secreting cells, but the authors can use U2OS cells to measure collagen export and two other extreme (slow and fast) cargos to validate that the same trend in intra-Golgi transport velocities is seen in other cell lines. This will address three concerns: a. This is not a HeLa-specific phenomenon; b. Transport of large cargoes like collagen agrees with their proposal; c. To see if the same cargo has the same (similar) intra-Golgi speed and the trend between different cargoes is conserved across cell lines.

      Due to the difficulty of manipulating and imaging the procollagen-I RUSH reporter, we selected the collagenX-RUSH reporter (SBP-GFP-collagenX) instead. Our previous study (Tie et al., eLife, 2018) demonstrated that SBP-GFP-collagenX assembles into large molecular weight particles, each having ~ 190 copies of SBP-GFP-collagenX. With an estimated mean size of ~ 40 nm, these aggregates are not as large as FM4 aggregates and procollagen-I (> 300 nm) and, therefore, are not excluded from conventional transport vesicles, which typically have a size of 50 – 100 nm. However, collagenX has distinct intra-Golgi transport behaviour from conventional secretory cargos: while conventional secretory cargos localize to the cisternal interior, collagenX partitions to the cisternal rim (Tie et al., eLife, 2018).

      We studied the intra-Golgi transport of SBP-GFP-collagenX in HeLa cells via GLIM and side averaging. The new results are included in Figure 3 of the revamped manuscript. CollagenX has similar intra-Golgi transport kinetics as conventional secretory cargos, displaying the first-order exponential function in LQ vs. time and velocity vs. time plots.

      The side-averaging images are consistent with previous and current results. CollagenX displays a double-punctum during intra-Golgi transport, indicating a cisternal rim localization, as expected for large secretory cargos. Therefore, our new data demonstrate that large secretory cargos partitioned to the cisternal rim might follow intra-Golgi transport kinetics similar to those of conventional secretory cargos partitioned to the cisternal interior.

      We tested SBP-GFP-CD59 and SBP-GFP-Tac-TC, cargos with fast and slow intra-Golgi transport velocities, respectively, in 293T cells. Results are included in Figure 2, Supplementary Figure 2, and Table 1 of the revamped manuscript. We found that SBP-GFP-Tac-TC showed similar t<sub>intra</sub> values, 17 and 14 min in HeLa and 293T cells, respectively. Considering our previous finding that glycosylation has an essential role in Golgi exit (Sun et al., JBC, 2020), the distinct intra-Golgi transport kinetics of SBP-GFP-CD59 (t<sub>intra</sub> values of 13 and 5 min in HeLa and 293T cells, respectively) might be due to its distinct luminal glycosylation between HeLa and 293T cells. Supporting this hypothesis, SBP-GFP-Tac-TC does not have any glycosylation sites due to the truncation of the Tac luminal domain.

      (2) RUSH assay has its own caveats which authors also refer to in the manuscript. Authors should test their model by using pulse chase approaches by SNAP tagged constructs which will allow them to do pulse chase assays without the requirement to release cargo as a wave (see: doi: 10.1242/jcs.231373). It is not necessary to test all the cargoes but the two on the ends of the spectrum (slow and fast). To avoid massive overexpression, authors could express the proteins using weaker promoters. Authors could also use this approach to simultaneously measure the two cargoes by tagging them with CLIP and SNAP tags and doing the pulse chase simultaneously (see: DOI: 10.1083/jcb.202206132). In this case it may be difficult to stain both GM130 and TGN, but authors could monitor the rate of segregation from the GM130 signal.

      During the RUSH assay, the sudden release of a large amount of secretory reporters does not occur under native secretory conditions and, consequently, might introduce artifacts. The reviewer suggests using pulse-chase labeling of SNAP (or CLIP)-tagged secretory cargos, which occurs in a steady state and hence more closely resembles native secretory transport. This is an excellent suggestion. However, we have not yet tested this method due to the following concerns.

      The standard protocol involves blocking existing reporters, pulse-labeling newly synthesized reporters, and chasing their movement along the secretory pathway. However, the typical 20-minute pulse labeling period used in the two references would be too long, as a substantial portion of the reporters would already reach the trans-Golgi or exit the Golgi before the chase begins. Conversely, reducing the pulse labeling time would significantly weaken the GLIM signal.

      (3) While the intra-Golgi velocities are different for different cargoes tested, authors should show a control that the arrival of the cargoes from ER to the cis-Golgi follows similar kinetics or if there are differences there is no correlation with the intra-Golgi velocities. In other words, do cargoes which show slow intra-Golgi velocities also take more time to reach the cis-Golgi and vice versa.

      In nocodazole-induced Golgi ministacks, the ER exit site, ERGIC, and cis-Golgi are spatially closely associated. At the earliest measurable time point, 5 minutes after biotin treatment, we observed that the secretory cargo had already reached the cis-Golgi (Figure 2 and Supplementary Figure 2). The rapid ER-to-cis-Golgi transport exceeds the temporal resolution of our current protocol, making it difficult to address the reviewer’s question (see our reply to Minor Points (2) of Reviewer #2 for more detailed discussion on this).

      (4) Were the different cargos traveling (at different speeds) through Golgi at the rims, or in the middle of ministack, or by vesicles?

      Please also refer to our reply to Question 1 of Reviewer #1. For the nocodazole-induced Golgi ministack, we previously investigated the lateral cisternal localization of RUSH secretory reporters using our en face average imaging (Tie et al., eLife, 2018). We found that small or conventional cargos (such as CD59 and E-cadherin) partition to the cisternal interior while large cargos (collagenX and FM4-CD8a) partition to the cisternal rim during their intra-Golgi transport. Using GLIM, we showed that the intra-Golgi transport kinetics of collagenX is similar to that of small cargos as both follow the first-order exponential function (Figure 3A-C). Therefore, cisternal rim partitioned large size secretory cargos might have intra-Golgi transport kinetics similar to those of cisternal interior partitioned conventional secretory cargos.

      (5) Figure 4, under both nocodazole and BFA treatment for 30mins, would the stacks have the same number (274 nm per LQ) as thickness? Or does it shrink a little? Considering extended BFA treatment reduced intact Golgi ministacks. This is important to understand the LQ numbers of those Golgi proteins. Besides, can they include one ERGIC marker in this assay, would it be approaching cis-Golgi? Images used for quantification in Figure 4 should be shown in the main figure.

      We define the axial size of the Golgi ministack as the axial distance from the GM130 to the GalT-mCherry, d<sub>(GM130-GalT-mCherry)</sub>, measured using the Gaussian centers of their line intensity profiles. As the reviewer suggested, we measured the axial size of the ministack during the nocodazole and BFA treatment. Indeed, we found a decrease in the ministack axial size from 300 ± 10 nm at 0 min to 190 ± 30 nm at 30 min of BFA treatment. This observation is further confirmed by our side average imaging. The new data is presented in Fig. 6G.

      Our study focuses on changes in the organization of the Golgi ministack, so we did not include ERGIC53 in the current analysis. Instead, we quantified the axial distance between GalT-mCherry and CD8a-furin, d<sub>(GalT-mCherry-CD8a-furin)</sub>, and found that it decreased from 200 ± 20 nm at 0 min to 100 ± 30 nm at 30 min of BFA treatment, suggesting the collapse of the TGN. The collapse of the TGN is further visualized by our side average imaging. The new data is presented in Fig. 6H.

      Therefore, our new data demonstrates that the Golgi ministack shrinks, and the TGN collapses under BFA treatment.
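
      The axial-size measurement described in this reply (distance between the Gaussian centers of two line intensity profiles) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `gaussian_center` and `axial_distance` are hypothetical, and the center is estimated by a least-squares parabola fit to the log-intensity, which is exact only for a noise-free, background-subtracted single Gaussian. The authors' actual fitting procedure is not specified in this response.

```python
import math

def gaussian_center(positions_nm, intensities):
    """Estimate the center of a single Gaussian peak (hypothetical helper).

    Fits ln(I) = a + b*u + c*u^2 by least squares, where u is the position
    shifted and scaled to roughly [-1, 1]; the peak sits at u = -b / (2c).
    """
    # The logarithm is undefined for non-positive samples, so drop them.
    pts = [(x, math.log(y)) for x, y in zip(positions_nm, intensities) if y > 0]
    if len(pts) < 3:
        raise ValueError("need at least three positive samples")
    x0 = sum(x for x, _ in pts) / len(pts)
    scale = max(abs(x - x0) for x, _ in pts)
    if scale == 0:
        raise ValueError("positions must not all be identical")
    us = [((x - x0) / scale, ly) for x, ly in pts]
    # Normal equations for the basis [1, u, u^2], stored as an augmented matrix.
    s = [sum(u ** k for u, _ in us) for k in range(5)]
    t = [sum(ly * u ** k for u, ly in us) for k in range(3)]
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gaussian elimination with partial pivoting.
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, 3):
            f = m[r][i] / m[i][i]
            for col in range(i, 4):
                m[r][col] -= f * m[i][col]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(m[i][col] * coef[col] for col in range(i + 1, 3))
        coef[i] = (m[i][3] - rest) / m[i][i]
    _, b, c = coef
    if c >= 0:
        raise ValueError("profile is not peak-shaped")
    return x0 + scale * (-b / (2 * c))

def axial_distance(positions_nm, profile_a, profile_b):
    """Signed distance (nm) from the center of profile_a to that of profile_b."""
    return gaussian_center(positions_nm, profile_b) - gaussian_center(positions_nm, profile_a)
```

      With real images, noise and background make the log-parabola estimate less reliable in the tails, so a nonlinear Gaussian fit restricted to the region around the peak would likely be preferable.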

      Minor points:

      (1) The LQ data come from confocal/airy scan images, but no such images were shown in this paper. The authors can't assume every reader to have prior knowledge of their previous work. It will be beneficial to have one example image and how the LQ was measured.

      As advised by the reviewer, we have prepared Supplementary Figure 1 to provide a brief illustration of the principle behind GLIM and image processing steps involved.

      (2) The cargos used in this paper need to be introduced: what are they, how were they used in previous literature. Especially the furin constructs come out of the blue (also see point 7).

      As suggested by the reviewer, we have included a schematic diagram in Fig. 1 of the revised manuscript to illustrate all RUSH reporters and their corresponding ER hooks. In this diagram, we also highlight the key sequence differences in the cytosolic tails of different furin mutants.

      Additionally, we have added references for each RUSH reporter at the beginning of the Results and Discussion section.

      (3) There are two categories of exocytosis, constitutive and regulated. It is important to state that the phenomenon observed is in cells predominantly showing only constitutive secretion.

      As the reviewer advised, we have added the following sentences in the section titled “Limitations of the study”.

      “Third, all RUSH reporters used in this study are constitutive secretory cargos. As a result, the intra-Golgi transport dynamics observed here might not reflect those of regulated secretion, which involves the synchronized release of a large quantity of cargo in response to a specific signal.”

      (4) All the cargoes show a progressive reduction in instantaneous velocities from cis to medial to trans. Authors should discuss how do they mechanistically explain this. Is the rate of vesicle production progressively decreasing from cis to trans and if so, why?

      As our imaging methods cannot differentiate vesicles from the cisternal rim, we could not tell if the vesicle production rate had changed during the intra-Golgi transport. We have provided an explanation of the progressive reduction of the intra-Golgi transport velocity in the Results and Discussion section. Please see the text below.

      “The progressive reduction in intra-Golgi transport of secretory cargo might result from the enzyme matrix's retention at the trans-Golgi. As the secretory cargos progress along the Golgi stack from the cis to the trans-side, more and more cargos become temporarily retained in the trans-Golgi region, gradually reducing their overall intra-Golgi transport velocity. If the release or Golgi exit of these cargos from the enzyme matrix follows a constant probability per unit time, i.e., a first-order kinetics process, the rate of cargo exiting from the Golgi should follow the first-order exponential function. Since the mechanism underlying intra-Golgi transport kinetics reflects fundamental molecular and cellular processes of the Golgi, further experimental data are essential to rigorously test this hypothesis.”
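
      The first-order argument in the quoted passage can be written out explicitly. The following is only a sketch: the symbols N (amount of retained cargo), k (exit probability per unit time), LQ<sub>0</sub> and A are introduced here for illustration, and the exponential form of LQ(t) is an assumed parametrization, since the manuscript's Equations 2 and 3 are not reproduced in this response.

```latex
\begin{align}
\frac{dN}{dt} &= -kN
  \quad\Longrightarrow\quad
  N(t) = N_0\, e^{-kt}, \qquad t_{1/2} = \frac{\ln 2}{k}, \\
LQ(t) &= LQ_0 + A\left(1 - e^{-kt}\right)
  \quad\text{(assumed form)}, \\
v(t) &= \frac{d\,LQ}{dt} = A\,k\, e^{-kt}.
\end{align}
```

      Under these assumptions the velocity v(t) is largest at t = 0 and decays monotonically, which is the progressive reduction described above. If, in addition, k is taken to be ln 2 divided by a half-time, the prefactor of the velocity becomes A ln 2 divided by that half-time.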

      (5) The supp file 1 nicely listed the raw data for plotting, and n for numbers of ministacks. Could the authors also show number of cells or experiment repeats?

      In the revamped version of the Supplementary File 1, we have added the cell number for each LQ measurement.

      (6) This recent work, which used novel multiplexing methods to show that nocodazole-treated cells had a similar protein organization to control cells, may be cited. It also showed the effect of BFA. https://www.cell.com/cell/abstract/S0092-8674(24)00236-8.

      We have added this reference to the Introduction section to support that nocodazole-induced Golgi ministacks have a similar organization as the native Golgi. However, our BFA treatment was combined with the nocodazole treatment, while this paper’s BFA treatment does not contain nocodazole.

      (7) Figure 1G-J, authors should show a schematic to show the difference between different furin constructs. Also, LQ values in Fig 1I start from 1. Authors may need to include even earlier timepoints.

      As suggested by the reviewer, we have shown the domain organization of wild type and mutant furin RUSH reporters in Figure 1, highlighting key amino acids in the cytosolic tail. Please also see our reply to Minor Points (2) of Reviewer #1.

      In the revised manuscript, Fig. 1I (SBP-GFP-CD8a-furin-AC #1) has been updated to become Fig. 2J. In this dataset, the first time point was selected at a relatively late stage (20 min), resulting in an initial LQ value of 0.92. However, this should not pose an issue, as SBP-GFP-CD8a-furin-AC reaches a plateau of ~ 1.6. The number of data points is sufficient to capture the rising phase and fit the first-order exponential function curve with an adjusted R<sup>2</sup> = 0.99. Furthermore, we have four independent datasets in total on the intra-Golgi transport of SBP-GFP-CD8a-furin-AC (#1-4), demonstrating the consistency of our measurements.

      (8) Figure 2A need to show the data points, not just the lines.

      In the revamped manuscript, Fig. 2A has been updated to become Fig. 4A. The plot in Fig. 4A is calculated from Equation 3, so it does not have data points. However, t<sub>intra</sub> is calculated from the experimental LQ vs. t kinetic data.

      (9) Imaging and camera settings like exposure time, pixel size, etc should be reported in Methods.

      As suggested by the reviewer, we have supplied this information in the Materials and Methods section of the revised manuscript.

      (1) The exposure time and pixel size for the wide-field microscopy:

      “The image pixel size is 65 nm. The range of exposure time is 400 – 5000 ms for each channel.”

      (2) The exposure time and pixel size for the spinning disk confocal microscopy: “The image pixel size is 89 nm. The range of exposure time is 200 – 500 ms for each channel.”

      (3) The pixel dwelling time and pixel size for the Airyscan microscopy:

      “For side averaging, images were acquired under 63× objective (NA 1.40), zoomed in 3.5× to achieve 45 nm pixel size using the SR mode. The pixel dwelling time is 1.16 µs.”

      Reviewer #2 (Recommendations For The Authors):

      We sincerely appreciate the reviewer's insightful, detailed, and constructive feedback. Your thoughtful comments have helped us refine our analyses, clarify key points, and strengthen the overall quality of our manuscript. We are grateful for the time and effort you have dedicated to reviewing our work and providing valuable suggestions. Your input has been instrumental in improving both the scientific rigor and presentation of our findings. Thank you for your thorough and thoughtful review.

      Minor points:

      (1) Equation 2: A should be in front of the ln2. It's already resolved in equation 3, so likely only needs changing in the text

      As suggested by the reviewer, we have changed it accordingly.

      (2) Line 152: Why is there a lack of experimental data? High ER background and low Golgi signal make it difficult to select ministacks: it would be good to see examples of these images. Is 0 a relevant timepoint, as cargo is still at the ER? Instead, would a timepoint <5' better demonstrate initial arrival of fast cargo, with 0' discarded?

      We observed that RUSH reporters typically do not exit the ER in < 5 min of biotin treatment, resulting in a high ER background and low Golgi signal. Example images of SBP-GFP-CD59 are shown below (scale bar: 10 µm). Possible reasons include: 1) the time required for biotin diffusion into the ER, 2) the time needed to displace the RUSH hook from the RUSH reporter, and 3) the time for recruitment of RUSH reporters to ER exit sites. As a result, we could not obtain LQs for time points earlier than 5 min during the biotin chase.

      Author response image 1.

      Despite the challenge in measuring LQs at early time points, 0 is still a relevant time point. At t = 0 min, RUSH reporters should be at the ER membrane near the ER exit site, a definitive pre-Golgi location along the Golgi axis, although we still don’t have a good method to determine its LQ.

      (3) Table 1 Line 474: 1-3 independent replicates: is there a better way of incorporating this into the table to make it more streamlined? It would be useful to see each cargo as a mean with error. Is there a more demonstrative way to present the table, for example (but does not have to be) fastest cargo first (Tintra) as in Table 2?

      As suggested by the reviewer, we revised Table 1. We calculated the mean and SD of t<sub>intra</sub> and arranged our RUSH reporters in ascending order based on their t<sub>intra</sub> values.

      (4) Line 264 / Fig 3B: It's unclear to me why the VHH-anti-GFP-mCherry internalisation approach was used, when the cells were expressing GFP, that could be used for imaging. Also, this introduces a question over trafficking of the VHH itself, to access the same compartments as the GFP-proteins are localised. It would be useful to describe the choice of this approach briefly in the text.

      Here, the surface-labeling approach is used to investigate if GFP-Tac-TC possesses a Golgi retrieval pathway after its exocytosis to the plasma membrane. When VHH-anti-GFP-mCherry is added to the tissue culture medium, it binds to the cell surface-exposed GFP-fused MGAT1, MGAT2, Tac, Tac-TC, CD8a, and CD8a-TC. Next, VHH-anti-GFP-mCherry traces the internalized GFP-fused transmembrane proteins. The surface-labeling approach has two advantages in this case. 1) It is much more sensitive in revealing the small number of GFP-transmembrane proteins at the plasma membrane and endosomes, which are usually drowned in the strong Golgi and ER background fluorescence in the GFP channel. 2) While the GFP fluorescence distribution has reached a dynamic equilibrium, the surface-labeling approach can reveal the endocytic trafficking route and dynamics.

      As the reviewer suggested, we added the following sentence to describe the choice of the cell-surface labeling: “By binding to the cell surface-exposed GFP, VHH-anti-GFP-mCherry serves as a sensitive probe to track the endocytic trafficking itinerary of the above GFP-fused transmembrane proteins”.

      Regarding the trafficking of VHH-anti-GFP-mCherry itself, in HeLa cells that do not express GFP-fused transmembrane proteins, VHH-anti-GFP-mCherry can be internalized by fluid-phase endocytosis. However, the fluid-phase endocytosis is negligible under our experimental conditions, as we previously demonstrated (Sun et al., JCS, 2021; PMID: 34533190).

      (5) 446 Typo "internalization"

      It has been corrected.

      Reviewer #3 (Recommendations For The Authors):

      Below are my recommendations for the authors to improve their manuscript:

      We sincerely appreciate the reviewer's insightful, detailed, and constructive feedback. Your thoughtful comments have helped us refine our analyses, clarify key points, and strengthen the overall quality of our manuscript. We are grateful for the time and effort you have dedicated to reviewing our work and providing valuable suggestions. Your input has been instrumental in improving both the scientific rigor and presentation of our findings. Thank you for your thorough and thoughtful review.

      (1) Line 48: Tie et al. 2016 is cited. Please add references to original work showing that cargos transit from cis to trans Golgi cisternae.

      After reviewing the literature, we identified two references that provide some of the earliest morphological evidence of secretory cargo transit from the cis- to the trans-Golgi:

      (1) Castle et al, JCB, 1972; PMID: 5025103

      (2) Bergmann and Singer, JCB, 1983; PMID: 6315743

      The first study utilized pulse-chase autoradiographic EM imaging to track secretory protein movement, while the second employed immuno-EM imaging to observe the synchronized release of VSVGtsO45. Accordingly, we have removed Tie et al., 2016 and replaced it with these newly identified references.

      (2) I would suggest to cite earlier (in the Introduction) the rapid partitioning and rim progression models.

      As suggested, we have moved the rapid partitioning and rim progression models to the Introduction section.

      (3) Figure 1: LQ vs. time plot for SBP-GFP-CD8a-furin-AC (panel I, 0.9 to 1.75 in 150 min) is different from Fig 7G of Tie et al. 2016 (LQ 0-1.5 in 100 min). Please comment on why those 2 sets of data are different.

      We appreciate the reviewer for pointing out this error. In our previous publication (Tie et al., MBoC, 2016), we presented a total of four datasets on SBP-GFP-CD8a-furin-AC. However, in the earlier version of our manuscript, we mistakenly listed only three datasets, inadvertently omitting Fig. 7G from Tie et al., MBoC, 2016.

      In the revised version, we have now included Fig. S2T (SBP-GFP-CD8a-furin-AC #4), which corresponds to Fig. 7G from Tie et al., MBoC, 2016.

      (4) As mentioned in the public review, I think measurement of the expression level of the cargos is necessary to compare their transport kinetics.

      The reviewer raises a valid concern that is challenging to address. All our data were obtained by imaging overexpressed reporters, and we assume that their overexpression does not significantly impact the Golgi or the secretory pathway. Our previous studies have demonstrated that overexpression does not substantially affect LQs (Figure S2 of Tie et al., MBoC, 2016, and Figure S1 of Tie et al., JCB, 2022).

      We acknowledge this concern as one of the limitations in our study at the end of our manuscript:

      “First, our approach relied on the overexpression of fluorescence protein-tagged cargos. The synchronized release of a large amount of cargo could significantly saturate and skew the intra-Golgi transport.” 

      (5) To my opinion, cisternal continuities would also affect retrograde transport (accelerate) (by diffusion for instance) and not only retrograde transport. Please comment on how this would affect intra-Golgi transport kinetics.

      We believe the reviewer is suggesting “cisternal continuities would also affect retrograde transport (accelerate) (by diffusion for instance) and not only anterograde transport.”

      Transient cisternal continuities have been reported to facilitate the anterograde transport of large quantities of secretory cargos (Beznoussenko et al., 2014; PMID: 24867214) (Marsh et al., 2004; PMID: 15064406) (Trucco et al., 2004; PMID: 15502824). However, we are not aware of any reports demonstrating that such continuities facilitate the retrograde transport of secretory cargo, although Trucco et al. (2004) speculated that Golgi enzymes might use these connections to diffuse bidirectionally (anterograde and retrograde direction). For this reason, we did not discuss this scenario in our manuscript.

      (6) Lines 188-190: I don't understand why the rapid partitioning model is excluded. Please detail more the arguments used for this statement.

      Below is the section from the Introduction that addresses the reviewer's question.

      “This model (rapid partitioning model) suggests that cargos rapidly diffuse throughout the Golgi stack, segregating into multiple post-translational processing and export domains, where cargos are packed into carriers bound for the plasma membrane. Nonetheless, synchronized traffic waves have been observed through various techniques, including EM (Trucco et al., 2004) and advanced light microscopy methods we developed, such as GLIM and side-averaging (Tie et al., 2016; Tie et al., 2022). These findings suggest that the rapid partitioning model might not accurately represent the true nature of the intra-Golgi transport.”

      (7) I would suggest replacing the 'Golgi residence time' by another name as it reflects mainly the time of Golgi exit if I am not mistaken.

      We believe the term “Golgi residence time” more accurately reflects the underlying mechanism – retention. The same approach to measure the Golgi residence time can also be applied to Golgi enzymes such as ST6GAL1. Its slow Golgi exit kinetics (t<sub>1/2</sub> = 5.3 hours) (Sun et al., JCS, 2021) should be primarily due to a strong Golgi retention at its steady state Golgi localization.

      In contrast, the conventional secretory cargos’ Golgi exit times are usually much shorter (t<sub>1/2</sub> < 20 min) (Table 2) due to weaker Golgi retention. In a broader sense, the Golgi exit kinetics of a secretory cargo should be influenced by its Golgi retention. Furthermore, we have consistently used the term “Golgi residence time” in our previous publications. So, we propose maintaining this terminology in the current manuscript.

      (8) Lines 300-306: I would suggest that the authors remove this part as it is highly speculative and not supported by data.

      We have relocated this discussion to the section titled "Our data supports the rim progression model, a modified version of the stable compartment model."

      Our enzyme matrix hypothesis offers a potential explanation for key observations, including the differential cisternal localization of small and large cargos and the interior localization of Golgi enzymes. Cryo-FIB-ET has shown that the interior of Golgi cisternae is enriched with densely packed Golgi enzymes (Engel et al., PNAS, 2015; PMID: 26311849), supporting this hypothesis.

      Additionally, this hypothesis helps explain the gradual reduction in intra-Golgi transport velocities of secretory cargos, as requested by Reviewer #1 (Minor Points 4). For these reasons, we propose retaining this discussion in the manuscript.

      (9) In Figure 3B, the percentage of MGAT2-GFP cells with anti-GFP signal at the Golgi is 41%, while Sun et al. 2021 reported 25%. Please comment on this difference.

      We included more cells for the quantification. The percentage of cells showing Golgi localization of VHH-anti-GFP-mCherry is now 32% (n = 266 cells). The observed difference, 32% vs. 25% (Sun et al., JCS, 2021), is likely due to uncontrollable variations in experimental conditions, which might have influenced the endocytic Golgi targeting efficiency.

      (10) The effects of brefeldinA are pleiotropic as it disassembles COPI and clathrin coats but also induces tubulation of endosomes. I would recommend using Golgicide A, which is more specific.

      We agree with the reviewer that Golgicide A might be more specific as an inhibitor of Arf1. We will certainly consider using this inhibitor next time.

    1. Author Response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public Review):

      The authors investigated state-dependent changes in evoked brain activity, using electrical stimulation combined with multisite neural activity across wakefulness and anesthesia. The approach is novel, and the results are compelling. The study benefits from an in-depth sophisticated analysis of neural signals. The effects of behavioral state on brain responses to stimulation are generally convincing.

      It is possible that the authors' use of "an average reference montage that removed signals common to all EEG electrodes" could also remove useful components of the signal, which are common across EEG electrodes, especially during deep anesthesia. For example, it is possible (in fact from my experience I would be surprised if it is not the case) that under isoflurane anesthesia, electrical stimulation induces a generalized slow wave or a burst of activity across the brain. Subtracting the average signal will simply remove that from all channels. This does not only result in signals under anesthesia being affected more by the referencing procedure than during waking but also will have different effects on different channels, e.g. depending on how strong the response is in a specific channel.

      We thank the reviewer for the positive comments and for raising this point. We do not believe that the average reference montage is obscuring an evoked slow wave in the isoflurane-anesthetized mice. Electrical stimulation did elicit a brief activation in nearby neurons that was followed by roughly 200 ms of quiescence, but no significant changes in firing in the other regions we recorded from (Author response image 1).

      Author response image 1

      ERP and evoked population activity during isoflurane anesthesia do not show evidence of global responses. (Top). ERP (-0.2 to +0.8 s around stimulus onset) with all EEG electrode traces superimposed. Data represented is the same: red traces have been processed with the average reference montage, black traces have not. (Bottom) Population mean firing rates from the areas of interest from the same experiment as above.

      We are familiar with the work from Dasilva et al. (2021), a study similar to ours because they also performed cortical electrical stimulation in mice anesthetized with isoflurane. They show widespread evoked multi-unit activity (derived from LFP) in isoflurane-anesthetized mice in response to electrical stimulation, but critical experimental differences may underlie the conflicting results presented in our study. Both works use similar levels of isoflurane to maintain anesthesia (we use a level roughly equivalent to their “deep” level). However, our experiments use only isoflurane, whereas Dasilva et al. induced anesthesia with ketamine and medetomidine followed by isoflurane. It has been shown that isoflurane and ketamine have different effects on neural dynamics (Sorrenti et al., 2021). Typically, isoflurane causes reduced spontaneous firing rates and decreased evoked response amplitudes compared to wakefulness, whereas ketamine has been shown to increase firing rates and evoked response amplitudes (Aasebø et al., 2017; Michelson & Kozai, 2018). Perhaps a more relevant difference is the set of electrical stimulation parameters used to perturb the brain. Dasilva et al. used 1 ms pulses of 500 μA, which would have a much larger effect than the stimulation used in this work, 0.2 ms pulses of 10-100 μA.

      Additionally, we would like to clarify that the average reference montage is not impacting the main findings of this work. As the reviewer correctly pointed out, the average reference montage does change the appearance of the ERP in the butterfly plots (Top panel in Author response image 1). However, all the quantitative analyses of the EEG-ERPs are performed on the global field power, computed by taking the standard deviation across all EEG channels, which is not affected by the average reference montage.
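
      The invariance claimed here can be checked numerically. The following is a minimal sketch on toy data, not the analysis code used in the study: the function names `average_reference` and `global_field_power` are hypothetical, and the population standard deviation is assumed for the global field power. Subtracting the across-channel mean at each time point shifts every channel by the same amount, so the spread across channels is unchanged.

```python
from statistics import pstdev

def average_reference(eeg):
    """Subtract the across-channel mean at every time point.

    eeg is a list of channels, each a list of samples (toy layout, assumed).
    """
    n_channels = len(eeg)
    n_samples = len(eeg[0])
    means = [sum(ch[t] for ch in eeg) / n_channels for t in range(n_samples)]
    return [[ch[t] - means[t] for t in range(n_samples)] for ch in eeg]

def global_field_power(eeg):
    """Standard deviation across channels at every time point."""
    n_samples = len(eeg[0])
    return [pstdev([ch[t] for ch in eeg]) for t in range(n_samples)]

# Toy recording: 3 channels x 4 samples (arbitrary values, not real data).
raw = [
    [1.0, 2.0, 0.5, -1.0],
    [3.0, 2.5, 1.5, 0.0],
    [5.0, 1.0, 2.5, 4.0],
]

gfp_raw = global_field_power(raw)
gfp_ref = global_field_power(average_reference(raw))
max_difference = max(abs(a - b) for a, b in zip(gfp_raw, gfp_ref))
print(max_difference)
```

      In this sketch the two global field power traces agree up to floating-point rounding, even though the individual channel traces differ after re-referencing.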

      Reviewer #2 (Public Review):

      […] The conclusions regarding the thalamic contributions to the ERP components are strongly supported by the data.

      The spatiotemporal complexity is almost a side point compared to what seems to be the most important point of the paper: showing the contribution of thalamic activity to some components of the cortical ERP. Scalp ERPs have long been regarded as purely cortical phenomena, just like most EEGs, and this study shows convincing evidence to the contrary.

      The data presented seemingly contradicts the results presented by Histed et al. (2009), who assert that cortical microstimulation only affects passing fibers near the tip of the electrodes, and results in distant, sparse, and somewhat random neural activation. In this study, it is clear that the maximum effect happens near the electrodes, decays with distance, and is not sparse at all, suggesting that not only passing fibers are activated but that also neuronal elements might be activated by antidromic propagation from the axonal hillock. This appears to offer proof that microstimulation might be much more effective than it was thought after the publication of Histed 2009, as the uber-successful use of DBS to treat Parkinson's disease has also shown.

      We thank the reviewer for their positive comments and thoughtful suggestions. We appreciate and agree with the reviewer’s perspective that the thalamic contribution to the cortical ERP is one of the key points of this study. We also thank the reviewer for their comment on the apparently contradictory results reported by Histed et al. (2009). This gives us the opportunity to further highlight the important contribution of our study to the field.

      First, we would like to highlight some key experimental differences between the two studies. In our study we used single pulse stimulation with currents between 10 and 100 μA, whereas Histed et al. used trains of pulses (100 ms in duration at 250 Hz) with lower current intensities (between 2 and 50 μA). We varied the depth of stimulation, targeting superficial and deep cortical layers; Histed et al. exclusively stimulated superficial cortical layers. In addition, the two studies used recording methods that are orthogonal in nature. We used Neuropixels probes that record from neurons that span all cortical layers depth-wise while Histed et al. used two-photon calcium imaging to record from a horizontal plane of neurons (again, in the superficial cortical layers).

      Because of these important methodological differences, it is more appropriate to compare the Histed et al. results to our results from superficial stimulation at comparable current intensities. In this case, we believe the two studies show similar results: stimulation activated a small fraction of neurons even hundreds of microns away from the stimulating electrode (see Figure 4A from our manuscript). However, our study adds an important observation pointing to the critical role of the depth of the stimulating electrode. We observe significant excitation of local cortical neurons (Figure 4D) and trans-synaptic activation of the thalamus only when we delivered deep stimulation (Figure 5A). This effect is likely mediated by activation of large, myelinated cortico-thalamic fibers, which are thought to be more excitable than non-myelinated horizontal fibers (Tehovnik & Slocum, 2013).

      To summarize, Histed et al. (2009) concluded that microstimulation causes a sparse activation of a distributed set of neurons with little evidence of synaptically driven activation. Instead, we showed that microstimulation can robustly activate local neurons and trans-synaptically activate distant neurons when stronger stimuli are directed to deep cortical layers. Based on this, we conclude that electrical stimulation is indeed highly effective, and is a valid tool that can be used to probe and characterize the cortico-thalamo-cortical network of any behavioral state.

      ----------

      Reviewer #1 (Recommendations for the authors):

      1. I am not clear how "putative pyramidal" or RS and "putative inhibitory" fast-spiking neurons were identified. Please provide some further details on that, including average spike wave shapes, and distribution of firing rates, and it would be interesting to know the proportion of "putative" RS and FS neurons in your recorded population. Obviously, caution is warranted here because, without further work, you cannot be sure that those are indeed pyramidal cells or interneurons! Is this subdivision necessary at all?

      We added details regarding the cell-type classification to the Results (lines 136-140) and the Methods section. This classification is common practice in cortical extracellular electrophysiology recordings given that cell-type specific analyses can reveal important differences between the two putative populations (Barthó et al., 2004; Bortone et al., 2014; Bruno & Simons, 2002; Jia et al., 2016; Niell & Stryker, 2008; Sirota et al., 2008). Based on our findings that the two populations respond to electrical stimulation in similar ways (excitation followed by a period of quiescence and rebound excitation), we agree the subdivision is not necessary to support our conclusions. However, we believe that some readers will appreciate seeing the two putative populations presented separately.

      2. I wonder how the authors know whether the animals were awake, specifically when they were not running. Did you observe animals falling asleep when head-fixed? Providing some analyses of spontaneous EEG/LFP signals in each state could add some reassurance that only wakefulness was included, as intended.

      While we cannot conclusively rule out that mice were asleep during the “quiet wakefulness” periods we analyzed, we believe they are likely to be awake for two main reasons: 1) all the experiments are performed during the dark phase of the light/dark cycle, when the mice are less likely to enter a sleep state (Franken et al., 1999); 2) the animals are not undergoing specific training to promote drowsiness or sleep. Indeed, many sleep-focused studies in head-fixed mice are performed during the light phase of the animal’s cycle to maximize the likelihood of capturing sleep states (Kobayashi et al., 2023; Turner et al., 2020; Yüzgeç et al., 2018; Zhang et al., 2022). We have added this note to the Discussion section (lines 402-406).

      Because we do not specifically record during sleep states and our recording does not include electromyography, which is commonly used in conjunction with EEG to classify sleep stages, we cannot accurately perform spectral comparison between “quiet wakefulness” and sleep states in our recordings.

      3. I was unsure about the meaning of some of the terminology, specifically "rebound", "rebound spiking", "rebound excitation" etc. Why do you call it "rebound"?

      “Rebound” is a term often used to describe a period of enhanced spiking following a period of prolonged silence or inhibition (Guido & Weyand, 1995; Roux et al., 2014). Grenier et al. (1998) list “postinhibitory rebound excitation” as an intrinsic property of cortical and thalamic neurons. We added this description to the text (lines 79-80).

      Reviewer #2 (Recommendations For The Authors):

      Regarding analysis, I would make three main points:

      Regarding the CSD analysis, I think the authors have done a good job of circumventing several of the known issues of this technique, especially by using ERPs rather than ongoing activity. However, although I do not immediately have access to the literature to back up this claim, I've heard that many assumptions behind CSD require a laminar structure with electrodes positioned perpendicular to these layers. In Figure 1B it seems like the neuropixels probe is not really perpendicular to the cortical layers, and I wonder if this might be an issue. I am also wondering how to interpret the thalamic CSD, as this structure is not laminar, lacks the mass of neatly stacked neuronal dipoles present in the cortex, and does not have an orderly array of synaptic inputs and outputs. I understand that CSD analysis helps minimize the contributions of volume conduction, but in this case, I also wonder if the thalamic CSD is even necessary to back up the paper's claims.

      One-dimensional CSD is computed assuming that the electrode is inserted perpendicular to cortex. This is mainly important for the interpretation of sinks and sources, since CSD can be also computed on radial voltages (e.g., EEG [Tenke & Kayser, 2012]). In general, our Neuropixels probes do not significantly deviate from perpendicular (mean deviation from perpendicular 15.3 degrees, minimum 5.2 degrees, and maximum 36.6 degrees). The probe represented in Figure 1B deviates from perpendicular by 31.2 degrees, which is an outlier compared to the rest of the insertions. Any deviation from perpendicular would result in the “effective” cortical thickness being larger by a factor of 1/cos(angle deviation from perpendicular) and thus would not affect the relative location of sources and sinks. We have added a statement to clarify this in the text (lines 126 and 454-456).
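
      The size of the 1/cos correction can be illustrated for the angles quoted above. This is a small sketch; the helper name `thickness_factor` is hypothetical, and only the angles come from the response.

```python
import math

def thickness_factor(deviation_deg):
    """Factor by which the effective cortical thickness grows when the probe
    deviates from perpendicular by deviation_deg degrees: 1 / cos(angle)."""
    return 1.0 / math.cos(math.radians(deviation_deg))

# Angles reported in the response (degrees from perpendicular).
reported = [
    ("minimum", 5.2),
    ("mean", 15.3),
    ("Figure 1B probe", 31.2),
    ("maximum", 36.6),
]

for label, angle in reported:
    print(f"{label}: {angle} deg -> factor {thickness_factor(angle):.3f}")
```

      For the mean deviation the stretch is about 4%, and for the largest deviation about 25%. The stretch is a uniform scaling along the probe, so the ordering of sinks and sources along the depth axis is preserved.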

      We agree with the statement regarding CSD analysis in the thalamus. We originally included the CSD for the thalamus in Figure 2F for completeness. As the reviewer pointed out, thalamic CSD was not used to perform any subsequent analysis and is, therefore, not necessary to back up any claims. As such, we have removed the CSD plot from Figure 2F to avoid any confusion and made a comment to this effect in the legend (lines 1175-1177).

      On the merits of using the z-score normalization for spike rates vs. other strategies like standardizing to maximum firing, I am aware that both procedures have limitations, but the z-score changes the range of the firing rate from [0, +Inf] to [-Inf, +Inf]. This does not seem correct considering that negative spiking rates do not exist. The standardization to maximum rate keeps the range within [0, 1], not creating negative rates. Another point that it will be worth discussing is the reported values of the z-scored values. For example, what does it mean to be 54 standard deviations away from the mean? 6 standard deviations is already a big distance from the mean.

      For Figure 2, we chose to represent the neural firing rates as z-scores because we found it important to report the magnitude of both the increase and decrease of the evoked firing rates in the post-stimulus period relative to the pre-stimulus rate. The normalization we used helps to visualize the magnitude of the effects of electrical stimulation in neuronal activity for both directions, which is an important result of the study. Despite the differences between the two normalization methods, the normalization based on the maximum firing does not significantly change the qualitative interpretation of Figure 2 in the manuscript (Author response image 2).

      Author response image 2

      Evoked firing rates for neurons in the areas of interest in response to deep stimulation in MO during the awake state. (Left) Firing rates of all neurons normalized by the average, pre-stimulus firing rate. (Right) Firing rates of all neurons normalized by the maximum post-stimulus firing rate.

      Regarding Figure 3 and the associated text, we would like to clarify that the magnitude metric is not simply a z-score value (with units of s.d.) but rather it is the integrated area under the z-scored response over the response window (with units of s.d.∙seconds). This can help explain why we see values of ~50 s.d.∙s. We chose to z-score firing rates, LFP, and CSD to normalize across the different signals and magnitudes of the evoked responses. We often observed the largest responses in the LFP (see Figure 3A), which may be partly due to the signal naturally having a larger dynamic range than the measured neural firing rates. Then we integrated the z-score response time series to capture the dynamics of the signal over the response window, rather than a static value such as the mean or maximum z-score. After performing a thorough literature search, we found no other ways to capture and compare the magnitudes of the different signals. We have added language to clarify the magnitude metric (lines 155-156) and added the appropriate units.
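
      The distinction between a z-score (s.d.) and the integrated magnitude (s.d.∙s) can be made concrete with a toy calculation. This is a sketch under stated assumptions, not the pipeline used in the manuscript: the function names are hypothetical, the baseline is z-scored with the sample standard deviation, the response is rectified before integration, and the area is taken with the trapezoidal rule.

```python
from statistics import mean, stdev

def zscore_to_baseline(signal, baseline):
    """Express signal in units of the baseline standard deviation."""
    mu = mean(baseline)
    sd = stdev(baseline)
    return [(x - mu) / sd for x in signal]

def integrated_magnitude(z, dt, rectify=True):
    """Trapezoidal area under the z-scored response, in s.d.*seconds.

    rectify=True integrates |z| so that excitation and suppression both
    add to the magnitude (an assumption made for this sketch).
    """
    values = [abs(v) for v in z] if rectify else list(z)
    return sum((a + b) / 2.0 * dt for a, b in zip(values, values[1:]))

# Toy data (not from the study): pre-stimulus rates and an evoked response in Hz.
baseline = [4.0, 6.0, 5.0, 5.0, 4.0, 6.0]
response = [5.0, 25.0, 45.0, 25.0, 5.0]
dt = 0.01  # seconds between samples (assumed)

z = zscore_to_baseline(response, baseline)
peak_z = max(z)
magnitude = integrated_magnitude(z, dt)
print(peak_z, magnitude)
```

      The peak value carries units of s.d., while the integrated value carries units of s.d.∙s and depends on both the size and the duration of the response, so the two numbers are not directly comparable.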

      In reporting the p-values, I recommend increasing the number of significant digits to four because the p-value seems to be the same for different tests in several places (e.g.: lines 207 to 218), which seems odd. I also wonder whether this could be an artifact of the z-scoring procedure. In the figures, I would like to advise the use of 1 asterisk to denote "weak evidence to reject the null hypothesis (0.05 > p > 0.01)" and two asterisks to denote "strong evidence to reject the null hypothesis (0.01 > p)", and make a note of it accordingly in the manuscript and/or figure legends.

      According to the reviewer’s suggestion, we have changed the statistics language to “* weak evidence to reject null hypothesis (0.05 > p > 0.01), ** strong evidence to reject null hypothesis (0.01 > p > 0.001), *** very strong evidence to reject null hypothesis (0.001 > p)” throughout the manuscript.

      We have also increased the number of significant digits to four throughout the manuscript. It is true that some of the p-values reported for Figure 3 (lines 169-180) are the same for different tests. This is not an artifact of the z-scoring, but rather a consequence of performing the Wilcoxon signed-rank test (an ordinal statistical test) with small sample numbers. Because the p-value depends only on the relative ordering, not the continuous distribution of values, the small sample size (N=6-14) increases the likelihood of obtaining the exact same p-value if the relative ordering of samples is the same.

      Line 202: If the magnitude corresponds to z-score data, please add "s.d." after the number, as z-scored values are expressed in standard deviation units. Please update this throughout the paper.

      As stated above the magnitude metric is the integrated area under the z-scored response over the response window (with units of s.d.∙seconds). We have added the correct units in all places.

      Line 214: Please report how the multiple comparisons correction was performed

      We have added the test used for multiple comparisons in line 169 (formerly line 214) and in the Methods section (line 770).

      Line 462: please replace "Neuropixels activity" with "LFP and single-unit activity".

      We changed the wording to specify “LFP, and single neuron responses…” (now line 337).

      Line 475: a short explanation of the bi-stability phenomena will be helpful for the reader.

      We added the following description: “a state characterized by spontaneous alternation between bouts of activity and periods of silence” (lines 350-351).

      Line 601: It is asserted that "Electrical stimulation directly activates local cells and axons that run near the stimulation site via activation of the axon initial segment" and the paper by Histed et al. 2009 is cited. This does not seem like an appropriate citation, as Histed et al. explicitly state that electrical microstimulation does not activate local neuronal bodies near the electrode tip. See my comment above.

      Upon further reading, we believe we are seeing evidence of direct axonal activation and subsequent antidromic activation of local cell bodies, as you suggested in your comment above and as has been proposed by many, including Histed et al. (2009) and Nowak and Bullier (1998). We edited our sentence accordingly, kept the Histed et al. citation, and added other relevant citations (lines 487-490).

      References

      • Aasebø, I. E. J., Lepperød, M. E., Stavrinou, M., Nøkkevangen, S., Einevoll, G., Hafting, T., & Fyhn, M. (2017). Temporal Processing in the Visual Cortex of the Awake and Anesthetized Rat. ENeuro, 4(4), 59–76. https://doi.org/10.1523/ENEURO.0059-17.2017

      • Barthó, P., Hirase, H., Monconduit, L., Zugaro, M., Harris, K. D., & Buzsáki, G. (2004). Characterization of Neocortical Principal Cells and Interneurons by Network Interactions and Extracellular Features. Journal of Neurophysiology, 92(1), 600–608. https://doi.org/10.1152/jn.01170.2003

      • Bortone, D. S., Olsen, S. R., & Scanziani, M. (2014). Translaminar Inhibitory Cells Recruited by Layer 6 Corticothalamic Neurons Suppress Visual Cortex. Neuron, 82, 474–485. https://doi.org/10.1016/j.neuron.2014.02.021

      • Bruno, R. M., & Simons, D. J. (2002). Feedforward Mechanisms of Excitatory and Inhibitory Cortical Receptive Fields. The Journal of Neuroscience, 22(24), 10966–10975. https://doi.org/10.1523/JNEUROSCI.22-24-10966.2002

      • Dasilva, M., Camassa, A., Navarro-Guzman, A., Pazienti, A., Perez-Mendez, L., Zamora-López, G., Mattia, M., & Sanchez-Vives, M. V. (2021). Modulation of cortical slow oscillations and complexity across anesthesia levels. NeuroImage, 224, 117415. https://doi.org/10.1016/j.neuroimage.2020.117415

      • Franken, P., Malafosse, A., & Tafti, M. (1999). Genetic Determinants of Sleep Regulation in Inbred Mice. Sleep, 22(2). https://academic.oup.com/sleep/article/22/2/155/2731698

      • Grenier, F., Timofeev, I., & Steriade, M. (1998). Leading role of thalamic over cortical neurons during postinhibitory rebound excitation. Proceedings of the National Academy of Sciences of the United States of America, 95(23), 13929–13934. https://doi.org/10.1073/pnas.95.23.13929

      • Guido, W., & Weyand, T. (1995). Burst responses in thalamic relay cells of the awake behaving cat. Journal of Neurophysiology, 74(4), 1782–1786. https://doi.org/10.1152/JN.1995.74.4.1782

      • Histed, M. H., Bonin, V., & Reid, R. C. (2009). Direct Activation of Sparse, Distributed Populations of Cortical Neurons by Electrical Microstimulation. Neuron, 63(4), 508–522. https://doi.org/10.1016/j.neuron.2009.07.016

      • Jia, X., Siegle, J., Bennett, C., Gale, S., Denman, D. R., Koch, C., & Olsen, S. (2016). High-density extracellular probes reveal dendritic backpropagation and facilitate neuron classification. Journal of Neurophysiology, 121(5), 1831–1847. https://doi.org/10.1101/376863

      • Kobayashi, G., Tanaka, K. F., & Takata, N. (2023). Pupil Dynamics-derived Sleep Stage Classification of a Head-fixed Mouse Using a Recurrent Neural Network. The Keio Journal of Medicine, 2022-0020-OA. https://doi.org/10.2302/KJM.2022-0020-OA

      • Michelson, N. J., & Kozai, T. D. Y. (2018). Isoflurane and ketamine differentially influence spontaneous and evoked laminar electrophysiology in mouse V1. Journal of Neurophysiology, 120(5), 2232. https://doi.org/10.1152/JN.00299.2018

      • Niell, C. M., & Stryker, M. P. (2008). Highly selective receptive fields in mouse visual cortex. Journal of Neuroscience, 28(30), 7520–7536. https://doi.org/10.1523/JNEUROSCI.0623-08.2008

      • Nowak, L. G., & Bullier, J. (1998). Axons, but not cell bodies, are activated by electrical stimulation in cortical gray matter. II. Evidence from selective inactivation of cell bodies and axon initial segments. Experimental Brain Research, 118(4), 489–500. https://doi.org/10.1007/S002210050305

      • Roux, L., Stark, E., Sjulson, L., & Buzsáki, G. (2014). In vivo optogenetic identification and manipulation of GABAergic interneuron subtypes. Current Opinion in Neurobiology, 26, 88–95. https://doi.org/10.1016/j.conb.2013.12.013

      • Sirota, A., Montgomery, S., Fujisawa, S., Isomura, Y., Zugaro, M., & Buzsáki, G. (2008). Entrainment of Neocortical Neurons and Gamma Oscillations by the Hippocampal Theta Rhythm. Neuron, 60(4), 683–697. https://doi.org/10.1016/j.neuron.2008.09.014

      • Sorrenti, V., Cecchetto, C., Maschietto, M., Fortinguerra, S., Buriani, A., & Vassanelli, S. (2021). Understanding the Effects of Anesthesia on Cortical Electrophysiological Recordings: A Scoping Review. International Journal of Molecular Sciences, 22(3), 1286. https://doi.org/10.3390/IJMS22031286

      • Tehovnik, E. J., & Slocum, W. M. (2013). Two-photon imaging and the activation of cortical neurons. Neuroscience, 245(March), 12–25. https://doi.org/10.1016/j.neuroscience.2013.04.022

      • Tenke, C. E., & Kayser, J. (2012). Generator localization by current source density (CSD): Implications of volume conduction and field closure at intracranial and scalp resolutions. Clinical Neurophysiology, 123(12), 2328–2345. https://doi.org/10.1016/J.CLINPH.2012.06.005

      • Turner, K. L., Gheres, K. W., Proctor, E. A., & Drew, P. J. (2020). Neurovascular coupling and bilateral connectivity during nrem and rem sleep. ELife, 9, 1. https://doi.org/10.7554/ELIFE.62071

      • Yüzgeç, Ö., Prsa, M., Zimmermann, R., & Huber, D. (2018). Pupil Size Coupling to Cortical States Protects the Stability of Deep Sleep via Parasympathetic Modulation. Current Biology, 28(3), 392. https://doi.org/10.1016/J.CUB.2017.12.049

      • Zhang, X., Landsness, E. C., Chen, W., Miao, H., Tang, M., Brier, L. M., Culver, J. P., Lee, J. M., & Anastasio, M. A. (2022). Automated sleep state classification of wide-field calcium imaging data via multiplex visibility graphs and deep learning. Journal of Neuroscience Methods, 366, 109421. https://doi.org/10.1016/J.JNEUMETH.2021.109421

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      Overall the authors provide a very limited data set and in fact only a proof of concept that their sensor can be applied in vivo. This is not really a research paper, but a technical note. With respect to their observation of clustered activity, they now provide an overview image, next to zoomed details. However, from these images one cannot conclude 'by eye' any clustering event. This aligns with the very low r values. All neurons in the field show variable activity and a clustering is not really evident from these examples. Even within a cluster, there is variability. The authors now confirm that expression levels are indeed variable but are independent from the ratio measurements. Further, they controlled for specificity by including DAPT treatments, but opposite to their own in vitro data (in primary neurons) the ratios increased. The authors argue that both distance and orientation can either decrease or increase ratios and that the use of this biosensor should be explored model-by-model. This doesn't really confer high confidence and may hinder other groups in using this sensor reliably.

      Secondly, there is still no physiological relevance for this observation. The experiments are performed in wild-type mice, but it would be more relevant to compare this with a fadPSEN1 KI or a PSEN1cKO model to investigate the contribution of a gain of toxic function or LOF to the claimed cell non-autonomous activations. The authors acknowledge this shortcoming but argue that this is for a follow-up study.

      For instance, they only monitor activity in cell bodies, and miss all info on g-sec activity in neurites and synapses: what is the relevance of the cell body associated g-sec and can it be used as a proxy for neuronal g-sec activity? If cells 'communicate' g-sec activities, I would expect to see hot spots of activity at synapses between neurons.

      Without some more validation and physiologically relevant studies, it remains a single observation and rather a technical note paper, instead of a true research paper.

      The effect size was small, as stated in the original and revised manuscripts and in the point-by-point responses to the first round of review. Such subtle effects are likely to be challenging to detect by eye. However, our unbiased quantification detected a statistically significant linear correlation between the 720/670 ratio in each neuron and the average ratio in neighboring neurons, which we verified using many different approaches (Figure 3, Figure 3—figure supplement 2, and Figure 4), and the correlation was abolished by the administration of a γ-secretase inhibitor (Figure 5). This objective analysis made us more confident in concluding that γ-secretase activity affects γ-secretase in neighboring neurons.

      We would also like to clarify the design of the C99 720-670 biosensor. Both C99, the sensing domain that is cleaved by γ-secretase, and the anchoring domain fused to miRFP670 are integrated into the membrane (Figure 1A). Therefore, how these two domains with four transmembrane regions are embedded in the membrane should affect the orientation between the donor, miRFP670, and the acceptor, miRFP720. As noted in our point-by-point responses to the initial review, we have previously validated that pharmacological inhibition of γ-secretase significantly increases the FRET ratio in various cell types, including CHO, MEF, BV2 cells, and mouse cortical primary neurons (Maesako et al., 2020; Houser et al., 2020, and unpublished observations). On the other hand, FRET reduction by γ-secretase inhibition was found in mouse primary neurons derived from the cerebellum (unpublished observations) as well as in the somatosensory cortex neurons in vivo (this study). While we could not use the exact same imaging set-up for cortical primary neurons in vitro and those in vivo due to different expression levels of the biosensor, we could do so for in vitro cortical primary neurons vs. in vitro cerebellum neurons. This direct comparison showed that 720/670 ratios are significantly higher in the cerebellum than in the cortex neurons even in the presence of 1 mM DAPT (Author response image 1), a concentration that nearly completely inhibits γ-secretase activity. This suggests a different integration and stabilization pattern of the sensing and anchoring domains in the C99 720-670 biosensor between the cortex and cerebellum primary neurons, and thus, the orientation between the donor and acceptor varies between the two neuronal types. We expect a similar scenario for cortical primary neurons in vitro versus those in vivo.

      Of note, we have recently demonstrated that the cortex and cerebellum primary neurons exhibit distinct membrane properties (Lundin and Wieckiewicz et al., 2024 in revision), suggesting that the different baseline FRET could be related to the different membrane properties of the cortex and cerebellum primary neurons. On the other hand, this raises a concern that 720/670 ratios can be affected not only by γ-secretase activity but also by other confounders, such as altered membrane properties. However, the small but significant correlation between the 720/670 ratio in a neuron and the ratios in its neighboring neurons is abolished by a γ-secretase inhibitor (Figure 5), suggesting that this correlation is most likely dependent on γ-secretase activity. Taken together, we currently think orientation plays a significant role in our biosensor and would like to emphasize the importance of establishing, on a model-by-model basis, whether the cleavage of the C99 720-670 biosensor by γ-secretase increases or decreases 720/670 FRET ratios.

      Author response image 1.

      Furthermore, we co-expressed the C99 720-670 biosensor and visible range fluorescence reporters to record other biological events, such as changes in ion concentration, in cortex primary neurons. Interestingly, several biological events uniquely detected in the neurons with higher 720/670 ratios, which are expected to exhibit lower endogenous γ-secretase activity, are recapitulated by pharmacological inhibition of γ-secretase (unpublished observations), which supports the interpretation that higher 720/670 ratios are indicative of lower γ-secretase activity in mouse cortex primary neurons. Such multiplexed imaging will help to further elucidate how the C99 720-670 biosensor behaves in response to the modulation of γ-secretase activity.

      Lastly, the scope of this study was to develop and validate a novel imaging assay employing a NIR FRET biosensor to measure γ-secretase activity on a cell-by-cell basis in live wild-type mouse brains. However, we do appreciate the reviewer’s suggestion and think employing this new platform in FAD PSEN1 knock-in (KI) or PSEN1 conditional knockout (cKO) mice would provide valuable information. Furthermore, we are keen to expand our capability to monitor γ-secretase with subcellular resolution in live mouse brains in vivo, which we will explore in follow-up studies. Thank you for your thoughtful suggestions.

      Reference

      - Maesako M, Sekula NM, Aristarkhova A, Feschenko P, Anderson LC, Berezovska O. Visualization of PS/γ-Secretase Activity in Living Cells. iScience. 2020 Jun 26;23(6):101139.

      - Houser MC, Hou SS, Perrin F, Turchyna Y, Bacskai BJ, Berezovska O, Maesako M. A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel). 2020 Oct 22;20(21):5980.

      - Lundin B, Wieckiewicz N, Dickson JR, Sobolewski RGR, Sadek M, Armagan G, Perrin F, Hyman BT, Berezovska O, and Maesako M. APP is a regulator of endo-lysosomal membrane permeability. 2024 in revision

      Reviewer #2 (Public Review):

      Regarding the variability and spatial correlation: the dynamic range of the sensor previously reported in vitro is in the range of 20-30% change (Houser et al 2020), whereas the range of FR detected in vivo between cells is significantly larger in this MS. This raises considerable doubts for specific detection of cellular activity.

      One direct way to test the dynamic range of the sensor in vivo is to increase or decrease endogenous gamma-secretase activity and to ensure this experimental design allows accurate monitoring of gamma-secretase activity. In the previous characterization of the reporter (Houser et al 2020), DAPT application and inhibition of gamma-secretase activity results in increased FR (Figures 2 and 3 of Houser et al). This is in agreement with the design of the biosensor, since FR should be inversely correlated with enzymatic activity. Here, the authors repeated the experiment, and surprisingly found an opposite effect, in which DAPT significantly reduced FR.

      The authors maintain that this result could be due to differences in cell types. However, this experiment was previously performed in cultured cortical neurons and many different cell types, as noted by the authors in their rebuttal.

      Instead, I would argue that these results further highlight the concerns of using FR in vivo, since based on their own data, there is no way to interpret this quantification. If DAPT reduces FR, does this mean we should now interpret the results of higher FR corresponds to higher g-sec activity? Given a number of papers from the authors claiming otherwise, I do not understand how one can interpret the results as indicating a cell-specific effect.

      In conclusion, without any ground truth, it is impossible to assess and interpret what FR measurements of this sensor in vivo mean. Therefore, the use of this approach as a way to study g-sec activity in vivo seems premature.

      Please find our response to reviewer 1’s similar critique above. Here, we would like to clarify again the design of our C99 720-670 biosensor. The orientation between the donor, miRFP670, and the acceptor, miRFP720, depends on how C99, the sensing domain that is cleaved by γ-secretase, and the anchoring domain are integrated into the membrane (Figure 1A). Although it was surprising to us, it is possible that γ-secretase inhibition decreases 720/670 ratios if 1) the donor-acceptor orientation plays a significant role in FRET and 2) the baseline structure of the C99 720-670 biosensor differs between cell types. This appears to be the case for cortex versus cerebellum primary neurons (i.e., DAPT increases 720/670 ratios in the cortex neurons while decreasing them in the cerebellum neurons), and we expect the same for cortical neurons in vitro vs. in vivo. Hence, we recommend that users first validate whether the cleavage of the C99 720-670 biosensor by γ-secretase increases or decreases 720/670 FRET ratios in their models. If DAPT increases 720/670 ratios (as in cortex primary neurons, CHO, MEF, and BV2 cells that we have validated), higher ratios should be interpreted as lower γ-secretase activity. If DAPT reduces 720/670 ratios (as in cerebellum primary neurons and the somatosensory cortex neurons in vivo), higher ratios should be interpreted as higher γ-secretase activity. From a biosensing perspective, although we need to know which is the case on a model-by-model basis, we think that whether γ-secretase activity increases or decreases the 720/670 ratio is not critical; what matters more is whether it can significantly change FRET efficiency. Thank you for your critical comments.
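
      The model-by-model rule described above can be written down as a small calibration step. The following is only an illustrative sketch, not the authors' analysis code: the function names and the example numbers are hypothetical, and a real calibration would also test whether the change on DAPT is statistically significant before assigning a polarity.

```python
import numpy as np


def ratio_polarity(ratio_vehicle, ratio_inhibitor):
    """Return +1 if inhibition raises the mean 720/670 ratio, -1 if it lowers it.

    Hypothetical helper: it compares means only and does not test significance.
    """
    delta = np.mean(ratio_inhibitor) - np.mean(ratio_vehicle)
    if delta == 0:
        raise ValueError("no change on inhibition; polarity is undefined")
    return 1 if delta > 0 else -1


def interpret_higher_ratio(polarity):
    """Map the calibrated polarity onto how a higher ratio should be read."""
    # Inhibitor raises the ratio  -> higher ratio means lower enzyme activity.
    # Inhibitor lowers the ratio  -> higher ratio means higher enzyme activity.
    return "lower activity" if polarity > 0 else "higher activity"
```

      For example, with made-up ratios `[1.0, 1.1]` before and `[1.3, 1.4]` after DAPT, `ratio_polarity` returns `1`, so a higher ratio in that model would be read as lower γ-secretase activity; reversing the two inputs gives the opposite reading.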

      Reviewer #3 (Public Review):

      This paper builds on the authors' original development of a near infrared (NIR) FRET sensor by reporting in vivo real-time measurements for gamma-secretase activity in the mouse cortex. The in vivo application of the sensor using state-of-the-art techniques is supported by a clear description and straightforward data, and the project represents significant progress because so few biosensors work in vivo. Notably, the NIR biosensor is detectable to ~ 100 µm depth in the cortex. A minor limitation is that this sensor has a relatively modest ΔF as reported in Houser et al, which is an additional challenge for its use in vivo. Thus, the data is fully dependent on post-capture processing and computational analyses. This can unintentionally introduce biases but is not an insurmountable issue with the proper controls that the authors have performed here.

      The following opportunity for improving the system didn't initially present itself until the authors performed an important test of the FRET sensor in vivo following DAPT treatment. The authors get credit for diligently reporting the unexpected decrease in 720/670 FRET ratio. In turn this has led to a suggestion that this sensor would benefit from a control that is insensitive to gamma-secretase activity. FRET influences that are independent of gamma-secretase activity could be distinguished by this control.

      From previous results in cultured neurons, the authors expected an increase in FRET following DAPT treatment in vivo. These expectations fit with the sensor's mode-of-action because a block of gamma-secretase activity should retain the fluorophores in proximity. When the authors observed decreased FRET, the conclusion was that the sensor performs differently in different cellular contexts. However, a major concern is that mechanistically it is unclear how this could occur with this type of sensor. The relative orientation of fluorophores indeed can contribute to FRET efficiency in tension-based sensors. However, the proteolysis expected with gamma-secretase activity would release tension and orientation constraints. Thus, the major contributing FRET factor is expected to be distance, not orientation. Alternative possibilities that could inadvertently affect readouts include an additional DAPT target in vivo sequestering the inhibitor, secondary pH effects on FRET, photo-bleaching, or an unidentified fluorophore quencher in vivo stimulated by DAPT. Ultimately this new FRET sensor would benefit from a control that is insensitive to gamma-secretase activity. FRET influences that are independent of gamma-secretase activity could be distinguished by this control.

      Given that the anchoring domain is composed of three transmembrane regions and the linker connecting the donor, miRFP670, and the acceptor, miRFP720, is highly flexible, we are still not sure whether the orientation constraint of the C99 720-670 biosensor is released by γ-secretase cleavage. This means that the orientation between the donor and acceptor in the cleaved form of the sensor can differ from model to model. As explained in response to the similar critique of reviewer 1, we found that the 720/670 ratio is significantly higher in the cerebellum than in the cortex neurons even in the presence of DAPT (Figure 1 for the review only). Therefore, we currently think the donor-acceptor orientation, both in the cleaved and non-cleaved forms of the sensor, plays a role in determining whether γ-secretase activity increases or decreases the 720/670 ratio (although this view may change depending on future discoveries).

      As the reviewer pointed out, a version of the NIR γ-secretase biosensor with no biological activity would be an important control; however, a point mutation in the transmembrane region of the C99 sensing domain could also alter the orientation between the donor, miRFP670, and the acceptor, miRFP720, since C99 is connected to the acceptor, which may bring additional complexity. Also, as noted in our point-by-point responses to the initial review, mutation(s) that can fully block C99 processing by γ-secretase have not been established. Therefore, we asked whether the subtle but significant correlation we found between the 720/670 ratio in a neuron and the ratios in its neighboring neurons is abolished by γ-secretase inhibitor administration. Since the correlation was abolished (Figure 5), we suggest that the correlation between the 720/670 ratio in a neuron and the ratios in the neighboring neurons depends on γ-secretase activity.

      It is not fully established how γ-secretase activity is spatiotemporally regulated; therefore, the development of more appropriate control biosensors and further validation of our findings with complementary approaches will be crucial in our follow-up studies. Thank you for your valuable comments.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Overall the authors provide a very limited data set and in fact only a proof of concept that their sensor can be applied in vivo. This is not really a research paper, but a technical note. With respect to their observation of clustered activity, the images do not convince me as they show only limited areas of interest: from these examples (for instance fig 5) one sees that merely all neurons in the field show variable activity and a clustering is not really evident from these examples. Even within a cluster, there is variability. With r values between 0.23 to .36, the correlation is not that striking. The authors herein do not control for expression levels of the sensor: for instance, can they show that in all neurons in the field, the sensor is equally expressed, but FRET activity is correlated in sets of neurons? Or are the FRET activities that are measured only in positively transduced neurons, while neighboring neurons are not expressing the sensor? Without such validation, it is difficult to make this conclusion.

      We appreciate the reviewer’s comment. We agree with the reviewer that this study is not testing a new hypothesis but rather developing and validating a novel tool. However, we do believe such a “technical note” is as important as a “research paper” since advancing technique(s) is the only way to break the barrier in our understanding of complex biological events. Therefore, this study aimed to develop and validate a novel imaging assay employing a recently engineered NIR FRET biosensor to measure γ-secretase activity (Houser et al., 2020) on a cell-by-cell basis in live mouse brains, enabling us for the first time to examine how γ-secretase activity is regulated in individual neurons in vivo, and uncover that γ-secretase activity may influence γ-secretase in neighboring neurons. Like the reviewer, we found that the cell-to-cell correlation is not that striking, as we clearly stated in the original manuscript: “Although the effect size is modest, we also found a statistically significant correlation between…” 

      We were also aware that there is variability within a cluster of neurons exhibiting similar γ-secretase activities. Per the reviewer’s request, the images have been expanded to the entire imaging field of view (new Figure 3A). Although the effect size is small, our unbiased quantification showed a statistically significant linear correlation between the 720/670 ratio in each neuron and the average ratio in five neighboring neurons (Figure 3, Figure 3—figure supplement 2, and Figure 4), and the correlation was abolished by the administration of a γ-secretase inhibitor (Figure 5). Given these findings, we cannot conclude that γ-secretase does not affect γ-secretase in neighboring neurons.
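
      As an illustration of the kind of quantification described above, the sketch below correlates each cell's ratio with the mean ratio of its k nearest neighbors. It is a minimal example under stated assumptions, not the authors' pipeline: the function name, the use of 2D centroid coordinates, Euclidean distance, and Pearson's r are all assumptions made for illustration.

```python
import numpy as np


def neighbor_ratio_correlation(xy, ratio, k=5):
    """Pearson r between each cell's ratio and the mean ratio of its k nearest cells.

    xy    : (n, 2) array-like of cell centroid coordinates (assumed input format)
    ratio : length-n array-like of per-cell 720/670 ratios
    k     : number of nearest neighbors to average (five in the text above)
    """
    xy = np.asarray(xy, dtype=float)
    ratio = np.asarray(ratio, dtype=float)
    n = len(ratio)
    if xy.shape[0] != n:
        raise ValueError("xy and ratio must describe the same number of cells")
    if k >= n:
        raise ValueError("k must be smaller than the number of cells")

    # Pairwise Euclidean distances between cell centroids.
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is never its own neighbor

    # Indices of the k nearest neighbors of every cell, then their mean ratio.
    nearest = np.argsort(dist, axis=1)[:, :k]
    neighbor_mean = ratio[nearest].mean(axis=1)

    # Linear (Pearson) correlation between own ratio and neighborhood mean.
    r = np.corrcoef(ratio, neighbor_mean)[0, 1]
    return r, neighbor_mean
```

      Running the same function on the ratios measured before and after inhibitor administration would give one way to check whether the correlation disappears, although any significance test and shuffling control would need to be added separately.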

      Regarding the expression levels and pattern of the sensor, the AAV-based gene delivery approach employed in this study results in expression of the sensor in a subset of neurons rather than in all neurons. We have newly performed immunohistochemistry, showing that approximately 40% of NeuN-positive neurons express the C99 720-670 biosensor (new Figure 1—figure supplement 2A and 2B).

      Reference

      - Houser MC, Hou SS, Perrin F, Turchyna Y, Bacskai BJ, Berezovska O, Maesako M. A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel). 2020 Oct 22;20(21):5980.

      (2) Secondly, I am lacking some more physiological relevance for this observation. The experiments are performed in wild-type mice, but it would be more relevant to compare this with a fadPSEN1 KI or a PSEN1cKO model to investigate the contribution of a gain of toxic function or LOF to the claimed cell non-autonomous activations. Or what would be the outcome if the sensor was targeted to glial cells?

      The AAV vector in this study encodes the human synapsin promoter and our new immunohistochemistry demonstrates that nearly 100% of the cells expressing the C99 720-670 sensor are NeuN positive, and we hardly detected the sensor expression in Iba-1 or GFAP-positive cells (new Figure 1—figure supplement 2A and 2C).

      The mechanism underlying the cell non-autonomous regulation of γ-secretase remains unclear. As discussed in our manuscript, one potential hypothesis is that secreted abeta42 plays a role (Zoltowska et al., 2023 eLife). Whereas this report focuses on the development and validation of a novel assay using wild-type mice, future follow-up studies employing FAD PSEN1 knock-in (KI) and PSEN1 conditional knockout (cKO) mice would allow us to test the hypothesis above, since abeta42 is known to increase in some FAD PSEN1 KI mice (Siman et al., 2000 J Neurosci, Vidal et al., 2012 FASEB J) while it decreases in PSEN1 cKO mice (Yu et al., 2001 Neuron).

      Reference

      - Siman R, Reaume AG, Savage MJ, Trusko S, Lin YG, Scott RW, Flood DG. Presenilin-1 P264L knockin mutation: differential effects on abeta production, amyloid deposition, and neuronal vulnerability. J Neurosci. 2000 Dec 1;20(23):8717-26. 

      - Vidal R, Sammeta N, Garringer HJ, Sambamurti K, Miravalle L, Lamb BT, Ghetti B. The Psen1-L166P knock-in mutation leads to amyloid deposition in human wild-type amyloid precursor protein YAC transgenic mice. FASEB J. 2012 Jul;26(7):2899-910.

      - Yu H, Saura CA, Choi SY, Sun LD, Yang X, Handler M, Kawarabayashi T, Younkin L, Fedeles B, Wilson MA, Younkin S, Kandel ER, Kirkwood A, Shen J. APP processing and synaptic plasticity in presenilin-1 conditional knockout mice. Neuron. 2001 Sep 13;31(5):713-26. 

      - Zoltowska KM, Das U, Lismont S, Enzlein T, Maesako M, Houser MC, Franco ML, Moreira DG, Karachentsev D, Becker A, Hopf C, Vilar M, Berezovska O, Mobley W, Chávez-Gutiérrez L. Alzheimer's disease linked Aβ42 exerts product feedback inhibition on γ-secretase impairing downstream cell signaling. eLife. 2023. 12:RP90690

      (3) For this reviewer it is not clear what resolution they are measuring activity, at cellular or subcellular level? In other words are the intensity spots neuronal cell bodies? Given g-sec activity are in all endosomal compartments and at the cell surface, including in the synapse, does NIR imaging have the resolution to distinguish subcellular or surface localized activities? If cells 'communicate' g-sec activities, I would expect to see hot spots of activity at synapses between neurons: is this possible to assess with the current setup? 

      Since this study aimed to determine how γ-secretase activity is regulated on a cell-by-cell basis in live mouse brains, the FRET signal was detected in neuronal cell bodies. While our current in vivo set-up can only record γ-secretase activity at cellular resolution, we previously detected predominant γ-secretase activity in the endo-lysosomal compartments (Maesako et al., 2022 J Neurosci) as well as in certain spots of neuronal processes (Maesako et al., 2020 iScience) in cultured primary neurons using the same microscope set-up. Therefore, future studies will expand our capability to monitor γ-secretase with subcellular resolution in live mouse brains in vivo.

      Reference

      - Maesako M, Sekula NM, Aristarkhova A, Feschenko P, Anderson LC, Berezovska O. Visualization of PS/γ-Secretase Activity in Living Cells. iScience. 2020 Jun 26;23(6):101139. 

      - Maesako M, Houser MCQ, Turchyna Y, Wolfe MS, Berezovska O. Presenilin/γ-Secretase Activity Is Located in Acidic Compartments of Live Neurons. J Neurosci. 2022 Jan 5;42(1):145-154. 

      (4) Without some more validation and physiological relevant studies, it remains a single observation and rather a technical note paper, instead of a true research paper.

      Please find our response above to the critique (1).  

      Reviewer #2 (Public Review):

      (1) Regarding the variability and spatial correlation: the dynamic range of the sensor previously reported in vitro is in the range of 20-30% change (Houser et al 2020), whereas the range of FR detected in vivo between cells is significantly larger (Fig. 3). This raises considerable doubts for specific detection of cellular activity (see point 3).

      Please find our response below to the critique (2).

      (2) One direct way to test the dynamic range of the sensor in vivo is to increase or decrease endogenous gamma-secretase activity and to ensure this experimental design allows accurate monitoring of gamma-secretase activity. In the previous characterization of the reporter (Houser et al 2020), DAPT application and inhibition of gamma-secretase activity results in increased FR (Figures 2 and 3 of Houser et al). This is in agreement with the design of the biosensor, since FR should be inversely correlated with enzymatic activity. Here, while the authors repeat the same manipulation and apply DAPT to block gamma-secretase activity, it seems to induce the opposite effect and reduces FR (comparing figures 8 with figures 5,6,7). First, there is no quantification comparing FR with and without DAPT. Moreover, it is possible to conduct this experiment in the same animals, meaning comparing FR before and after DAPT in the same mouse and cell populations. This point is absolutely critical: if indeed FR is reduced following DAPT application, this needs to be explained since this contradicts the basic design and interpretation of the biosensor.

      We appreciate the reviewer’s comment. In our hands, overexpression of the four γ-secretase components (PSEN, Nct, Aph1, and Pen2) is the only reliable and reproducible approach to increase the cellular activity of γ-secretase, which we have successfully employed in vitro but not yet in vivo. Therefore, a γ-secretase inhibitor was used to determine the dynamic range of our FRET biosensor in vivo. FRET efficiency depends on the proximity and orientation of donor and acceptor fluorescent proteins. In our initial study, we engineered the original C99 EGFP-RFP biosensor (C99 R-G), and the replacement of EGFP and RFP with mTurquoise-GL and YPet, respectively, expanded the dynamic range of the sensor approximately 2 times. Moreover, extending the linker length from 20 a.a. to 80 a.a. increased the dynamic range 2.2 times (Maesako et al., 2020 iScience). Of note, the C99 720-670 NIR analog, which has the same 80 a.a. linker but miRFP670 and miRFP720 as the donor and acceptor, exhibited a slightly better dynamic range than the C99 Y-T sensor (Houser et al., 2020 Sensors). Our interpretation, at that time, was that the cleavage of the C99 720-670 biosensor by γ-secretase results in a longer distance between the donor and acceptor, and thus, the FRET ratio always increases upon γ-secretase inhibition (i.e., proximity plays a more significant role than orientation in our biosensors). As expected, a significantly increased FRET ratio was detected upon treatment with γ-secretase inhibitors in various cell types, including CHO, MEF, BV2 cells, and mouse cortical primary neurons. Moreover, to further ensure that the C99 720-670 biosensor records changes in γ-secretase activity, the multiplexing capability of the biosensor was utilized. In other words, we co-expressed the C99 720-670 biosensor and visible range fluorescence reporters to record other biological events, such as changes in ion concentration, in cortex primary neurons.

      Strikingly, several biological events uniquely detected in the neurons with diminished endogenous γ-secretase activity, i.e., neurons with higher FRET ratios, are recapitulated by pharmacological inhibition of γ-secretase (unpublished observation). This approach has allowed us to ensure that increased FRET ratios are indicative of decreased endogenous γ-secretase activity in mouse cortical primary neurons.
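
      To make the statement that FRET efficiency depends on both proximity and orientation concrete, the sketch below uses the standard Förster relation, E = 1 / (1 + (r/R0)^6), in which R0^6 scales with the orientation factor κ² (κ² = 2/3 is the usual isotropic-average assumption, and κ² ranges from 0 to 4). This is a textbook illustration only: the 6.0 nm reference Förster radius is a placeholder and is not a measured value for the miRFP670/miRFP720 pair.

```python
def fret_efficiency(r_nm, r0_ref_nm=6.0, kappa2=2 / 3):
    """Forster efficiency for donor-acceptor distance r_nm (in nm).

    r0_ref_nm : Forster radius at kappa2 = 2/3 (placeholder value, an assumption)
    kappa2    : orientation factor, between 0 (no transfer) and 4
    """
    if not 0 <= kappa2 <= 4:
        raise ValueError("kappa2 must lie between 0 and 4")
    if r_nm <= 0:
        raise ValueError("distance must be positive")
    # R0^6 is proportional to kappa2, so rescale from the kappa2 = 2/3 reference.
    r0_6 = r0_ref_nm ** 6 * (kappa2 / (2 / 3))
    return r0_6 / (r0_6 + r_nm ** 6)
```

      Under these assumptions, efficiency falls steeply with distance at fixed orientation, but it can also rise or fall at a fixed distance when κ² changes, which is the sense in which orientation alone can alter a FRET readout.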

      However, as recommended by the reviewer, we have performed a new experiment to compare the FRET ratio before and after administration of DAPT, a potent γ-secretase inhibitor, in the same mouse and cell populations. Surprisingly, we found that DAPT significantly decreases 720/670 ratios, a result that is included in our revised manuscript (Figure 2—figure supplement 2C). This unexpected FRET reduction by γ-secretase inhibition was also found in mouse primary neurons derived from the cerebellum (unpublished observation). These findings suggest that orientation plays a significant role in our γ-secretase FRET biosensor and that whether the FRET ratio is increased or decreased by γ-secretase-mediated cleavage depends on the cell type. Of note, the difference in FRET ratios with and without DAPT was comparable between primary cortical neurons (24.3%) and somatosensory cortex neurons in vivo (22.1%). Our new findings suggest that how our biosensors report γ-secretase activity (i.e., increased vs. decreased FRET ratio) must be examined on a model-by-model basis, which is clearly noted in the revised manuscript: 

      Reference

      - Houser MC, Hou SS, Perrin F, Turchyna Y, Bacskai BJ, Berezovska O, Maesako M. A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel). 2020 Oct 22;20(21):5980. 

      - Maesako M, Sekula NM, Aristarkhova A, Feschenko P, Anderson LC, Berezovska O. Visualization of PS/γ-Secretase Activity in Living Cells. iScience. 2020 Jun 26;23(6):101139. 

      (3) For further validation, I would suggest including in vivo measurements with a sensor version with no biological activity as a negative control, for example, a mutation that prevents enzymatic cleavage and FRET changes. This should be used to showcase instrumental variability and would help to validate that the variability of FR is indeed biological in origin. This would significantly strengthen the claims regarding spatial correlation within a population of cells.

      We fully agree with the reviewer that a sensor version containing a mutation that prevents enzymatic cleavage, and thus FRET changes, would be preferable as a negative control. In our previous study, we developed and validated the APP-based C99 Y-T and Notch1-based N100 Y-T biosensors (Maesako et al., 2020 iScience). It is well established that Notch1 cleavage is entirely blocked by the Notch1 V1744G mutation (Schroeter et al., 1998 Nature; Huppert et al., 2000 Nature), and therefore, we introduced the mutation into the N100 Y-T biosensor and used it as a negative control. On the other hand, no such striking mutation has been identified in APP processing. To successfully monitor γ-secretase activity in deep tissue in vivo, we replaced Turquoise-GL and YPet in the C99 Y-T and N100 Y-T biosensors with miRFP670 and miRFP720, respectively. While the APP-based C99 720-670 biosensor allows recording of γ-secretase activity (Houser et al., 2020 Sensors), we found that the N100 720-670 sensor exhibits a very small dynamic range, which does not allow reliable measurement of γ-secretase activity. Taken together, there is currently no NIR γ-secretase biosensor available that lacks biological activity.

      Reference

      - Houser MC, Hou SS, Perrin F, Turchyna Y, Bacskai BJ, Berezovska O, Maesako M. A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel). 2020 Oct 22;20(21):5980. 

      - Huppert SS, Le A, Schroeter EH, Mumm JS, Saxena MT, Milner LA, Kopan R. Embryonic lethality in mice homozygous for a processing-deficient allele of Notch1. Nature. 2000 Jun 22;405(6789):966-70. 

      - Maesako M, Sekula NM, Aristarkhova A, Feschenko P, Anderson LC, Berezovska O. Visualization of PS/γ-Secretase Activity in Living Cells. iScience. 2020 Jun 26;23(6):101139. 

      - Schroeter EH, Kisslinger JA, Kopan R. Notch-1 signalling requires ligand-induced proteolytic release of intracellular domain. Nature. 1998 May 28;393(6683):382-6. 

      (4) In general, confocal microscopy is not ideal for in vivo imaging. Although the authors demonstrate data collected using IR imaging increases penetration depth, out-of-focus fluorescence is still evident (Figure 4). Many previous papers have primarily used FLIM-based analysis in combination with 2p microscopy for in vivo FRET imaging (some examples: Ma et al, Neuron, 2018; Massengill et al, Nature Methods, 2022; Diaz-Garcia et al, Cell Metabolism, 2017; Laviv et al, Neuron, 2020). This technique does not rely on absolute photon number and therefore has several advantages in terms of quantification of FRET signals in vivo.

      It is therefore likely that use of previously developed sensors of gamma-secretase with conventional FRET pairs, might be better suited for in vivo imaging. This point should be at least discussed as an alternative.

      The reviewer notes that 2p-FLIM may provide certain advantages over our confocal spectral imaging approach for detecting in vivo FRET. In our response below, we will address both the FRET detection method (FLIM vs. spectral) and microscope modality (2p vs. confocal). 

      As noted by the reviewer, we do acknowledge that 2p-FLIM has been utilized to detect FRET in vivo. On the other hand, the ratiometric spectral FRET approach has also been utilized in many in vivo FRET studies (Kuchibhotla et al., 2008 Neuron; Kuchibhotla et al., 2014 PNAS; Hiratsuka et al., 2015 eLife; Maesako et al., 2017 eLife; Konagaya et al., 2017 Cell Rep; Calvo-Rodriguez et al., 2020 Nat Commun; Hino et al., 2022 Dev Cell). We think both approaches have advantages and disadvantages, as discussed in a previous review (Bajar et al., 2016 Sensors), but they complement each other. Indeed, we regularly employ FLIM in cell culture studies (Maesako et al., 2017 eLife; McKendell et al., 2022 Biosensors; Devkota et al., 2024 Cell Rep), and our recent study also utilized 2p-FLIM for in vivo NIR imaging (although not for detecting FRET) (Hou et al., 2023 Nat Biomed Eng); therefore, we are confident that 2p-FLIM can be adapted in our follow-up studies for γ-secretase recording.

      Regarding microscope modality, we agree with the reviewer’s point that two-photon microscopy can generally achieve larger penetration depths than confocal microscopy and is therefore better suited for in vivo FRET imaging. However, in this study, since our aim was to quantify γ-secretase activity in the superficial layers of the cortex (<200 microns in depth), both NIR confocal and multiphoton microscopies could be used to achieve this imaging objective. Additionally, we chose to use confocal microscopy with our NIR C99 720-670 probe due to the probe’s slightly higher sensitivity compared to our C99 Y-T probe (Houser et al., 2020 Sensors). Imaging γ-secretase activity with our NIR C99 720-670 probe has the additional advantage that it will allow us in future studies to multiplex with visible FRET pairs using multiphoton microscopy in the same brain region. Furthermore, our demonstration of in vivo FRET imaging using NIR confocal microscopy avoids some of the issues associated with multiphoton microscopy, including potential phototoxicity due to high average and peak laser powers and the high complexity and cost of the instrumentation. For future studies aimed at interrogating γ-secretase activity in deeper cortical regions, multiphoton microscopy could be applied for FLIM or ratiometric spectral imaging of either our NIR or visible FRET probes. Per the reviewer’s request, we have added multiphoton FRET imaging as an alternative in the discussion section. 

      Reference

      - Bajar BT, Wang ES, Zhang S, Lin MZ, Chu J. A Guide to Fluorescent Protein FRET Pairs. Sensors (Basel). 2016 Sep 14;16(9):1488.  

      - Calvo-Rodriguez M, Hou SS, Snyder AC, Kharitonova EK, Russ AN, Das S, Fan Z, Muzikansky A, Garcia-Alloza M, Serrano-Pozo A, Hudry E, Bacskai BJ. Increased mitochondrial calcium levels associated with neuronal death in a mouse model of Alzheimer's disease. Nat Commun. 2020 May 1;11(1):2146.

      - Devkota S, Zhou R, Nagarajan V, Maesako M, Do H, Noorani A, Overmeyer C, Bhattarai S, Douglas JT, Saraf A, Miao Y, Ackley BD, Shi Y, Wolfe MS. Familial Alzheimer mutations stabilize synaptotoxic γ-secretase-substrate complexes. Cell Rep. 2024 Feb 27;43(2):113761. 

      - Hino N, Matsuda K, Jikko Y, Maryu G, Sakai K, Imamura R, Tsukiji S, Aoki K, Terai K, Hirashima T, Trepat X, Matsuda M. A feedback loop between lamellipodial extension and HGF-ERK signaling specifies leader cells during collective cell migration. Dev Cell. 2022 Oct 10;57(19):2290-2304.e7.

      - Hiratsuka T, Fujita Y, Naoki H, Aoki K, Kamioka Y, Matsuda M. Intercellular propagation of extracellular signal-regulated kinase activation revealed by in vivo imaging of mouse skin. eLife. 2015 Feb 10;4:e05178.  

      - Hou SS, Yang J, Lee JH, Kwon Y, Calvo-Rodriguez M, Bao K, Ahn S, Kashiwagi S, Kumar ATN, Bacskai BJ, Choi HS. Near-infrared fluorescence lifetime imaging of amyloid-β aggregates and tau fibrils through the intact skull of mice. Nat Biomed Eng. 2023 Mar;7(3):270-280.  

      - Houser MC, Hou SS, Perrin F, Turchyna Y, Bacskai BJ, Berezovska O, Maesako M. A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel). 2020 Oct 22;20(21):5980. 

      - Konagaya Y, Terai K, Hirao Y, Takakura K, Imajo M, Kamioka Y, Sasaoka N, Kakizuka A, Sumiyama K, Asano T, Matsuda M. A Highly Sensitive FRET Biosensor for AMPK Exhibits Heterogeneous AMPK Responses among Cells and Organs. Cell Rep. 2017 Nov 28;21(9):2628-2638.  

      - Kuchibhotla KV, Goldman ST, Lattarulo CR, Wu HY, Hyman BT, Bacskai BJ. Abeta plaques lead to aberrant regulation of calcium homeostasis in vivo resulting in structural and functional disruption of neuronal networks. Neuron. 2008 Jul 31;59(2):214-25  

      - Kuchibhotla KV, Wegmann S, Kopeikina KJ, Hawkes J, Rudinskiy N, Andermann ML, Spires-Jones TL, Bacskai BJ, Hyman BT. Neurofibrillary tangle-bearing neurons are functionally integrated in cortical circuits in vivo. Proc Natl Acad Sci U S A. 2014 Jan 7;111(1):510-4  

      - Maesako M, Horlacher J, Zoltowska KM, Kastanenka KV, Kara E, Svirsky S, Keller LJ, Li X, Hyman BT, Bacskai BJ, Berezovska O. Pathogenic PS1 phosphorylation at Ser367. eLife. 2017 Jan 30;6:e19720.  

      - McKendell AK, Houser MCQ, Mitchell SPC, Wolfe MS, Berezovska O, Maesako M. In-Depth Characterization of Endo-Lysosomal Aβ in Intact Neurons. Biosensors (Basel). 2022 Aug 20;12(8):663. 

      (Recommendations For The Authors):

      (5) Minor issues- Figure 4 describes the analysis procedure, which seems to be standard practice in the field. This can be described in the methods section rather than in the main figure.

      Per the reviewer’s suggestion, this figure has been moved to Figure 2—figure supplement 1. 

      Reviewer #3 (Public Review):

      (1) This paper builds on the authors' original development of a near infrared (NIR) FRET sensor by reporting in vivo real-time measurements for gamma-secretase activity in the mouse cortex. The in vivo application of the sensor using state of the art techniques is supported by a clear description and straightforward data, and the project represents significant progress because so few biosensors work in vivo. Notably, the NIR biosensor is detectable to ~ 100 µm depth in the cortex. A minor limitation is that this sensor has a relatively modest ΔF as reported in Houser et al, which is an additional challenge for its use in vivo. Thus, the data is fully dependent on post-capture processing and computational analyses. This can unintentionally introduce biases but is not an insurmountable issue with the proper controls that the authors have performed here.

      We appreciate the reviewer’s overall positive evaluation. As described in our response to Reviewer 2’s critique (2), ΔF in vivo has been characterized (Figure 2—figure supplement 2C).

      (2) The observation of gamma-secretase signaling that spreads across cells is potentially quite interesting, but it can be better supported. An alternative interpretation is that there exist pre-formed and clustered hubs of high gamma-secretase activity, and that DAPT has stochastic or differential accessibility to cells within the cluster. This could be resolved by an experiment of induction, for example, if gamma-secretase activity is induced or activated at a specific locale and there was observed coordinated spreading to neighboring neurons with their sensor.

      We agree with the reviewer that stochastic or differential accessibility of DAPT to cell clusters with different γ-secretase activity can be an alternative interpretation of our data, which is now included in the Discussion of the revised manuscript. Undoubtedly, the activation of γ-secretase would provide valuable information. However, as described in the response above to Reviewer 2’s critique #2, overexpressing the four components of γ-secretase (PSEN, Nct, Aph1, and Pen2) is the only reliable and reproducible approach to increasing the cellular activity of γ-secretase, which was achieved in our in vitro study but not yet in vivo. Our future studies will develop and characterize an approach to induce γ-secretase activity, enabling detailed mechanistic studies.

      (3) Furthermore, to rule out the possibility that uneven viral transduction was simply responsible for the observed clustering, it would be helpful to see an analysis of 670 nm fluorescence alone.

      Our new analysis comparing the 670 nm fluorescence intensity of each neuron with that of its five neighboring neurons shows a positive correlation (Figure 3—figure supplement 1A), suggesting that AAV was unevenly transduced. On the other hand, the 720/670 ratio (i.e., γ-secretase activity) is not correlated with 670 nm fluorescence intensity (i.e., C99 720-670 biosensor expression) (Figure 3—figure supplement 1B). This strongly suggests that, while C99 720-670 biosensor expression was not evenly distributed in the brain, the uneven probe expression did not affect the ability of the biosensor to record γ-secretase activity.  
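
      The two correlations described above can be sketched as follows. This is only an illustration under stated assumptions: neighbors are taken to be the five nearest cells by Euclidean distance between centroids, Pearson's r is used as the correlation measure, and the positions and intensities are simulated. None of the names, parameters, or values come from the manuscript.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def neighbor_mean(values, positions, k=5):
    """For each cell, average `values` over its k nearest neighbors (self excluded)."""
    means = []
    for i, p in enumerate(positions):
        others = sorted(
            (j for j in range(len(positions)) if j != i),
            key=lambda j: math.dist(p, positions[j]),
        )
        nearest = others[:k]
        means.append(sum(values[j] for j in nearest) / len(nearest))
    return means


# Simulated data (hypothetical): expression follows a spatial gradient,
# while the 720/670 ratio is drawn independently of expression.
random.seed(0)
positions = [(random.uniform(0, 500), random.uniform(0, 500)) for _ in range(200)]
intensity_670 = [100 + 0.5 * x + random.gauss(0, 10) for x, _ in positions]
ratio_720_670 = [random.gauss(1.0, 0.1) for _ in positions]

r_neighbors = pearson_r(intensity_670, neighbor_mean(intensity_670, positions))
r_ratio = pearson_r(ratio_720_670, intensity_670)

print(f"670 nm intensity vs. mean of 5 neighbors: r = {r_neighbors:.2f}")
print(f"720/670 ratio vs. 670 nm intensity:       r = {r_ratio:.2f}")
```

      In this simulated setting, spatially clustered expression yields a high first correlation, while a ratio that is independent of expression yields a second correlation near zero, which is the pattern the response describes.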

      Reviewer #3 (Recommendations For The Authors):

      (4) One minor suggestion might be to consider Figures 6-7 as orthogonal supporting analyses rather than "validation". It might then be helpful to present them together with Figure 5.

      We have moved the initial Figures 6 and 7 to Figure 3—figure supplement 2 and Figure 4, respectively.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study presents a valuable conceptual advance of how Vitamin A and its derivatives contribute to atherosclerosis. There is solid evidence invoking the contributions of specialized populations of T cells in atherosclerosis resolution, including use of multiple in vivo models to validate the functional effect. The significance of the study would be strengthened with more detailed interrogation of lesions composition and consolidation with previous work on the topic from human studies.

      Answer: We thank the reviewers and editorial office for their comments and constructive criticism. Below we provide point by point responses to the comments and concerns, which include the issues of lesion composition and consolidation with human studies. We also proofread the manuscript and included information about the immunostaining procedures that were previously missing (Lines 199 – 206).

      Public Reviews

      REVIEWER #1:

      This is an interesting study by Pinos and colleagues that examines the effect of beta carotene on atherosclerosis regression. The authors have previously shown that beta carotene reduces atherosclerosis progress and hepatic lipid metabolism, and now they seek to extend these findings by feeding mice a diet with excess beta carotene in a model of atherosclerosis regression (LDLR antisense oligo plus Western diet followed by LDLR sense oligo and chow diet). They show some metrics of lesion regression are increased upon beta carotene feeding (collagen content) while others remain equal to normal chow diet (macrophage content and lesion size). These effects are lost when beta carotene oxidase (BCO) is deleted. The study adds to the existing literature that beta carotene protects from atherosclerosis in general, and adds new information regarding regulatory T-cells. However, the study does not present significant evidence about how beta-carotene is affecting T-cells in atherosclerosis. For the most part, the conclusions are supported by the data presented, and the work is completed in multiple models, supporting its robustness. However there are a few areas that require additional information or evidence to support their conclusions and/or to align with the previously published work.

      Specific additional areas of focus for the authors:

      1. The premise of the story is that b-carotene is converted into retinoic acid, which acts as a ligand of the RAR transcription factor in T-regs. The authors measure hepatic markers of retinoic acid signaling (retinyl esters, Cyp26a1 expression) but none of these are measured in the lesion, which calls into question the conclusion that Tregs in the lesion are responsible for the regression observed with b-carotene supplementation.

      Answer: We agree with the Reviewer’s comment, which prompted us to quantify the expression of the retinoic acid-sensitive marker Cyp26b1 in the atherosclerotic lesions. Cyp26b1, together with Cyp26a1 and Cyp26c1, contains retinoic acid response elements (RAREs) in its promoter and is therefore highly sensitive to retinoic acid. Indeed, the mRNA/protein expression of Cyp26 enzymes is widely considered a surrogate marker for retinoic acid levels in cells or tissues.

      We typically use Cyp26a1 as a surrogate marker for retinoic acid signaling in the adipose tissue and the liver, as we did in this study. However, our RNA seq data in murine bone marrow-derived macrophages (mBMDMs) exposed to retinoic acid revealed that Cyp26b1 is the only Cyp26 family member responsive to retinoic acid (PMID: 36754230). In fact, neither Cyp26a1 nor Cyp26c1 was expressed in our mBMDMs (data not shown). Unlike the M2 marker arginase 1, Cyp26b1 did not respond to IL-4 (Figure iA). Hence, Cyp26b1 is an adequate marker to evaluate retinoic acid signaling in mouse lesions, which are rich in macrophages.

      Before staining the lesions, we validated the Cyp26b1 antibody by staining mBMDMs exposed to retinoic acid (Figure iB).

      Author response image 1.

      (A) mBMDMs were divided into M0 or M2 (exposed to IL-4 for 24 h), and then treated with either DMSO or retinoic acid for 6 h before harvesting for RNA seq analysis. Exploring the RNA seq dataset, we identified Cyp26b1 as an RA-sensitive gene in mBMDMs (PMID: 36754230). (B) Validation of the Cyp26b1 antibody in mBMDMs exposed to retinoic acid confirms the suitability of this antibody for measuring retinoic acid signaling in our experimental settings.

      In the current version of the manuscript, we include the results of Cyp26b1 quantifications (Figure 5H, I) (Lines: 362 - 366). To put these findings in perspective with human studies, we discuss these results in light of the role human CYP26B1 plays in the atherosclerotic lesion (Lines: 450 - 464).

      2. There does not appear to be a strong effect of Tregs on the b-carotene induced pro-regression phenotype presented in Figure 5. The only major CD25+ cell dependent b-carotene effect is on collagen content, which matches with the findings in Figures 1 and 2. This mechanistically might be very interesting and novel, yet the authors do not investigate this further or add any additional detail regarding this observation. This would greatly strengthen the study and the novelty of the findings overall as it relates to b-carotene and atherosclerosis.

      Answer: As the Reviewer points out, the effects of β-carotene on collagen content are more pronounced than those on CD68 content in the lesion. Indeed, we have observed this in the majority of the experiments in this manuscript.

      Collagen accumulation in the lesion is a complex process, where smooth muscle cells secrete collagen and plaque macrophages (typically) degrade it. Matrix metalloproteases produced by macrophages contribute to the degradation of collagen, and studies show that retinoic acid regulates the expression of metalloproteinases in various cell types (PMID: 2324527, 24008270). We explored the expression of metalloproteases in macrophages exposed to retinoic acid in our mBMDM RNA seq, but we did not observe any significant result (data not shown).

      Interestingly, M2 macrophages can secrete collagen by upregulating arginase 1 expression. In the current version of the manuscript, we acknowledge this in the results (Lines: 358-359) and in the discussion section (Lines: 443-449).

      3. The title indicates that beta-carotene induces Treg 'expansion' in the lesion, but this is not measured in the study.

      Answer: Following the suggestion by the Reviewer, we have re-worded the title to “β-carotene accelerates the resolution of atherosclerosis in mice”

      REVIEWER #2:

      Pinos et al present five atherosclerosis studies in mice to investigate the impact of dietary supplementation with b-carotene on plaque remodeling during resolution. The authors use either LDLR-ko mice or WT mice injected with ASO-LDLR to establish diet-induced hyperlipidemia and promote atherogenesis during 16 weeks, and then they promote resolution by switching the mice for 3 weeks to a regular chow, either deficient or supplemented with b-carotene. Supplementation was successful, as measured by hepatic accumulation of retinyl esters. As expected, chow diet led to reduced hyperlipidemia, and plaque remodeling (both reduced CD68+ macs and increased collagen contents) without actual changes in plaque size. But, b-carotene supplementation resulted in further increased collagen contents and, importantly, a large increase in plaque regulatory T-cells (TREG). This accumulation of TREG is specific to the plaque, as it was not observed in blood or spleen. The authors propose that the anti-inflammatory properties of these TREG explain the atheroprotective effect of b-carotene, and found that treatment with anti-CD25 antibodies (to induce systemic depletion of TREG) prevents b-carotene-stimulated increase in plaque collagen and TREG.

      1. An obvious strength is the use of two different mouse models of atherogenesis, as well as genetic and interventional approaches. The analyses of aortic root plaque size and contents are rigorous and included both male and female mice (although the data was not segregated by sex). Unfortunately, the authors did not provide data on lesions in en face preparations of the whole aorta.

      Answer: We appreciate the positive comments on rigor. We considered displaying our data segregated by sex, although for some experiments, we did not have matching numbers of male and female mice, which could be distracting for the reader. The goal of our study was to analyze changes in plaque composition. Therefore, our experimental approach was designed to study atherosclerosis resolution (plaque composition changes, but not plaque size) instead of atherosclerosis regression (both plaque composition and size change). As expected, we did not observe differences in plaque size at the level of the aortic root for any of our experiments, which deterred us from quantifying plaque content by en face analysis of the aorta.

      2. Overall, the conclusion that dietary supplementation with b-carotene may be atheroprotective via induction of TREG is reasonably supported by the evidence presented. Other conclusions put forth by the authors (e.g., that vitamin A production favors TREG production or that BCO1 deficiency reduces plasma cholesterol), however, will need further experimental evidence to be substantiated.

      Answer: We apologize for the lack of clarity in the presentation of our results and overstating our conclusions. We have rephrased some of these conclusions in the results and discussion sections.

      3. The authors claim that b-carotene reduces blood cholesterol, but data shown herein show no differences in plasma lipids between mice fed b-carotene-deficient and -supplemented diets (Figs. 1B, 2A, and S3A).

      Answer: As Reviewer 2 points out, we did not observe changes in plasma cholesterol between mice undergoing Resolution in response to β-carotene. For clarity, we rephrased our plasma lipids results for each of our experimental designs (Lines: 230 – 236, 270 – 272, and 288 – 290). We also include a clarification in the discussion section about the differential effects of β-carotene on plasma lipids when mice undergo atherosclerosis progression and resolution (Lines: 419 - 430).

      4. Also, the authors present no experimental data to support the idea that BCO1 activity favors plaque TREG expansion (e.g., no TREG data in Fig 3 using Bco1-ko mice).

      Answer: We appreciate the suggestion by Reviewer 2. In the current version of the manuscript, we stained the aortic roots from Bco1-/- mice for FoxP3. We did not observe differences between Control and β-carotene resolution groups, in agreement with the results in plaque composition (CD68 and collagen contents). These new data strengthen our manuscript, and we now include these results as Supplementary Figure 3D, E (Lines: 465 - 471).

      5. As the authors show, the treatment with anti-CD25 resulted in only partial suppression of TREG levels. Because CD25 is also expressed in some subpopulations of effector T-cells, this could potentially cloud the interpretation of the results. Data in Fig 4H showing loss of b-carotene-stimulated increase in numbers of FoxP3+GFP+ cells in the plaque should be taken cautiously, as they come from a small number of mice. Perhaps an orthogonal approach using FoxP3-DTR mice could have produced a more robust loss of TREG and further confirmation that the loss of plaque remodeling is indeed due to loss of TREG.

      Answer: We agree with the reviewer, and we rephrased the results and discussion to avoid overstating our findings. We now acknowledge that a second experimental approach would help confirm the findings we obtained with a blocking antibody targeting CD25. We favored the use of anti-CD25 infusions over other depletion methods based on the experimental protocol carried out by our collaborators, in which they examined the effect of Tregs on atherosclerosis regression (PMID: 32336197). The utilization of FoxP3-DTR mice would nicely complement our findings. In the current version of the manuscript, we discuss this alternative approach (Lines: 491 - 501).

      Recommendations for the Authors

      All reviewers agreed that despite the claims of the title, there is no direct interrogation of Tregs or vitamin A signaling in lesions.

      The work does not consolidate well with the role of B-carotene in human heart disease. Additional discussion and synthesis are required to elaborate on the significance of the findings. For example, the idea of beta carotene supplementation for cardiovascular prevention has attracted attention for years but recent meta-analysis showed no benefit, and, if anything, an increase in cardiovascular events. The U.S. Preventive Services Task Force (USPSTF) went as far to recommend AGAINST the use of beta-carotene for the prevention of cardiovascular disease.

      In light of the above point and eLife editorial policies, please revise the title to include species.

      Answer: Thanks for your feedback. Carotenoid metabolism in mammals is complex, and establishing direct parallelisms between humans and rodents must be done with caution. For example, β-carotene supplementation in humans inevitably results in the accumulation of this compound in plasma, while in rodents, β-carotene is quickly metabolized to vitamin A. Our findings over the years reveal that the effects of β-carotene in mice derive exclusively from its role as vitamin A precursor.

      In the current study, we confirm our previous work utilizing Bco1-/- mice, which are unable to produce vitamin A when fed β-carotene. Then, we observe that vitamin A promotes atherosclerosis resolution in mice independently of alterations in plasma cholesterol in two independent mouse models. Lastly, we utilized anti-CD25 blocking antibodies to deplete Tregs to establish a direct connection between dietary β-carotene/vitamin A and Tregs in the lesion. While this experimental approach failed to completely deplete Tregs, our morphometric assays indicate that these infusions were sufficient to partially mitigate the effect of β-carotene on atherosclerosis resolution.

      Regardless, in the discussion section of our manuscript, we attempt to consolidate our preclinical studies with clinical data (Lines: 374 – 376, and 461 – 464).

      We have also revised the title, as suggested by Reviewer 1. We also included “mice” in the title to align with the editorial policies of eLife.

      Reviewer #1:

      1.1. The authors need to measure retinoic acid signaling directly in the lesion and in Tregs to be able to draw the conclusion that b-carotene is directly activating Tregs to promote regression.

      Answer: Please see comments above.

      1.2. The authors need to investigate the role of beta carotene on collagen production by T-regs.

      Answer: Please see comments above.

      Reviewer #2 (Recommendations For The Authors):

      Major:

      2.1. If the authors still have frozen sections of the aortas from their Bco1-ko experiment, it should be trivial to look at plaque TREG contents to confirm that vitamin A production is indeed needed for the effect of b-carotene on plaque remodeling.

      Answer: Please see comments above.

      Minor:

      2.2. This reviewer wonders if the axis for lesion size in all figures is off by an order of magnitude. Most studies show aortic root lesions in the 10^5 um2 range, not in the 10^6 um2.

      Answer: We apologize for this error. We have corrected the units in all our quantifications.

      2.3. FPLC lipoprotein profiles would enhance the manuscript.

      Answer: We have run FPLCs for the plasmas and included them in the results (Lines: 233 – 236). Data are presented in Figure 1C, D.

      2.4. This reviewer could not cope with the thought that mice that are fed 16+ weeks a diet that is vitamin A-deficient did not become vit A-deficient (e.g., Fig. 1E). Perhaps the authors could elaborate a little on this in their discussion.

      Answer: Mice are extremely resistant to vitamin A deficiency. A common protocol to achieve deficiency in mice requires feeding a vitamin A-deficient diet to dams during pregnancy and lactation to deplete newborn pups of vitamin A stores. Even in that situation, pups retain enough vitamin A stores to sustain circulating vitamin A at levels comparable to those observed in wild-type mice. In the current version of the manuscript, we have included a paragraph in the discussion to cover this “interesting” aspect (Lines: 476 – 483).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews (consolidated):

      In the microglia research community, it is accepted that microglia change their shape both gradually and acutely along a continuum that is influenced by external factors both in their microenvironments and in circulation. Ideally, a given morphological state reflects a functional state that provides insight into a microglia's role in physiological and pathological conditions. The current manuscript introduces MorphoCellSorter, an open-source tool designed for automated morphometric analysis of microglia. This method adds to the many programs and platforms available to assess the characteristics of microglial morphology; however, MorphoCellSorter is unique in that it uses Andrews plots to rank populations of cells together (in control and experimental groups) and presents "big picture" views of how entire populations of microglia alter under different conditions. Notably, MorphoCellSorter is versatile, as it can be used across a wide array of imaging techniques and equipment. For example, the authors use MorphoCellSorter on images of fixed and live tissues representing different biological contexts such as embryonic stages, Alzheimer's disease models, stroke, and primary cell cultures.

      This manuscript outlines a strategy for efficiently ranking microglia beyond the classical homeostatic vs. active morphological states. The outcome offers only a minor improvement over the already available strategies that have the same challenge: how to interpret the ranking functionally.

      We would like to thank the reviewers for their careful reading and constructive comments and questions. While MorphoCellSorter currently does not rank cells functionally based on their morphology, its broad range of application, ease of use and capacity to handle large datasets provide a solid foundation. Combined with advances in single-cell transcriptomics, MorphoCellSorter could potentially enable the future prediction of cell functions based on morphology.

      Strengths and Weaknesses:

      (1) The authors offer an alternative perspective on microglia morphology, exploring the option to rank microglia instead of categorizing them by means of clustering methods such as k-means, which should better reflect the concept of a microglia morphology continuum. They demonstrate that these ranked representations of morphology can be illustrated using histograms across the entire population, allowing the identification of potential shifts between experimental groups. Although the idea of using Andrews curves is innovative, the distance between ranked morphologies is challenging to measure, raising the question of whether the authors oversimplify the problem.

      We have access to the distance between cells through the Andrew’s score of each cell. However, the challenge is that these distances are relative values and specific to each dataset. While we believe that these distances could provide valuable information, we have not yet determined the most effective way to represent and utilize this data in a meaningful manner.

      Also, the discussion about the pipeline's uniqueness does not go into the details of alternative models. The introduction remains weak in outlining the limitations of current methods (L90). Acknowledging this limitation will be necessary.

      Thank you for these insightful comments. The discussion about alternative methods was already present in the discussion L586-598 but to answer the request of the reviewers, we have revised the introduction and discussion sections to more clearly address the limitations of current methods, as well as discussed the uniqueness of the pipeline. Additionally, we have reorganized Figure 1 to more effectively highlight the main caveats associated with clustering, the primary method currently in use.

      (2) The manuscript suffers from several overstatements and simplifications, which need to be resolved. For example:

      a)  L40: The authors talk about "accurately ranked cells". Based on their results, the term "accuracy" is still unclear in this context.

      Thank you for this comment. Our use of the term "accurately" was intended to convey that the ranking was correct based on comparison with human experts, though we agree that it may have been overstated. We have removed "accurately" and propose to replace it with "properly" to better reflect the intended meaning.

      b) L50: Microglial processes are not necessarily evenly distributed in the healthy brain. Depending on their embedded environment, they can have longer process extensions (e.g., frontal cortex versus cerebellum).

      Thank you for bringing this point to our attention. We removed “evenly” so that this introductory sentence is more inclusive of the various morphologies of microglial cells.

      c)  L69: The term "metabolic challenge" is very broad, ranging from glycolysis/FAO switches to ATP-mediated morphological adaptations, and it needs further clarification about the author's intended meaning.

      Thank you for this comment. We have clarified that we were referring to the metabolic challenge triggered by ischemia, and we have added a reference.

      d) L75: Is morphology truly "easy" to obtain?

      Yes, it is in comparison to other parameters such as transcripts or metabolism, but we understand the point made by the reviewer and we found another way of writing it. As an alternative we propose: “morphology is an indicator accessible through…”

      e) L80: The sentence structure implies that clustering or artificial intelligence (AI) are parameters, which is incorrect. Furthermore, the authors should clarify the term "AI" in their intended context of morphological analysis.

      We apologize for this confusing wording. We reformulated the sentence as follows: “Artificial intelligence (AI) approaches such as machine learning have also been used to categorize morphologies (Leyh et al., 2021)”.

      f) L390f: An assumption is made that the contralateral hemisphere is a non-pathological condition. How confident are the authors about this statement? The brain is still exposed to a pathological condition, which does not stop at one brain hemisphere.

      We did not say that the contralateral hemisphere is non-pathological, but that its microglial cells have a non-pathological morphology, which is slightly different. The contralateral side in ischemic experiments is classically used as a control (Rutkai et al 2022). Differences in transcript levels have been reported between sham-operated animals and the contralateral hemisphere in tMCAO mice (Filippenkov et al 2022; https://doi.org/10.3390/ijms23137308), showing that the contralateral side is indeed in a different state than sham controls; however, no differences in terms of morphology have been reported.

      We have removed “non-pathological” to avoid misinterpretations.

      g)  Methodological questions:

      a) L299: An inversion operation was applied to specific parameters. The description needs to clarify the necessity of this since the PCA does not require it.

      Indeed, we are sorry for this lack of explanation. Some morphological indexes rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, simplifying data interpretation. This clarification has been added to the revised manuscript as follows:

      “Lacunarity, roundness factor, convex hull radii ratio, processes cell areas ratio and skeleton processes ratio were subjected to an inversion operation in order to homogenize the parameters before conducting the PCA: indeed, some parameters rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, thus simplifying data interpretation.”
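
      One way to picture the inversion step is the sketch below. It assumes, purely for illustration, that each parameter is min-max scaled to [0, 1] and that "inversion" means replacing a scaled value x by 1 - x; the text quoted above does not specify the exact operation, and the function names and example values are hypothetical.

```python
# Illustrative sketch only: the exact inversion used by MorphoCellSorter is not
# specified in the response, so "1 - scaled value" is an assumption.

def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def invert(values):
    """Reverse the ranking direction of a parameter after scaling."""
    return [1.0 - v for v in min_max_scale(values)]


def rank_order(values):
    """Return cell indices sorted from smallest to largest value."""
    return sorted(range(len(values)), key=lambda i: values[i])


# Hypothetical values for three cells (0 = amoeboid, 2 = highly ramified).
ramification_like = [0.2, 0.5, 0.9]   # increases with ramification
roundness_like = [0.8, 0.4, 0.1]      # decreases with ramification

# Before inversion the two parameters order the cells in opposite directions.
print(rank_order(ramification_like))       # [0, 1, 2]
print(rank_order(roundness_like))          # [2, 1, 0]

# After inversion both parameters order the cells in the same direction.
inverted_roundness = invert(roundness_like)
print(rank_order(inverted_roundness))      # [0, 1, 2]
```

      With all parameters pointing in the same direction, a high value consistently means "more ramified", which is the homogenization the quoted passage describes.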

      b) Different biological samples have been collected across different species (rat, mouse) and disease conditions (stroke, Alzheimer's disease). Sex is a relevant component in microglia morphology. At first glance, information on sex is missing for several of the samples. The authors should always refer to Table 1 in their manuscript to avoid this confusion. Furthermore, how many biological animals have been analyzed? It would be beneficial for the study to compare different sexes and see how accurate Andrew's ranking would be in ranking differences between males and females. If they have a rationale for choosing one sex, this should be explained.

      As reported in the literature, we acknowledge the presence of sex differences in microglial cell morphology. Due to ethical considerations and our commitment to reducing animal use, we did not conduct dedicated experiments specifically for developing MorphoCellSorter. Instead, we relied on existing brain sections provided by collaborators, which were already prepared and included tissue from only one sex—either female or male—except in the case of newborn pups, whose sex is not easily determined. Consequently, we were unable to evaluate whether MorphoCellSorter is sensitive enough to detect morphological differences in microglia attributable to sex. Although assessing this aspect is feasible, we are uncertain if it would yield additional insights relevant to MorphoCellSorter’s design and intended applications.

      To address this, we have included additional references in Table 1 of the revised manuscript and clearly indicated the sex of the animals from which each dataset was obtained.

      c) In the methodology, the slice thickness has been given in a range. Is there a particular reason for this variability?

      We could not find any range in the text; we usually used 30 µm thick sections in order to capture entire or nearly entire microglial cells.

      Although section thickness was identical for all sections of a given dataset, only the planes containing the cells of interest were selected during imaging for both ischemic stroke models. This explains why the range of planes acquired varies depending on how each cell is distributed in Z.

      Also, the slice thickness is inadequate to cover the entire microglia morphology. How do the authors include this limitation of their strategy? Did the authors define a cut-off for incomplete microglia?

      We found that 30 µm sections provide an effective balance, capturing entire or nearly entire microglial cells (consistent with what we observe in vivo) while allowing sufficient antibody penetration to ensure strong signal quality, even at the section's center. In our segmentation process, we excluded microglia located near the section edges (i.e., cells with processes visible on the first or last plane of image acquisition, as well as those close to the field of view’s boundary). Although our analysis pipeline should also function with thicker sections (>30 µm), we confirmed that thinner sections (15 µm or less) are inadequate for detecting morphological differences, as tested initially on the AD model. Segmented, incomplete microglia lack the necessary structural information to accurately reflect morphological differences thus impairing the detection of existing morphological differences.

      c) The manuscript outlines that the authors have used different preprocessing pipelines, which is great for being transparent about this process. Yet, it would be relevant to provide a rationale for the different imaging processing and segmentation pipelines and platform usages (Supplementary Figure 7). For example, it is not clear why the Z maximum projection is performed at the end for the Alzheimer's Disease model, while it's done at the beginning of the others.

      The same holds true for cropping, filter values, etc. Would it be possible to analyze the images with the same pipelines and compare whether a specific pipeline should be preferable to others?

      The pre-processing steps depend on the quality of the images in each dataset. For example, in the AD dataset, images acquired with a wide-field microscope were considerably noisier compared to those obtained via confocal microscopy. In this case, reducing noise plane-by-plane was more effective than applying noise reduction on a Z-projection, as we would typically do for confocal images. Given that accurate segmentation is essential for reliable analysis in MorphoCellSorter, we chose to tailor the segmentation approach for each dataset individually. We recommend future users of MorphoCellSorter take a similar approach. This clarification has been added to the discussion.

      On a side note, Matlab is not open-access.

      This is correct. We are currently translating the Matlab script into Python; it will be available soon on Github: https://github.com/Pascuallab/MorphCellSorter.

      This also includes combining the different animals to see which insights could be gained using the proposed pipelines.

      As explained above, a common segmentation process for very diverse types of acquisitions (magnification, resolution and type of images) is not optimal in terms of segmentation and accuracy of the analysis. Although we could feed MorphoCellSorter with all these data from a single segmentation pipeline, the results might be very difficult to interpret.

      d) L227: Performing manual thresholding isn't ideal because it implies the preprocessing could be improved. Additionally, it is important to consider that morphology may vary depending on the thresholding parameters. Comparing different acquisitions that have been binarized using different criteria could introduce biases.

      As noted earlier, segmentation is not the main focus of this paper, and we leave it to users to select the segmentation method best suited to their datasets. Although we acknowledge that automated thresholding would in theory be ideal, we were confronted with image acquisitions that were not uniform, even within the same sample. For instance, in ischemic brain samples, lipofuscin from cell death introduces background noise that can artificially impact threshold levels. We tested global and local algorithms to automatically binarize the cells, but these approaches often resulted in imperfect, suboptimal segmentation of individual cells. In our experience, manually adjusting the threshold provides a more accurate, reliable, and comparable selection of cellular elements, even though it introduces some subjectivity. To ensure consistency in segmentation, we recommend that the same person performs the analysis across all conditions. This clarification has been added to the discussion.

      e) Parameter choices: L375: When using k-means clustering, it is good practice to determine the number of clusters (k) using silhouette or elbow scores. Simply selecting a value of k based on its previous usage in the literature is not rigorous, as the optimal number of clusters depends on the specific data structure. If they are seeking a more objective clustering approach, they could also consider employing other unsupervised techniques, (e.g. HDBSCAN) (L403f).

      We agree with the referee’s comment; however, the purpose of the k-means clustering we used was only to illustrate that the clusters generated are artificial and do not correspond to the reality of the continuum of microglial morphology. In the course of the study we used the elbow score to determine k, but this did not work well because no clear elbow was visible in some datasets (probably because of the continuum of microglial morphologies). In any case, the choice of k does not change the underlying problem: the clusters are artificial and their boundaries arbitrary, whether k is determined manually or mathematically.
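
      The absence of a clear elbow on continuous data can be illustrated with a toy example. The sketch below is not the authors' analysis: it runs a plain one-dimensional k-means (Lloyd's algorithm) on hypothetical, evenly spread values standing in for a morphology continuum, and prints the within-cluster sum of squares for several values of k.

```python
# Toy illustration (not the authors' data or code): k-means on a 1-D continuum.

def kmeans_1d(points, k, iterations=50):
    """Lloyd's algorithm in one dimension; returns (centers, inertia)."""
    lo, hi = min(points), max(points)
    # Start with centers spread evenly across the data range.
    centers = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster happens to be empty.
        new_centers = [sum(c) / len(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    # Within-cluster sum of squares using the final centers.
    inertia = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, inertia


# Hypothetical "continuum": 120 values spread evenly between 0 and 1.
continuum = [(i + 0.5) / 120 for i in range(120)]

inertias = [kmeans_1d(continuum, k)[1] for k in range(1, 7)]
for k, value in zip(range(1, 7), inertias):
    print(k, round(value, 4))
```

      On such data the sum of squares keeps shrinking smoothly as k grows, so no single k stands out. This is consistent with the behaviour described above, although real datasets will be less regular than this idealized case.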

      L373: A rationale for the choice of the 20 non-dimensional parameters as well as a detailed explanation of their computation such as the skeleton process ratio is missing. Also, how strongly correlated are those parameters, and how might this correlation bias the data outcomes?

      Thank you for raising this point. There is no specific rationale beyond our goal of being as exhaustive as possible, incorporating most of the parameters found in the literature, as well as some additional ones that we believed could provide a more thorough description of microglial morphology.

      Indeed, some of these parameters are correlated. Initially, we considered this might be problematic, but we quickly found that these correlations essentially act as factors that help assign more weight to certain parameters, reflecting their likely greater importance in a given dataset. Rather than being a limitation, the correlated parameters actually enhance the ranking. We tested removing some of these parameters in earlier versions of MorphoCellSorter, and found that doing so reduced the accuracy of the tool.

      Differences between circularity and roundness factors are not coming across and require further clarification.

      These are two distinct ways of characterizing morphological complexity, and we borrowed these parameters and kept their names from the existing literature, not necessarily in the context of microglia. In our case, these parameters are used to describe the overall shape of the cell. The advantage of using different metrics to calculate similar parameters is that, depending on the dataset, one method may be better suited to capture specific morphological features of a given dataset. MorphoCellSorter selects the parameter that best explains the greatest dispersion in the data, allowing for a more accurate characterization of the morphology. Author response image 1 shows how circularity and roundness describe cells differently.

      Author response image 1.

      Correlation between Circularity and Roundness Factor in the Alzheimer disease dataset. A second order polynomial correlation exists between the two parameters in our dataset. Indeed (1) a single maximum is shared between both parameters. However, Circularity and Roundness Factor are not entirely redundant, as exemplified by (2) the possible variety of Roundness Factors for a given Circularity as well as (3) the very different morphology minima of these two parameters.

      One is applied to the soma and the other to the cell, but why is neither circularity nor roundness factor applied to both?

      None of the parameters concern the cell body by itself. The cell body is always measured relative to one or more other metrics. Because these parameters and what they represent do not seem to be very clear, we have added a graphic representation of the types of measurements they provide in the revised version of the manuscript (Supplemental figure 8).

      f) PCA analysis:

      The authors spend a lot of text to describe the basic principles of PCA. PCA is mathematically well-described and does not require such depth in the description and would be sufficient with references.

      Thank you for this comment. Indeed, the description of PCA may be too exhaustive; we will simplify the text.

      Furthermore, there are the following points that require attention:

      L321: PC1 is the most important part of the data could be an incorrect statement because the highest dispersion could be noise, which would not be the most relevant part of the data. Therefore, the term "important" has to be clarified.

      We are not sure that, in the case of segmented images, noise would represent most of the data, since segmentation also removes most of the noise; but perhaps the reviewer is concerned about another type of noise? Nonetheless, we thank the reviewer for this comment and propose the following change, which should resolve this potential issue.

      “PC<sub>1</sub> is the direction in which data is most dispersed.”

      L323: As before, it's not given that the first two components hold all the information.

      Thank you for this comment we modified this statement as follows: “The two first components represent most of the information (about 70%), hence we can consider the plan PC<sub>1</sub>, PC<sub>2</sub> as the principal plan reducing the dataset to a two dimensional space”

      L327 and L331 contain mistakes in the nomenclature: Mix up of "wi" should be "wn" because "i" does not refer to anything. The same for "phi i = arctan(yn/wn)" should be "phi n".

      Thanks a lot for these comments. We have made the changes in the text as proposed by the reviewer.
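
      For readers less familiar with Andrews plots, the sketch below evaluates the classical Andrews curve, f(t) = x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + ..., for one observation. It is a generic illustration: the coordinates passed in are hypothetical, and the way MorphoCellSorter derives its Andrews score from such curves is not reproduced here.

```python
import math


def andrews_curve(coords, t):
    """Classical Andrews curve value at angle t for one observation.

    coords: sequence of coordinates (x1, x2, ...), e.g. principal-component
    scores of a single cell (hypothetical input for this sketch).
    """
    value = coords[0] / math.sqrt(2)
    for i, x in enumerate(coords[1:], start=1):
        harmonic = (i + 1) // 2          # 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            value += x * math.sin(harmonic * t)
        else:
            value += x * math.cos(harmonic * t)
    return value


# Two hypothetical cells described by three coordinates each.
cell_a = [1.0, 0.5, -0.2]
cell_b = [-0.8, 0.1, 0.4]

# Sample each curve on a grid of angles between -pi and pi.
angles = [-math.pi + 2 * math.pi * j / 100 for j in range(101)]
curve_a = [andrews_curve(cell_a, t) for t in angles]
curve_b = [andrews_curve(cell_b, t) for t in angles]
```

      Each observation becomes one curve, and observations with similar coordinates yield curves that stay close to each other over the whole range of t.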

      L348: Spearman's correlation measures monotonic correlation, not linear correlation. Either the authors used Pearson Correlation for linearity or Spearman correlation for monotonic. This needs to be clarified to avoid misunderstandings.

      Sorry for the misunderstanding. We did use Spearman correlation, which measures monotonic association, and we have therefore changed “linear” to “monotonic” in the text. Thanks a lot for the careful reading.
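
      The distinction between linear and monotonic correlation can be made concrete with a small, self-contained sketch. The data are hypothetical (y is the cube of x, a monotonic but non-linear relationship), and the helper functions are written out only for illustration rather than taken from any particular library.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (measures linear association)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: perfectly monotonic, but not linear.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print(round(pearson(x, y), 3))
print(round(spearman(x, y), 3))
```

      Because the ordering of y follows the ordering of x exactly, the Spearman coefficient reaches 1 while the Pearson coefficient stays below 1.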

      g) If the authors find no morphological alteration, how can they ensure that the algorithm is sensitive enough to detect them? When morphologies are similar, it's harder to spot differences. In cases where morphological differences are more apparent, like stroke, classification is more straightforward.

      We are not entirely sure we fully understand the reviewer's comment. When data are similar or nearly identical, MorphoCellSorter performs comparably to human experts (see Table 1). However, the advantage of using MorphoCellSorter is that it ranks cells much faster while achieving accuracy similar to that of human experts, and it gives each cell a value on an axis (the Andrews score), which a human expert certainly cannot. For example, in the case of mouse embryos, MorphoCellSorter’s ranking was as accurate as that made by human experts. Based on this ranking, the distributions were similar, suggesting that the morphologies are generally consistent across samples.

      The algorithm itself does not detect anything—it simply ranks cells according to the provided parameters. Therefore, it is unlikely that sensitivity is an issue; the algorithm ranks the cells based on existing data. The most critical factor in the analysis is the segmentation step, which is not the focus of our paper. However, the more accurate the segmentation, the more distinct the parameters will be if actual differences exist. Thus, sensitivity concerns are more related to the quality of image acquisition or the segmentation process rather than the ranking itself. Once MorphoCellSorter receives the parameters, it ranks the cells accordingly. When cells are very similar, the ranking process becomes more complex, as reflected in the correlation values comparing expert rankings to those from MorphoCellSorter (Table 1).

      Moreover, MorphoCellSorter does not only provide a ranking: the morphological indexes automatically computed offer useful information to compare the cells’ morphology between groups.

      h) Minor aspects:

      % notation requires to include (weight/volume) annotation.

      This has been done in the revised version of the manuscript

      Citation/source of the different mouse lines should be included in the method sections (e.g. L117).

      The reference of the mouse line has been added (RRID:IMSR_JAX:005582) to the revised version of the manuscript.

      L125: The length of the single housing should be specified to ensure no variability in this context.

      The mice were housed individually for 24 h; this is now stated in the text.

      L673: Typo to the reference to the figure.

      This has been corrected, thank you for your thoughtful reading.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Methods

      (1) Alzheimer's disease model: was a perfusion performed and then an hour later brains extracted? Please clarify.

      This is indeed what has been done.

      (2) For in vitro microglial studies: was a percoll gradient used for the separation of immune cells? What percentage percoll was used? Was there separation of myelin and associated debris with the percoll centrifugation? Please clarify the protocol as it is not completely clear how these cells were separated from the initial brain lysate suspension. What cell density was plated?

      The protocol has been completed, as follows: “Myelin and debris were then eliminated thanks to a Percoll® PLUS solution (E0414, Sigma-Aldrich) diluted with DPBS10X (14200075, Gibco) and enriched in MgCl<sub>2</sub> and CaCl<sub>2</sub> (for 50 mL of myelin separation buffer: 90 mL of Percoll PLUS, 10 mL of DPBS10X, 90 μL of 1 M CaCl<sub>2</sub> solution, and 50 μL of 1 M MgCl<sub>2</sub> solution).” Thank you for your feedback.

      (3) How are the microglia "automatically cropped" in FIJI (for the Phox2b mutant)? Is there a function/macro in the program you used? This is very important for the workflow and needs to be clarified. The methods section of this manuscript is a guide for future users of this workflow and should be as descriptive as possible. It would be useful to give detailed information on the manual classification process, perhaps as a supplement. The authors do a nice job pointing out that these older methods are not effective in categorizing microglia that don't necessarily fit into a predefined phenotype.

      The protocol has been completed, as follows: “Briefly, the centroid of each detected object (i.e. microglia), except the ones on the borders, was detected, and a crop of 300x300 pixels around each object was generated. Then, the pixels belonging to neighboring cells were manually removed on each generated crop.”
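
      The cropping step described above can be sketched as follows. This is not the authors' FIJI macro: the function name, the image size, the centroid values, and the rule used to skip border objects (discarding any cell whose 300x300 crop would extend past the image edge) are all assumptions made for illustration.

```python
# Illustrative sketch, not the authors' FIJI macro: names and the border rule
# are assumptions made for this example.

def crop_box(centroid, image_shape, size=300):
    """Return (top, left, bottom, right) of a size x size crop centred on
    `centroid`, or None if the crop would extend past the image border."""
    row, col = centroid
    height, width = image_shape
    half = size // 2
    top, left = int(round(row)) - half, int(round(col)) - half
    bottom, right = top + size, left + size
    if top < 0 or left < 0 or bottom > height or right > width:
        return None
    return top, left, bottom, right


# Hypothetical centroids (row, column) in a 1024 x 1024 pixel image.
centroids = [(512.0, 512.0), (40.0, 700.0), (800.2, 160.7)]
boxes = [crop_box(c, (1024, 1024)) for c in centroids]
print(boxes)
```

      The second centroid lies too close to the top edge, so it yields no crop. Removing the pixels of neighboring cells inside each crop remains a manual step, as stated in the protocol.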

      (4) Please address the concern that manual tuning and thresholding are required for this method's accuracy. Is this easily reproducible?

      Yes, it is easily reproducible for a given experimenter and is better suited than automatic thresholding. Although segmentation is not the primary focus of this paper, we leave it to users to choose the segmentation method that best fits their datasets.

      To address your question, we acknowledge that automated thresholding would theoretically be ideal. However, we encountered challenges due to non-uniform image acquisitions, even within the same sample. For instance, in ischemic brain samples, lipofuscin resulting from cell death introduced background noise that could artificially influence threshold levels. We tested both global and local algorithms for automatic binarization of cells, but these approaches often produced suboptimal segmentation results for individual cells.

      Based on our experience, manually adjusting the threshold provided more accurate, reliable, and consistent selection of cellular elements, even though it introduces a degree of subjectivity. To maintain consistency, we recommend that the same individual perform the analysis across all conditions.

      This clarification has been incorporated into the discussion as follows: “Although automated thresholding would be ideal, in our case image acquisitions were not entirely uniform, even within the same sample. For instance, in ischemic brain samples, lipofuscin from cell death introduces background noise that can artificially impact threshold levels. This effect is observed even when comparing contralateral and ipsilateral sides of the same brain. In our experience, manually adjusting the threshold provides a more accurate, reliable, and comparable selection of cellular elements, even though it introduces some subjectivity. To ensure consistency in segmentation, we recommend that the same person performs the analysis across all conditions.”

      (5) How are the authors performing the PCA---what program (e.g .R)? Again, please be explicit about how these mathematical operations were computed. (lines 302-345).

      The PCA was made in Matlab, the code can be found on Github (https://github.com/Pascuallab/MorphCellSorter), as stated in the discussion.

      Other:

      (1) Can the authors comment on the challenges of the in vitro microglial analyses? The correlation of the experts v. MorphoCellSorter is much less than the fixed tissue. This is not addressed in the manuscript.

      In vitro, microglial cells exhibit a narrower range of morphological diversity compared to ex vivo or in vivo conditions. A higher proportion of cells share similar morphologies or morphologies with comparable complexities, which makes establishing a precise ranking more challenging. Consequently, the rank of many cells could be adjusted without significantly affecting the overall quality of the ranking.

      This explains why the rankings tend to show slightly greater divergence between experts. Interestingly, the ranking generated by MorphoCellSorter, which is objective and not subject to human bias, lies roughly midway between the rankings of the two experts.

      (2) You point out that the MorphoCellSorter may not be suited for embryonic/prenatal microglial analysis.

      This must be a misunderstanding because it is not what we concluded; we found that the ranking was correct but that we could not spot any differences due to transgenic alteration.

      The lack of differences observed in the embryonic microglia (Figure 5) is not necessarily surprising, as embryonic microglia have diverse morphological characteristics--- immature microglia do not possess highly ramified processes until postnatal development [see Hirosawa et al. (2005) https://doi.org/10.1002/jnr.20480 -they use an Iba1-GFP transgenic mouse to visualize prenatal microglia]. Also, see Bennett et al. (2016) [https://doi.org/10.1073/pnas.1525528113] which shows mature microglia not appearing until 14 days postnatal.

      We agree with the reviewer on that point; nonetheless, MorphoCellSorter indicates that the population is homogeneous and that the mutation has no effect on morphology.

      (3) Although a semantic issue, Figure 1's categorization of microglia shows predefined groups of microglia do not necessarily usefully bin many cells. Is it still possible to categorize the microglia without using hotly debated categorization methods? The literature review in the current manuscript correctly points out the spectrum phenomenon of microglial activation states, though some of the suggestions from Paolicelli et al. (2022) are not put into action. The use of "activated" only further perpetuates the oversimplified classification of microglia. Perhaps the authors could consider using the term "reactive", as it is recognized by the Microglial nomenclature paper cited above. Are "amoeboid microglia" not "activated microglia"? "Reactive" is a less loaded term and is a recommended descriptor. Amoeboid microglia are commonly understood to be indicative of a highly proinflammatory environment, though you could potentially use "hyper-reactive" to differentiate them from the slightly ramified "reactive" cells.

      We changed “activated microglia” to “reactive microglia” in the text, as requested by the reviewer. Thanks a lot for your comment.

      (4) The graphs in Figures 3 B-D are visually difficult to interpret. The better color contrast between the MorphoCellSorter/Expert and Expert1/Expert2 would be useful--- perhaps a color for Expert 1 and a different color for Expert 2. Is this the ranking from the same data in Figure 1 (lines 420-421)? It is unclear what the x-axis represents in 3B-D. E-G is much more intuitive.

      We believe the confusion stems more from Figure 1 than Figure 3, as both figures use similar representations for entirely different analyses (clustering vs. ranking). To address this, we have provided an updated version of Figure 1 to help clarify this distinction and avoid any potential misinterpretation.

      Regarding Figure 3B-D, we do not fully see the need for changing the colors. These panels are histograms that display the distribution of rank differences either between experts and MorphoCellSorter or between the two experts. Assigning specific colors to the experts or MorphoCellSorter would be challenging, as the histograms represent comparative distributions involving both an expert and MorphoCellSorter or the ranking differences between the two experts.
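
      The kind of histogram described here, a distribution of per-cell rank differences between two rankings, can be sketched as below. The cell identifiers and both rankings are hypothetical, and the text-based histogram merely stands in for the plotted panels.

```python
from collections import Counter


def rank_differences(ranking_a, ranking_b):
    """Per-cell difference in rank position between two rankings.

    Each ranking is a list of cell identifiers ordered from first to last.
    """
    position_b = {cell: i for i, cell in enumerate(ranking_b)}
    return [i - position_b[cell] for i, cell in enumerate(ranking_a)]


# Hypothetical rankings of six cells by an expert and by an automated tool.
expert = ["c1", "c2", "c3", "c4", "c5", "c6"]
tool = ["c1", "c3", "c2", "c4", "c6", "c5"]

differences = rank_differences(expert, tool)
histogram = Counter(differences)
for diff in sorted(histogram):
    print(diff, "#" * histogram[diff])
```

      Each bar counts cells, not raters: a given difference always involves both rankings at once, which is why a bar cannot be attributed to a single expert or to the tool alone.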

      The same reasoning applies to Figures 3E-G. In these scatter plots, each point is defined by an ordinate (ranking value for one expert) and an abscissa (ranking value for either the other expert or MorphoCellSorter). Therefore, it would not be straightforward or meaningful to assign distinct colors to these elements within this context.

      (5) Line 217: use the term "imaged" rather than "generated" ... or "images were generated of clusters of microglia located .... using MICROSCOPE and Zen software." You aren't generating microglia, rather, you are generating images.

      Thank you for raising this problem. We changed the sentence as follows: “For the AD model, crops of individual microglial cells located in the secondary visual cortex were extracted from images using the Zen software (v3.5, Zeiss) and exported to the Tif image format.”

      (6) Elaborate on how an "inversion operation" was applied to Lacunarity, roundness factor, convex hull radii ratio, processes cell areas ratio, and skeleton processes. (Lines 299-300) Furthermore, a paragraph separation would be useful if the "inversion operation" is not what is described in the text immediately after this description.

      Indeed, we are sorry for this lack of explanation. Some morphological indexes rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, simplifying data interpretation. This clarification has been added to the revised manuscript as follows:

      “Lacunarity, roundness factor, convex hull radii ratio, processes cell areas ratio and skeleton processes ratio were subjected to an inversion operation in order to homogenize the parameters before conducting the PCA: indeed, some parameters rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, thus simplifying data interpretation.”
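
      The effect of such an inversion can be sketched as follows. The response does not state which inversion was used, so the range flip below (max + min - value) is only one possible choice, and the parameter values and names are hypothetical.

```python
def invert(values):
    """Flip a parameter so its largest value becomes its smallest and vice versa."""
    lo, hi = min(values), max(values)
    return [hi + lo - v for v in values]

def ranks(values):
    """Rank positions (0 = smallest value); ties are not handled in this sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result

# Hypothetical values for three cells, ordered from least to most ramified.
ramification_index = [0.25, 0.5, 1.0]  # increases with ramification
roundness_factor = [0.75, 0.5, 0.25]   # decreases with ramification

inverted_roundness = invert(roundness_factor)
```

      After inversion, both parameters rank the three cells in the same direction, so their contributions can be read consistently.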

      (7) Line 560: "measureclarke" seems to be an error associated with the reference. Please correct.

      Thank you, this has been corrected.

      (8) Discussion: compare MorphoCellSorter to the MIC-MAC program used by Salamanca et al. (2019). They use a similar approach, albeit not Andrew's plot.

      We have added the Salamanca reference

      Reviewer #2 (Recommendations for the authors):

      While it's not expected that the authors address the significance of the morphology in relation to function here, they could help highlight the issue and produce data that would enhance the paper's significance. Therefore, I recommend a small-scale and straightforward study where the authors couple their analysis with a marker (e.g. Lysotracker or Mitotracker) to produce data that link their morphometric analysis to more functional readouts. Furthermore, I encourage the authors to elaborate on the practical applications of these morphometric tools and the implications of their measurements, as this would provide context for their work, which, as it stands, feels like just another tool.

      We would like to thank the reviewer for their thoughtful comment and suggestion. Indeed, MorphoCellSorter is simply another tool, but one that offers a more convenient and efficient approach, producing a variety of results tailored to specific research needs. We strongly believe that MorphoCellSorter should be used in conjunction with other tools, depending on the specific research question.

      In our view, MorphoCellSorter is particularly well-suited for researchers who need a quick and efficient way to determine whether their treatment, gene invalidation, or other experimental conditions affect microglial morphology. In this context, MorphoCellSorter is fast, user-friendly, and highly effective. However, for those who aim to uncover detailed differences in cell morphology, other tools requiring more time-intensive, full reconstructions of the cells would be more appropriate.

      Providing additional data on the relationship between cellular function and morphology could certainly pave the way for new questions and more robust evidence. For instance, combining single-cell transcriptomics with morphological analysis would be an excellent approach to exploring the relationship between function and morphology. However, this would involve significant time, expense, and effort, and it represents a different line of inquiry altogether.

      While it would be ideal to clearly demonstrate the link between morphology and function, we are concerned that pursuing such a goal would considerably delay the implementation and adoption of our tool, potentially raising additional questions beyond the scope of this study.

      Minor comments:

      (1) Can MorphoCellSorter be adapted for use with other cell types (e.g., astrocytes)?

      Yes, it could. We have obtained fairly conclusive results on astrocytes, but some parameters need to be adapted before this can be released.

      (2) What modifications would be necessary? If it is not applicable, would a name that includes "Microglia" be more descriptive?

      The modifications would be quite minor; mainly the parameters being considered would change. This is why we will keep the MorphoCellSorter name. Thank you for the suggestion!

      (3) A common challenge with such tools is the technical expertise required to use them. Could a user-friendly interface be developed to better fulfill its intended purpose and benefit the community?

      This is a good point, thank you. The answer is yes: we will translate our Matlab code to Python to open it to a wider audience, and we will certainly work on a user-friendly interface!

      (4) Given that this tool relies on imaging, can users trace a cell (or group of cells) back to the original image?

      Yes, it is possible if each crop is annotated with its spatial coordinates during the segmentation step. This is not yet implemented in the current version of the software, and it mainly depends on how segmentation is performed, which is not the topic of the paper.

      (5)  Line 36: The "biologically relevant" statement is central and needs to be expanded.

      This is not easy, as the abstract has a word limit. What we mean by this sentence is that, when classifying cells, mathematical tools force each cell into a group based on metrics that do not necessarily have a biological meaning. We suggest the following modification: “However, this classification may lack biological relevance, as microglial morphologies represent a continuum rather than distinct, separate groups, and do not correspond to mathematically defined, clusters irrelevant of microglial cells function.”

      (6) Line 49-50: Provide reference and elaborate. For example, does this apply during early life?

      We have slightly changed the sentence and added a reference.

      (7) Line 69: Provide reference.

      The reference (Hubert et al., 2021) has been added.

      (8) Lines 78-88: A table summarizing other efforts in morphometric characterization of microglia would be helpful in distinguishing your work from others.

      This has already been done in some review articles; we have therefore added references to direct readers to these reviews. Here is the revised version of the sentence: “To date, the literature contains a wide variety of criteria to quantitatively describe microglial morphology, ranging from descriptive measures such as cell body surface area, perimeter, and process length to indices calculating different parameters such as circularity, roundness, branching index, and clustering (Adaikkan et al., 2019; Heindl et al., 2018; Kongsui, Beynon, Johnson, & Walker, 2014; Morrison et al., 2017; Young & Morrison, 2018).”

      (9) Lines 130, 145: Please provide complete genotype information and the sources of the animals used.

      It has been done

      (10) Materials and Methods:

      (1) Standardize the presentation of products (e.g., using # consistently).

      It has been done

      (2) Provide versions of software used.

      We have modified accordingly

      (3) Lines 372-373: A table listing the 20 parameters with brief explanations (as partially done in Materials and Methods) would greatly improve readability.

      This is provided in Supplementary Figure 8.

      (4) Since nomenclature is a critical issue in the literature, you used specific definitions (lines 376-383). However, please indicate (with a reference) why you use the term "activated," as it implies that the others are non-activated. Alternatively, define "activated" cluster differently.

      We changed “activated microglia” to “reactive microglia”, as requested by Reviewer #1.

      (4) Figure 1: In my opinion placing this figure as the first main figure is problematic as it confuses the message of the paper. Since the authors are introducing a new approach for morphological characterization in Figure 2, I recommend that the latter, for the sake of readability and clarity, be the first main image, while Figure 1 can move to the supplements.

      We agree with the reviewer and have therefore changed Figure 1, as explained earlier in our response to Reviewer #1. Nonetheless, because it is an important step in our reasoning, we believe it can stay as a figure. We hope the change made to Figure 1 clarifies the message of the paper.

      (5) Figure 1: Please indicate on the figure the marker for the analysis.

      Figure 2 has been changed

      (6) No funding agencies are communicated.

      This has been corrected

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) Line numbers are missing.

      Added

      (2) VR classroom. Was this a completely custom design based on Unity, or was this developed on top of some pre-existing code? Many aspects of the VR classroom scenario are only introduced (e.g., how was the lip-speech synchronisation done exactly?). Additional detail is required. Also, is or will the experiment code be shared publicly with appropriate documentation? It would also be useful to share brief example video-clips.

      We have added details about the VR classroom programming to the methods section (p. 6-7), and we have now included a video-example as supplementary material.

      “Development and programming of the VR classroom were done primarily in-house, using assets (avatars and environment) sourced from pre-existing databases. The classroom environment was adapted from assets provided by Tirgames on TurboSquid (https://www.turbosquid.com/Search/Artists/Tirgames) and modified to meet the experimental needs. The avatars and their basic animations were sourced from the Mixamo library, which at the time of development supported legacy avatars with facial blendshapes (this functionality is no longer available in current versions of Mixamo). A brief video example of the VR classroom is available at: https://osf.io/rf6t8.

      “To achieve realistic lip-speech synchronization, the teacher’s lip movements were controlled by the temporal envelope of the speech, adjusting both timing and mouth size dynamically. His body motions were animated using natural talking gestures.”
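
      One way to picture envelope-driven lip synchronization is sketched below. The quoted methods text does not specify how the envelope was extracted or mapped to mouth size, so the rectify-and-smooth envelope, the window length, the [0, 1] scaling, and the test signal are all assumptions made for illustration.

```python
import math

def temporal_envelope(samples, window):
    """Rectify the signal and smooth it with a trailing moving average."""
    rectified = [abs(s) for s in samples]
    envelope = []
    running = 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= window:
            running -= rectified[i - window]
        envelope.append(running / min(i + 1, window))
    return envelope

def mouth_opening(envelope):
    """Scale the envelope to [0, 1], where 1 is the widest mouth opening."""
    peak = max(envelope)
    return [e / peak if peak > 0 else 0.0 for e in envelope]

# Hypothetical one-second signal: silence for the first half, then a 200 Hz tone.
rate = 8000
speech = [
    (0.0 if n < rate // 2 else 0.8) * math.sin(2 * math.pi * 200 * n / rate)
    for n in range(rate)
]
opening = mouth_opening(temporal_envelope(speech, window=400))
```

      In this sketch the mouth stays closed during silence and opens as the envelope rises, so both the timing and the size of the movement follow the speech signal.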

      While we do intend to make the dataset publicly available for other researchers, at this point we are not making the code for the VR classroom public. However, we are happy to share it on an individual basis with other researchers who might find it useful for their own research in the future.

      (3) "normalized to the same loudness level using the software Audacity". Please specify the Audacity function and parameters.

      We have added these details (p.7)

      “All sound-events were normalized to the same loudness level using the Normalize function in the audio-editing software Audacity (theaudacityteam.org, ver 3.4), with the peak amplitude parameter set to -5 dB, and trimmed to a duration of 300 milliseconds.”
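
      The arithmetic behind this step can be sketched as follows. This is not Audacity's implementation; it only illustrates scaling a sound so that its peak sits at -5 dB relative to full scale (taken here as 1.0) and trimming it to 300 ms. The sample rate and the sample values are hypothetical.

```python
def normalize_peak(samples, peak_db=-5.0):
    """Scale samples so the largest absolute value equals peak_db re full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)
    target = 10 ** (peak_db / 20)  # convert dB to linear amplitude
    gain = target / peak
    return [s * gain for s in samples]

def trim(samples, sample_rate, duration_ms=300):
    """Keep only the first duration_ms milliseconds of the sound."""
    n_samples = int(round(sample_rate * duration_ms / 1000))
    return samples[:n_samples]

# Hypothetical sound-event sampled at 44.1 kHz.
sample_rate = 44100
sound_event = [0.1, -0.4, 0.2] * 10000
processed = trim(normalize_peak(sound_event), sample_rate)
```

      A peak of -5 dB corresponds to a linear amplitude of about 0.56, so every sound-event ends up with the same maximum amplitude.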

      (4) Did the authors check if the participants were already familiar with some of the content in the mini-lectures?

      This is a good point. Since the mini-lectures spanned many different topics, we did not pre-screen participants for familiarity with the topics, and it is possible that some of the participants had some pre-existing knowledge.

      In hindsight, it would have been good to have added some reflective questions regarding participants’ prior knowledge as well as other questions such as level of interest in the topic and/or how well they understood the content. These are elements that we hope to include in future versions of the VR classroom.

      (5) "Independent Component Analysis (ICA) was then used to further remove components associated with horizontal or vertical eye movements and heartbeats". Please specify how this selection was carried out.

      Selection of ICA components was done manually based on visual inspection of their time-course patterns and topographical distributions, to identify components characteristic of blinks, horizontal eye-movements and heartbeats. Examples of these distinct components are provided in Author response image 1 below. This is now specified in the methods section.

      Author response image 1.

      (6) "EEG data was further bandpass filtered between 0.8 and 20 Hz". If I understand correctly, the data was filtered a second time. If that's the case, please do not do that, as that will introduce additional and unnecessary filtering artifacts. Instead, the authors should replace the original filter with this one (so, filtering the data only once). Please see de Cheveigné and Nelken, Neuron, 2019 for an explanation. Also, please provide an explanation of the rationale for further restricting the cut-off bands in the methods section. Finally, further details on the filters should be included (filter type and order, for example).

      Yes, the data was indeed filtered twice. The first filter is applied as part of the preprocessing procedure, in order to remove extremely high- and low-frequency noise but retain most activity within the range of “neural” activity. This broad range is mostly important for the ICA procedure, so as to adequately separate ocular and neural contributions to the recorded signal.

      However, since both the speech tracking responses and ERPs are typically less broadband and are comprised mostly of lower frequencies (e.g., those that make up the speech-envelope), a second narrower filter was applied to improve TRF model-fit and make ERPs more interpretable.

      In both cases we used a fourth-order zero-phase Butterworth IIR filter with 1 second of padding, as implemented in the FieldTrip toolbox. We have added these details to the manuscript.

      (7) "(~ 5 minutes of data in total), which is insufficient for deriving reliable TRFs". That is a bit pessimistic and vague. What does "reliable" mean? I would tend to agree when talking about individual subject TRFs, which 5 min per participant can be enough at the group level. Also, this depends on the specific speech material. If the features are univariate or multivariate. Etc. Please narrow down and clarify this statement.

      We determined that the data in the Quiet condition (~5 min) was insufficient for performing reliable TRF analysis, by assessing whether its predictive-power was significantly better than chance. As shown in Author response image 2 below, the predictive power achieved using this data was not higher than values obtained in permuted data (p = 0.43). Therefore, we did not feel that it was appropriate to include TRF analysis of the Quiet condition in this manuscript. We have now clarified this in the manuscript (p. 10)
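
      The logic of such a comparison against permuted data can be sketched as below. The response does not describe the exact permutation scheme, so simple shuffling of the response samples is an assumption (it ignores temporal autocorrelation), and the simulated signals are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def permutation_p_value(predicted, actual, n_permutations=1000, seed=0):
    """Share of shuffled datasets whose correlation reaches the observed one."""
    rng = random.Random(seed)
    observed = pearson(predicted, actual)
    shuffled = list(actual)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if pearson(predicted, shuffled) >= observed:
            exceed += 1
    # Add one to both counts so the estimated p-value is never exactly zero.
    return observed, (exceed + 1) / (n_permutations + 1)

# Hypothetical data: one prediction that tracks the response and one that does not.
rng = random.Random(1)
actual = [rng.gauss(0, 1) for _ in range(50)]
good_prediction = [a + 0.1 * rng.gauss(0, 1) for a in actual]
unrelated_prediction = [rng.gauss(0, 1) for _ in range(50)]

r_good, p_good = permutation_p_value(good_prediction, actual)
r_chance, p_chance = permutation_p_value(unrelated_prediction, actual)
```

      A large p-value, such as the p = 0.43 reported for the Quiet condition, means that the observed predictive power is indistinguishable from what shuffled data produce.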

      Author response image 2.

      (8) "Based on previous research in by our group (Kaufman & Zion Golumbic 2023), we chose to use a constant regularization ridge parameter (λ= 100) for all participants and conditions". This is an insufficient explanation. I understand that there is a previous paper involved. However, such an unconventional choice that goes against the original definition and typical use of these methods should be clearly reported in this manuscript.

      We apologize for not clarifying this point sufficiently, and have added an explanation of this methodological choice (p.11):

      “The mTRF toolbox uses a ridge-regression approach for L2 regularization of the model to ensure better generalization to new data. We tested a range of ridge parameter values (λ's) and used a leave-one-out cross-validation procedure to assess the model’s predictive power, whereby in each iteration, all but one trial are used to train the model, and it is then applied to the left-out trial. The predictive power of the model (for each λ) is estimated as the Pearson’s correlation between the predicted neural responses and the actual neural responses, separately for each electrode, averaged across all iterations. We report results of the model with the λ that yielded the highest predictive power at the group-level (rather than selecting a different λ for each participant which can lead to incomparable TRF models across participants; see discussion in Kaufman & Zion Golumbic 2023).”
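
      The selection procedure described above can be sketched in a few lines. This is not the mTRF toolbox: it is a deliberately small illustration with a single channel, two time lags, simulated data, and hypothetical function names, intended only to show leave-one-trial-out cross-validation followed by a group-level choice of λ.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def lagged(stimulus):
    """Two-column design matrix: the stimulus at lag 0 and at lag 1."""
    return [(stimulus[t], stimulus[t - 1]) for t in range(1, len(stimulus))]

def fit_ridge(trials, lam):
    """Solve (X'X + lam*I) w = X'y for two predictors using Cramer's rule."""
    a = b = d = p = q = 0.0
    for stimulus, response in trials:
        for (x0, x1), y in zip(lagged(stimulus), response[1:]):
            a += x0 * x0
            b += x0 * x1
            d += x1 * x1
            p += x0 * y
            q += x1 * y
    a += lam
    d += lam
    det = a * d - b * b
    return (p * d - b * q) / det, (a * q - b * p) / det

def predictive_power(trials, lam):
    """Leave-one-trial-out correlation between predicted and actual responses."""
    scores = []
    for i, (stimulus, response) in enumerate(trials):
        w0, w1 = fit_ridge(trials[:i] + trials[i + 1:], lam)
        predicted = [w0 * x0 + w1 * x1 for x0, x1 in lagged(stimulus)]
        scores.append(pearson(predicted, response[1:]))
    return sum(scores) / len(scores)

def simulate_participant(rng, n_trials=4, n_samples=60):
    """Hypothetical data: the response follows the stimulus at lag 1, plus noise."""
    trials = []
    for _ in range(n_trials):
        stimulus = [rng.gauss(0, 1) for _ in range(n_samples)]
        response = [0.0] + [0.5 * stimulus[t - 1] + rng.gauss(0, 1)
                            for t in range(1, n_samples)]
        trials.append((stimulus, response))
    return trials

rng = random.Random(0)
participants = [simulate_participant(rng) for _ in range(5)]
lambdas = [0.01, 0.1, 1, 10, 100, 1000]

# Average predictive power across participants for each lambda, then keep one
# lambda for the whole group rather than a different one per participant.
group_power = {
    lam: sum(predictive_power(person, lam) for person in participants)
    / len(participants)
    for lam in lambdas
}
best_lambda = max(group_power, key=group_power.get)
```

      Because a single λ is chosen from the group-level curve, every participant's model is regularized identically, which keeps the resulting TRFs comparable across participants.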

      Assuming that the explanation will be sufficiently convincing, which is not a trivial case to make, the next issue that I will bring up is that the lambda value depends on the magnitude of input and output vectors. While the input features are normalised, I don't see that described for the EEG signals. So I assume they are not normalized. In that case, the lambda would have at least to be adapted between subjects to account for their different magnitude.

      We apologize for omitting this detail – yes, the EEG signals were normalized prior to conducting the TRF analysis. We have updated the methods section to explicitly state this pre-processing step (p.10).

      Another clarification, is that value (i.e., 100) would not be comparable either across subjects or across studies. But maybe the authors have a simple explanation for that choice? (note that this point is very important as this could lead others to use TRF methods in an inappropriate way - but I understand that the authors might have specific reasons to do so here). Note that, if the issue is finding a reliable lambda per subject, a more reasonable choice would be to use a fixed lambda selected on a generic (i.e., group-level) model. However selecting an arbitrary lambda could be problematic (e.g., would the results replicate with another lambda; and similarly, what if a different EEG system was used, with different overall magnitude, hence the different impact of the regularisation).

      We fully agree that selecting an arbitrary lambda is problematic (especially across studies). As clarified above, the group-level lambda chosen here for the encoding model was data-driven, optimized based on group-level predictive power.

      (9) "L2 regularization of the model, to reduce its complexity". Could the authors explain what "reduce its complexity" refers to?

      Our intention here was to state that the L2 regularization constrains the model’s weights so that it can better generalize to left-out data. However, for clarity we have now removed this statement.

      (10) The same lambda value was used for the decoding model. From personal experience, that is very unlikely to be the optimal selection. Decoding models typically require a different (usually larger) lambda than forward models, which can be due to different reasons (different SNR of "input" of the model and, crucially, very different dimensionality).

      We agree with the reviewer that treatment of regularization parameters might not be identical for encoding and decoding models. Our initial search of lambda parameters was limited to λ= 0.01 - 100, with λ= 100 showing the best reconstruction correlations. However, following the reviewer’s suggestion we have now broadened the range and found that, in fact reconstruction correlations are further improved and the best lambda is λ= 1000 (see Author response image 3 below, left panel). Importantly, the difference in decoding reconstruction correlations between the groups is maintained regardless of the choice of lambda (although the effect-size varies; see Author response image 3, right panel). We have now updated the text to reflect results of the model with λ= 1000.

      Author response image 3.

      (11) Skin conductance analysis. Additional details are required. For example, how was the linear interpolation done exactly? The raw data was downsampled, sure. But was an anti-aliasing filter applied? What filter exactly? What implementation for the CDA was run exactly?

      We have added the following details to the methods section (p. 14):

      “The Skin Conductance (SC) signal was analyzed using the Ledalab MATLAB toolbox (version 3.4.9; Benedek and Kaernbach, 2010; http://www.ledalab.de/) and custom-written scripts. The raw data was downsampled to 16Hz using FieldTrip's ft_resampledata function, which applies a built-in anti-aliasing low-pass filter to prevent aliasing artifacts. Data were inspected manually for any noticeable artifacts (large ‘jumps’), and if present were corrected using linear interpolation in Ledalab. A continuous decomposition analysis (CDA) was employed to separate the tonic and phasic SC responses for each participant. The CDA was conducted using the 'sdeco' mode (signal decomposition), which iteratively optimizes the separation of tonic and phasic components using the default regularization settings.”
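
      The artifact-correction step can be illustrated with the sketch below. It is not Ledalab's implementation; it only shows the idea of replacing an artifactual segment with a straight line between the samples that flank it. The signal values and the function name are hypothetical.

```python
def interpolate_segment(signal, start, stop):
    """Replace signal[start:stop] with a straight line between the flanking samples."""
    if start < 1 or stop >= len(signal) or start >= stop:
        raise ValueError("segment must lie strictly inside the signal")
    left, right = signal[start - 1], signal[stop]
    span = stop - (start - 1)
    repaired = list(signal)
    for i in range(start, stop):
        fraction = (i - (start - 1)) / span
        repaired[i] = left + fraction * (right - left)
    return repaired

# Hypothetical skin-conductance trace with a two-sample 'jump' at indices 2 and 3.
sc = [2.0, 2.0, 9.0, 9.5, 4.0, 4.0]
clean = interpolate_segment(sc, 2, 4)
```

      The samples outside the marked segment are left untouched, so only the artifact itself is replaced.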

      (12) "N1- and P2 peaks of the speech tracking response". Have the authors considered using the N1-P2 complex rather than the two peaks separately? Just a thought.

      This is an interesting suggestion, and we know that this has been used sometimes in more traditional ERP literature. In this case, since neither peak was modulated across groups, we did not think this would yield different results. However, it is a good point to keep in mind for future work.

      (13) Figure 4B. The ticks are missing. From what I can see (but it's hard without the ticks), the N1 seems later than in other speech-EEG tracking experiments (where is closer to ~80ms). Could the authors comment on that? Or maybe this looks similar to some of the authors' previous work?

      We apologize for this and have added ticks to the figure.

      In terms of time-course, an N1 peak at around 100ms is compatible with many of our previous studies, as well as those from other groups.

      (14) Figure 4C. Strange thin vertical grey bar to remove.

      Fixed.

      (15) Figure 4B: What about the topographies for the TRF weights? Could the authors show that for the main components?

      Yes. The topographies of the main TRF components are similar to those of the predictive power and are compatible with auditory responses. We have added them to Figure 4B.

      (16) Figure 4B: I just noticed that this is a grand average TRF. That is ok (but not ideal) only because the referencing is to the mastoids. The more appropriate way of doing this is to look at the GFP, instead, which estimates the presence of dipoles. And then look at topographies of the components. Averaging across channels makes the plotted TRF weaker and noisier. I suggest adding the GFP to the plot. Also, the colour scale in Figure 4A is deceiving, as blue is usually used for +/- in plots of the weights. While that is a heatmap, where using a single colour or even yellow to red would be less deceiving at first look. Only cosmetics, indeed. The result is interesting nonetheless!

      We apologize for this, and agree with the reviewer that it is better not to average across EEG channels. In the revised Figure, we now show the TRFs based on the average of electrodes FC1, FC2, and FCz, which exhibited the strongest activity for the two main components.
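
      For readers comparing the two summaries discussed here, the sketch below computes both the global field power (GFP, the standard deviation across channels at each time point) suggested by the reviewer and an average over a subset of electrodes. The TRF weights are made-up values, and the extra channel Pz is included only for illustration.

```python
def global_field_power(trf):
    """Standard deviation across channels at each time point."""
    channels = list(trf.values())
    n_times = len(channels[0])
    gfp = []
    for t in range(n_times):
        column = [channel[t] for channel in channels]
        mean = sum(column) / len(column)
        variance = sum((v - mean) ** 2 for v in column) / len(column)
        gfp.append(variance ** 0.5)
    return gfp

def channel_average(trf, names):
    """Average the TRF weights over the named channels at each time point."""
    n_times = len(trf[names[0]])
    return [sum(trf[name][t] for name in names) / len(names)
            for t in range(n_times)]

# Hypothetical TRF weights (three time points) for four channels.
trf = {
    "FC1": [0.0, 1.0, -2.0],
    "FC2": [0.0, 3.0, -2.0],
    "FCz": [0.0, 2.0, -2.0],
    "Pz": [0.0, -2.0, 2.0],
}

gfp = global_field_power(trf)
frontocentral = channel_average(trf, ["FC1", "FC2", "FCz"])
```

      The GFP is always non-negative and reflects how strongly channels differ from one another, whereas the electrode average preserves the polarity of the components at the selected sites.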

      Following the previous comment, we have also included the topographical representation of the TRF main components, to give readers a whole-head perspective of the TRF.

      We have also fixed the color-scales.

      We are glad that the reviewer finds this result interesting!

      (17) Figure 4C. This looks like a missed opportunity. That metric shows a significant difference overall. But is that underpinned by a generally lower envelope reconstruction correlation, or by a larger deviation in those correlations (so, that metric is as for the control in some moments, but it drops more frequently due to distractibility)?

      We understand the reviewer’s point here, and ideally would like to be able to address this in a more fine-grained analysis, for example on a trial-by-trial basis. However, the design of the current experiment was not optimized for this, in terms of (for example) number of trials, the distribution of sound-events and behavioral outcomes. We hope to be able to address this issue in our future research.

      (18) I am not a fan of the term "accuracy" for indicating envelope reconstruction correlations. Accuracy is a term typically associated with classification. Regression models are typically measured through errors, loss, and sometimes correlations. 'Accuracy' is inaccurate (no joke intended).

      We accept this comment and now use the term “reconstruction correlation”.

      (19) Discussion. "The most robust finding in". I suggest using more precise terminology. For example, "largest effect-size".

      We agree and have changed the terminology (p. 31).

      (20) "individuals who exhibited higher alpha-power [...]". I probably missed this. But could the authors clarify this result? From what I can see, alpha did not show an effect on the group. Is this referring to Table 2? Could the authors elaborate on that? How does that reconcile with the non-significant effect of the group? In that same sentence, do you mean "and were more likely"? If that's the case, and they were more likely to report attentional difficulties, how is it that there is no group-effect when studying alpha?

      Yes, this sentence refers to the linear regression models described in Figure 10 and in Table 2. As the reviewer correctly points out, this is one place where there is a discrepancy between the results of the between-group analysis (ADHD diagnosis yes/no) and the regression analysis, which treats ADHD symptoms as a continuum, across both groups. The same is true for the gaze-shift data, which also did not show a significant between-group effect but was identified in the regression analysis as contributing to explaining the variance in ADHD symptoms.

      We discuss this point on pages 30-31, noting that “although the two groups are clearly separable from each other, they are far from uniform in the severity of symptoms experienced”, which motivated the inclusion of both analyses in this paper.

      At the bottom of p. 31 we specifically address the similarities and differences between the between-group and regression-based results. In our opinion, this pattern emphasizes that while neither approach is ‘conclusive’, looking at the data through both lenses contributes to an overall better understanding of the contributing factors, as well as highlighting that “no single neurophysiological measure alone is sufficient for explaining differences between the individuals – whether through the lens of clinical diagnosis or through report of symptoms”.

      (21) "why in the latter case the neural speech-decoding accuracy did not contribute to explaining ASRS scores [...]". My previous point 1 on separating overall envelope decoding from its deviation could help there. The envelope decoding correlation might go up and down due to SNR, while you might be more interested in the dynamics over time (i.e., looking at the reconstructions over time).

      Again, we appreciate this comment, but believe that this additional analysis is outside the scope of what would be reliably feasible with the current dataset. However, since the data will be made publicly available, perhaps other researchers will have better ideas as to how to do this.

      (22) Data and code sharing should be discussed. Also, specific links/names and version numbers should be included for the various libraries used.

      We are currently working on organizing the data to make it publicly available on the Open Science Framework.

      We have updated links and version numbers for the various toolboxes/software used, throughout the manuscript.

      Reviewer #2:

      (1) While it is highly appreciated to study selective attention in a naturalistic context, the readers would expect to see whether there are any potential similarities or differences in the cognitive and neural mechanisms between contexts. Whether the classic findings about selective attention would be challenged, rebutted, or confirmed? Whether we should expect any novel findings in such a novel context? Moreover, there are some studies on selective attention in the naturalistic context though not in the classroom, it would be better to formulate specific hypotheses based on previous findings both in the strictly controlled and naturalistic contexts.

      Yes, we fully agree that comparing results across different contexts would be extremely beneficial and important.

      The current paper serves as an important proof-of-concept demonstrating the plausibility and scientific potential of using combined EEG-VR-eyetracking to study neurophysiological aspects of attention and distractibility, but is also the basis for formulating specific hypotheses that will be tested in follow-up studies.

      In fact, a follow-up study is already ongoing in our lab, where we are looking into this point by testing users in different VR scenarios (e.g., classroom, café, office etc.), and assessing whether similar neurophysiological patterns are observed across contexts and to what degree they are replicable within and across individuals. We hope to share these data with the community in the near future.

      (2) Previous studies suggest handedness and hemispheric dominance might impact the processing of information in each hemisphere. Whether these issues have been taken into consideration and appropriately addressed?

      This is an interesting point. In this study we did not specifically control for handedness/hemispheric dominance, since most of the neurophysiological measures used here are sensory/auditory in their nature, and therefore potentially invariant to handedness. Moreover, the EEG signal is typically not very sensitive to hemispheric dominance, at least for the measures used here. However, this might be something to consider more explicitly in future studies. Nonetheless, we have added handedness information to the Methods section (p. 5): “46 right-handed, 3 left-handed”

      (3) It would be interesting to know how students felt about the Virtual Classroom context, whether it is indeed close to the real classroom or to some extent different.

      Yes, we agree. Obviously, the VR classroom differs in many ways from a real classroom, in terms of the perceptual experience, social aspects and interactive possibilities. We did ask participants about their VR experience after the experiment, and most reported feeling highly immersed in the VR environment and engaged in the task, with a strong sense of presence in the virtual-classroom.

      We note that, in parallel to the VR studies in our lab, we are also conducting experiments in real classrooms, and we hope that the cross-study comparison will be able to shed more light on these similarities/differences.

      (4) One intriguing issue is whether neural tracking of the teacher's speech can index students' attention, as the tracking of speech may be relevant to various factors such as sound processing without semantic access.

      Another excellent point. While separating the ‘acoustic’ and ‘semantic’ contributions to the speech tracking response is non-trivial, we are currently working on methodological approaches to do this (again, in future studies) following, for example, the hierarchical TRF approach used by Brodbeck et al. and others.

      (5) There are many results associated with various metrics, and many results did not show a significant difference between the ADHD group and the control group. It is difficult to find the crucial information that supports the conclusion. I suggest the authors reorganize the results section and report the significant results first, and to which comparison(s) the readers should pay attention.

      We apologize if the organization of the results section was difficult to follow. This is indeed a challenge when collecting so many different neurophysiological metrics.

      To facilitate this, we have now added a paragraph at the beginning of the Results section, clarifying its structure (p. 16):

      The current dataset is extremely rich, consisting of many different behavioral, neural and physiological responses. In reporting these results, we have separated between metrics that are associated with paying attention to the teacher (behavioral performance, neural tracking of the teacher’s speech, and looking at the teacher), those capturing responses to the irrelevant sound-events (ERPs and event-related changes in SC and gaze); as well as more global neurophysiological measures that may be associated with the listeners’ overall ‘state’ of attention or arousal (alpha- and beta-power and tonic SC).

      Moreover, within each section we have ordered the analysis such that the ones with significant effects are first. We hope that this contributes to the clarity of the results section.

      (6) The difference between artificial and non-verbal humans should be introduced earlier in the introduction and let the readers know what should be expected and why.

      We have added this to the Introduction (p. 4)

      (7) It would be better to discuss the results against a theoretical background rather than majorly focusing on technical aspects.

      We appreciate this comment. In our opinion, the discussion does contain a substantial theoretical component, both regarding theories of attention and attention-deficits, and also regarding their potential neural correlates. However, we agree that there is always room for more in-depth discussion.

      Reviewer #3:

      Major:

      (1) While the study introduced a well-designed experiment with comprehensive physiological measures and thorough analyses, the key insights derived from the experiment are unclear. For example, does the high ecological validity provide a more sensitive biomarker or a new physiological measure of attention deficit compared to previous studies? Or does the study shed light on new mechanisms of attention deficit, such as the simultaneous presence of inattention and distraction (as mentioned in the Conclusion)? The authors should clearly articulate their contributions.

      Thanks for this comment.

      We would not say that this paper is able to provide a ‘more sensitive biomarker’ or a ‘new physiological measure of attention’ – in order to make those types of grand statements we would need to have much more converging evidence from multiple studies and using both replication and generalization approaches.

      Rather, from our perspective, the key contribution of this work is in broadening the scope of research regarding the neurophysiological mechanisms involved in attention and distraction.

      Specifically, this work:

      (1) Offers a significant methodological advancement of the field – demonstrating the plausibility and scientific potential of using combined EEG-VR-eyetracking to study neurophysiological aspects of attention and distractibility in contexts that ‘mimic’ real-life situations (rather than highly controlled computerized tasks).

      (2) Provides a solid basis for formulating specific mechanistic hypotheses regarding the neurophysiological metrics associated with attention and distraction, the interplay between them, and their potential relation to ADHD-symptoms. Rather than being an end-point, we see these results as a start-point for future studies that emphasize ecological validity and generalizability across contexts, that will hopefully lead to improved mechanistic understanding and potential biomarkers of real-life attentional capabilities (see also response to Rev #2 comment #1 above).

      (3) Highlights differences and similarities between the current results and those obtained in traditional ‘highly controlled’ studies of attention (e.g., in the way ERPs to sound-events differ between ADHD and controls; variability in gaze and alpha-power; and more broadly about whether ADHD symptoms do or don’t map onto specific neurophysiological metrics). Again, we do not claim to give a definitive ‘answer’ to these issues, but rather to provide a new type of data that can expand the conversation and address the ecological validity gap in attention research.

      (2) Based on the multivariate analyses, ASRS scores correlate better with the physiological measures rather than the binary deficit category. It may be worthwhile to report the correlation between physiological measures and ASRS scores for the univariate analyses. Additionally, the correlation between physiological measures and behavioral accuracy might also be interesting.

      Thanks for this. The beta-values reported for the regression analysis reflect the correlations between the different physiological measures and the ASRS scores (p. 30). From a statistical perspective, it is better to report these values rather than the univariate correlation-coefficients, since these represent the ‘unique’ relationship with each factor, after controlling for all the others.

      The univariate correlations between the physiological measures themselves, as well as with behavioral accuracy, are reported in Figure 10.
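
      The distinction drawn above between regression betas and univariate correlation coefficients can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration: the variable names, the simulated effect sizes, and the use of plain ordinary least squares on z-scored variables are hypothetical and are not taken from the manuscript's analysis. The sketch shows a predictor (`x2`) that correlates with the outcome only because it overlaps with another predictor (`x1`); its univariate correlation is sizeable, while its beta, which reflects the 'unique' relationship after controlling for `x1`, is close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical predictors: x2 is correlated with x1 but has no effect of its own on y.
x1 = rng.standard_normal(n)
x2 = 0.8 * x1 + 0.6 * rng.standard_normal(n)
y = x1 + rng.standard_normal(n)


def zscore(v):
    """Standardize a vector to zero mean and unit variance."""
    return (v - v.mean()) / v.std()


# Univariate Pearson correlation of each predictor with the outcome.
r_univariate = np.array([np.corrcoef(x, y)[0, 1] for x in (x1, x2)])

# Standardized multiple-regression betas (least squares on z-scored data,
# so no intercept term is needed).
X = np.column_stack([zscore(x1), zscore(x2)])
betas, *_ = np.linalg.lstsq(X, zscore(y), rcond=None)

print("univariate r:", np.round(r_univariate, 2))
print("regression betas:", np.round(betas, 2))
```

      In this toy case the univariate correlation would suggest that both predictors relate to the outcome, whereas the betas attribute the shared variance to `x1` alone.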

      (3) For the TRF and decoding analysis, the authors used a constant regularization parameter per a previous study. However, the optimal regularization parameter is data-dependent and may differ between encoding and decoding analyses. Furthermore, the authors did not conduct TRF analysis for the quiet condition due to the limited ~5 minutes of data. However, such a data duration is generally sufficient to derive a stable TRF with significant predictive power (Mesik and Wojtczak, 2023).

      The reviewer raises two important points, also raised by Rev #1 (see above).

      Regarding the choice of regularization parameters, we have now clarified that although we used a common lambda value for all participants, it was selected in a data-driven manner, so as to achieve an optimal predictive power at the group-level.

      See revised methods section:

      “The mTRF toolbox uses a ridge-regression approach for L2 regularization of the model to ensure better generalization to new data. We tested a range of ridge parameter values (λ's) and used a leave-one-out cross-validation procedure to assess the model’s predictive power, whereby in each iteration, all but one of the trials are used to train the model, and it is then applied to the left-out trial. The predictive power of the model (for each λ) is estimated as the Pearson’s correlation between the predicted neural responses and the actual neural responses, separately for each electrode, averaged across all iterations. We report results of the model with the λ that yielded the highest predictive power at the group-level (rather than selecting a different λ for each participant which can lead to incomparable TRF models across participants; see discussion in Kaufman & Zion Golumbic 2023).”
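
      The procedure described in this paragraph can be sketched roughly as follows. This is a simplified NumPy illustration, not the mTRF toolbox and not the authors' code: it assumes a single electrode, assumes that the time-lagged stimulus features have already been arranged into a design matrix per trial, and uses simulated data. The function names (`ridge_fit`, `loo_predictive_power`, `select_group_lambda`) and the candidate λ values are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge solution: w = (X'X + lam * I)^-1 X'y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def loo_predictive_power(trials_X, trials_y, lam):
    """Mean Pearson r across leave-one-trial-out folds for one participant."""
    rs = []
    for k in range(len(trials_X)):
        # Train on all trials except trial k.
        X_train = np.vstack([X for i, X in enumerate(trials_X) if i != k])
        y_train = np.concatenate([y for i, y in enumerate(trials_y) if i != k])
        w = ridge_fit(X_train, y_train, lam)
        # Predict the left-out trial and correlate with the actual response.
        y_pred = trials_X[k] @ w
        rs.append(np.corrcoef(y_pred, trials_y[k])[0, 1])
    return float(np.mean(rs))


def select_group_lambda(participants, lambdas):
    """Pick one lambda for everyone: the best predictive power averaged over participants."""
    scores = np.array([[loo_predictive_power(tx, ty, lam) for lam in lambdas]
                       for tx, ty in participants])
    mean_scores = scores.mean(axis=0)
    return lambdas[int(np.argmax(mean_scores))], mean_scores


# Simulated data (assumption): 4 participants, 5 trials each, 10 lagged features.
rng = np.random.default_rng(0)
true_w = np.linspace(-1.0, 1.0, 10)
participants = []
for _ in range(4):
    trials_X = [rng.standard_normal((200, 10)) for _ in range(5)]
    trials_y = [X @ true_w + rng.standard_normal(200) for X in trials_X]
    participants.append((trials_X, trials_y))

lambdas = [0.01, 1.0, 100.0, 10000.0]
best_lam, mean_scores = select_group_lambda(participants, lambdas)
print("group-level lambda:", best_lam)
```

      Averaging the predictive power across participants before taking the maximum is what yields a single common λ, so that the resulting models are regularized identically for every participant.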

      Regarding whether data was sufficient in the Quiet condition for performing TRF analysis – we are aware of the important work by Mesik & Wojtczak, and had initially used this estimate when designing our study. However, when assessing the predictive-power of the TRF model trained on data from the Quiet condition, we found that it was not significantly better than chance (see Author response image 2, ‘real’ predictive power vs. permuted data). Therefore, we ultimately did not feel that it was appropriate to include TRF analysis of the Quiet condition in this manuscript. We have now clarified this in the manuscript (p. 10)

      (4) As shown in Figure 4, for ADHD participants, decoding accuracy appears to be lower than the predictive power of TRF. This result is surprising because more data (i.e., data from all electrodes) is used in the decoding analysis.

      This is an interesting point – however, in our experience it is not necessarily the case that decoding accuracy (i.e., reconstruction correlation with the stimulus) is higher than encoding predictive-power. While both metrics use Pearson’s correlations, they quantify the similarity between two different types of signals (the EEG and the speech-envelope). Although the decoding procedure does use data from all electrodes, many of them don’t actually contain meaningful information regarding the stimulus, and thus could just as well hinder the overall performance of the decoding.

      (5) Beyond the current analyses, the authors may consider analyzing inter-subject correlation, especially for the gaze signal analysis. Given that the area of interest during the lesson changes dynamically, the teacher might not always be the focal point. Therefore, the correlation of gaze locations between subjects might be better than the percentage of gaze duration on the teacher.

      Thanks for this suggestion. We have tried to look into this; however, working with eye-gaze in a 3-D space is extremely complex, and we are not able to calculate reliable correlations between participants.

      (6) Some preprocessing steps relied on visual and subjective inspection. For instance, " Visual inspection was performed to identify and remove gross artifacts (excluding eye movements) " (P9); " The raw data was downsampled to 16Hz and inspected for any noticeable artifacts " (P13). Please consider using objective processes or provide standards for subjective inspections.

      We are aware of the possible differences between objective methods of artifact rejection vs. use of manual visual inspection, however we still prefer the manual (subjective) approach. As noted, in this case only very large artifacts were removed, exceeding ~ 4 SD of the amplitude variability, so as to preserve as many full-length trials as possible.
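
      For comparison with the manual approach described above, an objective amplitude criterion of the kind the reviewer asks about could look like the following minimal sketch. This is not the procedure used in the study, which relied on visual inspection; the function name `flag_gross_artifacts`, the per-recording mean and SD, and the simulated signal are assumptions made for illustration. Only the ~4 SD threshold is taken from the response.

```python
import numpy as np


def flag_gross_artifacts(signal, n_sd=4.0):
    """Return a boolean mask marking samples that deviate more than n_sd SDs from the mean."""
    signal = np.asarray(signal, dtype=float)
    centered = signal - signal.mean()
    return np.abs(centered) > n_sd * signal.std()


# Simulated single-channel recording (assumption) with one large artifact.
rng = np.random.default_rng(0)
recording = rng.standard_normal(1000)
recording[500] = 50.0

mask = flag_gross_artifacts(recording)
print("flagged samples:", np.flatnonzero(mask))
```

      Because the SD is computed on the contaminated recording, large artifacts inflate the threshold; a more robust variant would estimate spread from the median absolute deviation instead.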

      (7) Numerous significance testing methods were employed in the manuscript. While I appreciate the detailed information provided, describing these methods in a separate section within the Methods would be more general and clearer. Additionally, the authors may consider using a linear mixed-effects model, which is more widely adopted in current neuroscience studies and can account for random subject effects.

      Indeed, there are many statistical tests in the paper, given the diverse types of neurophysiological data collected here. We actually thought that describing the statistics per method rather than in a separate “general” section would be easier to follow, but we understand that readers might diverge in their preferences.

      Regarding the use of mixed-effect models – this is indeed a great approach. However, it requires deriving reliable metrics on a per-trial basis, and while this might be plausible for some of our metrics, the EEG and GSR metrics are less reliable at this level. This is why we ultimately chose to aggregate across trials and use a regular regression model rather than mixed-effects.

      (8) Some participant information is missing, such as their academic majors. Given that only two lesson topics were used, the participants' majors may be a relevant factor.

      To clarify – the mini-lectures presented here actually covered a large variety of topics, broadly falling within the domains of history, science, social science and technology. Regarding participants’ academic majors, these were relatively diverse, as can be seen in Author response table 1 and Author response image 4.

      Author response table 1.

      Author response image 4.

      (9) Did the multiple regression model include cross-validation? Please provide details regarding this.

      Yes, we used a leave-one-out cross validation procedure. We have now clarified this in the methods section which now reads:

      “The mTRF toolbox uses a ridge-regression approach for L2 regularization of the model to ensure better generalization to new data. We tested a range of ridge parameter values (λ's) and used a leave-one-out cross-validation procedure to assess the model’s predictive power, whereby in each iteration, all but one of the trials are used to train the model, and it is then applied to the left-out trial. The predictive power of the model (for each λ) is estimated as the Pearson’s correlation between the predicted neural responses and the actual neural responses, separately for each electrode, averaged across all iterations. We report results of the model with the λ that yielded the highest predictive power at the group-level (rather than selecting a different λ for each participant which can lead to incomparable TRF models across participants; see discussion in Kaufman & Zion Golumbic 2023).”

      Minor:

      (10) Typographical errors: P5, "forty-nine 49 participants"; P21, "$ref"; P26, "Table X"; P4, please provide the full name for "SC" when first mentioned.

      Thanks! Corrected.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Reviewer #2 (Recommendations For The Authors):

      We sincerely appreciate the time and efforts of the Reviewer.

      In light of your data showing that the IgG response is similar with and without CIN, it would be good to drop "and induce a broad, vaccination-like anti-tumor IgG response". This suggests a direct connection between CIN and the IgG response. In my opinion, the shorter title is equally strong and more correct.

      We edited this phrase in the originally submitted title for accuracy:

      Chromosomal instability induced in cancer can enhance macrophage-initiated immune responses that include anti-tumor IgG

      I agree that inducing CIN through other means can be left for a different study but in that case the abstract should more directly mention MSP1 inhibition since that is how CIN is always induced. Perhaps line 18: CIN is induced by MSP-1 inhibition in poorly immunogenic....

      Done as requested:

      “…Here, CIN is induced in poorly immunogenic B16F10 mouse melanoma cells using spindle assembly checkpoint MPS1 inhibitors…”


      The following is the authors’ response to the original reviews.

      eLife assessment

      This study highlights a valuable finding that chromosomal instability can change immune responses, in particular macrophage behaviours. The convincing results show that the use of CD47 targeting and anti-Tyrp1 IgG can overcome changes in the immune landscape in tumors and prolong survival of tumor-bearing mice. These findings reveal a new exciting dimension on how chromosomal instability can influence immune responses against tumors.

      We thank the Editors for their enthusiasm and appreciation for this work. We also want to highlight our thanks for their careful reading, support, and patience while handling this manuscript. While this work provides useful insight into potential therapeutic implications of chromosomal instability in the macrophage immunotherapy field, we also hope it elucidates some novel basic science to further explore how chromosomal instability has such interesting effects on the immune system.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Hayes et al. explored the potential of combining chromosomal instability with macrophage phagocytosis to enhance tumor clearance of B16-F10 melanoma. However, the manuscript suffers from substandard experimental design, some contradictory conclusions, and a lack of viable therapeutic effects.

      The authors suggest that early-stage chromosomal instability (CIN) is a vulnerability for tumorigenesis, CD47-SIRPa interactions prevent effective phagocytosis, and opsonization combined with inhibition of the CD47-SIRPa axis can amplify tumor clearance. While these interactions are important, the experimental methodology used to address them is lacking.

      Reviewer #1 (Recommendations For The Authors):

      First, early stages of the tumor are essentially being defined as before implantation. In all cases, the tumor cells were pre-treated with MPS1i or had a genetic knockout of CD47. This makes it difficult to see how this would translate clinically.

      We greatly appreciate the Reviewer’s interest in the topic and its potential, but our manuscript makes no claims of immediate clinical translation. Chromosomal instability (CIN) studies have to date not yet discovered or described whether and how CIN can affect macrophage function. To our knowledge, this is the first study to begin such characterizations with various MPS1i drugs to induce CIN. Many variations of the approach can be envisioned for future studies.

      Our Results include some key studies of cancer cells with wildtype levels of CD47, including in vivo tumor elimination (Fig.3E). Nonetheless, we do conduct some of our studies in a CD47 knockout context to remove this “brake” that generally impedes phagocytosis, with our goal being to better understand how CIN affects phagocytosis. As cited to some extent in our Introduction, there are many efforts in clinical trials to disrupt this macrophage checkpoint and others focused on macrophage immunotherapy. Whether CIN can be induced by clinically translatable drugs and specifically in cancer cells is beyond the scope of our studies.

      I would like to see the amount of CIN that occurs in WT B16F10 over the course of tumorigenesis (ie longer than 5 days). This is because I would assume that CIN would eventually occur in the WT B16F10 regardless of whether MPS1i is being given. And if that's the case, then the initiation of CIN at day 10 after implantation (for example) would still be considered "early stage" CIN. If the therapy is then initiated at this point, does the effect remain? Or put differently, how would the authors propose to induce the appropriate level of CIN in an established tumor? Why is pretreatment necessary?

      Untreated B16F10 cells fail to produce micronuclei over 12 days compared to MPS1i treated cells – as shown in a newly added panel in Fig. S1:

      Author response image 1.

      This helps support our decision to pre-treat cells with MPS1i to stimulate genomic instability and is described in the first section of Results:

      “…we saw >10-fold increases of micronuclei over the cell line’s low basal level (~1% of cells), and two other MPS1 inhibitors AZ3146 and BAY12-17389 confirm such effects (Fig. S1A). Micronuclei-positive cells can persist up to 12 days after treatment (Fig. S1B), while control cells maintain the low basal levels. The results suggest pre-treatment with MPS1i can simulate CIN in an experimental context even for 1-2 weeks, which may not typically occur at the same frequency during early tumor growth.”

      It is known that PD-1 expression inhibits tumor-associated macrophage phagocytosis (Nature, 2017). Does MSP1i (sic) treatment affect the population of PD-1+ tumor macrophages in vivo?

      We thank the Reviewer for bringing up an interesting point.

      Using the same tumor RNA-seq data that was used for Fig.1E, a heatmap of expression of PD-1 (gene Pdcd1) shows no consistent trend with MPS1i:

      Author response image 2.

      We also examined whether the secretome from CIN-afflicted cancer cells affects PD-1 expression in cultured macrophages, but we did not register any reads from our single-cell RNA-sequencing experiment for Pdcd1 in any of the macrophage clusters from Fig. 1H.

      Author response image 3.

      The Discussion section now includes a statement on this topic:

      “…B16F10 tumors are poorly immunogenic, do not respond to either anti-CD47 or anti-PD-1/PDL1 monotherapies, and show modest and variable cure rates (~20-40%; Dooling et al., 2023; Hayes et al., 2023) even when macrophages have been made maximally phagocytic according to notions above. We should note here that our whole-tumor RNA-seq data (Fig.1E) shows expression of PD-1 (gene Pdcd1) follows no consistent trend upon MPS1i treatment, and that Pdcd1 was not detected in our scRNA-seq data for macrophage cultures (Fig.1G) – motivating further study.”

      The authors must explain how the proposed therapy works since MPS1i increases tumor (cell) size, making it difficult for macrophages to phagocytose the tumor cells. It also reduces or suppresses Tyrp1 expression on the cancer cells, making it harder to opsonize. Since these were two main points for the rationale of this study, the authors need to reconcile them.

      We appreciate this comment and have re-organized this Results section to try to minimize confusion:

      CIN-afflicted, CD47-knockout tumoroids are eliminated by Macrophages

      To assess functional effects of macrophage polarization, we focused on a 3D “immuno-tumoroid” model in which macrophage activity can work (or not) over many days against a solid proliferating mass of cancer cells in non-adherent roundbottom wells (Fig. 2A) (Dooling et al., 2023). We used CD47 knockout (KO) B16F10 cells, which removes the inhibitory effect of CD47 on phagocytosis, noting that KO does not perturb surface levels of Tyrp1, which is targetable for opsonization with anti-Tyrp1 (Fig. S2A). BMDMs were added to pre-assembled tumoroids at a 3:1 ratio, and we first assessed surface protein expression of macrophage polarization markers. Consistent with our whole-tumor bulk RNA-sequencing and also single-cell RNA-sequencing of BMDM monocultures (Fig. 1E, 1I-J), BMDMs from immunotumoroids of MPS1i-treated B16F10 showed increased surface expression of M1-like markers MHCII and CD86 while showing decreased expression of M2-like markers CD163 and CD206 (Fig. 2B-C). Although these macrophages seemed poised for anticancer activity, the cancer cells showed decreased binding of anti-Tyrp1 (Fig. S2B) and ~20% larger size in flow cytometry (Fig. S2C). The latter likely reflects cytokinesis defects and poly-ploidy as acute effects of CIN induction (Chunduri & Storchová, 2019; Mallin et al., 2022). Such cancer cell changes might explain why standard 2D phagocytosis assays show BMDMs attached to rigid plastic engulf relatively few anti-Tyrp1 opsonized cancer cells pretreated with MPS1i versus DMSO (Fig. S2D). In such cultures, BMDMs use their cytoskeleton to attach and spread, competing with engulfment of large and poorly opsonized targets. Noting that tumors in vivo are not as rigid as plastic, our 3D immunotumoroids eliminate attachment to plastic, and large numbers of macrophages can cluster and cooperate in engulfing cancer cells in a cohesive mass (Dooling et al., 2023). 

      We indeed find CIN-afflicted tumoroids are eliminated by BMDMs regardless of anti-Tyrp1 opsonization (Fig. 2D-E), whereas anti-Tyrp1 is required for clearance of DMSO control tumoroids (Fig. 2D, S3B). Imaging also suggests that cancer CIN stimulates macrophages to cluster (compare Day-4 in Fig. 2D), which favors cooperative phagocytosis of tumoroids (Dooling et al., 2023), and occurs despite the lack of cancer cell opsonization and their larger cell size. The 3D immunotumoroid results with induced CIN are thus consistent with a more pro-phagocytic M1-type polarization (Fig.1J and 2B,C).

      The authors used varying numbers of tumor cells for the in vivo portions of the study; the first half of the manuscript uses 500,000 cells, while the latter half uses 200,000 cells. Why?

      The reason for the difference in numbers is now clarified in the Methods:

      For assessing immune infiltrates in early stages of tumor engraftment, when tumors are still small, we used a relatively high number of tumor cells (500,000 cells in Fig. 1D and Fig. 2F-G) to achieve sufficient cell numbers after dissociating the tumors, particularly for the slow-growing MPS1i-treated tumors. More specifically, with dissection, collagenase treatment, passage through a filter to remove clumps, we would lose many cells, and yet needed 100,000 viable cells or more for bulk RNA-seq suspensions and for flow cytometry measurements. For all other studies, 200,000 cancer cells were injected,

      The authors need to report the tumor volumes and the total number of cells isolated from the day five tumors to avoid grossly inflating the effect (i.e. Fig 2G and 4G).

      We have added relevant numbers in the Methods:

      For day 5 post-challenge measurements, 100,000 to 200,000 live cells were collected. For in vivo tumor infiltrate studies in re-challenged mice, 10 million live cells were collected.

      Also, regarding tumor sizes and cell numbers, we have previously published relevant measurements in assessments of tumor growth. Please see:

      Brandon H Hayes, Hui Zhu, Jason C Andrechak, Lawrence J Dooling, Dennis E Discher, Titrating CD47 by mismatch CRISPR-interference reveals incomplete repression can eliminate IgG-opsonized tumors but limits induction of antitumor IgG, PNAS Nexus, Volume 2, Issue 8, August 2023, pgad243, https://doi.org/10.1093/pnasnexus/pgad243

      Dooling, L.J., Andrechak, J.C., Hayes, B.H. et al. Cooperative phagocytosis of solid tumours by macrophages triggers durable anti-tumour responses. Nat. Biomed. Eng 7, 1081–1096 (2023). https://doi.org/10.1038/s41551-023-01031-3

      In the present study, similar tumor growth curves are provided for transparency, but the Kaplan-Meier curves are the key pieces of data in Figs. 3-4. Lastly, regarding reporting total cell number harvested, we based our experiments on previously accepted measurements that also reported numbers out of total harvested cells. See:

      Cerezo-Wallis, D., Contreras-Alcalde, M., … Soengas, M.S., 2020. Midkine rewires the melanoma microenvironment toward a tolerogenic and immune-resistant state. Nat Med 26, 1865–1877. https://doi.org/10.1038/s41591-020-1073-3

      The figure titles need to be revised. For example, the title of Figure 1 claims that "MPS1i-induced chromosomal instability causes proliferation deficits in B16F10 tumors." However, the evidence provided is weak. The authors only present GSEA analysis of proliferation and no functional evidence of impairment. The authors need to characterize this proliferation deficit using in vitro studies and functional studies of macrophage polarization. I would suggest proliferation assays (crystal violet, MTT, Incucyte, etc) to measure the B16 growth over time with MPS1i treatment.

      We thank the Reviewer for pointing this out. In Fig.1 we have minimized information regarding proliferation because it is later quantified in Figs.2D,E, S3, and 3D-i:

      Fig.1F legend: Top downregulated hallmark gene sets in tumors comprised of MPS1i-treated B16F10 cells, showing downregulated DNA repair, cell cycle, and growth-related pathways, consistent with observations of slowed growth in culture and in vivo – as subsequently quantified.

      Then the authors could collect the tumor supernatant to culture with macrophages and determine polarization in vitro. I would also like to see functional studies of macrophage polarization (suppression assays, cytokine production, etc). Currently, the authors provide no functional studies.

      Fig.2B,C provides functional surface marker measurements of in vitro polarization toward anti-cancer M1 macrophages by MPS1i-pretreated tumor cells, consistent with gene expression in Fig.1G-J. Function is further shown as anti-cancer activity in Fig.2D,E, as now stated explicitly in the text:

      “…In our 3D tumoroid in vitro assays, we found that macrophages can suppress the growth of chromosomally unstable tumoroids and clear them, surprisingly both with and without anti-Tyrp1 (Fig. 2D-E), regardless of MPS1i concentration used for treatment. Such a result is consistent with M1-type polarization (Fig.1J and 2B,C), which tends to be more pro-phagocytic.”

      The authors claim that macrophages are the key effector cells, but they need to provide evidence for this claim.

      Other immune cells clearly contribute to the presented results because the IgG must eventually come from B cells. The text has been edited to indicate 'macrophages are key initiating-effector cells', and some evidence for this is the maximal survival of (WT B16 + Rev tumors) in Fig.3E upon treatment with Marrow Macrophages plus Macrophage-relevant SIRPa blockade and Macrophage-relevant IgG (via FcR). T cells do not have SIRPa or FcR.

      They can deplete macrophages and T and B cells to determine whether the effect remains or is ablated. This is the only definitive way to make this claim.

      To determine whether T and B cells might also be key initiating-effector cells, new experiments were done with mice depleted of T and B cells (per Fig.S9, below). We compared the growth of MPS1i vs DMSO treatments in these mice to results in mice with T and B cells (which should replicate our previous results in Fig.3D-i). We found that slower growth with Rev relative to DMSO was similar in mice without T and B cells compared to mice with T and B cells. We have added to the text our conclusion that: T and B cells are not key initiating-effector cells. Whereas B cells are effector cells at least in terms of eventually making anti-tumor IgG, our results show that macrophages are key initiating-effector cells because macrophages certainly affect the growth of (WT B16 + Rev tumors) when more are added (Fig.3E).

      Author response image 4.

      Growth of CIN-afflicted wild-type (WT) tumors in T- and B-cell deficient mice and T- and B-cell replete mice. Similar growth delays for MPS1i-pretreated B16F10 cells in T- and B-cell deficient NSG mice and immunocompetent C57BL/6 mice. Both types of mice have functional macrophages. Parallel studies in vivo were done with WT B16F10 ctrl cells cultured 24 h in 2.5 μM MPS1i (reversine) or DMSO, then washed 3x in growth media for 5 min each and allowed to recover in growth media for 48 h. 200,000 cells in 100 μL PBS were injected subcutaneously into right flanks, and the standard size limit was used to determine survival curves. The C57BL/6 experiments were done independently here (by co-author L.J.D.) from the similar results (by B.H.H.) shown in Fig.3D-i, which provides evidence of reproducibility.

      The Results section final paragraph describes all of this:

      Macrophages seem to be the key initiating-effector cells, based in part on the following findings. First, macrophages with both SIRPα blockade and FcR-engaging, tumor-targeting IgG maximize survival of mice with WT B16 + Rev tumors (Fig. 3E) – noting that macrophages but not T cells express SIRPα and FcR’s. Despite the clear benefits of adding macrophages, to further assess whether T and B cells are key initiating-effector cells, new experiments were done with mice depleted of T and B cells. We compared the growth delay of MPS1i versus DMSO treatments in these mice to the delay in fully immunocompetent mice with T and B cells – with all studies done at the same time. We found that slower growth with Rev relative to DMSO was similar in mice without T and B cells when compared to immunocompetent C57 mice (Fig.S9). We conclude therefore that T and B cells are not key initiating-effector cells. At later times, B cells are likely effector cells at least in terms of making anti-tumor IgG, and T cells in tumor re-challenges are also increased in number (Fig. 4G-ii). We further note that in our earlier collaborative study (Harding et al., 2017) WT B16 cells were pre-treated by genome-damaging irradiation before engraftment in C57 mice, and these cells grew minimally – similar to MPS1i treatment – while untreated WT B16 cells grew normally at a contralateral site in the same mouse. Such results indicate that T and B cells in C57BL/6 mice are not sufficiently stimulated by genome-damaged B16 cells to generically impact the growth of undamaged B16 cells.

      Reviewer #2 (Public Review):

      Harnessing macrophages to attack cancer is an immunotherapy strategy that has been steadily gaining interest. Whether macrophages alone can be powerful enough to permanently eliminate a tumor is a high-priority question. In addition, the factors making different tumors more vulnerable to macrophage attack have not been completely defined. In this paper, the authors find that chromosomal instability (CIN) in cancer cells improves the effect of macrophage targeted immunotherapies. They demonstrate that CIN tumors secrete factors that polarize macrophages to a more tumoricidal fate through several methods. The most compelling experiment is transferring conditioned media from MPS1-inhibited and control cancer cells, then using RNAseq to demonstrate that the MPS1-inhibited conditioned media causes a shift towards a more tumoricidal macrophage phenotype. In mice with MPS1-inhibited (CIN) B16 melanoma tumors, a combination of CD47 knockdown and anti-Tyrp1 IgG is sufficient for long term survival in nearly all mice. This combination is a striking improvement from conditions without CIN.

      Like any interesting paper, this study leaves several unanswered questions. First, how do CIN tumors repolarize macrophages? The authors demonstrate that conditioned media is sufficient for this repolarization, implicating secreted factors, but the specific mechanism is unclear. In addition, the connection between the broad, vaccination-like IgG response and CIN is not completely delineated. The authors demonstrate that mice who successfully clear CIN tumors have a broad anti-tumor IgG response. This broad IgG response has previously been demonstrated for tumors that do not have CIN. It is not clear if CIN specifically enhances the anti-tumor IgG response or if the broad IgG response is similar to other tumors. Finally, CIN is always induced with MPS1 inhibition. To specifically attribute this phenotype to CIN it would be most compelling to demonstrate that tumors with CIN unrelated to MPS1 inhibition are also able to repolarize macrophages.

      Overall, this is a thought-provoking study that will be of broad interest to many different fields including cancer biology, immunology and cell biology.

      We thank the Reviewer for their enthusiastic and positive comments toward the manuscript.

      Our main purpose with this study has been discovery-oriented and mechanistic science, with implications for improving macrophage immunotherapies. More experimentation is needed to further understand how this positive immune response emerges. However, we could address whether or not CIN enhances the anti-tumor IgG response through quantitative comparisons to our two other recent studies, and we conclude that it does not, per new edits in the Abstract and the Results. See attached PPT for full details and comparison.

      Abstract:

      “CIN does not greatly affect the level of the induced response but does significantly increase survival.”

      “…these results demonstrate induction of a generally potent anti-cancer antibody response to CIN-afflicted B16F10 in a CD47 KO context. Importantly, comparing these sera results for CIN-afflicted tumors to our recent studies of the same tumor model without CIN (Dooling et al., 2022; Hayes et al., 2022), we find similar levels of IgG induction (e.g. ~100-fold above naive on average for IgG2a/c), similar increases in phagocytosis by sera opsonization (e.g. equivalent to anti-Tyrp1), and similar levels of suppressed tumoroid growth – including the variability.

      However, median survival increased (21 days) compared to their naïve counterparts (14 days), supporting the initial hypothesis of prolonged survival and consistent not only with past results indicating major benefits of a prime-&-boost approach with anti-Tyrp1 (Dooling et al., 2022) but also with the noted similarities in induced IgG levels.”

      Future studies could certainly focus on trying to identify which secreted factors might be inducing the M1-like polarization (using ELISA assays for cytokine detection, for example). This could be important because a main finding here is that we achieve a nearly 100% success rate in clearing tumors when we combine CD47 ablation and IgG opsonization with cancer cell CIN. Previous studies were only able to achieve about 40% cures in mice when working with CD47 disruption and IgG opsonization alone, suggesting that CIN in this experimental context does improve the macrophage response.

      Lastly, we agree with the Reviewer that future studies should also address how CIN in general (not MPS1i-induced) affects tumor growth. The final paragraph of our Discussion at least cites support for consistent effects of M1-like polarization:

      “The effects of CIN and aneuploidy in macrophages certainly require further investigation. We did publish recently that M1-like polarization of BMDMs with IFNg priming is sufficient to suppress growth of B16 tumoroids with anti-Tyrp1 opsonization more rapidly than unpolarized/unprimed macrophages and much more rapidly than M2-like polarization of BMDMs with IL4 (Extended Data Fig.5a in Dooling et al., 2023); hence, anti-cancer polarization contributes in this assay.

      While the secretome from MPS1i-treated cancer cells has been found to trigger…”

      Nonetheless, we can only speculate that there is a threshold of CIN reached by a certain timepoint in tumor engraftment and growth. Natural CIN might not be enough, so we pursued a pharmacological approach consistent with ongoing pre-clinical studies (https://doi.org/10.1158/1535-7163.MCT-15-0500). Future studies should consider trying knockdown models to gradually accrue CIN in tumors or using more relevant pharmacological drugs that are known to induce CIN not associated with the spindle. We believe, however, that these are larger questions on their own and are beyond the scope of the foundational discoveries in this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      None

      We again thank the Reviewer for their support and enthusiasm for the manuscript. We made some additional changes and added more data to address questions posed by the other Reviewer, which we hope you will find further improve the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Hippocampal place cells display a sequence of firing activities when the animal travels through a spatial trajectory at a behavioral time scale of seconds to tens of seconds. Interestingly, parts of the firing sequence also occur at a much shorter time scale: ~120 ms within individual cycles of theta oscillation. These so-called theta sequences are originally thought to naturally result from the phenomenon of theta phase precession. However, there is evidence that theta sequences do not always occur even when theta phase precession is present, for example, during the early experience of a novel maze. The question is then how they emerge with experience (theta sequence development). This study presents evidence that a special group of place cells, those tuned to fast-gamma oscillations, may play a key role in theta sequence development.

      The authors analyzed place cells, LFPs, and theta sequences as rats traveled a circular maze in repeated laps. They found that a group of place cells were significantly tuned to a particular phase of fast-gamma (FG-cells), in contrast to others that did not show such tuning (NFG-cells). The authors then omitted FG-cells or the same number of NFG-cells, in their algorithm of theta sequence detection and found that the quality of theta sequences, quantified by a weighted correlation, was worse with the FG-cell omission, compared to that with the NFG-cell omission, during later laps, but not during early laps. What made the FG-cells special for theta sequences? The authors found that FG-cells, but not NFG-cells, displayed phase precession to slow-gamma (25 - 45 Hz) oscillations (within theta cycles) during early laps (both FG- and NFG-cells showed slow-gamma phase precession during later laps). Overall, the authors conclude that FG-cells contribute to theta sequence development through slow-gamma phase precession during early laps.

      How theta sequences are formed and developed during experience is an important question, because these sequences have been implicated in several cognitive functions of place cells, including memory-guided spatial navigation. The identification of FG-cells in this study is straightforward. Evidence is also presented for the role of these cells in theta sequence development. However, given several concerns elaborated below, whether the evidence is sufficiently strong for the conclusion needs further clarification, perhaps, in future studies.

      We thank the reviewer for these positive comments.

      (1) The results in Figure 3 and Figure 8 seem contradictory. In Figure 8, all theta sequences displayed a seemingly significant weighted correlation (above 0) even in early laps, which was mostly due to FG-cell sequences but not NFG-cell sequences (correlation for NFG-sequences appeared below 0). However, in Figure 3H, omitting FG-cells and omitting NFG-cells did not produce significant differences in the correlation. Conversely, FG-cell and NFG-cell sequences were similar in later laps in Figure 8 (NFG-cell sequences appeared even better than FG-cell sequences), yet omitting NFG-cells produced a better correlation than omitting FG-cells. This confusion may be related to how "FG-cell-dominant sequences" were defined, which is unclear in the manuscript. Nevertheless, the different results are not easy to understand.

      We thank the reviewer for pointing out this important problem. The apparent contradiction can be explained by the different sets of sequences included in Fig. 3 and Fig. 8, as described below.

      (1) In Fig. 3, all sequences decoded after excluding either FG-cells or NFG-cells were included, defined as exFG-sequences and exNFG-sequences, respectively. In this dataset, sequence development was not yet observable during the early phase, and the weighted correlation was therefore low. (2) In Fig. 8, however, only sequences in which FG-cells or NFG-cells fired across at least 3 slow gamma cycles were included, defined as FG-cell sequences and NFG-cell sequences, respectively. This criterion allowed us to investigate the relationship between sequence development and slow gamma phase precession, because these sequences were contributed by cells likely to show slow gamma phase precession. These definitions have been added to the “Theta sequences detection” section of the Methods (Lines 606-619).

      During the early phase, there was still no difference in weighted correlation between FG-cell sequences and NFG-cell sequences (Author response image 1A, Student’s t test, t(65)=0.2, p=0.8, Cohen's D=0.1), but the FG-cell sequences contained a high proportion of slow gamma phase precession (Fig. 8F). During the late phase, both FG-cell sequences and NFG-cell sequences exhibited slow gamma phase precession, so their weighted correlations were high, with no difference between them (Author response image 1B, Student’s t test, t(62)=-1.1, p=0.3, Cohen's D=0.3). This result further indicates that theta sequence development requires slow gamma phase precession, especially for FG-cells during the early phase.

      Author response image 1.
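
      As an illustration of the weighted correlation referred to above, the following minimal sketch assumes the commonly used definition: a Pearson correlation between position bin and time bin, weighted by the decoded probability. The function name, the toy matrices and the use of linearized (non-circular) position bins are assumptions made for illustration; this is not the analysis code used in the study.

```python
import math

def weighted_correlation(prob):
    """Weighted correlation between position and time for a decoded
    probability matrix prob[position_bin][time_bin] (all weights >= 0)."""
    n_pos, n_time = len(prob), len(prob[0])
    total = sum(sum(row) for row in prob)
    # Weighted means of the position index and the time index.
    mean_x = sum(prob[i][j] * i for i in range(n_pos) for j in range(n_time)) / total
    mean_t = sum(prob[i][j] * j for i in range(n_pos) for j in range(n_time)) / total
    # Weighted covariance terms.
    cov_xt = cov_xx = cov_tt = 0.0
    for i in range(n_pos):
        for j in range(n_time):
            w = prob[i][j] / total
            cov_xt += w * (i - mean_x) * (j - mean_t)
            cov_xx += w * (i - mean_x) ** 2
            cov_tt += w * (j - mean_t) ** 2
    return cov_xt / math.sqrt(cov_xx * cov_tt)

# Hypothetical toy inputs: probability concentrated on the diagonal
# (an ordered sweep) versus a flat, uninformative matrix.
forward = [[1.0 if i == j else 0.05 for j in range(5)] for i in range(5)]
flat = [[1.0] * 5 for _ in range(5)]

r_forward = weighted_correlation(forward)
r_flat = weighted_correlation(flat)
```

      In this toy case, the ordered sweep gives a clearly positive value, whereas the flat matrix gives a value of zero.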

      (2) The different contributions between FG-cells and NFG-cells to theta sequences are supposed not to be caused by their different firing properties (Figure 5). However, Figure 5D and E showed a large effect size (Cohen's D = 0.7, 0.8), although not significant (P = 0.09, 0.06). But the seemingly non-significant P values could be simply due to smaller N's (~20). In other parts of the manuscript, the effect sizes were comparable or even smaller (e.g. D = 0.5 in Figure 7B), but interpreted as positive results: P values were significant with large N's (~480 in Fig. 7B). Drawing a conclusion purely based on a P value while N is large often renders the conclusion only statistical, with unclear physical meaning. Although this is common in neuroscience publications, it makes more sense to at least make multiple inferences using similar sample sizes in the same study.

      We thank the reviewer for this kind suggestion. We made multiple inferences using similar sample sizes wherever possible. For Fig. 7B, we repeated the statistical analysis with sessions as samples and found that the result remained significant. These results have been added to the revised manuscript (Lines 269-270), and Fig. 7B has been replaced accordingly.
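
      The relationship between effect size, sample size and significance raised in this comment can be made concrete with a short sketch. For two independent groups of equal size n, the t statistic implied by a given Cohen's d (pooled standard deviation) is d·sqrt(n/2), so the same effect size yields a larger t as n grows. The function names and the group sizes below are hypothetical, chosen only to mirror the orders of magnitude mentioned by the reviewer; they are not taken from the manuscript.

```python
import math
import statistics

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)

def t_from_d(d, n_per_group):
    """Two-sample t statistic implied by effect size d with equal group sizes."""
    return d * math.sqrt(n_per_group / 2)

# Hypothetical comparison: a larger effect with few samples versus
# a smaller effect with many samples.
t_small_n = t_from_d(0.7, 10)    # assumed 10 samples per group
t_large_n = t_from_d(0.5, 240)   # assumed 240 samples per group
```

      Under these assumed group sizes, the smaller effect produces the larger t statistic, which is why comparing inferences at similar sample sizes is informative.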

      (3) In supplementary Figure 2 - S2, FG-cells displayed stronger theta phase precession than NFG-cells, which could be a major reason why FG-cells impacted theta sequences more than NFG cells. Although factors other than theta phase precession may contribute to or interfere with theta sequences, stronger theta phase precession itself (without the interference of other factors), by definition, can lead to stronger theta sequences.

      This is a very good point. The finding that FG-cells displayed stronger theta phase precession than NFG-cells is consistent with Guardamagna et al., 2023 Cell Rep, who found that the theta phase precession pattern emerged with strong fast gamma. Since slow gamma phase precession occurs within theta cycles, it is hard to assess the contribution of these factors to theta sequence development without taking theta phase precession into account. However, it should be noted that theta sequences may fail to develop even when theta phase precession is present from the very beginning of exploration (Feng et al., 2025 J Neurosci). These findings suggest that theta phase precession, together with other factors, affects theta sequence development. However, the weight of each factor and their interactions still need further investigation. We have discussed this possibility in the Discussion section (Lines 361-373).

      (4) The slow-gamma phase precession of FG-cells during early laps is supposed to mediate or contribute to the emergence of theta sequences during late laps (Figure 1). The logic of this model is unclear. The slow-gamma phase precession was present in both early and late laps for FG-cells, but only present in late laps for NFG-cells. It seems more straightforward to hypothesize that the difference in theta sequences between early and later laps is due to the difference in slow-gamma phase precession of NFG cells between early and late laps. Although this is not necessarily the case, the argument presented in the manuscript is not easy to follow.

      We thank the reviewer for pointing this out. Slow gamma phase precession was first reported in my previous publication (Zheng et al., 2016 Neuron), which indicated a temporally compressed manner of coding spatial information related to memory retrieval. In this case, we would expect slow gamma phase precession to occur in all cells during late laps, because spatial information is retrieved once rats have become familiar with the environment. However, during early laps, when novel information has just been encoded, there would be a balance between fast gamma and slow gamma modulation of cells for the upcoming encoding-retrieval transition. One possibility is that FG-cells support this balance by receiving both fast gamma and slow gamma modulation, but with distinct phase-coding modes (fast gamma phase locking and slow gamma phase precession) in a temporally coordinated manner. We have discussed this possibility in the Discussion section (Lines 415-428).

      (5) There are several questions on the description of methods, which could be addressed to clarify or strengthen the conclusions.

      (i) Were the identified fast- and slow-gamma episodes mutually exclusive?

      Yes, the fast- and slow-gamma episodes are mutually exclusive. We have added descriptions in the “Detection of gamma episodes” section in the Methods part (Lines 538-550).

      (ii) Was the task novel when the data were acquired? How many days (from the 1st day of the task) were included in the analysis? When the development of the theta sequence was mentioned, did it mean the development in a novel environment, in a novel task, or purely in a sense of early laps (Lap 1, 2) on each day?

      We thank the reviewer for pointing this out. The task was not novel to the rats in this dataset, because only days with recording quality good enough for sequence decoding were included in this paper, which were approximately day 2 to day 10 for each rat. However, we still observed the process of sequence formation, because the rats showed exploratory interest during early laps. Thus, the development of theta sequences refers to development across early laps on each day.

      (iii) How were the animals' behavioral parameters equalized between early and later laps? For example, speed or head direction could potentially produce the differences in theta sequences.

      This is a very good point. Regarding the effect of running speed on theta sequences, we quantified the running speeds during theta sequences across trials 1-5. We found that the rats ran at a stable speed, as reported in Fig. 3F. Regarding the effect of head direction on theta sequences, we measured the angle difference between head direction and running direction. We found that the angle difference for each lap was distributed around 0, with no significant difference across laps (Fig. S3, Watson-Williams multi-sample test, F(4,55)=0.2, p=0.9, partial η<sup>2</sup>= 0.01). These results indicate that the differences in theta sequences across trials cannot be explained by variability in behavioral parameters. We have added these results and the corresponding methods to the revised manuscript (Lines 172-175, Lines 507-511, with a new Fig. S3).
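
      For readers unfamiliar with angular measurements, the sketch below shows one standard way to compute a signed angle difference (such as head direction minus running direction) so that values wrap correctly around the circle, together with a circular mean. The function names and example angles are assumptions for illustration and do not reproduce the analysis in the manuscript.

```python
import math

def angle_diff(a, b):
    """Signed smallest difference a - b in radians, wrapped to [-pi, pi)."""
    return (a - b + math.pi) % (2 * math.pi) - math.pi

def circular_mean(angles):
    """Mean direction of a list of angles (radians)."""
    s = sum(math.sin(x) for x in angles)
    c = sum(math.cos(x) for x in angles)
    return math.atan2(s, c)

# Hypothetical example: a head direction of 350 degrees and a running
# direction of 10 degrees differ by -20 degrees, not by 340 degrees.
diff = angle_diff(math.radians(350), math.radians(10))
mean_dir = circular_mean([math.radians(350), math.radians(10)])
```

      Wrapping matters because a naive subtraction or arithmetic mean would place such differences far from 0 even when the two directions are nearly aligned.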

      Reviewer #2 (Public Review):

      This manuscript addresses an important question that has not yet been solved in the field, what is the contribution of different gamma oscillatory inputs to the development of "theta sequences" in the hippocampal CA1 region? Theta sequences have received much attention due to their proposed roles in encoding short-term behavioral predictions, mediating synaptic plasticity, and guiding flexible decision-making. Gamma oscillations in CA1 offer a readout of different inputs to this region and have been proposed to synchronize neuronal assemblies and modulate spike timing and temporal coding. However, the interactions between these two important phenomena have not been sufficiently investigated. The authors conducted place cell and local field potential (LFP) recordings in the CA1 region of rats running on a circular track. They then analyzed the phase locking of place cell spikes to slow and fast gamma rhythms, the evolution of theta sequences during behavior, and the interaction between these two phenomena. They found that place cells with the strongest modulation by fast gamma oscillations were the most important contributors to the early development of theta sequences and that they also displayed a faster form of phase precession within slow gamma cycles nested with theta. The results reported are interesting and support the main conclusions of the authors. However, the manuscript needs significant improvement in several aspects regarding data analysis, description of both experimental and analytical methods, and alternative interpretations, as I detail below.

      • The experimental paradigm and recordings should be explained at the beginning of the Results section. Right now, there is no description whatsoever which makes it harder to understand the design of the study.

      We thank the reviewer for this kind suggestion.  The description of experimental paradigm and recordings has been added to the beginning of the results section (Lines 114-119).

      • An important issue that needs to be addressed is the very small fraction of CA1 cells phase-locked to slow gamma rhythms (3.7%). This fraction is much lower than in many previous studies, that typically report it in the range of 20-50%. However, this discrepancy is not discussed by the authors. This needs to be explained and additional analysis considered. One analysis that I would suggest, although there are also other valid approaches, is to, instead of just analyzing the phase locking in two discrete frequency bands, compute the phase locking with all LFP frequencies from 25-100 Hz. This will offer a more comprehensive and unbiased view of the gamma modulation of place cell firing. Alternative metrics to mean vector length that are less sensitive to firing rates, such as pairwise phase consistency index (Vinck et al., Neuroimage, 2010), could be implemented. This may reveal whether the low fraction of phase-locked cells could be due to a low number of spikes entering the analysis.

      We thank the reviewer for this constructive suggestion. A previous study, also in Long-Evans rats, showed that the proportion of slow gamma phase-locked cells was ~20% during novelty exploration but dropped to ~10% during familiar exploration (Fig. 4E, Kitanishi et al., 2015 Neuron). This suggests that the proportion of slow gamma phase-locked cells may decrease with familiarity of the environment, which is consistent with our data. In addition, we calculated the pairwise phase consistency (PPC) index to assess the effect of spike counts on MVL. The trends of PPC (Author response image 2A) and MVL (Author response image 2B) across frequency bands were consistent across different subsets of cells, suggesting that the classification of cell subsets by the MVL metric was not biased by low spike numbers. These results further shed light on the contribution of slow gamma phase precession of place cells to theta sequence development.

      Author response image 2.
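
      The difference between the two phase-locking metrics discussed above can be sketched as follows. Mean vector length (MVL) is the length of the mean resultant vector of spike phases, whereas pairwise phase consistency (PPC; Vinck et al., 2010) is the mean cosine of the phase difference over all distinct spike pairs, which removes the positive bias that MVL shows at low spike counts. The function names and the two-spike example are assumptions for illustration only, not the code used for the analyses reported here.

```python
import cmath
import math

def mean_vector_length(phases):
    """Length of the mean resultant vector of spike phases (radians)."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)

def pairwise_phase_consistency(phases):
    """Mean cosine of the phase difference over all distinct spike pairs.

    Uses the identity |sum_j exp(i*theta_j)|^2 = N + 2 * sum_{j<k} cos(theta_j - theta_k),
    so no explicit loop over pairs is needed.
    """
    n = len(phases)
    resultant_sq = abs(sum(cmath.exp(1j * p) for p in phases)) ** 2
    return (resultant_sq - n) / (n * (n - 1))

# Hypothetical example: two spikes a quarter cycle apart.
two_spikes = [0.0, math.pi / 2]
mvl = mean_vector_length(two_spikes)          # clearly above zero
ppc = pairwise_phase_consistency(two_spikes)  # cos(pi/2), i.e. about zero
```

      With only two spikes a quarter cycle apart, MVL is already about 0.71 while PPC is about 0, which illustrates why PPC is preferred when spike counts differ between cells.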

      • From the methods, it is not clear to me whether the reference LFP channel was consistently selected to be a different one than that from which the analyzed spikes were taken. This is the better practice to reduce the contribution of spike leakage that could substantially inflate the coupling with faster gamma frequencies. These analyses need to be described in more detail.

      We thank the reviewer for pointing this out. In the main manuscript, we used local LFPs, i.e., LFPs recorded from the same tetrode as the cells. In addition, for each rat we selected a single tetrode located in stratum pyramidale at the center of the drive bundle. Using LFPs from this tetrode, we detected a proportion of FG-cells similar to that obtained with local LFPs (Author response image 3A-B, Chi-squared test, χ<sup>2</sup>= 0.9, p=0.4, Cramer V=0.03). We further found that the PPC of FG- and NFG-cells differed in the fast gamma band when using central LFPs (Author response image 3D), consistent with the result obtained with local LFPs (Author response image 3C). Therefore, these results suggest that the findings related to fast gamma were not due to spike leakage in the local LFPs. We have updated the description in the manuscript (Lines 553-557, 566-568).

      Author response image 3.

      • The initial framework of the authors of classifying cells into fast gamma and not fast gamma modulated implies a bimodality that may be artificial. The authors should discuss the nuances and limitations of this framework. For example, several previous works have shown that the same place cell can couple to different gamma oscillations (e.g., Lasztóczi et al., Neuron, 2016; Fernandez-Ruiz et al., Neuron, 2017; Sharif et al., Neuron, 2021).

      We thank the reviewer for this kind suggestion.  We have cited these references and discussed the possibility of bimodal phase-locking in the manuscript (Lines 430-433).

      • It would be useful to provide a more thorough characterization of the physiological properties of FG and NFG cells, as this distinction is the basis of the paper. Only very little characterization of some place cell properties is provided in Figure 5. Important characteristics that should be very feasible to compare include average firing rate, burstiness, estimated location within the layer (i.e., deep vs superficial sublayers) and along the transverse axis (i.e., proximal vs distal), theta oscillation frequency, phase precession metrics (given their fundamental relationship with theta sequences), etc.

      We thank the reviewer for this constructive suggestion.  In addition to the characterizations shown in Fig5, we also analyzed firing rate, anatomical location and theta modulation to compare the physiological properties of FG- and NFG-cells.

      In terms of the firing properties of both types of cells, we found that the mean firing rate of FG-cells was higher than that of NFG-cells (Fig. 5A, Student's t-test, t(22) = 2.1, p = 0.04, Cohen's D = 0.9), which is consistent with a previous study showing that the firing rate was higher during fast gamma than during slow gamma (Zheng et al., 2015 Hippocampus). However, the spike counts of excluded FG- and NFG-cells for decoding were similar (Fig. 5B, Student's t-test, t(22) = 1.2, p = 0.3, Cohen's D = 0.5), suggesting that the differences found in theta sequences cannot be accounted for by different decoding quality related to spike counts. In addition, we measured burstiness based on the distribution of inter-spike intervals, and we found that the bursting probability of spikes was not significantly different between FG and NFG cells (Author response image 4A, Student's t-test, t(22) = 0.6, p=0.5, Cohen's d=0.3).

      In terms of theta modulation of cells, we first compared the theta frequency related to the firing of FG and NFG cells. We detected the instantaneous theta frequency at each spike time of FG and NFG cells, and found that it was not significantly different between cell types (Author response image 4B, Student's t-test, t(22) = -0.5, p=0.6, Cohen's d=0.2). In addition, we found that the proportion of cells with significant theta phase precession was greater in FG-cells than in NFG-cells (Fig. S2E). However, the slope and starting phase of theta phase precession were not significantly different between FG and NFG cells (Author response image 4C, Student's t-test, t(21) = 0.3, p=0.8, Cohen's d=0.1; Author response image 4D, Watson-Williams test, F(1,21)=0.5, p=0.5, partial η<sup>2</sup>=0.02).

      In terms of the anatomical location of FG and NFG cells, we identified tetrode traces in slices for each cell. We found that both FG and NFG cells were recorded from the deep layer of dorsal CA1, with no difference in proportions between cell types (Author response image 4E, Chi-squared test, χ<sup>2</sup>=0.5, p=0.5, Cramer V=0.05). The distribution of FG-cells and NFG-cells along the transverse axis was also similar between cell types (Author response image 4F, χ<sup>2</sup>=0.08, p=0.8, Cramer V=0.02).

      Author response image 4.

      • It is not clear to me how the analysis in Figure 6 was performed. In Figure 6B I would think that the grey line should connect with the bottom white dot in the third panel, which would be the interpretation of the results.

      We thank the reviewer for raising this good point. The grey line was intended only as a visual guide, not as a quantitative analysis. We have removed the grey lines from all heat maps in Fig. 6.

      Reviewer #3 (Public Review):

      [Editors' note: This review contains many criticisms that apply to the whole sub-field of slow/fast gamma oscillations in the hippocampus, as opposed to this particular paper. In the editors' view, these comments are beyond the scope of any single paper. However, they represent a view that, if true, should contextualise the interpretation of this paper and all papers in the sub-field. In doing so, they highlight an ongoing debate within the broader field.]

      Summary:

      The authors aimed to elucidate the role of dynamic gamma modulation in the development of hippocampal theta sequences, utilizing the traditional framework of "two gammas," a slow and a fast rhythm. This framework is currently being challenged, necessitating further analyses to establish and secure the assumed premises before substantiating the claims made in the present article.

      The results are too preliminary and need to integrate contemporary literature. New analyses are required to address these concerns. However, by addressing these issues, it may be possible to produce an impactful manuscript.

      We thank the reviewer for raising these important questions in the hippocampal gamma field.  We have done a lot of new analyses according to the comments to strengthen our manuscript.

      I. Introduction

      Within the introduction, multiple broad assertions are conveyed that serve as the premise for the research. However, equally important citations that are not mentioned potentially contradict the ideas that serve as the foundation. Instances of these are described below:

      (1) Are there multiple gammas? The authors launched the study on the premise that two different gamma bands are communicated from CA3 and the entorhinal cortex. However, recent literature suggests otherwise, offering that the slow gamma component may be related to theta harmonics:

      From a review by Etter, Carmichael and Williams (2023)

      "Gamma-based coherence has been a prominent model for communication across the hippocampal-entorhinal circuit and has classically focused on slow and fast gamma oscillations originating in CA3 and medial entorhinal cortex, respectively. These two distinct gammas are then hypothesized to be integrated into hippocampal CA1 with theta oscillations on a cycle-to-cycle basis (Colgin et al., 2009; Schomburg et al., 2014). This would suggest that theta oscillations in CA1 could serve to partition temporal windows that enable the integration of inputs from these upstream regions using alternating gamma waves (Vinck et al., 2023). However, these models have largely been based on correlations between shifting CA3 and medial entorhinal cortex to CA1 coherence in theta and gamma bands. In vivo, excitatory inputs from the entorhinal cortex to the dentate gyrus are most coherent in the theta band, while gamma oscillations would be generated locally from presumed local inhibitory inputs (Pernía-Andrade and Jonas, 2014). This predominance of theta over gamma coherence has also been reported between hippocampal CA1 and the medial entorhinal cortex (Zhou et al., 2022). Another potential pitfall in the communication-through-coherence hypothesis is that theta oscillations harmonics could overlap with higher frequency bands (Czurkó et al., 1999; Terrazas et al., 2005), including slow gamma (Petersen and Buzsáki, 2020). The asymmetry of theta oscillations (Belluscio et al., 2012) can lead to harmonics that extend into the slow gamma range (Scheffer-Teixeira and Tort, 2016), which may lead to a misattribution as to the origin of slow-gamma coherence and the degree of spike modulation in the gamma range during movement (Zhou et al., 2019)."

      And from Benjamin Griffiths and Ole Jensen (2023)

      "That said, in both rodent and human studies, measurements of 'slow' gamma oscillations may be susceptible to distortion by theta harmonics [53], meaning open questions remain about what can be attributed to 'slow' gamma oscillations and what is attributable to theta."

      This second statement should be heavily considered as it is from one of the original authors who reported the existence of slow gamma.

      Yet another instance from Schomburg, Fernández-Ruiz, Mizuseki, Berényi, Anastassiou, Christof Koch, and Buzsáki (2014):

      "Note that modulation from 20-30 Hz may not be related to gamma activity but, instead, reflect timing relationships with non-sinusoidal features of theta waves (Belluscio et al., 2012) and/or the 3rd theta harmonic."

      One of this manuscript's authors is Fernández-Ruiz, a contemporary proponent of the multiple gamma theory. Thus, the modulation to slow gamma offered in the present manuscript may actually be related to theta harmonics.

      With the above emphasis from proponents of the slow/fast gamma theory on disambiguating harmonics from slow gamma, our first suggestion to the authors is that they A) address these statements (citing the work of these authors in their manuscript) and B) demonstrably quantify theta harmonics in relation to slow gamma prior to making assertions of phase relationships (methodological suggestions below). As the frequency of theta harmonics can extend as high as 56 Hz (PMID: 32297752), overlapping with the slow gamma range defined here (25-45 Hz), it will be important to establish an approach that decouples the two phenomena using an approach other than an arbitrary frequency boundary.

      We agree with the reviewer that theta oscillation harmonics could overlap with higher frequency bands, including slow gamma, as discussed in the reviews quoted above. In order to rule out possible effects of theta harmonics in this study, we added new analyses in this letter (see below).
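
      The concern about theta harmonics can be illustrated with synthetic signals. In the sketch below, a pure 8 Hz sinusoid has no power at multiples of 8 Hz, whereas an asymmetric (sawtooth) wave of the same period has components at 16, 24, 32 and 40 Hz, some of which fall inside the 25-45 Hz slow gamma band. The sampling rate, theta frequency, waveform shape and function name are all assumptions for illustration; this is a toy example and not one of the analyses performed on the recordings.

```python
import cmath
import math

FS = 1000        # assumed sampling rate (Hz)
THETA_HZ = 8     # assumed theta frequency (Hz)
N = 2 * FS       # two seconds of signal

def amplitude_at(signal, freq_hz):
    """Amplitude of one Fourier component, by direct projection."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / FS)
              for k, x in enumerate(signal))
    return 2 * abs(acc) / len(signal)

# Symmetric theta: a pure sinusoid.
sine = [math.sin(2 * math.pi * THETA_HZ * k / FS) for k in range(N)]
# Asymmetric theta: slow rise and abrupt reset with the same 8 Hz period.
sawtooth = [2 * ((THETA_HZ * k / FS) % 1.0) - 1 for k in range(N)]

# Amplitude at the 4th harmonic (32 Hz), which lies in the slow gamma band.
sine_32 = amplitude_at(sine, 32)
saw_32 = amplitude_at(sawtooth, 32)
```

      Here the sinusoid contributes essentially nothing at 32 Hz, while the sawtooth contributes an amplitude of roughly 2/(4π), so a fixed frequency boundary alone cannot separate waveform asymmetry from a genuine slow gamma oscillation.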

      (2) Can gammas be segregated into different lamina of the hippocampus? This idea appears to be foundational in the premise of the research but is also undergoing revision.

      As discussed by Etter et al. above, the initial theory of gamma routing was launched on coherence values. However, the values reported by Colgin et al. (2009) lean more towards incoherence (a value of 0) rather than coherence (1), suggesting a weak to negligible interaction. Nevertheless, this theory is coupled with the idea that the different gamma frequencies are exclusive to the specific lamina of the hippocampus.

      Recently, Douchamps et al. (2024) suggested a broader, more nuanced understanding of gamma oscillations than previously thought, emphasizing their wide range and variability across hippocampal layers. This perspective challenges the traditional dichotomy of gamma sub-bands (e.g., slow vs. medium gamma) and their associated cognitive functions based on a more rigid classification according to frequency and phase relative to the theta rhythm. Moreover, they observed all frequencies across all layers.

      Similarly, the current source density plots from Belluscio et al. (2012) suggest that SG and FG can be observed in both the radiatum and lacunosum-moleculare.

      Therefore, if the initial coherence values are weak to negligible and both slow and fast gamma are observed in all layers of the hippocampus, can the different gammas be exclusively related to either anatomical inputs or psychological functions (as done in the present manuscript)? Do these observations challenge the authors' premise of their research? At the least, please discuss.

      We thank the reviewer for raising this point, which we believe remains controversial in the field. We also thank the reviewer for the detailed evidence on the different forms of gamma rhythms. The reviewer raised two aspects of gamma: 1) whether it is reasonable to divide slow and fast gamma by specific frequency bands; 2) the presence of gamma across all hippocampal layers, which challenges the functional significance of different types of gamma rhythms. Although the results in Douchamps et al., 2024 challenged the idea of rigid gamma sub-bands, separate slow and fast gamma components could still be seen occurring at distinct times, with the central frequency of slow gamma below ~60Hz and that of fast gamma above ~60Hz (Fig.1b of Douchamps et al., 2024). This was also seen in the rat dataset of this reference (Fig. S3). Since their behavioral test required both memory encoding and retrieval, it was hard to distinguish the roles of different gamma components, as they may be dynamically coordinated during complex memory processes. Thus, although behavioral performance can be decoded from a broad range of gamma, we still cannot deny the existence of different gamma rhythms and their functional significance during different memory phases.

      (3) Do place cells, phase precession, and theta sequences require input from afferent regions? It is offered in the introduction that "Fast gamma (~65-100Hz), associated with the input from the medial entorhinal cortex, is thought to rapidly encode ongoing novel information in the context (Fernandez-Ruiz et al., 2021; Kemere, Carr, Karlsson, & Frank, 2013; Zheng et al., 2016)".

      Reports that CA1 place fields remain fairly intact following MEC inactivation include Ipshita Zutshi, Manuel Valero, Antonio Fernández-Ruiz, and György Buzsáki (2022) - "CA1 place cells and assemblies persist despite combined mEC and CA3 silencing". And from Hadas E Sloin, Lidor Spivak, Amir Levi, Roni Gattegno, Shirly Someck, Eran Stark (2024) - "These findings are incompatible with precession models based on inheritance, dual-input, spreading activation, inhibition-excitation summation, or somato-dendritic competition. Thus, a precession generator resides locally within CA1."

      These publications, at the least, challenge the inheritance model by which the afferent input controls CA1 place field spike timing. The research premise offered by the authors is couched in the logic of inheritance, when the effect that the authors are observing could be governed by local intrinsic activity (e.g., phase precession and gamma are locally generated, and the attribution to routed input is perhaps erroneous). Certainly, it is worth discussing these manuscripts in the context of the present manuscript.

      We thank the reviewer for this discussion. The main purpose of our current study is to investigate the mechanism by which theta sequences develop with learning, which may or may not depend on theta phase precession of single place cells, as this remains controversial in the field. In addition, a limitation of this study is that all gamma components were recorded from stratum pyramidale, so we cannot draw any conclusion about the origin of the gamma that modulates sequence development.

      II. Results

      (1) Figure 2-

      a. There is a bit of a puzzle here that should be discussed. If slow and fast frequencies modulate 25% of neurons, how can these rhythms serve as mechanisms of communication/support psychological functions? For instance, if fast gamma is engaged in rapid encoding (line 72) and slow gamma is related to the integration processing of learned information (line 84), and these are functions of the hippocampus, then why do these rhythms modulate so few cells? Is this to say 75% of CA1 neurons do not listen to CA3 or MEC input?

      The ~25% proportion refers to the place cells that were phase-locked to either slow or fast gamma. However, one of the main findings of this study was that most cells were modulated by slow gamma, in that their firing precessed across slow gamma phases within a theta cycle (Figs 6-8), which would promote information compression for theta sequence development. Therefore, we did not mean that only a small proportion of cells were modulated by gamma rhythms and contributed to this process.

      b. Figure 2. It is hard to know if the mean vector lengths presented are large or small. Moreover, one can expect to find significance due to chance. For instance, it is challenging to find a frequency in which modulation strength is zero (please see Figure 4 of PMID: 30428340 or Figure 7 of PMID: 31324673).

      i. Please construct the histograms of Mean Vector Length as in the above papers, using 1 Hz filter steps from 1-120Hz and include it as part of Figure 2 (i.e., calculate the mean vector length for the filtered LFP in steps of 1-2 Hz, 2-3 Hz, 3-4 Hz,... etc). This should help the authors portray the amount of modulation these neurons have relative to the theta rhythm and other frequencies. If the theta mean vector length is higher, should it be considered the primary modulatory influence of these neurons (with slow and fast gammas as a minor influence)?

      We thank the reviewer for this suggestion. We measured the mean vector length in 5Hz steps (equivalent to 1Hz step), and we found that the FG-cells were phase-locked to fast gamma even more strongly than to theta (Author response image 2B, mean MVL of theta=0.126±0.007, mean MVL of fast gamma=0.175±0.006, paired t-test, t(112)=-5.9, p=0.01, Cohen's d=0.7). In addition, in some previous studies reporting significant fast gamma phase locking, the MVL values were around 0.15 using a broad gamma band (Kitanishi et al., 2015 Neuron, Lasztóczi et al., 2016 Neuron, Tomar et al., 2021 Front Behav Neurosci, and Asiminas et al., 2022 Molecular Autism), which is consistent with the value in this study. Therefore, we do not believe that fast gamma was only a minor influence on these neurons.
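
      For reference, the mean vector length (MVL) quoted above is the length of the average unit vector formed from spike phases. The following is a minimal Python sketch of that definition only; it is not the analysis code used for this response, and the function name and example phases are hypothetical.

```python
import cmath
import math


def mean_vector_length(phases):
    """Length of the mean resultant vector of spike phases (radians).

    0 means phases are spread evenly over the cycle, 1 means every spike
    falls on exactly the same phase.
    """
    if not phases:
        raise ValueError("need at least one spike phase")
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(resultant)


# Hypothetical spike phases: one tightly locked cell, one spread over the cycle.
locked = [0.1, -0.1, 0.05, -0.05, 0.0]
uniform = [2 * math.pi * k / 8 for k in range(8)]

print(round(mean_vector_length(locked), 3))
print(round(mean_vector_length(uniform), 3))
```

      Repeating this calculation on the LFP filtered in successive narrow bands would give the MVL-versus-frequency profile requested by the reviewer.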

      ii. It is possible to infer a neuron's degree of oscillatory modulation without using the LFP. For instance, one can create an ISI histogram as done in Figure 1 here (https://www.biorxiv.org/content/10.1101/2021.09.20.461152v3.full.pdf+html; "Distinct ground state and activated state modes of firing in forebrain neurons"). The reciprocal of the ISI values would be "instantaneous spike frequency". In favor of the Douchamps et al. (2024) results, the figure of the BioRXiV paper implies that there is a single gamma frequency modulation, as there is only a single bump in the ISIs in the 10^-1.5 to 10^-2 range. Therefore, to vet the slow gamma results and the premise of two gammas offered in the introduction, it would be worth including this analysis as part of Figure 2.

      Using the suggested method, we calculated the ISI distribution on a log scale for FG-cells and NFG-cells during behavior (Author response image 5). We observed that the ISI distribution of FG-cells had a bump in the 10<sup>-1.5</sup> to 10<sup>-2</sup> range (black bar), in particular in the fast gamma range (10<sup>-2</sup> to 10<sup>-1.8</sup>).
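
      To make the log-scale ISI ranges easier to read, they can be converted to instantaneous spike frequencies by taking the reciprocal, as the reviewer describes. The sketch below assumes ISIs are expressed in seconds; the function names are hypothetical and not taken from the manuscript.

```python
import math


def log10_isi(spike_times_s):
    """log10 of each positive inter-spike interval (seconds), spike times sorted."""
    intervals = [b - a for a, b in zip(spike_times_s, spike_times_s[1:])]
    return [math.log10(isi) for isi in intervals if isi > 0]


def log10_isi_to_hz(log10_isi_s):
    """Instantaneous spike frequency (Hz) for an ISI given as log10(seconds)."""
    return 1.0 / (10.0 ** log10_isi_s)


# Edges of the ranges quoted in the response, expressed as frequencies.
for edge in (-1.5, -1.8, -2.0):
    print(edge, round(log10_isi_to_hz(edge), 1))
```

      Under that assumption, the 10<sup>-1.5</sup> to 10<sup>-2</sup> range spans roughly 32 to 100 Hz, and the 10<sup>-1.8</sup> to 10<sup>-2</sup> sub-range spans roughly 63 to 100 Hz.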

      Author response image 5.

      c. There are some things generally concerning about Figure 2.

      i. First, the raw trace does not seem to have clear theta epochs (it is challenging to ascertain the start and end of a theta cycle). Certainly, it would be worth highlighting the relationship between theta and the gammas and picking a nice theta epoch.

      We thank the reviewer for this suggestion. We have updated this figure with a clearer theta epoch in the revised manuscript.

      ii. Also, in panel A, there looks to be a declining amplitude relationship between the raw, fast, and slow gamma traces, assuming that the scale bars represent 100uV in all three traces. The raw trace is significantly larger than the fast gamma. However, this relationship does not seem to be the case in panel B (in which both the raw and unfiltered examples of slow and fast gamma appear to be equal; the right panels of B suggest that fast gamma is larger than slow, appearing to contradict the A= 1/f organization of the power spectral density). Please explain as to why this occurs. Including the power spectral density (see below) should resolve some of this.

      We thank the reviewer for pointing this out. The y-axis scales of the LFP traces in Fig.2B were not consistent, which made the amplitude comparison between slow and fast gamma misleading. We have unified the y-axis scales across gamma types in the revised manuscript. Moreover, we have also replaced these examples with more typical ones (also see the response below).

      iii. Within the example of spiking to phase in the left side of Panel B (fast gamma example)- the neuron appears to fire near the trough twice, near the peak twice, and somewhere in between once. A similar relationship is observed for the slow gamma epoch. One would conclude from these plots that the interaction of the neuron with the two rhythms is the same. However, the mean vector lengths and histograms below these plots suggest a different story in which the neuron is modulated by FG but not SG. Please reconcile this.

      We thank the reviewer for pointing this out. We found that fast gamma phase locking was robust across FG-cells, with the fast gamma peak as the preferred phase. Therefore, we have replaced these examples with more typical ones, so that the examples are consistent with the group effect.

      iv. For calculating the MVL, it seems that the number of spikes that the neuron fires would play a significant role. Working towards our next point, there may be a bias of finding a relationship if there are too few spikes (spurious clustering due to sparse data) and/or higher coupling values for higher firing rate cells (cells with higher firing rates will clearly show a relationship), forming a sort of inverse Yerkes-Dodson curve. Also, without understanding the magnitude of the MVL relative to other frequencies, it may be that these values are indeed larger than zero, but not biologically significant.

      - Please provide a scatter plot of Neuron MVL versus the Neuron's Firing Rate for 1) theta (7-9 Hz), 2) slow gamma, and 3) fast gamma, along with their line of best fit.

      - Please run a shuffle control where the LFP trace is shifted by random values between 125-1000ms and recalculate the MVL for theta, slow, and fast gamma. Often, these shuffle controls are done between 100-1000 times (see cross-correlation analyses of Fujisawa, Buzsaki et al.).

      - To establish that firing rate does not play a role in uncovering modulation, it would be worth conducting a spike number control, reducing the number of spikes per cell so that they are all equal before calculating the phase plots/MVL.

      We thank the reviewer for raising this point. Besides the MVL, we also calculated the pairwise phase consistency (PPC), as suggested by Reviewer2, which is not sensitive to spike counts. We found that the phase locking strength to either rhythm (theta or gamma) was comparable between the MVL and PPC measurements (Author response image 2). Moreover, we quantified the relationship between MVL and mean firing rate, as suggested. We found that the MVL for theta, slow gamma and fast gamma was negatively correlated with mean firing rate (Author response image 6, Pearson correlation, theta: R<sup>2</sup>=0.06, Pearson’s r=-0.3, p=1.3×10<sup>-8</sup>; slow gamma: R<sup>2</sup>=0.1, Pearson’s r=-0.4, p=2.4×10<sup>-17</sup>; fast gamma: R<sup>2</sup>=0.03, Pearson’s r=-0.2, p=4.3×10<sup>-5</sup>). These results help us rule out the concern that spike counts affected the phase modulation measurements.
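
      The spike-count issue can be illustrated with simulated cells that have no phase preference at all. The sketch below is an assumption-laden illustration rather than the authors' pipeline: it uses the standard pairwise definition of PPC (the mean cosine of the phase difference over all distinct spike pairs), uniformly random phases, and hypothetical function names.

```python
import cmath
import math
import random


def mean_vector_length(phases):
    """Length of the mean resultant vector of the spike phases."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)


def pairwise_phase_consistency(phases):
    """Mean cosine of the phase difference over all distinct spike pairs.

    Uses the identity sum_{j != k} cos(p_j - p_k) = |sum_j exp(i p_j)|^2 - n.
    """
    n = len(phases)
    if n < 2:
        raise ValueError("PPC needs at least two spikes")
    resultant_sq = abs(sum(cmath.exp(1j * p) for p in phases)) ** 2
    return (resultant_sq - n) / (n * (n - 1))


def average_over_unlocked_cells(measure, n_spikes, n_cells=2000, seed=0):
    """Average a locking measure over simulated cells with uniform random phases."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_cells):
        phases = [rng.uniform(-math.pi, math.pi) for _ in range(n_spikes)]
        total += measure(phases)
    return total / n_cells


mvl_few = average_over_unlocked_cells(mean_vector_length, n_spikes=10)
mvl_many = average_over_unlocked_cells(mean_vector_length, n_spikes=200)
ppc_few = average_over_unlocked_cells(pairwise_phase_consistency, n_spikes=10)
print(mvl_few, mvl_many, ppc_few)
```

      In this simulation the average MVL of unlocked cells is clearly above zero when spikes are few and shrinks as spikes are added, whereas the average PPC stays near zero. This is the sense in which PPC is insensitive to spike count, and it is also one reason a negative MVL-firing rate correlation can arise without any difference in true locking.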

      Author response image 6.

      (2) Something that I anticipated to see addressed in the manuscript was the study from Grosmark and Buzsaki (2016): "Cell assembly sequences during learning are "replayed" during hippocampal ripples and contribute to the consolidation of episodic memories. However, neuronal sequences may also reflect preexisting dynamics. We report that sequences of place-cell firing in a novel environment are formed from a combination of the contributions of a rigid, predominantly fast-firing subset of pyramidal neurons with low spatial specificity and limited change across sleep-experience-sleep and a slow-firing plastic subset. Slow-firing cells, rather than fast-firing cells, gained high place specificity during exploration, elevated their association with ripples, and showed increased bursting and temporal coactivation during postexperience sleep. Thus, slow- and fast-firing neurons, although forming a continuous distribution, have different coding and plastic properties."

      My concern is that much of the reported results in the present manuscript appear to recapitulate the observations of Grosmark and Buzsaki, but without accounting for differences in firing rate. A parsimonious alternative explanation for what is observed in the present manuscript is that high firing rate neurons, more integrated into the local network and orchestrating local gamma activity (PING), exhibit more coupling to theta and gamma. In this alternative perspective, it's not something special about how the neurons are entrained to the routed fast gamma, but that the higher firing rate neurons are better able to engage and entrain their local interneurons and, thus modulate local gamma. However, this interpretation challenges the discussion around the importance of fast gamma routed from the MEC.

      a. Please integrate the Grosmark & Buzsaki paper into the discussion.

      b. Also, please provide data that refutes or supports the alternative hypothesis in which the high firing rate cells are just more gamma modulated as they orchestrate local gamma activity through monosynaptic connections with local interneurons (e.g., Marshall et al., 2002, Hippocampal pyramidal cell-interneuron spike transmission is frequency dependent and responsible for place modulation of interneuron discharge). Otherwise, the attribution to a MEC routed fast gamma routing seems tenuous.

      c. It is mentioned that fast-spiking interneurons were removed from the analysis. It would be worth including these cells, calculating the MVL in 1 Hz increments as well as the reciprocal of their ISIs (described above).

      We thank the reviewer for this suggestion. Because we found that the mean firing rate of FG-cells was higher than that of NFG-cells, it is possible that the FG-cells largely overlap with the fast-firing (rigid) cells in Grosmark et al., 2016 Science. However, in this study we aimed to investigate how fast and slow gamma rhythms modulate neurons dynamically during learning, rather than to define new cell types. Thus, we do not think this work is just a replication of the previous publication. We have added this description to the Discussion (Lines 439-441). In addition, we did not record enough interneurons to support an analysis of interactions between interneurons and place cells. Therefore, we cannot make any statement in this study about where the fast gamma originated (locally in CA1 or routed from MEC).

      (3) Methods - Spectral decomposition and Theta Harmonics.

      a. It is challenging to interpret the exact parameters that the authors used for their multi-taper analysis in the methods (lines 516-526). Tallon-Baudry et al., (1997; Oscillatory γ-Band (30-70 Hz) Activity Induced by a Visual Search Task in Humans) discuss a time-frequency trade-off where frequency resolution changes with different temporal windows of analysis. This trade-off between time and frequency resolution is well known as the uncertainty principle of signal analysis, transcending all decomposition methods. It is not only a function of wavelet or FFT, and multi-tapers do not directly address this. (The multitaper method, by using multiple specially designed tapers -like the Slepian sequences- smooths the spectrum. This smoothing doesn't eliminate leakage but distributes its impact across multiple estimates). Given the brevity of methods and the issues of theta harmonics as offered above, it is worth including some benchmark trace testing for the multi-taper as part of the supplemental figures.

      i. Please spectrally decompose an asymmetric 8 Hz sawtooth wave showing the trace and the related power spectral density using the multiple taper method discussed in the methods.

      ii. Please also do the same for an elliptical oscillation (perfectly symmetrical waves, but also capable of casting harmonics). Matlab code on how to generate this time series is provided below:

      A = 1; % Amplitude
      T = 1/8; % Period corresponding to 8 Hz frequency
      omega = 2*pi/T; % Angular frequency
      C = 1; % Wave speed
      m = 0.9; % Modulus for the elliptic function (0<m<1 for cnoidal waves)
      x = linspace(0, 2*pi, 1000); % temporal domain
      t = 0; % Time instant
      % Calculate B based on frequency and speed
      B = sqrt(omega/C);
      % Cnoidal wave equation using the Jacobi elliptic function
      % (ellipj returns [sn, cn, dn]; the cnoidal profile uses cn)
      [~, cn] = ellipj(B.*(x - C*t), m);
      u = A .* cn.^2;
      % Plotting the cnoidal wave
      figure;
      plot(x./max(x), u);
      title('8 Hz Cnoidal Wave');
      xlabel('time (x)');
      ylabel('Wave amplitude (u)');
      grid on;

      The Symbolic Math Toolbox needs to be installed and accessible in your MATLAB environment to use ellipj. Otherwise, I trust that, rather than plotting a periodic orbit around a circle (sin wave) the authors can trace the movement around an ellipse with significant eccentricity (the distance between the two foci should be twice the distance between the co-vertices).

      We thank the reviewer for this suggestion. In the main text of the manuscript, we only applied the Morlet wavelet method to calculate the time-varying power of rhythms. The multitaper method was used for the estimation of power spectra across running speeds, which was shown in the manuscript. Therefore, we removed the description of the multitaper method and updated the description of the Morlet wavelet power spectral analysis in the Methods (Lines 541-544).

      As suggested, we estimated the power spectral densities of the 8 Hz sawtooth and elliptical oscillations using these methods, and compared them with the results from the FFT. We found that both the multitaper and Morlet wavelet methods captured the 8Hz oscillatory component well (Author response image 7). However, harmonic components were visible in the FFT spectrum.
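
      The harmonic content of a non-sinusoidal 8 Hz wave can be checked with a direct Fourier sum. The sketch below is illustrative only: the sampling rate, duration, and 80% rise fraction of the sawtooth are assumptions, and the plain discrete Fourier sum stands in for MATLAB's fft.

```python
import cmath
import math

FS_HZ = 1000.0      # assumed sampling rate
DURATION_S = 2.0    # 16 full cycles at 8 Hz, so the test frequencies fall on exact bins
N_SAMPLES = int(FS_HZ * DURATION_S)


def asymmetric_sawtooth(t_s, f0_hz=8.0, rise_fraction=0.8):
    """Wave in [-1, 1] that rises for rise_fraction of each cycle, then falls."""
    phase = (t_s * f0_hz) % 1.0
    if phase < rise_fraction:
        return -1.0 + 2.0 * phase / rise_fraction
    return 1.0 - 2.0 * (phase - rise_fraction) / (1.0 - rise_fraction)


def amplitude_at(signal, freq_hz, fs_hz):
    """Amplitude of the freq_hz component from a direct discrete Fourier sum."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
              for k, x in enumerate(signal))
    return 2.0 * abs(acc) / len(signal)


times = [k / FS_HZ for k in range(N_SAMPLES)]
saw = [asymmetric_sawtooth(t) for t in times]
sine = [math.sin(2.0 * math.pi * 8.0 * t) for t in times]

# Columns: frequency, sawtooth amplitude, pure-sine amplitude.
for f in (8.0, 16.0, 24.0):
    print(f, round(amplitude_at(saw, f, FS_HZ), 3), round(amplitude_at(sine, f, FS_HZ), 3))
```

      With these assumed parameters the sawtooth carries energy at 16 and 24 Hz in addition to 8 Hz, whereas the pure sine does not. Whether a given estimator shows those components as separate peaks depends on its frequency resolution.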

      Author response image 7.

      iii. Line 522: "The power spectra across running speeds and absolute power spectrum (both results were not shown).". Given the potential complications of multi-taper discussed above, and as each convolution further removes one from the raw data, it would be the most transparent, simple, and straightforward to provide power spectra using the simple fft.m code in Matlab (We imagine that the authors will agree that the results should be robust against different spectral decomposition methods. Otherwise, it is concerning that the results depend on the algorithm implemented and should be discussed. If gamma transience is a concern, the authors should trigger to 2-second epochs in which slow/fast gamma exceeds 3-7 std. dev. above the mean, comparing those resulting power spectra to 2-second epochs with ripples - also a transient event). The time series should be at least 2 seconds in length (to avoid spectral leakage issues and the issues discussed in Tallon-Baudry et al., 1997 above).

      Please show the unmolested power spectra (Y-axis units in mV<sup>2</sup>/Hz, X-axis units as Hz) as a function of running speed (increments of 5 cm/s) for each animal. I imagine three of these PSDs for 3 of the animals will appear in supplemental methods while one will serve as a nice manuscript figure. With this plot, please highlight the regions that the authors are describing as theta, slow, and fast gamma. Also, any issues should be addressed should there be notable differences in power across animals or tetrodes (issues with locations along proximal-distal CA1 in terms of MEC/LEC input and using a local reference electrode are discussed below).

      As suggested, we first estimated the power spectra as a function of running speed in each running lap, and show them separately for each rat, using the multitaper spectral analysis (Author response image 8). In addition, to obtain unmolested power spectra, the short-time Fourier transform (STFT) was used for this analysis at the same frequency resolution (Author response image 9). The power spectra were consistent between these two methods. Notably, there appears to be no significant theta harmonic component in the slow gamma band range.

      The multitaper spectral analysis was performed as follows. The power spectra were measured across different running speeds as described previously (Ahmed et al., 2012 J Neurosci; Zheng et al., 2015 Hippocampus; Zheng et al., 2016 eNeuro). Briefly, the absolute power spectrum was calculated from the LFP recordings of each lap with a 0.5s moving window and a 0.2s step size, using the multitaper spectral analysis in the Chronux toolbox (Mitra and Bokil, 2008, http://chronux.org/) and the STFT spectral analysis in the Matlab script stft.m. In the multitaper method, the time-bandwidth product parameter (TW) was set at 3, and the number of tapers (K) was set at 5. In the STFT method, the FFT length was set at 2048, which was equivalent to the parameters used in the multitaper method. Running speed was calculated (see “Estimation of running speed and head direction” section in the manuscript) and averaged within each 0.5s time window corresponding to the LFP segments. Then, the absolute power at each frequency was smoothed with a Gaussian kernel centered on the given speed bin. The power spectra as a function of running speed and frequency were plotted on a log scale. The colormap was also on a log scale, allowing for comparisons across different frequencies that would otherwise be difficult due to the 1/f decay of power in physiological signals.
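
      The spectral smoothing implied by these multitaper settings follows from the definition of the time-bandwidth product, TW = T × W, where T is the window length and W the half-bandwidth. The sketch below simply evaluates that relation together with the usual 2TW - 1 rule for the number of tapers; the function name is hypothetical.

```python
def multitaper_smoothing(window_s, time_bandwidth):
    """Return (half-bandwidth W in Hz, usual maximum taper count 2*TW - 1)."""
    half_bandwidth_hz = time_bandwidth / window_s
    max_tapers = int(2 * time_bandwidth - 1)
    return half_bandwidth_hz, max_tapers


# Settings stated in the response: 0.5 s window, TW = 3.
half_bw_hz, max_tapers = multitaper_smoothing(window_s=0.5, time_bandwidth=3)
print(half_bw_hz, max_tapers)
```

      For a 0.5s window and TW = 3 this gives a half-bandwidth of 6 Hz and five tapers, matching the stated K = 5. A smoothing half-bandwidth of this size is comparable to the ~8 Hz spacing between theta harmonics, which is worth keeping in mind when reading the speed-resolved spectra.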

      Author response image 8.

      Author response image 9.

      iv. Schomburg and colleagues (2014) suggested that the modulation of neurons in the slow gamma range could be related to theta harmonics (see above). Harmonics can often extend almost indefinitely as they regress into the 1/f background (contributing to power, but without a peak above the power spectral density slope), making arbitrary frequency limits inappropriate. Therefore, in order to support the analyses and assertions regarding slow gamma, it seems necessary to calculate a "theta harmonic/slow gamma ratio". Aru et al. (2015; Untangling cross-frequency coupling in neuroscience) offer that: "The presence of harmonics in the signal should be tested by a bicoherence analysis and its contribution to CFC should be discussed." Please test both the synthetic signals above and the raw LFP, using temporal windows of greater than 4 seconds (again, the large window optimizes for frequency resolution in the time-frequency trade-off) to calculate the bicoherence. As harmonics are integer multiples of theta coupled to itself and slow gamma is also coupled to theta, a nice illustration and contribution to the field would be a method that uses the bispectrum to isolate and create a "slow gamma/harmonic" ratio.

      We thank the reviewer for suggesting a method to assess theta harmonics. We first measured theta harmonics in the synthesized signals using the bicoherence method, and we could clearly observe nonlinear coupling between the theta rhythm and its harmonics (Author response image 10).

      Author response image 10.

      In addition, we also measured the bicoherence of raw traces during slow gamma episodes. We did not see nonlinear coupling between the slow gamma and theta bands in the real data (mean bicoherence=0.1±0.0002) compared with the synthesized signals (mean bicoherence=0.7 for elliptical waves and 0.5 for sawtooth waves), suggesting that the slow gamma detected in this study was not purely a theta harmonic (Author response image 11C, F, I, in red boxes). Therefore, we believe that the contribution of theta harmonics to slow gamma is not significant.
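
      The logic of the bicoherence test can be sketched on synthetic data. The example below is a hedged illustration, not the estimator used for Author response images 10 and 11: it assumes one common normalization (magnitude of the summed triple products divided by the sum of their magnitudes), 1 s segments sampled at 500 Hz, and hypothetical function names. Its values are therefore not directly comparable with those reported above.

```python
import cmath
import math
import random


def dft_coeff(segment, freq_hz, fs_hz):
    """Complex Fourier coefficient of one segment at freq_hz (direct sum)."""
    return sum(x * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
               for k, x in enumerate(segment))


def bicoherence(segments, f1_hz, f2_hz, fs_hz):
    """|sum of triple products| / sum of |triple products| across segments."""
    triple_sum = 0j
    magnitude_sum = 0.0
    for seg in segments:
        triple = (dft_coeff(seg, f1_hz, fs_hz)
                  * dft_coeff(seg, f2_hz, fs_hz)
                  * dft_coeff(seg, f1_hz + f2_hz, fs_hz).conjugate())
        triple_sum += triple
        magnitude_sum += abs(triple)
    return abs(triple_sum) / magnitude_sum if magnitude_sum > 0 else 0.0


def sawtooth_segment(n, fs_hz, f0_hz, shift_s, rise_fraction=0.8):
    """Asymmetric sawtooth: its harmonics are phase-locked to the fundamental."""
    out = []
    for k in range(n):
        phase = ((k / fs_hz + shift_s) * f0_hz) % 1.0
        if phase < rise_fraction:
            out.append(-1.0 + 2.0 * phase / rise_fraction)
        else:
            out.append(1.0 - 2.0 * (phase - rise_fraction) / (1.0 - rise_fraction))
    return out


def independent_sines_segment(n, fs_hz, f0_hz, rng):
    """f0 and 2*f0 sinusoids whose phases are drawn independently per segment."""
    p1 = rng.uniform(0.0, 2.0 * math.pi)
    p2 = rng.uniform(0.0, 2.0 * math.pi)
    return [math.sin(2.0 * math.pi * f0_hz * k / fs_hz + p1)
            + 0.5 * math.sin(4.0 * math.pi * f0_hz * k / fs_hz + p2)
            for k in range(n)]


FS_HZ, N, F0_HZ, N_SEGMENTS = 500.0, 500, 8.0, 50
rng = random.Random(0)
saw = [sawtooth_segment(N, FS_HZ, F0_HZ, rng.random() / F0_HZ) for _ in range(N_SEGMENTS)]
sines = [independent_sines_segment(N, FS_HZ, F0_HZ, rng) for _ in range(N_SEGMENTS)]

b_saw = bicoherence(saw, F0_HZ, F0_HZ, FS_HZ)      # harmonic of the same wave
b_sines = bicoherence(sines, F0_HZ, F0_HZ, FS_HZ)  # unrelated 8 and 16 Hz rhythms
print(round(b_saw, 2), round(b_sines, 2))
```

      In this toy case the sawtooth gives a bicoherence near 1 at (8 Hz, 8 Hz) because its 16 Hz component is phase-locked to the fundamental, while two independent rhythms at 8 and 16 Hz give a much lower value. The same contrast is what distinguishes a theta harmonic from an independent oscillation in the analysis described above.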

      Author response image 11.

      (4) I appreciate the inclusion of the histology for the 4 animals. Knierim and colleagues describe a difference in MEC projection along the proximal-distal axis of the CA1 region (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866456/)- "There are also differences in their direct projections along the transverse axis of CA1, as the LEC innervates the region of CA1 closer to the subiculum (distal CA1), whereas the MEC innervates the region of CA1 closer to CA2 and CA3 (proximal CA1)". From the histology, it looks like some of the electrodes are in the part of CA1 that would be dominated by LEC input while a few are closer to where the MEC would project.

      a. How do the authors control for these differences in projections? Wouldn't this change whether or not fast gamma is observed in CA1?

      b. I am only aware of one manuscript that describes slow gamma in the LEC which appeared in contrast to fast gamma from the MEC (https://www.science.org/doi/10.1126/science.abf3119). One would surmise that the authors in the present manuscript would have varying levels of fast gamma in their CA1 recordings depending on the location of the electrodes in the Proximal-distal axis, to the extent that some of the more medial tetrodes may need to be excluded (as they should not have fast gamma, rather they should be exclusively dominated by slow gamma). Alternatively, the authors may find that there is equal fast gamma power across the entire proximal-distal axis. However, this would pose a significant challenge to the LEC/slow gamma and MEC/fast gamma routing story of Fernandez-Ruiz et al. and require reconciliation/discussion.

      c. Is there a difference in neuron modulation to these frequencies based on electrode location in CA1?

      We thank the reviewer for raising this concern, which was also raised by Reviewer2. We aligned the physical locations of the LFP channels along the proximal-distal axis based on histology. In our dataset, only 2 rats were recorded from both distal and proximal hippocampus, so we calculated the gamma power from both sites in these rats. We found that slow gamma power was higher on proximal tetrodes than on distal tetrodes (Author response image 12, repeated measures ANOVA, F(1,7)=10.2, p=0.02, partial η<sup>2</sup>=0.8). However, fast gamma power was similar between recording sites (F(1,7)=0.008, p=0.9, partial η<sup>2</sup>=0.001). These results are partially consistent with the LEC/slow gamma and MEC/fast gamma routing story of Fernandez-Ruiz’s work. The main reason would be that all LFPs were recorded from tetrodes in stratum pyramidale, the deep layer in particular (Author response image 4E), so that it was hard to precisely identify their distance to distal/proximal apical dendrites.

      Author response image 12.

      In terms of the anatomical location of FG and NFG cells, we identified tetrode traces in slices for each cell. We found that both FG and NFG cells were recorded from the deep layer of dorsal CA1, with no difference in proportions between cell types (Author response image 4E, Chi-squared test, χ<sup>2</sup>=0.5, p=0.5, Cramer V=0.05). The distribution of FG-cells and NFG-cells along the transverse axis was also similar between cell types (Author response image 4F, χ<sup>2</sup>=0.08, p=0.8, Cramer V=0.02).

      (5) Given a comment in the discussion (see below), it will be worth exploring changes in theta, theta harmonic, slow gamma, and fast gamma power with running speed as no changes were observed with theta sequences or lap number versus. Notably, Czurko et al., report an increase in theta and harmonic power with running speed (1999) while Ahmed and Mehta (2012) report a similar effect for gamma.

      a. Please determine whether the power and frequency of the rhythms discussed above change with running speed, using the same parameters applied in the present manuscript. The specific concern is that the way the authors calculate running speed is not sensitive enough to evaluate changes.

      We thank the reviewer for this suggestion. The description of the running speed quantification has been updated in the Methods (see “Estimation of running speed and head direction” section, Lines 501-511). Overall, the sampling frequency of running speed was 25Hz, which should be sensitive enough to evaluate behavioral changes.

      By measuring rhythmic power as a function of running speed (Author response image 8 and Author response image 9), we observed that theta power increased with running speed. Consistent with the results in (Ahmed and Mehta, 2012) and our previous study (Zheng et al., 2015), fast gamma power increased and slow gamma power decreased as running speed increased.

      In addition, we also estimated the rhythmic frequency as a function of running speed in the slow and fast gamma episodes, respectively. We found that fast gamma frequency increased with running speed (Author response image 13, linear regression, R<sup>2</sup>=0.4, corr=0.6, p=9.9×10<sup>-15</sup>), whereas slow gamma frequency decreased with running speed (R<sup>2</sup>=0.2, corr=-0.4, p=8.8×10<sup>-6</sup>). Although a significant correlation was found between gamma frequency and running speed, consistent with previous studies, the frequency change (~70-75Hz for fast gamma and ~30-28Hz for slow gamma) was not large enough to affect the sequence findings in this study. In addition, theta frequency was stable in both slow gamma episodes (R<sup>2</sup>=0.02, corr=-0.1, p=0.1) and fast gamma episodes (R<sup>2</sup>=0.004, corr=0.06, p=0.5), consistent with the results in Fig.1G of Kropff et al., 2021 Neuron.

      Author response image 13.

      b. It is astounding that animals ran as fast as they did in what appears to be the first lap (Figure 3F), especially as rats' natural proclivity is thigmotaxis and inquisitive exploration in novel environments. Can the authors expand on why they believe their rats ran so quickly on the first lap in a novel environment and how to replicate this? Also, please include the individual values for each animal on the same plot.

      We thank the reviewer for pointing this out.  The task was not brand new to the rats in this dataset, because only days with recording quality good enough for sequence decoding were included in this paper, which were approximately days 2-10 for each rat.  However, we still observed the process of sequence formation because of the rats’ interest in exploration during early laps.  Thus, in terms of exploration behavior, the rats ran at relatively high speeds across laps (Author response image 14, each gray line represents the running speed within an individual session).

      Author response image 14.

      c. Can the authors explain how the statistics on line 169 (F(4,44)) work? Specifically, it is challenging to determine how the degrees of freedom were calculated in this case and throughout if there were only 4 animals (reported in methods) over 5 laps (depicted in Figure 3F. Given line 439, it looks like trials and laps are used synonymously). Four animals over 5 laps should have a DOF of 16.

      This statistical test was performed with each session/day as a sample (n=12 sessions/days).  The statistics were generated by repeated measures ANOVA on 5 trials in 12 sessions, giving an error DOF of (5-1)×(12-1)=44.
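
      The degrees of freedom quoted here follow from the standard formulas for a one-way repeated measures ANOVA. The short sketch below makes the arithmetic explicit; the function name is hypothetical and chosen only for this illustration.

```python
def rm_anova_dof(n_conditions, n_subjects):
    """Degrees of freedom for a one-way repeated measures ANOVA.

    n_conditions: number of repeated levels (here, laps/trials).
    n_subjects:   number of units measured repeatedly (here, sessions).
    """
    df_effect = n_conditions - 1
    df_error = (n_conditions - 1) * (n_subjects - 1)
    return df_effect, df_error


# 5 trials measured in each of 12 sessions.
df_effect, df_error = rm_anova_dof(n_conditions=5, n_subjects=12)
print(f"F({df_effect},{df_error})")
```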

      (6) Throughout the manuscript, I am concerned about an inflation of statistical power. For example on line 162, F(2,4844). The large degrees of freedom indicate that the sample size was theta sequences or a number of cells. Since multiple observations were obtained from the same animal, the statistical assumption of independence is violated. Therefore, the stats need to be conducted using a nested model as described in Aarts et al. (2014; https://pubmed.ncbi.nlm.nih.gov/24671065/). A statistical consult may be warranted.

      We thank the reviewer for this suggestion.  We have replaced this statistical result with one from a generalized linear mixed model with ratID as a covariate.  These results have been updated in the revised manuscript (Lines 164-167).

      (7) It is stated that one tetrode served as a quiet recording reference. The "quiet" part is an assumption when often, theta and gamma can be volume conducted to the cortex (e.g., Sirota et al., 2008; This is often why laboratories that study hippocampal rhythms use the cerebellum for the differential recording electrode and not an electrode in the corpus callosum). Generally, high frequencies propagate as well as low frequencies in the extracellular milieu (https://www.eneuro.org/content/4/1/ENEURO.0291-16.2016). For transparency, the authors should include a limitation paragraph in their discussion that describes how their local tetrode reference may be inadvertently diminishing and/or distorting the signal that they are trying to isolate. Otherwise, it would be worth hearing an explanation as to how the author's approach avoids this issue.

      In terms of the locations of references, we had 2 screws above the cerebellum in the skull connected to the recording drive ground, and 1 tetrode in a quiet area of the cortex serving as the recording reference.  We agree that theta and gamma can be volume conducted to the cortex, which may affect the power of these rhythms in the stratum pyramidale.  However, we did not intend to measure or compare absolute theta or gamma power in this study, as we were concerned only with the gamma phase modulation of place cells.  Therefore, we believe the location of the recording reference would not significantly affect our conclusions.

      Apologetically, this review is already getting long. Moreover, I have substantial concerns that should be resolved prior to delving into the remainder of the analyses. e.g., the analyses related to Figure 3-5 assert that FG cells are important for sequences. However, the relationship to gamma may be secondary to either their relationship to theta or, based on the Grosmark and Buzsaki paper, it may just be a phenomenon coupled to the fast-firing cells (fast-firing cells showing higher gamma modulation due to a local PING dynamic). Moreover, the observation of slow gamma is being challenged as theta harmonics, even by the major proponents of the slow/fast gamma theory. Therefore, the report of slow gamma precession would come as an unsurprising extension should they be revealed to be theta harmonics (however, no control for harmonics was implemented; suggestions were made above). Following these amendments, I would be grateful for the opportunity to provide further feedback.

      III. Discussion.

      a. Line 330- it was offered that fast gamma encodes information while slow gamma integrates in the introduction. However, in a task such as circular track running (from the methods, it appears that there is no new information to be acquired within a trial), one would guess that after the first few laps, slow gamma would be the dominant rhythm. Therefore, one must wonder why there are so few neurons modulated by slow gamma (~3.7%).

      The ~3.7% refers to the proportion of place cells phase-locked to slow gamma.  However, our aim was to test whether slow gamma phase precession of place cells promoted theta sequence development.  We would not expect cells to be phase-locked to slow gamma if phase precession occurred.

      b. Line 375: The authors contend that: "...slow gamma, related to information compression, was also required to modulate fast gamma phase-locked cells during sequence development. We replicated the results of slow gamma phase precession at the ensemble level (Zheng et al., 2016), and furthermore observed it at late development, but not early development, of theta sequences." In relation to the idea that slow gamma may be coupled to - if not a distorted representation of - theta harmonics, it has been observed that there are changes in theta relative to novelty.

      i. A. Jeewajee, C. Lever, S. Burton, J. O'Keefe, and N. Burgess (2008) report a decrease in theta frequency in novel circumstances that disappears with increasing familiarity.

      ii. One could surmise that this change in frequency is associated with alterations in theta harmonics (observed here as slow gamma), challenging the author's interpretation.

      iii. Therefore, the authors have a compelling opportunity to replicate the results of Jeewajee et al., characterizing changes of theta along with the development of slow gamma precession, as the environment becomes familiar. It will become important to demonstrate, using bicoherence as offered by Aru et al., how slow gamma can be disambiguated from theta harmonics. Specifically, we anticipate that the authors will be able to quantify A) theta harmonics (the number, and their respective frequencies and amplitudes), B) the frequency and amplitude of slow gamma, and C) how they can be quantitatively decoupled. Through this, their discussion of oscillatory changes with novelty-familiarity will garner a significant impact.

      We think we have demonstrated that the slow gamma observed in this study was not purely theta harmonics.  We didn’t focus on the frequency change of slow gamma or theta rhythms in this study.  Further investigation will be carried out on this topic in the future.

      c. Broadly, it is interesting that the authors emphasize the gamma frequency throughout the discussion. Given that the power spectral density of the Local Field Potential (LFP) exhibits a log-log relationship between amplitude and frequency, as described by Buzsáki (2005) in "Rhythms of the Brain," and considering that the LFP is primarily generated through synaptic transmembrane currents (Buzsáki et al., 2012), it seems parsimonious to consider that the bulk of synaptic activity occurs at lower frequencies (e.g., theta). Since synaptic transmission represents the most direct form of inter-regional communication, one might wonder why gamma (characterized by lower amplitude rhythms) is esteemed so highly compared to the higher amplitude theta rhythm. Why isn't the theta rhythm, instead, regarded as the primary mode of communication across brain regions? A discussion exploring this question would be beneficial.

      We thank the reviewer for this thoughtful comment.  In stating our conclusions on gamma rhythms, we did not mean to downplay the role of the theta rhythm.  On the contrary, the fast and slow gamma episodes were detected riding on theta rhythms, and we believe that information compression should occur at a finer scale within a theta cycle.  More investigation will be carried out on this topic in the future.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It is helpful to clearly define "FG-cell sequences" before the relevant results are described in the Results section. More importantly, the seemingly conflicting results between Figure 3 and Figure 8 may need to be clarified.

      The terms “exFG-sequences and exNFG sequences” and “FG-cell sequences and NFG-cell sequences” have been clearly defined in the revised manuscript.  Moreover, the seemingly conflicting results between Figure 3 and Figure 8 have been explained.

      (2) It is helpful to clearly state the N and what defines a sample whenever a result is described.

      For each statistical result, the N and what defines a sample have been clarified in the revised manuscript.

      (3) Addressing the questions regarding the methods (#5) would clarify some of the results.

      The questions regarding the Methods have been addressed in the revised manuscript.

      (4) Line #244: "successful" should be "successive"?

      Fixed.

      Reviewer #2 (Recommendations For The Authors):

      - The writing of the manuscript can be substantially improved.

      The manuscript has been substantially revised and updated.

      - I noticed that the last author of the manuscript is not the lead or corresponding and has only provided a limited contribution to this work (according to the detailed author contributions). The second to last author seems to be the main senior intellectual contributor and supervisor, together with the third to last author. This speaks of potential bad academic practices where a senior person whose intellectual contribution to the study is relatively minor takes the last author position, against the standard conventions on authorship worldwide. I strongly suggest that this is corrected.

      We thank the reviewer for raising this issue.  The last author, Dr. Ming, was also a senior author and supervised this project, making a substantial contribution.  We have corrected his role to co-corresponding author in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Summary of revisions

      Title

      We have changed the title of the manuscript to “Chromatin endogenous cleavage provides a global view of yeast RNA polymerase II transcription kinetics”.

      Text

      Additional discussion of the patterns for elongation factors added (detailed below).

      Small text changes throughout, as mentioned in the detailed response below.

      Figures

      Updated legend-image in Figure 2F to reflect correct colors

      Added Figure 2 – supplement 1F – RNAPII enrichment with shorter promoter dwell times

      Added Figure 2 - supplement 2 with ChIP-seq outcomes (and text legend)

      Removed gene numbers in Figure 5C and put them in the legend.

      Substituted Med1 and Med8 ChEC over Rap1 sites in Figure 5F.

      Moved kin28-is growth inhibition to Figure 5 – Supplement 1.

      Substituted a new panel overlaying the RNAPII enrichment over UASs or promoters for all three strains in Figure 7D.

      Improved the labeling and legend of Figure 7E

      Methods

      Added ChIP-seq performed to confirm that the MNase fusion proteins are able to produce the expected pattern for ChIP.

      Point-by-point response to reviewers’ comments

      Reviewer 1:

      (1) Extending this work to elongation factors Ctk1 and Spt5 unexpectedly give strong signals near the PIC location and little signals over the coding region. This, and mapping CTD S2 and S5 phosphorylation by ChEC suggests to me that, for some reason, ChEC isn't optimal for detecting components of the elongation complex over coding regions. 

      (3) mapping the elongation factors Spt5 and Ctk1 by ChEC gives unexpected results as the signals over the coding sequences appear weak but unexpectedly strong at promoters and terminators. It would be helpful if the authors could comment on reasons why ChEC may not work well with elongation factors. For example, could this be something to do with the speed of Pol elongation and/or the chromatin structure of coding sequences such that coding sequence DNA is less accessible to MNase cleavage? 

      (7) The mintbodys are an interesting attempt to measure Pol II CTD modifications during elongation but give unexpected results as the signals in the coding region are lower than at promoters and terminators. It seems like ChIP is still a much better option for elongation factors unless I'm missing something. 

      We agree with the reviewer that this is a point that could confuse the reader.  Therefore, we have devoted two additional paragraphs to possible interpretations of our data in the Discussion:

      ChEC with factors involved in elongation (Ctk1, Spt5, Ser2p-RNAPII), when normalized to total RNAPII, showed greater enrichment over the CDS (Figure 3G), as expected. However, it is surprising that we also observed clear enrichment of these factors at promoters (e.g. Figure 3A, E & F). The association of elongation factors with the promoter seems to be biologically relevant. Changes in transcription correlate with changes in ChEC enrichment for these factors and modifications (Figure 4C). Blocking initiation by inhibiting TFIIH kinase led to a reduction of Ser5p RNAPII and Ser2p RNAPII over both the promoter and the transcribed region (Figure 5G). This suggests either that the true signal of these factors over transcribed regions is less evident by ChEC than by ChIP or that ChEC can reveal interactions of elongation factors at early stages of transcription that are missed by ChIP. The expectations for enrichment of elongation factors and phosphorylated CTD are largely based on ChIP data. Because ChIP fails to capture RNAPII enrichment at UASs and promoters, it is possible that ChIP also fails to capture promoter interaction of factors involved in elongation as well.

      Factors important for elongation can also function at the promoter. For example, Ctk1 is required for the dissociation of basal transcription factors from RNAPII at the promoter (Ahn et al., 2009). Transcriptional induction leads to increases in Ctk1 ChEC enrichment both over the promoter and over the 3’ end of the transcribed region (Figure 4C). Dynamics of Spt4/5 association with RNAPII from in vitro imaging (Rosen et al., 2020) indicate that the majority of Spt4/5 binding to RNAPII does not lead to elongation; Spt4/5 frequently dissociates from DNA-bound RNAPII. Association of Spt4/5 with RNAPII may represent a slow, inefficient step in the transition to productive elongation. If so, then ChEC-seq2 may capture transient Spt4/5 interactions that occur prior to productive elongation, producing enrichment of Spt5 at the promoter.

      (2) Finally, the role of nuclear pore binding by Gcn4 is explored, although the results do not seem convincing (10) In Figure 7, it's not convincing to me that ChEC is revealing the reason for the transcriptional defect in the Gcn4 PD mutant. The plots in panel D look nearly the same and I don't follow the authors' description of the differences stated in the text. In panel A, replotting the data in some other way might make the transcriptional differences between WT and Gcn4 PD mutants more obvious. 

      The phenotype of the gcn4-pd mutant is a quantitative decrease in transcription and this leads to a quantitative decrease, rather than qualitative loss, of RNA polymerase II over the promoter, without impacting the association of RNA polymerase II over the UAS region. This effect is small but statistically significant (p = 4e5). We have changed the title of this section of the manuscript to “ChEC-seq2 suggests a role for the NPC in stabilizing promoter association of RNAPII”. Also, to make comparison clearer, we have plotted the data together in the revised figure (Figure 7D).

      The magnitude of the decrease is not large, but we would highlight that it is almost as large as that produced by inhibiting the Kin28 kinase (Figure 5H). Because the promoter-bound RNAPII is poorly captured by ChIP, this effect might be difficult to observe by techniques other than ChEC. Obviously, more mechanistic studies will need to be performed to fully understand this phenotype, but this result supports a role for the interaction with the nuclear pore complex in either enhancing the transfer of RNA polymerase II from the enhancer to the promoter or in preventing its dissociation from the promoter.

      I think that the related methods cut&run/cut&tag have been used to map elongating pol II. The authors should summarize what is known from this approach in the introduction and/or discussion. 

      CUT&RUN has been used to map RNAPII in mammals, but we are not aware of reports in S. cerevisiae.  Work from the Henikoff Lab in yeast mapped transcription factors and histone modifications (PMIDs 28079019 and 31232687).  A study using CUT&RUN in a human cell line reported a promoter-5’ bias of RNAPII that appeared to be dependent on fragment length (PMID 33070289). Regardless, the report highlights a key distinction between yeast and other eukaryotes: paused RNAPII. Indeed, paused RNAPII dominates ChIP-seq tracks in metazoans, and so we are hesitant to draw comparisons between CUT&RUN in other species and ChEC-seq2 in S. cerevisiae.

      Are the Rpb1, Rpb3, TFIIA, and TFIIE cleavage patterns expected based on the known structure of the PIC (Figures 2C, E)? 

      Rpb1 and Rpb3 show peaks at approximately -17 and +34 with respect to TATA. TFIIA (Toa2) shows peaks at -12 and +12.  TFIIE (Tfa1) shows a peak around +34 (Figure 2C & E).

      As shown in the supplementary movie (based on the cMed-PIC structure; PDB #5OQM; Schilbach et al., 2017), upon binding to TBP/TFIID, TFIIA would be expected to cleave slightly upstream and downstream of the protected TATA (-12 and +12), while TFIIE binds downstream after the +12 site is protected and would be closest to the +34 unprotected site (to the right in the image below). RNAPII, which binds the fully assembled PIC, should be able to access either the upstream site (-12) or the downstream site (+34). Rpb1’s unstructured carboxy terminal domain, to which MNase is fused, would give it maximum flexibility, which likely explains why Rpb1 cleaves both at -12 and +34, with a preference for -12. Rpb3 also cleaves both sites, but without an obvious preference. 

      Author response image 1.

      Author response image 2.

      cleavage at -12, +12 and +34

      Author response image 3.

      Highlighted sites corresponding to the peaks in TFIIA assembled with TBP:

      Author response image 4.

      The complete PIC, protecting the +12 site, but leaving the +34 site exposed: 

      (6) Figure 2 S1: Pol II ChIP in the coding region gives a better correlation with transcription vs ChEC in promoters. Also, Pol II ChIP at terminators is almost as good as ChEC at promoters for estimating transcription. This latter point seems at odds with the text. The authors should comment on this and modify the text as needed. 

      Thank you for this comment.  We have clarified the text.

      In Figures 4 and 5, it's hard to tell how well changes in transcription correlate with changes in Pol II ChEC signals. It might be helpful to have a scatterplot or some other type of plot so that this relationship can be better evaluated. 

      While we find a corresponding increase/decrease in ChEC-seq2 signal in genes identified as up/downregulated by SLAM-seq, the magnitude of the change is not well correlated between the two techniques.  This was not surprising, because neither ChIP nor ChEC correlates especially well with SLAM-seq (Figure 2 – supplement 1E).

      In Figure 5, it's unclear why Pol association with Rap1 is being measured. Buratowski/Gelles showed that Pol associates with strong acidic activators - presumably through Mediator. Rap1 supposedly does not bind Mediator - so how is Pol associating here? Perhaps it would be better to measure Pol binding at STM genes that show Mediator-UAS binding. 

      Thank you; this is a good point.  We chose Rap1 because we had generated high-confidence binding sites in our strains under these conditions by ChEC-seq2. The results suggest that RNAPII is recruited well to these sites and that this recruitment does not require TFIIB. However, in disagreement with the notion that Mediator does not interact with Rap1, ChEC with Mediator subunits Med1 and Med8 also shows peaks at these sites (new Figure 5F; the old Figure 5F is now Figure 5 – Supplement 1).  Therefore, either these sites are co-occupied by other transcription factors that bind Mediator, or Mediator is recruited by Rap1.  In either case, this correlates with binding of RNAPII.

      Reviewer 2:

      (1) The term "nascent transcription" is all too often used interchangeably for NET-seq, PRO-seq, 4sUseq, and other assays that often provide different types of information. The authors should make it clear their use of the term refers to SLAM-seq data. 

      We have clarified throughout the manuscript that nascent transcription was measured by SLAM-seq.

      The authors should explicitly state that experiments were performed in S. cerevisiae in the Results section. 

      We have made it clear in the title and the text that these experiments were performed in S. cerevisiae.

      Lines 216-218 state that "None of the 24 predicted the strong signal over the transcribed region with promoter depletion characteristic of ChIP-seq". I understand the authors' point, but there are parameter combinations that produce a flat profile with slightly less signal over the promoter (e.g., 5 sec dwell times and 3000 bp/ min elongation rate). If flanking windows were included, this profile would look something like ChIP-seq. I'd encourage the authors to be more precise with their language. 

      Thank you for highlighting this over-statement.

      We have now clarified the text and added another supplementary panel as follows:

      “While some combinations predicted a relatively flat distribution across the gene with lower levels in the promoter, none of the 24 predicted the strong signal over the transcribed region with promoter depletion characteristic of ChIP-seq. Only very short promoter dwell times (i.e., < 1 s) produced the low promoter occupancy seen in ChIP-seq (Figure 2 – supplement 1F).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In the manuscript by Su et al., the authors present a massively parallel reporter assay (MPRA) measuring the stability of in vitro transcribed mRNAs carrying wild-type or mutant 5' or 3' UTRs transfected into two different human cell lines. The goal presented at the beginning of the manuscript was to screen for effects of disease-associated point mutations on the stability of the reporter RNAs carrying partial human 5' or 3' UTRs. However, the majority of the manuscript is dedicated to identifying sequence components underlying the differential stability of reporter constructs. This shows that TA dinucleotides are the most predictive feature of RNA stability in both cell lines and both UTRs.

      The effect of AU rich elements (AREs) on RNA stability is well established in multiple systems, and the present study confirms this general trend but points out variability in the consequence of seemingly similar motifs on RNA stability. For example, the authors report that a long stretch of Us has extreme opposite effects on RNA stability depending on whether it is preceded by an A (strongly destabilizing) or followed by an A (strongly stabilizing). While the authors' interpretation of a context dependence of the effect is certainly well-founded, it seems counterintuitive that the preceding or following A would be the (only) determining factor. This points to a generally reductionist approach taken by the authors in the analysis of the data and in their attempt to dissect the contribution of "AU rich sequences" to RNA stability, with a general tendency to reduce the size and complexity of the features (e.g. to dinucleotides). While this certainly increases the statistical power of the analysis due to the number of occurrences of these motifs, it limits the interpretability of the results. How do TA dinucleotides per se contribute to destabilizing the RNA, both in 5' and 3' UTRs, but (according to limited data presented) not in coding sequences? What is the mechanism? RBPs binding to TA dinucleotide containing sequences are suggested to "mask" the destabilizing effect, thereby leading to a more stable RNA. Gain of TA dinucleotides is reported to have a destabilizing effect, but again no hypothesis is provided as to the underlying molecular mechanism.
      In addition to reducing the motif length to dinucleotides, the notion of "context dependence" is used in a very narrow sense; especially when focusing on simple and short motifs, a more extensive analysis of the interdependence of these features (beyond the existing analysis of the relationship between TA-diNTs and GC content) could potentially reveal more of the context dependence underlying the seemingly opposite behavior of very similar motifs.

      (We have used UA instead of TA, as per the reviewer's suggestion)

      The contribution of coding region sequence to RNA stability has been extensively discussed (For example: doi.org/10.1016/j.molcel.2022.03.032; doi.org/10.1186/s13059-020-02251-5; doi.org/10.15252/embr.201948220; doi.org/10.1371/journal.pone.0228730; doi.org/10.7554/eLife.45396). While UA content at the third codon position (wobble position) has been implicated as a pro-degradation signal, codon optimality has emerged as the most prominent determinant for RNA stability. This indicates that the role of coding regions in RNA stability differs from that of UTRs due to the involvement of translation elongation. We did not intend to suggest that UA-dinucleotides in UTRs and coding regions have the same effect. 

      To ensure the representativeness of the features entered into the LASSO model, we pre-selected those with an occurrence greater than 10% among all UTRs. As a result, while motifs with very low occurrences were excluded from the analysis, there is no evidence to indicate a preference for dinucleotides by the LASSO model.
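
      The pre-selection step described above can be sketched as follows. This is a minimal illustration under one assumption: that "occurrence" means the fraction of UTRs containing the feature at least once. The function names, the toy sequences, and the motif list are all hypothetical and are not taken from the study.

```python
def feature_occurrence(utrs, motifs):
    """Fraction of UTR sequences that contain each motif at least once."""
    n = len(utrs)
    return {m: sum(m in seq for seq in utrs) / n for m in motifs}


def preselect(utrs, motifs, min_fraction=0.10):
    """Keep motifs whose occurrence is strictly greater than min_fraction."""
    occurrence = feature_occurrence(utrs, motifs)
    return [m for m in motifs if occurrence[m] > min_fraction]


# Toy sequences for illustration only.
toy_utrs = [
    "UAUAGGCCGC",
    "AUUGAACCUU",
    "GGCCGGCCGG",
    "UUAUUUAGCA",
    "CCCCCCCCCC",
    "GGGGGGGGGG",
    "ACACACACAC",
    "UGUGUGUGUG",
    "UACGUACGUA",
    "GCGCGCGCGC",
]
candidate_motifs = ["UA", "GC", "AUUUA"]

# "AUUUA" appears in only 1 of 10 sequences (exactly 10%), so it is dropped.
selected = preselect(toy_utrs, candidate_motifs)
print(selected)
```

      Only the features that pass this filter would then be passed to the LASSO model, so rare motifs never compete with common ones in the fit.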

      We hypothesize that UA dinucleotides may recruit endonucleases of the RNase A family, whose catalytic pockets exhibit a strong bias for UA dinucleotides (doi.org/10.1016/j.febslet.2010.04.018). Structures or protein binding that block this recognition might stabilize RNAs. To gain further insight into motif interactions, we conducted a linear regression analysis of the interactions between UA and the other 15 dinucleotides. The formula used below includes UA:

      where all 𝛽 terms represent the regression coefficients; the predictors are the number of UA dinucleotides, the number of other dinucleotides (other than UA), and the GC content of the i<sup>th</sup> UTR; and 𝜖<sub>i</sub> denotes the error term. For each dinucleotide, we tested the significance of 𝛽<sub>UAxGC%</sub> and 𝛽<sub>UAxDiNT</sub>, and compared their p-values using a quantile-quantile (QQ) plot. Author response image 1 shows that the interaction effect of UA dinucleotides with GC% is much more significant than interactions with the other 15 dinucleotides, as indicated by the inflated QQ plot of p-values. This suggests that GC content is a more critical contextual factor influencing UA dinucleotides' impact on RNA stability.
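
      A model consistent with the terms defined above can be sketched as follows. The response symbol y<sub>i</sub> (the stability measure of the i<sup>th</sup> UTR), the intercept 𝛽<sub>0</sub>, the predictor symbols, and the inclusion of all three main effects are assumptions made for illustration; only the two interaction coefficients are named in the text.

```latex
y_i = \beta_0
    + \beta_{\mathrm{UA}}\,\mathrm{UA}_i
    + \beta_{\mathrm{DiNT}}\,\mathrm{DiNT}_i
    + \beta_{\mathrm{GC\%}}\,\mathrm{GC\%}_i
    + \beta_{\mathrm{UA}\times\mathrm{GC\%}}\,\mathrm{UA}_i \cdot \mathrm{GC\%}_i
    + \beta_{\mathrm{UA}\times\mathrm{DiNT}}\,\mathrm{UA}_i \cdot \mathrm{DiNT}_i
    + \epsilon_i
```

      Under this form, one regression would be fit per non-UA dinucleotide, giving 15 pairs of interaction p-values to compare.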

      Author response image 1.

      The present MPRAs measures the effect of UTR sequences in one specific reporter context and using one experimental approach (following the decay of in vitro transcribed and transfected RNAs). While this approach certainly has its merits compared to other approaches, it also comes with some caveats: RNA is delivered naked, without bound RBPs and no nuclear history, e.g. of splicing (no EJCs), editing and modifications. One way to assess the generalizability of the results as well as the context dependence of the effects is to perform the same analysis on existing datasets of RNA stability measurements obtained through other methods (e.g. transcription inhibition). Are TA dinucleotides universally the most predictive feature of RNA half-lives?

      Our system studies the stability control of RNA synthesized in vitro and delivered into human cells. While we did not intend to generalize our conclusions to endogenous RNAs, our approach contributes to the understanding of in vitro synthesized RNA used for cellular expression, such as in vaccines. It is known that endogenous RNAs undergo very different regulation. The most prominent factors controlling endogenous RNA stability are the density of splice junctions and the length of UTRs (doi.org/10.1186/s13059-022-02811-x; doi.org/10.1186/s12915-021-00949-x). To decipher the sequence regulation, these factors are controlled in our experiments. Therefore, we do not expect the dinucleotide features found by our approach to be generalized as the most predictive feature of RNA half-life in vivo. 

      The authors conclude their study with a meta-analysis of genes with increased TA dinucleotides in 5' and 3'UTRs, showing that specific functional groups are overrepresented among these genes. In addition, they provide evidence for an effect of disease-associated UTR mutations on endogenous RNA stability. While these elements link back to the original motivation of the study (screening for effects of point mutations in 5' and 3' UTRs), they provide only a limited amount of additional insights.

      We utilized the Taiwan Biobank to investigate whether mutations significantly affecting RNA stability also impact human biochemical measurements. Our findings indicate that these mutations indeed have a significant effect on various biochemical indices. This highlights the importance of our study, as it bridges basic science with potential applications in precision medicine. By linking specific UTR mutations with measurable changes in biochemical indices, our research underscores the potential for these findings to inform targeted medical interventions in the future.

      In summary, this manuscript presents an interesting addition to the long-standing attempts at dissecting the sequence basis of RNA stability in human cells. The analysis is in general very comprehensive and sound; however, at times the goal of the authors to find novelty and specificity in the data overshadows some analyses. One example is the case where the authors try to show that TA-dinucleotides and GC content are decoupled and not merely two sides of the same coin.

      They claim that the effect of TA dinucleotides is different between high- and low-GC content contexts but do not control for the fact that low GC-content regions naturally will contain more TA dinucleotides and therefore the effect sizes and the resulting correlation between TA-diNT rate and stability will be stronger (Fig. 5A). A more thorough analysis and greater caution in some of the claims could further improve the credibility of the conclusions.

      Low GC content implies a higher UA content but does not directly equate to a high UA-dinucleotide ratio. For instance, the sequence AUUGAACCUU has a lower GC content (0.3) compared to UAUAGGCCGC (0.6), yet it also has a lower UA-dinucleotide ratio (0 vs. 0.22). To address this concern more rigorously, we performed a stratified analysis based on UA-diNT rate. As shown in our Fig. S7C, even after stratifying by UA-dinucleotide ratio (upper panel: high UA-dinucleotide ratio; lower panel: low UA-dinucleotide ratio), we still observe that the destabilizing effect of UA is stronger in the low GC content group.
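The two example sequences can be checked with a short script. This is a minimal sketch, not the authors' code: the function names are hypothetical, and the UA-dinucleotide ratio is assumed to be the number of UA occurrences divided by the number of overlapping dinucleotide positions (sequence length minus one), which is the definition that reproduces the quoted value of 0.22 (2 of 9 positions).

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C bases in an RNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def ua_dinucleotide_ratio(seq: str) -> float:
    """Fraction of overlapping dinucleotide positions that read 'UA'.

    Assumption: the denominator is len(seq) - 1 (all dinucleotide positions).
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    positions = len(seq) - 1
    hits = sum(seq[i:i + 2] == "UA" for i in range(positions))
    return hits / positions


for s in ("AUUGAACCUU", "UAUAGGCCGC"):
    print(s, round(gc_content(s), 2), round(ua_dinucleotide_ratio(s), 2))
```

The first sequence has the lower GC content and also the lower UA-dinucleotide ratio, which is the point of the example: base composition alone does not fix the dinucleotide ratio.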

      Reviewer #2 (Public Review):

      Summary of goals:

      Untranslated regions are key cis-regulatory elements that control mRNA stability, translation, and translocation. Through interactions with small RNAs and RNA binding proteins, UTRs form complex transcriptional circuitry that allows cells to fine-tune gene expression. Functional annotation of UTR variants has been very limited, and improvements could offer insights into disease relevant regulatory mechanisms. The goals were to advance our understanding of the determinants of UTR regulatory elements and characterize the effects of a set of "disease-relevant" UTR variants.

      Strengths:

      The use of a massively parallel reporter assay allowed for analysis of a substantial set (6,555 pairs) of 5' and 3' UTR fragments compiled from known disease associated variants. Two cell types were used.

      The findings confirm previous work about the importance of AREs, which helps show validity and adds some detailed comparisons of specific AU-rich motif effects in these two cell types.

      Using a Lasso regression, TA-dinucleotide content is identified as a strong regulator of RNA stability in a context dependent manner based on GC content and presence of RNA binding protein binding motifs. The findings have potential importance, drawing attention to a UTR feature that is not well characterized.

      The use of complementary datasets, including from half-life analyses of RNAs and from random sequence library MPRAs, is a useful addition and supports several important findings. The finding that TA dinucleotides have explanatory power separate from (and in some cases interacting with) GC content is valuable.

      The functional enrichment analysis suggests some new ideas about how UTRs may contribute to regulation of certain classes of genes.

      Weaknesses:

      It is difficult to understand how the calculations for half-life were performed. The sequencing approach measures the relative frequency of each sequence at each time point (less stable sequences become relatively less frequent after time 0, whereas more stable sequences become relatively more frequent after time 0). Since there is no discussion of whether the abundance of the transfected RNA population is referenced to some external standard (e.g., housekeeping RNAs), it is not clear how absolute (rather than relative) half-lives were determined.

      We estimated decay constant λ and half-life (t<sub>1/2</sub>) by the following equations:

      where C<sub>i(t)</sub> and C<sub>i(t=0)</sub> are read count values of the ith replicate at time points 𝑡 and 0 (see also Methods). The absolute abundance was not required for the half-life calculation. 
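For orientation, a first-order decay model consistent with these definitions can be written as below. This is a sketch under the assumption of simple exponential decay; the exact form used in the manuscript's Methods may differ.

```latex
\ln\frac{C_{i}(t)}{C_{i}(t=0)} = -\lambda t,
\qquad
t_{1/2} = \frac{\ln 2}{\lambda}
```

In this form, λ depends only on the ratio of read counts at time t and time 0 for the same sequence.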

      Fig. S1A and B are used to assess reproducibility. They show that read counts at a given time point correlate well across replicate experiments. However, this is not a good way to assess how reproducible or accurate the measurements of t1/2 are. (The major source of variability in read counts in these plots - especially at early time points - is likely the starting abundance of each RNA sequence, not stability.) This creates concerns about how well the method is measuring t1/2. Also creating concern is the observation that many RNAs are associated with half-lives that are much longer than the time points analyzed in the study. For example, if we are reading Figure S1 and Table S1 correctly, the median t1/2 for the 5' UTR library in HEK cells appears to be >700 minutes. Given that RNA was collected at 30, 75, and 120 minutes, accurate measurements of RNAs with such long half-lives would seem to be very difficult.

      We estimated the half-life based on the following equations:

      where C<sub>i(t)</sub> and C<sub>i(t=0)</sub> are read count values of the ith replicate at time points 𝑡 and 0 (see also Methods). The calculation of the half-life involves first determining the decay constant 𝜆, which represents a constant rate of decay. Since 𝜆 is a constant, it is possible to accurately calculate it without needing data over the entire decay range. Our experimental design considers this by selecting appropriate time points to ensure a reliable estimation of 𝜆, and thus, the half-life. To determine the most suitable time points, we conducted preliminary experiments using RT-PCR.

      These experiments indicated that 30, 75, and 120 minutes provided an effective range for capturing the decay dynamics of the transcripts.
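The estimation described in this response can be sketched as a log-linear fit. The code below is illustrative only: `fit_decay` is a hypothetical helper, the counts are synthetic and noise-free, and a 0-minute time point is assumed alongside the 30, 75, and 120 minute samples. It shows that a constant λ can in principle be recovered from a window much shorter than the half-life; how precisely this works with noisy read counts is a separate question that the sketch does not address.

```python
import math


def fit_decay(times, counts):
    """Least-squares slope of ln(C(t)/C(0)) against t.

    Returns (decay_constant, half_life) in the time units of `times`.
    """
    c0 = counts[0]
    y = [math.log(c / c0) for c in counts]
    t_mean = sum(times) / len(times)
    y_mean = sum(y) / len(y)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(times, y))
    lam = -sxy / sxx  # decay constant is minus the slope
    return lam, math.log(2) / lam


# Synthetic, noise-free example with a 700-minute half-life (assumed value).
times = [0, 30, 75, 120]
true_lam = math.log(2) / 700.0
counts = [1000.0 * math.exp(-true_lam * t) for t in times]

lam, half_life = fit_decay(times, counts)
print(f"lambda = {lam:.6f} per min, half-life = {half_life:.1f} min")
```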

      There is no direct comparison of t1/2 between the two cell types studied for the full set of sequences studied. This would be helpful in understanding whether the regulatory effects of UTRs are generally similar across cell lines (as has been shown in some previous studies) or whether there are fundamental differences. The distribution of t1/2's is clearly quite different in the two cell lines, but it is important to know if this reflects generally slow RNA turnover in HEK cells or whether there are a large number of sequence-specific effects on stability between cell lines. A related issue is that it is not clear whether the relatively small number of significant variant effects detected in HEK cells versus SH-SY5Y cells is attributable to real biological differences between cell types or to technical issues (many fewer read counts and much longer half lives in HEK cells).

      For both cell lines, we selected oligonucleotides with R<sup>2</sup> > 0.5 and mean squared error (MSE) < 1 for analysis when estimating the decay constant (λ) by linear regression. This selection criterion was implemented to minimize the effect of experimental noise. After quality control, we selected common UTRs and compared the RNA half-lives of the two cell lines using a scatter plot. Author response image 2 shows that RNA half-lives are quite different between the cell lines, with a moderate similarity observed in the 5' UTRs (R = 0.21), while the correlation in the 3' UTRs is non-significant.
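The quality-control filter can be illustrated as follows. This is a minimal sketch with hypothetical function names and made-up log-count values; it assumes the MSE is the residual sum of squares divided by the number of time points, which the response does not specify.

```python
def regression_quality(x, y):
    """Fit y = a + b*x by ordinary least squares; return (r_squared, mse)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((a - x_mean) ** 2 for a in x)
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - y_mean) ** 2 for b in y)  # zero if y is constant
    return 1 - ss_res / ss_tot, ss_res / n


def passes_qc(x, y, min_r2=0.5, max_mse=1.0):
    """Keep a sequence only if R^2 > 0.5 and MSE < 1 (thresholds from the text)."""
    r2, mse = regression_quality(x, y)
    return r2 > min_r2 and mse < max_mse


times = [0, 30, 75, 120]
clean = [6.9, 6.6, 6.2, 5.7]   # ln(read count), steadily decaying
noisy = [6.9, 7.4, 6.1, 7.2]   # ln(read count), no consistent trend

print(passes_qc(times, clean), passes_qc(times, noisy))
```

In this toy case the noisy series is rejected by the R<sup>2</sup> threshold even though its MSE is below 1, so the two criteria are not redundant.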

      Author response image 2.

      Despite the low correlation of mRNA half-life between the two cell lines, UA-dinucleotide and UA-rich sequences consistently emerge as the most significant destabilizing features, suggesting a shared regulatory mechanism across diverse cellular environments.

      The general assertion is made in many places that TA dinucleotides are the most prominent destabilizing element in UTRs (e.g., in the title, the abstract, Fig. 4 legend, and on p. 12). This appears to be true for only one of the two cell lines tested based on Fig. 3.

      UA-dinucleotides and other UA-rich sequences exhibit similar effects on RNA stability, as illustrated in Fig. S5A-C. In both cell lines, UA-dinucleotide and WWWWWW sequences were representatives of the same stability-affecting cluster. While the impact of UA-dinucleotides can be generalized, we have rephrased some statements to avoid any potential misunderstanding. For example:

      Abstract: “...We found that UA dinucleotides and UA-rich motifs are the most prominent destabilizing element.”

      p.10: “UA dinucleotides and UA-rich motifs are the most common and effective RNA destabilizing factor” 

      Figure 4: “The UTR UA dinucleotides and UA-rich motifs are the most common and influential RNA destabilizing factor.”

      Appraisal and impact:

      The work adds to existing studies that previously identified sequence features, including AREs and other RNA binding protein motifs, that regulate stability and puts a new emphasis on the role of "TA" (better "UA") dinucleotides. It is not clear how potential problems with the RNA stability measurements discussed above might influence the overall conclusions, which may limit the impact unless these can be addressed.

      It is difficult to understand whether the importance of TA dinucleotides is best explained by their occurrence in a related set of longer RBP binding motifs (see Fig 5J, these motifs may be encompassed by the "WWWWWW cluster") or whether some other explanation applies. Further discussion of this would be helpful. Does the LASSO method tend to collapse a more diverse set of longer motifs that are each relatively rare compared to the dinucleotide? It remains unclear whether TA dinucleotides are associated with less stability independent of the presence of the known larger WWWWWW motif. As noted above, the importance of TA dinucleotides in the HEK experiments appears to be less than is implied in the text.

      To ensure the representativeness of the features entered into the LASSO model, we pre-selected those with an occurrence greater than 10% among all UTRs. There is no evidence to support a preference for dinucleotides by LASSO. To address whether the destabilizing effect of UA dinucleotides is part of the broader WWWWWW motif, we divided UA dinucleotides into two groups: those within the WWWWWW motif and those outside of it. Specifically, we divided UTRs into two categories: 'at least one UA within a WWWWWW motif' and 'no UA within a WWWWWW motif,' and visualized the results using a boxplot. As shown in Author response image 3, the destabilizing trend still remains for UA dinucleotides outside of the WWWWWW motif, although the effect appears to be more pronounced when UA is within the WWWWWW motif. This suggests that while UA dinucleotides have a destabilizing effect independently, their impact is amplified when they are part of the broader WWWWWW motif.
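The grouping used for this comparison can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: W is taken as the IUPAC code for A or U, a UA is counted as "within" a WWWWWW motif when both of its bases lie in a run of at least six consecutive A/U bases, and the function name is hypothetical.

```python
import re

# Maximal runs of six or more A/U bases; any UA inside such a run is
# covered by at least one six-base WWWWWW window.
W_RUN = re.compile(r"[AU]{6,}")


def has_ua_within_w6(seq: str) -> bool:
    """True if at least one UA dinucleotide lies inside a WWWWWW motif."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return any("UA" in run.group() for run in W_RUN.finditer(seq))


examples = {
    "GCAUUAUAGC": "UA inside a six-base A/U run",
    "GCUAGCAAAAAAGC": "UA present, but outside the A/U run",
    "GGUAUGG": "UA present, A/U run too short",
}
for seq, note in examples.items():
    group = "UA within WWWWWW" if has_ua_within_w6(seq) else "no UA within WWWWWW"
    print(seq, "->", group, f"({note})")
```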

      Author response image 3.

      The inclusion of more than a single cell type is an acknowledgement of the importance of evaluating cell type-specific effects. The work suggests a number of cell type-specific differences, but due to technical issues (especially with the HEK data, as outlined above) and the use of only two cell lines, it is difficult to understand cell type effects from the work.

      The inclusion of both 3' and 5' UTR sequences distinguishes this work from most prior studies in the field. Contrasting the effects of these regions on stability is of interest, although the role of these UTRs (especially the 5' UTR) in translational regulation is not assessed here.

      We examined the role of UTR and UTR variants in translation regulation using polysome profiling. By both univariate analysis and an elastic regression model, we identified motifs of short repeated sequences, including SRSF2 binding sites, as mutation hotspots that lead to aberrant translation. Furthermore, these polysome-shifting mutations had a considerable impact on RNA secondary structures, particularly in upstream AUG-containing 5’ UTRs. Integrating these features, our model achieved high accuracy (AUROC > 0.8) in predicting polysome-shifting mutations in the test dataset. Additionally, metagene analysis indicated that pathogenic variants were enriched at the upstream open reading frame (uORF) translation start site, suggesting changes in uORF usage underlie the translation deficiencies caused by these mutations. Illustrating this, we demonstrated that a pathogenic mutation in the IRF6 5’ UTR suppresses translation of the primary open reading frame by creating a uORF. Remarkably, site-directed ADAR editing of the mutant mRNA rescued this translation deficiency. Because the regulation of translation and stability does not converge, we illustrate these two mechanisms in two separate manuscripts (this one and doi.org/10.1101/2024.04.11.589132).

      Reviewer #3 (Public Review):

      Summary:

      In their manuscript titled "Multiplexed Assays of Human Disease‐relevant Mutations Reveal UTR Dinucleotide Composition as a Major Determinant of RNA Stability" the authors aim to investigate the effect of sequence variations in 3'UTR and 5'UTRs on the stability of mRNAs in two different human cell lines.

      To do so, the authors use a massively parallel reporter assay (MPRA). They transfect cells with a set of mRNA reporters that contain sequence variants in their 3' or 5' UTRs, which were previously reported in human diseases. They follow their clearance from cells over time relative to the matching non-variant sequence. To analyze their results, they define a set of factors (RBP and miRNA binding sites, sequence features, secondary structure etc.) and test their association with differences in mRNA stability. For features with a significant association, they use clustering to select a subset of factors for LASSO regression and identify factors that affect mRNA stability.

      They conclude that the TA dinucleotide content of UTRs is the strongest destabilizing sequence feature. Within that context, elevated GC content and protein binding can protect susceptible mRNAs from degradation. They also show that TA dinucleotide content of UTRs affects native mRNA stability, and that it is associated with specific functional groups. Finally, they link disease associated sequence variants with differences in mRNA stability of reporters.

      Strengths:

      This work introduces a different MPRA approach to analyze the effect of genetic variants. While previous works in tissue culture use DNA transfections that require normalization for transcription efficiency, here the mRNA is directly introduced into cells at fixed amounts, allowing a more direct view of the mRNA regulation.

      The authors also introduce a unique analysis approach, which takes into account multiple factors that might affect mRNA stability. This approach allows them to identify general sequence features that affect mRNA stability beyond specific genetic variants, and reach important insights on mRNA stability regulation. Indeed, while the conclusions on genetic variants identified in this work are interesting, the main strength of the work lies in the general effects of sequence features rather than in specific variants.

      The authors provide adequate support for their claims, and validate their analysis using both their reporter data and native genes. For the main feature identified, TA di-nucleotides, they perform follow-up experiments with modified reporters that further strengthen their claims, and also validate the effect on native cellular transcripts (beyond reporters), demonstrating its validity also within native scenarios.

      The work provides a broad analysis of mRNA stability, across two mRNA regulatory segments (3'UTR and 5'UTR) and is performed in two separate cell-types. Comparison between two different cell-types is adequate, and the results demonstrate, as expected, the dependence of mRNA stability on the cellular context. Analysis of 3'UTR and 5'UTR regulatory effects also shows interesting differences and similarities between these two regulatory regions.

      Weaknesses:

      (1) The authors fail to acknowledge several possible confounding factors of their MPRA approach in the discussion.

      First, while transfection of mRNA directly into cells avoids the need to normalize for differences in transcription, transfected naked mRNA molecules differ from native cellular mRNAs and could introduce biases due to differences in mRNA modifications, protein associations, etc., that may arise co-transcriptionally.

      Second, along those lines, the authors also use in-vitro polyadenylation. The length of the polyA tail of the transfected transcripts could potentially be very different than that of native mRNAs and also affect stability.

      The transcripts used in our study were polyadenylated in vitro with approximately 100 nucleotides (Fig. S1C), similar to the polyA tail lengths typically observed in vivo (dx.doi.org/10.1016/j.molcel.2014.02.007). Additionally, these transcripts were capped to emulate essential mRNA characteristics and to minimize immune responses in recipient cells. This design allows us to study RNA decay for in vitro-synthesized RNA delivered into human cells, akin to RNA vaccines, but it does not necessarily extend to endogenous RNAs. As mentioned, endogenous RNAs undergo nuclear processing and are decorated by numerous trans factors, resulting in distinct regulatory mechanisms. We therefore provided more discussion of these differences and their implications in the revised manuscript: “However, while our approach effectively assesses the stability of synthesized RNA in human cells, it may not fully capture the decay dynamics of nuclear-synthesized RNA, which can be influenced by endogenous modifications and trans-acting RNA binding factors.” (p. 18)

      (2) The analysis approach used in this work for identifying regulatory features in UTRs was not previously used. As such, the lack of in-depth details of the methodology, and possibly also of more general validation of the approach, is a drawback in convincing the reader of the validity of this approach and its results.

      In particular, a main point that is not addressed is how the authors decide on the set of "factors" used in their analysis, as choosing different sets of factors might affect the results of the analysis.

      In our study, we used the Variance Inflation Factor (VIF) as a basis for selecting variables. This well-established method is widely used to detect variables with high collinearity, thus ensuring the robustness and reliability of our analysis. By identifying and excluding highly collinear variables, we aimed to minimize multicollinearity and improve the accuracy of our regression models. For more detailed information on the use of VIF in regression analysis, please refer to Akinwande, M., Dikko, H., and Samson, A. (2015). Variance Inflation Factor: As a Condition for the Inclusion of Suppressor Variable(s) in Regression Analysis. Open Journal of Statistics, 5, 754-767. doi: 10.4236/ojs.2015.57075. We have included the method details in the revised manuscript (p. 28): “…to avoid multicollinearity caused by similar features that perturb feature selection, all features were clustered using single-linkage hierarchical clustering with the distance metric defined as one minus the absolute value of the Spearman correlation coefficient. We cut the tree at a specific height, and the feature that had the greatest influence on RNA stability, which was examined using a simple linear regression model, was selected to be the representative of each cluster. Then we calculated the variance inflation factor (VIF) value of the representative features. The VIFs were obtained by the following linear model and equations:

      where x̂<sub>ij</sub> and x<sub>ik</sub> are the estimated value of the jth feature and the value of the kth feature of the ith UTR (note that the kth feature is a feature other than the jth feature), β<sub>0</sub> and β<sub>k</sub> are the intercept and the regression coefficients of the linear model that regressed the jth feature on the other remaining features, and x̄<sub>j</sub> is the mean level of the jth feature of all UTRs.”
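For reference, the standard definition of the VIF that matches this description can be written as below. The symbol names are chosen here for illustration and are an assumption; the notation in the revised manuscript may differ.

```latex
\hat{x}_{ij} = \beta_{0} + \sum_{k \neq j} \beta_{k}\, x_{ik},
\qquad
R_{j}^{2} = 1 - \frac{\sum_{i} \left(x_{ij} - \hat{x}_{ij}\right)^{2}}{\sum_{i} \left(x_{ij} - \bar{x}_{j}\right)^{2}},
\qquad
\mathrm{VIF}_{j} = \frac{1}{1 - R_{j}^{2}}
```

A feature that is well predicted by the remaining features has R<sub>j</sub><sup>2</sup> close to 1 and therefore a large VIF, which is what flags it as collinear.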

      For example, the choice to use 7-mer sequences within the factors set is not explained, particularly when almost all motifs that are eventually identified (Figure 3B-E) are shorter.

      The known RBP motifs are primarily 6-mer. To explore the possibility of discovering novel motifs that could significantly impact our model, we started with 7-mer sequences. However, our analysis revealed that including these additional variables did not improve the explanatory power of the model; instead, it reduced it. Consequently, our final model focuses on motifs shorter than 7-mer. We explained the motif selections in the revised manuscript (p. 9): “Given our discovery that the effect of AREs is heavily dependent on sequence content, we decided to further explore the effects of other sequence elements, i.e., beyond known regulatory motifs, in more detail. Since most reported RBP motifs are 6-mers, we initiated a search for novel motifs by analyzing the presence of all 7-mers in our massively parallel reporter assay (MPRA) library, correlating their occurrence with mRNA half-life.”

      In addition, the authors do not perform validations to demonstrate the validity of their approach on simulated data or well-established control datasets. Such analysis would be helpful to further convince the reader of the usefulness and robustness of the analysis.

      We acknowledge the importance of validating our approach on simulated data or well-established control datasets to demonstrate its robustness and reliability. However, to the best of our knowledge, there are currently no well-established control datasets available that perfectly correspond to our specific study context. Despite this, we will continue to search for any relevant datasets that could be utilized for this purpose in future work. This effort will help to further reinforce the confidence in our methodology and its findings.

      (3) The analysis and regression models built in this work are not thoroughly investigated relative to native genes within cells. The effect of sequence "factors" on native cellular transcripts' stability is not investigated beyond TA di-nucleotides, and it is unclear to what degree other predicted factors also affect native transcripts.

      Our system studies the stability control of RNA synthesized in vitro and delivered into human cells. While we validated the UTR UA-dinucleotide effect in vivo, we did not intend to conclude that this is the most influential regulation for endogenous RNAs. It is known that endogenous RNAs undergo very different regulation. The most prominent factors controlling endogenous RNA stability are the density of splice junctions and the length of UTRs (doi.org/10.1186/s13059-022-02811-x; doi.org/10.1186/s12915-021-00949-x). To decipher the sequence regulation, we controlled for these factors in our experiments. Therefore, we acknowledge that several endogenous features, which were excluded by our approach, may serve as predictive features of RNA half-life in vivo. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific comments:  

      Some references are missing, e.g., for the sentence:

      Please see the response below.

      "Similarly, point mutation of the GFPT1 3' UTR results in congenital myasthenic syndrome." (p5)

      The reference has been added to the text:

      Dusl, M., Senderek, J., Muller, J. S., Vogel, J. G., Pertl, A., Stucka, R., Lochmuller, H., David, R., & Abicht, A. (2015). A 3'-UTR mutation creates a microRNA target site in the GFPT1 gene of patients with congenital myasthenic syndrome. Human Molecular Genetics, 24(12), 3418-3426. https://doi.org/10.1093/hmg/ddv090

      "...but there have been no systematic assessments of the explicit effects of variants of both UTRs on stability regulation." (not true in the current phrasing; e.g. PMIDs 32719458, 36156153, 34849835)

      These references have been added to the text. However, we have to point out that these studies do not focus on the effects of the disease-relevant variants. To clarify, we modified the sentence to "... systematic assessments of the explicit effects of disease-relevant variants in both UTRs on stability regulation are still absent."

      "Multiple approaches have revealed AREs as exerting a destabilizing effect on RNA stability (Barreau et al., 2005)." (p8)

      The reference has been added to the text:

      Barreau, C., Paillard, L., & Osborne, H. B. (2005). AU-rich elements and associated factors: are there unifying principles? Nucleic Acids Research, 33(22), 7138-7150. https://doi.org/10.1093/nar/gki1012 

      "This effect is specific, as such ratios in the coding region are inconsequential." (p12)

      This refers to our findings of Fig. 4G and Supplemental Fig. S5F.

      What are the sequences at the 5' and 3'UTR without insertion of a library? 5'UTR library (especially in SH) has much longer half-life compared to 3'utr library (Fig S1D).

      The 3’ UTR library has no designed 5’ UTR, only the Kozak sequence derived from the pEGFPC1 vector. This may partially underlie the shorter half-life of the 3’ UTR library.

      Fig2A: What are the units? "half-life (log)" Do the numbers correspond to log10(min)?

      It represents ln (min). To clarify, we now use ‘ln t<sub>1/2</sub> (min)’ in all figures.

      Fig 2 and 3: This was done only on the wild-type sequences? Or all tested sequences together, wt and mut?

      It was done only on the wild-type sequences. To clarify, we modified the text to “we examined the effect of AREs on RNA stability of the ref alleles according to specific sequence content….(p.8)” and “We considered as many factors as possible to explain the half-life of our ref UTR libraries,…. (p.9)”. ‘ref’ stands for reference.

      "Furthermore, to avoid collinearity confounding our model, e.g., the effects of very similar factors (such as 'AA' and 'AAA' sequences), we clustered the factors according to their properties, and then only one representative factor from within a cluster (i.e., the one with the highest correlation to half-life within a cluster) was subjected to LASSO regression": Given the observed context dependence, e.g. in the case of poly-U stretches: Isn't this clustering leading to similar/identical motifs with different context being grouped together (such as polyU preceded by an A (strongly destabilizing, according to Fig 2B) or followed by one (strongly stabilizing, according to Fig 2B)), resulting in ignoring the context or using one potential outcome while a motif from the same cluster can have the opposite effect?

      Thank you very much for pointing this out. To determine if considering different contextual effects within each feature cluster would enhance model performance, we modified our feature selection by choosing both the feature with the largest positive and the largest negative effect on RNA half-life in Step III of Figure 3A. We then split the data into a 2:1 training and testing set and repeated this process 100 times. Model performance was evaluated using mean absolute error (MAE), root mean squared error (RMSE), and adjusted R-squared. From Author response image 4, we observed no significant improvement in model performance using this new approach. Notably, in the SH-SY5Y 5' UTR model, our original method even outperformed the modified one, with statistically lower MAE and RMSE and a higher adjusted R-squared. Therefore, we believe our current approach remains appropriate.
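The three evaluation metrics and the 2:1 split can be sketched as follows. This is a minimal illustration with hypothetical helper names and made-up numbers; it assumes the usual definitions of the metrics and a random split, and it does not reproduce the authors' LASSO models.

```python
import math
import random


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def adjusted_r2(y_true, y_pred, n_features):
    """R-squared penalized for the number of predictors in the model."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)


def split_two_to_one(n_samples, rng):
    """Random 2:1 split of sample indices into (train, test)."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    cut = (2 * n_samples) // 3
    return idx[:cut], idx[cut:]


# Made-up half-lives (ln min) and predictions from a model with 2 features.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_pred = [1.5, 2.0, 2.5, 4.0, 5.5, 6.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), adjusted_r2(y_true, y_pred, 2))

train_idx, test_idx = split_two_to_one(9, random.Random(0))
print(len(train_idx), len(test_idx))
```

Repeating the split with different seeds (100 times in the response) and collecting the metrics on each held-out third gives the distributions that are compared between the two feature-selection schemes.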

      Author response image 4.

      "Overall, motifs that are at least two nucleotides long proved critical for RNA stability, supporting the sequence specificity of the decay process." Unclear why this supports the "sequence specificity"

      No monomers were selected as an explanatory factor. On the contrary, specific sequence combinations and order are important for the regulation. These findings suggest sequence-specific recognition for the decay process.

      Fig3: The same features were used in both cell lines? If yes: Since they were selected for their highest correlation with half-life, how was a common set chosen? If no: problematic to compare.

      Thank you for your question regarding feature selection across cell lines. Initially, the features were collected uniformly for both cell lines. However, subsequent feature selection steps were cell-type specific, focusing on identifying features with the greatest impact on RNA half-life in each context. This approach allows us to still compare model performance and discuss the similarities and differences in selected features across cell types. By maintaining a consistent starting point, we ensure that any observed differences reflect cell-specific regulatory dynamics.

      uORFs were not used as features?

      Thank you for pointing this out. At the beginning of our study, we investigated the impact of Kozak sequence strength (categorized as weak, moderate, strong, or optimal) on RNA half-life. However, we found that this feature performed poorly in predicting RNA stability, and as a result, we decided not to include upstream open reading frames (uORFs) or Kozak sequences in our subsequent analyses.

      Experimental reproducibility: Only correlations between replicates for the same time point is shown, but no comparison between time points or between decay rates. How reproducible were the paired differences between mut/wt?

      The decay rate was calculated by modeling the slope of a linear regression of all time points. Therefore, there is only one decay rate associated with a genotype. To rule out inconsistent data, we excluded any regression with a mean square error greater than 1, as this indicates a poor fit of the data points. 

      Fig 7C/p17: This does not establish a "causal relationship" as the authors claim.

      We agree with the reviewer’s suggestion. We have modified the text on p.17 to “to establish a correlation between UTR variants and health outcomes,…..”

      In the discussion, the authors claim that TA-diNTs are not only an opposite of the GC percentage and base this on Fig 5A.

      Fig 5A: The range of TA-diNTs is naturally much higher in the low GC group. To make the high and low GC content comparable (as the authors aim to do), the correlation should be assessed for the same range of TA dint in both cases.

      To address this concern more rigorously, we performed a stratified analysis based on UA-diNT rate. As shown in our Fig. S7C, even after stratifying by UA-dinucleotide ratio (upper panel: high UA-dinucleotide ratio; lower panel: low UA-dinucleotide ratio), we still observe that the destabilizing effect of UA is stronger in the low GC content group.

      Supplemental Figure S7. Interplay of GC content and TA dinucleotide on stability regulation, related to Figure 5. (C) Stratifications of both TA dinucleotide ratio and GC content showed that the destabilizing effect of TA dinucleotide is the most prominent under conditions of low TA dinucleotide ratio and low GC content. The same trend was observed for 5’ UTR (left) and 3’ UTR (right).

      The injection of in vitro transcribed and polyA/capped RNA certainly has advantages over other methods, but delivering naked mRNA without nuclear history might also lead to artifacts. The caveats of the approach should be discussed more extensively.

      We appreciate the suggestion and have hence added the following in the Discussion (p.18): “However, while our approach effectively assesses the stability of synthesized RNA in human cells, it may not fully capture the decay dynamics of nuclear-synthesized RNA, which can be influenced by endogenous modifications and trans-acting RNA binding factors.”

      "We unexpectedly identified many crucial regulatory features in 5' UTRs." Why was this unexpected?

      We initially thought the 3’ UTR would play a major role in stability regulation. To avoid confusion, we have removed the word ‘unexpected’ from the text (p. 20): "We identified many crucial regulatory features in 5' UTRs."

      "...a massively parallel reporter assay in which coding regions and human 5'/3' UTRs with disease-relevant mutations were generated in vitro and then directly transfected into human cell lines to assess their decay patterns by next‐generation sequencing": also coding regions?

      Thanks for the question. Indeed, the coding region was not synthesized together with the UTR library. Therefore, we modified the text of p. 6 to “…we developed a massively parallel reporter assay in which human 5’/3’ UTRs with disease-relevant mutations were generated in vitro, ligated with the enhanced green fluorescence protein (EGFP) coding region, and then directly transfected into human cell lines to assess their decay patterns by next-generation sequencing.”

      Reviewer #2 (Recommendations For The Authors):

      Nomenclature: When discussing RNA sequences, "U" should be used in place of "T" (e.g., "UA dinucleotide").

      We have replaced “T” with “U” in RNA sequences throughout the text and figures.

      Abstract: "We examined the RNA degradation patterns mediated by the UTR library in multiple cell lines" - It would be clearer to state that two cell lines (rather than multiple) were used.

      We appreciate the suggestion. We have modified the abstract as suggested: “We examined the RNA degradation patterns mediated by the UTR library in two cell lines…"

      The manuscript refers to "wild-type (WT) and mutant (mt) alleles." (p. 7 and elsewhere). It would be better to use "reference" instead of "wild type" given that these are human populations.

      We appreciate the suggestion. All instances of ‘wild-type’ or ‘WT’ in the text and figures have been replaced with ‘reference’ or ‘ref’.

      In the introduction, it is stated that traditional MPRAs "cannot differentiate the effect of the UTRs on transcription, stability and, in some cases, even protein production, greatly limiting scientific interpretation." This is confusing, since these assays can and have been used in association with both RNA decay measurements and measurements of reporter protein levels that allow assessment of effects on stability and protein production (including in the cited references).

      We reason that the RNA steady-state level (e.g., sequencing the overall RNA normalized to DNA) or protein steady-state level (e.g., detecting the fluorescence signal) does not precisely reveal the decay kinetics of the RNA. Steady-state level is a result of production and decay, both of which UTRs contribute to. Similarly, the protein level is not a perfect estimate of the RNA decay.

      To clarify, we have modified the introduction (p. 5) to “Nevertheless, because the steady-state level is a result of production and decay, these approaches cannot differentiate the effect of the UTRs on transcription, stability and, in some cases, even protein production, greatly limiting scientific interpretation.” 
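      The distinction between steady-state level and decay can be illustrated with a minimal sketch. It assumes simple first-order kinetics (constant synthesis rate, exponential decay), which is a textbook simplification and not a model taken from the manuscript; the rate constants and reporter names are hypothetical. Two reporters with identical steady-state levels can have very different half-lives.

```python
import math

def steady_state_level(k_syn, k_deg):
    """Steady-state RNA level under dR/dt = k_syn - k_deg * R (assumed model)."""
    return k_syn / k_deg

def half_life(k_deg):
    """Half-life implied by a first-order decay constant."""
    return math.log(2) / k_deg

# Hypothetical reporters: rates are per minute and chosen for illustration only.
stable = {"k_syn": 1.0, "k_deg": 0.01}
unstable = {"k_syn": 4.0, "k_deg": 0.04}

for name, r in (("stable", stable), ("unstable", unstable)):
    level = steady_state_level(r["k_syn"], r["k_deg"])
    print(f"{name}: steady state = {level:.1f}, half-life = {half_life(r['k_deg']):.1f} min")
```

      In this sketch, a steady-state readout alone would not separate the two reporters, whereas a decay time course would.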

      Adding raw and normalized read count data from individual experiments (e.g., to Table S1) would make it more likely for others to use this dataset to address additional questions.

      All raw and processed sequencing data generated in this study have been submitted to the NCBI Gene Expression Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE217518 (reviewer token snspaakujtsdpcv).

      The manuscript would benefit from further clarification about model selection. Additional details regarding how the features were clustered, and the actual clusters themselves should be included.

      It should be discussed why Lasso was chosen vs Ridge or Elastic Net, in the context of handling multicollinearity. Often, data is subsetted for training and validation, and model performance metrics are presented.

      Thank you for pointing out the need for further clarification on model selection. The features were clustered using single-linkage hierarchical clustering with the distance metric defined as one minus the absolute value of the Spearman correlation coefficient (this information has been added to the manuscript on p. 28: “…to avoid multicollinearity caused by similar features that perturb feature selection, all features were clustered using single-linkage hierarchical clustering with the distance metric defined as one minus the absolute value of the Spearman correlation coefficient.”). The resulting feature clusters are available in Supplemental Table S3. 
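      The clustering step described above can be sketched as follows. This is an illustrative standard-library implementation, not the authors' code: the feature names, the toy values and the cut height of 0.2 are assumptions (the response does not state where the dendrogram was cut). It relies on the fact that cutting a single-linkage dendrogram at height h gives the connected components of the graph joining features whose distance is at most h.

```python
from itertools import combinations

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5  # undefined for constant features

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def single_linkage_clusters(features, cut_height):
    """Group features whose distance 1 - |rho| chains together below cut_height."""
    parent = {name: name for name in features}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(features, 2):
        distance = 1 - abs(spearman(features[a], features[b]))
        if distance <= cut_height:
            parent[find(a)] = find(b)

    clusters = {}
    for name in features:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical feature values across five UTRs.
features = {
    "gc_content": [0.30, 0.45, 0.50, 0.62, 0.70],
    "at_content": [0.70, 0.55, 0.50, 0.38, 0.30],
    "ua_ratio": [0.05, 0.02, 0.08, 0.03, 0.06],
}
print(single_linkage_clusters(features, cut_height=0.2))
```

      Using the absolute value of the correlation means that strongly anti-correlated features (here the two composition measures) fall into the same cluster, which is the behaviour wanted when the goal is to remove redundancy before feature selection.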

      Regarding model selection, we chose LASSO over ridge and elastic net primarily for feature selection, as ridge does not perform feature selection. Elastic net is essentially a hybrid of ridge regression and LASSO regularization, but we opted for LASSO for its simplicity and effectiveness in selecting a sparse set of important features.
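      The feature-selection argument can be illustrated with the closed-form solutions that hold in the special case of an orthonormal design matrix. This is a textbook simplification and not the authors' fitting procedure; the coefficient values and the penalty are hypothetical, and the penalty parametrisation (L1 penalty lam, L2 penalty lam/2 on a half sum-of-squares loss) is an assumption.

```python
def lasso_shrink(ols_coef, lam):
    """Soft-thresholding: LASSO estimate for an orthonormal design."""
    magnitude = max(abs(ols_coef) - lam, 0.0)
    return magnitude if ols_coef >= 0 else -magnitude

def ridge_shrink(ols_coef, lam):
    """Proportional shrinkage: ridge estimate for an orthonormal design."""
    return ols_coef / (1.0 + lam)

# Hypothetical ordinary-least-squares coefficients for four features.
ols = [2.0, -0.3, 0.1, -1.5]
lam = 0.5

lasso = [lasso_shrink(c, lam) for c in ols]
ridge = [ridge_shrink(c, lam) for c in ols]

print("lasso nonzero:", sum(1 for c in lasso if c != 0))
print("ridge nonzero:", sum(1 for c in ridge if c != 0))
```

      In this setting LASSO sets small coefficients exactly to zero, whereas ridge only scales every coefficient towards zero, which is why ridge alone does not yield a sparse feature set.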

      We also performed a 2:1 training and testing set analysis and have included these details in the manuscript. Model performance metrics, including correlation coefficient between observed and predicted values in the testing set, mean absolute error (MAE), root mean squared error (RMSE), mean absolute percentage error (MAPE), and R-squared, are provided in new Supplemental Table S4.
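      For reference, the listed metrics can be computed as in the sketch below. It uses common definitions, which may differ in detail from those used for Supplemental Table S4 (for example, R-squared is taken here as 1 - SS_res/SS_tot rather than a squared correlation); the function names, the random seed and the toy values are assumptions.

```python
import math
import random

def split_two_to_one(items, seed=0):
    """Shuffle and split into training (2/3) and testing (1/3) subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]

def regression_metrics(observed, predicted):
    """MAE, RMSE, MAPE (in percent) and R-squared on a held-out set."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        # MAPE is undefined if any observed value is exactly zero.
        "mape": 100 * sum(abs(e / o) for e, o in zip(errors, observed)) / n,
        "r2": 1 - ss_res / ss_tot,
    }

# Hypothetical observed and predicted ln(half-life) values.
metrics = regression_metrics([4.0, 5.0, 6.0], [4.5, 5.0, 5.5])
print(metrics)
```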

      Recommend reviewing and correcting verb tenses in the methods section.

      We appreciate the reviewer’s suggestion. We have corrected verb tenses in the methods section, which includes “The UTRs were defined by NCBI RefSeq and ENCODE V27. (p.21)”, “The variant was placed in the middle of the sequence….(p.22)”, and “eCLIP signals with value < 1 or p value > 0.05 were removed. (p.26)”

      Please add information about which cell type(s) are being used in each of the figure legends (e.g., in Figs. 2B and 5).

      We appreciate the reviewer’s suggestion. We have added the cell type information in the figure legends: “Figure 2…. (B) The ten most influential AREs in terms of RNA stability in SH-SY5Y cells.” and “Figure 5…..(A) MPRA data of SH-SY5Y cells stratified according to the GC content (GC%) of UTRs.”

      Recommend review of axis labels and consistency in formatting the log(half-lives) and including the base of the log and the time unit (minutes). Even better, converting axis labels from log minutes to minutes would make this easier to understand.

      Thank you for the suggestion regarding axis labels and consistency. We have unified the half-life label to ‘ln t<sub>1/2</sub> (min)’ in all figures. We chose not to convert the axis from logarithmic minutes to minutes because the original scale is highly skewed, which would hinder clear data visualization.

      The discussion refers to Figure 1D but Figure 1 only has A-C

      Thank you for pointing out this mistake. ‘Fig. 1D’ has been changed to ‘Fig. 1B’ in the text (p. 7 and p. 20).

      The analyses in Fig. 2 are interpreted as demonstrating that AREs destabilize RNAs. These analyses are examining associations, so it would be more appropriate to say that AREs are associated with destabilization (since it is formally possible that other sequences that are present in these UTR fragments cause destabilization). A similar issue arises on p. 10: "TA dinucleotides alone can negatively regulate RNA stability, with a Pearson's correlation coefficient of ‐0.287 for 5' UTRs and ‐0.377 for 3' UTRs (Fig. 4A,C)." This is an association and does not establish causation. Again on p. 17: "We identified several SNPs in UTRs that induce aberrant RNA expression and/or protein expression (Supplemental Table S7)." These may be causal but may simply be in LD with other variants that are causal.

      We agree that the association observed is not proven to be causal. Therefore, we modified the text as suggested: 

      “AUUUA/AUUA-containing AREs are associated with RNA destabilization.” (p. 8)

      “UA dinucleotides alone present a negative correlation with RNA stability, with a Pearson’s correlation coefficient of -0.287 for 5’ UTRs and -0.377 for 3’ UTRs.”  (p.10)

      “We identified several SNPs in UTRs that correlated with aberrant RNA expression and/or protein expression.”  (p. 17)

      Figure 4C is important in that it examines whether variant sequences that differ in a manner that changes the number of dinucleotide repeats affect stability. Please show the number (not just the percentage) of sequences in each category.

      Thank you for your insightful comment. We believe the figure you referred to is Figure 4E. We have updated the figure to include the number of sequences in each category.

      Figure 6A and B: The horizontal axes appear to be misaligned since the dotted vertical lines do not cross at 0.

      The dotted vertical lines represent the genomic background of the UA-diNT ratio. To clarify this, we have modified the legend to: “Figure 6……(A) The top ten biological processes for which the 5’ UTR UA-dinucleotide ratio most significantly deviated from the genomic background (dashed line).”

      It may be helpful to state what the dashed and solid lines represent on Figure 6 E/F. Please correct spelling of "Biological" in 6E.

      As per the reviewer’s suggestions, we have modified the legend of Figure 6 to: “………..(E) Biological processes for RNAs in which the UA-dinucleotide ratios of both 5’ and 3’ UTRs are significantly different from the genomic background (dashed lines). (F) Molecular functions for RNAs in which the UA-dinucleotide ratios of both 5’ and 3’ UTRs are significantly different from the genomic background (dashed lines). The thin solid lines represent the standard deviation of the UA-dinucleotide ratio within the gene group.”

      In addition, the spelling of “Biological” in Fig. 6E has been corrected.

      Reviewer #3 (Recommendations For The Authors):

      I have 3 points that I think could improve science and its presentation within the manuscript.

      (1) Most importantly, how well do LASSO regression models predict the stability of native transcripts? Such analysis can also be useful for comparison between two different cell-types. How well does the regression model learned (on reporters) within one cell-type predict mRNA stability (of reporters and native genes) in this cell-type and in the other cell-type? Similarly, models can also help to analyze the effects of 5'UTR and 3'UTR sequences on mRNA stability. In particular, how well is the regression model of each separate regulatory sequence (3'UTR or 5'UTR) able to predict the stability of native genes in the cell? Can the predictions be improved by combining both 3'UTR and 5'UTR sequence features within the regression models?

      The decay model for native transcripts has been established in prior research (doi.org/10.1186/s13059-022-02811-x; doi.org/10.1186/s12915-021-00949-x), which indicates that exon junction density and transcript length are the primary determinants of RNA stability. Based on these findings, we designed the MPRA with fixed length and without splicing to focus on the contribution of primary sequences. We validated the destabilizing effect of UA dinucleotide on endogenous RNAs (Fig. 4G and Supplemental Fig. S5F) but do not recommend using our model to fully explain or predict the stability of native transcripts.

      To assess the model's cross-cell type predictive performance for RNA half-life, we employed the Regression Error Characteristic (REC) curve (Bi & Bennett, 2003). Similar to the receiver operating characteristic (ROC) curve, the REC curve illustrates the trade-off between error tolerance and accuracy, with better performance indicated by curves trending toward the upper left. We also computed the Area Over the Curve (AOC) as a performance metric, where lower values indicate better predictive ability. From Author response image 5, the REC curves reveal that cross-cell type prediction performance is suboptimal. The y-axis represents prediction accuracy, while the x-axis denotes error tolerance for the natural logarithm of RNA half-life (ln(𝑡<sub>1/2</sub>), in minutes).

      Author response image 5.
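      The REC curve and AOC described above can be sketched as follows. This is a minimal illustration and not the code used to produce Author response image 5: it assumes absolute error as the loss, treats the empirical REC curve as a step function, and uses hypothetical values. Under these assumptions the AOC equals the mean absolute error.

```python
def rec_curve(observed, predicted):
    """Return (error tolerance, accuracy) points of the empirical REC curve.

    Accuracy at tolerance eps is the fraction of samples with |error| <= eps.
    """
    abs_errors = sorted(abs(p - o) for o, p in zip(observed, predicted))
    n = len(abs_errors)
    points = [(0.0, 0.0)]
    for i, err in enumerate(abs_errors, start=1):
        points.append((err, i / n))
    return points

def area_over_curve(points):
    """Area between the step-shaped REC curve and accuracy = 1."""
    area = 0.0
    for (x0, acc0), (x1, _) in zip(points, points[1:]):
        area += (x1 - x0) * (1.0 - acc0)
    return area

# Hypothetical observed and predicted ln(half-life) values.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.0, 6.0]

points = rec_curve(observed, predicted)
print(points)
print("AOC:", area_over_curve(points))
```

      A model whose errors are mostly small reaches high accuracy at low tolerance, so its curve rises quickly towards the upper left and its AOC is small.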

      In response to the suggestion of combining 5' and 3' UTR sequence features in the regression model, we believe this approach may not be ideal. As shown in Figure S1D, the distribution of RNA half-lives between 5' and 3' UTRs is significantly different, reflecting their distinct regulatory roles. Additionally, the base composition differs, with 5' UTRs having a higher GC content compared to 3' UTRs. Combining these datasets would likely make the origin of the sequence (5' or 3' UTR) the most predictive feature, thereby reducing the model's interpretability. Furthermore, our MPRA results, derived from separate 5’ and 3’ UTR libraries, do not support a combined model, further suggesting this approach may not be suitable with our data.

      The conclusions regarding genetic variants are interesting, yet the main strength of the work involves identifying general sequence features that affect mRNA stability rather than specific variants. I wonder if the authors have considered to shift the focus of the motivation part to reflect that?

      We appreciate the reviewer’s suggestion. We have revised the abstract and introduction to emphasize general UTR regulation. Here is the revised abstract:

      UTRs contain crucial regulatory elements for RNA stability, translation and localization, so their integrity is indispensable for gene expression. Approximately 3.7% of genetic variants associated with diseases occur in UTRs, yet a comprehensive understanding of UTR variant functions remains limited due to inefficient experimental and computational assessment methods. To systematically evaluate the effects of UTR variants on RNA stability, we established a massively parallel reporter assay on 6,555 UTR variants reported in human disease databases. We examined the RNA degradation patterns mediated by the UTR library in two cell lines, and then applied LASSO regression to model the influential regulators of RNA stability. We found that UA dinucleotides and UA-rich motifs are the most prominent destabilizing element. Gain of UA dinucleotide outlined mutant UTRs with reduced stability. Studies on endogenous transcripts indicate that high UA-dinucleotide ratios in UTRs promote RNA degradation. Conversely, elevated GC content and protein binding on UA dinucleotides protect high-UA RNA from degradation. Further analysis reveals polarized roles of UA-dinucleotide-binding proteins in RNA protection and degradation. Furthermore, the UA-dinucleotide ratio of both UTRs is a common characteristic of genes in innate immune response pathways, implying a coordinated stability regulation through UTRs at the transcriptomic level. We also demonstrate that stability-altering UTRs are associated with changes in biobank-based health indices, underscoring the importance of precise UTR regulation for wellness. Our study highlights the importance of RNA stability regulation through UTR primary sequences, paving the way for further exploration of their implications in gene networks and precision medicine.

      Plots presenting correlations (e.g., Figure 4A, 4C) are more informative when plotted as density plots (i.e., using colorscale to show density of the dots at each part of the plot).

      We greatly appreciate the reviewer's insightful suggestion regarding the use of density plots for presenting correlations. We have modified Figures 4A and 4C in the revised manuscript to implement density plotting. The updated figures now utilize a colorscale that highlights areas of high and low data density.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, Marocco and colleagues perform a deep characterization of the complex molecular mechanism guiding the recognition of a particular CELLmotif previously identified in hepatocytes in another publication. Having miR-155-3p with or without this CELLmotif as the initial focus, the authors identify 21 proteins differentially binding to these two miRNA versions. From there, they decided to focus on PCBP2. They elegantly demonstrate PCBP2 binding to the miR-155-3p WT version but not to the CELLmotif-mutated version. miR-155-3p contains a hEXOmotif identified in a different report, whose recognition is largely mediated by another RNA-binding protein called SYNCRIP. Interestingly, mutation of the hEXOmotif contained in miR-155-3p did not only blunt SYNCRIP binding but also PCBP2 binding despite the maintenance of the CELLmotif. This indicates that somehow SYNCRIP binding is a pre-requisite for PCBP2 binding. EMSA assay confirms that SYNCRIP is necessary for PCBP2 binding to miR-155-3p, while PCBP2 is not needed for SYNCRIP binding. The authors aim to extend these findings to other miRNAs containing both motifs. For that, they perform a small-RNA-Seq of EVs released from PCBP2-knockdown cells versus control cells, identifying a subset of miRNAs whose expression either increases or decreases. The assumption is that those miRNAs containing PCBP2-binding CELLmotif should now be less retained in the cell and go more to extracellular vesicles, thus reflecting a higher EV expression. The specific subset of miRNAs having both the CELLmotif and hEXOmotif (9 miRNAs) whose expressions increase in EVs due to PCBP2 reduction is also affected by knocking-down SYNCRIP in the sense that reduction of SYNCRIP leads to lower EV sorting. Further experiments confirm that PCBP2 and SYNCRIP bind to these 9 miRNAs and that knocking down SYNCRIP impairs their EV sorting.

      We thank this Reviewer for the time spent on our manuscript and for having appreciated our characterization of the present molecular mechanism controlling miRNA export/cell-retention in hepatocytes.

      While the process studied in this work is novel and interesting, there are several aspects of this manuscript that should be improved:

      (1) First of all, the nature of the CELLmotif and the hEXOmotif they are studying is extremely confusing. For the CELLmotif, the authors seem to focus on the Core CELLmotif AUU A/G in some experiments and the extended 7-nucleotide version in others. The fact that these CELLmotif and hEXOmotif are not shown anywhere in the figures (I mean with the full nucleotide variability described in the original publications) but only referred to in the text further complicates the identification of the motifs and the understanding of the experiments. Moreover, I am not convinced that the sequences they highlight in grey correspond to the original CELLmotif in all cases. For instance, in the miR-155-3p sequence, GCAUU is highlighted in grey. However, the original CELLmotif is basically 7-nucleotide long: C, A/U, G/A/C, U, U/A, C/G/A, A/U/C or CAGUUCA in its more abundant version. I can only see clearly the presence of the Core CELLmotif AUUA in miR-155-3p; however, the last A is not highlighted in grey. It is true that there is some nucleotide variability in each position in the originally reported CELLmotif by the authors in ref. 5 and the hEXOmotifs in ref. 7; however, not all nucleotides are equally likely to be found in each position. This fact seems not to be taken into account by the authors, as they took basically any sequence of any length and almost any sequence combination as a valid CELLmotif. This means that I cannot identify the CELLmotif in many cases among the ones they highlight in grey. Instead, they should really focus on the most predominant CELLmotif sequence or, instead, take a reduced subset of "more abundant" CELLmotif versions from the ones that could fit in the originally described CELLmotif. Altogether, the authors need to explain much better what they have considered as the CELLmotif, what is the Core CELLmotif and what is hEXOmotif in each case and restrict to the most likely versions of the CELLmotif and hEXOmotif.

      We thank the Reviewer for raising this concern. We agree with her/him and have therefore modified the text and the figure accordingly.

      In brief, as now stated, with respect to the CELL motif, miR-155-3p, miR-155-5p, miR-181d-5p, miR-3084-5p, miR-122b-3p, miR-192-5p, miR-26b-3p, miR-31-3p, miR-195a-5p and miR-421-3p have the Core CELL motif (AUUA/G) described by Garcia-Martin and colleagues (ref. 5).

      Other miRNAs (miR-345-3p, miR-23a-5p and miR-214-3p) share the described CELL motif (ref.5) with the most frequent nucleotides, considering also the reported variability. Also for the hEXO motif described by Santangelo and collaborators (ref.7), the most frequent nucleotides defining the motif sequence have been taken into consideration. The motifs have been better highlighted in the new version of Fig. 1 panel C.

      (2) Validation of EV isolation method: first, a large part of Supplementary Figure 2 is not readable. EV markers seem to be enriched in EV isolates; however, more EV and cell markers should be assayed to fulfill MISEV guidelines.

      We apologize for the low quality of the figure. To address this issue, we replaced Supplementary Figure 2 panel A (now panel B) and added further EV markers (TSG101, Alix, Flotillin) in Supplementary Figure 2 panel B (now panel A). Notably, in the same Western blot analysis we also assessed the expression of SYNCRIP and PCBP2 (which were found in both the cellular and EV compartments, or only in the intracellular compartment, respectively).

      (3) A key variable is missing in Supplementary Figure 2, which is whether PCBP2 or SYNCRIP knockdowns impair EV secretion rates. A quantification of the number of vesicles released per cell upon knocking down each of these factors would be essential to rule out that any of the effects seen throughout the paper are due to reduced or enhanced EV production rather than miRNA sorting/retention.

      We addressed this issue by quantifying the number of EVs per cell in shPCBP2 or shSYNCRIP cells relative to the shCTR condition. Data are shown in the new Supplementary Figure 2 panel C and indicate that there are no significant differences in EV production rate upon PCBP2 or SYNCRIP knockdown.

      (4) The EMSA experiment is important to support their claims. Given the weak bands that are shown, the authors need to show all their replicates to convince the readers that it is reproducible.

      We are aware that the signals appear faint; the experimental replicates showing the robustness of the observation are reported below. 

      Author response image 1.

      (5) Although the bindings of SYNCRIP and PCBP2 to miR-155-3p and other miRNAs having both hEXOmotif and CELLmotif seem clear, the need for SYNCRIP binding to allow for PCBP2-mediated cellular retention is counterintuitive. What happens to those miRNAs that only contain a CELLmotif in terms of cellular retention and SYNCRIP dependence for cellular retention? In this regard, a representative miRNA (miR-31-3p) is analyzed in several experiments, showing that PCBP2 does not bind to it unless a hEXOmotif is introduced (Figure 3). However, this type of experiment should definitely be extended to other miRNAs containing only CELLmotif without hEXOmotif.

      Based on the Reviewer’s suggestion, we confirmed previous findings by extending the observation to two further miRNAs containing only the CELL motif (miR-195a-5p and miR-421-3p), whose sequences are reported in Figure 4C. qPCR data are reported in Figure 4D, Figure 5 panels A-B, Figure 6 and new Supplementary Figure 3. They confirm that miRNAs containing only the CELL motif are not exported into EVs in a SYNCRIP-dependent manner and are cell-retained independently of PCBP2 silencing.

      (6) Along the same line, I am missing another important experiment: the artificial incorporation of CELLmotif. For example, miR-365-2-5p lacks a CELLmotif but has a hEXOmotif. Does PCBP2 bind to this miRNA upon incorporation of CELLmotif? Does this lead now to enhanced cellular retention of this miRNA?

      We are grateful for the Reviewer's concern. As suggested, we added RNA pull-down experiments for miR-365-2-5p in wild type form and in mutated form (with the inclusion of CELL motif). As reported in the new Figure 1 panel E, the addition of the CELL motif maintains SYNCRIP binding and allows PCBP2 interaction with this miRNA. 

      (7) What would be the net effect of knocking down both SYNCRIP and PCBP2 at the same time? Would this neutralize each other's effect or would the lack of one impose on the other? That could help in understanding the complex interplay between these two factors for mediating cellular retention and EV sorting.

      SYNCRIP and PCBP2 play opposite roles in the dynamics of miRNA retention/export. SYNCRIP is involved in the loading of miRNAs into EVs through the recognition of the hEXO motif. Instead, PCBP2 is involved in cellular retention of miRNAs, acting as a negative regulator of SYNCRIP activity. PCBP2 binding and function require both the CELL motif and SYNCRIP binding in order to negatively regulate miRNA export into EVs.

      Since SYNCRIP silencing is sufficient to cause miRNA retention (as shown in Supplementary Figure 3), we believe that simultaneous silencing of PCBP2 would not reveal any additional aspect of cellular retention and EV sorting dynamics.

      (8) The authors have here a great opportunity to shed some light on an unclear aspect of miRNA EV sorting and cellular retention: whether the RBPs go together with the miRNA to the EVs or not. While the original paper describing hEXOmotif found SYNCRIP in EVs, another publication (Jeppesen et al, Cell 2019; PMID: 30951670) later found this RBP being very scarce in small EVs compared to cellular bodies or large EVs (Supplementary Tables 3 and 4 in that publication). Can the authors find SYNCRIP and PCBP2 in the EVs? Another important question would be the colocalization of these RBPs in the place where the miRNA selection is supposed to take place: in multivesicular bodies (MVB). Is there a colocalization of these RBPs with MVBs in the cell?

      We are thankful for the Reviewer’s suggestions. As reported in Supplementary Figure 2A, SYNCRIP is present in both the intracellular and EV compartments, whereas PCBP2 is detectable only in the intracellular one.

      (9) In Figure 4C, the authors state in the text that CELLmotif and hEXOmotif are present in extra-seed region; however, for miR-181d-5p and miR-122-3p this is not true as their CELLmotifs fall within the seed sequence.

      We apologize for our mistake. While the hEXO motif is confirmed to be present in the extra-seed region of all analyzed miRNAs (as in ref. 7), the CELL motif of the cited miRNAs overlaps with the seed sequence. We modified the text accordingly.

      (10) The authors need to describe how they calculate the EV/cell ratio in gene expression in some experiments (for instance, Figures 1H, 4D, etc). Did they use any housekeeping gene for EV RNA content, the same RNA load, or some other alternative method to normalize EV vs cell RNA content?

      We apologize for not clearly explaining the calculation of the EV/cell ratio in the cited figures. Data are shown as the ratio of miRNA expression in EVs to that in the intracellular compartment. miRNA expression in both compartments is normalized to the spike-in sequence (cel-miR-39-3p), which is included in each miRNA sample (EV and intracellular). This is now clarified in the Materials and Methods section.
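      One way such a spike-in-normalized EV/cell ratio could be computed from qPCR cycle-threshold (Ct) values is sketched below. The use of the 2^-ΔCt transformation with an assumed amplification efficiency of 100% is an assumption made for illustration; the response does not specify the formula, and the function names and Ct values are hypothetical.

```python
def relative_expression(ct_mirna, ct_spike_in):
    """Expression relative to the spike-in, assuming 2^-dCt (100% efficiency)."""
    return 2.0 ** -(ct_mirna - ct_spike_in)

def ev_to_cell_ratio(ev_ct, ev_spike_ct, cell_ct, cell_spike_ct):
    """Ratio of spike-in-normalized expression in EVs to that in cells."""
    ev_level = relative_expression(ev_ct, ev_spike_ct)
    cell_level = relative_expression(cell_ct, cell_spike_ct)
    return ev_level / cell_level

# Hypothetical Ct values for one miRNA and the cel-miR-39-3p spike-in.
ratio = ev_to_cell_ratio(ev_ct=28.0, ev_spike_ct=20.0, cell_ct=24.0, cell_spike_ct=20.0)
print(ratio)
```

      Because each compartment is normalized to the same exogenous spike-in, the ratio does not depend on a housekeeping gene, whose abundance may differ between EVs and cells.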

      (11) I would suggest that the authors speculate a bit in the discussion section on how the interaction between PCBP2 and SYNCRIP takes place. Do they contain any potential interacting domain? The binding of one to the miRNA would impose a topological interference on the binding of the other?

      We now speculate on the interaction between PCBP2 and SYNCRIP in the discussion section. Briefly, we describe that interactions of PCBP2 with several proteins have been reported (as in PMID 19881509 and 10772858), indicating the C-terminal domain, including the KH1 and KH2 regions, as the domains with the highest propensity for interaction with proteins. In the case of SYNCRIP too, the domains of interaction with proteins have been reported (as in PMID 10734137, 29483512 and 16765914), and we hypothesize that these domains represent conserved regions that are also responsible for its interaction with PCBP2. Moreover, we discuss that, upon the interaction between SYNCRIP and the miRNAs, a topological switch can occur, affecting the affinity of PCBP2 for the same miRNAs.

      Reviewer #2 (Public review):

      Summary:

      The author of this manuscript aimed to uncover the mechanisms behind miRNA retention within cells. They identified PCBP2 as a crucial factor in this process, revealing a novel role for RNA-binding proteins. Additionally, the study discovered that SYNCRIP is essential for PCBP2's function, demonstrating the cooperative interaction between these two proteins. This research not only sheds light on the intricate dynamics of miRNA retention but also emphasizes the importance of protein interactions in regulating miRNA behavior within cells.

      We thank this Reviewer for having appreciated our characterization of the molecular dynamics governing miRNA export/cell-retention in hepatocytes.

      Strengths:

      This paper makes important progress in understanding how miRNAs are kept inside cells. It identifies PCBP2 as a key player in this process, showing a new role for proteins that bind RNA. The study also finds that SYNCRIP is needed for PCBP2 to work, highlighting how these proteins work together. These discoveries not only improve our knowledge of miRNA behavior but also suggest new ways to develop treatments by controlling miRNA locations to influence cell communication in diseases. The use of liver cell models and thorough experiments ensures the results are reliable and show their potential for RNA-based therapies

      Weaknesses:

      Despite its strengths, the manuscript has several notable limitations. The study's exclusive focus on hepatocytes limits the applicability of the findings to other cell types and physiological contexts. While the interaction between PCBP2 and SYNCRIP is well-characterized, the manuscript lacks detailed insights into the structural basis of this interaction and the dynamic regulation of their binding. The generalization of the findings to a broader spectrum of miRNAs and RNA-binding proteins (RBPs) remains underexplored, leaving gaps in understanding the full scope of miRNA compartmentalization.

      Furthermore, the therapeutic implications of these findings, though promising, are not directly connected to specific disease models or clinical scenarios, reducing their immediate translational impact. The manuscript would also benefit from a deeper discussion of potential upstream regulators of PCBP2 and SYNCRIP and the influence of cellular or environmental factors on their activity. Additionally, it is important to note that SYNCRIP has already been recognized as a major regulator of miRNA loading in extracellular vesicles (EVs). However, the purity of EVs is a concern, as the author only performed crude extraction methods without further purification using an iodixanol density gradient. The study also lacks in vivo evidence of PCBP2's role in exosomal miRNA export.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Improve figure quality in some cases (Figures 1A, 4B, Supplementary Figure 2).

      Figures have been improved accordingly. 

      Reviewer #2 (Recommendations for the authors):

      Questions for the Authors:

      (1) Why was hepatocyte-specific data prioritized, and how generalizable are the findings to other cell types?

      This work is based on our previous publication (Santangelo et al., Cell Reports), concerning the identification of the RBP SYNCRIP as an actor in the loading machinery of miRNAs in Extracellular Vesicles in hepatocytes. Since both SYNCRIP and PCBP2 are expressed in different cell types (Keerthikumar et al., 2016, PMID: 26434508; https://www.bgee.org/gene/ENSMUSG00000056851), it is conceivable that our findings also apply to other cellular systems. Formally proving this hypothesis is beyond the scope of this manuscript.

      (2) Can the authors elaborate on the functional impact of PCBP2-mediated miRNA retention? Which biological pathways are directly influenced by miRNAs retained by PCBP2?

      We appreciate the suggestion; in line with this comment, we performed a Gene Ontology enrichment analysis of the targets of the retained miRNAs. To be as exhaustive as possible, we included both validated and predicted targets, obtained from the TarBase v9.0 database and the DIANA-microT web server, respectively. As reported in the new figure 4, panel E, and in the new supplementary figure 4, the analysis highlighted several biological pathways collectively influenced by the PCBP2-dependent cell-retained miRNAs, including establishment of organelle localization, regulation of cell cycle and lymphocyte differentiation.

      (3) What criteria were used to select the miRNAs (e.g., miR-155-3p) for this study?

      miR-155-3p was selected as the initial bait for RNA pull-down based on the reported presence of the Core CELL motif in the AML12 cell line (PMID: 34937935).

      (4) How do the results using recombinant PCBP2 in RNA pull-down assays compare with those using native PCBP2 in cellular extracts?

      The RNA pull-down with recombinant PCBP2 confirms the evidence obtained by RNA pull-downs with cellular extracts. Indeed, PCBP2 interacts with wild-type miR-155-3p, and this interaction is lost upon mutation of the CELL motif. Moreover, this experiment demonstrates a direct and sequence-specific interaction.

      (5) How much protein was loaded for Western blot analysis?

      We apologize for not explaining the experimental procedure in sufficient depth. For protein expression analysis, as reported in supplementary figures 1 and 2, we loaded 30 µg of protein. For the RNA pull-down and protein immunoprecipitation experiments, each performed using 2 mg of protein extract, half of the recovered protein was analyzed. This information has been added to the methods section.

      Suggested experiments to strengthen the manuscript:

      (1) Purify EVs using an iodixanol density gradient to eliminate the possibility of soluble PCBP2 contamination.

      We appreciate this suggestion. Because PCBP2 contamination would be a source of variability in the experiments, we assessed its presence in protein extracts of the purified extracellular vesicles. As reported in the new Supplementary figure 2A, PCBP2 is completely absent from EV extracts as assessed by Western blot; thus, in accordance with the MISEV guidelines, we used the differential ultracentrifugation method for EV purification.

      (2) Perform gain- and loss-of-function assays by overexpressing or silencing PCBP2 in various models to observe downstream changes in miRNA-dependent pathways.

      We chose to silence PCBP2 because of its high expression in our cellular model; overexpressing PCBP2 would probably yield no further significant readout. We are aware that PCBP2 silencing could perturb miRNA biogenesis and, in turn, the modulation of downstream miRNA pathways. Indeed, its association with Dicer has been reported as a prerequisite for miRNA processing (Li et al., 2012, Cell Metabolism). However, this aspect is beyond the scope of the present manuscript, in which we focus exclusively on the role of PCBP2 in regulating miRNA EV export. Moreover, to control for effects on miRNA processing, we evaluated the expression level of each miRNA as the ratio between the extracellular and intracellular compartments.
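      The compartment-ratio normalization described above can be illustrated with a minimal sketch. The function names and all numbers below are hypothetical, and linear-scale expression values (not Ct values) are assumed; this is an illustration of the idea, not the authors' analysis code.

```python
def export_ratio(ev_level, cell_level):
    """Ratio of extracellular (EV) to intracellular miRNA level (linear scale)."""
    if cell_level <= 0:
        raise ValueError("intracellular level must be positive")
    return ev_level / cell_level


def export_fold_change(control, knockdown):
    """Fold change of the EV/cell ratio in knockdown relative to control.

    Each argument is an (ev_level, cell_level) pair.
    """
    return export_ratio(*knockdown) / export_ratio(*control)


# Hypothetical values: the knockdown halves total miRNA (a processing effect)
# but leaves partitioning unchanged, so the ratio-based fold change is 1.
processing_only = export_fold_change(control=(2.0, 8.0), knockdown=(1.0, 4.0))

# Hypothetical values: the EV level rises while the cellular level falls,
# so the ratio-based fold change exceeds 1.
export_shift = export_fold_change(control=(2.0, 8.0), knockdown=(3.0, 6.0))
```

      Under these assumptions, a global change in miRNA abundance cancels out of the ratio, whereas a change in partitioning between compartments does not.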

      (3) Use a murine model with hepatocyte-specific PCBP2 knockout and track changes in EV miRNA content and their functional effects in target tissues.

      We took advantage of our murine cells silenced for PCBP2 and evaluated miRNA content.

      New functional assays (now included in the new figure 4F) with leukocytes obtained from C57BL/6J mouse livers show a higher percentage of IFN-γ+ T cells, NK cells and myeloid cells upon treatment with shPCBP2 EVs than with shCTR EVs; this suggests that PCBP2 silencing results in an EV-mediated modulation of the immune response.

      (4) Conduct co-culture experiments to assess EV-mediated intercellular communication between donor and recipient cells.

      We reasoned that co-culture experiments cannot attribute the observed effect to EVs alone, since soluble factors can also act on recipient cells. Conversely, the treatments with purified EVs performed here allow evaluation of the EV-mediated downstream effects alone.

      These experiments would provide insights into the PCBP2-SYNCRIP axis, broaden the applicability of the findings, and enhance their translational relevance.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Qin et al. set out to investigate the role of mechanosensory feedback during swallowing and identify neural circuits that generate ingestion rhythms. They use Drosophila melanogaster swallowing as a model system, focusing their study on the neural mechanisms that control cibarium filling and emptying in vivo. They find that pump frequency is decreased in mutants of three mechanotransduction genes (nompC, piezo, and Tmc), and conclude that mechanosensation mainly contributes to the emptying phase of swallowing. Furthermore, they find that double mutants of nompC and Tmc have more pronounced cibarium pumping defects than either single mutants or Tmc/piezo double mutants. They discover that the expression patterns of nompC and Tmc overlap in two classes of neurons, md-C and md-L neurons. The dendrites of md-C neurons wrap the cibarium and project their axons to the subesophageal zone of the brain. Silencing neurons that express both nompC and Tmc leads to severe ingestion defects, with decreased cibarium emptying. Optogenetic activation of the same population of neurons inhibited filling of the cibarium and accelerated cibarium emptying. In the brain, the axons of nompC∩Tmc cell types respond during ingestion of sugar but do not respond when the entire fly head is passively exposed to sucrose. Finally, the authors show that nompC∩Tmc cell types arborize close to the dendrites of motor neurons that are required for swallowing, and that swallowing motor neurons respond to the activation of the entire Tmc-GAL4 pattern.

      Strengths:

      • The authors rigorously quantify ingestion behavior to convincingly demonstrate the importance of mechanosensory genes in the control of swallowing rhythms and cibarium filling and emptying

      • The authors demonstrate that a small population of neurons that express both nompC and Tmc oppositely regulate cibarium emptying and filling when inhibited or activated, respectively

      • They provide evidence that the action of multiple mechanotransduction genes may converge in common cell types

      Thank you for your insightful and detailed assessment of our work. Your constructive feedback will help to improve our manuscript.

      Weaknesses:

      • A major weakness of the paper is that the authors use reagents that are expressed in both md-C and md-L but describe the results as though only md-C is manipulated. Severing the labellum will not prevent optogenetic activation of md-L from triggering neural responses downstream of md-L. Optogenetic activation is strong enough to trigger action potentials in the remaining axons. Therefore, Qin et al. do not present convincing evidence that the defects they see in pumping can be specifically attributed to md-C.

      Thank you for your comments. This is an important point that we did not adequately address in the original preprint. We have obtained imaging and behavioral results that strongly suggest md-C, rather than md-L, are essential for swallowing behavior.

      36 hours after ablation of the labellum, md-L signals were barely detectable when GFP expression was driven by the intersection between Tmc-GAL4 & nompC-QF (see Figure 3—figure supplement 1A). This observation indicates that the axons of md-L had likely degenerated by 36 hours and were unlikely to influence swallowing. Moreover, the projection pattern of Tmc-GAL4 & nompC-QF>>GFP in the brain showed no significant changes after labellum ablation.

      Furthermore, even 36 hours after labellum ablation, flies still responded to light stimulation (see Figure 3—figure supplement 1B-C, Video 5) when ReaChR was expressed in md-C. We thus reasoned that md-C, but not md-L, plays a crucial role in the swallowing process.

      • GRASP is known to be non-specific and prone to false positives when neurons are in close proximity but not synaptically connected. A positive GRASP signal supports but does not confirm direct synaptic connectivity between md-C/md-L axons and MN11/MN12.

      In this study, we employed the nSyb-GRASP, wherein the GRASP is expressed at the presynaptic terminals by fusion with the synaptic marker nSyb. This method demonstrates an enhanced specificity compared to the original GRASP approach.

      Additionally, we utilized +/ UAS-nSyb-spGFP1-10, lexAop-CD4-spGFP11 ; + / MN-LexA fruit flies as a negative control to mitigate potential false signals originating from the tool itself (Author response image 1, scale bar = 50μm). Besides the genotype Tmc-Gal4, Tub(FRT. Gal80) / UAS-nSyb-spGFP1-10, lexAop-CD4-spGFP11 ; nompC-QF, QUAS-FLP / MN-LexA discussed in this manuscript, we also included the genotype Tmc-Gal4, Tub(FRT. Gal80) / lexAop-nSyb-spGFP1-10, UAS-CD4-spGFP11 ; nompC-QF, QUAS-FLP / MN-LexA as a reverse control (Author response image 2). Unexpectedly, similar positive signals were observed, indicating that positive signals may arise from close proximity between neurons even with nSyb-GRASP.

      Author response image 1.

      It should be noted that the existence of synaptic projections from motor neurons (MN) to md-C cannot be definitively confirmed at this juncture. At present, we can only posit the potential for synaptic connections between md-C and motor neurons. A more definitive conclusion may be attainable with comprehensive whole-brain connectome data in future studies.

      Author response image 2.

      • As seen in Figure 2—figure supplement 1, the expression pattern of Tmc-GAL4 is broader than md-C alone. Therefore, the functional connectivity the authors observe between Tmc expressing neurons and MN11 and 12 cannot be traced to md-C alone

      It is true that the expression pattern of Tmc-GAL4 is broader than that of md-C alone. Our experiments, including those flies expressing TNT in Tmc+ neurons, demonstrated difficulties in emptying (Figure 2A, 2D). Notably, we encountered challenges in finding fly stocks bearing UAS>FRT-STOP-P2X2. Consequently, we opted to utilize Tmc-GAL4 to drive UAS-P2X2 instead. We believe that the results further support our hypothesis on the role of md-C in the observed behavioral change in emptying.

      Overall, this work convincingly shows that swallowing and swallowing rhythms are dependent on several mechanosensory genes. Qin et al. also characterize a candidate neuron, md-C, that is likely to provide mechanosensory feedback to pumping motor neurons, but the results they present here are not sufficient to assign this function to md-C alone. This work will have a positive impact on the field by demonstrating the importance of mechanosensory feedback to swallowing rhythms and providing a potential entry point for future investigation of the identity and mechanisms of swallowing central pattern generators.

      Reviewer #2 (Public Review):

      In this manuscript, the authors describe the role of cibarial mechanosensory neurons in fly ingestion. They demonstrate that pumping of the cibarium is subtly disrupted in mutants for piezo, TMC, and nomp-C. Evidence is presented that these three genes are co-expressed in a set of cibarial mechanosensory neurons named md-C. Silencing of md-C neurons results in disrupted cibarial emptying, while activation promotes faster pumping and/or difficulty filling. GRASP and chemogenetic activation of the md-C neurons is used to argue that they may be directly connected to motor neurons that control cibarial emptying.

      The manuscript makes several convincing and useful contributions. First, identifying the md-C neurons and demonstrating their essential role for cibarium emptying provides reagents for further studying this circuit and also demonstrates the importance of mechanosensation in driving pumping rhythms in the pharynx. Second, the suggestion that these mechanosensory neurons are directly connected to motor neurons controlling pumping stands in contrast to other sensory circuits identified in fly feeding and is an interesting idea that can be more rigorously tested in the future.

      At the same time, there are several shortcomings that limit the scope of the paper and the confidence in some claims. These include:

      a) the MN-LexA lines used for GRASP experiments are not characterized in any other way to demonstrate specificity. These were generated for this study using Phack methods, and their expression should be shown to be specific for MN11 and MN12 in order to interpret the GRASP experiments.

      Thanks for the suggestion. We have checked the expression pattern of MN-LexA, which is similar to that of the MN-GAL4 used in previous work (Manzo et al., PNAS., 2012, PMID:22474379). Here is the expression pattern:

      Author response image 3.

      b) There is also insufficient detail for the P2X2 experiment to evaluate its results. Is this an in vivo or ex vivo prep? Is ATP added to the brain, or ingested? If it is ingested, how is ATP coming into contact with md-C neuron if it is not a chemosensory neuron and therefore not exposed to the contents of the cibarium?

      The P2X2 experimental preparation was done ex vivo. We immersed the fly in the imaging buffer, as described in the Methods section under Functional Imaging. Following dissection and identification of the subesophageal zone (SEZ) area under fluorescent microscopy, we introduced ATP slowly into the buffer, at a distance from the brain.

      c) In Figure 3C, the authors claim that ablating the labellum will remove the optogenetic stimulation of the md-L neuron (mechanosensory neuron of the labellum), but this manipulation would presumably leave an intact md-L axon that would still be capable of being optogenetically activated by Chrimson.

      Please refer to the corresponding answers for reviewer 1 and Figure 3—figure supplement 1.

      d) Average GCaMP traces are not shown for md-C during ingestion, and therefore it is impossible to gauge the dynamics of md-C neuron activation during swallowing. Seeing activation with a similar frequency to pumping would support the suggested role for these neurons, although GCaMP6s may be too slow for these purposes.

      Profiling the dynamics of md-C neuron activation during swallowing is crucial for unraveling the operational model of md-C and validating our proposed hypothesis. Unfortunately, our assay faces challenges in detecting probable 6Hz fluorescent changes with GCaMP6s.

      In general, we observed an increase in fluorescent signals during swallowing, but movement of the live flies during swallowing disturbed the imaging, so we could not obtain a reliable calcium trace for md-C neurons. To enhance the robustness of our findings, patching the md-C neurons would be a more convincing approach. However, as illustrated in Figure 2, the somata of md-C neurons are situated in the cibarium rather than the brain, and patching md-C somata in flies during ingestion is difficult.

      e) The negative result in Figure 4K that is meant to rule out taste stimulation of md-C is not useful without a positive control for pharyngeal taste neuron activation in this same preparation.

      We followed methods used in the previous work (Chen et al., Cell Rep., 2019, PMID:31644916), which we believe could confirm that md-C do not respond to sugars.

      In addition to the experimental limitations described above, the manuscript could be organized in a way that is easier to read (for example, not jumping back and forth in figure order).

      Thank you for your suggestion; the manuscript has been reorganized.

      Reviewer #3 (Public Review):

      Swallowing is an essential daily activity for survival, and pharyngo-laryngeal sensory function is critical for safe swallowing. In Drosophila, it has been reported that the mechanical property of food (e.g. Viscosity) can modulate swallowing. However, how mechanical expansion of the pharynx or fluid content sense and control swallowing was elusive. Qin et al. showed that a group of pharyngeal mechanosensory neurons, as well as mechanosensory channels (nompC, Tmc, and Piezo), respond to these mechanical forces for regulation of swallowing in Drosophila melanogaster.

      Strengths:

      There are many reports on the effect of chemical properties of foods on feeding in fruit flies, but only limited studies reported how physical properties of food affect feeding especially pharyngeal mechanosensory neurons. First, they found that mechanosensory mutants, including nompC, Tmc, and Piezo, showed impaired swallowing, mainly the emptying process. Next, they identified cibarium multidendritic mechanosensory neurons (md-C) are responsible for controlling swallowing by regulating motor neuron (MN) 12 and 11, which control filling and emptying, respectively.

      Weaknesses:

      While the involvement of md-C and mechanosensory channels in controlling swallowing is convincing, it is not yet clear which stimuli activate md-C. Can it be an expansion of cibarium or food viscosity, or both? In addition, if rhythmic and coordinated contraction of muscles 11 and 12 is essential for swallowing, how can simultaneous activation of MN 11 and 12 by md-C achieve this? Finally, previous reports showed that food viscosity mainly affects the filling rather than the emptying process, which seems different from their finding.

      We have confirmed that swallowing sucrose water solution activated md-C neurons, while sucrose water solution alone could not (Figure 4J-K). We hypothesized that the viscosity of the food might influence this expansion process.

      While we were unable to delineate the activation dynamics of md-C neurons, our proposal posits that these neurons could be activated in a single pump cycle, sequentially stimulating MN12 and MN11. Another possibility is that the activation of md-C neurons acts as a switch, altering the oscillation pattern of the swallowing central pattern generator (CPG) from a resting state to a working state.

      In the experiments with w1118 flies fed with MC (methylcellulose) water, we observed that viscosity predominantly affects the filling process rather than the emptying process, consistent with previous findings. This raises an intriguing question. Our investigation into the mutation of mechanosensitive ion channels revealed a significant impact on the emptying process. We believe this is due to the loss of mechanosensation affecting the vibration of swallowing circuits, thereby influencing both the emptying and filling processes. In contrast, viscosity appears to make it more challenging for the fly to fill the cibarium with food, primarily attributable to the inherent properties of the food itself.

      Reviewer #4 (Public Review):

      A combination of optogenetic behavioral experiments and functional imaging are employed to identify the role of mechanosensory neurons in food swallowing in adult Drosophila. While some of the findings are intriguing and the overall goal of mapping a sensory to motor circuit for this rhythmic movement are admirable, the data presented could be improved.

      The circuit proposed (and supported by GRASP contact data) shows these multi-dendritic neurons connecting to pharyngeal motor neurons. This is pretty direct - there is no evidence that they affect the hypothetical central pattern generator - just the execution of its rhythm. The optogenetic activation and inhibition experiments are constitutive, not patterned light, and they seem to disrupt the timing of pumping, not impose a new one. A slight slowing of the rhythm is not consistent with the proposed function.

      Motor neurons implicated in patterned motions can be considered effectors of Central Pattern Generators (CPGs)(Marder et al., Curr Biol., 2001, PMID: 11728329; Hurkey et al., Nature., 2023, PMID:37225999). Given our observation of the connection between md-C neurons and motor neurons, it is reasonable to speculate that md-C neurons influence CPGs. Compared to the patterned light (0.1s light on and 0.1s light off) used in our optogenetic experiments, we noted no significant changes in their responses to continuous light stimulation. We think that optogenetic methods may lead to overstimulation of md-C neurons, failing to accurately mimic the expansion of the cibarium during feeding.

      Dysfunction in mechanosensitive ion channels or mechanosensory neurons not only disrupts the timing of pumping but also results in decreased intake efficiency (Figure 1E). The water-swallowing rhythm is generally stable in flies, and swallowing is a vital process that may involve redundant ion channels to ensure its stability.

      The mechanosensory channel mutants nompC, piezo, and TMC have a range of defects. The role of these channels in swallowing may not be sufficiently specific to support the interpretation presented. Their other defects are not described here and their overall locomotor function is not measured. If the flies have trouble consuming sufficient food throughout their development, how healthy are they at the time of assay? The level of starvation or water deprivation can affect different properties of feeding - meal size and frequency. There is no description of how starvation state was standardized or measured in these experiments.

      Defects in the mechanosensory channel mutants nompC, piezo, and TMC have been extensively investigated (Hehlert et al., Trends Neurosci., 2021, PMID:332570000). Mutations in these channels exhibit multifaceted effects, as illustrated in our RNAi experiments (see Figure 2E). Deprivation of water and food was performed in empty fly vials. It is important to note that the duration of starvation determines the fly's willingness to feed but not the pump frequency (Manzo et al., PNAS., 2012, PMID:22474379).

      In most cases, female flies were deprived of water and food in empty vials for 24 hours, because after that time most flies would be willing to drink water. The deprivation time was 12 hours for flies with nompC and Tmc mutated or flies with Kir2.1 expressed in md-C neurons, as some of these flies cannot survive 24 h of deprivation.

      The brain is likely to move considerably during swallow, so the GCaMP signal change may be a motion artifact. Sometimes this can be calculated by comparing GCaMP signal to that of a co-expressed fluorescent protein, but there is no mention that this is done here. Therefore, the GCaMP data cannot be interpreted.

      We did not co-express a fluorescent protein with GCaMP for md-C. The head of the fly was mounted onto a glass slide, and we did not observe significant signal changes before feeding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract: I disagree that swallow is the first step of ingestion. The first paragraph also mentions the final checkpoint before food ingestion. Perhaps sufficient to say that swallow is a critical step of ingestion.

      Indeed, it is not rigorous enough to say “first step”. This has been replaced by “early step”.

      Introduction:

      Line 59: "Silence" should be "Silencing"

      This has been replaced.

      Results:

      Lines 91-92: I am not clear about what this means. 20% of nompC and 20% of wild-type flies exhibit incomplete filling? So nompC is not different from wild-type?

      Sorry for the mistake. Viscous foods led to incomplete emptying (not incomplete filling), as displayed in Video 4. The swallowing behavior differs between nompC mutants and wild-type flies, as illustrated in Figure 1C, Figure 1—figure supplement 1A-C and video 1&5.

      We found that when fed with 1% MC water solution (Figure 1—figure supplement 1E-H), Tmc or piezo mutants displayed incomplete emptying, which could account for a large proportion of the time spent swallowing, whereas only 20% of nompC flies and 20% of wild-type flies sporadically exhibited incomplete emptying, a significant difference. Although the percentage of flies displaying incomplete pumps is similar between nompC mutants and wild-type flies, their swallowing behavior differs clearly in Videos 1 and 5.

      Line 94: Should read: “while for foods with certain viscosity, the pump of Tmc or piezo mutants might"

      What evidence is there for weakened muscle motion? The phenotypes of all three mutants is quite similar, so concluding that they have roles in initiation versus swallowing strength is not well supported -this would be better moved to the discussion since it is speculative.

      Muscles are responsible for pumping the bolus from the mouth to the crop. In the case of Tmc or piezo mutants, as evidenced by incomplete filling for viscous foods (see Video 4), we speculate that the loss of sensory stimuli leads to inadequate muscle contraction. The phenotypes observed in Tmc and piezo mutants are similar yet distinct from those of the wild-type or nompC mutant, as shown in Video 1 and 4. The phrase "due to weakened muscle motion" has been removed for clarity.

      Line 146: If md-L neurons are also labeled by this intersection, then you are not able to know whether the axons seen in the brain are from md-L or md-C neurons. Line 148: cutting the labellum is not sufficient to ablate md-L neurons. The projections will still enter the brain and can be activated with optogenetics, even after severing the processes that reside in the labellum.

      Please refer to the responses for reviewer #1 (Public Review): "A major weakness of the paper…" and Figure 4.

      Line 162: If the fly head alone is in saline, do you know that the sucrose enters the esophagus? The more relevant question here is whether the md-C neurons respond to mechanical force. If you could artificially inflate the cibarium with air and see the md-C neurons respond that would be a more convincing result. So far you only know that these are activated during ingestion, but have not shown that they are activated specifically by filling or emptying. In addition, you are not only imaging md-C (md-L is also labeled). This caveat should be mentioned.

      We followed the methods outlined in the previous work (Chen et al., Cell Rep., 2019, PMID:31644916), which suggested that md-C neurons do not respond to sugars. While we aimed to mechanically stimulate md-C neurons, detecting signal changes during different steps of swallowing is challenging. This aspect could be further investigated in subsequent research with the application of adequate patch recording or two-photon microscopy (TPM).

      Figure 3: It is not clear what the pie charts in Figure 3 A refer to. What are the three different rows, and what does blue versus red indicate?

      Figure 3A illustrates three distinct states driven by CsChrimson light stimulation of md-C neurons, with the proportions of flies exhibiting each state. During light activation, flies may display difficulty in filling, incomplete filling, or a normal range of pumping. The blue and red bars represent the proportions of flies showing the corresponding state, as indicated by the black line.

      Figure 4: Where are the example traces for J? The comparison in K should be average dF/F before ingestion compared with average dF/F during ingestion. Comparing the in vitro response to sucrose to the in vivo response during ingestion is not a useful comparison.

      Please refer to the answer to reviewer #2, question d).
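      For reference, the comparison the reviewer requests (average dF/F before ingestion versus during ingestion) could be computed along the following lines. This is a minimal sketch, assuming a single fluorescence trace per fly, a baseline F0 defined as the mean of the pre-ingestion frames, and a known frame index at which ingestion starts; the function names and the example trace are hypothetical and do not come from the manuscript.

```python
def delta_f_over_f(trace, baseline_frames):
    """Return dF/F for each frame, with F0 = mean of the first baseline_frames."""
    if baseline_frames <= 0 or baseline_frames > len(trace):
        raise ValueError("baseline_frames must be in 1..len(trace)")
    f0 = sum(trace[:baseline_frames]) / baseline_frames
    if f0 == 0:
        raise ValueError("baseline fluorescence must be non-zero")
    return [(f - f0) / f0 for f in trace]


def mean(values):
    return sum(values) / len(values)


def compare_before_during(trace, ingestion_start):
    """Mean dF/F before and during ingestion for one trace.

    ingestion_start is the index of the first frame recorded during ingestion
    (an assumed annotation, e.g. scored from the behavior video).
    """
    if ingestion_start >= len(trace):
        raise ValueError("trace contains no ingestion frames")
    dff = delta_f_over_f(trace, ingestion_start)
    return mean(dff[:ingestion_start]), mean(dff[ingestion_start:])


# Hypothetical raw fluorescence values: 4 baseline frames, then 2 ingestion frames.
example_trace = [100.0, 100.0, 100.0, 100.0, 150.0, 150.0]
before, during = compare_before_during(example_trace, ingestion_start=4)
```

      Note that this sketch does not correct for motion; a ratiometric correction would additionally require a co-expressed reference fluorophore, which was not used here.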

      Reviewer #2 (Recommendations For The Authors):

      Suggested experiments that would address some of my concerns listed in the public review include:

      a) high resolution SEZ images of MN-LexA lines crossed to LexAop-GFP to demonstrate their specificity

      b) more detail on the P2X2 experiment. It is hard to make suggestions beyond that without first seeing the details.

      c) presenting average GCaMP traces for all calcium imaging results

      d) to rule out taste stimulation of md-C (Figure 4K) I would suggest performing more extensive calcium imaging experiments with different stimuli. For example, sugar, water, and increasing concentrations of a neutral osmolyte (e.g. PEG) to suppress the water response. I think that this is more feasible than trying to get an in vitro taste prep to be convincing.

      Please refer to the responses for public review of reviewer #2.

      Reviewer #3 (Recommendations For The Authors):

      Below I list my suggestions as well as criticisms.

      (1) It would be excellent if the authors could demonstrate whether varying levels of food viscosity affect md-C activation.

      That is a good point, and could be studied in future work.

      (2) It is not clear whether an intersectional approach using TMC-GAL4 and nompC-QF abolishes labelling of the labellar multidendritic neurons. If this is the case, please show labellar multidendritic neurons in TMC-GAL4 only flies and flies using the intersectional approach. Along with this question, I am concerned that labellum-removed flies could be used for feeding assay.

      Intersectional labelling using TMC-GAL4 and nompC-QF does not abolish labelling of the labellar multidendritic neurons (Author response image 4). Labellum-removed flies can be used for the feeding assay (Figure 3—figure supplement 1B-C, video 5), but swallowing behavior is affected if the LSO or cibarium is damaged, so the labellum must be removed very carefully.

      Author response image 4.

      (3) Please provide the detailed methods for GRASP and include proper control.

      Please refer to the responses for public review of reviewer #1.

      (4) The authors hypothesized that md-C sequentially activates MN11 and 12. Is the time gap between applying ATP on md-C and activation of MN11 or MN12 different?

      Please refer to the responses for the public review of reviewer #3. The time gap between applying ATP to md-C and the activation of MN11 or MN12 did not differ significantly, which we attribute to the ex vivo conditions not fully mimicking the in vivo process.

      I found the manuscript includes many errors, which need to be corrected.

      (1) The reference formatting needs to be rechecked, for example, lines 37, 42, and 43.

      (2) Line 44-46: There is some misunderstanding. The role of pharyngeal mechanosensory neurons is not known compared with chemosensory neurons.

      (3) Line 49: Please specify which type of quality of food. Chemical or physical?

      (4) Line 80 and Figure 1B-D Authors need to put filling and emptying time data in the main figure rather than in the supplementary figure. Otherwise, please cite the relevant figures in the text(S1A-C).

      (5) Line 84-85; Is "the mutant animals" indicating only nompC? Please specify it.

      (6) Figure 1a: It is hard to determine the difference between the series of images. And also label filling and emptying under the time.

      (7) S1E-H: It is unclear what "Time proportion of incomplete pump" means. Please define it.

      (8) Please reorganize the figures to follow the order of the text, for example, figures 2 and 4

      (9) Figure 4A. There is mislabelling in Figure 4A. It is supposed to be phalloidin not nc82.

      (10) Figure 4K: It does not match the figure legend and main text.

      (11) Figure 4D and G: Please indicate ATP application time point.

      Thank you for your corrections; all the points mentioned have been revised.

      Reviewer #4 (Recommendations For The Authors):

      The figures need improvement. 1A has tiny circles showing pharynx and any differences are unclear.

      The expression pattern of some of these drivers (Supplement) seems quite broad. The tmc nompC intersection image in Figure 1F is nice but the cibarium images are hard to interpret: does this one show muscle expression? What are "brain" motor neurons? Where are the labellar multi-dendritic neurons?

      The Tmc nompC intersection image shows no expression in muscles. The somata of motor neurons 12 and 11 are situated in the SEZ area of the brain, while the somata of md-C neurons are in the cibarium. An image of md-L neurons is provided in the response to reviewer #3 (Recommendations For The Authors).

      Why do the assays alternate between swallowing food and swallowing water?

      Thank you for your suggestion; Figure 1A has been zoomed in. The Tmc nompC intersection image in Figure 2F displays the position of md-C neurons from a ventral perspective, and muscles were not labelled. We stained muscles in the cibarium with phalloidin (Figure 4A) and did not find overlap between md-C neurons and muscles. An image of md-L neurons is posted as Author response image 4.

      In the majority of our experiments, we employed water to test swallowing behavior, while we used methylcellulose water solution to test swallowing behavior of mechanoreceptor mutants, and sucrose solution for flies with md-C neurons expressing GCaMP since they hardly drank water when their head capsules were open.

      How starved or water-deprived were the flies?

      One day prior to the behavioral assays, flies were transferred to empty vials (without water or food) for 24 hours for water deprivation. Flies that could not survive 24 h of deprivation were instead deprived for 12 h.

      How exactly was the pumping frequency (shown in Fig 1B) measured? There is no description in the methods at all. If the pump frequency is scored by changes in blue food intensity (arbitrary units?), this seems very subjective and maybe image angle dependent. What was camera frame rate? Can it capture this pumping speed adequately? Given the wealth of more quantitative methods for measuring food intake (eg. CAFE, flyPAD), it seems that better data could be obtained.

      How was the total volume of the cibarium measured? What do the pie charts in Figure 3A represent?

      The pump frequency was computed as the number of pumps divided by the elapsed time, following the methodology outlined in Manzo et al., 2012. Swallowing curves were plotted using the inverse of the blue food intensity in the cibarium. In this representation, ascending lines signify filling, while descending lines indicate emptying (see Figure 2D, 3B). We maintain objectivity in our approach since, during the recording of swallowing behavior, the fly was fixed, and we exclusively used data for analysis when the Region of Interest (ROI) was in the cibarium. This ensures that the intensity values accurately reflect the filling and emptying processes. Furthermore, we conducted manual frame-by-frame checks of pump frequency, and the results align with those generated by the Time Series Analyzer V3 plugin of ImageJ.

      For the assessment of total volume of ingestion, we followed the CAFE method, utilizing a measurable glass capillary. We then calculated the ingestion rate (nL/s) by dividing the total volume of ingestion by the feeding time.
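
      As an illustration only, the two ratios described above (pumps per unit time, and ingested volume per feeding time) can be sketched in Python. The function names, the simple peak-counting rule, and the example trace are assumptions made for this sketch and are not taken from the manuscript or from Manzo et al., 2012; "inverse intensity" is assumed here to mean the negated intensity, and a real trace would need smoothing before peaks are counted.

```python
def count_pumps(intensity):
    """Count fill-empty cycles as local maxima of the inverted intensity trace.

    Assumption for this sketch: the cibarium looks darker (lower intensity)
    when it is filled with dyed food, so the negated trace rises during filling.
    """
    inverted = [-v for v in intensity]
    pumps = 0
    for i in range(1, len(inverted) - 1):
        # A peak is a sample above its left neighbour and not below its right one.
        if inverted[i - 1] < inverted[i] >= inverted[i + 1]:
            pumps += 1
    return pumps


def pump_frequency(n_pumps, duration_s):
    """Pump frequency in Hz: number of pumps divided by the elapsed time."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return n_pumps / duration_s


def ingestion_rate_nl_per_s(volume_nl, feeding_time_s):
    """Ingestion rate in nL/s: total ingested volume divided by feeding time."""
    if feeding_time_s <= 0:
        raise ValueError("feeding_time_s must be positive")
    return volume_nl / feeding_time_s


# Hypothetical trace (arbitrary units) containing two fill-empty cycles.
trace = [10, 6, 2, 6, 10, 5, 1, 5, 10]
n_pumps = count_pumps(trace)
frequency_hz = pump_frequency(n_pumps, duration_s=0.5)
rate = ingestion_rate_nl_per_s(volume_nl=100.0, feeding_time_s=20.0)
```

      The peak rule is deliberately naive; it only shows where the "number of pumps" in the frequency ratio could come from.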

      The changes seem small, in spite of the claim of statistical significance.

      The observed stability in pump frequency within a given genotype underscores the significance of even seemingly small changes, which are statistically significant. We speculate that the stability in swallowing frequency suggests the existence of a redundant mechanism to ensure the robustness of the process. Disruption of one channel might potentially be partially compensated for by others, highlighting the vital nature of the swallowing mechanism.

      How is this change in pump frequency consistent with defects in one aspect of the cycle - either ingestion (activation) or expulsion (inhibition)?

      Please refer to Figures 2 and 3. Both the filling and emptying processes were affected, while inhibition mainly influences emptying time (Figure 1—figure supplement 1).

      for the authors:

      Line 48: extensively

      Line 62 - undiscovered.

      Line 107, 463: multi

      Line 124: What is "dysphagia?" This is an unusual word and should be defined.

      Line 446: severe

      Line 466: in the cibarium or not?

      Thank you for your corrections; all the places mentioned have been revised.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      We thank the reviewer for his/her very positive comments.

      Reviewer #2 (Public review):

      We thank the reviewer for his/her positive evaluation. We plan to add RNAseq data of yeast wild-type and JDP mutant strains as more direct readout for the role of Apj1 in controlling Hsf1 activity. We agree with the reviewer that our study includes one major finding: the central role of Apj1 in controlling the attenuation phase of the heat shock response. In accordance with the reviewer we consider this finding highly relevant and interesting for a broad readership. We agree that additional studies are now necessary to mechanistically dissect how the diverse JDPs support Hsp70 in controlling Hsf1 activity. We believe that such analysis should be part of an independent study but we will indicate this aspect as part of an outlook in the discussion section of a revised manuscript.

      Reviewer #3 (Public review):

      We thank the reviewer for his/her suggestions. We agree that it is sometimes difficult to distinguish direct effects of JDP mutants on heat shock regulation from indirect ones, which can result from the accumulation of misfolded proteins that titrate Hsp70 capacity. We also agree that an in vitro reconstitution of Hsf1 displacement from DNA by Apj1/Hsp70 will be important, also to dissect Apj1 function mechanistically. We will add this point as outlook to the revised manuscript.

      Reviewer #1 (Recommendations for the authors): 

      (1) Can the authors submit the raw translatome data to a standard repository? Also, the data should be summarized in a supplemental Excel table. 

      We submitted the raw translatome data to the NCBI Gene Expression Omnibus and added the analyzed data sets (shown in Figures 1 and 5) as Supplementary Tables S4/S5 (Excel sheets). We additionally included RNAseq analysis of yeast WT and JDP mutant strains grown at 25°C, complementing and confirming our former translatome analysis (new Figure 5, Figure Supplement 2). The respective transcriptome raw data were also deposited at the NCBI Gene Expression Omnibus, and the analyzed data are available as Supplementary Table S7.

      (2) MW indicators need to be added to the Western Blot figures. 

      We added molecular weight markers to the Western Blot figures.

      (3) Can the authors please include the sequences of the primers used in all the RT-qPCR experiments? They mention they are in the supplemental information, but I couldn't locate them. 

      We added the sequences of the RT-qPCR primers as Supplementary Table S4.

      (4) Given the clear mechanism proposed, it would be nice if the authors could provide a nice summary figure. 

      We followed the suggestion of the reviewer and illustrate our main finding as new Figure 7.

      Reviewer #2 (Recommendations for the authors): 

      (1) As mentioned above, a co-IP experiment between Hsf1 and Ssa1/2 in APJ1 and apj1∆ cells, utilizing Hsf1 alleles with and without the two known binding sites, would cement the assignment of Apj1 in the Hsf1 regulatory circuit. 

      We agree with the reviewer that Hsf1-Ssa1/2 pulldown experiments, as done by Pincus and colleagues (1), will further specify the role of Apj1 in targeting Hsp70 to Hsf1 during the attenuation phase of the heat shock response. We have made extensive attempts at such pulldown experiments to document dissociation of Ssa1/2 from Hsf1 upon heat shock in yeast wild-type cells. While we could specifically detect Ssa1/2 upon Hsf-HA1 pulldown, our results after heat shock were highly variable and inconclusive and did not allow us to probe for a role of Apj1 or the two known Ssa1/2 binding sites in the phase-specific targeting. We now discuss the potential roles of the two distinct Ssa1/2 binding sites for phase-specific regulation of Hsf1 activity in the revised manuscript (page 12, lines 17-21).

      (2) Experiments in Figure 3 nicely localize CHIP reactions with known HSEs. A final confirmatory experiment utilizing a mutated HSE (another classic experiment in the field) would cement this finding and validate the motif and reporter-based analysis. 

      We thank the reviewer for this meaningful suggestion. We have performed a comparable experiment using the non-Hsf1-regulated gene BUD3, which lacks HSEs, as a reference. We engineered a counterpart, termed “BUD3 HS-UAS”, which bears inserted HSEs, derived from the native UAS of HSP82, within the BUD3 UAS. We show that BUD3<sup>+</sup> lacking HSEs is not occupied by Hsf1 and Apj1 under either non-stress or heat shock conditions, while BUD3-HSE is clearly occupied under both, paralleling Hsf1 and Apj1 occupancy of HSP82 (Figure 3E). We have renamed the engineered allele to “BUD3-HSE” to clarify the experimental design and output.

      (3) Page 8 - the ydj1-4xcga allele is introduced without explaining why it's needed, since ydj1∆ cells are viable. The authors should acknowledge the latter fact, then justify why the RQC depletion approach is preferred. Especially since the ydj1∆ mutant appears in Figure 5B. 

      ydj1∆ cells are viable, yet they grow extremely slowly at 25°C and hardly at all at 30°C, making them difficult to handle. The RQC-mediated depletion of Ydj1 in ydj1-4xcga cells allows for solid growth at 30°C, facilitating strain handling and analysis of Ydj1 function. Importantly, ydj1-4xcga cells are still temperature-sensitive and exhibit the same deregulation of the heat shock response upon combination with apj1∆ as observed for ydj1∆ cells. Thus, ydj1 knockout and knockdown cells do not differ in the relevant phenotypes reported here, and we performed most of the analysis with ydj1-4xcga cells due to their growth advantage. We added a respective explanation to the text (page 8, lines 13-14).

      (4) The authors raise the possibility that Sis1, Apj1, and Ydj1 may all be competing for access to Ssa1/2 at different phases of the HSR, and that access may be dictated by conformational changes in Hsf1. Given that there are at least two known Hsp70 binding sites that have negative regulatory activity in Hsf1, the possibility that domain-specific association governs the different roles should be considered. It is also unclear how the JDPs are associating with Hsf1 differentially if all binding is through Ssa1/2. 

      We thank the reviewer for the comment and will add the possibility of specific roles of the identified Hsp70 binding sites in regulating Hsf1 activity at the different phases of the heat shock response to the discussion section. Binding of Ssa1/2 to substrates (including Hsf1) is dependent on J-domain proteins (JDPs), which differ in substrate specificity. It is tempting to speculate that the distinct JDPs recognize different sites in Hsf1 and are responsible for mediating the specific binding of Ssa1/2 to either N- or C-terminal sites in Hsf1. Thus, the specific binding of a JDP to Hsf1 might dictate the binding of Ssa1/2 to either binding site. We discuss this aspect in the revised manuscript (page 12, lines 17-21).

      (5) Figure 6 - temperature sensitivity of hsf1 and ydj1 mutants has been linked to defects in the cell wall integrity pathway rather than general proteostasis collapse. This is easily tested via plating on osmotically supportive media (i.e., 1M sorbitol) and should be done throughout Figure 6 to properly interpret the results.

      Our data indicate proteostasis breakdown in ydj1 cells by showing strongly altered localization of Sis1-GFP, pointing to massive protein aggregation (Figure 6 – Figure Supplement  1D).

      We followed the suggestion of the reviewer and performed spot tests in the presence of 1 M sorbitol (see figure below). The presence of sorbitol improves growth of ydj1-4xcga mutant cells at increased temperatures, in agreement with the remark of the reviewer. We do not think, however, that growth rescue by sorbitol points to specific defects of the ydj1 mutant in cell wall integrity. Sorbitol functions as a chemical chaperone and has been shown to have protective effects on cellular proteostasis and to rescue phenotypes of diverse point mutants in yeast cells by facilitating folding of the respective mutant proteins and suppressing their aggregation (2-4). Thus, sorbitol can broadly restore proteostasis, which can also explain its effects on growth of ydj1 mutants at increased temperatures. The readout of the spot test with sorbitol is therefore ambiguous, and we prefer not to show it in the manuscript.

      Author response image 1.

      Serial dilutions of indicated yeast strains were spotted on YPD plates without and with 1 M sorbitol and incubated at indicated temperatures for 2 days.

      Reviewer #3 (Recommendations for the authors): 

      (1) Line 154: Can the authors, by analysis, offer an explanation for why HSR attenuation varies between genes for the sis1-4xcga strain? Is it, for example, a consequence of that a hypomorph and not a knock is used, a mRNA turnover issue, or that Hsf1 has different affinities for the HSEs in the promoters? 

      We used the sis1-4xcga knock-down strain because Sis1 is essential for yeast viability. The point raised by the reviewer is highly valid, and we thought extensively about the diverse consequences of Sis1 depletion on levels of e.g. translated BTN2 (minor impact) and HSP104 (strong impact) mRNA. We have since performed transcriptome analysis and confirmed the specific impact of Sis1 depletion on HSP104 mRNA levels, while BTN2 mRNA levels remained much less affected (new Figure 5 - Figure Supplement 2A/B). We compared numbers and spacings of HSEs in the respective target genes but could not identify obvious differences. Hsf1 occupancy within the UAS region of both BTN2 and HSP104 is very comparable at three different time points of a 39°C heat shock: 0, 5 and 120 min, arguing against different Hsf1 affinities to the respective HSEs (5). The molecular basis for the target-specific derepression upon Sis1 depletion thus remains to be explored. We added a respective comment to the revised version of the manuscript (page 12, lines 3-8).

      (2) Line 194: The analysis of ChIP-seq is not very elaborated in its presentation. How specific is this interaction? Can it be ruled out by analysis that it is simply the highly expressed genes after the HS that lead to Apj1 appearing there? More generally: Can the data in the main figure be presented to give a more unbiased genome-wide view of the results?

      Overall, we observed a low number of Apj1 binding events in the UAS of genes. The interaction of Apj1 with HSEs is specific, as we do not observe Apj1 binding to the UAS of well-expressed non-heat shock genes. Similarly, Apj1 does not bind to ARS504 (Figure S3 – Figure Supplement 1). We extended the description of our ChIP-seq analysis procedures leading to the identification of HSEs as Apj1 target sites to make the data analysis easier to understand. We additionally re-analysed the two Apj1 binding peaks that did not reveal an HSE in our original analysis. Using a modified setting, we can identify a slightly degenerate HSE in the promoter region of the two genes (TMA10, RIE1) and changed Figure 3C accordingly. Notably, TMA10 is a known target gene of Hsf1. The expanded analysis further documents the specificity of the Apj1 binding peaks.

      (3) Line 215. Figure 3. The clear anticorrelation is puzzling. Presumably, Apj1 binds Hsf1 as a substrate, and then a straight correlation is expected: When Hsf1 substrate levels decrease at the promoters, also Apj1 signal is predicted to decrease. What explanations could there be for this? Is it, for example, that Hsf1 is not always available as a substrate on every promoter, or is Apj1 tied up elsewhere in the cell/nucleus early after HS? 

      We propose that Apj1 binds HSE-bound Hsf1 only after clearance of nuclear inclusions, which form upon heat stress. Apj1 thereby couples the restoration of nuclear proteostasis to the attenuation of the heat shock response. This explains the delayed binding of Apj1 to HSEs (via Hsf1), while Hsf1 shows highest binding upon activation of the heat shock response (early timepoints). Notably, the binding efficiencies of Hsf1 and Apj1 (% input) differ greatly: we determine strong binding of Hsf1 five min post heat shock (30-40% of input), whereas at most 3-4% of the input is pulled down with Apj1 (60 min post heat shock) (Figure 3D). Even at this late timepoint, 10-20% of the input is pulled down with Hsf1. The distinct kinetics and pulldown efficiencies suggest that Apj1 displaces Hsf1 from HSEs, and accordingly Hsf1 stays bound to HSEs in apj1∆ cells (Figure 4). This activity of Apj1 explains the anti-correlation: increased targeting of Apj1 to HSE-bound Hsf1 will lower the absolute levels of HSE-bound Hsf1. What we observe in the ChIP experiment at the individual timepoints is a snapshot of this reaction. Accordingly, at the last timepoint analyzed (120 min after heat shock), we observe low binding of both Hsf1 and Apj1, as the heat shock response has been shut down.

      (4) Line 253: "Sis-depleted".  

      We have corrected the mistake.

      (5) Line 332: Fig. 6C SIS1 OE from pRS315. A YIP would have been better, 20% of the cells will typically not express a protein with a CEN/ARS of the pRS-series so the Sis1 overexpression phenotype may be underestimated and this may impact on the interpretation. 

      We agree with the reviewer that yeast integrating plasmids (YIp) represent the gold standard for complementation assays. We are not aware of a study showing that 20% of cells harboring pRS plasmids do not express the encoded protein. The results shown in Fig. 8C/D demonstrate that even strong overproduction of Sis1 cannot restore Hsf1 activity control. This interpretation would not be affected even if a certain percentage of these cells did not express Sis1. Nevertheless, we added a comment to the respective section pointing to the possibility that the Sis1 effect might be underestimated due to variations in Sis1 expression (page 11, lines 15-19).

      (6) Figure 1C. Since n=2, a more transparent way of showing the data is the individual data points. It is used elsewhere in the manuscript, and I recommend it. 

      We agree that showing individual data points can enhance transparency, particularly with small sample sizes. However, the log2 fold change (log2FC) values presented in Figure 1C and other figures derived from ribosome profiling and RNAseq experiments were generated using the DESeq2 package. The DESeq2 pipeline is widely used for analyzing differential gene expression and is known for its statistical robustness. It performs differential expression analysis based on a model that incorporates normalization, dispersion estimation, and shrinkage of fold changes. The pipeline automatically accounts for biological and technical variability and for batch effects, thereby improving the reliability of results. These log2FC values are not directly calculated from log-transformed normalized counts of individual samples but are instead estimated from a fitted model comparing group means. Therefore, individual replicate values cannot be shown for the DESeq2 log2FC estimates.
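
      To make the last point concrete, the following Python sketch shows a deliberately simplified group-level log2 fold change. It is not DESeq2 (an R/Bioconductor package that fits a negative binomial model and shrinks fold changes); the function name, the pseudocount, and the counts are hypothetical. It only illustrates that a fold change comparing group means yields one value per gene and comparison, with no per-replicate counterpart.

```python
import math


def group_log2_fold_change(treated_counts, control_counts, pseudocount=1.0):
    """Simplified log2FC: log2 ratio of group means of normalized counts.

    Illustrative stand-in only; DESeq2 estimates this quantity from a fitted
    model rather than from a plain ratio of means.
    """
    mean_treated = sum(treated_counts) / len(treated_counts)
    mean_control = sum(control_counts) / len(control_counts)
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))


# Hypothetical normalized counts for one gene, two replicates per group.
log2fc = group_log2_fold_change([30.0, 34.0], [7.0, 9.0], pseudocount=0.0)
```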

      (7) Figure 1D. Please add the number of minutes on the X-axis. Figure legend: "Cycloheximide" is capitalized.  

      We revised the figure and figure legend as recommended.

      (8) Several figure panels: Statistical tests and SD error bars for experiments performed in duplicates simply feel wrong for this reviewer. I do recognize that parts of the community are calculating, in essence, quasi-p-values using parametric methods for experiments with far too low sample numbers, but I recommend not doing so. In my opinion, better to show the two data points and interpret with caution.

      We followed the advice of the reviewer and removed statistical tests for experiments based on duplicates.

      References

      (1) Krakowiak, J., Zheng, X., Patel, N., Feder, Z. A., Anandhakumar, J., Valerius, K. et al. (2018) Hsf1 and Hsp70 constitute a two-component feedback loop that regulates the yeast heat shock response eLife 7,

      (2) Guiberson, N. G. L., Pineda, A., Abramov, D., Kharel, P., Carnazza, K. E., Wragg, R. T. et al. (2018) Mechanism-based rescue of Munc18-1 dysfunction in varied encephalopathies by chemical chaperones Nature communications 9, 3986

      (3) Singh, L. R., Chen, X., Kozich, V., and Kruger, W. D. (2007) Chemical chaperone rescue of mutant human cystathionine beta-synthase Mol Genet Metab 91, 335-342

      (4) Marathe, S., and Bose, T. (2024) Chemical chaperone - sorbitol corrects cohesion and translational defects in the Roberts mutant bioRxiv 10.1101/2024.09.04.610945

      (5) Pincus, D., Anandhakumar, J., Thiru, P., Guertin, M. J., Erkine, A. M., and Gross, D. S. (2018) Genetic and epigenetic determinants establish a continuum of Hsf1 occupancy and activity across the yeast genome Mol Biol Cell 29, 3168-3182

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for organizing the reviews for our manuscript, “Behavioral entrainment to rhythmic auditory stimulation can be modulated by tACS depending on the electrical stimulation field properties,” and for the positive eLife assessment. We also thank the reviewers for their constructive comments. We have addressed every comment, which has helped to improve the transparency and readability of the manuscript. The main changes to the manuscript are summarized as follows:

      1. Surrogate distributions were created for each participant and session to estimate the effect of tACS-phase lag on behavioral entrainment to the sound that could have occurred by chance or because of our analysis method (R1). The actual tACS-amplitude effects were normalized relative to the surrogate distribution, and statistical analysis was performed on the normalized (z-score) values. This analysis did not change our main outcome: that tACS modulates behavioral entrainment to the sound depending on the phase lag between the auditory and the electrical signals. This analysis has now been incorporated into the Results section and in Fig. 3c-d.

      2. Two additional supplemental figures were created to include the single-participant data related to Fig. 3b and 3e (R2).

      3. Additional editing of the manuscript has been performed to improve the readability.

      Below, you will find a point-by-point response to the reviewers’ comments.

      Reviewer #1 (Public Review):

      We are grateful for the reviewer’s positive assessment of the potential impact of our study. The reviewer’s primary concerns were 1) the tACS lag effects reported in the manuscript might be noise because of the realignment procedure, and 2) no multiple comparisons correction was conducted in the model comparison procedure.

      In response to point 1), we have reanalyzed the data in exactly the manner prescribed by the reviewer. Our effects remain, and the new control analysis strengthens the manuscript. 2) In the context of model comparison, the model selection procedure was not based on evaluating the statistical significance of any model or predictor. Instead, the single model that best fit the data was selected as the model with the lowest Akaike’s information criterion (AIC), and its superiority relative to the second-best model was corroborated using the likelihood ratio test. Only the best model was evaluated for significance and analyzed in terms of its predictors and interactions. This model is an omnibus test and does not require multiple comparison correction unless there are posthoc decompositions. For similar approaches, see (Kasten et al., 2019).
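
      The selection logic described in point 2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the manuscript: the model names, log-likelihoods, and parameter counts are invented, the two models are assumed to be nested, and the likelihood ratio test is shown only for a difference of one parameter (chi-square with one degree of freedom).

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def rank_by_aic(models):
    """Sort model names by AIC; `models` maps name -> (log_likelihood, n_params)."""
    return sorted(models, key=lambda name: aic(*models[name]))


def likelihood_ratio_statistic(ll_full, ll_reduced):
    """LRT statistic for nested models: 2 * (ln L_full - ln L_reduced)."""
    return 2.0 * (ll_full - ll_reduced)


def chi2_sf_one_dof(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical fits: (log-likelihood, number of parameters).
candidate_models = {
    "reduced_model": (-120.0, 3),
    "full_model": (-115.0, 4),
}
ranking = rank_by_aic(candidate_models)
lr_stat = likelihood_ratio_statistic(-115.0, -120.0)
p_value = chi2_sf_one_dof(lr_stat)
```

      Only the top-ranked model would then be inspected for its predictors and interactions, which is why no correction across candidate models is applied at the selection step.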

      Below, we have responded to each comment specifically or referred to this general comment.

      Summary of what the authors were trying to achieve.

      This paper studies the possible effects of tACS on the detection of silence gaps in an FM-modulated noise stimulus. Both FM modulation of the sound and the tACS are at 2Hz, and the phase of the two is varied to determine possible interactions between the auditory and electric stimulation. Additionally, two different electrode montages are used to determine if variation in electric field distribution across the brain may be related to the effects of tACS on behavioral performance in individual subjects.

      Major strengths and weaknesses of the methods and results.

      The study appears to be well-powered to detect modulation of behavioral performance with N=42 subjects. There is a clear and reproducible modulation of behavioral effects with the phase of the FM sound modulation. The study was also well designed, combining fMRI, current flow modeling, montage optimization targeting, and behavioral analysis. A particular merit of this study is to have repeated the sessions for most subjects in order to test repeat-reliability, which is so often missing in human experiments. The results and methods are generally well-described and well-conceived. The portion of the analysis related to behavior alone is excellent. The analysis of the tACS results is also generally well described, candidly highlighting how variable results are across subjects and sessions. The figures are all of high quality and clear. One weakness of the experimental design is that no effort was made to control for sensation effects. tACS at 2Hz causes prominent skin sensations which could have interacted with auditory perception and thus, detection performance.

      The reviewer is right that we did not control for the sensation effects in our paradigm. We asked the participants to rate the strength of the perceived stimulation after each run. However, this information was used only to assess the safety and tolerability of the stimulation protocol. Nevertheless, we did not consider controlling for skin sensations necessary given the within-participant nature of our design (all participants experienced all six tACS–audio phase lag conditions, which were identical in their potential to cause physical sensations; the only difference between conditions was related to the timing of the auditory stimulus). That is, while the reviewer is right that 2-Hz tACS can indeed induce skin sensation under the electrodes, in this study, we report the effects that depend on the tACS-phase lag relative to the FM-stimulus. Note that the starting phase of the FM-stimulus was randomized across trials within each block (all six tACS audio lags were presented in each block of stimulation). We have no reason to expect the skin sensation to change with the tACS-audio lag from trial to trial, and therefore do not consider this to be a confound in our design. We have added some sentences with this information to the Discussion section:

      Pages 16-17, lines 497-504: “Note that we did not control for the skin sensation induced by 2-Hz tACS in this experiment. Participants rated the strength of the perceived stimulation after each run. However, this information was used only to assess the safety and tolerability of the stimulation protocol. It is in principle possible that skin sensation would depend on tACS phase itself. However, in this study, we report effects that depend on the relationship between tACS-phase and FM-stimulus phase, which changed from trial to trial as the starting phase of the FM-stimulus was randomized across trials. We have no reason to expect the skin sensation to change with the tACS-audio lag and therefore do not consider this to be a confound in our data.”

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions.

      Unfortunately, the main effects described for tACS are encumbered by a lack of clarity in the analysis. It does appear that the tACS effects reported here could be an artifact of the analysis approach. Without further clarification, the main findings on the tACS effects may not be supported by the data.

      Likely impact of the work on the field, and the utility of the methods and data to the community.

      The central claim is that tACS modulates behavioral detection performance across the 0.5s cycle of stimulation. However, neither the phase nor the strength of this effect reproduces across subjects or sessions. Some of these individual variations may be explainable by individual current distribution. If these results hold, they could be of interest to investigators in the tACS field.

      The additional context you think would help readers interpret or understand the significance of the work.

      The following are more detailed comments on specific sections of the paper, including details on the concerns with the statistical analysis of the tACS effects.

      The introduction is well-balanced, discussing the promise and limitations of previous results with tACS. The objectives are well-defined.

      The analysis surrounding behavioral performance and its dependence on the phase of the FM modulation (Figure 3) is masterfully executed and explained. It appears that it reproduces previous studies and points to a very robust behavioral task that may be of use in other studies.

      Again, we would like to thank the reviewer for the positive assessment of the potential impact of our work and for the thoughtful comments regarding the methodology. For readability in our responses, we have numbered the comments below.

      1. There is a definition of tACS(+) vs tACS(-) based on the relative phase of tACS that may be problematic for the subsequent analysis of Figures 4 and 5. It seems that phase 0 is adjusted to each subject/session. For argument's sake, let's assume the curves in Fig. 3E are random fluctuations. Then aligning them to best-fitting cosine will trivially generate a FM-amplitude fluctuation with cosine shape as shown in Fig. 4a. Selecting the positive and negative phase of that will trivially be larger and smaller than a sham, respectively, as shown in Fig 4b. If this is correct, and the authors would like to keep this way of showing results, then one would need to demonstrate that this difference is larger than expected by chance. Perhaps one could randomize the 6 phase bins in each subject/session and execute the same process (fit a cosine to curves 3e, realign as in 4a, and summarize as in 4b). That will give a distribution under the Null, which may be used to determine if the contrast currently shown in 4b is indeed statistically significant.

      We agree with the reviewer’s concerns regarding the possible bias induced by the realignment procedure used to estimate tACS effects. Certainly, when adjusting phase 0 to each participant/session’s best tACS phase (peak in the fitting cosine), selecting the positive phase of the realigned data will be trivially larger than sham (Fig. 4a). This is why the realigned zero-phase and opposite phase (trough) bins were excluded from the analysis in Fig. 4b. Therefore, tACS(+) vs. tACS(-) do not represent behavioral entrainment at the peak positive and negative tACS lags, as both bins were already removed from the analysis. tACS(+) and tACS(-) are the averages of two adjacent bins from the positive and negative tACS lags, respectively (Zoefel et al., 2019). Such an analysis relies on the idea that if the effect of tACS is sinusoidal, presenting the auditory stimulus at the positive half cycle should be different than when the auditory stimulus lags the electrical signal by the other half. If the effect of tACS was just random noise fluctuations, there is no reason to assume that such fluctuations would be sinusoidal; therefore, any bias in estimating the effect of tACS should be removed when excluding the peak to which the individual data were realigned. Similar analytical procedures have been used previously in the literature (Riecke et al., 2015; Riecke et al., 2018). We have modified the colors in Fig. 4a and 4c (former 4b) and added a new panel to the figure (new 4b) to make the realignment procedure, including the exclusion of the realigned peak and trough data, more visually obvious.
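
      For readers who want to follow the realignment step by step, here is a minimal Python sketch under stated assumptions: six equally spaced tACS lag bins, a one-cycle cosine fitted by least squares, and tACS(+) and tACS(-) taken as the means of the two bins flanking the excluded peak and the excluded trough, respectively. The function names and the noise-free example are hypothetical and are not the analysis code used for the manuscript.

```python
import math


def fit_cosine_phase(fm_amplitudes):
    """Phase (radians) at which a least-squares one-cycle cosine fit peaks.

    For equally spaced bins spanning one full cycle, the cosine and sine
    projections give the least-squares coefficients up to a common scale.
    """
    n = len(fm_amplitudes)
    angles = [2.0 * math.pi * i / n for i in range(n)]
    a = sum(y * math.cos(t) for y, t in zip(fm_amplitudes, angles))
    b = sum(y * math.sin(t) for y, t in zip(fm_amplitudes, angles))
    return math.atan2(b, a)


def realign_to_best_lag(fm_amplitudes):
    """Rotate the bins so that index 0 is the bin nearest the fitted peak."""
    n = len(fm_amplitudes)
    step = 2.0 * math.pi / n
    best_bin = round(fit_cosine_phase(fm_amplitudes) / step) % n
    return [fm_amplitudes[(best_bin + i) % n] for i in range(n)]


def tacs_plus_minus(fm_amplitudes):
    """Average the bins flanking the excluded peak (+) and trough (-)."""
    aligned = realign_to_best_lag(fm_amplitudes)
    if len(aligned) != 6:
        raise ValueError("this sketch assumes six tACS lag bins")
    # Index 0 (realigned peak) and index 3 (trough) are left out on purpose.
    tacs_plus = (aligned[1] + aligned[5]) / 2.0
    tacs_minus = (aligned[2] + aligned[4]) / 2.0
    return tacs_plus, tacs_minus


# Noise-free example: FM-amplitude follows a cosine peaking at the third lag bin.
example = [
    1.0 + 0.5 * math.cos(2.0 * math.pi * i / 6 - 2.0 * math.pi / 3)
    for i in range(6)
]
plus, minus = tacs_plus_minus(example)
```

      Because the realigned peak and trough never enter either average, the contrast between the two values rests only on the four remaining bins.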

      Moreover, we very much like the reviewer’s suggestion to normalize the magnitude of the tACS effect using a permutation strategy. We performed additional analyses to normalize our tACS effect in Fig. 4c by the probability of obtaining the effect by chance. For each subject and session, tACS-phase lags were randomized across trials for a total of 1000 iterations. For each iteration, the gaps were binned by the FM-stimulus phase and tACS-lag. For each tACS-lag, the amplitude of behavioral entrainment to the FM-stimulus was estimated (FM-amplitude), as shown in Fig. 3. Similar to the original data, a second cosine fit was estimated for the FM-amplitude by tACS-lag. Optimal tACS-phase was estimated from the cosine fit and FM-amplitude values were realigned. Again, the realigned phase 0 and trough were removed from the analysis, and their adjacent bins were averaged to obtain the FM-amplitude at tACS(+) and tACS(-), as shown in Fig. 4c. We then computed the difference between 1) tACS(+) and sham, 2) tACS(-) and sham, and 3) tACS(+) and tACS(-), for the original data and the permuted datasets. This procedure was performed for each participant and session to estimate the size of the tACS effect for the original and surrogate data. The original tACS effects were transformed to z-scores using surrogate distributions, providing us with an estimate of the size of the real effect relative to chance. We then used one-sample t-tests to assess whether the z-scored tACS effects differed significantly from zero. This analysis showed that the tACS effects remained statistically significant. This analysis has been added to the Results and Methods sections and is included in Figure 4d.
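      The permutation logic in this paragraph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: `permutation_zscore` and `lag_mean_range` are hypothetical names, and the toy statistic (the spread of per-lag means) merely stands in for the full chain of binning, cosine fitting, realignment and tACS(+)/tACS(-) averaging.

```python
import random
import statistics


def permutation_zscore(lag_labels, trial_values, statistic, n_perm=1000, seed=0):
    """Z-score an observed statistic against a label-permutation null.

    `statistic` is any callable taking (lag_labels, trial_values) and
    returning a number; in the real analysis it would wrap the whole
    binning / cosine-fit / realignment procedure.
    """
    rng = random.Random(seed)
    observed = statistic(lag_labels, trial_values)
    shuffled = list(lag_labels)
    surrogate = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between trials and tACS lag
        surrogate.append(statistic(shuffled, trial_values))
    z = (observed - statistics.mean(surrogate)) / statistics.stdev(surrogate)
    return z, surrogate


def lag_mean_range(lag_labels, trial_values):
    """Toy statistic (assumption): range of the per-lag mean values."""
    sums, counts = {}, {}
    for lag, value in zip(lag_labels, trial_values):
        sums[lag] = sums.get(lag, 0.0) + value
        counts[lag] = counts.get(lag, 0) + 1
    means = [sums[lag] / counts[lag] for lag in sums]
    return max(means) - min(means)
```

      Because every surrogate dataset passes through the same procedure as the original data, any bias introduced by realignment is present in the null distribution as well, so a z-score well above zero indicates an effect beyond that bias.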

      Page 10, lines 282-297: “In order to further investigate whether the observed tACS effect was significantly larger than chance and not an artifact of our analysis procedure (33), we created 1000 surrogate datasets per participant and session by permuting the tACS lag designation across trials. The same binning procedure, realignment, and cosine fits were applied to each surrogate dataset as for the original data. This yielded a surrogate distribution of tACS(+) and tACS(-) values for each participant and session. These values were averaged across sessions since the original analysis did not show a main effect of session. We then computed the difference between tACS(+) and sham, tACS(-) and sham, and tACS(+) and tACS(-), separately for the original and surrogate datasets. The obtained differences for the original data were then z-scored using the mean and standard deviation of the surrogate distribution. Note that in this case we used data of all 42 participants who had at least one valid session (37 participants with both sessions). Three one-sample t-tests were conducted to investigate whether the size of the tACS effect obtained in the original data was significantly larger than that obtained by chance (Fig. 4d). This analysis showed that all z-scores were significantly higher than zero (all t(41) > 2.36, p < 0.05, all p-values corrected for multiple comparisons using the Holm-Bonferroni method).”

      Page 31, lines 962-972: “To further control that the observed tACS effects were not an artifact of the analysis procedure, the differences between the tACS conditions (sham, tACS(+), and tACS(-)) were normalized using a permutation approach. For each participant and session, 1000 surrogate datasets were created by permuting the tACS lag designation across trials. The same binning procedure, realignment, and cosine fits were applied to each surrogate dataset as for the original data (see above). FM-amplitude at sham, tACS(+) and tACS(-) were averaged across sessions since the original analysis did not show a main effect of session. Differences between tACS conditions were estimated for the original and surrogate datasets and the resulting values from the original data were z-scored using the mean and standard deviation from the surrogate distributions. One-sample t-tests were conducted to test the statistical significance of the z-scores. P-values were corrected for multiple comparisons using the Holm-Bonferroni method.”
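      For readers unfamiliar with the final step, the one-sample t statistic and the Holm-Bonferroni correction can be sketched as follows. This is a generic illustration with hypothetical function names and made-up inputs, not the authors' code; converting the t statistic to a p-value requires a Student-t distribution function, which is deliberately left out here.

```python
import math
import statistics


def one_sample_t(values, popmean=0.0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - popmean) / standard_error, n - 1


def holm_bonferroni(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

      With three contrasts, as in the quoted passage, the smallest raw p-value is multiplied by 3, the next by 2, and the largest by 1, subject to the monotonicity constraint.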

      1. Results of Fig 5a and 5b seem consistent with the concern raised above about the results of Fig. 4. It appears we are looking at an artifact of the realignment procedure, on otherwise random noise. In fact, the drop in "tACS-amplitude" in Fig. 5c is entirely consistent with a random noise effect.

      Please see our response to the comment above.

      1. "To better understand what factors might be influencing inter-session variability in tACS effects, we estimated multiple linear models ..." this post hoc analysis does not seem to have been corrected for multiple comparisons of these "multiple linear models". It is not clear how many different things were tried. The fact that one of them has a p-value of 0.007 for some factors with amplitude-difference, but these factors did not play a role in the amplitude-phase, suggests again that we are not looking at a lawful behavior in these data.

      We suspect that the reviewer did not have access to the supplemental materials where all tables (relevant here is Table S3) are provided. This post hoc analysis was performed as an exploratory analysis to better understand the factors that could influence the inter-session variability of tACS effects. In Table S3, we provide the formula for each of the seven models tested, including their Akaike information criteria corrected for small samples (AICc), R2, F, and p-values. As described in the methods section, the winning model was selected as the model with the smallest AICc. A similar procedure has been previously used in the literature (Kasten et al., 2019). Moreover, to ensure that our winning model was better at explaining the data than the second-best unrestricted model, we used the likelihood ratio test. After choosing the winning model and before reporting the significance of the predictors, we examined the significance of the model in and of itself, taking into account its R2 as well as F- and p-values relative to a constant model. Thus, only one model is being evaluated in terms of statistical significance. Therefore, to our understanding, there are no multiple comparisons to correct for. We added the information regarding the selection procedure, hoping this will make the analysis clearer.

      See page 12, lines 354-360: “This model was selected because it had the smallest Akaike’s information criterion (corrected for small samples), AICc. Moreover, the likelihood ratio test showed no evidence for choosing the more complex unrestricted model (stat = 2.411, p = 0.121). Following the same selection criteria, the winning model predicting inter-session variability in tACS-phase, included only the factor gender (Table S4). However, this model was not significant in and of itself when compared to a constant model (F-statistic vs. constant model: 3.05, p = 0.09, R2 = 0.082).”
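      The selection procedure described here (smallest AICc, then a likelihood ratio test against the more complex model) can be sketched as below. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, the AICc uses the common least-squares form (conventions differ on whether the noise variance counts as a parameter), and the likelihood ratio p-value assumes a chi-square reference distribution with one degree of freedom.

```python
import math


def aicc_least_squares(rss, n_obs, n_params):
    """AICc for a least-squares model, up to an additive constant.

    AIC = n * ln(RSS / n) + 2k; the small-sample correction adds
    2k(k + 1) / (n - k - 1).
    """
    aic = n_obs * math.log(rss / n_obs) + 2 * n_params
    correction = 2 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    return aic + correction


def select_by_aicc(models, n_obs):
    """Pick the model with the smallest AICc.

    `models` maps a model name to (residual sum of squares, parameter count).
    """
    scores = {
        name: aicc_least_squares(rss, n_obs, k) for name, (rss, k) in models.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


def lr_test_pvalue_df1(stat):
    """Upper-tail chi-square probability for one degree of freedom (assumed)."""
    return math.erfc(math.sqrt(stat / 2))
```

      Under the one-degree-of-freedom assumption, a likelihood ratio statistic of 2.411 gives a p-value of roughly 0.12, in line with the value quoted above.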

      1. "So far, our results demonstrate that FM-stimulus driven behavioral modulation of gap detection (FM-amplitude) was significantly affected by the phase lag between the FM-stimulus and the tACS signal (Audio-tACS lag) ..." There appears to be nothing in the preceding section (Figures 4 and 5) to show that the modulation seen in 3e is not just noise. Maybe something can be said about 3b on an individual subject/session basis that makes these results statistically significant on their own. Maybe these modulations are strong and statistically significant, but just not reproducible across subjects and sessions?

      Please see our response to the first comment regarding the validity of our analysis for proving the significant effect of tACS lag on modulating behavioral entrainment to the FM-stimulus (FM-amplitude), and the new control analysis. After performing the permutation tests, to make sure the reported effects are not noise, our statistical analysis still shows that tACS-lag does significantly modulate behavioral entrainment to the sound (FM-amplitude). Thus, the reviewer is right to say “these modulations are strong and statistically significant, just not reproducible across subjects and sessions”. In this regard, we consider our evaluation of session-to-session reliability of tACS effects is of high relevance for the field, as this is often overlooked in the literature.

      1. "Inter-individual variability in the simulated E-field predicts tACS effects" Authors here are attempting to predict a property of the subjects that was just shown to not be a reliable property of the subject. Authors are picking 9 possible features for this, testing 33 possible models with N=34 data points. With these circumstances, it is not hard to find something that correlates by chance. And some of the models tested had interaction terms, possibly further increasing the number of comparisons. The results reported in this section do not seem to be robust, unless all this was corrected for multiple comparisons, and it was not made clear?

      We thank the reviewer very much for this comment. While the reviewer is right that in these models, we are trying to predict an individual property (tACS-amplitude) that was not test–retest reliable across sessions, we still consider this to be a valid analysis. Here, we take the tACS-amplitude averaged across sessions, trying to predict the probability of a participant to be significantly modulated by tACS, in general, regardless of day-to-day variability. Regarding the number of multiple regression models, how we chose the winning model and the appropriateness/need of multiple-comparisons correction in this case, please see our explanation under “Reviewer 1 (Public review)” and our response to comment 3.

      1. "Can we reduce inter-individual variability in tACS effects ..." This section seems even more speculative and with mixed results.

      We agree with the reviewer that this section is a bit speculative. We are trying to plant some seeds for future research that can help move the field forward in the quest for better stimulation protocols. We have added a sentence at the end of the section to explicitly say that more evidence is needed in this regard.

      Page 14, lines 428-429: “At this stage, more evidence is needed to prove the superiority of individually optimized tACS montages for reducing inter-individual variability in tACS effects.”

      Given the concerns with the statistical analysis above, there are concerns about the following statements in the summary of the Discussion:

      1. "2) does modulate the amplitude of the FM-stimulus induced behavioral modulation (FM-amplitude)"

      This seems to be based on Figure 4, which leaves one with significant concerns.

      Please see our response to comment 1. We hope the reviewer is satisfied with our additional analysis, which was performed to ensure that the tACS effect reported here is not noise.

      1. "4) individual variability in tACS effect size was partially explained by two interactions: between the normal component of the E-field and the field focality, and between the normal component of the E-field and the distance between the peak of the electric field and the functional target ROIs."

      The complexity of this statement alone may be a good indication that this could be the result of false discovery due to multiple comparisons.

      We respectfully disagree with the reviewer’s opinion that this is a complex statement. We think that these interaction effects are very intuitive as we explain in the results and discussion sections. These significant interactions show that for tACS to be effective, it matters that current gets to the right place and not to irrelevant brain regions. We believe this finding is of great importance for the field, since most studies on the topic still focus mostly on predicting tACS effects from the absolute field strength and neglect other properties of the electric field.

      For the same reasons as stated above, the following statements in the Abstract do not appear to have adequate support in the data:

      "We observed that tACS modulated the strength of behavioral entrainment to the FM sound in a phase-lag specific manner. ... Inter-individual variability of tACS effects was best explained by the strength of the inward electric field, depending on the field focality and proximity to the target brain region. Spatially optimizing the electrode montage reduced inter-individual variability compared to a standard montage group."

      Please see our responses to all previous comments.

      In particular, the evidence in support of the last sentence is unclear. The only finding that seems related is that "the variance test was significant only for tACS(-) in session 2". This is a very narrow result to be able to make such a general statement in the Abstract. But perhaps this can be made clearer.

      We changed this sentence in the abstract to:

      Page 2, lines 41-43: “Although additional evidence is necessary, our results also provided suggestive insights that spatially optimizing the electrode montage could be a promising tool to reduce inter-individual variability of tACS effects.”

      Reviewer #3 (Public Review):

      In "Behavioral entrainment to rhythmic auditory stimulation can be modulated by tACS depending on the electrical stimulation field properties" Cabral-Calderin and collaborators aimed to document 1) the possible advantages of personalized tACS montage over standard montage on modulating behavior; 2) the inter-individual and inter-session reliability of tACS effects on behavioral entrainment and, 3) the importance of the induced electric field properties on the inter-individual variability of tACS.

      To do so, in two different sessions, they investigated how the detection of silent gaps occurring at random phases of a 2-Hz amplitude-modulated sound could be enhanced with 2-Hz tACS, delivered at different phase lags. In addition, they evaluated the advantage of using spatially optimized tACS montages (information-based procedure - using anatomy and functional MRI to define the target ROI and simulation to compare to a standard montage applied to all participants) on behavioral entrainment. They first show that the optimized and the standard montages have similar spatial overlap to the target ROI. While the optimized montage induced a more focal field compared to the standard montage, the latter induced the strongest electric field. Second, they show that tACS does not modify the optimal phase for gap detection (phase of the frequency-modulated sound) but modulates the strength of behavioral entrainment to the frequency-modulated sound in a phase-lag specific manner. However, and surprisingly, they report that the optimal tACS lag, and the magnitude of the phasic tACS effect were highly variable across sessions. Finally, they report that the inter-individual variability of tACS effects can be explained by the strength of the inward electric field as a function of the field focality and on how well it reached the target ROI.

      The article is interesting and well-written, and the methods and approaches are state-of-the-art.

      Strengths:

      • The information-based approach used by the authors is very strong, notably with the definition of subject-specific targets using a fMRI localizer and the simulation of electric field strength using 3 different tACS montages (only 2 montages used for the behavioral experiment).

      • The inter-session and inter-individual variability are well documented and discussed. This article will probably guide future studies in the field.

      Weaknesses:

      • The addition of simultaneous EEG recording would have been beneficial to understand the relationship between tACS entrainment and the entrainment to rhythmic auditory stimulation.

      We are grateful for the Reviewer’s positive assessment of our work and for the reviewer’s recommendations. We agree with the reviewer that adding simultaneous EEG or MEG to our design would have been beneficial to understand tACS effects. However, as the reviewer might be aware, such a combination also poses additional challenges due to the strong artifacts induced by tACS in the EEG signals, which are at the frequency of interest and several orders of magnitude larger than the signal of interest. Unfortunately, an adequate setup for simultaneous tACS-EEG was not available at the time of the study. Nevertheless, since we are using a paradigm that we have repeatedly studied in the past and have shown to entrain neural activity and modulate behavior rhythmically, we are confident our results are of interest on their own. For readability of our answers, we have numbered the comments below.

      1. It would have been interesting to develop the fact that tACS did not "overwrite" neural entrainment to the auditory stimulus. The authors try to explain this effect by mentioning that "tACS is most effective at modulating oscillatory activity at the intended frequency when its power is not too high" or "tACS imposes its own rhythm on spiking activity when tACS strength is stronger than the endogenous oscillations but it decreases rhythmic spiking when tACS strength is weaker than the endogenous oscillations". However, it is relevant to note that the oscillations in their study are by definition "not endogenous" and one can interpret their results as a clear superiority of sensory entrainment over tACS entrainment. This potential superiority should be discussed, documented, and developed.

      We thank the reviewer very much for this remark. We completely agree that our results could be interpreted as a clear superiority of sensory entrainment over tACS entrainment. We have now incorporated this possibility in the discussion.

      Page 16, line 472-478: “Alternatively, our results could simply be interpreted as a clear superiority of the auditory stimulus for entrainment. In other words, sensory entrainment might just be stronger than tACS entrainment in this case where the stimulus rhythm was strong and salient. It would be interesting to further test whether this superiority of sensory entrainment applies to all sensory modalities or if there is a particular advantage for auditory stimuli when they compete with electrical stimulation. However, answering this question was beyond the scope of our study and needs further investigations with more appropriate paradigms.”

      1. The authors propose that "by applying tACS at the right lag relative to auditory rhythms, we can aid how the brain synchronizes to the sounds and in turn modulate behavior." This should be developed as the authors showed that the tACS lags are highly variable across sessions. According to their results, the optimal lag will vary for each tACS session and subtle changes in the montage could affect the effects.

      We thank the reviewer for this remark. We believe that the right procedure in this case would be to use closed-loop protocols where the optimal tACS-lag is estimated online, as we discuss in the summary and future directions sub-section. We tried to make this clearer in the same sentence that the reviewer mentioned.

      Page 17, line 506-508: “Since optimal tACS phase was variable across participants and sessions, this approach would require closed-loop protocols where the optimal tACS lag is estimated online (see next section).”

      1. In a related vein, it would be very useful to show the data presented in Figure 3 (panels b,d,e) for all participants to allow the reader to evaluate the quality of the data (this can be added as a supplementary figure).

      Thank you very much for the suggestion. We have added two new supplemental figures (Fig S1 and S2) to show individual data for Fig. 3b and 3e. Note that Fig. 3d already shows the individual data as each circle represents optimal FM-phase for a single participant.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      "was optimized in SimNIBS to focus the electric field as precisely as possible at the target ROI" It appears that some form of constrained optimization was used. It would be good to clarify which method was used, including a reference.

      Indeed, SimNIBS implements a constrained optimization approach based on pre-calculated lead fields. We have added the corresponding reference. All parameters used for the optimization are reported in the methods (see sub-section Electric field simulations and montage optimization). Regarding further specifics, the readers are invited to check the MATLAB code that was used for the optimization which is made available at: https://osf.io/3yutb

      "Thus, each montage has its pros and cons, and the choice of montage will depend on which of these dependent measures is prioritized." Well put. It would be interesting to know if authors considered optimizing for intensity on target. That would give the strongest predicted intensity on target, which seems like an important desideratum. Individualizing for something focal, as expected, did not give the strongest intensity. In fact, the method struggled to achieve the desired intensity of 0.1V/m in some subjects. It would be interesting to have a discussion about why this particular optimization method was selected.

      The specific optimization method used in this study was somewhat arbitrary, as there is no standard in the field. It was validated in prior studies, where it was also demonstrated that it performs favorably compared to alternative methods (Saturnino et al., 2019; Saturnino et al., 2021). The underlying physics of the head volume conductor generally limits the maximally achievable focality, and requires a tradeoff between focality and the desired intensity in the target. This tradeoff depends on the maximal amount of current that can be injected into the electrodes due to safety limits (4 mA in total in our case). Further constraints of the optimization in our application were the simultaneous targeting of two areas, and achieving field directions in the targets roughly parallel to those of auditory dipoles. Given the combination of these constraints, as the reviewer noticed, we could not even achieve the desired intensity of 0.1 V/m in some subjects. As we wanted to stimulate both auditory cortices equally, our priority was to have the E-fields as similar as possible between hemispheres. Future studies with only one target would find it easier to optimize for target intensity (assuming the same maximal total current injection). Alternatively, relaxing the constraint on direction and optimizing only for field intensity would help to increase the field intensities in the targets, but would lead to differing field directions in the two targets. As an example, see Rev. Fig.1 below. We extensively discuss some of these points in the discussion section: “Are individually optimized tACS montage better?” (Pages 21-22).

      Additionally, we added a few sentences in the Results and Methods giving more details about the optimization approach.

      Page 5, lines 115-116: “Using individual finite element method (FEM) head models (see Methods) and the lead field-based constrained optimization approach implemented in SimNIBS (31)”

      Page 27, lines 819-822: “The optimization pipeline employed the approach described in (31) and was performed in two steps. First, a lead field matrix was created per individual using the 10-10 EEG virtual cap provided in SimNIBS and performing electric field simulations based on the default tissue conductivities listed below.”

      Author response image 1.

      E-field distributions for one example participant. Brain maps show the results from the same optimization procedure described in the main manuscript but with no constraint for the current direction (top) or constraining the current direction (bottom). Note that the desired intensity of .1 V/m can be achieved when the current direction is not constrained.

      The terminology of "high-definition HD" used here is unconventional and may confuse some readers. The paper cited for ring electrodes (18) does not refer to it as HD. A quick search for high-definition HD yields mostly papers using many small electrodes, not ring electrodes. They look more like what was called "individualized". More conventional would be to call the first configuration a "ring-electrode", and the "individualized" configuration might be called "individualized HD".

      We thank the reviewer for this remark. We changed the label of the high-definition montage to ring-electrode. Regarding the individualized configuration, we prefer not to use individualized HD as it has the same number of electrodes as the standard montage.

      "So far, we have evaluated whether tACS at different phase lags interferes with stimulus-brain synchrony and modulates behavioral signatures of entrainment" The paper does not present any data on stimulus-brain synchrony. There is only an analysis of behavior and stimulus/tACS phase.

      We agree with the reviewer. To be more careful with this statement, we have modified the sentence to read:

      Page 10, lines 303-304: “So far, we have evaluated whether tACS at different phase lags modulates behavioral signatures of entrainment: FM-amplitude and FM-phase.”

      "However, the strength of the tACS effect was variable across participants." and across sessions, and the phase also was variable across subjects and sessions.

      "tACS-amplitude estimates were averaged across sessions since the session did not significantly affect FM-amplitude (Fig. 5a)." More importantly, the authors show that "tACS-amplitude" was not reproducible across sessions.

      Unfortunately, we did not understand what the reviewer is suggesting here, and would have to ask the reviewer in this case to provide us with more information.

      References

      Kasten FH, Duecker K, Maack MC, Meiser A, Herrmann CS (2019) Integrating electric field modeling and neuroimaging to explain inter-individual variability of tACS effects. Nat Commun 10:5427.

      Riecke L, Sack AT, Schroeder CE (2015) Endogenous Delta/Theta Sound-Brain Phase Entrainment Accelerates the Buildup of Auditory Streaming. Curr Biol 25:3196-3201.

      Riecke L, Formisano E, Sorger B, Baskent D, Gaudrain E (2018) Neural Entrainment to Speech Modulates Speech Intelligibility. Curr Biol 28:161-169 e165.

      Saturnino GB, Madsen KH, Thielscher A (2021) Optimizing the electric field strength in multiple targets for multichannel transcranial electric stimulation. J Neural Eng 18.

      Saturnino GB, Siebner HR, Thielscher A, Madsen KH (2019) Accessibility of cortical regions to focal TES: Dependence on spatial position, safety, and practical constraints. Neuroimage 203:116183.

      Zoefel B, Davis MH, Valente G, Riecke L (2019) How to test for phasic modulation of neural and behavioural responses. Neuroimage 202:116175.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1 (public):

      1) “It is unclear whether new in vivo experiments were conducted for this study”.

      All in vivo experiments were conducted for this study by using previously published fly stocks to directly compare N- and C-terminal shedding side-by-side in two Hh-dependent developmental systems. This is now clearly stated in the revised supplement (Fig. S8). We also conducted these experiments because previous in vivo studies in flies often relied on Hh overexpression in the fat body, raising questions about their physiological relevance. Our in vivo analyses of Hh function in wing and eye discs are more physiologically relevant and can explain the previously reported presence of non-lipidated bioactive Hh in disc tissue (PMID: 23554573).

      2) “A critical shortcoming of the study is that experiments showing Shh secretion/export do not include a Shh(-) control condition. Without demonstration that the bands analyzed are specific for Shh(+) conditions, these experiments cannot be appropriately evaluated”.

      The Cell Signaling Technology C9C5 anti-Shh antibody used in our study is highly specific against Shh, and it has been used in over 60 publications. C9C5 even lacks cross-reactivity with highly similar Ihh or Dhh (https://www.cellsignal.com/products/primary-antibodies/shh-c9c5-rabbit-mab/2207?_requestid=1528451). We confirmed C9C5 specificity repeatedly (one example is shown below; another quality control that includes media of mock-transfected cells is now shown in Fig. S1) and never observed unspecific bands under any experimental condition. As shown below, C9C5 and R&D AF464 anti-Shh antibodies (the latter were previously used in our lab) detect the same bands.

      Author response image 1.

      Shh immunoblot. R&D 8908-SH served as a size control for full-length dual-lipidated Shh, and C25S;26-35Shh served as a size control for N-terminally truncated monolipidated Shh. Both C25SShh bands are specific: One represents the full-length protein and the bottom band represents N-truncated processed proteins. The blot was first incubated with antibody AF464 and reincubated (after stripping) with the much more sensitive antibody C9C5.

      3) “A stably expressing Shh/Hhat cell line would reduce condition to condition and experiment to experiment variability”.

      We agree and therefore have previously aimed to establish stable Hhat-expressing cell lines. However, we found that long-term Hhat overexpression eliminated transfected cells after several passages, or cells gradually ceased to express Hhat. This prevented us from establishing stable cell lines co-expressing Shh/Hhat despite several attempts and different strategies. Instead, we established transient co-expression of Shh/Hhat from the same mRNA as the next-best strategy for reliable near-quantitative Shh palmitoylation in our assays.

      4) “Unusual normalization strategies are used for many experiments, and quantification/statistical analyses are missing for several experiments”.

      We repeated all qPCR assays to eliminate this shortcoming. Biological activities and transcriptional responses of palmitoylated Shh and non-palmitoylated C25AShh are now directly compared and quantified (revised Fig. 4A,B, newly included Fig. 6, revised Fig. S5B). The original comparison of both proteins with dual-lipidated R&D 8908-SH is still important in order to show that both Shh and C25AShh in serum-containing media have equally high, and not equally low, activities because R&D 8908-SH is generally seen as the Shh form with the highest biological activity. These comparisons are therefore still discussed in the main manuscript text and are now shown in Fig. S5E.

      5) “The study provides a modest advance in the understanding of the complex issue of Shh membrane extraction”

      We believe that the revised manuscript provides more than a modest advance in our understanding of Shh membrane extraction, in three important ways. First, although Disp was indeed known as a furin-activated Hh exporter, our findings show for the first time that furin activation of Disp is strictly linked to proteolytic Shh processing as the underlying release mode, fully consistent with data obtained from the Disp-/- cells.

      Second, Scube2 was known as a Shh release enhancer and several lipoproteins were previously shown to play a role in the process, but our findings are the first to show that synergistic Disp/Scube2 function depends on the presence of lipoprotein and that HDL (but no other lipoprotein) accepts free cholesterol or a novel monolipidated Shh variant from Disp. This challenges the dominant model of Scube2 chaperone function in Hh release and transport (PMID 22902404, PMID 22677548, PMID 36932157).

      Third, we show that this Shh variant is fully bioactive, despite the lack of the palmitate. Therefore, N-palmitate is dispensable for Shh signaling to Ptch1 receptors, but only if the morphogen is released by, and physically linked to, HDL. In contrast, previously published studies analyzed monolipidated Shh variants in the absence of HDL, resulting in variably reduced bioactivity of these physiologically irrelevant forms. Therefore, our findings challenge the current dominating model of N-palmitate-dependent Shh signaling to Ptch1 (this model also does not postulate any role for lipoproteins, PMID 36932157) and essential roles of N-palmitate (stating that the N-palmitate is sufficient for signaling, PMID 27647915).

      Reviewer 2 (public):

      1) “However, the results concerning the roles of lipoproteins and Shh lipid modifications are largely confirmatory of previous results, and molecular identity/physiological relevance of the newly identified Shh variant remain unclear”.

      We disagree with this assessment on several points. First, our findings do not confirm, but strongly challenge, the current dogma of Disp-mediated handover of dual-lipidated Shh to Scube2 as a soluble acceptor (instead of to HDL, PMID 36932157). Second, we report three new findings: Disp, Scube2, and lipoproteins all interact to specifically increase N-terminal Shh shedding, whereas C-terminal shedding is optional; Disp function depends on the presence of HDL; and HDL modulates Shh shedding (dual Shh shedding in the absence of HDL versus N-shedding and HDL association in its presence). Our work also directly determines the molecular identity of a previously unknown Shh variant as monolipidated (by RP-HPLC), HDL associated (by SEC and density gradient centrifugation), and fully bioactive (in two cell-based reporter assays).

      Third, regarding the physiological relevance of our findings: Fig. S8 demonstrates that deletion of the N-terminal sheddase target site of Hh abolishes all Hh biofunction in Drosophila eye discs and wing discs, which strongly supports physiological relevance of N-terminal Hh shedding during release. N-terminal shedding is further consistent with in vivo findings of others. These studies showed that artificial monolipidated Shh variants (C25SShh and ShhN) generate highly variable loss-of-function phenotypes in vivo, but can also generate gain-of-function phenotypes if compared with the dual-lipidated cellular protein (refs. 1-5). These observations are difficult to align with the dominant model of essential N-palmitate function at the level of Ptch1 (PMID 36932157), because the lack of N-palmitate is expected to always diminish signaling in all tissue contexts and developmental stages. Our finding that dual-lipidated Shh is strictly released in a Disp/Scube2-controlled manner from producing cells, while artificial monolipidated Shh variants leak uncontrolled from the cellular surface, explains these seemingly paradoxical in vivo findings much better. This is because uncontrolled Shh release can increase Shh signaling locally (when physiological release would normally be prevented at this site (ref. 6) or time), while it can also decrease it (for example, in situations requiring timed pulses of Shh release and signaling; refs. 7-11). This is discussed in our manuscript (Discussion, first paragraph).

      2) The molecular properties of the processed Shh variants are unclear – incorporation of cholesterol/palmitate and removal of peptides were not directly demonstrated…

      We also disagree on this point. Our study is the only one that uses RP-HPLC and defined controls (dual-lipidated commercial R&D 8908-SH, dual-lipidated cellular proteins eluting at the same positions, non-lipidated or monolipidated controls, Fig. S1F-K) to compare the lipidation status of cellular and corresponding solubilized Shh and to determine their exact lipidation status (Figs. 1, 3, 5, Figs. S4, S6, S7). Co-expressed Hhat assures full Shh palmitoylation during biosynthesis (as shown in original Figs. 1A and S2F-K & S4A and as confirmed by R&D 8908-SH) as an essential prerequisite to reliably conduct and interpret these analyses. The removal of peptides is demonstrated by the increase in electrophoretic mobility of soluble forms, if compared with their dual-lipidated cellular precursor, because chemical delipidation results in a decrease in electrophoretic mobility in SDS-PAGE (as discussed in detail in ref. 12, which we now cite in our work).

      3) This (N-terminal palmitoylation status) is particularly relevant …, as the signaling activity of non-palmitoylated Hedgehog proteins is controversial.

      We agree with this comment and are aware of the published data. However, in our work, we have demonstrated strong signaling activities by using C25AShh mutants that are fully impaired in their ability to undergo N-palmitoylation (Fig. 4, Fig. S5). These are highly bioactive if associated with HDL. Therefore, we do not see any ambiguity in our findings and suggest that the reports of others resulted from different experimental conditions.

      4) A decrease in hydrophobicity is no proof for cleavage of palmitate, this could also be due to addition of a shorter acyl group.

      As shown in the original manuscript, we have controlled for this possibility: RP-HPLC was established by using defined controls (dual-lipidated, non-lipidated, or monolipidated, Fig. S1F-K and corresponding color coding). Because the cellular Shh precursor prior to release was always dual-lipidated, whereas the soluble form was not, lipids were clearly lost during release (because a decrease in the hydrophobicity of soluble proteins is always shown relative to that in their dual-lipidated cellular precursors). The increase in electrophoretic mobility detected for the very same proteins in SDS-PAGE demonstrates delipidation during their release (please see our reply to point 2 above). Finally, the suggested possibility of palmitate exchange for shorter acyls during Shh release at the cell surface is extremely unlikely, as there is no known machinery to catalyze this exchange at the plasma membrane. Hh acylation only occurs in the ER membrane via Hhat (ref. 13).

      5) “It would be important to demonstrate key findings in cells that secrete Shh endogenously”.

      We now show that Panc1 cells release endogenous Shh in truncated form, as our transfected cells do (Fig. S1). Moreover, the experimental data shown in Fig. S8B demonstrate that engrailed-controlled expression of sheddase-resistant Hh variants in wing disc cells completely blocks endogenous Hh produced in the same cells by stalling Disp-mediated morphogen export. Both findings strongly support our key finding that N-processing is not optional but absolutely required to finalize Hh release.

      6) Co-fractionation of Shh and ApoA1 is not convincing, as the two proteins peak at different molecular weights…. The authors could use an orthogonal approach, optimally a demonstration of physical interaction, or at least fractionation by a different parameter

      Shifted Shh peaks upon physiologically relevant Shh transfer via Disp to HDL must be expected in SEC, because Shh association with HDL subfractions increases their size. Comparing relative peaks of Shh-loaded HDL with Shh-free reference HDL suggests 10-15 Shh molecules per HDL (adding 200-300 kDa to its molecular mass). This is now stated in the revised manuscript (page 10, line 2).

      Still, to further support direct Shh/HDL association, we analyzed high molecular weight Shh SEC fractions by subsequent RP-HPLC. This approach confirms direct physical interactions between cholesteroylated Shh and HDL (now shown in Fig. S6G).

      We support this possibility further by density gradient centrifugation, again demonstrating that Shh and HDL interact physically (now shown in Fig. S6 E,F).
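
      The mass estimate given above for Shh-loaded HDL can be checked with simple arithmetic. The sketch below is illustrative only: it assumes a mass of about 19 kDa per Shh molecule (taken from the 19 kDa signaling-active band; the N-terminally processed form will be slightly smaller), and the function name is hypothetical.

```python
# Back-of-the-envelope check of the mass added to one HDL particle by bound Shh.
# Assumption: ~19 kDa per Shh molecule (the size of the signaling-active band);
# the N-terminally processed form is slightly smaller, so this is an upper estimate.
SHH_MASS_KDA = 19.0


def added_mass_kda(n_shh: int, shh_mass_kda: float = SHH_MASS_KDA) -> float:
    """Return the mass (kDa) contributed by n_shh Shh molecules."""
    if n_shh < 0:
        raise ValueError("number of Shh molecules must be non-negative")
    return n_shh * shh_mass_kda


if __name__ == "__main__":
    # Range of Shh molecules per HDL suggested by the SEC peak comparison.
    for n in (10, 15):
        print(f"{n} Shh molecules add about {added_mass_kda(n):.0f} kDa")
```

      With these assumptions, 10 to 15 Shh molecules correspond to 190 to 285 kDa, in line with the quoted range of roughly 200-300 kDa.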

      Recommendations from the reviewing editor:

      1) “The authors should certainly tone down statements of novelty because much of the work is confirmatory in nature”

      We followed this request in our revised manuscript and now clearly point out what was known and what we add to the concept of Disp and lipoprotein-mediated Hh export. Still, as outlined in our response to reviewer 2, our findings align with only one previously published model of lipoprotein-mediated Hh transport, while they do not support the most current models of Disp-mediated handover of dual-lipidated Shh to Scube2 (PMID 36932157) and essential signaling roles of N-palmitate at the level of the receptor Ptch1. Thus, our work should not be viewed solely as confirmatory of one of the many previous models, because at the same time it also contradicts the other models of Hh solubilization and transport.

      2) “Inclusion of the Shh(-) control”

      Please see our reply to reviewer 1 above. The Cell Signaling Technology C9C5 anti-Shh antibody used in our study is highly specific against Shh. We also carefully characterized the C9C5 antibody before any of the experiments shown in our work had been initiated. We never observed any unspecific C9C5 reactivity that otherwise would – of course – have prevented us from switching to this antibody from the AF464 antibodies that we had previously used. Consistent C9C5 antibody specificity is evident from the representative example shown below that was recently produced in our lab: no cellular proteins or TCA-precipitated serum-depleted media components from mock-transfected cells (left two lanes) react with C9C5.

      Author response image 2.

      Top left: C9C5 detects the cellular 45 kDa Shh precursor and the 19 kDa signaling-active protein. No unspecific signals are detected in untransfected cells and supernatants of such cells (left two lanes). Right: Loading control on the stripped blot.

      3) “Clean up how the data are normalized for quantification”

      Please see our reply to reviewer 1 above. Normalization has been changed for the indicated figures. We also repeated qPCR analyses and added new ones to the manuscript that include required controls. We also changed figure outlines in accordance with the request.

      4) “The issue of a non-specific band of this Shh antibody is critical”

      Please see our replies above. In our hands, unspecific C9C5 antibody binding was never observed.

      5) “Regarding experimental rigor, I would add that the HPLC … should just show the real data points”

      We agree and added individual data points to our revised manuscript.

      Recommendations for the authors:

      1) I would like to see the controls in the same figure with the experimental results.

      We show antibody specificity controls together with released Shh in Fig. S1.

      2) Figure 2 confirms previously published results. It was shown in PMC5811216 that Disp processing by furin is required for Shh release from producing cells.

      Indeed, it was shown that furin processing of Disp increases Shh release (supposedly together with lipids), but we show here that furin-activated Disp specifically mediates proteolytic Shh shedding and loss of lipids – which is not the same. Indeed, we show this finding because we interpret it the other way around: Because it is known that furin activation of Disp increases Shh release by some means (PMC5811216), our observation that furin-mediated Disp activation specifically increases Shh shedding independently supports our model.

      3) Figure 3: it is stated that there is no increase in Shh release into the media…

      We removed this statement.

      4) Figure S5: Scale bars are missing.

      We added scale bars to the figures.

      5) Figure 4: A direct comparison between wt Shh and C25A conditioned media for qPCR is needed.

      We agree and repeated all experiments. Results confirm our previous findings and are shown in revised Fig. 4 and in Fig. S5.

      6) What other components can be examined in addition to ApoA1 as a marker for HDL? Why is the Shh peak shifted to the left? What about exovesicles?

      We also detected ApoE4, a mobile lipoprotein present on expanding (large) HDL (Figs. 5, 6, Figs. S6, 7; ref. 14). We also used density gradient centrifugation to support the Shh/HDL association. Regarding the leftward Shh size shift relative to the major HDL peak in SEC, please refer to our explanation above: if loaded with Shh, a size increase of the respective HDL subfraction is expected. Finally, we did not test the role of exovesicles in our assays. However, owing to their large size (60-120 nm, versus 7-12 nm for HDL), Shh associated with exovesicles should have eluted in the void volume of our gel filtration column. This we never observed.

      7) Why is osteoblast differentiation used?

      C3H10T1/2 osteoblast differentiation is strongly driven by Ihh and Shh activity and is established as a sensitive and robust assay. Still, following this reviewer’s advice, we conducted qPCR assays on these cells and in addition on NIH3T3 cells to support our findings.

      Finally, we corrected all minor mistakes regarding spelling and figure labeling. We also improved the readability of the revised manuscript, as suggested by reviewer 2.

      References

      1. Gallet A, Ruel L, Staccini-Lavenant L, Therond PP. Cholesterol modification is necessary for controlled planar long-range activity of Hedgehog in Drosophila epithelia. Development 133, 407-418 (2006).

      2. Porter JA, et al. Hedgehog patterning activity: role of a lipophilic modification mediated by the carboxy-terminal autoprocessing domain. Cell 86, 21-34 (1996).

      3. Lewis PM, et al. Cholesterol modification of sonic hedgehog is required for long-range signaling activity and effective modulation of signaling by Ptc1. Cell 105, 599-612 (2001).

      4. Huang X, Litingtung Y, Chiang C. Region-specific requirement for cholesterol modification of sonic hedgehog in patterning the telencephalon and spinal cord. Development 134, 2095-2105 (2007).

      5. Lee JD, et al. An acylatable residue of Hedgehog is differentially required in Drosophila and mouse limb development. Dev Biol 233, 122-136 (2001).

      6. Corrales JD, Rocco GL, Blaess S, Guo Q, Joyner AL. Spatial pattern of sonic hedgehog signaling through Gli genes during cerebellum development. Development 131, 5581-5590 (2004).

      7. Cordero D, Marcucio R, Hu D, Gaffield W, Tapadia M, Helms JA. Temporal perturbations in sonic hedgehog signaling elicit the spectrum of holoprosencephaly phenotypes. J Clin Invest 114, 485-494 (2004).

      8. Dessaud E, et al. Interpretation of the sonic hedgehog morphogen gradient by a temporal adaptation mechanism. Nature 450, 717-720 (2007).

      9. Garcia-Morales D, Navarro T, Iannini A, Pereira PS, Miguez DG, Casares F. Dynamic Hh signalling can generate temporal information during tissue patterning. Development 146, (2019).

      10. Harfe BD, Scherz PJ, Nissim S, Tian H, McMahon AP, Tabin CJ. Evidence for an expansion-based temporal Shh gradient in specifying vertebrate digit identities. Cell 118, 517-528 (2004).

      11. Nahmad M, Stathopoulos A. Dynamic interpretation of hedgehog signaling in the Drosophila wing disc. PLoS Biol 7, e1000202 (2009).

      12. Ehring K, et al. Conserved cholesterol-related activities of Dispatched 1 drive Sonic hedgehog shedding from the cell membrane. J Cell Sci 135, (2022).

      13. Coupland CE, et al. Structure, mechanism, and inhibition of Hedgehog acyltransferase. Mol Cell 81, 5025-5038 e5010 (2021).

      14. Sacks FM, Jensen MK. From High-Density Lipoprotein Cholesterol to Measurements of Function: Prospects for the Development of Tests for High-Density Lipoprotein Functionality in Cardiovascular Disease. Arterioscler Thromb Vasc Biol 38, 487-499 (2018).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:  

      Reviewer #1 (Public review):  

      Summary:  

      This work examines the binding of several phosphonate compounds to a membrane-bound pyrophosphatase using several different approaches, including crystallography, electron paramagnetic resonance spectroscopy, and functional measurements of ion pumping and pyrophosphatase activity. The work attempts to synthesize these different approaches into a model of inhibition by phosphonates in which the two subunits of the functional dimer interact differently with the phosphonate.  

      Strengths:  

      This study integrates a variety of approaches, including structural biology, spectroscopic measurements of protein dynamics, and functional measurements. Overall, data analysis was thoughtful, with careful analysis of the substrate binding sites (for example calculation of POLDER omit maps).

      Weaknesses:  

      Unfortunately, the protein did not crystallize with the more potent phosphonate inhibitors. Instead, structures were solved with two compounds with weak inhibitory constants >200 micromolar, which limits the molecular insight into compounds that could possibly be developed into small molecule inhibitors. Likewise, the authors choose to focus the spectroscopy experiments on these weaker binders, missing an opportunity to provide insight into the interaction between more potent binders and the protein. 

      We acknowledge the reviewer's concern regarding the choice of weaker inhibitors. We attempted cocrystallization with all available inhibitors, including those with higher potency. However, despite numerous efforts, the more potent inhibitors yielded only low-resolution crystals, which were unsuitable for detailed structural analysis. We therefore focused on the weaker binders, for which we obtained high-quality crystal structures. This allowed us to perform DEER spectroscopy and monitor TmPPase conformational ensembles in solution, with the added advantage of analysing the data against structural models derived from X-ray crystallography. Using these weaker inhibitors enabled a more precise interpretation of the DEER data, thus providing reliable insights into the conformational dynamics and inhibition mechanism. As suggested by the reviewer, the revised version adds new DEER experiments, conditions and analysis on two of the more potent inhibitors (alendronate and pamidronate) to provide additional insight into their interactions. Furthermore, we added DEER data for a new spin-labelled site on the cytoplasmic side of TmPPase (C599R1, which has the advantage of being an endogenous cysteine residue), because the DEER data for the previous cytoplasmic site, T211R1, were difficult to interpret owing to the highly dynamic nature of this region. The new C599R1 pair yielded high-quality DEER traces and indicated, more clearly than T211R1, distance distributions consistent with asymmetry across the sampled conditions. Again as suggested by the reviewer, alendronate and pamidronate DEER measurements were recorded for this cytoplasmic site (C599R1) as well as for the periplasmic site (S525R1).

      In general, the manuscript falls short of providing any major new insight into membrane-bound pyrophosphatases, which are a very well-studied system. Subtle changes in the structures and ensemble distance distributions suggest that the molecular conformations might change a little bit under different conditions, but this isn't a very surprising outcome. It's not clear whether these changes are functionally important, or just part of the normal experimental/protein ensemble variation. 

      We respectfully disagree with the reviewer. The scale of the motions seen in solution (now also with a new, reliable spin pair, C599R1, located on the cytoplasmic side) corresponds to that seen in the full panoply of mPPase crystal structures. Some proteins, such as the rotary ATPase, undergo very large conformational changes during catalysis. This one does not, meaning that the precise motions we describe here are relevant and are observed in solution for the first time. Conformational changes in the ensemble, whether large or small, represent essential protein motions that underlie key mPPase catalytic function. These dynamic transitions are extremely challenging to monitor, especially in so many conditions, and our DEER spectroscopy data demonstrate the sensitivity and resolution necessary to monitor these subtle changes in equilibria, even if they amount to only a few Angstroms. For several of the conditions we investigated by DEER in solution, corresponding X-ray structures have been solved, and the derived distances agree well with the DEER distributions. This further validates the biological relevance of the structures and reveals the complete conformational ensemble, which is intractable using other current approaches. Indeed, some conformational states were previously seen using serial time-resolved X-ray static structures and were consistent with asymmetry.

      The ZLD-bound crystal structure doesn't predict the DEER distances, and the conformation of Na+ binding site sidechains in the ZLD structure doesn't predict whether sodium currents occur. This might suggest that the ZLD structure captures a conformation that does not recapitulate what is happening in solution or in a membrane.

      We agree with the reviewer that the ZLD-bound crystal structure does not predict the DEER distances. However, we believe this discrepancy arises from the steric bulk of the ZLD inhibitor, which prevents closure of the hydrolytic centre. Additionally, the absence of Na+ at the ion gate in the ZLD-bound structure suggests that Na+ transport does not occur, a conclusion further supported by our electrometric measurements. We also agree with the reviewer that the distances observed in the DEER experiments might represent a potential new conformation in solution that is not captured by the static X-ray structure, thereby offering new insights into the dynamic nature of the protein under physiological conditions. This emphasizes the complementarity of DEER and X-ray crystallography and the importance of using both techniques. Finally, the static X-ray structures have not captured the asymmetric conformations that must exist to explain half-of-the-sites reactivity, whereas DEER yields distance distributions that are consistent with asymmetry across all 16 cases tested here (two mutants with eight conditions each).

      Reviewer #2 (Public review):  

      Summary:  

      Crystallographic analysis revealed the asymmetric conformation of the dimer in the inhibitor-bound state. Based on this result, which is consistent with previous time-resolved analysis, the authors verified the dynamics and the distance between the introduced spin labels by DEER spectroscopy in solution and predicted possible patterns of the asymmetric dimer.

      Strengths:  

      Crystal structures with inhibitor bound provide detailed coordination in the binding pocket thus useful information for the mPPase field and maybe for drug development.  

      Weaknesses:  

      The distance information measured by DEER is advantageous for verifying the dynamics and structure of membrane proteins in solution. However, regarding the T211 data, which, as the authors themselves stated, lack measurement precision, it is unclear to readers how confidently one can judge the conclusions drawn from these data for the cytoplasmic side.

      We thank the reviewer for acknowledging the advantageous use of the DEER methodology for identifying dynamic states of membrane proteins in solution. In our original manuscript, we used two sites in our analysis: S525 (periplasm) and T211 (cytoplasm). S525R1 yielded high-quality DEER data, whereas T211R1 yielded weak (or no) visible oscillations, leading to broad distributions for the several conditions tested. In the revised manuscript, we have added a third site on the cytoplasmic side (C599R1, located on TMH14), which yielded high-quality DEER data comparable to those for S525R1. Both the C599R1 and S525R1 spin pairs generated distance distributions for all 16 conditions (two mutants with eight conditions each) that were described well by a solution-state ensemble adopting a predominantly asymmetric conformation.

      Furthermore, we have tailored our interpretation of the T211R1 DEER data and refrain from using them to draw conclusions about the TmPPase conformational ensemble in the presence of different inhibitors. However, we still opted to include the T211R1 data in the SI because they confirm an important structural feature of mPPase in solution: the intrinsically dynamic behaviour of loop5-6, where T211 is located. This observation in solution is also consistent with our previous (Kellosalo et al., Science, 2012; Li et al., Nat. Commun, 2016; Vidilaseris et al., Sci. Adv., 2019; Strauss et al., EMBO Rep., 2024) and current X-ray crystallography data. To reiterate, we excluded T211R1 from any analysis relating to mPPase asymmetry, and our conclusions were based entirely on the S525R1 and new C599R1 DEER data, which allowed us to monitor both sides of the membrane.

      The distance information for the luminal site, which the authors claim is more accurate, does not indicate either the possibility or the basis for why it is the ensemble of two components and not simply a structure with a shorter distance than the crystal structure.  

      We thank the reviewer for pointing out this possibility and alternative interpretation of our DEER data. We now provide further analysis to show that our DEER data from reporters on both sides of the membrane are highly consistent with asymmetry (although they cannot completely exclude other possibilities), and we have rephrased the text to be inclusive of these. Importantly, this additional possibility does not affect the current interpretation of the data in our manuscript. Furthermore, we have removed Fig. 6 from the manuscript, and we now include a direct comparison of the in silico predicted distribution from the asymmetric hybrid structure with the eight conditions tested, for both mutants (i.e. S525R1 and C599R1).

      Reviewer #3 (Public review):  

      Summary:  

      Membrane-bound pyrophosphatases (mPPases) are homodimeric proteins that hydrolyze pyrophosphate and pump H+/Na+ across membranes. They are attractive drug targets against protist pathogens. Non-hydrolysable PPi analogue bisphosphonates such as risedronate (RSD) and pamidronate (PMD) serve as primary drugs currently used. Bisphosphonates have a P-C-P bond whose central carbon can accommodate up to two substituents, allowing a large compound variability. Here the authors solved two TmPPase structures in complex with the bisphosphonates etidronate (ETD) and zoledronate (ZLD) and monitored their conformational ensemble using DEER spectroscopy in solution. These results reveal the inhibition mechanism of these compounds, which is crucial for developing future small molecule inhibitors.

      Strengths:  

      The authors show that seven different bisphosphonates can inhibit TmPPase with IC50 values in the micromolar range. Branched aliphatic and aromatic modifications showed weaker inhibition.  

      High-resolution structures for TmPPase with ETD (3.2 Å) and ZLD (3.3 Å) are determined. These structures reveal the binding mode and shed light on the inhibition mechanism. The nature of modification on the bisphosphonate alters the conformation of the binding pocket.  

      The conformational heterogeneity is further investigated using DEER spectroscopy under several conditions.  

      Weaknesses:  

      The authors observed asymmetry in the TmPPase-ETD structure above the hydrolytic center. The structural asymmetry arises due to differences in the orientation of ETD within each monomer at the active site. As a result, loop5-6 of the two monomers is oriented differently, resulting in the observed asymmetry. The authors attempt to further establish this asymmetry using DEER spectroscopy experiments. However, the (over)interpretation of these data leads to more confusion than any further understanding. DEER data suggest that the asymmetry observed in the TmPPase-ETD structure in this region might be funneled from the broad conformational space under the crystallization conditions.

      We respectfully disagree with the reviewer. The asymmetry was previously established using time-resolved serial crystallography (Strauss et al., EMBO Rep, 2024) and biochemical assays (e.g. Malinen et al., Prot. Sci., 2022; Artukka et al., Biochem J, 2018; Luoto et al., PNAS, 2013) and was partially seen in one static structure (Vidilaseris et al., Sci Adv 2019). The DEER data here show that the previously proposed asymmetry is also present within the TmPPase conformational ensemble in solution, and this is consistent across all DEER data. Although we cannot rule out the possibility that the TmPPase monomers adopt a metastable intermediate state, in such a case we would expect the distance changes reported by DEER to be symmetric across both membrane sides. Instead, we observe a symmetry breaking between the cytoplasmic and periplasmic TmPPase sites. Indeed, the DEER data yield distance distributions similar to that of the hybrid asymmetric structure under all eight conditions: apo, +Ca, +Ca/ETD, +ETD, +ZLD, +IDP, +PAM and +ALE.

      DEER data for position T211R1 at the enzyme entrance reveal a highly flexible conformation of loop5-6 (and do not provide any direct evidence for asymmetry, Figure EV8).

      Please see the relevant response above. We acknowledge that T211 is situated on a highly dynamic loop that is important for gating, and our DEER data confirm the high flexibility of this region of the protein. Because we did not observe dipolar oscillations for T211R1, which led to broad distributions, we stated in the original manuscript that we would not establish the presence of any asymmetry in solution on the basis of T211. Instead, we rely on the S525R1 and the new C599R1 sites, for which we have acquired high-quality DEER data, as all reviewers also noted. We now provide data at the C599R1 position (on the same cytoplasmic side as T211, for which we have limited our analysis to a minimum), including two new conditions, which provide further evidence for asymmetry.

      Similarly, data for position S525R1 near the exit channel do not directly support the proposed asymmetry for ETD.

      The reviewer appears to suggest that we hold the S525R1 DEER data as direct proof of asymmetry. We would contest this, because directly proving asymmetry would require time-resolved DEER measurements, which are far beyond the scope of this work. Rather, we have applied DEER measurements to explore whether asymmetry (observed previously via time-resolved X-ray crystallography) is also present (or indeed a possibility) in solution. All our S525R1 and C599R1 DEER data (recorded for eight conditions) are consistent with asymmetry (see also the detailed response above).

      Despite the high quality of the data, they reveal a very similar distance distribution. The reported changes in distances are very small (+/- 0.3 nm), which can be accommodated by a change of spin label rotamer distribution alone. Further, these spin labels are located on a flexible loop, thereby making it difficult to directly relate any distance changes to the global conformation

      We thank the reviewer for recognising the high quality of our DEER data for the S525R1 site, which we now complement with a new pair on the cytoplasmic side of the membrane (C599R1) that gives DEER data of comparable quality. Visible oscillations in the raw traces, as seen here for both spin pairs, reportedly lead to highly accurate and reliable distributions that can separate (in fortuitous cases) helical movements of only a few Angstroms (Peter et al., Nature Comms 13:4396, 2022; Klose et al., Biophys J 120:4842-4858, 2021). The ability of DEER/PELDOR to offer near-Angstrom resolution was also previously demonstrated by the acquisition and solution of high-resolution multi-subunit spin-labelled membrane protein structures (Pliotas et al., PNAS, 2012; Pliotas et al., Nat Struct Mol Biol, 2015; Pliotas, Methods Enzymol, 2017), as was its ability to detect small conformational changes, of similar magnitude to those in mPPase, in different integral membrane protein systems (Kapsalis et al., Nature Comms, 2019; Kubatova et al., PNAS, 2023; Schmidt et al., JACS, 2024; Lane et al., Structure, 2024; Hett et al., JACS, 2021; Zhao et al., Nature, 2024), occurring under different conditions and/or stimuli in solution and/or a lipid environment. The changes here are not below the detection sensitivity of DEER: there are ~7 Angstroms between the two modal distance extremes (+Ca vs +IDP for S525R1), with all other conditions showing intermediate changes.

      We agree with the reviewer that these changes are relatively small, but they are expected for membrane ion pumps. Indeed, none of the mPPase structures show helical movements of greater than half a turn, and that only in helices 6 and 12. There appear to be larger-scale loop closing motions of the 5-6 loop that includes T211, due to the presence of E217 which binds to one of the Mg<sup>2+</sup> ions that coordinate the leaving group phosphate. This is, inter alia, the reason that this loop is so flexible: it cannot order before substrate is bound.  

      The reviewer suggests that the subtle distance shifts detected arise only from changes of label rotamer distribution. However, the concerted nature of the modal distance shifts with respect to multiple different conditions at a single labelling site strongly suggests that preferential rotamer orientations are not the cause. Indeed, for so many spin labels to undergo an arbitrary shift that the modal distance of the entire distribution changes – and in the absence of any conformational change – appears improbable. Here we have the resolution to detect such subtle differences by DEER, given there are unambiguous shifts in our time domain data (i.e. the position of the minimum of the first dipolar oscillation) (Fig 4) and these are reflected in the modal distances in the distributions. We also refrain from performing any quantitative analysis and use qualitative trends in modal distance shifts only, all of which support our proposed model of a symmetry breaking across the membrane face. To further belabour this point, we do not quantify the DEER data (for instance through parametric fitting) to extract populations of different conformational states and we appreciate that to do so would be highly prone to error; however, we do (and can, we feel without over-interpretation) assert that the modal distances shift.

      The interpretations listed below are not supported by the data presented:  

      (1) 'In the presence of Ca2+, the distance distribution shifts towards shorter distances, suggesting that the two monomers come closer at the periplasmic side, and consistent with the predicted distances derived from the TmPPase:Ca structure.'

      Problem: This is a far-stretched interpretation of a tiny change, which is not reliable for the reasons described in the paragraph above. 

      While the authors overall agree with the reviewer's assessment that ±0.3 nm is a small (not a minor) change, there are literature examples quantifying (or using for quantification) distribution peaks separated by similar Δr (Kubatova et al., PNAS, 2023; Schmidt et al., JACS, 2024; Hett et al., JACS, 2021; Zhao et al., Nature, 2024). Moreover, the time-domain data clearly indicate that the position of the first minimum of the dipolar oscillation shifts to shorter dipolar evolution time. The sensitivity of the time-domain data to subtle changes in dipolar coupling frequency is significantly improved compared to the distance distributions.

      Importantly, we have fitted Gaussians to the experimental distance distributions of 525R1 output by the Comparative DEER Analyzer 2.0 and observed a change in the distribution width in the presence of Ca2+, implying the rotameric freedom of the spin label is restricted. However, the CW-EPR spectra for 525R1 indicate that the rotational correlation time of the spin label is highly consistent between conditions (the spectra are almost identical); this cannot be explained simply by rotameric preference of the spin label (as asserted by reviewer 3), as there is no (further) immobilisation observed from the CW-EPR of the apo-state (Figure EV9) to that in the presence of Ca2+. Furthermore, in the absence of conformational changes, it is reasonable to assume (and demonstrable from the CW-EPR data) that the rotamer cloud should not significantly change between conditions. However, Gaussian fits of the two extreme cases yielding the longest (i.e., in the presence of IDP) and shortest (in the presence of ZLD) modal distances for the 525R1 DEER data indicated significant (i.e., above the noise floor after Tikhonov validation) probability density for the IDP condition at 50 Å (P(r) = 0.18). This occurs at four standard deviations above the mean of the Gaussian fit to the +ZLD condition, which by random chance should occur with <0.007% probability.
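
      The tail probability quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming an ideal normal distribution; the function name is hypothetical and the calculation is not taken from the authors' analysis. It evaluates the chance of a value lying four standard deviations from the mean, both one-sided (above the mean only) and two-sided.

```python
import math


def gaussian_tail_probability(n_sigma: float, two_sided: bool = False) -> float:
    """Probability that a normal variate lies more than n_sigma standard
    deviations above its mean (one-sided), or that far from the mean in
    either direction (two-sided)."""
    one_sided = 0.5 * math.erfc(n_sigma / math.sqrt(2.0))
    return 2.0 * one_sided if two_sided else one_sided


# Hypothetical helper names; 4.0 is the number of standard deviations
# quoted in the response.
upper = gaussian_tail_probability(4.0)
both = gaussian_tail_probability(4.0, two_sided=True)
print(f"one-sided: {upper * 100:.4f}%")
print(f"two-sided: {both * 100:.4f}%")
```

      Under this assumption both the one-sided and the two-sided values fall below the 0.007% bound stated in the response.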

      As in our previous response, the method can detect changes of such magnitude, which are not small but physiologically relevant and expected for integral membrane proteins such as mPPases. Indeed, even in equally (or more) complex systems such as heptameric mechanosensitive channel proteins, DEER provided sub-Angstrom accuracy when a spin-labelled high-resolution XRC structure was solved (Pliotas et al., PNAS, 2012; Pliotas et al., Nat Struct Mol Biol, 2015). Although this is an ideal and uncommon case, in which DEER accuracy was experimentally validated by another high-resolution structural method on a modified membrane protein, it demonstrates the power of the method: when strong oscillations are present in the raw DEER data (as here for mPPase S525R1 and C599R1), Angstrom resolution is achievable in such challenging protein classes, even when multiple distances are present.

      (2) 'Based on the DEER data on the IDP-bound TmPPase, we observed significant deviations between the experimental and the in silico distances derived from the TmPPase:IDP X-ray structure for both cytoplasmic- (T211R1) and periplasmic-end (S525R1) sites (Figure 4D and Figure EV8D). This deviation could be explained by the dimer adopting an asymmetric conformation under the physiological conditions used for DEER, with one monomer in a closed state and the other in an open state.'  

      Problem: The authors are trying to establish asymmetry using the DEER data. Unfortunately, no significant difference is observed (between simulation and experiment) for position 525 as the authors claim (Figure 4D bottom panel). The observed difference for position 112 must be accounted for by the flexibility and the data provide no direct evidence for any asymmetry.  

      Reviewer 3 is incorrect in suggesting that we are trying to prove asymmetry through the DEER data. That is a well-known fact in the literature (e.g. Vidilaseris et al, Sci Adv 2019) where we show (1) that the exit channel inhibitor ATC (i.e. close to S525R1) binds better in solution to the TmPPase:PPi complex than the TmPPase:PPi<sub>2</sub> complex, and (2) that ATC binds in an asymmetric fashion to the TmPPase:IDP<sub>2</sub> complex with just one ATC dimer on one of the exit channels. We merely use the DEER data to support this well-established fact.  

      However, because we agree that the DEER data in the presence of IDP do not provide direct proof of asymmetry, particularly for the cytoplasmic-facing mutant T211R1, we have refrained from interpreting the T211R1 data beyond their indicating a highly dynamic loop region (as evidenced by the broad distributions). As pointed out by the reviewer, the differences in distance distributions between conditions observed for T211R1 likely arise from conformational heterogeneity in solution. Furthermore, we now report DEER data on another new site (C599R1), which is also on the cytoplasmic side and yields high quality DEER data comparable to the S525R1 data (commended for their quality by both the reviewers). The C599R1 measurements show that in all conditions tested, highly similar distributions are observed, inconsistent with the in silico predicted distance distributions from the symmetric X-ray structures, but consistent with an asymmetric hybrid structure (i.e. open-closed) in solution. Importantly, the difference between the fully open (6.8 nm modal distance) and fully closed (4.8 nm modal distance) states of the C599R1 dimer is larger than for the S525R1 dimer pair. Thus, delineating the asymmetric hybrid conformation from the symmetric conformations is more robust.

      (3) 'Our new structures, together with DEER distance measurements that monitor the conformational ensemble equilibrium of TmPPase in solution, provide further solid experimental evidence of asymmetry in gating and transitional changes upon substrate/inhibitor binding.'  

      Problem: See above. The DEER data do not support any asymmetry. 

      We feel that the reviewer's comments here are somewhat unfounded. All the DEER data (for the 525R1 periplasmic and C599R1 cytoplasmic sites) are described, most parsimoniously, using an asymmetric hybrid structure. In particular, the new C599R1 distance distributions are poorly described by the symmetric X-ray crystal structures, with a conserved modal distance of approx. 5.8 nm throughout the tested conditions that aligns nicely with the in silico predictions from the asymmetric hybrid structure. Additionally, all S525R1 and C599R1 data well exceed the relevant criteria of the recent white paper (Schiemann et al., 2021, JACS) from the EPR community to be considered reliably interpretable (strong visual oscillations in the raw traces; signal-to-noise ratio w.r.t. modulation depth of > 20 in all cases; replicates have been performed and added into the main text or supplementary; near quantitative labelling efficiency (evidenced by lack of free spin label signal in the CW-EPR spectra); analysed using the CDA (now Figure EV10) to avoid confirmation bias).

      While the DEER data do not prove asymmetry, we do not claim proof of asymmetry in the above sentence. We concede to rephrase the offending sentence above as: “Our new structures, together with DEER distance measurements that monitor the conformational ensemble of TmPPase in solution, do not exclude asymmetry in gating and transitional changes upon substrate/inhibitor binding and are consistent with our proposed model.” We feel that this reframed conjecture of asymmetry is well founded; indeed, comparing all the 16 experimentally derived DEER distance distributions for the 525R1 and 599R1 sites with in-silico modelling performed on the hybridised asymmetric structure (i.e., comprised of one monomer bound to Ca2+ and another bound to IDP) yields overlap coefficients (Islam and Roux, JPC B, 2015) of >0.85. This implies the envelope of the modelled distance distribution is quantitatively inside the envelope of the experimental distance distributions. Thus, the DEER data support asymmetry (previously observed by time-resolved XRC) in solution, and while we appreciate that ideally one would measure time-resolved DEER to directly correlate kinetics of conformational changes within the ensemble to the catalytic cycle of mPPase, (and this is something we aim to do in the future), it is far beyond the scope of this study.
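
      To make the overlap comparison concrete, here is a minimal sketch of an overlap coefficient between two distance distributions. It assumes the common definition, the integral of the pointwise minimum of two area-normalised distributions sampled on the same grid; the exact formula used in the manuscript (Islam and Roux) is not reproduced in this response, so that definition is an assumption. The function names and the two synthetic Gaussian distributions are hypothetical and are not the authors' data.

```python
import math


def normalise(dist, dr):
    """Scale a sampled distribution so that it integrates to 1 on a grid of spacing dr."""
    area = sum(dist) * dr
    if area <= 0:
        raise ValueError("distribution must have positive area")
    return [p / area for p in dist]


def overlap_coefficient(p, q, dr):
    """Integral of min(P(r), Q(r)) for two distributions on the same grid.

    Returns 1.0 for identical distributions and 0.0 for disjoint ones.
    """
    if len(p) != len(q):
        raise ValueError("distributions must share the same distance grid")
    p_n, q_n = normalise(p, dr), normalise(q, dr)
    return sum(min(a, b) for a, b in zip(p_n, q_n)) * dr


def gaussian(r, mean, sigma):
    """Unnormalised Gaussian used only to build synthetic test distributions."""
    return math.exp(-0.5 * ((r - mean) / sigma) ** 2)


# Synthetic example (hypothetical values): distances in nm on a 0.01 nm grid.
dr = 0.01
grid = [2.0 + i * dr for i in range(601)]
experimental = [gaussian(r, 5.8, 0.30) for r in grid]
modelled = [gaussian(r, 5.9, 0.30) for r in grid]
print(f"overlap = {overlap_coefficient(experimental, modelled, dr):.3f}")
```

      With this definition the coefficient is symmetric in its two arguments and depends on both the position and the width of each distribution, so a high value requires agreement in both.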

      Indeed, half-of-the-sites reactivity has been demonstrated in at least the following papers: Vidilaseris et al, Sci Adv, 2019; Strauss et al, EMBO Rep, 2024; Malinen et al, Prot Sci, 2022; Artukka et al, Biochem J, 2018; Luoto et al, PNAS, 2013. Half-of-the-sites activity requires asymmetry in the mechanism, and therefore asymmetric motions in the active site (viz 211) and exit channel (viz 525). As mentioned above, we have demonstrated this for other inhibitors (Vidilaseris et al 2019) and as part of a time-resolved experiment (Strauss et al 2024). In fact, given the wealth of evidence showing that the symmetrical crystal structures sample a non- or less-productive conformation of the protein, it would be quixotic to propose the DEER experiments - in solution - do not generate asymmetric conformations. It certainly doesn’t obey Occam’s razor of choosing the simplest possible explanation that covers the data.

      (4) Based on these observations, and the DEER data for +IDP, which is consistent with an asymmetric conformation of TmPPase being present in solution, we propose five distinct models of TmPPase (Figure 7).  

      Problem: Again, the DEER data do not support any asymmetry and the authors may revisit the proposed models. 

      We have revised the proposed models and limited them to four asymmetric models to clearly illustrate the apo/+Ca/+Ca:ETD-state (model 1) and highlight the distinct binding patterns of various inhibitors (ETD, ZLD and IDP; models 2-4), which result in a variety of closed/open-open states. In this version, we clarify that the proposed models are not based solely on the DEER data: they are grounded in both current and previously solved structures, and the DEER data, which are highly consistent across multiple conditions, inhibitors and two reporters facing opposite sides of the membrane, provide additional support for these models.

      (5) 'In model 2 (Figure 7), one active site is semi-closed, while the other remains open. This is supported by the distance distributions for S525R1 and T211R1 for +Ca/ETD informed by DEER, which agrees with the in silico distance predictions generated by the asymmetric TmPPase:ETD X-ray structure'  

      Problem: Neither convincing nor supported by the data 

      We respectfully disagree with the reviewer. However, owing to the conformational heterogeneity of T211R1, we now exclude T211R1 data from quantitative interpretation of changes to the conformational ensemble. Instead, we include new DEER data from site C599R1, which provides high-quality and convincing data that is consistent with asymmetry at the cytoplasmic face, and inconsistent with in silico distance distributions derived from symmetric X-ray crystal structures. Furthermore, the S525R1 distance distributions for the +ETD (corresponding to +Ca/ETD) and +ZLD conditions were directly compared with both the apo-state distance distribution (corresponding to a fully open, symmetric conformation) and the in silico predicted distributions of the asymmetric hybrid structure (corresponding to an open-closed conformation). Overlap coefficients were calculated (given in the main text) that indicated the +ETD (corresponding to +Ca/ETD) and +ZLD S525R1 distributions were more consistent with the apo-state distance distribution. This suggests that while on the cytosolic face of the membrane, an open-closed conformation is favoured, on the periplasmic face, a symmetric open-open conformation is favoured.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):   

      (1) The DEER experiments were performed with the two crystallized inhibitors, ETD and ZLD, along with previously characterized IDP. It would increase the impact if a tighter-binding phosphonate was examined since the inhibitory mechanism of these molecules is of greater interest. 

      We acknowledge the reviewer's concern regarding the choice of weaker inhibitors. We chose to focus on the weaker binders, as we were able to obtain high-quality crystal structures for these compounds. This allowed us to perform DEER spectroscopy with the added advantage of accurately analysing the data against structural models derived from X-ray crystallography. In the revised version, we also include results from alendronate and pamidronate, two of the tighter inhibitors, which show similar and consistent results to the others.

      (2) I'm not able to find the concentrations of ETD and ZLD used for the DEER experiments. This information should be added to the Methods section on sample prep for EPR. 

      The information is already mentioned in the Method section on sample preparation for EPR spectroscopy (page 24), where we indicated that the protein aliquots were incubated with a final concentration of 2 mM inhibitors or 10 mM CaCl2 (30 min, RT). However, we recognise that this may not have been sufficiently clear. To clarify, we now explicitly state that the concentration of ETD and ZLD (amongst other inhibitors) used for the DEER experiments is 2 mM.  

      (3) There should be additional detail about the electrometry replicates. Does "triplicate" mean three measurements on the same sensor, three different sensors, and different protein preparations? At a minimum, data should be collected from three different sensors to ensure that the negative results (lack of current) for ETD and ZLD are not due to a failed sensor prep. In addition, data from the other replicates should be shown in a supplementary figure, either the traces, or in a summary figure. Are the traces shown collected on the same sensor? They could be, in principle, since the inhibitor is washed away after each perfusion. 

      Yes, by 'triplicate', we mean three measurements taken on the same sensor. All traces shown were collected from a single sensor. Thank you for your advice; we now show here additional data from other sensors that display the same pattern. As for the possibility of a failed sensor preparation, this is unlikely since we always ensure the sensor quality with the substrate (PPi) as a positive control after each measurement.

      Author response image 1.

      (4) I'm confused by the NEM modification assay, and I don't think there is enough information in this manuscript for a reader to figure out what is happening. Why is the protein active if an inhibitor is present? I understand that there is a conformational change in the presence of the inhibitor that buries a cysteine, but the inhibitor itself should diminish function, correct? Is the inhibitor removed before testing the function? In addition, it would be clearer if the cysteines that are modified are indicated in the main text. I don't understand what is being shown in Figure EV2. Shouldn't the accessible cysteines in the apo form be shown? Finally, the sentence "IDP has been reported to prevent the NEM modification..." does not make sense to me. Should the word "by" be removed from this sentence? 

      We apologize for the confusion. Yes, the inhibitors were removed before testing the protein function. In Figure EV2, the accessible cysteines are shown for both the apo and IDP-bound states. As seen, the accessible cysteines in the IDP-bound states are fewer than those in the apo state, meaning fewer cysteines are available for modification. Consequently, more activity is retained when IDP binds due to the reduction in accessible cysteines. We have addressed this in the manuscript (see the method section on the NEM modification assay).

      (5) Why does the model in Figure 7 show the small molecules bound to only one subunit, when they are crystallized in both subunits? 

      We propose that the binding of the small molecules to both subunits in the crystal structure is likely a result of substrate inhibition, given the excess inhibitor used during crystallisation (e.g. Artukka, et al., Biochemical Journal, 2018; Vidilaseris, et al., Science Advances, 2022). Our PELDOR data indicate that in solution, TmPPase bound to the small molecules adopts an intermediate state between both subunits being closed and both being open, most likely with at least one subunit in an open state. This is also consistent with previous kinetic studies (Anashkin, V. A., et al., International Journal of Molecular Sciences, 22, 2021), which showed that the binding constant of IDP to the second subunit is around 120 times higher than that of the first subunit.

      (6) The authors argue that the two ETDs bound in the two protomers adopt distinct conformations. Can this be further supported, for example, by swapping the position of the two ETDs between the two protomers and calculating a difference map (there should be corresponding negative/positive density if the modelling of the two different conformations is robust)? 

      As per the reviewer's suggestion, we swapped the positions of the two ETDs between the protomers and calculated the difference electron density map. This analysis, presented in Figure EV3, reveals corresponding negative and positive electron density peaks, indicating that the ETDs indeed adopt distinct conformations in each protomer, supporting the accuracy of our modeling.

      (7) Are the changes in loop conformation possibly due to crystal packing differences for the two protomers? 

      We examined the crystal packing of the two protomers and found no interactions at the loop regions (red coloured in Author response image 2 below) that could be attributed to crystal packing differences. Therefore, we rule out this possibility.

      Author response image 2.

      (8) Typos:  

      Legend for Figure EV2 cystine - cysteine  

      Page 14, last sentence of the first paragraph: further - further  

      Figure 6 legend: there is no reference to panel B.  

      Thank you for pointing out the typos; they are now fixed.

      Reviewer #2 (Recommendations for the authors):  

      (1) T211 is located on the same loop where ligand/inhibitor-coordinating side chains (E217, D218) are located. It has not been tested whether spin labeling here would affect inhibitor binding. 

      We tested the activity of all mutants before spin labelling, but not the activity of the spin-labelled mutants. MTSSL spin labels are typically not structurally perturbing. Moreover, the T211R1 site that the reviewer is referring to is now not included in our interpretation of conformational changes occurring during mPPase’s functional cycle.

      (2) Why should the spin label be introduced to T211, which is recognized as a flexible region in the crystal structure? Authors should search for suitable residues except for T211 and other residues in this loop to evaluate the cytoplasmic distance. 

      We acknowledge the reviewer’s concern regarding the flexibility of the T211 region for spin labelling. Given the challenges associated with TmPPase, including reduced protein expression, loss of function, or inaccessibility upon spin labelling at certain sites, we have explored alternative residues. After extensive testing, we identified C599 as a suitable site for spin labelling resulting in high-quality DEER data. The results from spin labelling at C599 have been incorporated into the revised manuscript.

      (3) On the other hand, DEER data for S525 is solid, as the authors stated. This residue is located on the luminal side of the enzyme. However, the description of the luminal side structure and the comparison of symmetric/asymmetric dimer in this part are missing in the paper. 

      We thank the reviewer for their positive assessment of the S525R1 DEER data. The data for the 525 and now also the 599 spin pairs are indeed solid, given the strong visual oscillations we observed, particularly in such a challenging system.

      We presented the periplasmic sites in the crystal structure dimer (Figure 4A), highlighting both the symmetrical region and the asymmetric model in Figure 4. In the revised version, we include additional details about this region and our rationale for labeling at position S525.

      (4) The conclusion models (Figure 7) are misleading. In the crystal structure, the 5-6Loop distance between each monomer should be close given the location of the dimer interface, and the actual distance between T211 in the structure (for example, in 5lzq) is about 10A. Nevertheless, the model depicts this distance longer than S525 (40.7A in 5LZQ), which would give a false impression. 

      We would like to apologize for the misleading model. We have now corrected the models to ensure they are consistent with their respective regions in the crystal structures.

      (5) P8 last paragraph  

      It is hard to imagine that in a crystal lattice, the straight inhibitor always binds to monomer A, and the neighboring monomer is always attached to a slightly tilted inhibitor, which causes asymmetry. For example, wouldn't it mean that it would first bind to one of them, which would then affect the neighboring monomer via 5-6 Loop, which would then affect its binding pose? So in this case, the inhibitor did not ARAISE asymmetry, and this is where it is misleading for readers. 

      We apologize for the confusion. What we intended to convey is that the first inhibitor binds to one protomer, which then affects the conformation of the neighbouring monomer, ultimately influencing its binding pose. This is required for half-of-the-sites reactivity, which is well-established in this system. This is reflected in our crystal structure, where we observed asymmetry in the loop 5-6 region and the ETD orientation between the two protomers. We have addressed this in the manuscript accordingly.

      (6) P11 L4 EV10 instead of EV8? 

      Thank you for pointing this out. We have corrected it accordingly.

      (7) P11 L5 It is difficult to determine whether the peak is broad or sharp. Should be evaluated quantitatively by showing the half-value width of the peak. This may also be helpful to judge whether the peak is a mixture of two components or a single one. 

      We have taken this analysis out and rephrased the offending sentence. We have also added the FWHM values as the Reviewer suggested, and corresponding standard deviations for the distance distributions (under approximation as Gaussian distribution).   
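
      Under the Gaussian approximation mentioned above, the full width at half maximum and the standard deviation are related by FWHM = 2·sqrt(2·ln 2)·σ, roughly 2.355·σ. The following is a minimal sketch of that conversion; the function names and the example width are hypothetical and are not values from the manuscript.

```python
import math

# Conversion factor between FWHM and standard deviation for a Gaussian.
FWHM_FACTOR = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_from_sigma(sigma: float) -> float:
    """Full width at half maximum of a Gaussian with standard deviation sigma."""
    return FWHM_FACTOR * sigma


def sigma_from_fwhm(fwhm: float) -> float:
    """Standard deviation of a Gaussian with the given full width at half maximum."""
    return fwhm / FWHM_FACTOR


# Hypothetical width in nm, chosen only to illustrate the conversion.
example_sigma_nm = 0.30
print(f"FWHM = {fwhm_from_sigma(example_sigma_nm):.3f} nm")
```

      The relation holds only for a single Gaussian component; for a distribution that mixes two populations, the FWHM and the standard deviation are no longer tied by this factor.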

      (8) Throughout the paper, the topology of the enzyme may be difficult to follow for readers who are not experts in this field. Please indicate the membrane plane's location or a figure's viewpoint in the caption. 

      We acknowledge the importance of making our figures accessible to all readers. In the revised manuscript, we have enhanced the clarity of our figures by explicitly indicating the membrane plane’s location and specifying the viewpoint in each figure caption. For example, we have added annotations such as “Top view of the superposition of chain A (cyan) and chain B (wheat), showing the relative movements (black arrow) of helices. The membrane plane is indicated by dashed lines.”

      (9) Figure 2B Check the color of the helix.  

      IDP and ETD are almost the same color, so it is difficult to see the superposition. It would be easier to understand the reading by, for example, using a lighter or transparent color set only for IDPs.  

      We acknowledge the reviewer's concern regarding the colour similarity between the IDP and ETD in Figure 2B, which hinders clear differentiation. To enhance visual distinction, we have adjusted the colour scheme by changing the TmPPase:IDP structure colour to light blue. This modification improves the clarity of the superposition, making the structural differences more discernible.

      (10) Figure 2C Check the coordination state (dotted line), there appears to be coordination between E217Cg and Mg. Also, water that is located near N492 appears to be a bit distant from Mg, why does this act as a ligand? Stereo view or view from different angles, and distance information would help the reader understand the bonding state in more detail.  

      Yes, we confirm that Mg<sup>2+</sup> is coordinated by the oxygen atoms from both the side chain and main chain of residue E217. The water molecule near N492 is not directly coordinated with Mg<sup>2+</sup> but interacts with the O5 atom of one of the phosphate groups in ETD. To enhance clarity, we have updated Figure 2C (and other related figures) to include stereo views.  

      (11) Figure 5A: in the Bottom view (lower left), the symmetric dimer does not look symmetric. Better to view from a 2-fold axis exactly.  

      We have taken this figure out entirely and instead added a direct comparison of the in silico predicted distribution from the asymmetric hybrid structure to all 16 experimental DEER distributions. We have added the symmetric and asymmetric structures to Fig. 4A and view the symmetric structure along the 2-fold axis, as suggested.

      (12) Figure 5B: Indicate which data is plotted in the caption.  

      As mentioned above, we have taken this figure out, as we felt quantifying two overlapping populations from a single Gaussian was over-interpretation of the data, and at the suggestion of reviewer 3, we have tailored our interpretation here.  

      (13) Figure EV8:  

      Because the authors discuss a lot about their conclusive model based on this data, Figure EV8 should be treated as a main figure, not a supplement. However, this reviewer has serious concerns about the measurement in this figure. Because DEER for T211 is too noisy, I don't see the point in discussing this in detail. For example, in the Ca/ETD data, there is a peak near 50A, but it would be difficult for TM5 to move away from this distance unless the protein unfolds. I do not find it meaningful to discuss using measurement results in which such an impossible distance is detected as a peak.  

      A: Show top view as in Figure 5  

      D: 2nd row dotted line. Regarding the in silico model that is used as a reference to compare the distance information, the distance of 40-50 A for T211 in the Ca-bound form is hard to imagine. PDB 4av6 model shows that T211 is disordered and not visible, but given the position of the TM5 helix, it does not appear to be that different from the IDR binding structure (5LZQ, 10A between two T211). The structures of in silico models are not shown in the figure, as it is only mentioned as modeled in RoseTTAFold. Please indicate their structures, especially focused on the relative orientation of T211 and S525 in the dimer, which would allow readers to determine the distances.  

      We acknowledge the reviewer’s concerns regarding Figure EV8 and the DEER data for T211R1. Upon re-evaluation, we recognize that the non-oscillating nature of the DEER data for T211R1 leads to broad distributions, indicating increased conformational dynamics, which is expected for a highly dynamic loop. Consequently, we have limited the discussion and interpretation of T211R1 in the revised manuscript and focused more on C599R1.

      Reviewer #3 (Recommendations for the authors):  

      A careful interpretation of the data in view of these limitations and without directly linking to asymmetry could solve the problem of the over-interpretation of the DEER data.  

      We respectfully disagree with the reviewer. Please see our detailed response above.  

      Additional comments:  

      (1) Did the authors use a Cys-less construct for spin labeling and DEER experiments?  

      We utilized a nearly Cys-less construct in which all native cysteines were mutated to serine, except for Cys183, which was retained due to its buried location and functional importance. We then introduced single cysteine mutations for spin labelling. For C599, Ser599 was reverted to cysteine.

      (2) The time data for position T211R1 is too short for most cases (Figure EV8D) for a reliable distance determination. No confidence interval is given for the '+Ca' sample distance distributions.  

      We recorded longer time traces for two of the conditions to better assign the background. We did not use the 211R1 data to reach any conclusions regarding asymmetry, which were based on the 525R1 and the 599R1 data. We now simply include T211R1 data to indicate the high mobility observed at loop5-6. We have added the confidence interval for the +Ca condition.  

      (3) It is recommended to mention the 2+1 artefact obvious at the end of the DEER data. 

      In the methods section, we have mentioned that the “2+1” artefact present at the end of the S525R1, and T211R1 DEER data likely arises from using a 65 MHz offset, rather than an 80 MHz offset (as for the C599R1 data), which avoids significant overlap of the pump and detection pulses. We also mention in the methods section that owing to the intense “2+1” artefact, the decision was made to truncate the artefact away, to minimise the impact on data treatment. As for motivation to use the lower offset of 65 MHz, we did so to maximise the achievable signal-to-noise ratio (SNR), as particularly for the T211R1 data, the detected echo was quite weak. This was further exacerbated by the poor transverse relaxation time observed at that site.  

      (4) Please check the number of significant digits for all the reported values. 

      We have addressed the number of significant digits as requested.

      (5) Please report the mean distances from DEER experiments with the standard deviation or FWHM.

      We have addressed this in the revised manuscript: we now report modal distances rather than mean distances and provide the FWHM and standard deviation.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Weaknesses:

      (1) Only Experiment 1 of Rademaker et al (2019) is reanalyzed. The previous study included another experiment (Expt 2) using different types of distractors which did result in distractor-related costs to neural and behavioral measures of working memory. The Rademaker et al (2019) study uses these two results to conclude that neural WM representations are protected from distraction when distraction does not impact behavior, but conditions that do impact behavior also impact neural WM representations. Considering this previous result is critical for relating the present manuscript's results to the previous findings, it seems necessary to address Experiment 2's data in the present work

      We thank the reviewer for the proposal to analyze Experiment 2, where subjects completed the same type of visual working memory task but had either a flashing orientation distractor or a naturalistic (gazebo or face) distractor present during two-thirds of the trials. As the reviewer points out, unlike Experiment 1, these two conditions in Experiment 2 had a behavioral impact on recall accuracy when compared to the blank delay. We have now run the temporal cross-decoding analysis, temporally-stable neural subspace analysis, and condition cross-decoding analysis in Experiment 2. The results from the stable subspace analysis are presented in Figure 3, while the results from the temporal cross-decoding analysis and condition cross-decoding analysis are presented in the Supplementary Data.

      First, we are unable to draw strong conclusions from the temporal cross-decoding analysis, as the decoding accuracies across time in Experiment 2 are much lower compared to Experiment 1. In some ROIs of the naturalistic distractor condition we see that some diagonal elements are not part of the above-chance decoding cluster, making it difficult to draw any conclusions regarding dynamic clusters. We do see some dynamic coding in the naturalistic condition in V3 where the off-diagonals do not show above-chance decoding. Since the temporal cross-decoding provides low accuracies, we do not examine the dynamics of neural subspaces across time.

      We do, however, run the stable subspace analysis on the flashing orientation distractor condition. Just like in Experiment 1, we examine temporally stable target and distractor subspaces. When projecting the distractor onto the working memory target subspace, we see a higher overlap between the two as compared to Experiment 1. A similar pattern is seen also when projecting the target onto the distractor subspace. We still see an above-chance principal angle between the target and distractor; however, this angle is qualitatively smaller compared to Experiment 1. This shows that the degree of separation between the two neural subspaces is impacted by behavioral performance during recall.

      (2) Primary evidence for 'dynamic coding', especially in the early visual cortex, appears to be related to the transition between encoding/maintenance and maintenance/recall, but the delay period representations seem overall stable, consistent with previous findings

      We agree with the reviewer that we primarily see dynamic coding at the transition between encoding and maintenance and at the end of the maintenance period, implying that the WM representations are stable in most ROIs. The only place where we argue that we might see more dynamic coding during the delay itself is in V1 during the noise distractor trials in Experiment 1.

      (3) Dynamicism index used in Figure 1f quantifies the proportion of off-diagonal cells with significant differences in decoding performance from the diagonal cell. It's unclear why the proportion of time points is the best metric, rather than something like a change in decoding accuracy. This is addressed in the subsequent analysis considering coding subspaces, but the utility of the Figure 1f analysis remains weakly justified.

      We agree that other metrics can also provide a summary of dynamics; here, the dynamicism index just acts as a summary visualizing the dynamic elements. It offers an intuitive way to visualize peaks and troughs of the dynamic code across the extent of the trial.

      (4) There is no report of how much total variance is explained by the two PCs defining the subspaces of interest in each condition, and timepoint. It could be the case that the first two principal components in one condition (e.g., sensory distractor) explain less variance than the first two principal components of another condition.

      We thank the reviewer for this comment. We have now included the percent variance explained for the two PCs in both the temporally-stable target and distractor subspace and the dynamic subspace analysis. The percent-explained is comparable across analyses; the first PC ranges from 43-50% and the second ranges from 28-37%. The PCs within each analysis (dynamic no-distractor, orientation and noise distractor; temporally-stable target and distractor) are even closer in range (Figure 2c and 3d).
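
      As a minimal illustration of the quantity reported here (a sketch, not the authors' pipeline), the share of variance captured by the leading PCs can be computed from the singular values of the mean-centered data. The function name and the `(samples, features)` layout are assumptions made for this example.

```python
import numpy as np

def explained_variance_ratio(X, n_components=2):
    """Fraction of total variance captured by the first principal components.

    X is assumed to have shape (n_samples, n_features); this layout is an
    assumption of the sketch, not something specified in the response.
    """
    Xc = X - X.mean(axis=0)                      # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    var = s ** 2                                 # proportional to PC variances
    return var[:n_components] / var.sum()
```

      Comparing these ratios across conditions shows whether two-dimensional subspaces fit to different conditions summarize their data equally well.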

      (5) Converting a continuous decoding metric (angular error) to "% decoding accuracy" serves to obfuscate the units of the actual results. Decoding precision (e.g., sd of decoding error histogram) would be more interpretable and better related to both the previous study and behavioral measures of WM performance.

      We thank the reviewer for the comments. FCA is a linear function of the angular error that uses the following equation:

      We think that the FCA does not obfuscate the results, but instead provides an intuitive scale where 0% accuracy corresponds to a 180° error, 50% to a 90° error and so on. This also makes it easy to reverse-calculate the absolute error if need be. Our lab has previously used this method in other neuroimaging papers with continuous variables (Barbieri et al. 2023, Weber et al. 2024).
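
      The linear mapping described above can be sketched as follows. This is reconstructed only from the two anchor points stated in the text (a 180° error gives 0% and a 90° error gives 50%); the function names are hypothetical and the exact formula used in the manuscript may be written differently.

```python
def fca(angular_error_deg: float) -> float:
    """Feature-continuous accuracy (%) for an absolute angular error in degrees.

    Assumes the error lies in [0, 180] degrees, so that 0 deg -> 100 %,
    90 deg -> 50 % and 180 deg -> 0 %.
    """
    err = abs(angular_error_deg)
    if err > 180.0:
        raise ValueError("absolute angular error must lie in [0, 180] degrees")
    return 100.0 * (1.0 - err / 180.0)


def error_from_fca(fca_percent: float) -> float:
    """Invert the mapping: recover the absolute angular error from FCA (%)."""
    return 180.0 * (1.0 - fca_percent / 100.0)
```

      Because the mapping is linear and invertible, reporting FCA loses no information relative to reporting the absolute angular error.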

      We do, however, agree that “% decoding accuracy” does not provide an accurate reflection of the metric used. We have thus now changed “% decoding accuracy” to “Accuracy (% FCA)”.

      (6) This report does not make use of behavioral performance data in the Rademaker et al (2019) dataset.

      We have now analyzed Experiment 2 which, as previously mentioned by the reviewer and unlike Experiment 1, showed a decrease in recall accuracy during the two distractor conditions. We address the results from Experiment 2 in a previous response (please see Weaknesses 1).

      We do not, however, relate single subject behavioral performance to neural measurements, as we do not think there is enough power to do so with a small number of subjects in both Experiment 1 and 2. 

      (7) Given there were observed differences between individual retinotopic ROIs in the temporal cross-decoding analyses shown in Figure 1, the lack of data presented for the subspace analyses for the corresponding individual ROIs is a weakness

      We have now included an additional supplementary figure that shows individual plots of each ROI for the temporally stable subspace analysis for both Experiment 1 and Experiment 2 (Supplementary Figure 5). 

      Reviewer #1 (Recommendations For The Authors):

      (1) Is there any relationship between stable/dynamic coding properties and aspects of behavioral performance? This seems like a major missed opportunity to better understand the behavioral relevance or importance of the proposed dynamic and orthogonal coding schemes. For example, is it the case that participants who have more orthogonal coding subspaces between orientation distractor and remembered orientation show less of a behavioral consequence to distracting orientations? Less induced bias? I know these differences weren't significant at the group level in the original study, but maybe individual variability in the metrics of this study can explain differences in performance between participants in the reported dataset

      As mentioned in the previous response, we do not run individual correlations between dynamic or orthogonal coding metrics and behavioral performance, because of the small number of subjects in both experiments. We believe that for a brain-behavior correlation between average behavioral error of subjects and an average brain measure, we would need a larger sample size.  

      (2) The voxel selection procedure differs from the original study. The authors should add additional detail about the number of voxels included in their analyses, and how this number of voxels compares to that used in the original study.

      We have now added a figure summarizing the number of voxels selected across participants. We do select fewer voxels compared to Rademaker et al. 2019 (see their Supplementary Tables 9 and 10 and our Supplementary Figure 8). For example, we have ~500 voxels on average in V1 in Experiment 1, while the original study had ~1000. As mentioned in the methods, we aimed to select voxels that reliably responded to both the perception localizer conditions and the working memory trials.

      (3) Lines 428-436 specify details about how data is rescaled prior to decoding. The procedure seems to estimate rescaling factors according to some aspect of the training data, and then apply this rescaling to the training and testing data. Is there a possibility of leakage here? That is - do aspects of the training data impact aspects of the testing data, and could a decoder pick up on such leakage to change decoding? It seems this is performed for each training/testing timepoint pair, and so the temporal unfolding of results may depend on this analysis choice.

      Thank you for the suggestion. To prevent data leakage, the mean and standard deviation are computed exclusively from the training set. These scaling parameters are then applied to the test set, ensuring that no information from the test set influences the training process. This transformation simply adjusts the test set to the same scale as the training data, without exposing the model to unseen test data during training.
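
      The procedure described above can be sketched as follows. This is a minimal example assuming z-scoring per voxel with data of shape `(trials, voxels)`; the function names and the toy data are illustrative and are not taken from the manuscript.

```python
import numpy as np

def fit_scaler(train):
    """Estimate per-feature mean and standard deviation from training data only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)   # guard against constant features
    return mean, std

def apply_scaler(data, mean, std):
    """Rescale any data set with parameters estimated elsewhere."""
    return (data - mean) / std

# Toy data standing in for one training/testing time-point pair.
rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(100, 10))
test = rng.normal(loc=5.0, scale=2.0, size=(20, 10))

mean, std = fit_scaler(train)             # the test set is never seen here
train_z = apply_scaler(train, mean, std)
test_z = apply_scaler(test, mean, std)    # same transform, no refitting
```

      Since `fit_scaler` receives only the training array, no statistic of the test set can influence the decoder during training.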

      (4) Figure 1d, V1: it looks like the 'dynamics' are a bit non-symmetric - perhaps the authors could comment on this detail of the results? Why would we expect there would be a dynamic cluster on one side of the diagonal, but not the other? Given that this region, condition is the primary evidence for a dynamic code that's not related to the beginning/end of delay (see other comments), figuring this out is of particular importance.

      We thank the reviewer for this question. We think that this is just due to small numerical differences in the upper and lower triangles of the matrix, rather than a neuroscientifically interesting effect. However, this is only a speculative observation.

      (5) I think it's important to address the issue I raised in "weaknesses" about variance explained by the top N principal components in each condition. What are we supposed to learn from data projected into subspaces fit to different conditions if the subspaces themselves are differently useful?

      Thank you, this has now been addressed in a previous comment (please see Weakness 4). 

      Reviewer #2:

      Weaknesses:

      (1) An alternative interpretation of the temporal dynamic pattern is that working memory representations become less reliable over time. As shown by the authors in Figure 1c and Figure 4a, the on-diagonal decoding accuracy generally decreased over time. This implies that the signal-to-noise ratio was decreasing over time. Classifiers trained with data of relatively higher SNR and lower SNR may rely on different features, leading to poor generalization performance. This issue should be addressed in the paper.

      We thank the reviewer for raising this issue and we have now run three simulations that aim to address whether a changing SNR across time might create dynamic clusters. 

      In the first simulation we created a dataset of 200 voxels that have a sine or cosine response function to orientations between 1° and 180°, the same orientations as the remembered target. A circular shift is applied to each voxel to vary the preferred (or maximal) response of each simulated voxel. We then assess the decoding performance under different SNR conditions during training and testing. For each of the seven iterations we selected 108 responses (out of 180) to train on and 108 to test on. To increase variability, the selected trials differed in each iteration. Random white noise was applied to the data, and the SNR was thus independently scaled according to the specified levels for train and test data. We then use the same pSVR decoder as in the temporal cross-decoding analysis to train and test. 
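
      The data-generation step of this first simulation can be sketched roughly as follows. Several details are assumptions made for illustration only: one full sine cycle across the 180 orientations, a uniformly random circular shift per voxel, and SNR defined as the ratio of signal to noise standard deviation. The pSVR decoder and the train/test selection are not reproduced here.

```python
import numpy as np

def simulate_voxels(n_voxels=200, n_orient=180, rng=None):
    """Noise-free responses of shape (n_orient, n_voxels).

    Each voxel has a sinusoidal tuning curve over orientations 1..180 deg,
    circularly shifted so that preferred orientations differ across voxels.
    """
    rng = np.random.default_rng() if rng is None else rng
    orientations = np.arange(1, n_orient + 1)
    # Orientation is 180-deg periodic, so the angle is doubled (assumption).
    base = np.sin(np.deg2rad(2 * orientations))
    shifts = rng.integers(0, n_orient, size=n_voxels)
    return np.stack([np.roll(base, s) for s in shifts], axis=1)

def add_white_noise(signal, snr, rng):
    """Add Gaussian white noise so that std(signal) / std(noise) equals snr."""
    noise_sd = signal.std() / snr
    return signal + rng.normal(0.0, noise_sd, size=signal.shape)
```

      Train and test sets with different SNR levels can then be produced by calling `add_white_noise` twice on the same clean responses with different `snr` values.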

      The second and third simulations more directly address whether increased noise levels would induce the decoder to rely on different features of the no-distractor and noise distractor data. We use empirical data from the primary visual cortex (V1; where dynamic coding was seen in the noise distractor trials) under the no-distractor and noise distractor conditions for the second and third simulations, respectively. Data from time points 5.6–8.8 seconds after stimulus onset are averaged across five TRs. As in the first simulation, SNR is systematically manipulated by adding white noise. Additionally, to see whether an initial decrease in SNR followed by an increase would result in dynamic coding clusters, we initially increased and subsequently decreased the amplitude of the added noise. The same pSVR decoder was used to train and test on the data with different levels of added noise.

      We see an absence of dynamic elements in the SNR cross-decoding matrices, as the decoding accuracy primarily depends on the training data rather than test data. This results in some off-diagonal values in the decoding matrix that are higher, rather than smaller, than corresponding on-diagonal elements.

      We have now added a Methods section explaining the simulations in more detail and Supplementary Figure 9 showing the SNR cross-decoding matrices. 

      (2) The paper tests against a strong version of stable coding, where neural spaces representing WM contents must remain identical over time. In this version, any changes in the neural space will be evidence of dynamic coding. As the paper acknowledges, there is already ample evidence arguing against this possibility. However, the evidence provided here (dynamic coding cluster, angle between coding spaces) is not as strong as what prior studies have shown for meaningful transformations in neural coding. For instance, the principal angle between coding spaces over time was smaller than 8 degrees, and around 7 degrees between sensory distractors and WM contents. This suggests that the coding space for WM was largely overlapping across time and with that for sensory distractors. Therefore, the major conclusion that working memory contents are dynamically coded is not well-supported by the presented results.

      We thank the reviewer for this comment. The principal angles we calculate are above-baseline, meaning that we subtract the within-subspace principal angles from the between-subspace principal angles and take the average. Thus a 7 degree difference does not imply that there are only 7 degrees separating e.g. the sensory distractor from the target; it just indicates that the separation is 7 degrees above chance. 
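
      The distinction between a raw and an above-baseline principal angle can be sketched as follows. This is a generic illustration rather than the authors' implementation: the function names are hypothetical, and how the within-subspace (baseline) angles are obtained, for example from independent data splits, is not specified here.

```python
import numpy as np

def principal_angles_deg(A, B):
    """Principal angles (degrees) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)                 # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)                 # orthonormal basis for span(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.degrees(np.arccos(np.clip(s, -1.0, 1.0)))

def above_baseline_angle(between_deg, within_deg):
    """Mean of (between-subspace minus within-subspace) angles, in degrees."""
    between = np.asarray(between_deg, dtype=float)
    within = np.asarray(within_deg, dtype=float)
    return float(np.mean(between - within))

# Illustrative numbers only (not from the study): raw between-subspace
# angles of about 30 deg can correspond to 7 deg above baseline.
example = above_baseline_angle([27.0, 33.0], [20.0, 26.0])  # 7.0
```

      In this toy case the raw separation is far larger than the reported difference, which is why an above-baseline value should not be read as the absolute angle between subspaces.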

      (3) Relatedly, the main conclusions, such as "VWM code in several visual regions did not generalize well between different time points" and "VWM and feature-matching sensory distractors are encoded in separable coding spaces" are somewhat subjective given that cross-condition generalization analyses consistently showed above chance-level performance. These results could be interpreted as evidence of stable coding. The authors should use more objective descriptions, such as 'temporal generalization decoding showed reduced decoding accuracy in off-diagonals compared to on-diagonals'.

      Thank you, we agree that our previous claims might have been too strong. We have now toned down our statements in the Abstract and use “did not fully generalize” and “VWM and feature-matching sensory distractors are encoded in coding spaces that do not fully overlap.”

      Reviewer #2 (Recommendations For The Authors):

      Weakness 1 can potentially be addressed with data simulations that fix the signal pattern, vary the noise pattern, and perform the same temporal generalization analysis to test whether changes in SNR can lead to seemingly dynamic coding formats.

      Thank you for the great suggestion. We have now run the suggested simulations. Please see above (response to Weakness 1).

      There are mismatches in the statistical symbols shown in Figure 4 and Supplementary Table 2. It seems that there was a swap between the symbols for the noise between-condition and noise within-condition.

      Thank you, this has now been fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      (1) In Figure 1, the authors show that TF3C binds to the amino terminus of MYCN (Myc box I region), as shown previously. The data in Figure 1 B-D support, but do not rigorously confirm a 'direct' interaction because it has not been ruled out that accessory proteins mediating the association may be present in the mixture.

      In Figure 1B-D we have purified MYCN and the TFIIIC/TauA complex separately and then mixed the purified preparations, demonstrating that the purified proteins interact. We have additionally performed mass spectrometry, which shows that the TauA/MYCN complex is formed without further accessory proteins, as the molecular weight would otherwise be higher. Based on the Coomassie-stained SDS-PAGE gels, there is no plausible contaminating band that could be mediating the interaction between MYCN and TauA, either in the purified complex (Figure 1C) or in the purified proteins used to reconstitute the complex (Figure S1A & S1B).

      (2) The authors indicate in Figure 2 that TF3C has essentially no effect on MYCN-dependent gene expression and/or transcription elongation. Yet a previous study (PMID: 29262328) associated with several of the same authors concluded that TF3C positively affects transcription elongation. The authors make no attempt to reconcile these disparate results and need to clarify this point.

      We agree that the data in this manuscript do not support a role in transcription elongation. This point was also raised by Reviewer 3. Comparing our new results with the previously published data, we can summarize that the data sets in the two studies show three key results: first, the traveling ratio of RNAPII changes upon induction of MYCN; second, RNAPII decreases at the transcription start site; and third, it increases towards the transcription end site.

      We agree that in the previous study we linked the traveling ratio directly to elongation. However, performing ChIP-seq with different RNAPII antibodies showed us that, for example, RNAPII (N20), which is unfortunately discontinued, gives different results compared to RNAPII (A10). Combining these with our new results using the RNAPII (8WG16) antibody shows that the traveling ratio not only reflects transcription elongation but also reflects RNAPII being kicked off chromatin at the start site.

      (3) Figures 2B and C show that unphosphorylated pol2 is TSS-centered, and Ser2-P pol2 occupation is centered beyond the TES. From this data, however, the reader can't tell how much of the phospho-Ser2- pol2 is centered on the TSS. The authors should include overall plots over TSS and TES, and also perhaps the gene-body to allow a better comparison for TSS and TES plotted for both antibodies over the collected gene sets.

      We focused on the TSS for unphosphorylated RNAPII and the TES for pSer2-RNAPII, as these are the regions with specific enrichment of the respective antibodies. As requested for comparison, we now include metagenes showing TSS, gene-body, and TES for both antibodies as new Figure S2A and B. Additionally, we included density plots for unphosphorylated RNAPII at the TES as well as for pSer2-RNAPII at the TSS as a Figure for the Reviewers (Figure 1).

      (4) The authors see more TF3C at promoters in cells with MYCN (Figure 2F). What are the levels of TF3C in the absence and presence of MYCN?

      As shown in the immunoblot in Figure S1E, TFIIIC5 levels do not change upon induction of MYCN. We therefore think that MYCN helps to recruit TFIIIC5 to RNAPII promoter sites. This is also in accordance with what we previously reported 1.

      (5) The finding that TF3C is increased at TSS (Figure 2F) doesn't necessarily indicate that 1) MYCN is recruiting TF3C there, and 2) that this is due to the phosphorylation status of pol2. It could mean many other things. The logic of conflating these 3 points based on the data shown is questionable.

      We showed previously that knock-down of MYCN affects TFIIIC5 binding, showing that MYCN is required for binding of TFIIIC5 at promoter sites 1.

      Additionally, we included data from DRB-treated cells (Figure 2F); DRB prevents RNAPII loading by preventing downstream de novo elongation. Those data show that TFIIIC5 binding at the TSS is massively increased upon induction of MYCN and additionally upon treatment with DRB. Conversely, we observed that the major effect of TFIIIC knock-down was on non-phosphorylated RNAPII at the TSS upon MYCN induction (Figure 2B). Therefore, we would argue that our assumption fits well with the data presented in the manuscript.

      (6) Figure 3A doesn't add much to the paper, as it is overplotted and no relationship is clear, except that Pol2 and MYCN occupy many of the same sites. Perhaps a less complex or different type of plot would allow the interactions to be better visible.

      We agree with the comment and since in another comment we were asked to show the same window for all shown Hi-ChIP data plots, we changed Figure 3A.

      (7) That depletion of TF3C leads to increased promoter hubs may or may not have anything to do with its association with MYCN (Figure 4E). This could be a direct consequence of its known structural function in cohesin complexes, and the MYCN changes as a secondary consequence of this (also see point 4, above).

      As shown in Büchel et al. (2017) 1 MYCN is needed to recruit RAD21 and depletion of RAD21 has no impact on the recruitment of MYCN. Since RAD21 is part of the cohesin complex we would exclude that the MYCN changes are a secondary consequence.

      (8) Depletion of TF3C5 results in a loss of EXOSC5 (exosome) at TSS in the presence and absence of MYCN (Figure 5B). As TF3C5 is a cohesin, could this simply be a consequence of genomic structure changes?

      We agree that the observed changes in EXOSC5 can be due to depletion of TFIIIC5. TFIIIC has been shown to recruit cohesin 1 and condensin complexes 2, as well as to induce chromatin architectural changes 3. However, MYCN is needed to recruit TFIIIC and depletion of TFIIIC had no impact on MYCN recruitment 1. Furthermore, MYCN has been shown to recruit the exosome 4. Therefore, we would argue that MYCN can play a role either directly or through chromatin architectural changes.

      (9) The authors suggest that RNA dynamics are affected by changes in exosome function (RNA degradation, etc). What effect, if any does TF3C depletion have on the overall gene expression profile?

      We show in the manuscript that TFIIIC depletion in unperturbed cells has no effect on the global gene expression profile in the time frame analyzed (Figure 2E and S2B).

      Reviewer #2 (Public Review):

      (1) Dynamic inferences are made without kinetic experiments.

      While we agree that we did not collect kinetic data to study the dynamics of RNA polymerase, we would argue that the integration of our different data sets makes it possible to draw dynamic inferences. The transcription cycle and its sequential steps have been well described. In this sense, we use the non-phosphorylated RNAPII data, which capture the stage between RNAPII recruitment and initiation, and the RNAPII-pSer2 data, which capture pause release to elongation, to draw conclusions about the dynamics. Likewise, we also made use of our previously published datasets.

      Reviewer #2 (Recommendations For The Authors):  

      (1) A number of changes are reported in hub size, expression, etc. upon treatment with tamoxifen to activate MYCN-ER. But MYC is already present in the SHEP cells, so why doesn't MYC support these same phenomena? It would seem that either the ability to cooperate with TFIIIC to clear non-productive polymerase complexes from promoters is particular to MYCN, or else it reflects a quantitative increase in total MYC proteins due to the entry of MYCN-ER into the nucleus with tamoxifen. The authors should address or discuss this issue.

      It could be that protein levels are the limiting factor behind the differences observed between MYC and MYCN in this system. This interpretation would be in accordance with the results of Lorenzin et al. 5, who reported that different levels of MYC had different targets based on the affinity to E-boxes and protein level. A similar relationship between MYC levels and function was also reported regarding SPT5 6. Those high protein levels mimic what is found in certain tumors in contrast to physiological levels. In this sense, the observed differences may also reflect the difference between physiological and oncogenic levels of MYC proteins.

      On the other hand, both a core MYC signature and isoform-specific signatures of target genes have been described. MYCN is described to be involved in gene expression during the S-phase of the cell cycle 7. This suggests that there are differences between MYC and MYCN other than gene sets. The interaction with TFIIIC appears to be one of these differences. We have found multiple TFIIIC subunits as part of the MYCN interactome, but the interaction of TFIIIC with MYC is weaker and we are uncertain how relevant it is 7,8. We show here that depletion of different subunits of the TFIIIC complex causes a MYCN-dependent growth defect (Figure 1E). Similarly, the nuclear exosome is a MYCN-specific dependency 4, and we show here that MYCN-dependent recruitment of the exosome requires TFIIIC5. We take this as an indication that there is an intrinsic difference between MYC and MYCN and that MYCN engages TFIIIC for this pathway.

      (2) Reciprocal to TFIIIC recruitment to MYCN-targets would be MYCN association with tRNA genes, 5S rRNA, and other RNAPIII genes. Does this happen, and if so, is this association TFIIIC dependent? What happens to the expression of these genes?

      We did observe MYCN in interactions involving RNAPIII sites, such as SINE elements and tRNAs (Figure 4B, 4D, S3F, and S4B). No relevant number of 5S rRNA genes was involved in interactions, either because of the difficulty of properly mapping these repetitive regions or for biological reasons. In any case, none of those regions appeared to be specifically dependent on TFIIIC, as the overall number of interactions increased upon TFIIIC depletion regardless of the genomic annotation (Figure S4B). Regarding the expression of RNAPIII genes, the technical limitations of poly(A)-enrichment RNA-seq prevent us from analyzing it globally in an unbiased way. However, we addressed this point for tRNA expression in an earlier work 1 and found that tRNA levels do not change upon TFIIIC depletion. We think this is because tRNAs are stable transcripts and RNAPIII recycling can occur in a TFIIIC-independent manner 9. Likewise, we report no significant expression changes in RNAPII genes upon TFIIIC depletion in this work.

      (3) The authors show that TFIIIC depletion does not alter the RNA-expression profile; how do they account for this? Can they comment on "background" transcription that it would seem should be suppressed by TFIIIC-dependent removal of various hypofunctional polymerases?

      Since TFIIIC is important for the removal of non-functional RNAPII we would not expect changes to the gene expression profile upon depletion of TFIIIC in the time frame analyzed. Monitoring the elongating form of RNAPII by measuring pSer2 indeed shows us that transcription elongation is not affected.

      (4) Global changes in expression are difficult to assess with DESEQ2. This hypernormalizing algorithm is not really suited to distinguish differential, but universal upregulation from some targets being truly upregulated while others are downregulated. The authors should comment.

      We acknowledge that DESEQ2 relies on the assumption that gene-wise estimates of dispersion are generally unchanged among samples. We addressed this comment in two ways, both of which are included in the Figure for the Reviewers (Figure 2). The first was to sequence samples more deeply to avoid any bias created by random effects of lower coverage; the range of total reads increased from 6.8-9.3 to 16.5-20.7 million reads. The second was to compare the fold average bin dot plot for RNA-seq of SH-EP-MYCN-ER, showing mRNA expression normalized by control per bin, using DESEQ2 normalization (Figure 2A), TMM normalization in edgeR (Figure 2B) and quantile normalization (Figure 2C). No major differences were found relative to the original data or between the different methods. We nevertheless updated Figure 2E in the manuscript to include the deeper sequencing dataset, adjusted it to show -/+ MYCN, and transformed it to log2 to make it more intuitive. Overall, this reinforces our original conclusion that gene expression remains largely unaffected by TFIIIC5 knockdown.

      (5) On page 7, the authors claim that MYCN-ER increased Ser-2 can reflect MYCN-stimulated transcription elongation. In fact, without kinetic studies, this is not fully supported. Accumulation of Ser-2 RNAPII along a gene can reflect increased initiation of full-speed RNAPs or a pile-up of RNAPs slowing down. This should be resolved or qualified.

      While we agree that we did not collect kinetic data to study the dynamics of RNA polymerase, we would argue that the integration of our different data sets makes it possible to draw dynamic inferences. We showed, on the one hand, that pSer2 accumulates at the TES and, on the other hand, that induction of MYCN-ER up-regulates gene expression, which proves productive transcription elongation.

      (6) pLHiChIP needs to be better described, the Mumbach reference is not sufficient.

      We have reformulated the description of pLHiChIP in the methods section and hope that this now provides a better description of the method.

      (7) Can the authors recheck all the labels in Figure 2D-I believe there is an error involving + or - MYCN.

      We carefully rechecked all the labels in Figure 2 and they were correct. We understand the confusion that comparing Figure 2D and Figure 2E may have created. To avoid confusion, we updated Figure 2E to show the same direction as Figure 2D. We also log2-transformed the y-axis of Figure 2E to foster a more intuitive reading.

      (8) Why are there different scales for the regions of chromosome 17 shown in Figures 3 and 4? It would be easier to compare if the examples were all shown at the same scale (about 2 MB is shown in another Figure).

      We now show the same region of chromosome 17 in Figure 3 and 4.

      Reviewer #3 (Public Review):

      (1) The connection between the three major findings presented in this study regarding the role of TFIIIC in the regulation of MYCN function remains unclear. Specifically, how the TFIIIC-dependent restriction of MYCN localization to promoter hubs enhances the association of factors involved in nascent RNA degradation to prevent the accumulation of inactive RNA polymerase II at promoters is not apparent. As they are currently presented, these findings appear as independent observations. Cross-comparison of the different datasets obtained may provide some insight into addressing this question.

      We previously observed that TFIIIC does not affect MYCN recruitment, while MYCN affects TFIIIC binding (1). Moreover, our group reported that MYCN recruits the exosome (4) and BRCA1 to promoter-proximal regions (10) to clear out non-functional RNAPII. We now report that MYCN-TFIIIC complexes exclude non-functional RNAPII. However, MYCN-active promoter hubs have more RNAPII and more transcription than MYCN-active promoters outside hubs. Furthermore, TFIIIC binding occurs upstream of BRCA1 and exosome recruitment, as depletion of TFIIIC decreases the recruitment of both factors. Therefore, we argue that TFIIIC is required for the proper function of those MYCN-active promoter hubs.

      (2) Another concern involves the disparities in RNA polymerase II ChIP-seq results between this study and earlier ones conducted by the same group. In Figure 2, the authors demonstrate that activation of MYCN results in a reduction of non-phosphorylated RNA polymerase II across all expressed genes. This discovery contradicts prior findings obtained using the same methodology, where it was concluded that the expression of MYCN had no significant effect on the chromatin association of hypo-phosphorylated RNA polymerase II (Buchel et al, 2017). In this regard, the choice of the 8WG16 antibody raises concern, as fluctuations in the signal may be attributed to changes in the phosphorylation levels of the C-terminal domain. It remains unclear why the authors decided against using antibodies targeting the N-terminal domain of RNA polymerase II, which are unaffected by phosphorylation and consistently demonstrated a significant signal reduction upon MYCN activation in their previous studies (Buchel et al, 2017) (Herold et al, 2019). Similarly, the authors previously proposed that depletion of TFIIIC5 abrogates the MYCN-dependent increase of Ser2-phosphorylated RNA polymerase II (Buchel et al, 2017), whereas they now show that it has no obvious impact. These aspects need clarification.

      We politely disagree that our discoveries contradict each other. Comparing our new results to the previously published data, we can summarize that the data sets in the two studies show three key results: first, the traveling ratio of RNAPII changes upon induction of MYCN; second, RNAPII decreases at the transcription start site; and third, it increases towards the transcription end site.

      We agree that in the previous study we linked the traveling ratio directly to elongation. However, performing ChIP-seq with different RNAPII antibodies showed us that, for example, RNAPII (N20), which is unfortunately discontinued, gives different results compared to RNAPII (A10). Our new results using the RNAPII (8WG16) antibody show that the traveling ratio not only reflects transcription elongation but also reflects RNAPII being kicked off chromatin at the start site.

      In the previous study we only performed manual ChIP experiments for RNAPII (8WG16) and pSer2. Now we did a global analysis which is more meaningful and is also reflected in the RNA sequencing data.
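
      As an illustration of why the traveling ratio alone cannot separate these two effects, the sketch below assumes the commonly used definition of the traveling ratio as the mean promoter-proximal RNAPII density divided by the mean gene-body density. The window size, the function names and the toy coverage profiles are hypothetical and are not taken from the manuscript.

```python
def mean_density(coverage, start, end):
    """Mean read density over the half-open window [start, end)."""
    window = coverage[start:end]
    return sum(window) / len(window)


def traveling_ratio(coverage, tss, promoter_flank=300, pseudocount=1e-9):
    """Promoter-proximal density divided by gene-body density.

    coverage: per-base RNAPII signal along one gene (toy values here).
    promoter_flank: assumed half-width of the promoter window around the TSS.
    """
    promoter = mean_density(coverage, max(0, tss - promoter_flank),
                            tss + promoter_flank)
    body = mean_density(coverage, tss + promoter_flank, len(coverage))
    return promoter / (body + pseudocount)


# Hypothetical profiles: 200 bp upstream, 600 bp promoter window, 1200 bp body.
baseline = [2.0] * 200 + [10.0] * 600 + [2.0] * 1200
loss_at_start = [2.0] * 200 + [6.0] * 600 + [2.0] * 1200   # RNAPII leaves the TSS
gain_in_body = [2.0] * 200 + [10.0] * 600 + [4.0] * 1200   # more elongating RNAPII

tr_baseline = traveling_ratio(baseline, tss=500)
tr_loss = traveling_ratio(loss_at_start, tss=500)
tr_gain = traveling_ratio(gain_in_body, tss=500)
print(tr_baseline, tr_loss, tr_gain)
```

      In this toy setting the ratio drops both when signal is lost at the promoter and when signal is gained in the gene body, so a change in the ratio on its own does not identify which of the two occurred.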

      (3) Finally, the varied techniques employed to explore the role of TFIIIC in MYCN-dependent recruitment of nascent RNA degradation factors make it challenging to draw definitive conclusions about which factor is affected and which one is not. While conducting ChIP-seq experiments for all factors may be beyond the scope of this manuscript, incorporating proximity ligation assays (PLA) or ChIP-qPCR assays with each factor would have enabled a more direct and comprehensive comparison.

      We understand the criticism that we are comparing different assays. We have performed PLAs with different antibodies. Since the controls of the PLAs were not sufficient for us, we refrain from using them. ChIP-qPCR experiments are much more challenging to do side by side compared to PLAs, which is why we decided against looking at all factors with this method.

      Recommendations For The Authors:

      Reviewer #3 (Recommendations For The Authors):

      (1) Figure 2: Why did the authors choose the 8WG16 antibody? Does TFIIIC5 depletion suppress the MYCN-dependent reduction of total RNA polymerase II binding to promoters that they consistently showed in previous studies? Given that phosphorylation of the CTD impacts 8WG16 recognition, including Ser5-phosphorylated RNA polymerase II ChIP-seq experiments might clarify this issue.

      We used the RNAPII (8WG16) antibody to exactly map non-phosphorylated RNAPII which shows us the binding of non-functional RNAPII.

      (2) Figures 3 and 4: As it stands, the manuscript does not convincingly establish a functional connection between the results in Figures 2, 3, and 4 or elucidate potential mechanisms. Are changes in RNA polymerase II levels upon MYCN activation more pronounced at promoters located at MYCN hubs? Do changes in MYCN-enriched chromatin contacts upon TFIIIC5 depletion somehow correlate with alterations in RNA polymerase II levels? Performing similar cross-comparisons as in Figure 3C may help address this issue. Furthermore, it is not clear how the authors concluded that MYCN/TFIIIC5-bound genes are not part of these so-called promoter hubs.

      In Figure 3C we show that RNAPII levels are more pronounced upon MYCN activation at promoters located at MYCN hubs. Additionally, we show density plots of non-phosphorylated RNAPII ChIP-seq at the TSS and RNAPII-pSer2 ChIP-seq at the TES for promoters with MYCN interactions in the Figure for the Reviewers (Figure 3). We found no difference other than in binding compared to the overall global analysis for all expressed genes shown in Figure 2B and Figure 2C. This is in line with the high expression of the genes in MYCN interactions observed in Figure 3C.

      The changes observed in Figures 2B and 2C are global and do include the promoters with MYCN interactions. At the same time, a higher number of replicates would be required to statistically distinguish the differences in MYCN interactions between TFIIIC5 presence and depletion. We acknowledge this limitation, and we therefore refrain from any attempt towards this end. We base our conclusions on the other parts of the manuscript and on our previous studies that show that MYCN recruits TFIIIC, BRCA1, and the exosome to promoter-proximal regions (1, 4, 10).

      (3) Figure 5: According to the PLA results, activation of MYCN could enhance RNA polymerase II-NELFE interaction in a TFIIIC5-dependent manner. Considering the raised issues regarding the use of the 8WG16 antibody, this result might be of relevance.

      Nevertheless, PLA does not seem to be the optimal technique to address these questions, and I would rather suggest performing ChIP-qPCR experiments for all the factors to be compared. Finally, do the authors conclude that the TFIIIC5 effect on MYCN-dependent changes in RNA polymerase II depends upon the recruitment of EXOSC5 and BRCA1? If so, it would be interesting to determine whether depletion of these factors phenocopies the effects observed with TFIIIC5.

      We understand the criticism that we are comparing different assays. We have performed PLAs with different antibodies. Since the controls of the PLAs were not sufficient for us, we refrain from using them.

      (4) In Figure S2 the labels should be EtOH, 4-OHT, and Input.

      We changed this accordingly.

      (5) On page 7, the sentence "We have shown previously that TFIIIC5 depletion does not cause significant changes in expression of multiple tRNA genes that are transcribed by RNAPIII (Buchel et al., 2017)" appears to lack a connection.

      We agree with the reviewer and we deleted this sentence from the manuscript.

      Author response image 1.

      (A) Density plot of ChIP-Rx signal for non-phosphorylated RNAPII. Data show mean (line) ± standard error of the mean (SEM indicated by the shade) of different gene sets based on an RNA-seq of SH-EP-MYCN-ER cells ± 4-OHT. The y-axis shows the number of spike-in normalized reads and it is centered to the TES ± 2 kb. N = number of genes in the gene set defined in the methods. (B) Density plot of ChIP-Rx signal for RNAPII pSer2 as described for panel A. The signal is centered to the TSS ± 2 kb.

      Author response image 2.

      Bin dot plot for RNA-seq of SH-EP-MYCN-ER showing mRNA expression normalized by control per bin, comparing the fold average using DESeq2 normalization (A), TMM normalization in edgeR (B) and quantile normalization (C).

      Author response image 3.

      Average density plot of ChIP-Rx signal for non-phosphorylated RNAPII (A) or RNAPII pSer2 (B) at promoters with MYCN interactions.

      References

      (1) Büchel, G., Carstensen, A., Mak, K.-Y., Roeschert, I., Leen, E., Sumara, O., Hofstetter, J., Herold, S., Kalb, J., and Baluapuri, A. (2017). Association with Aurora-A controls NMYC-dependent promoter escape and pause release of RNA polymerase II during the cell cycle. Cell reports 21, 3483-3497.

      (2) Yuen, K.C., Slaughter, B.D., and Gerton, J.L. (2017). Condensin II is anchored by TFIIIC and H3K4me3 in the mammalian genome and supports the expression of active dense gene clusters. Sci Adv 3, e1700191. 10.1126/sciadv.1700191.

      (3) Ferrari, R., de Llobet Cucalon, L.I., Di Vona, C., Le Dilly, F., Vidal, E., Lioutas, A., Oliete, J.Q., Jochem, L., Cutts, E., Dieci, G., et al. (2020). TFIIIC Binding to Alu Elements Controls Gene Expression via Chromatin Looping and Histone Acetylation. Mol Cell 77, 475-487 e411. 10.1016/j.molcel.2019.10.020.

      (4) Papadopoulos, D., Solvie, D., Baluapuri, A., Endres, T., Ha, S.A., Herold, S., Kalb, J., Giansanti, C., Schulein-Volk, C., Ade, C.P., et al. (2021). MYCN recruits the nuclear exosome complex to RNA polymerase II to prevent transcription-replication conflicts. Mol Cell. 10.1016/j.molcel.2021.11.002.

      (5) Lorenzin, F., Benary, U., Baluapuri, A., Walz, S., Jung, L.A., von Eyss, B., Kisker, C., Wolf, J., Eilers, M., and Wolf, E. (2016). Different promoter affinities account for specificity in MYC-dependent gene regulation. Elife 5. 10.7554/eLife.15161.

      (6) Baluapuri, A., Hofstetter, J., Dudvarski Stankovic, N., Endres, T., Bhandare, P., Vos, S.M., Adhikari, B., Schwarz, J.D., Narain, A., Vogt, M., et al. (2019). MYC Recruits SPT5 to RNA Polymerase II to Promote Processive Transcription Elongation. Mol Cell 74, 674-687 e611. 10.1016/j.molcel.2019.02.031.

      (7) Baluapuri, A., Wolf, E., and Eilers, M. (2020). Target gene-independent functions of MYC oncoproteins. Nat Rev Mol Cell Biol. 10.1038/s41580-020-0215-2.

      (8) Koch, H.B., Zhang, R., Verdoodt, B., Bailey, A., Zhang, C.D., Yates, J.R., 3rd, Menssen, A., and Hermeking, H. (2007). Large-scale identification of c-MYC-associated proteins using a combined TAP/MudPIT approach. Cell Cycle 6, 205-217. 10.4161/cc.6.2.3742.

      (9) Ferrari, R., Rivetti, C., Acker, J., and Dieci, G. (2004). Distinct roles of transcription factors TFIIIB and TFIIIC in RNA polymerase III transcription reinitiation. Proc Natl Acad Sci U S A 101, 13442-13447. 10.1073/pnas.0403851101.

      (10) Herold, S., Kalb, J., Büchel, G., Ade, C.P., Baluapuri, A., Xu, J., Koster, J., Solvie, D., Carstensen, A., and Klotz, C. (2019). Recruitment of BRCA1 limits MYCN-driven accumulation of stalled RNA polymerase. Nature 567, 545-549.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This fundamental study evaluates the evolutionary significance of variations in the accuracy of the intron-splicing process across vertebrates and insects. Using a powerful combination of comparative and population genomics approaches, the authors present convincing evidence that species with lower effective population size tend to exhibit higher rates of alternative splicing, a key prediction of the drift-barrier hypothesis. The analysis is carefully conducted and all observations fit with this hypothesis, but focusing on a greater diversity of metazoan lineages would make these results even more broadly relevant. This study will strongly appeal to anyone interested in the evolution of genome architecture and the optimisation of genetic systems.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Functionally important alternative isoforms are gold nuggets found in a swamp of errors produced by the splicing machinery.

      The architecture of eukaryotic genomes, when compared with prokaryotes, is characterised by a preponderance of introns. These elements, which are still present within transcripts, are rapidly removed during the splicing of messenger RNA (mRNA), thus not contributing to the final protein. The extreme rarity of introns in prokaryotes, and the elimination of these introns from mRNAs before translation into protein, raises questions about the function of introns in genomes. One explanation comes from functional biology: introns are thought to be involved in post-transcriptional regulation and in the production of translational variants. The latter function is possible when the positions of the edges of the spliced intron vary. While some light has been shed on specific examples of the functional role of alternative splicing, to what extent are they representative of all introns in metazoans?

      In this study, the hypothesis of a functional role for alternative splicing, and therefore to a certain extent for introns, is evaluated against another explanation coming from evolutionary biology: isoforms are above all errors of imprecision by the molecular machinery at work during splicing. This hypothesis is based on a principle established by Motoo Kimura, which has become central to population genetics, explaining that the evolutionary trajectory of a mutation with a given effect is intimately linked to the effective population size (Ne) where this mutation emerges. Thus, the probability of fixation of a weakly deleterious mutation increases when Ne decreases, and the probability of fixation of a weakly advantageous mutation increases when Ne increases. The genomes of populations with low Ne are therefore expected to accumulate more weakly deleterious mutations and fewer weakly advantageous mutations than populations with high Ne. In this framework, if splicing errors have only small effects on the fitness of individuals, then natural selection cannot increase the precision of the splicing machinery, allowing tolerance for the production of alternative isoforms.
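
      The dependence on Ne described above can be sketched numerically with Kimura's diffusion approximation for the fixation probability of a new mutation in a diploid population, u = (1 - exp(-4·Ne·s·p)) / (1 - exp(-4·Ne·s)) with initial frequency p = 1/(2N). The sketch below is illustrative only: it assumes additive (genic) selection, a census size equal to Ne unless stated otherwise, and hypothetical selection coefficients; none of these values come from the study.

```python
import math


def fixation_probability(s, ne, n=None):
    """Kimura's approximation for a single new mutation in a diploid population.

    s:  selection coefficient (negative = deleterious), additive selection assumed.
    ne: effective population size.
    n:  census size; assumed equal to ne when omitted (illustrative choice).
    """
    n = ne if n is None else n
    p0 = 1.0 / (2.0 * n)          # initial frequency of one new copy
    if s == 0:
        return p0                 # neutral limit
    return math.expm1(-4.0 * ne * s * p0) / math.expm1(-4.0 * ne * s)


def relative_to_neutral(s, ne):
    """Fixation probability divided by the neutral expectation 1/(2N)."""
    return fixation_probability(s, ne) / fixation_probability(0.0, ne)


# Hypothetical weakly selected mutations in a small and a large population.
for ne in (1e3, 1e5):
    print(ne,
          relative_to_neutral(-1e-4, ne),   # weakly deleterious
          relative_to_neutral(+1e-4, ne))   # weakly advantageous
```

      Under these assumptions, a weakly deleterious mutation behaves almost neutrally when Ne is small and is efficiently purged when Ne is large, while the opposite trend holds for a weakly advantageous one.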

      In the past, the debate opposed one-off observations of effectively functional isoforms on the one hand, to global genomic quantities describing patterns without the possibility of interpreting them in detail. The authors here propose an elegant quantitative approach in line with the expected continuous variation in the effectiveness of selection, both between species and within genomes. The result describing the inter-specific pattern on a large scale confirms what was already known (there is a negative relationship between effective size and average alternative splicing rate). The essential novelty of this study lies in 1) the quantification, for each intron studied, of the relative abundance of each isoform, and 2) the analysis of a relationship between this abundance and the evolutionary constraints acting on these isoforms.

      What is striking is the light shed on the general very low abundance of alternative isoforms. Depending on the species, 60% to 96% of cases of alternatively spliced introns lead to an isoform whose abundance is less than 5% of the total variants for a given intron.

      In addition to the fact that 60%-96% of the total isoforms are more than 20 times less abundant than their majority form, this large proportion of alternative isoforms exhibits coding-phase shifts at rates similar to what would be expected by chance, i.e. for a third of them, which reinforces the idea that there is no particular constraint on these isoforms.

      The remaining 4%-40% of isoforms see their coding-phase shift rate decrease as their relative abundance increases. This result represents a major step forward in our understanding of alternative splicing and makes it possible to establish a quantitative model directly linking the relative abundance of an isoform with a putative functional role concerning only those isoforms produced in abundance. Only the (rare) isoforms which are abundantly produced are thought to be involved in a biological function.

      Within the same genome, the authors show that only highly expressed genes, i.e. those that tend to be more constrained on average, are also the genes with the lowest alternative splicing rates on average.

      The comparison between species in this study reveals that the smaller the effective size of a species, the more its genome produces isoforms that are low in abundance and low in constraint. Conversely, species with a large effective size relatively reduce rare isoforms, and increase stress on abundant isoforms. To sum up:

      • the higher the effective size of a species, the fewer introns are spliced.

      • highly expressed genes are spliced less.

      • when splicing occurs, it is mainly to produce low-abundance isoforms.

      • low-abundance isoforms are also less constrained.

      Taken together, these results reinforce a quantitative view of the evolution of alternative splicing as being mainly the product of imprecision in the splicing machinery, generating a great deal of molecular noise. Then, out of all this noise, a few functional gold nuggets can sometimes emerge. From the point of view of the reviewer, the evolutionary dynamics of genomes are depressing. The small effective population sizes are responsible for the accumulation of multiple slightly deleterious introns. Admittedly, metazoan genomes try to get rid of these introns during RNA maturation, but this mechanism is itself rendered imprecise by population sizes.

      Strengths:

      • The authors simultaneously study the effects of effective population size, isoform abundance, and gene expression levels on the evolutionary constraints acting on isoforms. Within this framework, they clearly show that an isoform becomes functionally important only under certain rare conditions.

      • The authors rule out an effect putatively linked to variations in expression between different organs which could have biased comparisons between different species.

      Weaknesses:

      • While the longevity of organisms as a measure of effective size seems to work overall, it may not be relevant for discriminating within a clade. For example, within Hymenoptera, we might expect them to have the same overall longevity, but that effective size would be influenced more by the degree of sociality: solitary bees/ants/wasps versus eusocial. I am therefore certain that the relationship shown in Figure 4D is currently not significant because the measure of effective size is not relevant for Hymenoptera. The article would have been even more convincing by contrasting the rates of alternative splicing between solitary versus social hymenopterans.

      As suggested by the reviewer, we investigated the degree of sociality for the 18 hymenopterans included in our study. We observed that the average dN/dS of the 12 eusocial species (4 bees, 6 ants, 2 wasps) is significantly higher than that of the 6 solitary species (p = 2.1 × 10⁻³; Author response image 1A), consistent with a lower effective population size in eusocial species compared to solitary ones.

      However, the AS rate does not differ significantly between these two groups, neither for the full set of major-isoform introns (Author response image 1B), nor for the subsets of low-AS or high-AS major-isoform introns (Author response image 1C,D). Given the limited sample size (12 eusocial species, 6 solitary species), it is possible that some uncontrolled variables affecting the AS rate hide the impact of Ne.

      Author response image 1.

      Comparison of solitary (N=6) and eusocial hymenopterans (N=12). A: dN/dS ratio. B: AS rate (all major-isoform introns). C: AS rate (low-AS major-isoform introns). D: AS rate (high-AS major-isoform introns). The means of the two groups were compared with a Wilcoxon test.

      • When functionalist biologists emphasise the role of the complexity of living things, I'm not sure they're thinking of the comparison between "drosophila" and "homo sapiens", but rather of a broader evolutionary scale. Which gives the impression of an exaggeration of the debate in the introduction.

      We disagree with the referee: in fact, the whole debate regarding the paradox of the absence of a relationship between the number of genes and organismal complexity arose from the comparative analysis of gene repertoires across metazoans. This debate started in the early 2000s, when the sequencing of the human genome revealed that it contains only ~20,000 protein-coding genes (far fewer than the ~100,000 genes that were expected at that time). This came as a big surprise because it showed that the gene repertoire of mammals is not larger than that of invertebrates such as Caenorhabditis elegans (19,000 genes) or Drosophila melanogaster (14,000 genes). We cite below several articles that illustrate how this paradox has been perceived by the scientific community:

      Graveley BR 2001 Alternative splicing: increasing diversity in the proteomic world. Trends in Genetics 17: 100–107. https://doi.org/10.1016/S0168-9525(00)02176-4

      “How can the genome of Drosophila melanogaster contain fewer genes than the undoubtedly simpler organism Caenorhabditis elegans?”

      Ewing B and Green P 2000 Analysis of expressed sequence tags indicates 35,000 human genes. Nature Genetics 25: 232–234. https://doi.org/10.1038/76115

      “the invertebrates Caenorhabditis elegans and Drosophila melanogaster having 19,000 and 13,600 genes, respectively. Here we estimate the number of human genes […] approximately 35,000 genes, substantially lower than most previous estimates. Evolution of the increased physiological complexity of vertebrates may therefore have depended more on the combinatorial diversification of regulatory networks or alternative splicing than on a substantial increase in gene number.”

      Kim E, Magen A and Ast G 2007 Different levels of alternative splicing among eukaryotes. Nucleic Acids Research 35: 125–131. https://doi.org/10.1093/nar/gkl924

      “we reveal that the percentage of genes and exons undergoing alternative splicing is higher in vertebrates compared with invertebrates. […] The difference in the level of alternative splicing suggests that alternative splicing may contribute greatly to the mammal higher level of phenotypic complexity,”

      Nilsen TW and Graveley BR 2010 Expansion of the eukaryotic proteome by alternative splicing. Nature 463: 457–463. https://doi.org/10.1038/nature08909

      “It is noteworthy that Caenorhabditis elegans, D. melanogaster and mammals have about 20,000 (ref. 68), 14,000 (ref. 69) and 20,000 (ref. 70) genes, respectively, but mammals are clearly much more complex than nematodes or flies.”

      Reviewer #2 (Public Review):

      Summary:

      Two hypotheses could explain the observation that genes of more complex organisms tend to undergo more alternative splicing. On one hand, alternative splicing could be adaptive since it provides the functional diversity required for complexity. On the other hand, increased rates of alternative splicing could result through nonadaptive processes since more complex organisms tend to have smaller effective population sizes and are thus more prone to deleterious mutations resulting in more spurious splicing events (drift-barrier hypothesis). To evaluate the latter, Bénitière et al. analyzed transcriptome sequencing data across 53 metazoan species. They show that proxies for effective population size and alternative splicing rates are negatively correlated. Furthermore, the authors find that rare, nonfunctional (and likely erroneous) isoforms occur more frequently in more complex species. Additionally, they show evidence that the strength of selection on splice sites increases with increasing effective population size and that the abundance of rare splice variants decreases with increased gene expression. All of these findings are consistent with the drift-barrier hypothesis.

      This study conducts a comprehensive set of separate analyses that all converge on the same overall result and the manuscript is well organized. Furthermore, this study is useful in that it provides a modified null hypothesis that can be used for future tests of adaptive explanations for variation in alternative splicing.

      Strengths:

      The major strength of this study lies in its complementary approach combining comparative and population genomics. Comparing evolutionary trends across phylogenetic diversity is a powerful way to test hypotheses about the origins of genome complexity. This approach alone reveals several convincing lines of evidence in support of the drift-barrier hypothesis. However, the authors also provide evidence from a population genetics perspective (using resequencing data for humans and fruit flies), making results even more convincing.

      The authors are forward about the study's limitations and explain them in detail. They elaborate on possible confounding factors as well as the issues with data quality (e.g. proxies for Ne, inadequacies of short reads, heterogeneity in RNA-sequencing data).

      Weaknesses:

      The authors primarily consider insects and mammals in their study. This only represents a small fraction of metazoan diversity. Sampling from a greater diversity of metazoan lineages would make these results and their relevance to broader metazoans substantially more convincing. Although the authors are careful about their tone, it is challenging to reconcile these results with trends across greater metazoans when the underlying dataset exhibits ascertainment bias and represents samples from only a few phylogenetic groups. Relatedly, some trends (such as Figure 1B-C) seem to be driven primarily by non-insect species, raising the question of whether some results may be primarily explained by specific phylogenetic groups (although the authors do correct for phylogeny in their statistics). How might results look if insects and mammals (or vertebrates) are considered independently?

      Following the referee’s suggestion, we investigated the relationship between AS rate and proxies of Ne, separately for insects and vertebrates (Supplementary Fig. 11). We observed that the relationship was consistent in vertebrates and insects: linear regressions show a positive correlation, significant (p < 0.05) in all cases, except for body length in vertebrates. We added a sentence (line 166) to mention this point.

      Note that these analyses rely on smaller sample sizes and therefore have less power to detect a signal. We therefore prefer to present the combined analyses, using PGLS to account for phylogenetic inertia.

      Throughout the manuscript, the authors refer to infrequently spliced (mode <5%) introns as "minor introns" and frequently spliced (mode >95%) as "major introns". This is extremely confusing since "minor introns" typically represent introns spliced by the U12 spliceosome, whereas "major introns" are those spliced by the U2 spliceosome.

      To avoid any confusion, we modified the terminology: we now refer to infrequently spliced introns as "minor-isoform introns" and frequently spliced introns as "major-isoform introns" (see lines 135-137). The entire manuscript (including the figures) has been modified accordingly.

      Furthermore, it remains unclear whether the study only considers major introns or both major and minor introns. Minor introns typically have AT-AC splice sites whereas major introns usually have GT/GC-AG splice sites, although in rare cases the U2 can recognize AT-AC (see Wu and Krainer 1997 for example).

      We modified the text (line 148-150) to clearly state that we studied all introns, both U2-type and U12-type.

      The authors also note that some introns show noncanonical AT-AC splice sites while these are actually canonical splice sites for minor introns.

      This is corrected (line 148).

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Figures 1, 3, and 4: I suggest that authors add regression lines.

      We added the regression lines with the “pgls” function from the R package “caper” (in Fig. 1, 3 and 4, and also in all other figures where we present correlations).

      Figure 2: As previously mentioned, the terms "minor introns" and "major introns" are extremely confusing. I strongly suggest the authors use different naming conventions.

      We changed the terminology:

      minor introns -> minor-isoform introns

      major introns -> major-isoform introns

      Figure 5: Intron-exon boundaries and splice site annotations are shown at the bottom of B, C, and D but not A. I suggest removing the annotation beneath B for consistency and since A+C and B+D are aligned on the x-axis.

      Corrected, it was a mistake.

      Figure 7: The yellow dotted line is very challenging to see in A.

      Corrected, the line has been widened.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      This manuscript presents a pipeline incorporating a deep generative model and peptide property predictors for the de novo design of peptide sequences with dual antimicrobial/antiviral functions. The authors synthesized and experimentally validated three peptides designed by the pipeline, demonstrating antimicrobial and antiviral activities, with one leading peptide exhibiting antimicrobial efficacy in animal models. However, the manuscript as it stands, has several major limitations on the computational side.

      Thanks for your comments. 

      Major issues:

      (1) The choice of GAN as the generative model. There are multiple deep generative frameworks (e.g., language models, VAEs, and diffusion models), and GANs are known for their training difficulty and mode collapse. Could the authors elaborate on the specific rationale behind choosing GANs for this task?

      We thank the reviewer for raising this concern about GAN models. We agree that GANs have limitations, such as training difficulty, but this does not negate their potential for generating biological sequences, especially in AMP generation. GANs and VAEs are the two most commonly used generative models in the field of AMP design (Curr Opin Struct Biol 2023, 83:102733). AMPGAN (J Chem Inf Model, 2021, 61, 2198-2207), Multi-CGAN (J Chem Inf Model 2024, 64, 1, 316–326), PepGAN (ACS Omega, 2020, 5, 22847-22851) and others have verified the applicability of GANs to peptide design. Moreover, PandoraGAN (Sn Comput Sci 2023, 4, 607), one of the few works on AVP generation, is also based on a GAN architecture. A GAN updates the generator weights directly through backpropagation from the discriminator rather than through a manually defined, complicated loss function, which alleviates the reliance on input data. Our current results demonstrate that the trained GAN generator can produce novel sequences with high antimicrobial activity, validated both in silico and in vitro.

      (2) The pipeline is supposed to generate peptides showing dual properties. Why were antiviral peptides not used to train the GAN? Would adding antiviral peptides into the training lead to a higher chance of getting antiviral generations?

      A major mechanism of antimicrobial peptides is to disrupt cell membranes. Thus, some antimicrobial peptides are reported to have broad-spectrum antibacterial and antiviral activities, since viruses, especially enveloped viruses, share a membrane structure with bacteria. In the APD3 database, 244 of 3940 AMPs are labeled with antiviral activities. In contrast, most reported antiviral peptides inhibit viruses by binding to specific targets (proteins and nucleic acids) related to viral proliferation, so they may not have antibacterial effects. Therefore, we trained the GAN with the AMP dataset. We chose this AMP dataset mainly for AMPredictor (it carries detailed logMIC labels against E. coli) and then used the same dataset to train the GAN for simplicity. 

      In the revised manuscript, we also tested adding available antiviral peptides from AVPdb to train the GAN model. The number of AVPs is 1,788 after removing overlaps with the AMP dataset used. The GAN architecture and hyperparameters remain the same. After generating a batch of sequences with this trained generator, we scored them with AMPredictor and filtered them with five AVP classifiers. As expected, the predicted MIC values shifted to higher performance with 17 sequences < 5 μM and 39 sequences < 10 μM; the previous numbers in the manuscript are 26 and 42. Among the 39 sequences < 10 μM, 13 passed all five AVP classifiers and 17 passed four (33.3% and 43.6%, respectively). The previous ratios are 40.5% and 35.7% (17 and 15 out of 42). The two generators perform roughly the same at generating AVPs (76.9% vs. 76.1%) as evaluated by our rule (4 or more positives), but the generator trained solely with AMPs provided more AVPs with higher probability (5 positives).

      We also experimentally tested dozens of generated peptides from the two generator versions (v1 trained solely on AMPs, v2 trained with added AVPs; Figure 2 in the revised manuscript). A peptide was labeled ‘antiviral’ when significant inhibition was observed in immunofluorescence assays against HSV-1 at a concentration of 10 µM. Six and seven antiviral peptides were found out of 12 tested peptides from generators v1 and v2, respectively. Therefore, the success rates of the two generator versions are about 60% (including the three peptides reported in the original manuscript) and show no significant difference.

      (3) For the antimicrobial peptide predictor, where were the contact maps of peptides sourced from?

      The contact maps of the AMPs were predicted by ESM and were obtained at the same time as the ESM embeddings (Methods section, Page 24, Line 538: Pretrained language model esm1b_t33_650M_UR50S was used to provide the embeddings and the contact maps.)

      (4) Morgan fingerprint can be used to generate amino acid features. Would it be better to concatenate ESM features with amino acid-level fingerprints and use them as node features of GNN?

      We thank the reviewer for this suggestion. We tested using both ESM and fingerprint (FP) features on the graph nodes; the results are shown in Author response table 1. AMPredictor (ESM on nodes, FP after GNN) still performed slightly better on four regression metrics than the variant with FP concatenated to the node features. 

      Author response table 1.

      Results of AMPredictor with fingerprint on nodes 

      (5) Although the number of labeled antiviral peptides may be limited, the input features (ESM embeddings) should be predictive enough when coupled with shallow neural networks. Have the authors tried simple GNNs on antiviral prediction and compared the prediction performance to those of existing tools?

      We thank the reviewer for this suggestion on AVP prediction. We have not tried it. An important reason is that we focused on developing regressors instead of binary classifiers. Currently available AVP data with numerical labels do not support training a reliable regressor, because of their limited amount and the heterogeneity of virus targets and experimental assays. Therefore, we decided to use reported AVP classifiers as an additional filter following AMPredictor. Since using only one classifier may lead to bias, we chose five AVP classifiers as ensemble votes. 
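
      The ensemble vote described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the classifier names and the votes are hypothetical placeholders, and the threshold of four positive votes is taken from the "4 or more positives" rule quoted in the response to point (2).

```python
from typing import Dict, List

MIN_POSITIVE_VOTES = 4  # assumption: the "4 or more positives" rule quoted above


def count_positive_votes(votes: Dict[str, bool]) -> int:
    """Number of classifiers that labeled the peptide as antiviral."""
    return sum(1 for is_avp in votes.values() if is_avp)


def passes_ensemble(votes: Dict[str, bool], min_votes: int = MIN_POSITIVE_VOTES) -> bool:
    """True if at least `min_votes` classifiers voted positive."""
    return count_positive_votes(votes) >= min_votes


def pass_rate(batch: List[Dict[str, bool]], min_votes: int = MIN_POSITIVE_VOTES) -> float:
    """Percentage of peptides in `batch` that pass the ensemble vote."""
    if not batch:
        return 0.0
    passed = sum(1 for votes in batch if passes_ensemble(votes, min_votes))
    return 100.0 * passed / len(batch)


# Hypothetical classifier names and made-up votes, for illustration only.
example_batch = [
    {"clf_a": True, "clf_b": True, "clf_c": True, "clf_d": True, "clf_e": True},
    {"clf_a": True, "clf_b": True, "clf_c": False, "clf_d": True, "clf_e": True},
    {"clf_a": True, "clf_b": False, "clf_c": False, "clf_d": True, "clf_e": True},
    {"clf_a": False, "clf_b": False, "clf_c": False, "clf_d": False, "clf_e": True},
]

if __name__ == "__main__":
    print(f"{pass_rate(example_batch):.1f}% of the example peptides pass")
```

      Requiring agreement among several independent classifiers trades some recall for robustness against the bias of any single model, which matches the motivation given above.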

      (6) Instead of using global alignment to get match scores, the authors should use local alignment.

      We calculated the match scores by global alignment, following AMPGAN v2 (J Chem Inf Model 2021, 61, 2198−2207), CLaSS (Nat Biomed Eng 2021 5, 613–623), and AMPTrans-lstm (Comput Struct Biotechnol J 2022, 21, 463-471), to check the similarity between the generated sequences and the sequences in the training set. In addition, we used local alignment to check the novelty of the peptides (see the next question). 
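
      To illustrate the difference between the two alignment modes discussed here, the sketch below computes a global (Needleman-Wunsch) and a local (Smith-Waterman) score for the same pair of sequences. It is a simplified illustration only: the scoring scheme (match +1, mismatch -1, linear gap -1) and the example sequences are assumptions, and it is not the scoring used by the authors or by BLAST, which rely on substitution matrices.

```python
def global_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Smith-Waterman score: best-scoring pair of substrings, never below 0."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    known = "GGKWLKKGG"   # hypothetical known peptide
    generated = "KWLKK"   # hypothetical generated peptide sharing a core motif
    print("global:", global_score(known, generated))
    print("local: ", local_score(known, generated))
```

      A shared motif embedded in a longer peptide is penalized by end gaps in the global score but recovered in full by the local score, which is why the two modes answer different questions about similarity.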

      (7) How novel are the validated peptides? The authors should run a sequence alignment to get the most similar known AMP for each validated peptide, and analyze whether they are similar.

      We have listed the AMP segments most similar to our generated peptides from the training set and the DRAMP database (28,233 sequences after filtering out those containing irregular characters). BLAST parameters were set as in CLaSS (Nat Biomed Eng 2021 5, 613–623) for short peptides. The lowest E-value of P001 aligned with the training set is 1.2, and no hits were found for P001 in DRAMP. The two E-values of P002 are 1.4 and 0.46. P076 had no hits in the training set and a high E-value of 7.0 with DRAMP. Detailed alignments are shown below. This result indicates that our three validated AMPs are novel. 

      Since we generated more sequences using the two generator versions for validation, we also checked the BLAST E-values of these validated peptides. The results are listed in Table S3. All sequences obtained E-values > 0.1, and some of them had no hits when aligned with the training set or the DRAMP database. 

      Author response image 1.

      Alignments of three validated peptides.

      (8) Only three peptides were synthesized and experimentally validated. This is too few and unacceptable in this field currently. The standard is to synthesize and characterize several dozens of peptides at the very least to have a robust study.

      We thank the reviewer for the suggestion and had our models generate >10 times more peptides in the revised manuscript. We have synthesized and tested more peptides in vitro and added these results to the revised manuscript (Figure 2). From the two generator versions (trained with or without AVPs), we selected 24 peptides in total for antibacterial and antiviral validation. All 24 peptides showed antibacterial activity against at least one bacterial strain, and 13 peptides were selected through the quick antiviral test. This result indicates the capability of our design method for bifunctional AMPs with a notable success rate (60%).

      Reviewer #2 (Public Review):

      Summary:

      This study marks a noteworthy advance in the targeted design of AMPs, leveraging a pioneering deep-learning framework to generate potent bifunctional peptides with specificity against both bacteria and viruses. The introduction of a GAN for generation and a GCN-based AMPredictor for MIC predictions is methodologically robust and a major stride in computational biology. Experimental validation in vitro and in animal models, notably with the highly potent P076 against a multidrug-resistant bacterium and P002's broad-spectrum viral inhibition, underpins the strength of their evidence. The findings are significant, showcasing not just promising therapeutic candidates, but also demonstrating a replicable means to rapidly develop new antimicrobials against the threat of drug-resistant pathogens.

      Strengths:

      The de novo AMP design framework combines a generative adversarial network (GAN) with an AMP predictor (AMPredictor), which is a novel approach in the field. The integration of deep generative models and graph-encoding activity regressors for discovering bifunctional AMPs is cutting-edge and addresses the need for new antimicrobial agents against drug-resistant pathogens. The in vitro and in vivo experimental validations of the AMPs provide strong evidence to support the computational predictions. The successful inhibition of a spectrum of pathogens in vitro and in animal models gives credibility to the claims. The discovery of effective peptides, such as P076, which demonstrates potent bactericidal activity against multidrug-resistant A. baumannii with low cytotoxicity, is noteworthy. This could have far-reaching implications for addressing antibiotic resistance. The demonstrated activity of the peptides against both bacterial and viral pathogens suggests that the discovered AMPs have a wide therapeutic potential and could be effective against a range of pathogens.

      We thank the reviewer for the comments.

      Reviewer #3 (Public Review):

      Summary:

      Dong et al. described a deep learning-based framework comprising an antimicrobial peptide (AMP) generator and regressor to design and rank de novo AMPs. For generated AMPs, they predicted their minimum inhibitory concentration (MIC) using a model that combines the Morgan fingerprint, contact map, and ESM language model. For the AMPs selected based on predicted MIC, they also use a combination of antiviral peptide (AVP) prediction models to select AMPs with potential antiviral activity. They experimentally validated 3 candidates for antimicrobial activity against S. aureus, A. baumannii, E. coli, and P. aeruginosa, and for toxicity on mouse blood and three human cell lines. The authors select their most promising AMP (P076) for in vivo experiments in A. baumannii-infected mice. They finally test the antiviral activity of their 3 AMPs against viruses.

      Strengths:

      -The development of de novo antimicrobial peptides (AMPs) with the novelty of being bifunctional (antimicrobial and antiviral activity).

      -Novel, combined approach to AMP activity prediction from their amino acid sequence.

      Weaknesses:

      (1) I missed justification on why training AMPs without information of their antiviral activity would generate AMPs that could also have antiviral activity with such high frequency (32 out of 104).

      Thanks for your inquiry. A major mechanism of antimicrobial peptides is to disrupt cell membranes. Thus, some antimicrobial peptides are reported to have broad-spectrum antibacterial and antiviral activities, since viruses, especially enveloped viruses, share a membrane structure with bacteria. In the APD3 database, 244 of 3940 AMPs are labeled with antiviral activities. However, several reported antiviral peptides inhibit viruses by binding to specific targets (proteins and nucleic acids) related to viral proliferation, so they may not have antibacterial effects. Therefore, we trained the GAN with the AMP dataset. We chose this AMP dataset mainly for AMPredictor (it carries detailed logMIC labels against E. coli) and then used the same dataset to train the GAN for simplicity. In addition, it’s not 32 antiviral candidates out of 104 but 32 out of 42 peptides with predicted MIC < 10 µM, because we did the filtering process stepwise. 

      In the revision, we also tested adding available antiviral peptides from AVPdb to train the GAN model (generator v2). The number of AVPs is 1,788 after removing overlaps with the AMP dataset used. The GAN architecture and hyperparameters remain the same. We used generator v2 to obtain a batch of sequences and screened for bifunctional candidates following the same procedure. 30 out of 39 peptides with predicted MIC < 10 µM passed four or five AVP predictors. Therefore, the two generators perform roughly the same at generating AVP candidates (76.9% vs. 76.1%). 

      (2) The justification for AMP predictor advantages over previous tools lacks rationale, comparison with previous tools (e.g., with the very successful AMP prediction approach described by Ma et al. 10.1038/s41587-022-01226-0), and proper referencing.

      Thanks for your suggestion. Ma et al. proposed ensemble binary classification models to mine AMPs from metagenomes successfully. However, we concentrated on the development of regression models. As a regressor, AMPredictor predicts the specific logMIC value of the input sequences instead of giving a yes/no answer. Since the training settings and evaluation metrics are different for the classification and regression tasks, we could not compare AMPredictor with Ma et al. directly. Instead, we compared the performance of AMPredictor with some regression baseline models (Figure S2a) and our model outperformed them. 

      (3) Experimental validation of three de novo AMPs is a very low number compared to recent similar studies.

      Thanks for pointing out this shortcoming. We have synthesized and tested more peptides in vitro and added these results to the revised manuscript (Figure 2). From the two generator versions (trained with or without AVPs), we selected 24 peptides in total for antibacterial and antiviral validation. All 24 peptides showed antibacterial activity against at least one bacterial strain, and 13 peptides were selected through the quick antiviral test. This result indicates the capability of our design method for bifunctional AMPs with a notable success rate (60%).

      (4) I have concerns regarding the in vivo experiments including i) the short period of reported survival compared to recent studies (10.1038/s41587-022-01226-0, 10.1016/j.chom.2023.07.001, 10.1038/s41551-022-00991-2) and ii) although in Figure 2 f and g statistics have been provided, log scale y-axis would provide a better comparative representation of different conditions.

      Thank you for your suggestions. 

      i) In the current study, we monitored the survival of mice with peritoneal bacterial infection for 48 h.

      Because abdominal bacterial infection can induce severe sepsis and cause mouse death within 40 h (Sci Adv 2019, 5(7), eaax1946), the 48 h is sufficient to evaluate the therapeutic efficacy of antimicrobial peptides (Nat Biotechnol 2019, 37(10), 1186-1197).

      ii) In Figure 2f and 2g (3f and 3g in the revised manuscript), the y-axis is already in log scale and the tick labels are given in scientific notation.

      (5) I had difficulty reading the story because of the use of acronyms without referring to their full name for the first time, and incomplete annotation in figures and captions.

      Thank you for pointing this out. We have checked the manuscript carefully and modified the figure captions during revision. 

      Reviewer #2 (Recommendations For The Authors):

      (1) To validate the generalizability of the model, it would be prudent to include data on AMPs targeting a broader range of bacteria and viruses. This could help ensure that the peptides designed are not narrowly focused on E. coli but are effective against a more extensive set of pathogens. 

      Thanks for your suggestions. We incorporated only AMPs with E. coli activity labels, since it is the most common strain among available AMP databases. For a regression model (AMPredictor), the fitting target should be defined precisely, which means limiting the target bacteria. Several other articles also focused on E. coli labels (Nat Commun 2023, 14, 7197; mSystems 2023, 8, e0034523). 

      We used the same processed dataset to train the GAN generator for simplicity. Most reported AMPs have the potential to target various microbes. We have counted the antimicrobial labels of these peptides in our dataset, shown in Figure S1b. In addition to E. coli, some of the peptides target Gram-positive S. aureus, the fungus C. albicans, and other bacterial species as well. Our experimental validation also reveals the wide spectrum of the designed peptides, which inhibit Gram-negative, Gram-positive, and drug-resistant bacteria as well as enveloped viruses. With the expansion of well-curated AMP databases, we expect to update the model with larger-scale datasets in the near future. 

      (2) Conduct sensitivity analyses to understand how minor changes in the peptide sequences impact the model’s predictions. This will reduce the chances of overlooking potential AMP candidates due to the model’s inability to capture subtle changes.

      Thank you for this valuable suggestion. We kept similar known peptide sequences in the training sets, considering that a single mutation may affect antimicrobial performance. We took P001 as an example and performed a sensitivity analysis by in silico site saturation mutagenesis. Author response image 2 shows the change in antimicrobial activity scores predicted by AMPredictor. Since the predicted MIC of P001 is 0.949 µM (the experimentally measured value is 0.80 µM), most single mutations lead to higher scores (i.e., worse performance), especially substitutions to the negatively charged residues Asp (D) and Glu (E). The largest change caused by a single amino acid replacement is 25.51 (W6D). Although this value may not reflect the actual change, it is large enough to be distinguished when screening and ranking candidate sequences.

      Author response image 2.

      Site saturation mutagenesis of P001. Color shows the change in MIC against E. coli predicted by AMPredictor (lower score is better).
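
      The in silico site saturation scan described above can be sketched as follows. This is a minimal illustration under stated assumptions: the sequence is a hypothetical placeholder (the P001 sequence is not given here), and `toy_predict` is a made-up stand-in for a trained regressor such as AMPredictor, not the real model.

```python
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_mutants(seq: str) -> List[Tuple[str, str]]:
    """Return (label, mutant) for every single substitution, e.g. ('W6D', ...)."""
    mutants = []
    for pos, wild in enumerate(seq, start=1):  # 1-based positions, as in 'W6D'
        for aa in AMINO_ACIDS:
            if aa != wild:
                mutants.append((f"{wild}{pos}{aa}", seq[:pos - 1] + aa + seq[pos:]))
    return mutants


def score_changes(seq: str, predict: Callable[[str], float]) -> Dict[str, float]:
    """Change in predicted score for each single mutant relative to the wild type."""
    base = predict(seq)
    return {label: predict(mutant) - base for label, mutant in single_mutants(seq)}


def toy_predict(seq: str) -> float:
    """Hypothetical scorer: penalizes acidic residues. NOT AMPredictor."""
    return 1.0 + 5.0 * sum(seq.count(res) for res in "DE")


if __name__ == "__main__":
    wild_type = "KKLLKWLLKK"  # hypothetical placeholder sequence
    changes = score_changes(wild_type, toy_predict)
    worst = max(changes, key=changes.get)
    print(len(changes), "single mutants; largest increase:", worst, changes[worst])
```

      A peptide of length L yields 19 x L single mutants, so the full scan stays cheap for sequences of 10 to 20 residues.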

      (3) Given the relatively short length of the peptides, typically ranging from 10 to 20 residues, the authors might consider employing a fully-connected graph in the peptide’s graphical representation. This approach could potentially simplify the model without sacrificing the descriptive power due to the limited size of the peptides.

      Thanks for your suggestions. We tested fully-connected graph edge encodings; the results on the test set are shown in Author response table 2 below. AMPredictor with the peptide contact map still performed better on Pearson correlation coefficient and CI, while fully-connected graphs gave slightly improved RMSE and MSE. Nonetheless, a fully-connected graph demands about 10 times the memory and a higher computational cost, because message passing becomes more complicated. Therefore, incorporating structural information is still the preferred choice.

      Author response table 2.

      Results of AMPredictor with different graph edge encodings
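
      To make the two edge encodings compared above concrete, the sketch below builds directed edge lists from a contact map and from a fully-connected graph and counts them. It is an illustration only: the contact-probability threshold of 0.5 and the toy contact map are assumptions, not values taken from the manuscript.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def contact_edges(contact_prob: List[List[float]], threshold: float = 0.5) -> List[Edge]:
    """Directed edges between residue pairs whose contact probability >= threshold."""
    n = len(contact_prob)
    return [
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j and contact_prob[i][j] >= threshold  # threshold is an assumption
    ]


def full_edges(n: int) -> List[Edge]:
    """Directed edges of a fully-connected graph on n residues (no self-loops)."""
    return [(i, j) for i in range(n) for j in range(n) if i != j]


def toy_contact_map(n: int) -> List[List[float]]:
    """Made-up map: residues within two positions of each other are 'in contact'."""
    return [[0.9 if 0 < abs(i - j) <= 2 else 0.1 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    length = 20
    sparse = contact_edges(toy_contact_map(length))
    dense = full_edges(length)
    print(f"contact-map edges: {len(sparse)}, fully-connected edges: {len(dense)}")
```

      The fully-connected edge count grows as n(n-1), while a contact-map graph grows roughly linearly with peptide length, which is consistent with the memory difference reported above; the exact ratio depends on the real contact maps.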

      (4) Upon reviewing Table S1, it is apparent that the application of ESM embeddings alone achieves commendable prediction accuracy. It would be intriguing to investigate whether the adoption of the more recent ESM models (specifically the second-generation ESM2 t36_3B, t48_15B, and t33_650M) could enhance model performance beyond that observed using the esm1b_t33_650M_UR50S model described in the manuscript. 

      Thanks for your suggestions. Here, we included various ESM2 models’ outputs as our node features and presented the results in Author response table 3. Notably, the dimensions of esm2_t36_3B and esm2_t48_15B are 2560 and 5120, respectively, while both esm2_t33_650M and esm1b_t33_650M are 1280 dimensions. 

      Interestingly, we found that larger models do not lead to improved performance. The ESM-1b version still holds the best metrics in RMSE, MSE, and Pearson correlation coefficient. This indicates that the choice of pretrained model version depends on the specific downstream task. 

      Author response table 3.

      Results of AMPredictor with different ESM versions

      (5) It may be pertinent to reevaluate the use of the MM-PBSA approach within the scope of this study. Typically, MM-PBSA is utilized to estimate the free energy differences between the bound and unbound states of solvated molecules. The application of MM-PBSA to calculate binding energies between proteins and membranes is unconventional and infrequently documented in the literature. Therefore, it is recommended that the authors consider omitting this portion of the manuscript, or provide a robust justification for its inclusion and application in this context.

      Thanks for your comments on the MM/PBSA method. Several published studies have used this approach to calculate peptide-membrane binding free energies (Langmuir 2016, 32, 1782-1790; J Cell Biochem 2018, 119, 9205-9216; J Chem Inf Model 2019, 59, 3262-3276; Molecular Therapy Oncolytics 2019, 16, 7-19; Microbiology Spectrum 2023, 11, e0320622; J Chem Inf Model 2023, 63, 5823-5833), and we referred to their settings, such as the dielectric constant. All of these works built similar all-atom systems including cationic antimicrobial peptides and membrane bilayers, and utilized the MM/PBSA method to describe the absorption process of the peptide from an unbound initial state. The order of magnitude of our calculation results is consistent with other reported works. Additionally, computational results may provide supporting evidence, and we discussed that this quantitative energy calculation should be considered along with other observed metrics. 
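
      For reference, the general form of an MM/PBSA binding free energy estimate for a peptide-membrane system can be written as below. This is the standard textbook decomposition, given as a sketch; the specific terms and settings used in the manuscript are not stated here, and the entropy term is often omitted in practice.

```latex
\Delta G_{\mathrm{bind}}
  = \langle G_{\mathrm{complex}} \rangle
  - \langle G_{\mathrm{membrane}} \rangle
  - \langle G_{\mathrm{peptide}} \rangle,
\qquad
G = E_{\mathrm{MM}} + G_{\mathrm{PB}} + G_{\mathrm{SA}} - TS
```

      Here E_MM is the molecular mechanics energy, G_PB the polar solvation term from the Poisson-Boltzmann equation, G_SA the nonpolar surface-area term, and TS the entropic contribution; the angle brackets denote averages over simulation snapshots.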

      Reviewer #3 (Recommendations For The Authors):

      The weaknesses I mentioned in the Public Review may be addressed by improving the writing and presentation and corrections to the text and figures.

      Thanks for your suggestion. We have carefully checked and improved the presentation of text and figures in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an excellent study by a superb investigator who discovered and is championing the field of migrasomes. This study contains a hidden "gem" - the induction of migrasomes by hypotonicity and how that happens. In summary, an outstanding fundamental phenomenon (migrasomes) en route to becoming translationally highly significant.

      Strengths:

      Innovative approach at several levels. Migrasomes - discovered by Dr Yu's group - are an outstanding biological phenomenon of fundamental interest and now of potentially practical value.

      Weaknesses:

      I feel that the overemphasis on practical aspects (vaccine), however important, eclipses some of the fundamental aspects that may be just as important and actually more interesting. If this can be expanded, the study would be outstanding.

      We sincerely thank the reviewer for the encouraging and insightful comments. We fully agree that the fundamental aspects of migrasome biology are of great importance and deserve deeper exploration.

      In line with the reviewer’s suggestion, we have expanded our discussion on the basic biology of engineered migrasomes (eMigs). A recent study by the Okochi group at the Tokyo Institute of Technology demonstrated that hypoosmotic stress induces the formation of migrasome-like vesicles, involving cytoplasmic influx and requiring cholesterol for their formation (DOI: 10.1002/1873-3468.14816, February 2024). Building on this, our study provides a detailed characterization of hypoosmotic stress-induced eMig formation, and further compares the biophysical properties of natural migrasomes and eMigs. Notably, the inherent stability of eMigs makes them particularly promising as a vaccine platform.

      Finally, we would like to note that our laboratory continues to investigate multiple aspects of migrasome biology. In collaboration with our colleagues, we recently completed a study elucidating the mechanical forces involved in migrasome formation (DOI: 10.1016/j.bpj.2024.12.029), which further complements the findings presented here.

      Reviewer #2 (Public review):

      Summary:

      The authors' report describes a novel vaccine platform derived from a newly discovered organelle called a migrasome. First, the authors address a technical hurdle in using migrasomes as a vaccine platform. Natural migrasome formation occurs at low levels and is labor intensive; however, by understanding the molecular underpinning of migrasome formation, the authors have designed a method to make engineered migrasomes from cultured cells at higher yields utilizing a robust process. These engineered migrasomes behave like natural migrasomes. Next, the authors immunized mice with migrasomes that either expressed a model peptide or the SARS-CoV-2 spike protein. Antibodies against the spike protein were raised that could be boosted by a 2nd vaccination and these antibodies were functional as assessed by an in vitro pseudoviral assay. This new vaccine platform has the potential to overcome obstacles such as cold chain issues for vaccines like messenger RNA that require very stringent storage conditions.

      Strengths:

      The authors present very robust studies detailing the biology behind migrasome formation and this fundamental understanding was used to form engineered migrasomes, which makes it possible to utilize migrasomes as a vaccine platform. The characterization of engineered migrasomes is thorough and establishes comparability with naturally occurring migrasomes. The biophysical characterization of the migrasomes is well done including thermal stability and characterization of the particle size (important characterizations for a good vaccine).

      Weaknesses:

      With a new vaccine platform technology, it would be nice to compare them head-to-head against a proven technology. The authors would improve the manuscript if they made some comparisons to other vaccine platforms such as a SARS-CoV-2 mRNA vaccine or even an adjuvanted recombinant spike protein. This would demonstrate a migrasome-based vaccine could elicit responses comparable to a proven vaccine technology. 

      We thank the reviewer for the thoughtful evaluation and constructive suggestions, which have helped us strengthen the manuscript. 

      Comparison with proven vaccine technologies:

      In response to the reviewer’s comment, we now include a direct comparison of the antibody responses elicited by eMig-Spike and a conventional recombinant S1 protein vaccine formulated with Alum. As shown in the revised manuscript (Author response image 1), the levels of S1-specific IgG induced by the eMig-based platform were comparable to those induced by the S1+Alum formulation. This comparison supports the potential of eMigs as a competitive alternative to established vaccine platforms. 

      Author response image 1.

      eMigrasome-based vaccination showed similar efficacy compared with adjuvanted recombinant spike protein. The amount of S1-specific IgG in mouse serum was quantified by ELISA on day 14 after immunization. Mice were either intraperitoneally (i.p.) immunized with recombinant Alum/S1 or intravenously (i.v.) immunized with eM-NC, eM-S or recombinant S1. The administered doses were 20 µg/mouse for eMigrasomes, 10 µg/mouse (i.v.) or 50 µg/mouse (i.p.) for recombinant S1 and 50 µl/mouse for Aluminium adjuvant.

      Assessment of antigen integrity on migrasomes:

      To address the reviewer’s suggestion regarding antigen integrity, we performed immunoblotting using antibodies against both S1 and mCherry. Two distinct bands were observed: one at the expected molecular weight of the S-mCherry fusion protein, and a higher molecular weight band that may represent oligomerized or higher-order forms of the Spike protein (Figure 5b in the revised manuscript).

      Furthermore, we performed confocal microscopy using a monoclonal antibody against Spike (anti-S). Co-localization analysis revealed strong overlap between the mCherry fluorescence and anti-Spike staining, confirming the proper presentation and surface localization of intact S-mCherry fusion protein on eMigs (Figure 5c in the revised manuscript). These results confirm the structural integrity and antigenic fidelity of the Spike protein expressed on eMigs.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      I feel that the overemphasis on practical aspects (vaccine), however important, eclipses some of the fundamental aspects that may be just as important and actually more interesting. If this can be expanded, the study would be outstanding.

      I know that the reviewers always ask for more, and this is not the case here. Can the abstract and title be changed to emphasize the science behind migrasome formation, and possibly add a few more fundamental aspects on how hypotonic shock induces migrasomes?

      Alternatively, if the authors desire to maintain the emphasis on vaccines, can immunological mechanisms be somewhat expanded in order to - at least to some extent - explain why migrasomes are a better vaccine vehicle?

      One way or another, this reviewer is highly supportive of this study and it is really up to the authors and the editor to decide whether my comments are of use or not.

      My recommendation is to go ahead with publishing after some adjustments as per above.

      We’d like to thank the reviewer for the suggestion. We have changed the title of the manuscript and modified the abstract, emphasizing the fundamental science behind the development of eMigrasomes. To gain some immunological information on eMig-elicited antibody responses, we characterized the types of IgG induced by eM-OVA in mice and compared them to those induced by Alum/OVA. The IgG response to Alum/OVA was dominated by IgG1. In contrast, eM-OVA induced an even distribution of IgG subtypes, including IgG1, IgG2b, IgG2c, and IgG3 (Figure 4i in the revised manuscript). The ratio between IgG1 and IgG2a/c indicates a Th1 or Th2 type humoral immune response. Thus, eM-OVA immunization induces a balance of Th1/Th2 immune responses.

      Reviewer #2 (Recommendations For The Authors):

      The study is a very nice exploration of a new vaccine platform. This reviewer believes that a more head-to-head comparison to the current SARS-CoV-2 vaccine platform would improve the manuscript. This comparison is done with OVA antigen, but this model antigen is not as exciting as a functional head-to-head with a SARS-CoV-2 vaccine.

      I think that two other discussion points should be included in the manuscript. First, was the host-cell protein evaluated? If not, I would include that point on how issues of host cell contamination of the migrasome could play a role in the responses and safety of a vaccine. Second, I would discuss antigen incorporation and localization into the platform. For example, the full-length spike being expressed has a native signal peptide and transmembrane domain. The authors point out that a transmembrane domain can be added to display an antigen that does not have one natively expressed, however, without a signal peptide this would not be secreted and localized properly. I would suggest adding a discussion of how a non-native signal peptide would be necessary in addition to a transmembrane domain.

      We thank the reviewer for these thoughtful suggestions and fully agree that the points raised are important for the translational development of eMig-based vaccines.

      (1) Host cell proteins and potential immunogenicity:

      We appreciate the reviewer’s suggestion to consider host cell protein contamination. Considering the potential clinical application of eMigrasomes in the future, we will use human cells with low immunogenicity, such as HEK-293 or embryonic stem cells (ESCs), to generate eMigrasomes. We will also follow a QC process that meets the standards of validated EV-based vaccination techniques. 

      (2) Antigen incorporation and localization—signal peptide and transmembrane domain:

      We also agree with the reviewer’s point that proper surface display of antigens on eMigs requires both a transmembrane domain and a signal peptide for correct trafficking and membrane anchoring. For instance, in the case of full-length Spike protein, the native signal peptide and transmembrane domain ensure proper localization to the plasma membrane and subsequent incorporation into eMigs. In the case of OVA, a secretory protein that contains a native signal peptide yet lacks a transmembrane domain, an engineered transmembrane domain is required. For antigens that do not naturally contain these features, both a non-native signal peptide and an artificial transmembrane domain are necessary. We have clarified this point in the revised discussion and explicitly noted the requirement for a signal peptide when engineering antigens for surface display on migrasomes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Dubicka and co-workers on calcification in miliolid foraminifera presents an interesting piece of work. The study uses confocal and electron microscopy to show that the traditional picture of calcification in porcelaneous foraminifera is incorrect.

      Strengths:

      The authors present high-quality images and an original approach to a relatively solid (so I thought) model of calcification.

      Weaknesses:

      There are several major shortcomings. Despite the interesting subject and the wonderful images, the conclusions of this manuscript are simply not supported at all by the results. The fluorescent images may not have any relation to the process of calcification and should therefore not be part of this manuscript. The SEM images, however, do point to an outdated idea of miliolid calcification. I think the manuscript would be much stronger with the focus on the SEM images and with the speculation of the physiological processes greatly reduced.

      We agree that the fluorescence studies presented in the paper are not, by themselves, unequivocal proof of the calcification model utilised by the studied Miliolida species. However, the fluorescence data combined with the SEM studies, especially the overlap of the elements that show autofluorescence upon excitation at 405 nm (emission 420–480 nm) with acidic vesicles marked by pH-sensitive LysoGlow84, may hint at ACC-bearing vesicles.

      We will tone down the physiological interpretation based on fluorescence studies in the revised version of the manuscript.

      Nevertheless, we think that our fluorescence live-imaging experiments provide important observations on Miliolida, which are scarce in the existing literature, and are therefore worth presenting, as they may help to better understand the full calcification model in the future.

      Reviewer #2 (Public Review):

      Summary:

      Dubicka et al. in their paper entitled "Biocalcification in porcelaneous foraminifera" suggest that in contrast to the traditionally claimed two different modes of test calcification by rotaliid and porcelaneous miliolid foraminifera, both groups produce calcareous tests via the intravesicular mineral precursors (Mg-rich amorphous calcium carbonate). These precursors are proposed to be supplied by endocytosed seawater and deposited in situ as mesocrystals formed at the site of new wall formation within the organic matrix. The authors did not observe the calcification of the needles within the transported vesicles, which challenges the previous model of miliolid mineralization. Although the authors argue that these two groups of foraminifera utilize the same calcification mechanism, they also suggest that these calcification pathways evolved independently in the Paleozoic.

      We do not argue that Miliolida and Rotaliida utilize exactly the same calcification mechanism, but rather that both groups use less divergent crystallization pathways, in which mesocrystalline chamber walls are created by accumulating and assembling particles of a pre-formed liquid amorphous mineral phase.

      Strengths:

      The authors document various unknown aspects of calcification of Pseudolachlanella eburnea and elucidate some poorly explained phenomena (e.g., translucent properties of the freshly formed test) however there are several problematic observations/interpretations which in my opinion should be carefully addressed.

      Weaknesses:

      (1) The authors (line 122) suggest that "characteristic autofluorescence indicates the carbonate content of the vesicles (Fig. S2), which are considered to be Mg-ACCs (amorphous MgCaCO3) (Fig. 2, Movies S4 and S5)". Figure S2 which the authors refer to shows only broken sections of organic sheath at different stages of mineralization. Movie S4 shows that only in a few regions some vesicles exhibit red autofluorescence interpreted as Mg-ACC (S5 is missing but probably the authors were referring to S3). In their previous paper (Dubicka et al 2023: Heliyon), the authors used exactly the same methodology to suggest that these are intracellularly formed Mg-rich amorphous calcium carbonate particles that transform into a stable mineral phase in rotaliid Amphistegina lessonii. However, in Figure 1D (Dubicka et al 2023) the apparently carbonate-loaded vesicles show the same red autofluorescence as the test, whereas in their current paper, no evidence of autofluorescence of Mg-ACC grains accumulated within the "gel-like" organic matrix is given. The S3 and S4 movies show circulation of various fluorescing components, but no initial phase of test formation is observable (numerous mineral grains embedded within the organic matrix - Figures 3A and B - should be clearly observed also as autofluorescence of the whole layer). Thus the crucial argument supporting the calcification model (Figure 5) is missing.

      It is correct that we did not observe the initial phase of test formation in vivo. Therefore, it is not our crucial argument supporting the novel components of the new calcification model. We suspect that vesicles preparing and transporting Mg-ACC are produced well before their docking and deposition into the new wall, because such seawater vesicles were observed between the chamber formation stages (Goleń and Tyszka, 2024, personal communication based on independent experiments on a closely related miliolid taxon). This means that our in vivo experiments most likely represent a long, dynamic stage of vesicle formation via seawater endocytosis and vesicle modification (incl. Mg-ACC formation) before the stage of exocytosis during new chamber formation. Our crucial arguments supporting the calcification model come from the SEM imaging of the specimens fixed during chamber formation, as well as from the transparency of the new chamber wall during its progressive calcification.

      There is no support for the following interpretation (lines 199-203) "The existence of intracellular, vesicular intermediate amorphous phase (Mg-ACC pools), which supply successive doses of carbonate material to shell production, was supported by autofluorescence (excitation at 405 nm; Fig. 2; Movies S3 and S4; see Dubicka et al., 2023) and a high content of Ca and Mg quantified from the area of cytoplasm by SEM-EDS analysis (Fig. S6)."

      We used the 405 nm laser line and multiphoton excitation to detect ACCs. These wavelengths (partly) permeate the shell to excite ACC autofluorescence. The autofluorescence of the shells is present as well but is not clearly visible in Movie S4, as the fluorescence of ACCs is stronger. This may be related to the plane/section of the cell that is shown. The laser permeates the shell above the ACCs (short distance), but to excite the shell CaCO3 around the foraminifera in the same three-dimensional section where ACCs are shown, the light must pass through a thick CaCO3 area due to the three-dimensional structure of the foraminiferal shell. Therefore, the laser light intensity is reduced. In the revised version, a movie/image with a reduced threshold is shown.

      Author response image 1.

      Autofluorescence image of studied Miliolida species (exc. 405 nm) showing algal chlorophyll (blue) and CaCO3 (red), both ACC and calcite shell.

      It would be very convenient if it was possible to visualize ACC by illumination with a blacklight, but there are very many organic molecules that have an autofluorescence excited by ~405 nm. One of the examples is NADH (Lee et al., 2015. Kor J Physiol Pharmac 19(4): 373-382), an omnipresent molecule in any cell (couldn't copy the appropriate picture here, but the reference has a figure with the em/exc spectra).

      The paper of Lee et al. 2015 shows that the excitation spectrum of NADH ends close to 400 nm. This means that NADH is not, or only very weakly, excitable at 405 nm, the excitation laser line we used.

      (2) The authors suggest that "no organic matter was detected between the needles of the porcelain structures (Figures 3E; 3E; S4C, and S5A)". Such a suggestion, which is highly unusual considering that biogenic minerals almost by definition contain various organic components, was made based only on FE-SEM observation. The authors should either provide clearcut evidence of the lack of organic matter (unlikely) or may suggest that intense calcium carbonate precipitation within organic matrix gel ultimately results in a decrease of the amount of the organic phase (but not its complete elimination), alike the pure calcium carbonate crystals are separated from the remaining liquid with impurities ("mother liquor"). On the other hand, if (249-250) "organic matrix involved in the biomineralization of foraminiferal shells may contain collagen-like networks", such "laminar" organization of the organic matrix may partly explain the arrangement of carbonate fibers parallel to the surface as observed in Fig. 3E1.

      We agree with the reviewer that biogenic minerals should by definition contain some organic components. We wrote only that "no organic matter was detected between the needles of the porcelain structures", meaning that we did not detect any organic structures based solely on our FE-SEM observations. We will rephrase this part of the text to avoid further confusion.

      (3) The authors' observations indeed do not show the formation of individual skeletal crystallites within intracellular vesicles; however, they do not explain either what the structure of individual skeletal crystallites is or how they are formed. Especially, what are the structures observed in polarized light (and interpreted as calcite crystallites) by De Nooijer et al. 2009? The authors' explanation of the process (lines 213-216) is not particularly convincing: "we suspect that the OM was removed from the test wall and recycled by the cell itself".

      Thank you for this comment. We will do our best to supplement our explanations. We are aware of the structures observed in polarized light by De Nooijer et al. (2009). However, Goleń et al. (2022, Protist; + 2 other citations) showed that organic polymers may also exhibit light polarization. Additional experimental studies are needed to separate these types of polarization. We will try to investigate this issue in our future research.

      (4) The following passage (lines 296-304) which deals with the concept of mesocrystals is not supported by the authors' methodology or observations. The authors state that miliolid needles "assembled with calcite nanoparticles, are unique examples of biogenic mesocrystals (see Cölfen and Antonietti, 2005), forming distinct geometric shapes limited by planar crystalline faces" (later in the same passage the authors say that "mesocrystals are common biogenic components in the skeletons of marine organisms"; are they thus unique or are they common?). It is my suggestion to completely eliminate this concept here until various crystallographic details of the miliolid test formation are well documented.

      Our intention was to express that mesocrystals are common biogenic components in the skeletons of marine organisms, whereas miliolid needles forming distinct geometric shapes limited by planar crystalline faces are unique.

      Reviewer #1 (Recommendations For The Authors):

      Below, I have summarized my main criticisms.

      (1) The movies S1-S4 do not indicate what is described. The videos show indeed seawater (S1), cell membranes (S2), and autofluorescence and acidic vesicles (S3 and S4). The presence of all these intracellular structures is not surprising: any eukaryotic cell will have those. The authors, however, claim that they participate in the process of calcification, which is simply not shown. One of the main arguments seems the presence of 'carbonate pools', in the caption these are even claimed to be 'Mg-ACC pools', but this is by no means revealed by an excitation of 405nm/ emission between 420 and 490 nm. It would be very convenient if it was possible to visualize ACC by illumination with a blacklight, but there are very many organic molecules that have an autofluorescence excited by ~405 nm. One of the examples is NADH (Lee et al., 2015. Kor J Physiol Pharmac 19(4): 373-382), an omnipresent molecule in any cell (couldn't copy the appropriate picture here, but the reference has a figure with the em/exc spectra).

      The paper of Lee et al. 2015 shows that the excitation spectrum of NADH ends close to 400 nm. This means that NADH is not, or only very weakly, excitable at 405 nm, the excitation laser line we used.

      The fluorescence by this excitation/ emission couple unlikely indicates the vesicles in which these foraminifera calcify. Therefore, most of the interpretation of the authors on what happens with the calcitic needles is not based on results but remains pure speculation.

      Autofluorescence upon excitation at 405 nm (emission 420–480 nm) is typical for CaCO3, both biocalcite and amorphous calcium carbonate, as shown by laboratory synthesis of amorphous calcium carbonate (Dubicka et al., in preparation).

      (2) The results mention 'granules', which are the supposed Mg-ACC-containing vesicles, but the movies simply don't show any granules. Only fluorescence. Again, the results show a lot of vesicles with autofluorescence, but these are not necessarily related to calcification. Proof could be supplied by showing that the same fluorescent vesicles are 'used up' when the specimens under observation are making a new chamber, but until that is done, the fate of all these vesicles remains uncertain and once more, may not be involved in calcification at all.

      We suspect that vesicles preparing and transporting Mg-ACC are produced well before their docking and deposition into the new wall, because such seawater vesicles were observed between the chamber formation stages (Goleń and Tyszka, 2024, personal communication based on independent experiments on a closely related miliolid taxon). This means that our in vivo experiments most likely represent a long, dynamic stage of vesicle formation via seawater endocytosis and vesicle modification (incl. Mg-ACC formation) before the stage of exocytosis during new chamber formation. Our crucial arguments supporting the calcification model come from the SEM imaging of the specimens fixed during chamber formation, as well as from the transparency of the new chamber wall during its progressive calcification.

      (3) The Methods are unclear. How long were the foraminifers kept before being placed under the microscope? Were they fed with anything? This is important since the chlorophyll should not be from any food source. I didn't know that this foraminiferal species has photosynthetic symbionts: genera like Quinqueloculina don't. Is there any reference for this? Normally, I wouldn't care that much, but the authors find the presence of (facultative) symbionts important (lines 305-336). I am a bit suspicious about this since the only evidence for the presence of photosynthetic symbionts is because of the autofluorescence. As the authors said, commonly these miliolid species are regarded as symbiont-barren, so additional proof for these symbionts is necessary.

      We agree that additional proof is needed for the presence of photosynthetic symbionts. We rephrased the manuscript accordingly.

      (4) It is also unclear (Methods) at what stage the miliolids were photographed (Figure 3). How did chamber formation proceed, what was the timing of the photographs, etc. These pictures are to me the most interesting finding of this study, but need to be described much better.

      All individuals of living foraminifera were fixed at the overall stage of chamber formation. However, every individual presents a complete set of successive steps (substages) of chamber wall calcification fixed at once. Fig. 3A and B present nearly the most proximal (youngest) part of the new chamber with a thick wall of calcite nanograins within a gel-like organic matrix. Fig. 3C and D present a slightly more distal (intermediate) part of the calcified chamber. Fig. 3E shows the most distal part of the new chamber. This part is anchored to the older, underlying solid calcified chamber (not shown in this figure). All these steps are synchronous; however, they represent gradual, successive stages of calcification. The main text and Figs 4 and 5 explain this phenomenon in detail.

      There are many small issues with the text too. These include:

      Line 28/29: in many other groups, calcification is thought to be polyphyletic (e.g. sponges: Chombard et al., 1997. Biol Bull 193: 359-367).

      Corrected

      Line 29/30: there may be even more 'types of shells'. The first author has shown in earlier papers that nodosarids have a unique shell architecture. Spirillinids also seem to have their own way of calcification. It is unclear what is meant here by 'two contrasting models'.

      To date, only two models of foraminiferal calcification are known. Lagenida biocalcification has not been studied.

      Line 33: 'Both groups'? This paper only shows calcification in miliolids.

      However, we also refer here to a previous study.

      Line 42: Perhaps, but there is no data on the pseudopodial network in this manuscript.

      We refer to the studies of Angell (1980).

      Line 43: Likely, but that is not what this manuscript is showing.

      Line 42-44: The authors should make a choice and be clear. The point of this paper is that miliolids and rotalids calcify in ways that are actually not as different as they seemed previously. Still, they are said to have different 'chamber formation modes'. If they are calcifying in a similar way (which I think is not necessarily supported by the results), isn't calcification in these groups like variations on the same theme? How does this relate to the independent origins of calcification within these two groups?

      Our intention is to show that Miliolida and Rotaliida utilize less divergent calcification pathways, following the recently discovered biomineralization principles.

      Line 49-51: is this a well-established distinction? If so, please add a reference. If not: what is fundamentally different between B and C? Does only the size of the intracellular vesicle matter?

      Rephrased

      Line 60: please include a reference for the intracellular calcification by coccolithophores.

      Added

      Line 67: this is wrong. It is the alignment of the needles at the surface that makes them all reflect light in the same way and gives the shells a porcelaneous appearance. A close-up of the miliolid's shell surface shows this arrangement. Underneath this layer, the orientation of the needles is more random.

      We referred to Johan Hohenegger's papers.

      Line 114: how else?

      Line 114-116: I don't see the relevance here. If seawater is taken up, the vesicle containing this seawater has to have a membrane around it. By definition. The text here ('These vesicles') suggests that Calcein and FM1-43 were combined (which they easily could have), but the methods describe that they are used successively.

      Yes, we used two dyes separately.

      Lines 122-130: I think the interpretation of this autofluorescence signal is wrong. Even if it was true, these lines belong to the Discussion.

      This paragraph has been moved to the Discussion.

      Line 138: What are 'mobile clusters'? I don't see a relation between the location of the symbionts and the other vesicles (Figure 2).

      Line 147-148: How can an SEM image show the absence of organic matter?

      We meant the absence of the gel-like OM visible in the previous stages of chamber formation.

      Line 148: Should be 'Figs. 3E; 3E1; S4C'.

      Corrected

      Lines 143-150: this can be merged with the following paragraph.

      Done

      Lines 151-169: why is there no indication of the time? Figures 3 and 4 link the pictures in time to show the development of the growing chamber wall. However, neither here nor in the methods, is there any recording of the time after the beginning of chamber formation. Now, the images are linked (Figure 4) as if they were taken at regular intervals, but this is not documented.

      Lines 170-184: this should go to the Discussion.

      Done

      Line 193-195: this is likely, but not visible in Figure 1.

      It was visible by optical microscopy and described by Angell, 1980

      Line 199-201: I don't understand this: the fluorescent vesicles were not observed during chamber formation so any link between the SEM and CLSM scans remains pure speculation.

      Line 203-204: needed for what?

      For better documentation of Miliolid ACC-bearing granules

      Line 220: is this shown in any of the images? 

      Angell, 1980

      Line 230: It sounds nice, but I don't think a 'paradigm shift' is appropriate here. However interesting and important foraminiferal biomineralization is, the authors show that the crystals of miliolids are likely formed differently than previously thought. If this is a 'paradigm shift', then most scientific findings are.

      In our opinion this is definitely a shift of paradigm

      Line 231: I don't think anyone suggested miliolids and coccolithophores share 'the same' pathway. They are shown (cocco's) and thought (miliolids) to secrete their calcite intracellularly.

      Changed to similar, intracellular

      Line 258: References should only be to peer-reviewed studies.

      Line 430: Burgers'

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      Please separate clearly the results (observations) from the discussion (interpretations): various interpretational/commentary phrases should be removed from the Results section to Discussion e.g., lines 124-130, 131-135.

      Interpretations have been separated from the results, as suggested by the reviewer.

      [line 49] " living cells have evolved three major skeleton crystallization pathways". I would rather say "organisms" not "cells" as the coordination of the calcification process in multicellular organisms clearly involves processes that are beyond the individual cell activity.

      Corrected

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to the Reviewer #1 (Public review):

      We greatly appreciate the reviewer’s high evaluation of our paper and helpful comments. As expected, we revealed that the CCL17/CCL22–CCR4 axes play an important role in guiding Tregs to the atherosclerotic aorta. Interestingly, we also demonstrated that these axes are critical for Treg-dependent regulation of proinflammatory T cell responses in lymphoid tissues and atherosclerotic aortas, which is a previously unrecognized role for CCR4 in regulating inflammatory immune responses. However, the role of the CCL17/CCL22–CCR4 axes in regulating inflammatory immune responses and atherosclerosis has not been fully elucidated and further investigation is needed.

      Response to the reviewer #2 (Public review):

      We greatly appreciate the reviewer’s high evaluation of our paper and helpful comments and suggestions. We isolated CD4<sup>+</sup>CD25<sup>+</sup> T cells and used them as Tregs in several experiments. As the reviewer pointed out, we realize that the CD4<sup>+</sup>CD25<sup>+</sup> T cell population contains some activated effector T cells. However, given the high expression levels of the most reliable Treg marker, Foxp3, in isolated CD4<sup>+</sup>CD25<sup>+</sup> T cells as determined by flow cytometry, we believe that our method for separating Tregs is acceptable.

      Regarding the role of Th17 cells in atherosclerosis, conflicting results have been reported. Therefore, it is unclear whether augmented Th17 cell immune responses contribute to accelerated atherosclerosis in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice.

      As the reviewer pointed out, it is important to consider the clinical relevance of our findings. We analyzed a public database to determine whether Ccr4 single nucleotide polymorphisms correlate with a higher incidence of atherosclerotic cardiovascular disease. However, no evidence supporting the clinical relevance of our findings was found.

      Response to the Reviewer #3 (Public review):

      We greatly appreciate the reviewer’s high evaluation of our paper and helpful comments and suggestions. In accordance with the reviewer’s suggestion, we described the detailed methods and carefully performed data analysis regarding flow cytometry, which would strengthen the conclusion of this study.

      We understand the importance of the reviewer’s point that CCR4 deficiency does not shift the Th1 cell/Treg balance toward Th1 cell responses in all lymphoid tissues. CCR4 deficiency promoted the accumulation of Th1 cells but did not affect the accumulation of Tregs in the atherosclerotic aorta, which led to the shift of the Th1 cell/Treg balance toward Th1 cell responses. The frequencies of both Tregs and Th1 cells in peripheral lymphoid tissues were increased by CCR4 deficiency, while these CCR4-deficient Tregs exhibited impaired suppressive function. Given this, we speculate that CCR4 deficiency may shift the Th1 cell/Treg balance toward Th1 cell responses in peripheral lymphoid tissues. However, it is difficult to show this clearly. We revised the manuscript accordingly.

      Although the reviewer pointed out the possibility that modulation of the Th1 cell/Th17 cell balance might be responsible for the changes in aortic inflammatory cells in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice, the role of Th17 cells in atherosclerosis remains controversial. However, we cannot completely exclude the possibility that modulation of the Th17 response is involved in accelerated atherosclerosis in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice.

      As a limitation of this study, the phenotypic heterogeneity and dynamics of aortic leukocytes could not be revealed by flow cytometric analysis. Single-cell proteomic and transcriptomic approaches would provide additional important information on various aortic cells including immune cells and vascular cells.

      Reviewer #1 (Recommendations for the authors):

      Issue (1) Ideally, CCR4 could be deleted on Foxp3+ cells and some staining on double positive Rorg+Foxp3+ done. On the other side, a whole gene expression of infiltrated Foxp3 and effector could be also helpful. More challenging, it would be important to see whether those CCR4-specific Tregs could or not regulate effector infiltrating cells.

      As the reviewer suggested, single-cell proteomic and transcriptomic approaches would be helpful to reveal the phenotypic heterogeneity and dynamics of aortic leukocytes including Tregs. Also, the use of conditional knockout mice would reveal the precise role of CCR4-expressing Tregs in regulating aortic immune cell infiltration and atherosclerosis.

      Reviewer #2 (Recommendations for the authors):

      Minor Suggestions:

      Issue (1) In supplementary Figure 1, CCR4 expression would be better represented by dot plots rather than histograms.

      We revised Supplementary Figure 1A through 1C.

      Issue (2) The reduction in CD103 expression shown in Figure 2E at 8 weeks should be discussed.

      In Figure 2E, we found that the expression of CD103 in peripheral LN Tregs was slightly lower in 8-week-old Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice than in age-matched Apoe<sup>-/-</sup> mice, while there was no difference in its expression levels between 18-week-old Apoe<sup>-/-</sup> and Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice. In addition, there was no significant difference in the mRNA expression of this molecule in splenic Tregs between 8-week-old Apoe<sup>-/-</sup> and Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice. Based on the minor effect of CCR4 deficiency on CD103 expression in Tregs, reduced CD103 expression in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice does not seem to be an important change.

      Issue (3) The increased expression of CD86 by DCs should be discussed.

      The upregulated CD86 expression on DCs in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice might be explained by the data on a Treg-DC coculture experiment showing the impaired cell–cell contacts between CCR4-deficient Tregs and DCs. On the other hand, the expression of another important costimulatory molecule CD80 on DCs was not altered in these mice, which is not consistent with the data on the above coculture experiment. The reason why only CD86 expression on DCs was upregulated in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice remains unclear.

      Issue (4) In Figures 5F-H, using larger dots would enhance visibility.

      We revised the graphs in Figure 5F-H.

      Issue (5) In Figure 5I, since the data is normalized, a one-sample t-test is more appropriate.

      In accordance with the reviewer’s suggestion, we reconsidered the data analysis. Because there was a dramatic difference in the absolute number of Kaede-expressing Tregs accumulated in the aorta among experiments, we were worried that the statistical analysis of the combined data from multiple experiments might draw a wrong conclusion. We have decided to show the representative data from 3 independent experiments in Figure 5I.

      Issue (6) On page 11, line 256, the text mentions IL4 and IL10 being detected by cytokine array; however, the figures do not show these cytokines.

      We are afraid that the reviewer might have misunderstood the data. The cytokine levels of IL-4 and IL-10 could not be detected by cytokine array analysis. Accordingly, we carefully revised the text in the manuscript.

      Issue (7). On page 14, lines 326-330, the text should be revised for clarity.

      We revised the text in the manuscript.

      Issue (8) Several data are marked as "not shown"; some of this information is relevant and should be included in the supplementary figures.

      We showed the data on CCL17 and CCL22 expression in peripheral LNs in Supplementary Figure 2.

      Major Suggestions:

      Issue (1) FoxP3 expression should be evaluated post-isolation of CD4<sup>+</sup>CD25<sup>+</sup> T cells, and FoxP3- CD4<sup>+</sup>CD25<sup>+</sup> T cells should be characterized. Tregs could be more effectively isolated using FoxP3eGFP mice.

      After isolation of CD4<sup>+</sup>CD25<sup>+</sup> T cells (the purity was >95%), we examined Foxp3 expression by flow cytometry and found that most of these cells express Foxp3 (Supplementary Figure 10). Therefore, CD4<sup>+</sup>CD25<sup>+</sup> T cells without Foxp3 expression, which are considered contaminated effector T cells, are minor cells and would not substantially affect the results. Nonetheless, the use of Foxp3-eGFP mice would enable us to isolate Tregs more accurately.

      Issue (2) In Figure 3, it would be interesting to evaluate whether there are RORgt+Tbet+ (IL17+IFNg+) cells. These cells would be pathogenic, whereas RORgt+CD73+ cells would be non-pathogenic.

      We analyzed CD4<sup>+</sup> T cells producing both IL-17 and IFN-γ in the peripheral lymphoid tissues of Apoe<sup>-/-</sup> and Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice. We found that this cell population was quite rare and that there was no significant difference in its proportion between the 2 groups, suggesting that this cell population makes at most a minor contribution to the atherosclerosis phenotype.

      Author response image 1.

      Issue (3) Different time points after adoptive cell transfer should be evaluated to confirm reduced migration to the atherosclerotic aorta.

      It would be interesting to evaluate Treg migration to the atherosclerotic aorta at different time points after Treg transfer. However, it seems difficult to accurately evaluate the migration of Tregs at later time points because they would proliferate in the aorta.

      Issue (4) The authors could evaluate whether Ccr4 SNPs correlate with an increased risk of atherosclerosis.

      As the reviewer pointed out, it is important to consider the clinical relevance of our findings. However, there is no evidence that Ccr4 single nucleotide polymorphisms correlate with a higher incidence of atherosclerotic cardiovascular disease.

      Issue (5) The authors could evaluate if the transfer of Apoe<sup>-/-</sup> Tregs rescues early atherosclerosis development in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice.

      To confirm whether transfer of CCR4-intact Tregs rescues the development of early atherosclerotic lesions in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice, we injected Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice with saline or Tregs from Apoe<sup>-/-</sup> or Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice and analyzed the aortic root atherosclerotic lesions of recipient Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice. However, we found no significant difference in the aortic sinus plaque area among the 3 groups. We described this result in the results section and included the data in Supplementary Figure 8.

      Reviewer #3 (Recommendations for the authors):

      Analysis of TCD4<sup>+</sup> cell populations in different tissues:

      Issue (1) The description of flow cytometry analysis is incomplete and requires clarification. Please detail the use of controls to ensure correct analysis, including the following: i) cell viability; ii) staining controls to define positive and negative cells; iii) the gating strategy used to identify cell populations in each lymphoid tissue and aorta (please provide them as supplementary figures).

      As we expected most of the prepared cells to be viable, we did not check their viability. Because our previous work clearly detected various immune cells, including Tregs, effector memory T cells, and helper T cell subsets, we performed flow cytometric analysis of these cells in this study without preparing negative controls stained with isotype control antibodies. The gating strategy for flow cytometric analysis of various immune cells in peripheral lymphoid tissues was described in our previous report (J Am Heart Assoc 2024; 13: e031639). We provide the gating strategy for flow cytometric analysis of helper T cells and Tregs in the aorta in Supplementary Figure 9.

      Issue (2) The phenotype/differentiation markers used for analysing T CD4<sup>+</sup> cell subsets differ between lymphoid tissues and aortic lesions; might this influence results? If so, please comment on that.

      As aortic T cells were far fewer than T cells in peripheral lymphoid tissues, it seemed difficult to precisely detect aortic T cells, including various helper T cell subsets and Tregs, by intracellular cytokine staining. Therefore, we decided to analyze these cells by evaluating transcription factors specific for helper T cell subsets. The difference in the markers used for analyzing T cell subsets would not considerably influence the results.

      Issue (3) Considering my observations about the effect of CCR4 deficiency on the T CD4<sup>+</sup> differentiation profile in different tissues, I suggest comparing Th1/Treg and Th17/Treg ratios in all examined tissues. The modulation of the Th17/Th1 balance could shape inflammation.

      The Th1 cell/Treg balance is shifted toward Th1 cell responses in the atherosclerotic aorta of Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice, while this balance would not be altered in the peripheral lymphoid tissues. It remains unclear whether CCR4 deficiency affects the Th17 cell/Treg ratio. We do not think that it is important to investigate the effect of CCR4 deficiency on the balance of Th17 cell/Treg or Th17 cell/Th1 cell because the role of Th17 cell responses in atherosclerosis remains controversial.

      Issue (4) Cell numbers of recovered Treg from para-aortic lymphoid nodes and aortic tissues might not allow Treg functional assays. Analysis by flow cytometry of biomarkers of Treg activation state would be more informative than by quantifying mRNA expression levels. In particular, TGFβ analysis at the mRNA level does not provide much more information about the suppressive activity of Treg, and even at the protein level, the recognition of the active form of this cytokine is required. Analysis of PD1 (for exhausted cell phenotype) and Treg apoptosis along the stages of atherosclerosis could also yield useful information.

      We performed flow cytometric analysis of the activation markers CTLA-4 and CD103, the cell exhaustion marker PD1, and apoptosis in Tregs in the para-aortic LNs of Apoe<sup>-/-</sup> or Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice, and found no major differences in the expression levels of these molecules or the proportion of apoptotic cells between the 2 groups. These data are shown below.

      Author response image 2.

      Unfortunately, we failed to evaluate the activity of TGF-β in Tregs because an appropriate experimental method for precisely detecting its active form was unavailable.

      Issue (5) Regarding the result's interpretation, I recommend being precise when concluding to avoid misunderstanding. A shift in the T CD4<sup>+</sup> response in lymphoid tissues might be interpreted as a modulation of the T cell differentiation process, which strongly depends on signals derived from DCs, which were not the focus of this study.

      There are two possible mechanisms for the altered CD4<sup>+</sup> T cell responses in peripheral lymphoid tissues, which include the modulation of their differentiation and proliferation processes. These processes are substantially regulated by DCs whose function could be favorably modulated by CCR4-expressing Tregs as described in the manuscript. Therefore, we think that the interactions between Tregs and DCs are crucial for shifting the CD4<sup>+</sup> T cell responses in peripheral lymphoid tissues, though it remains unclear which process plays a major role in regulating CD4<sup>+</sup> T cell polarization.

      Suppression studies:

      Issue (1) In vitro assays. According to the methodology suppression studies were performed using Treg collected from peripheral lymphoid nodes and spleen, but it is unclear whether these cells were analysed separately or as a pool (this was not clarified in the legend of Figure 5 either). Besides, be precise about which cells were used as antigen-presenting cells in the Treg suppression assay.

      In the in vitro Treg suppression assay, we used Tregs purified from peripheral lymph nodes and spleen as a pool, and we used splenocytes as antigen-presenting cells. We revised the manuscript accordingly.

      Issue (2) Obtaining CD4<sup>+</sup>CD25<sup>+</sup> and CD4<sup>+</sup>CD25-. The control of the purity and viability of cell preparations from CCR4 deficient and CCR4 sufficient Apoe<sup>-/-</sup> mice should be included as a supplementary material; these purified cells were used in in vitro suppressive assays and in vivo cell transfer experiments, being relevant information to guarantee results. Since this control was performed by flow cytometry, I wonder whether Foxp3 levels were also checked.

      We included the data on the purity and viability of CD4<sup>+</sup>CD25<sup>+</sup> Tregs and CD4<sup>+</sup>CD25<sup>-</sup> T cells from Apoe<sup>-/-</sup> or Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice in Supplementary Figure 10. After the isolation of CD4<sup>+</sup>CD25<sup>+</sup> T cells, we examined Foxp3 expression by flow cytometry and found that most of these cells express Foxp3.

      Issue (3) For in vitro assays, IL-2, IL-10, and TGFβ measurement in culture supernatants could confirm and provide more information about Treg function.

      As both CD4<sup>+</sup>CD25<sup>+</sup> Tregs and CD4<sup>+</sup>CD25<sup>-</sup> T cells would produce various cytokines in the in vitro Treg suppression assay, it is difficult to determine which cells mainly produce the above cytokines. Therefore, measurement of these cytokines would not provide more information about Treg function.

      Issue (4) It would be interesting to assess whether CCR4-mediated DC-Treg interaction is equally important to regulate Th1 than Th17 and Th2 activation; this likely requires using different settings to favour each activation profile.

      Based on our findings, we speculate that CCR4 may play an important role in regulating not only Th1 cell responses but also Th2 and Th17 cell responses by maintaining the interactions between Tregs and DCs. However, it may not be meaningful to investigate the effect of CCR4 deficiency on these T cell responses because the roles of Th2 and Th17 cell responses in atherosclerosis remain controversial.

      Issue (5) The authors showed that the presence of Treg decreased CD80 and CD86 surface levels in DCs in vitro, remarking a lower capacity of Treg derived from CCR4-deficient mice (Figure 5B). However, the fact that CD86 on splenic CD11c+MHC-II+ DCs in 8-week-old Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice was significantly higher than in Apoe<sup>-/-</sup> was underestimated (Supplementary Figure 4). This data needs reconsideration as it might indicate an in vivo more permissive activation state of DCs in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice than in Apoe<sup>-/-</sup> mice, explaining the augmented effector T cell response observed in these mice (Figure 2).

      Our finding of upregulated CD86 expression on DCs in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice could be explained by the Treg-DC coculture experiment, which showed an impaired ability of CCR4-deficient Tregs to downregulate CD80 and CD86 expression on DCs. As the reviewer pointed out, our data may indicate a more permissive activation state of DCs and subsequent augmentation of effector T cell responses in Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice, which may result from impaired Treg suppressive function.

      Assays for chemokine levels and influence on T cell activation and traffic:

      Issue (1) Considering the findings described by Döring et al. (reference 24 in the paper), monitoring CCL22, CCL17, and CCL3 levels in the aorta and lymph nodes along atherosclerosis development would help in understanding when and how CCL17/CCL20-CCR4 might influence T cell activation and traffic. I wonder whether these chemokines were assayed by qPCR in lymphoid nodes and aorta from CCR4-deficient and sufficient Apoe<sup>-/-</sup> mice. The authors report that CCR8 (capable also of binding CCL17) was unaltered by CCR4 deficiency in splenic and para-aortic lymph nodes Treg from 8 and 18 weeks-old mice, respectively (Supplementary Figure 5 and 6), although a trend towards a high-level was observed for splenic Treg. It would be informative to evaluate CCR8 Treg levels along with atherosclerosis progress.

      As mRNA expression levels of chemokines do not necessarily reflect their protein expression levels, we did not analyze the mRNA expression of Ccl17 or Ccl22 by quantitative reverse transcription PCR. Instead, we evaluated the protein expression of CCL17 and CCL22 not only in the aorta but also in the peripheral lymph nodes of 18-week-old wild-type, Apoe<sup>-/-</sup>, and Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice by immunohistochemistry. We found no marked differences in their expression levels in peripheral lymph nodes among these mice and included the data in Supplementary Figure 2.

      As we focused on the role of the CCL17/CCL22–CCR4 axes in atherosclerosis, we did not examine the expression of CCL3, which is not directly related to these axes. Evaluating the proportion of CCR8+ Tregs is beyond the scope of this study, though we are interested in how CCR4 deficiency changes this population as atherosclerotic lesions develop.

      Issue (2) According to IFNγ and IL-17 expressing TCD4<sup>+</sup> subclasses, Th1 and Th17 cell subset levels increase in the spleen (Figure 3B-D) and para-aortic lymphoid nodes (Figure 4E) in CCR4 absence. A comparison of the CCR4 dependence for the migration of Th17 and Th1 cell subsets to the aorta was not performed in this atherosclerosis model; this study could help to understand the mechanisms associated with the aortic inflammation development.

      To evaluate the migration of Th1 or Th17 cells to the aorta, we would need to specifically isolate them from the peripheral lymphoid tissues of Apoe<sup>-/-</sup> or Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice and adoptively transfer them into recipient Apoe<sup>-/-</sup> mice. However, it is impossible to isolate live Th1 or Th17 cells because specific cell surface markers that would enable us to separate these cells are unavailable.

      Issue (3) The numbers of Kaede Treg cells detected in the aorta were extremely low in both Apoe<sup>-/-</sup> and Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice (Figure 5I), opening results to question. Besides, the flow cytometry assay used for determining Kaede Treg cells in tissues was not well described. How were cell viability and formation of doublets examined to avoid artefacts? The gating strategy used to ensure a confident analysis of Kaede Tregs, particularly in the aorta, should be included as supplementary material.

      The extremely low number of Kaede-expressing Tregs that migrated to the aorta of Apoe<sup>-/-</sup> and Ccr4<sup>-/-</sup>Apoe<sup>-/-</sup> mice may reflect the small number of transferred Tregs. Alternatively, Tregs may rarely migrate to the aorta under hypercholesterolemic conditions. We did not check the viability or doublets of Kaede-expressing Tregs because we thought that these checks would not considerably affect the results. We provide the gating strategy for flow cytometric analysis of Kaede-expressing Tregs in peripheral lymphoid tissues and aortas in Supplementary Figure 11.

      Other comments:

      Issue (1) As an alternative for statistical data analysis from independent experiments, two-way ANOVA with Tukey's post hoc (for data normally distributed) or the Mack Skillings exact test with Conover's post hoc multiple comparison test (for a two-way layout in non-parametric conditions) could improve analysis.

      We performed statistical analysis in Figure 5A according to the reviewer’s suggestion.

      Issue (2) For future work, employing recombinant pseudo-receptor proteins capable of neutralizing chemokines (doi: 10.1016/j.jhep.2021.08.029) might help as an alternative to complete knockout mice.

      We thank the reviewer for giving us the information on an interesting approach as an alternative to CCR4-deficient mice.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, Pan DY et al. discovered that the clearance of senescent osteoclasts can lead to a reduction in sensory nerve innervation. This reduction is achieved through the attenuation of Netrin-1 and NGF levels, as well as the regulation of H-type vessels, resulting in a decrease in pain-related behavior. The experiments are well-designed. The results are clearly presented, and the legends are also clear and informative. Their findings represent a potential treatment for spine pain utilizing senolytic drugs.

      Strengths:

      Rigorous data, well-designed experiments as well as significant innovation make this manuscript stand out.

      Weaknesses:

      Quantification of histology and detailed statistical analysis will further strengthen this manuscript.

      I have the following specific comments.

      (1) Since defining senescent cells solely based on one or two markers (SA-β-gal and p16) may not provide a robust characterization, it would be advisable to employ another well-established senescence marker, such as γ-H2AX or HMGB1, to corroborate the observed increase in senescent osteoclasts following LSI and aging.

      We value the comments provided by the reviewer. In accordance with your suggestion, we have performed co-staining of HMGB1 with Trap in Supplementary Figure 1 to corroborate the observed augmentation of senescent osteoclasts following LSI and aging.

      Author response image 1.

      (2) The connection between heightened Netrin-1 secretion by senescent osteoclasts following LSI or aging and its relevance to pain warrants thorough discussion within the manuscript to provide a comprehensive understanding of the entire narrative.

      We appreciate the reviewer's insightful comments. We have thoroughly addressed the entire narrative in the revised manuscript, as outlined below:

      During lumbar spine instability (LSI) or aging, endplates undergo ossification, leading to elevated osteoclast activity and increased porosity<sup>1-4</sup>. The progressive porous transformation of endplates, accompanied by a narrowed intervertebral disc (IVD) space, is a hallmark of spinal degeneration<sup>4,5</sup>. Considering that pain arises from nociceptors, it is plausible that low back pain (LBP) may be attributed to sensory innervation within endplates. Additionally, porous endplates exhibit higher nerve density compared to normal endplates or degenerative nucleus pulposus<sup>6</sup>. Netrin-1, a crucial axon guidance factor facilitating nerve protrusion, has been implicated in this process<sup>7-9</sup>. The receptor mediating Netrin-1-induced neuronal sprouting, deleted in colorectal cancer (DCC), was found to co-localize with CGRP+ sensory nerve fibers in endplates after LSI surgery<sup>10,11</sup>. In summary, during LSI or aging, osteoclastic lineage cells secrete Netrin-1, inducing extrusion and innervation of CGRP+ sensory nerve fibers within the spaces created by osteoclast resorption. This Netrin-1/DCC-mediated pain signal is subsequently transmitted to the dorsal root ganglion (DRG) or higher brain levels.

      (3) It appears that the quantitative data for TRAP staining in Figure 1j is missing.

      We appreciate the reviewer's comments. We have added the quantitative data for TRAP staining (Figure 1p) to Figure 1 in the revised manuscript.

      Author response image 2.

      (4) Regarding Figure 6, could you please specify which panels were analyzed using a t-test and which ones were subjected to ANOVA? Alternatively, were all the panels in Figure 6 analyzed using ANOVA?

      We appreciate the reviewer’s comments here. Upon careful review, we have ensured that quantitative data in panels b, c, and f are analyzed using t-tests, while panels d, e, and g are subjected to one-way ANOVA. These updates have been reflected in the revised figure legend.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript examined the underlying mechanisms between senescent osteoclasts (SnOCs) and lumbar spine instability (LSI) or aging. They first showed that greater numbers of SnOCs are observed in mouse models of LSI or aging, and these SnOCs are associated with induced sensory nerve innervation, as well as the growth of H-type vessels, in the porous endplate. Then, the deletion of senescent cells by administration of the senolytic drug Navitoclax (ABT263) results in significantly less spinal hypersensitivity, spinal degeneration, porosity of the endplate, sensory nerve innervation, and H-type vessel growth in the endplate. Finally, they also found that there is greater SnOC-mediated secretion of Netrin-1 and NGF, two well-established sensory nerve growth factors, compared to non-senescent OCs. The study is well conducted and data strongly support the idea. However, some minor issues need to be addressed.

      (1) In Figure 2C, "Number of SnCs/mm2", SnCs should be SnOCs.

      We apologize for the oversight. This has been rectified in the revised manuscript.

      Author response image 3.

      (2) In Figure 3A-E, is there any statistical difference between groups Young and Aged+PBS?

      We appreciate the reviewer's comments. Following your recommendation, we conducted additional statistical analyses to compare the young and PBS-treated aged mice, and we have incorporated these findings into the revised manuscript. The data reveal a significantly increased paw withdrawal frequency (PWF) in aged mice treated with PBS compared with young mice, particularly at 0.4 g rather than 0.07 g (Figure 3a, 3b). Moreover, aged mice treated with PBS exhibited a significant reduction in both distance traveled and active time when compared to young mice (Figure 3d, 3e). Additionally, PBS-treated aged mice demonstrated a significantly shortened heat response time relative to young mice (Figure 3c).

      Author response image 4.

      (3) Again, is there any statistical difference between the Young and Aged+PBS groups in Figure 4F-K?

      We appreciate the reviewer's comments. As per your suggestion, we analyzed the statistical differences between the young and aged+PBS groups, and these results have been added to the revised manuscript. The caudal endplates of L4/5 in PBS-treated aged mice exhibited a significant increase in endplate porosity (Figure 4f) and trabecular separation (Tb.Sp) (Figure 4g) compared to young mice.

      Additionally, PBS-treated aged mice showed a significant elevation in endplate score (Figure 4h), as well as an increased distribution of MMP13 and ColX within the endplates when compared to young mice (Figure 4i, 4j). Furthermore, TRAP staining revealed a significant increase in TRAP+ osteoclasts within the endplates of PBS-treated aged mice as compared to young mice (Figure 4k).

      Author response image 5.

      (4) What is the figure legend of Figure 7?

      The legend for Figure 7 (as below) is included in a separate PDF file labeled 'Figures and Legends.' We have carefully checked the revised manuscript and made sure all the legends are included.

      “Fig. 7. (a) Representative images of immunofluorescent analysis of CD31, an angiogenesis marker (green), Emcn, an endothelial cell marker (red) and nuclei (DAPI; blue) of adult sham, LSI and aged mice injected with PBS or ABT263. (b) Quantitative analysis of the intensity mean value of CD31 per mm<sup>2</sup> in sham, LSI mice treated with PBS or ABT263. (c) Quantitative analysis of the intensity mean value of CD31 per mm<sup>2</sup> in aged mice treated with PBS or ABT263. (d) Quantitative analysis of the intensity mean value of Emcn per mm<sup>2</sup> in sham, LSI mice treated with PBS or ABT263. (e) Quantitative analysis of the intensity mean value of Emcn per mm<sup>2</sup> in aged mice treated with PBS or ABT263. n ≥ 4 per group. Statistical significance was determined by one-way ANOVA, and all data are shown as means ± standard deviations.”

      (5) In "Mice" section, an Ethical code is suggested to be added.

      We appreciate the reviewer's comments. In accordance with your suggestion, we have included the Johns Hopkins University animal protocol number in the revised manuscript. The relevant paragraph has been updated to read: “All mice were maintained at the animal facility of The Johns Hopkins University School of Medicine (protocol number: MO21M276).”

      (6) In "Methods" section, please indicate the primers of GAPDH.

      We apologize for the absence of the GAPDH primers. Upon review, the GAPDH primers used were as follows: forward primer 5'-ATGTGTCCGTCGTGGATCTGA-3' and reverse primer 5'-ATGCCTGCTTCACCACCTTCTT-3'. These primer sequences have been included in the revised manuscript.
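
As a quick sanity check on the primer pair quoted above, the following minimal sketch computes the length and GC content of each sequence. Only the two sequences come from the response; the function and variable names are illustrative assumptions and are not part of the manuscript.

```python
# Primer sequences exactly as quoted in the response (5'->3').
GAPDH_FORWARD = "ATGTGTCCGTCGTGGATCTGA"
GAPDH_REVERSE = "ATGCCTGCTTCACCACCTTCTT"


def gc_fraction(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("expected a non-empty sequence of A, C, G, T")
    return (seq.count("G") + seq.count("C")) / len(seq)


def summarize(name, seq):
    """Format the length and GC content of one primer (hypothetical helper)."""
    return f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.1%}"


if __name__ == "__main__":
    print(summarize("GAPDH forward", GAPDH_FORWARD))
    print(summarize("GAPDH reverse", GAPDH_REVERSE))
```

Both primers have a GC content close to 50% (11 of 21 bases and 11 of 22 bases, respectively).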

      (7) Preosteoclasts are regarded to be closely related to H-type vessel growth, so do the authors have any comments on this? Any difference or correlation between SnCs and preosteoclasts?

      The pre-osteoclast plays a crucial role in secreting anabolic growth factors that facilitate H-type vessel formation, osteoblast chemotaxis, proliferation, differentiation, and mineralization. The osteoclast represents the terminal differentiation phase, ultimately leading to the induction of resorption.

      Senescent cells, including senescent osteoclasts, are characterized by permanent cell cycle arrest and changes in their secretory profile, which can impact their function. In the context of osteoclasts, senescence can lead to a reduction in bone resorption capacity and impaired bone remodeling. Senescent osteoclasts are believed to contribute to age-related bone loss and bone-related diseases, such as osteoporosis.

      Reviewer #3 (Public Review):

      Summary:

      This research article reports that a greater number of senescent osteoclasts (SnOCs), which produce Netrin-1 and NGF, are responsible for innervation in the LSI and aging animal models.

      Strengths:

      The research is based on previous findings in the authors' lab and the fact that the IVD structure was restored by treatment with ABT263. The logic is clear and clarifies the pathological role of SnOCs, suggesting the potential utilization of senolytic drugs for the treatment of LBP. Generally, the study is of good quality and the data is convincing.

      Weaknesses:

      There are some points that can be improved:

      (1) Since this work primarily focuses on ABT263, it resembles a pharmacological study for this drug. It is preferable to provide references for the ABT263 concentration and explain how the administration was determined.

      Thank you for your comment. ABT263 has been extensively employed in diverse research studies<sup>12-15</sup>. The concentration and administration of ABT263 followed the protocol outlined in the published paper<sup>13</sup>. The reference on how to use ABT263 is cited in the method section: “ABT263 was administered to mice by gavage at a dosage of 50 mg per kg body weight per day (mg/kg/d) for a total of 7 days per cycle, with two cycles conducted and a 2-week interval between them<sup>39</sup>”.
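
The dosing schedule in the quoted methods sentence can be laid out day by day. The sketch below is illustrative only: it assumes that the "2-week interval" means 14 consecutive dose-free days between the two cycles, and the 25 g body weight in the usage lines is a hypothetical example value, not a figure from the manuscript.

```python
DOSE_MG_PER_KG_PER_DAY = 50  # from the quoted methods sentence
DAYS_PER_CYCLE = 7           # from the quoted methods sentence
N_CYCLES = 2                 # from the quoted methods sentence
REST_DAYS = 14               # assumption: "2-week interval" = 14 dose-free days


def dosing_days():
    """Return the 1-based study days on which a dose is given."""
    days = []
    day = 1
    for _ in range(N_CYCLES):
        days.extend(range(day, day + DAYS_PER_CYCLE))
        day += DAYS_PER_CYCLE + REST_DAYS
    return days


def daily_dose_mg(body_weight_g):
    """Convert the per-kg dose to an absolute daily dose for one animal."""
    return DOSE_MG_PER_KG_PER_DAY * body_weight_g / 1000.0


if __name__ == "__main__":
    days = dosing_days()
    print("Dosing days:", days)
    print("Cumulative dose (mg/kg):", len(days) * DOSE_MG_PER_KG_PER_DAY)
    # Hypothetical 25 g mouse, used only to illustrate the unit conversion.
    print("Daily dose for a 25 g mouse (mg):", daily_dose_mg(25))
```

Under these assumptions the regimen spans 28 days with 14 dosing days, for a cumulative dose of 700 mg/kg.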

      (2) It would strengthen the study to include at least 6 mice per group for each experiment and analysis, which would provide a more robust foundation.

      Thank you for your comment here. In response, we conducted a new set of experiments, increasing the sample size to six for most groups, and updated the corresponding statistical data in the revised manuscript.

      (3) In Figure 4, either use "adult" or "young" consistently, but not both. Additionally, it's important to define "sham," "young," and "adult" explicitly in the methods section.

      Thank you for your comment. We have addressed the inconsistency in the labeling of Figure 4. Additionally, we have explicitly defined "sham," "young," and "adult" in the methods section as follows: The control group (sham group) for the LSI group refers to C57BL/6J mice that did not undergo LSI surgery, while the control group (young group) for the Aged group refers to 4-month-old C57BL/6J mice.

      Author response image 6.

      (4) Assess the protein expression of Netrin 1 and NGF.

      Thank you for your comment here. We employed ELISA to assess the protein expression of Netrin-1 and NGF in the L3 to L5 endplates. The data revealed that, compared to the young sham mice, LSI was associated with significantly greater protein expression of Netrin-1 and NGF, which was substantially attenuated by ABT263 treatment in LSI mice (Supplementary Fig. 2a, 2b).

      Author response image 7.

      References

      (1) Bian, Q. et al. Excessive Activation of TGFbeta by Spinal Instability Causes Vertebral Endplate Sclerosis. Sci Rep 6, 27093, doi:10.1038/srep27093 (2016).

      (2) Bian, Q. et al. Mechanosignaling activation of TGFbeta maintains intervertebral disc homeostasis. Bone Res 5, 17008, doi:10.1038/boneres.2017.8 (2017).

      (3) Papadakis, M., Sapkas, G., Papadopoulos, E. C. & Katonis, P. Pathophysiology and biomechanics of the aging spine. Open Orthop J 5, 335-342, doi:10.2174/1874325001105010335 (2011).

      (4) Rodriguez, A. G. et al. Morphology of the human vertebral endplate. J Orthop Res 30, 280-287, doi:10.1002/jor.21513 (2012).

      (5) Taher, F. et al. Lumbar degenerative disc disease: current and future concepts of diagnosis and management. Adv Orthop 2012, 970752, doi:10.1155/2012/970752 (2012).

      (6) Fields, A. J., Liebenberg, E. C. & Lotz, J. C. Innervation of pathologies in the lumbar vertebral end plate and intervertebral disc. Spine J 14, 513-521, doi:10.1016/j.spinee.2013.06.075 (2014).

      (7) Hand, R. A. & Kolodkin, A. L. Netrin-Mediated Axon Guidance to the CNS Midline Revisited. Neuron 94, 691-693, doi:10.1016/j.neuron.2017.05.012 (2017).

      (8) Moore, S. W., Zhang, X., Lynch, C. D. & Sheetz, M. P. Netrin-1 attracts axons through FAK-dependent mechanotransduction. J Neurosci 32, 11574-11585, doi:10.1523/JNEUROSCI.0999-12.2012 (2012).

      (9) Serafini, T. et al. Netrin-1 is required for commissural axon guidance in the developing vertebrate nervous system. Cell 87, 1001-1014, doi:10.1016/s0092-8674(00)81795-x (1996).

      (10) Forcet, C. et al. Netrin-1-mediated axon outgrowth requires deleted in colorectal cancer-dependent MAPK activation. Nature 417, 443-447, doi:10.1038/nature748 (2002).

      (11) Shu, T., Valentino, K. M., Seaman, C., Cooper, H. M. & Richards, L. J. Expression of the netrin-1 receptor, deleted in colorectal cancer (DCC), is largely confined to projecting neurons in the developing forebrain. J Comp Neurol 416, 201-212, doi:10.1002/(sici)1096-9861(20000110)416:2<201::aid-cne6>3.0.co;2-z (2000).

      (12) Born, E. et al. Eliminating Senescent Cells Can Promote Pulmonary Hypertension Development and Progression. Circulation 147, 650-666, doi:10.1161/CIRCULATIONAHA.122.058794 (2023).

      (13) Chang, J. et al. Clearance of senescent cells by ABT263 rejuvenates aged hematopoietic stem cells in mice. Nat Med 22, 78-83, doi:10.1038/nm.4010 (2016).

      (14) Lim, S. et al. Local Delivery of Senolytic Drug Inhibits Intervertebral Disc Degeneration and Restores Intervertebral Disc Structure. Adv Healthc Mater 11, e2101483, doi:10.1002/adhm.202101483 (2022).

      (15) Yang, H. et al. Navitoclax (ABT263) reduces inflammation and promotes chondrogenic phenotype by clearing senescent osteoarthritic chondrocytes in osteoarthritis. Aging (Albany NY) 12, 12750-12770, doi:10.18632/aging.103177 (2020).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) Adding page numbers would have helped the reviewers. 

      We apologize for the oversight and have added page numbers for the revision.

      (2) Page 2, second paragraph: please do not generalize. Also, this sentence is confusing: "the addiction neuroscience field has moved from recognizing that "compulsive drug seeking/use" and "continued seeking/use despite negative consequences" are two distinct aspects of addiction to defining the former nearly exclusively by the latter in animal models." 

      We acknowledge that the sentence in question may have been unclear. We have revised the introduction to avoid generalizations and improve clarity to read:

      “Recently, the preclinical addiction field has moved from recognizing compulsive drug seeking/use and continued seeking/use despite negative consequences as two distinct aspects of addiction, to examining compulsive-like behavior nearly exclusively by models of continued seeking/use despite negative consequences.”

      In the revised introduction, we have focused on the specific aims and findings of our study, emphasizing the use of a large, genetically diverse sample and an extended drug access paradigm to better model addiction-like behaviors. We have also clarified the relationship between the different measures of addiction-like behavior and the potential role of sex differences in resilience to these behaviors.

      (3) Again here, please do not generalize: "While these three behaviors capture different aspects of addiction-like behaviors, a pervasive view in the field is that the only way to identify an individual with an addiction phenotype is to measure continued drug use despite adverse consequences." This is not unanimous in the addiction field. Same on page 21.

      We have revised the sentence to avoid generalizing and to acknowledge that this perspective is held by some researchers, rather than presenting it as a pervasive view. We have also included relevant citations to support this point. This sentence now reads:

      “These measures are thought to capture different aspects of addiction-like behaviors. Some researchers argue that continued drug use despite adverse consequences is the most critical measure for identifying an addiction phenotype, as it reflects the compulsive nature of drug use (Deroche-Gamonet et al., 2004; Vanderschuren and Everitt, 2004)”

      (4) This sentence needs citations: "A key argument in favor of this hypothesis is that responding despite adverse consequences is sometimes uncorrelated to drug taking/seeking." 

      We have added references (Chen et al., 2013; Domi et al., 2021; Giuliano et al., 2019; Li et al., 2021; Siciliano et al., 2019; Timme et al., 2022; Belin et al., 2008; Pelloux et al., 2007) that provide evidence for this assertion. These studies demonstrate that individual differences in responding despite adverse consequences can be dissociated from drug intake and seeking behaviors, suggesting that they may measure distinct aspects of addiction-like behaviors.

      (5) Page 4: what is "an advanced model?" (also on page 22). Change "characterization" to "characterized." Delete "as much as possible" in "as much genetic diversity as possible." 

      These have been addressed.

      (6) Page 7, statistical analysis: PCA needs to be explained further. Was the PCA varimax rotated, normalized, eigenvalues, etc. Was this used to find "latent variables?" (PCA versus factor analysis) 

      It was a principal component analysis (PCA), which derives components that are linear combinations of the original variables. The coefficients for the first two components are given in the table below; those for PC1 were added to the results:

      Author response table 1.

      The PCA was performed in R with prcomp from the stats package, using centering and scaling; this detail was added to the methods section. No orthogonal rotation of the loadings (varimax) was used. The eigenvalues of the PCs are 1.9, 1.0, 0.7, and 0.4, and the variance they explain is shown in the scree plot:

      Author response image 1.
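
      As a rough illustration of how the reported eigenvalues relate to explained variance, the short Python sketch below converts them to proportions. It uses only the four eigenvalues quoted above; the function names are ours, and it does not reproduce the authors' R analysis. Because the variables were centered and scaled, the eigenvalues sum to the number of variables (4), so each proportion is simply the eigenvalue divided by 4.

```python
# Eigenvalues of the four principal components, as quoted in the response.
eigenvalues = [1.9, 1.0, 0.7, 0.4]


def explained_variance(eigs):
    """Return the fraction of total variance carried by each component."""
    total = sum(eigs)
    return [e / total for e in eigs]


def cumulative(fractions):
    """Return the running total of the variance fractions."""
    out, running = [], 0.0
    for f in fractions:
        running += f
        out.append(running)
    return out


fractions = explained_variance(eigenvalues)
for i, (f, c) in enumerate(zip(fractions, cumulative(fractions)), start=1):
    print(f"PC{i}: {f:.1%} of variance, {c:.1%} cumulative")
```

      On these numbers, PC1 carries about 47.5% of the variance and the first two components together about 72.5%.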

      (7) Page 9: correct "an indexes." 

      This was corrected to ‘indexes’.

      (8) Figure 1 legend: correct "test at the." 

      Corrected to ‘tested’

      (9) Page 17: rewrite "except for the low addicted one." 

      Done

      (10) Page 19: delete "state-of-the-art." Intravenous self-administration is not new. 

      Done

      (11) Page 20: replace "abuse" with cocaine use disorder. 

      Done

      (12) Page 20: The distinction of qualitative and quantitative differences between males and females is inaccurate given that resilient and vulnerable groups were arbitrarily defined by quantitative differences. 

      This distinction between quantitative and qualitative was removed.

      (13) The discussion about DSM-V criteria is "over the top" and unnecessary. One cannot determine whether rodents took more drugs than intended, made efforts to quit, etc. 

      This discussion was toned down and shortened, as this is not the focus of the manuscript.

      (14) Page 21: The discussion about small n and the test of nondependent rats should also be toned down and it is incomplete. There are several behavioral and pharmacological studies that indicate that different measures may capture, at least to some degree, different aspects of behavior in alcohol and opioid-dependent rodents (e.g., PMID: 28461696; PMID: 25878287; PMID: 36683829). 

      The discussion has been toned down and expanded as suggested by the reviewer.

      Now it reads: 

      “A possible explanation as to why previous studies failed to observe this correlation between escalation, motivation, and aversion-resistance is that most of the previous studies used small sample sizes that may not provide sufficient statistical power to observe this relationship between variables. Another explanation is that previous studies often used animal models with limited access to the drug, where animals exhibit low levels of acute intoxication and very little, if any, signs of drug dependence (George et al., 2022). However, it is important to note that several behavioral and pharmacological studies have indicated that different measures may capture, at least to some degree, different aspects of addiction-like behavior in alcohol and opioid-dependent rodents (Aoun et al., 2018; Barbier et al., 2015; Marchette et al., 2023). While the present results suggest that escalation of drug intake highly predicts drug responding despite adverse consequences in an animal model with long access to cocaine and evidence of drug dependence, further research is needed to determine the extent to which these findings generalize to other drugs of abuse and different stages of the addiction cycle”.

      (15) Several factors should be considered for explaining their PCA findings. The progression of the progressive ratio (too steep, not steep enough), the shock intensity (too low, too high), the contingency of the shock (high or not high enough), the cocaine unit dose, the use of multiple punishment sessions (learning; the first session is likely to reflect the previous session, same for PR) etc, all could affect the outcomes. Not finding differences in one dataset (even large ones) obtained from a particular experimental design does not necessarily mean that these differences do not exist. 

      Thank you for raising this important point about the potential impact of experimental factors on our PCA findings. We now acknowledge in the discussion (pages 22-23) that several factors, such as the progression of the progressive ratio schedule, shock intensity, contingency of the shock, cocaine unit dose, and the use of multiple punishment sessions, could influence the outcomes of our analysis. Now it reads: 

      “It is important to acknowledge that several experimental factors could influence the outcomes of the PCA analysis. These factors include the schedule of reinforcement, the progression of the progressive ratio schedule, the shock intensity, the contingency of the shock, the cocaine unit dose, and the use of multiple punishment sessions (Belin et al., 2008; Deroche-Gamonet et al., 2004; Pelloux et al., 2007). In particular, learning effects may play a role when animals undergo multiple punishment or progressive ratio sessions. An animal's response to punishment or its performance in progressive ratio sessions may change over time as it learns from its previous experiences (Marchant et al., 2013; Vanderschuren et al., 2017). While the present study utilized a large dataset obtained from a particular experimental design, it is essential to acknowledge that not finding differences in one dataset does not necessarily mean that these differences do not exist. Future studies should investigate the impact of these experimental factors, including learning effects, on the relationship between escalation, motivation, and aversion-resistance to further elucidate the underlying constructs of addiction-like behaviors.”

      (16) Related to the above, another reason for all "consummatory variables" to load onto the same factor can be due to the selection of the variables. For example, the inclusion of all ShA and LgA access sessions makes the PCA much less powerful. In fact, these many similar variables would make the PCA less powerful in a large dataset than a much smaller dataset that includes fewer variables in the PCA. The authors should attempt to avoid redundant variables in the PCA (all ShA and all LgA sessions). Perhaps use the average of the last three sessions of each ShA and LgA (or the slope of the escalation curve for LgA), or not even include ShA. They should also attempt PCAs without the irritability test. It is very common to find clusters of variables pertaining to the same tests (i.e., all consummatory variables clustered together, and all irritability measures clustered together in an independent factor). 

      For the PCA in figure 4E, only 4 variables were included: the Z-scores for A) escalation (calculated as the average intake of the last three long-access sessions, similar to the average or slope of the escalation curve as suggested by the reviewer), B) motivation (intake under progressive ratio), C) compulsivity (continued responding despite adverse consequences), and D) irritability. This approach aimed to minimize redundancy in the variables and focus on key measures of addiction-like behaviors.

      To further address the reviewer's concern, we performed an additional PCA on the same 377 animals, excluding the irritability index. This PCA included only the escalation, motivation, and compulsivity indices. The results of this analysis were consistent with our original findings, with the three variables loading similarly (>+1 standard deviation) onto factor 1, which explained 63.5% of the variance in addiction-like behaviors. This analysis was added as supplementary figure S3A.

      (17) Also related to the above, males and females may behave differently, sometimes in opposite directions, thus "cancel each other out." The authors should take advantage of their huge sample size and do PCAs separately for males and females to learn more about potential sex differences in behavioral constructs. 

      First, we looked at male vs. female differences in the biplot represented in Fig. 4E, which included the irritability index. This analysis showed no sex differences and was added as supplemental figure S3B. 

      Next, we ran the PCA on males (left panel) and females (right panel) separately, which revealed a difference in the relationship between the Compulsivity Index and the other variables. In males, the Compulsivity Index separated from the escalation and motivation indices in the opposite direction relative to PC2 compared to females. Additionally, in males, compulsivity became more positively correlated with irritability, while in females, the relationship was opposite. These interpretations were added to the discussion (page 21), and the results were included in Supplemental Figure S3C-D.

      (18) Figure 3 legend. There is no correlation in the figure. 

      This was intended to summarize that vulnerable animals, defined by a high intake in the last 3 LgA sessions, are also more vulnerable on the other measures, but it was removed to avoid further confusion.

      (19) Page 22: the authors contradict themselves: "The evaluation of different addiction-like behaviors is important as multiple elements of addiction vulnerability were found to be independently heritable (Eid et al., 2019), and likely controlled by distinct genes that remain to be identified." 

      We agree with the reviewer, and we edited the discussion to clarify the relationship between the current findings and the potential for distinct genetic influences on different aspects of addiction vulnerability. The text now reads: 

      “The evaluation of different addiction-like behaviors is important, as previous research has suggested that multiple elements of addiction vulnerability may be independently heritable (Eid et al., 2019). While our current findings indicate that escalation, motivation, and compulsivity are highly correlated and load onto a single construct in our model, it is possible that distinct genes contribute to different aspects of addiction vulnerability. The high correlation between these behaviors in our study may reflect common underlying genetic influences, but it does not preclude the existence of additional, unique genetic factors that shape specific aspects of addiction-like behavior. Further research is needed to identify the specific genes that contribute to the overall construct of addiction vulnerability, as well as those that may influence distinct behavioral elements. The behavioral characterization of HS rats in this study provides a foundation for future genome-wide association studies (GWAS) aimed at identifying specific alleles and genes that contribute to vulnerability and resilience to cocaine addiction-like behavior (Chitre et al., 2020).”

      Reviewer #2:

      (1) I strongly suggest the authors include effect sizes. They are likely correct that many studies using rats during self-administration are underpowered, but because it is unlikely that most studies will use over 500 rats, the effect size information would be beneficial for future researchers. That is, if an effect requires 100 rats per group, this would be critical to know. 

      Standardized effect sizes (Cohen's d and 95% confidence intervals) were included for the sex differences, intake group differences, and addiction groups. Moreover, a statement about the number of animals needed to detect significant effects was added to the discussion.
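
      For readers who want to see how such effect sizes and sample-size estimates are typically derived, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' code: it uses the pooled-standard-deviation form of Cohen's d, a common large-sample approximation for the standard error of d, and a normal-approximation formula for the per-group sample size (two-sided alpha = 0.05, 80% power). The example data are invented.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def d_confidence_interval(d, n_a, n_b, z=1.96):
    """Approximate 95% CI from the large-sample standard error of d."""
    se = math.sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))
    return d - z * se, d + z * se


def n_per_group(d, z_alpha=1.96, z_power=0.84):
    """Normal-approximation sample size per group for a two-group comparison
    (two-sided alpha = 0.05, power = 0.80)."""
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Hypothetical infusion counts for two groups (invented for illustration).
group_a = [2, 4, 6, 8]
group_b = [1, 3, 5, 7]
d = cohens_d(group_a, group_b)
low, high = d_confidence_interval(d, len(group_a), len(group_b))
print(f"d = {d:.2f}, approximate 95% CI [{low:.2f}, {high:.2f}]")
print("animals per group needed to detect d = 0.5:", n_per_group(0.5))
```

      Under this approximation, a medium effect (d = 0.5) needs about 63 animals per group, which illustrates why the reviewer asked for effect sizes to guide smaller studies.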

      (2) I suggest that the authors tone down the portions of the Discussion that appear to be defenses of the extended access model. The data in this paper do not address short vs. long-access in a way that supports that. Moreover, they should acknowledge some of the ways that I noted above in which the short access period seems to be just as predictive as the long-access. It raises the question of whether keeping another group of rats on short access through all 25 days would have led to some of the same outcomes that were observed. 

      This discussion was toned down and shortened, as this is also not the focus of the manuscript (see also response to reviewer 1’s 14th comment).

      We appreciate the reviewer's comment on the potential predictive value of the short access period for addiction-like behaviors. We agree that maintaining a group of rats on short access throughout the 25 days could have provided valuable insights into the development of these behaviors, particularly in light of the individual differences observed in our genetically diverse HS rat population. As we mention also in our response to Reviewer 3 (comment 5), the observed escalation of drug intake during the short access condition in our study may be attributed to the genetic diversity of the HS rat population. To address this important point, we have added a new paragraph in the Discussion section that elaborates on this observation:

      "It is important to note that while our study focused on the differences between resilient and vulnerable rats under long access conditions, the short access period may also be predictive of addiction-like behaviors, particularly in genetically diverse populations. The observed escalation of drug intake during short access in our study is not due to an acquisition issue, as rats start differentiating the active from the inactive levers on the first day of ShA1 (Fig. S1B), with a 3 to 1 ratio between active/inactive pressing by ShA7. Rather, this early escalation may be attributed to the individual differences in drug-taking behavior among the HS rats, highlighting the importance of using genetically diverse animals to capture the full spectrum of individual differences in addiction-like behaviors."

      (3) I suggest the authors explain how the dosing was maintained across the self-administration period. I also suggest that the authors provide figures that show mg/kg of cocaine consumed for each day, rather than just infusions per day. This would be especially helpful for the sex difference claims. 

      To ensure consistent dosing, animals were weighed weekly and the drug solution concentration was adjusted to body weight, rounded to the nearest ten grams. This sentence was added to the methods section. Each infusion is 0.5 mg/kg, so the amount consumed equals the number of infusions x 0.5 mg/kg. Moreover, a second axis with the dose in mg/kg of cocaine consumed was added to the escalation curves in figures 1B, 2A, 3A, 5A, and S1A.
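
      The conversion from infusions to dose described above can be written out as a small Python sketch. The 0.5 mg/kg unit dose and the rounding of body weight to the nearest ten grams come from the response; the function names, the tie-breaking rule for rounding, and the absolute-mass calculation are our own illustrative assumptions.

```python
DOSE_PER_INFUSION_MG_PER_KG = 0.5  # unit dose stated in the response


def rounded_weight_g(weight_g):
    """Round a body weight to the nearest ten grams (ties round up; assumed)."""
    return int((weight_g + 5) // 10) * 10


def cumulative_dose_mg_per_kg(infusions):
    """Dose consumed in a session: number of infusions x 0.5 mg/kg."""
    return infusions * DOSE_PER_INFUSION_MG_PER_KG


def absolute_cocaine_mg(infusions, weight_g):
    """Absolute mass of cocaine delivered, using the rounded body weight."""
    return cumulative_dose_mg_per_kg(infusions) * rounded_weight_g(weight_g) / 1000.0


# Hypothetical animal: 347 g, 40 infusions in one session.
print(rounded_weight_g(347))             # weight used for dosing, in grams
print(cumulative_dose_mg_per_kg(40))     # mg/kg consumed in the session
print(absolute_cocaine_mg(40, 347))      # mg of cocaine delivered
```

      Because the dose is scaled to body weight, the mg/kg axis is a fixed multiple of the infusion axis, while the absolute mass delivered differs between lighter and heavier animals.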

      (4) Throughout the paper, and especially the 2nd paragraph of the Introduction, the authors make a number of assertions for which they should provide references. 

      We have carefully reviewed the manuscript and have now included relevant references to ensure that all statements are properly supported by the existing literature.

      (5) Likewise, with the Discussion about sex and gender differences, I suggest a more nuanced and better-cited discussion. Many rodent studies with self-administration have not identified sex differences, though this often gets under-noticed as the titles and abstracts do not mention the lack of effects. The support for gender differences in humans in terms of vulnerability to cocaine use disorder, beyond that men have higher rates, is thin and this section should be modified.

      The section was modified with additional references and linked to the newly introduced effect sizes for sex differences.

      (6) I also suggest the authors change some of the language such as referring to their behavioral measures as "state of the art". Extended access has been around for over two decades.

      This has been adjusted; see also our responses to reviewer 1’s comments on pages 4, 19, and 22.

      Reviewer #3:

      Strengths: 

      (1) The number of animals run through this study is particularly impressive and allows for analyses that cannot be done with smaller cohorts. 

      (2) The inclusion of males and females in a study of this size allows for a better understanding of potential sex differences across a range of behavioral domains. 

      (3) Relating these measures to each other is incredibly important. If they are all measuring the same thing this would have important implications for the field. 

      Weaknesses: 

      (1) The authors claim that escalation of intake, increased motivation under progressive ratio, and responding despite negative consequences can all be explained by the same psychological construct, which they conclude is predictive of an addiction-like phenotype. However, previous research has demonstrated that the aforementioned behavioral measures highly correlate with the rate at which animals lever press to receive a reinforcer. For example, animals that have higher baseline rates of behavior will also be less sensitive to punishment and will press more on a PR. In fact, early behavioral pharmacology work from Peter Dews showed that the same is true for drug effects on behavior, where the same drug has less of a behavioral effect when behavior was maintained on a schedule that resulted in higher response rates. This is not ruled out and actually could explain the results in a parsimonious way. This is not highlighted or mentioned in the manuscript. 

      Thank you for raising this important point about the potential influence of baseline response rates on the observed correlations between addiction-like behaviors. We agree that individual differences in baseline response rates may contribute to the relationships we observed, and we have added a paragraph to the discussion acknowledging this possibility (see page 22). We now discuss how previous research has shown that animals with higher baseline rates of responding tend to be less sensitive to punishment and exhibit higher levels of responding under progressive ratio schedules, as demonstrated in early behavioral pharmacology work by Dews and others (Dews, 1955; Sanger and Blackman, 1976). While our findings suggest that escalation of intake, motivation, and responding despite negative consequences can be explained by a single psychological construct related to addiction vulnerability, we cannot rule out the influence of baseline response rates. We have highlighted the need for future studies to investigate the relationship between baseline response rates and addiction-like behaviors to further clarify the underlying mechanisms.

      (2) The authors draw major conclusions from data collected using only one dose of cocaine. Can the authors comment on how the dose of cocaine was selected? Although the majority of the animals maintained responding to the drug, one finding of the manuscript claims that roughly 20% of animals were resilient to developing an addiction-like phenotype. The differences observed could simply be a result of selecting too high or too low of a dose per infusion. 

      We selected a dose of 0.5 mg/kg/infusion of cocaine for our study based on our own and others' previous work demonstrating that this dose is commonly used in rat self-administration studies and is effective in producing addiction-like behaviors (de Guglielmo et al. 2017, Kallupi et al. 2022, Kononoff et al. 2018, Sedighim et al. 2021, Ahmed and Koob, 1998; Deroche-Gamonet et al., 2004; Belin et al., 2009). This dose has been shown to maintain stable responding and induce escalation of intake, motivation, and compulsive-like responding in a significant proportion of animals (Ahmed and Koob, 1998; Deroche-Gamonet et al., 2004; Belin et al., 2009).

      (3) In line with the previous comment, rats self-administered cocaine under one schedule of reinforcement and were exposed to only one, mild, foot shock intensity. Although a large number of animals were used, it is difficult to translate these results to understand patterns of drug intake in humans. 

      We appreciate the reviewer's comment on the limitations of using a single schedule of reinforcement and a single foot shock intensity in our study. We acknowledge that these factors may limit the direct translatability of our findings to patterns of drug intake in humans. As mentioned in our response to Reviewer 1 (comment 15), we have now added a paragraph to the discussion (pages 22-23) addressing the potential impact of various experimental factors on our PCA findings. These factors include the schedule of reinforcement, the progression of the progressive ratio schedule, shock intensity, contingency of the shock, cocaine unit dose, and the use of multiple punishment sessions. We acknowledge that the specific parameters used in our study may have influenced the observed individual differences in addiction-like behaviors and that different results might be obtained under different experimental conditions. To further address the current reviewer's concern, we would like to emphasize that our study aimed to investigate individual differences in addiction-like behaviors within a specific experimental context, rather than directly modeling the complex patterns of drug intake in humans. While our findings provide valuable insights into the relationship between different addiction-like behaviors in rats, we agree that additional studies using a range of experimental conditions are needed to fully understand the extent to which these findings translate to human drug use patterns. Future studies could investigate the impact of different schedules of reinforcement, shock intensities, and other experimental parameters on the development and expression of addiction-like behaviors in the HS rat population. Such studies would help to determine the generalizability of our findings and provide a more comprehensive understanding of the factors influencing individual differences in addiction vulnerability.  

      (4) It is unclear how a principal component analysis, which includes irritability-like behavior, was conducted when the total number of animals used for behavior is nearly half the number of animals used for drug-intake behaviors. The authors should expand on the PCA methodology and explain how that is not a problem for the PCA method that is used. 

      The PCA (Figure 4E) can only be performed using animals that had data for all measures, including irritability. Since not all animals were tested for irritability-like behaviors, the PCA was performed on the 377 animals that had behavioral measures for all variables. Once irritability was excluded as a measure, the larger animal set could be used (including the animals missing irritability measurements). This was clarified in the text and figure legend, where animal numbers were added.
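
      The complete-case selection described above can be sketched in a few lines of Python. The records, field names, and helper function here are hypothetical and only illustrate the principle: an animal enters the analysis only if it has a value for every variable included, so dropping a variable can enlarge the usable sample.

```python
MEASURES = ("escalation", "motivation", "compulsivity", "irritability")


def complete_cases(animals, measures):
    """Keep only animals with a non-missing value for every listed measure."""
    return [a for a in animals if all(a.get(m) is not None for m in measures)]


# Hypothetical z-scored records; r2 and r4 were not tested for irritability.
animals = [
    {"id": "r1", "escalation": 0.4, "motivation": 0.1, "compulsivity": 0.3, "irritability": -0.2},
    {"id": "r2", "escalation": 1.2, "motivation": 0.9, "compulsivity": 1.1, "irritability": None},
    {"id": "r3", "escalation": -0.8, "motivation": -0.5, "compulsivity": -0.9, "irritability": 0.6},
    {"id": "r4", "escalation": 0.0, "motivation": 0.2, "compulsivity": -0.1},
]

with_irritability = complete_cases(animals, MEASURES)
without_irritability = complete_cases(animals, MEASURES[:3])
print(len(with_irritability), "animals usable with irritability included")
print(len(without_irritability), "animals usable once irritability is excluded")
```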

      (5) It is surprising that the authors observed an escalation of drug intake during the short access condition (Fig. 1B, 2A, 3A, 5A). Previous literature has demonstrated that animals with short access to cocaine maintain stable and low intake, even when tested daily for weeks. Can the authors comment on this discrepancy? Are these animals still acquiring the task during this period? 

      We were indeed surprised that some individuals started escalating their intake early during short access, as most of the literature shows that short access leads to stable intake. However, we have some hypotheses that may explain this phenomenon. It is unlikely that this early escalation is due to an acquisition issue, as rats start differentiating the active from the inactive lever on the first day of short access (ShA1; new data included as Fig. S1B) and there is a 3 to 1 ratio between active and inactive pressing by ShA7. Three factors are more likely to play a key role in this early escalation. First, the early escalation observed in some animals is likely due to the genetic diversity of the HS rat population used in our study. Most of the literature used Wistar, Sprague Dawley, and Long Evans rats, while the HS rat stock derives from 8 different founder strains. Profound strain differences have been observed in the vulnerability to self-administer cocaine, the maintenance of cocaine self-administration during short access, and the level of escalation of intake (Freeman et al., 2009; Kosten et al., 2007; Perry et al., 2006; Picetti et al., 2010; Valenza et al., 2016). Second, we used a 2 h short access while most studies used 1 h of short access. The level of escalation is proportional to the duration of access, and it is likely that a 1 h access period leads to a ceiling effect preventing detection of individual differences in early escalation. Third, it is likely that reporting and publication bias played a significant role in the lack of reporting of such a phenomenon. When using a low sample size, many laboratories remove outliers during short access to ensure a homogeneous population before animals are given long access or moved on to a specific experimental condition. 
      The combination of using a limited number of strains with limited genetic diversity, a 1 h short access, and reporting bias is likely to have led to the conclusion that escalation of cocaine intake does not occur during short access. The current report using a rat stock with high genetic diversity, a 2 h short access, and no reporting bias conclusively demonstrates that escalation of cocaine intake occurs in some individuals. The discussion has been updated to reflect these points on page 20.

      (6) Although the authors provide PR and foot shock data separated by sex in Supplemental Figure 2, the manuscript would benefit from denoting the number of males and females in each data set shown in Figures 3 and 5. Is there a difference in the proportion of males or females that display a vulnerable phenotype? Given that the authors are interested in investigating sex differences, it would greatly improve the manuscript to disaggregate the resilient/vulnerable data (Figure 3) and degree of vulnerability data (Figure 5) by sex. 

      We have now added the proportion of males and females in each of these subgroups and discussed these results. 

      - For figure 3: when categorizing on intake, there are more males than females in the Resilient population, as follows logically from the findings in figure 2. The following was added: “From the analysis of sex differences above, we could expect the Resilient group to contain more males. Amongst the resilient animals, there were twice as many males compared to females (N = 122 total with 82 males and 40 females). The amounts in the vulnerable group were almost equal (N = 445 total with 210 males and 235 females).”

      - For figure 5: as the z-scoring of the behavioral measures is performed per sex, these differences are normalized, and all groups contain roughly equal numbers of males and females. The following was added: “As the indices were derived per sex, quantile normalization results in roughly equal number of males and females in each group: 57 females and 71 males in the Low group, 68 females and 60 males in the Mild group, 67 females and 60 males in the Moderate group, and 57 females and 71 males in the Severe group.” To make this clearer, we also elaborated on the calculation of the indices in the methods and results sections. 
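
      To illustrate why per-sex z-scoring balances the sexes across groups, the following Python sketch standardizes a measure within each sex and then cuts the ranked scores into four equal-sized bins. This is a simplified assumption about the procedure, not the authors' pipeline: the data are invented, a single measure stands in for the combined index, and the group labels follow figure 5.

```python
from statistics import mean, stdev


def zscore_within_sex(animals, measure):
    """Return {animal id: z-score}, standardizing separately within each sex."""
    z = {}
    for sex in {a["sex"] for a in animals}:
        group = [a for a in animals if a["sex"] == sex]
        values = [a[measure] for a in group]
        mu, sd = mean(values), stdev(values)
        for a in group:
            z[a["id"]] = (a[measure] - mu) / sd
    return z


def quartile_groups(scores, labels=("Low", "Mild", "Moderate", "Severe")):
    """Rank animals by score and cut the ranking into four equal-sized bins."""
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return {aid: labels[min(rank * 4 // n, 3)] for rank, aid in enumerate(ranked)}


# Hypothetical intake values: females take more overall than males.
animals = (
    [{"id": f"m{i}", "sex": "M", "intake": v} for i, v in enumerate([10, 20, 30, 40], 1)]
    + [{"id": f"f{i}", "sex": "F", "intake": v} for i, v in enumerate([30, 40, 50, 60], 1)]
)

z = zscore_within_sex(animals, "intake")
groups = quartile_groups(z)
for label in ("Low", "Mild", "Moderate", "Severe"):
    members = sorted(aid for aid, g in groups.items() if g == label)
    print(label, members)
```

      In this toy example each group ends up with one male and one female even though raw intake differs by sex, because each animal is ranked relative to its own sex.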

      (7) Consistent with previous reports, the authors demonstrate an increase in irritability-like behavior during withdrawal after cocaine self-administration; however, they make the claim that this variable was orthogonal to drug intake behavior. The discussion claims that the increase in irritability-like behavior was likely due to factors independent of drug intake, such as undergoing surgery, catheter implants, or being tested daily for two months. Individuals with a history of substance use disorder are thought to continue use as a consequence of negative reinforcement. Unwanted behavioral states, such as irritability, can be a driving factor in relapse; therefore, it would perhaps be more translationally relevant to understand the degree to which irritability-like behavior acts as a negative reinforcer rather than correlating this behavior with initial drug-seeking behavior. While this is outside of the scope of the current manuscript, perhaps this is worth noting in the discussion.

      The reviewer raises a good point, and we added a paragraph to the discussion acknowledging the translational relevance of understanding the relationship between irritability and drug-seeking behavior in the context of negative reinforcement and relapse. Now it reads: 

      “Despite the lack of correlation between irritability-like behavior and drug intake in our study, it is important to consider the translational relevance of irritability in the context of substance use disorders. In individuals with a history of substance use disorder, negative affective states, such as irritability, are thought to contribute to continued drug use and relapse through negative reinforcement processes (Baker et al., 2004; Koob and Le Moal, 2008). Specifically, the desire to alleviate or escape from these unwanted behavioral states may drive individuals to seek and use drugs, thus perpetuating the cycle of addiction (Baker et al., 2004; Solomon and Corbit, 1974). While our study focused on the relationship between irritability-like behavior and initial drug-seeking behavior, future research should investigate the degree to which irritability acts as a negative reinforcer in the context of drug relapse”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public Review):

      Summary:

      Songbirds provide a tractable system to examine neural mechanisms of sequence generation and variability. In past work, the projection from LMAN to RA (output of the anterior forebrain pathway) was shown to be critical for driving vocal variability during babbling, learning, and adulthood. LMAN is immediately adjacent to MMAN, which projects to HVC. MMAN is less well understood but, anatomically, appears to resemble LMAN in that it is the cortical output of a BG-thalamocortical loop. Because it projects to HVC, a major sequence generator for both syllable phonology and sequence, a strong prediction would be that MMAN drives sequence variability in the same way that LMAN drives phonological variability. This hypothesis predicts that MMAN lesions in a Bengalese finch would reduce sequence variability. Here, the authors test this hypothesis. They provide a surprising and important result that is well motivated and well analyzed: MMAN lesions increase sequence variability - this is exactly the opposite result from what would be predicted based on the functions of LMAN.

      Strengths:

      (1) A very important and surprising result shows that lesions of a frontal projection from MMAN to HVC, a sequence generator for birdsong, increase syntactical variability.

      (2) The choice of Bengalese finches, which have complex transition structures, to examine the mechanisms of sequence generation, enabled this important discovery.

      (3) The idea that frontal outputs of BG-cortical loops can generate vocal variability comes from lesions/inactivations of a parallel pathway from LMAN to RA. The difference between MMAN and LMAN functions is striking and important.

      Weaknesses:

      (1) If more attention was paid to how syllable phonology was (or was not) affected by MMAN lesions then the claims could be stronger around the specific effects on sequence.

      Reviewer #2 (Public Review):

      Summary:

      This study investigates the neural substrates of syntax variation in Bengalese finch songs. Here, the authors tested the effects of bilateral lesions of mMAN, a brain area with inputs to HVC, a premotor area required for song production. Lesions in mMAN induce variability in syntactic elements of song specifically through increased transition entropy, variability within stereotyped song elements known as chunks, and increases in the repeat number of individual syllables. These results suggest that mMAN projections to HVC contribute to multiple aspects of song syntax in the Bengalese finch. Overall the experiments are well-designed, the analysis excellent, and the results are of high interest.

      Strengths:

      The study identifies a novel role for mMAN, the medial magnocellular nucleus of the anterior nidopallium, in the control of syntactic variation within adult Bengalese finch song. This is of particular interest as multiple studies previously demonstrated that mMAN lesions do not affect song structure in zebra finches. The study undertakes a thorough analysis to characterise specific aspects of variability within the song of lesioned animals. The conclusions are well supported by the data.

      Weaknesses:

      The study would benefit from additional mechanistic information. A more fine-grained or reversible manipulation, such as brain cooling, might allow additional insights into how mMAN influences specific aspects of syntax structure. Are repeat number increases and transition entropy resulting from shared mechanisms within mMAN, or perhaps arising from differential output to downstream pathways (i.e. projections to HVC)? Similarly, unilateral manipulations would allow the authors to further test the hypothesis that mMAN is involved in inter-hemispheric synchronization.

      We thank the reviewers and editor for their encouraging and helpful comments and suggestions. We have revised the previous submission with new analyses and discussion to address points raised by the reviewers.

      Following the suggestion of Reviewer 1 we have added an analysis of the effects of mMAN lesions on syllable phonology, using a variety of measures. We have included 3 new Figure Supplements that detail our analyses and elaborate on these points.

      We agree with Reviewer 2 that reversible and unilateral manipulations would be interesting and potentially enable additional insights into the mechanisms by which mMAN influences song sequencing, and we are planning to perform such experiments in future studies.

      We made additional minor changes throughout the manuscript to address other points raised by the reviewers, and we thank them again for their time and effort in providing constructive feedback to improve our study.

      A complete point by point detailing of these changes is included below, interspersed with the reviewer comments.

      Reviewer #1 (Recommendations For The Authors):

      The opposite result from what would be predicted based on the functions of LMAN.

      Shoring up the paper's claims and ruling out alternative interpretations will require attention to the following issues:

      Major comments

      (1) Acoustic structure of syllables

      Line 294 & Sup. Figure 2, in some birds the syllable acoustic structures seem to be significantly different between the pre- and post-lesion condition, e.g. 'w' in Bird 1, 'g' in Bird 2, 'blm' in Bird 6. This observation seems to contradict the claim that acoustic structures are not affected by MMAN lesions.

      Related to the previous point, a more detailed analysis is needed to quantify the extent of acoustic changes caused by MMAN lesions. For example, do these pre- and post-lesion syllables form distinct clusters if embedded in a UMAP? Do more standard measures of syllable phonology (e.g. SAP similarity scores or feature distributions) show differences in pre- and post-MMAN lesion?

      We agree with the reviewer that there were individual syllables as illustrated in the average spectrograms of Figure 2 – figure supplement 1 that qualitatively differed between pre- and post-lesion recordings. We have followed the reviewer’s suggestion to quantify changes to syllable phonology using both similarity scores by Sound Analysis Pro (SAP) and a variety of identified acoustic features.

      In brief, these measures largely corroborate the conclusion that for most birds and syllables there was little or no difference in phonology between pre- and post-lesion songs, but that in a minority of cases syllables were altered noticeably (further detail below). In those cases where syllable phonology was altered, changes were not consistent across birds, and we cannot rule out off-target effects due to damage to structures or fibers of passage neighboring mMAN, so that it is unclear whether some subtle changes to syllable phonology can be attributed to mMAN lesions versus other causes. Future studies could more specifically examine whether damage to mMAN alone is sufficient in some cases to degrade syllable structure by using viral or other approaches that enable the more specific disruption of mMAN projection neurons.

      In practice, almost all syllables were identifiable in post-lesion songs so that we could unambiguously assign identity for purposes of evaluating effects of lesions on sequencing. Moreover, in any individual cases where there was ambiguity in syllable identity, we used the sequential context to assign the most likely label. Thus, any errors in assignment in such cases would have tended to reduce rather than accentuate the magnitude of reported sequencing effects. Lastly, each of the reported effects of mMAN lesions on sequencing was observed in multiple birds for which we detected no significant changes to syllable similarity.

      Further details of the analyses of syllable structure are given below, and have been added as new figure supplements:

      (1) Syllable similarity scores calculated using SAP (Sound Analysis Pro) (new Figure 2 – figure supplement 2). We compared pre-post lesion similarity scores for each syllable with self-similarity measures for the same syllables taken from separate control recordings before lesions. For comparison, we also included a cross-similarity score for syllables of different types. These measures confirmed the qualitative impression from spectrograms that for most birds there were no greater changes to syllable structure following lesions than were present across control recordings. For one bird, pre-post changes were significantly larger than changes across control recordings, but pre-post similarity remained higher than cross-similarity.

      (2) Analysis of fundamental frequency and coefficient of variation (CV) of fundamental frequency of select syllables for each bird before and after mMAN lesions (new Figure 2 – figure supplement 3). This analysis is directly comparable with the same analysis performed on LMAN lesions in Sakata, Hampton, Brainard (2008). We carried out this analysis in part to address changes to syllable structure that might have inadvertently arisen due to damage to LMAN, which sits immediately lateral to mMAN. In the Bengalese finch and zebra finch, lesions of LMAN cause little change to the mean fundamental frequency of individual syllables but cause a consistent reduction in the coefficient of variation (CV) of fundamental frequency across repeated renditions of a given syllable (Sakata, Hampton, Brainard 2008, Andalman, Fee 2009, Warren et al. 2011). We therefore supposed that unintended damage to LMAN or its projections to RA might have resulted in a reduction in the CV of syllables following mMAN lesions. Instead, we saw a modest increase in the CV of fundamental frequency (mean across birds of +20%; range -19 to +43%). These data suggest that off-target effects on LMAN were largely absent in our experiments (consistent with histology, e.g. Figure 1 – figure supplement 1).

      (3) Comparison of Entropy of spectral envelope (entS), Temporal centroid for the temporal envelope (meanT), First, second and third formants (F1, F2, F3), before and after lesions, calculated using the Python SoundSig toolbox (Elie and Theunissen 2016) (new Figure 2 – figure supplement 4). Acoustic features generally showed little change between pre- and post-lesion songs. They highlight as relative outliers the same individual examples that stand out in the average spectrograms in Figure 2 – figure supplement 1.
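
      As an illustration of the CV comparison described in point (2), the following is a minimal sketch, assuming CV is taken as the sample standard deviation divided by the mean across renditions of one syllable. The function names and the fundamental frequency values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV of fundamental frequency: sample standard deviation / mean."""
    return stdev(values) / mean(values)


def percent_change(pre, post):
    """Percent change from a pre-lesion value to a post-lesion value."""
    return 100.0 * (post - pre) / pre


# Hypothetical FF measurements (Hz) across renditions of one syllable.
pre_ff = [600.0, 610.0, 590.0, 605.0, 595.0]
post_ff = [600.0, 615.0, 585.0, 608.0, 592.0]

cv_pre = coefficient_of_variation(pre_ff)
cv_post = coefficient_of_variation(post_ff)

# Change in rendition-to-rendition variability versus change in mean FF.
cv_change = percent_change(cv_pre, cv_post)
mean_ff_change = percent_change(mean(pre_ff), mean(post_ff))
```

      In this made-up example the mean fundamental frequency is unchanged while the CV increases, which is the direction of effect reported for mMAN lesions and the opposite of what LMAN damage would be expected to produce.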

      Author response image 1.

      Syllable similarity calculated using Sound Analysis Pro (SAP). ‘Self Similarity’ = Similarity comparison of syllables before mMAN lesions to syllables of the same type, taken from two separate control recordings before the lesions, ‘Pre vs Post’ = Similarity comparison of the same syllable types before and after mMAN lesions, ‘Cross Similarity’ = Similarity comparison of each syllable type to other syllable types. For Birds 1-2 and 4-7, ‘Self Similarity’ was not significantly different from ‘Pre vs Post’ Similarity (p>0.05, Wilcoxon signed-rank test), while for Bird 3, there was a significant difference (p = 0.03, Wilcoxon signed-rank test). For all birds ‘Pre vs Post’ was significantly different from ‘Cross Similarity’ (p<0.05, Wilcoxon signed-rank test). On average, ‘Pre vs Post’ was 4.8% less than ‘Self Similarity’ (range 0.2%-14%) while ‘Cross Similarity’ was 40% less than ‘Self Similarity’ (range 20.2%-56.3%). These measures confirm the qualitative impression from Figure 2 – figure supplement 1 that for most birds and syllables there were no greater changes to syllable structure following lesions than were present across control recordings, and that pre-post similarity remained higher than cross-similarity, i.e. syllables remained clearly identifiable.

      Author response image 2.

      (A) CV of fundamental frequency (FF) of select syllables before and after mMAN lesions. In the Bengalese finch and zebra finch, lesions of lMAN, which sits immediately lateral to mMAN, cause a consistent reduction in the coefficient of variation (CV) of fundamental frequency across repeated renditions of a given syllable (Sakata, Hampton, Brainard 2008, Andalman, Fee 2009, Warren et al. 2011). We therefore supposed that unintended damage to lMAN or its projections to RA might have resulted in a reduction in the CV of syllables following mMAN lesions. Instead we saw a modest increase in the CV of fundamental frequency (p<0.05, Wilcoxon signed-rank test; mean across birds of +20%; range -19 to +43%). These data suggest that it is unlikely that changes to syllable structure arose from accidental damage to lMAN. (B) Percent change in mean fundamental frequency after mMAN lesions vs mean fundamental frequency before mMAN lesions.

      Author response image 3.

      Selected acoustic features for all syllables in all birds before and after mMAN lesions. Different colors represent different syllable types per bird. ‘entS’ = Entropy of spectral envelope, ‘meanT’ = Temporal centroid for temporal envelope, ‘F1’ = First formant, ‘F2’ = Second formant, ‘F3’ = Third formant. Acoustic features generally showed little change between pre- and post-lesion songs. They highlight as relative outliers the same individual examples that stand out in the average spectrograms in Figure 2 – figure supplement 1.

      (2) Shoring up claims of increased transitional variability

      Line 301 & Sup. Figure 1, in several birds (1, 2, 5, 6), it seems that there is a downward trend post-lesion, i.e. the transition entropy gradually decreases with time. How to exclude the possibility that the increased variability is a transient effect, e.g. caused by surgery side effects or destabilization of circuits, which may eventually recover to normal?

      Transition entropy remains elevated for as long as the birds were followed in this study. While the persistence of the effects we observed is longer than transient effects such as those following Nif lesion in zebra finches (Otchy et al., 2015 ~2 days), we cannot rule out either recovery or further deterioration following lesions on much longer time scales, such as those reported by Kubikova et al., 2007 (X lesions, 6 months). We have now added data points for 4 birds where we had songs from later timepoints following lesions; for three of these birds, transition entropy remained elevated above the baseline values for 14 and 33 days, respectively (Figure 1 - figure supplement 2).
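
      For readers unfamiliar with the measure, the sketch below shows one common way to compute transition entropy from a labeled syllable sequence. It assumes the Shannon entropy (in bits) of the outgoing transition probabilities from each syllable, which may differ in detail from the measure used in the manuscript. The function name and the label strings are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def transition_entropy(sequence):
    """Map each syllable to the Shannon entropy (bits) of its outgoing transitions."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1

    entropies = {}
    for syllable, following_counts in counts.items():
        total = sum(following_counts.values())
        entropies[syllable] = -sum(
            (n / total) * log2(n / total) for n in following_counts.values()
        )
    return entropies


# Hypothetical label strings: a fixed sequence, and one where 'b' branches.
pre_song = "abcabcabcabc"
post_song = "abcabdabcabd"

pre_entropy = transition_entropy(pre_song)
post_entropy = transition_entropy(post_song)
```

      In the first string every transition is deterministic, so each entropy is zero. In the second, 'b' is followed equally often by 'c' and 'd', giving an entropy of 1 bit at that branch point.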

      Line 313 & Sup. Figure 4, the claim that "transitions that had low history dependence tended to show larger changes after mMAN lesions" needs better statistical support, because in Sup. Figure 4, the correlation is not significant.

      We apologize for the phrasing. We have changed the sentence to: “Consistent with the first possibility, we observed that there was a nonsignificant trend toward larger changes after mMAN lesion for transitions with low history dependence.”

      Figure 4C-D, only data from 5 out of 7 birds was included, did the other two birds not have repeats? If so, the authors need to be explicit on data exclusion.

      The reviewer’s inference is correct that in our dataset only 5 out of 7 birds had songs which contained repeat phrases. We have added the following sentence to state that explicitly: “In our dataset of 7 birds, only 5 birds had songs which contained repeat phrases.”

      Minor comments

      Sup. Figure 3, to help readers understand, 1) add symbols and arrows to point to the structures; 2) indicate the orientation of the slide, e.g. which direction is medial/lateral; 3) a negative control without lesion needs to be shown for comparison.

      We have made the suggested changes and updated new Figure 1- figure supplement 1.

      Author response image 4.

      Image of calcitonin gene-related peptide (CGRP)-stained frontal section (left) control and (right) bird 5. CGRP labels cells in both lMAN (seen in black to the left of the lesion) and mMAN (blue, intact; red, completely destroyed).

      A statistical test is needed for Sup. Figure 5B.

      We have modified the Figure legend for Figure 3 – figure supplement 1 as follows:

      “Change in transition entropy was not significantly different for transitions within chunks and at branchpoints (p> 0.05, Wilcoxon rank sum test)”

      Line 363, these can be moved to the Introduction, so readers have a better sense of what's already known about MMAN lesion.

      We have moved the sentence to the Introduction.

      Fig 1e. RA also projects to DLM.

      Our intention was to focus on the connections involving mMAN; we have now added the connection in Figure 1E.

      Reviewer #2 (Recommendations For The Authors):

      Please address this issue in the discussion (no new experiments required): It would be interesting to consider how social context modulates the variability of the song. In these experiments, Bengalese finches were singing in isolation. How might changes in syntax be modulated by the presence of a female in directed song and in other social contexts?

      Thank you for your suggestion. One study (Jarvis et al., 1998) shows that ZENK expression in mMAN after singing does not differ between female-directed singing, undirected singing and singing in the presence of a male conspecific. This suggests that activity in mMAN might not be modulated by social context. But we agree that it would be interesting to test how a change in social context (which typically leads to reduced transition entropy) interacts with the increased variability we see after mMAN lesions. We have added the following sentences to the discussion:

      “In our study, we only recorded song sequencing of male Bengalese finches singing in isolation. Social context, such as female-directed song, can also change song sequencing (Hampton, Sakata and Brainard, 2009; Chen, Matheson and Sakata, 2016). It would be interesting to test whether mMAN plays a role in the social context-modulated changes in sequencing (Jarvis et al., 1998), similar to how lMAN contributes to social context-modulated changes in syllable structure (Sakata, Hampton and Brainard, 2008).”

    1. Author Response

      The following is the authors’ response to the original reviews.

      REVIEWER 1

      The claim that olivooid-type feeding was most likely a prerequisite transitional form to jet-propelled swimming needs much more support or needs to be tailored to olivooids. This suggests that such behavior is absent (or must be convergent) before olivooids, which is at odds with the increasing quantities of pelagic life (whose modes of swimming are admittedly unconstrained) documented from Cambrian and Neoproterozoic deposits. Even among just medusozoans, ancestral state reconstruction suggests that they would have been swimming during the Neoproterozoic (Kayal et al., 2018; BMC Evolutionary Biology) with no knowledge of the mechanics due to absent preservation.

      Thanks for your suggestions. Yes, we agree with you that ancestral swimming medusae may have appeared before the early Cambrian, even in the Neoproterozoic. However, discussions of the affinities of Ediacaran cnidarians are severely limited by the lack of information concerning their soft anatomy, so it is hard to infer the mechanics in the absence of preservation. Olivooids from the basal Cambrian Kuanchuanpu Formation can reasonably be considered cnidarians based on their radial symmetry, external features, and especially their internal anatomy (Bengtson and Yue 1997; Dong et al. 2013; 2016; Han et al. 2013; 2016; Liu et al. 2014; Wang et al. 2017; 2020; 2022). The simulation experiment here was based on the soft tissue preserved in olivooids.

      While the lack of ambient flow made these simulations computationally easier, these organisms likely did not live in stagnant waters even within the benthic boundary layer. The absence of ambient unidirectional laminar current or oscillating current (such as would be found naturally) biases the results.

      Many thanks for your suggestion concerning the lack of ambient flow in the simulations. We revised the section “Perspectives for future work and improvements” (lines 381-392 in the revised manuscript). Conducting the simulations without ambient flow reduces the computational cost and makes the simulations easier to run, whereas adding ambient flow can lead to poorer convergence and more technical issues. We also strongly agree that these (benthic) organisms did not live in stagnant waters, as discussed in Liu et al. 2022. However, reducing computational complexity is not the main reason that ambient flow was not incorporated in the simulations. As we discuss in the section “Perspectives for future work and improvements”, our work focuses on the theoretical effect of the polyp’s dynamics (based on fossil observation and hypothesis) on the ambient environment (i.e., how fast the organism inhales water from the ambient environment) rather than the effect of ambient flow on the organism (e.g., drag forces), which is what previous palaeontological CFD simulations, based on fossil morphology and hydrodynamics, mainly focused on. To this end, we are mainly concerned with the flow velocity above or near the peridermal aperture (and the vorticity computed in this paper) generated only by the polyp’s own dynamics, without the interference of ambient flow, as in many CFD simulations of modern jellyfish (e.g., McHenry & Jed 2003; Gemmell et al. 2013; Sahin et al. 2009), all of which were conducted under hydrostatic conditions. Adding ambient flow to our simulations would “bias” the flow velocity profiles we expect to obtain in this case.

      Nevertheless, we do agree that ambient unidirectional laminar current or oscillating current plays an important role in the feeding and respiration behavior of Quadrapyrgites. Further investigation would require a new set of simulations and is beyond the scope of this work. In Zhang et al. 2022, we conducted CFD simulations incorporating a randomly generated surface that imitated an uneven seabed, where unidirectional laminar current and oscillating current (or vortices) were formed and exerted on Quadrapyrgites located at different places on the surface. We expect that combining the method used in Zhang et al. 2022 with the velocity profiles collected in this work may be a promising way to further investigate the effect of ambient current on the organisms’ active feeding behavior.

      There is no explanation for how this work could be a breakthrough in simulation gregarious feeding as is stated in the manuscript.

      Thanks for your suggestion. We revised the section “Perspectives for future work and improvements” (lines 396-404 in our revised version of manuscript).

      Simulating gregarious active feeding behavior generally requires modelling multiple (or clustered) organisms, which is beyond our present computational capability. However, the simulation results can be used to build a simplified model: an inlet or outlet boundary condition can be applied to the peridermal aperture of Quadrapyrgites, using the corresponding exhalant or inhalant flow velocity profiles collected in this work. This yields a simplified active-feeding Quadrapyrgites model that does not need the computationally expensive moving mesh feature. Such a model can be used alone or in clusters to investigate gregarious feeding behavior in combination with ambient current. This is how the work could be a “breakthrough” in simulating gregarious feeding. Nevertheless, we modified the corresponding description in the section “Perspectives for future work and improvements” to make it more appropriate.

      Throughout the manuscript there are portions that are difficult to digest due to grammar, which I suspect is due to being written in a second language. This is particularly problematic when the reader is attempting to understand if the authors are stating an idea is well documented versus throwing out hypotheses/interpretations.

      Thanks. Our manuscript was checked and corrected by a native speaker of English again.

      Line-by-line:

      L023: "Although fossil evidence suggests..."

      L026: "demonstrated" instead of "proven"

      We corrected them accordingly.

      L030: "The hydrostatic simulations show that the..." Maybe I'm confused by the wording, but shouldn't this be the case since it's a set part of the model?

      As stated in our manuscript, all the simulations were conducted under a “hydrostatic” environment. We originally intended the description “hydrostatic” here to emphasize the simulation condition set in our work. However, it could be misread as implying that some of our simulations were “hydrostatic” while others were not. To this end, deleting the word “hydrostatic” here (line 30) may be appropriate to eliminate confusion.

      L058: "lacking soft tissue" Haootia preservation suggests it is soft tissue (Liu et al., 2014), unless the preceding sentence is not including Haootia, in which case this section is confusingly worded

      Thank you. We deleted the sentence “However, their affinities are not without controversy as the lacking soft tissue.”

      L085: change "proxy"

      Yes, we changed it to “Considering their polypoid shape and cubomedusa-type anatomy, the hatched olivooids appear to a type of periderm-bearing polyp-shaped medusa (Wang et al. 2020)” (lines 86-88).

      L092: "assist in feeding" has this been stated before? Citation needed, else this interpretation should primarily be in the discussion

      Yes, you are right. We cited the reference at the end of the mentioned sentence (lines 91-94).

      L095: Remove "It is suggested that"

      Thanks for your suggestions. We corrected it.

      L100: "Probably the..." here to the end belongs in the discussion and not introduction.

      Thanks for your suggestions. We corrected the sentences.

      L108: "an abapical"

      Thanks for your suggestions. We revised it in line 107.

      L112: "for some distance" be specific or remove

      Yes, we deleted “for some distance” in line 111.

      L133: I can't find a corresponding article to Zhang et al., 2022. Is this the correct reference?

      The article Zhang et al. 2022 (entitled “Effect of boundary layer on simulation of microbenthic fossils in coastal and shallow seas”) was in press when we first submitted this manuscript. We added the DOI (10.13745/j.esf.sf.2023.5.32) to the corresponding entry in the References, which may help readers locate the article more easily.

      L138: You can't be positive that your simulations "provide a good reproduction of the movement." You have attempted to reconstruct said movement, but the language here is overly firm - as is "pave a new way"

      Thanks for your suggestions. We corrected the corresponding description (lines 138-140) to make it more rigorous.

      L149: "No significant change" implies statistics were computed that are not presented here.

      The statistics were computed using built-in functions of Excel and are presented in Table supplement 2 (deposited in figshare, https://doi.org/10.6084/m9.figshare.23282627.v2) rather than in the manuscript. Specifically, the errors were computed using the formula for relative error, which is defined by:

      where u_z denotes the velocity profile collected at each cut point z with the current mesh parameters, u_z^* denotes the velocity profile collected at each cut point z with the next finer mesh parameters, and i denotes each time step (from 0.01 to 4.0). The total average error was computed by averaging error_i over the corresponding time steps. The results are marked in red in Table supplement 2. We revised the corresponding description in lines 140-146.
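
      The formula itself is not reproduced in this response. A plausible reconstruction from the definitions given here, assuming the standard pointwise relative error at a cut point z and a plain average over N time steps (N, the time index t_i and the exact normalization are assumptions, not taken from the manuscript), would be:

```latex
\mathrm{error}_i = \frac{\left| u_z(t_i) - u_z^{*}(t_i) \right|}{\left| u_z^{*}(t_i) \right|},
\qquad
\overline{\mathrm{error}} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{error}_i
```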

      L152: "line graphs" >> "profiles"

      Thanks for your suggestions. We corrected it in line 144.

      L159: remove "significant" unless statistics are being reported, in which case those need to be explained in detail.

      Thanks for your suggestions. We removed "significant" and corrected the corresponding sentences in lines 150-153 to make them more rigorous.

      L159: I would recommend including a supplemental somewhere that shows how tall the modeled Quadrapyrgites is and where the cut lines exist above it.

      Many thanks for your suggestions. We agree that it is appropriate to specify the height of the modeled Quadrapyrgites and the position of each cut point. Hence, we added a supplementary figure (Figure supplement 1) to illustrate these, and made the corresponding additions in the last paragraph of the section “Computational fluid dynamics” (line 455 and line 535).

      L183: "The maximum vorticity magnitude was set..." I do not follow what this threshold is based on the current phrasing.

      The vorticity magnitude mentioned here is the visualisation range of the color scale bar, which can be set manually in the software. Positive values represent vortices rotating counterclockwise on the cut plane, while negative values represent vortices rotating clockwise. In this case, the visualisation range is [-0.001, 0.001] (i.e., the absolute value 0.001 is the threshold), as shown by the color scale bar in Figure 7. Decreasing the threshold, for example by setting the visualisation range to [-0.0001, 0.0001], captures smaller vorticity on the cut plane, as in the figure below on the left. Conversely, setting the range to [-0.01, 0.01] focuses on larger vorticity, as in the figure below on the right. Based on our trials, we found [-0.001, 0.001] to be an appropriate range for visualizing the vortex near the periderm. To be more rigorous and to avoid confusion, we modified the description in the corresponding place in the manuscript (lines 172-174).

      Author response image 1.

      L201: "3.9-4 s"

      Thanks, we corrected it in line 191.

      L269: "Sahin et al.,..." add to the next paragraph

      Yes, we rearranged the corresponding two paragraphs (lines 258-289).

      L344: "Higher expansion-contraction..." this needs references and/or more justification.

      Thanks. We deleted the sentence.

      L446: two layers of hexahedral elements is a very low number for meshing boundary layer flow

      Many thanks for your question. We agree that an appropriate hexahedral element mesh for the boundary layer is essential to recover boundary flow, especially where a turbulence model with a wall function is adopted, such as the standard k-epsilon model. In this case, the boundary flow is not the main point, since the velocity profile was collected above the peridermal aperture rather than near the no-slip wall region. In addition, we do not need drag computations (related to shear stress and pressure difference) in this case, which would require a more accurate reconstruction of flow velocity near no-slip walls, as in previous palaeontological CFD simulations. Thus, we think two layers of hexahedral elements are enough. Moreover, hexahedral elements added to the peridermal aperture domain, as illustrated in the figure below, allow the velocity near the wall to vary smoothly and thus benefit the convergence of the simulations.

      Author response image 2.

      L449: similar to comments regarding lines 146-148, key information is missing here. Figure 3C appears to be COMSOL's default meshing routine. While it is true that the domain is discretized in a non-uniform manner, no information is provided as to what mesh parameters were "tuned" to determine "optimal settings" or what those settings are (or how they are optimal).

      Many thanks for your question. Specific mesh parameters are listed in Table supplement 3, and corresponding descriptions and modifications were made in lines 475-479 and lines 542-549. In most CFD cases, the mesh parameters need to be tuned to balance computational cost and accuracy. If the difference between the result obtained from the present mesh and that obtained from the next finer mesh is in the range of 5%-10%, the present mesh is considered “optimal”. To achieve this, we prescribed several different sets of mesh parameters (mainly the maximum and minimum element size) for each subdomain (the inner cavity, the peridermal aperture, and the domain outside the fossil model) of the whole computational domain in the test model. Subsequently, we refined the mesh step by step as far as possible and adjusted the element size of the subdomains to find suitable mesh parameters; this is how the mesh parameters were “tuned”. We agree that we should state explicitly which mesh parameters were tuned and what those settings are.
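
      The refinement criterion described above can be sketched as follows. This is a minimal illustration, assuming a mean relative difference between velocity samples taken at the same cut point and time steps on successive meshes, and assuming a 10% tolerance as the acceptance threshold. The function names and velocity values are hypothetical and do not come from the COMSOL models.

```python
def mean_relative_error(current, finer):
    """Average relative difference between two velocity profiles.

    Both profiles must be sampled at the same time steps, and the finer-mesh
    values are assumed to be nonzero.
    """
    errors = [abs(u - u_ref) / abs(u_ref) for u, u_ref in zip(current, finer)]
    return sum(errors) / len(errors)


def select_mesh(profiles, tolerance=0.10):
    """Return the index of the coarsest mesh whose result lies within
    `tolerance` of the next finer mesh, or None if no mesh qualifies."""
    for level in range(len(profiles) - 1):
        if mean_relative_error(profiles[level], profiles[level + 1]) <= tolerance:
            return level
    return None


# Hypothetical velocity samples at one cut point, ordered coarse to fine.
profiles = [
    [0.80, 1.60, 2.40],
    [1.00, 2.00, 3.00],
    [1.05, 2.10, 3.15],
    [1.06, 2.12, 3.18],
]
chosen = select_mesh(profiles)
```

      Here the coarsest mesh differs from the next one by 20% and is rejected, while the second mesh differs from the third by about 5% and is accepted, so refinement stops at the second mesh.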

      Figure 7 should have the timesteps included and the scaling of the arrows should be explicit in the caption

      Many thanks for your suggestions. We intended the white arrows in Figure 7 to represent the velocity orientation rather than the true velocity scale (whereas the white arrows in Animation supplement 1 represent a normalized velocity profile). To avoid confusion, we revised Figure 7 to include timesteps and to use arrows that represent a normalized velocity profile, making it consistent with Animation supplement 1. The caption of Figure 7 was modified accordingly.

      The COMSOL simulation files (raw data) are missing from the supplemental data. These should be posted to Dryad or here.

      We uploaded the files to Dryad (https://datadryad.org/stash/share/QGDSqLh8HOll7ofl6JWVrqM57Rp62ZPjvZU0AQQHwTY), and added the corresponding link to section “Data Availability Statement”.

      REVIEWER 2

      Lines 319-334: The omission in this paragraph of Paraconularia ediacara Leme, Van Iten and Simoes (2022) from the terminal Ediacaran of Brazil is a serious matter, as (1) the medusozoan affinities of this fossil are every bit as well established as those of anabaritids, Sphenothallus, Cambrorhytium and Byronia, and (2) P. ediacara was a large (centimetric) polyp, the presence of which in Precambrian times is thus a problem for the simple evolutionary scenario (very small polyps followed later in evolutionary history by large polyps) outlined in the paragraph. Thus, Paraconularia ediacara must be mentioned in this paper, both in connection with the early evolution of size in cnidarian polyps and in other places where the early evolution of cnidarians is discussed.

      Thanks for your important suggestions. We added some sentences in lines 323-326 as follows: “Significantly, the large-bodied, skeletonized conulariids-like Paraconularia found from the terminal Ediacaran Tamengo Formation of Brazil confirmed their ancient predators like the extant medusozoans and suggested the origin of cnidarians even farther into the deep evolutionary scenario (Leme et al. 2022).”

      Line 23. Delete the word, been.

      Line 25. Replace conjecture with conjectural.

      Line 26. Delete the word, the before calyx-like.

      Line 32. Replace consisting with consistent.

      Thanks for your suggestions. We have corrected all of them.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public reviews

      Reviewer 1 (Public Review):

      Summary:

      The authors set out to clarify the molecular mechanism of endocytosis (re-uptake) of synaptic vesicle (SV) membrane in the presynaptic terminal following release. They have examined the role of presynaptic actin, and of the actin regulatory proteins diaphanous-related formins (mDia1/3), and Rho and Rac GTPases in controlling the endocytosis. They successfully show that presynaptic membrane-associated actin is required for normal SV endocytosis in the presynaptic terminal and that the rate of endocytosis is increased by activation of mDia1/3. They show that RhoA activity and Rac1 activity act in a partially redundant and synergistic fashion together with mDia1/3 to regulate the rate of SV endocytosis. The work adds substantially to our understanding of the molecular mechanisms of SV endocytosis in the presynaptic terminal.

      Strengths:

      The authors use state-of-the-art optical recording of presynaptic endocytosis in primary hippocampal neurons, combined with well-executed genetic and pharmacological perturbations to document effects of alteration of actin polymerization on the rate of SV endocytosis. They show that removal of the short amino-terminal portion of mDia1 that associates with the membrane interrupts the association of mDia1 with membrane actin in the presynaptic terminal. They then use a wide variety of controlled perturbations, including genetic modification of the amount of mDia1/3 by knock-down and knockout, combined with inhibition of activity of RhoA and Rac1 by pharmacological agents, to document the quantitative importance of each agent and their synergistic relationship in regulation of endocytosis. The analysis is augmented by ultrastructural analyses that demonstrate the quantitative changes in numbers of synaptic vesicles and in uncoated membrane invaginations that are predicted by the optical recordings.

      The manuscript is well-written and the data are clearly explained. Statistical analysis of the data is strengthened by the very large number of data points analyzed for each experiment.

      Weaknesses:

      There are no major weaknesses. The optical images as first presented are small and it is recommended that the authors provide larger, higher-resolution images.

      Response: We thank the referee for these highly positive remarks. In response, we now provide larger, high-resolution images as requested.

      Reviewer 2 (Public Review):

      Summary:

      This manuscript expands on previous work from the Haucke group which demonstrated the role of formins in synaptic vesicle endocytosis. The techniques used to address the research question are state-of-the-art. As stated above there is a significant advance in knowledge, with particular respect to Rho/Rac signalling.

      Strengths:

      The major strength of the work was to reveal new information regarding the control of both presynaptic actin dynamics and synaptic vesicle endocytosis via Rho/Rac cascades. In addition, there was further mechanistic insight regarding the specific function of mDia1/3. The methods used were state-of-the-art.

      Weaknesses:

      There are a number of instances where the conclusions drawn are not supported by the submitted data, or further work is required to confirm these conclusions.

      Response: We thank the referee for his/her thorough reading of the manuscript and the thoughtful comments and questions. We have conducted additional experiments and made textual changes to our manuscript to address these points and to further strengthen the conclusions, as detailed in our response to the recommendations for authors.

      Recommendations for the authors

      Reviewer 1 (Recommendations For The Authors):

      Most of the figures contain images that are too small to be easily interpreted because the resolution is degraded when they are enlarged in the PDF file. The authors should redesign the figures so that the letters marking each panel are smaller, and the size of each data panel is much larger (at least twice as large with increased resolution). There is, at present, a great deal of white space in most of the figures that should be reduced to make room for larger, higher-resolution images. Larger fonts should be used for annotations of the images so that they are easier to read. The data appears to be very high quality, but it is presented at a size and resolution that don't do it justice.

      Response: We thank the referee for his/her helpful comments. In response, we have carefully re-arranged all figures and now provide larger, high-resolution images.

      Reviewer 2 (Recommendations For The Authors):

      Major points

      (1) Figure 1 - While there is a rationale for employing a cocktail of drugs to interfere with actin dynamics, it would be highly informative to determine the effect of these modulators in isolation. This is important, since in their previous publication (Soykan et al Neuron 2017 93:854) the authors demonstrated that latrunculin had no effect, while jasplakinolide accelerated endocytosis of originating purely from Y-27362 and ROCK kinase inhibition, rather than destabilisation/stabilisation of actin. It will be key to dissect this by examining the effect on endocytosis of both 1) a cocktail of latrunculin/jasplakinolide and 2) Y-27362 alone.

      Response: We thank the referee for highlighting this interesting point. We have now experimentally addressed the effect of latrunculin (L), jasplakinolide (J) and the ROCK inhibitor Y-27362 (Y), either alone or in combination, on the kinetics of synaptic vesicle (SV) endocytosis (new Fig. 1-Supplement 1C,D). We now demonstrate that application of the ROCK inhibitor Y-27362 or the combination of latrunculin (L) and jasplakinolide (J) has no effect on Syph-pH endocytosis. Combined use of jasplakinolide (J) and the ROCK inhibitor Y-27362 (Y) has a small phenotype. In contrast, a mix of all three inhibitors (JYL) potently impairs endocytosis kinetics at hippocampal synapses. These data demonstrate that actin dynamics are required for SV endocytosis, while ROCK inhibition alone does not appear to impair endocytosis kinetics. We note that our data are in line with a study by Ann Saal et al (2020) who reported a lack of effect of ROCK inhibition on the kinetics of Synaptotagmin1-CypHer retrieval.

      (2) Figure 1 - There are clear effects on the retrieval of pHluorin reporters and also endogenous vGAT in the presence of disruptors of actin function. However, there was no assessment of the impact of these interventions on either neurotransmitter release or SV fusion (with the exception of 1 condition with one stimulus train (Fig S1D), and the effect of Rac modulation in Fig S6F). As quoted by the authors, previous studies using knockout of beta- or gamma-actin have shown a profound effect on these parameters in hippocampal neurons, which has the potential to impact the speed and extent of compensatory endocytosis. The authors will already have this data from the use of the two reporters (pHluorin and GAT-cypHer), and it is important to include this to allow interpretation of the effect on endocytosis observed.

      Response: We agree with the referee that this is an important point that we have tackled experimentally using vGAT-CypHer and synapto-pHluorin responses as measures. In the new Fig. 1-Supplement 1, Fig. 5-Supplement 1, and Fig. 6-Supplement 1 of our revised manuscript, we show that SV exocytosis is largely unaffected by any of the applied manipulations of actin function. Specifically, we have added surface normalized data as a surrogate measure for exocytosis for the following:

      • JLY treatment monitored by Syph-pH (Figure 1-Supplement 1A) and vGAT-CypHer (Figure 1-Supplement 1B),

      • shCTR/shmDia1 (transfected) assayed via Syph-pH (Figure 1-Supplement 1G),

      • shCTR/shmDia1/shmDia1+3 assayed via vGLUT1-pH (40AP: Figure 1-Supplement 1J; 80AP: Figure 1-Supplement 1L),

      • shCTR/shmDia1+3 (transduced) assayed by vGAT-CypHer (Figure 1-Supplement 1M),

      • IMM treatment monitored by vGLUT1-pH (Figure 1-Supplement 1O),

      • RhoA/B WT/DN overexpression monitored by Syph-pH (Figure 5-Supplement 1B),

      • shCTR/shRhoA+B (transfected) monitored via Syph-pH (Figure 5-Supplement 1D),

      • shCTR/shmDia1+3 +/- EHT 1864 (Rac Inhibitor) assayed by vGAT-CypHer (Figure 6-Supplement 1D),

      • shCTR/shmDia1+3 +/- Rac1-CA/DN assayed by Syph-pH (Figure 6-Supplement 1F).

      The lack of effect of these manipulations on exocytic SV fusion is thus distinct from the effects of complete abrogation of actin expression in beta- or gamma-actin knockout studies reported by the LingGang Wu laboratory (Neuron 2016) as the referee also noted.

      (3) Figure 3H, 3K, 4C, 4F - It is unclear how the values on the Y-axis were calculated. Regardless, to confirm that there is a specific increase in presynaptic mDia1/actin, the equivalent values for Homer/mDia1 should be presented (with Bassoon/Homer as a negative control). Without this, it is difficult to argue for a specific enrichment of mDia1/actin at the presynapse. The CRISPR experiments help with this interpretation (Fig 4G-I), however, inclusion of the Homer/mDia1 STED data would strengthen it greatly.

      Response: We apologize if the description has been unclear. We essentially have followed the same type of analysis as recently described by Bolz et al (2023). In brief, the rationale for quantifying presynaptic levels of the proteins of interest is as follows: The presynaptic area was defined by the normalized distribution curve of Bassoon, i.e. the area between 151.37 and -37.84 nm as marked by purple shading, with a cutoff set where the Bassoon and Homer1 distributions overlap (-37.84 nm), as shown in Figure 3-Supplement 1H (pasted below). The individual synaptic line profiles, e.g. of mDia1, were integrated to yield presynaptic levels (between 151.37 and -37.84 nm; purple in the graph) vs. postsynaptic levels (from -56.76 to -245.97 nm; green shaded area). new Figure 3-Supplement 1H-J

      Author response image 1.

      Based on this analysis, postsynaptic mDia1 levels were also elevated upon Dynasore treatment (new Figure 3-Supplement 1I). In spite of this, and consistent with the fact that the majority of mDia1 is localized at the presynapse, we found that postsynaptic F-actin levels were unchanged in mDia1/3-depleted neurons (p = 0.0966; One sample t-test) (new Figure 4-Supplement 1E,F). new Figure 4-Supplement 1E,F

      Author response image 2.

      Moreover, we also conducted further analysis with respect to possible effects of Dynasore on synaptic architecture in general. Neither presynaptic Bassoon nor postsynaptic Homer1 levels were significantly altered by Dynasore treatment (new Figure 3–Supplement 1J).
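      The line-profile quantification described in this response can be sketched as follows. This is an illustrative reconstruction, not the analysis code of Bolz et al (2023): only the window boundaries are taken from the text above, while the function name, the trapezoidal integration and the flat test profile are assumptions.

```python
# Window boundaries (nm) quoted in the response; positive positions are
# taken to lie on the Bassoon (presynaptic) side.
PRESYNAPTIC_WINDOW_NM = (-37.84, 151.37)
POSTSYNAPTIC_WINDOW_NM = (-245.97, -56.76)


def integrate_window(positions_nm, intensities, window):
    """Trapezoidal integral of a line profile restricted to `window`.

    `positions_nm` must be sorted in ascending order and have the same
    length as `intensities`.
    """
    low, high = window
    points = [(x, y) for x, y in zip(positions_nm, intensities) if low <= x <= high]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area


# Hypothetical flat profile sampled every 0.5 nm from -300 nm to 200 nm.
positions = [i * 0.5 - 300.0 for i in range(1001)]
profile = [1.0] * len(positions)

presynaptic_level = integrate_window(positions, profile, PRESYNAPTIC_WINDOW_NM)
postsynaptic_level = integrate_window(positions, profile, POSTSYNAPTIC_WINDOW_NM)
```

      The two windows have the same width (189.21 nm), so a flat profile gives equal presynaptic and postsynaptic integrals; any difference between the two levels in real data therefore reflects the shape of the profile rather than the window size.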

      (4) Figure 4J - The rescue of the pHluorin response by jasplakinolide is difficult to interpret when considering previous work from the same authors. In their 2017 publication (Soykan et al Neuron 2017 93:854), they revealed that the drug accelerated the pHluorin response, whereas now they demonstrate no effect in the control condition. If the drug does accelerate endocytosis, then it may be working via a different mechanism to restore endocytosis in mDia1/3 knockdown neurons.

      Response: The referee is correct. The very mild acceleration of endocytosis in the presence of jasplakinolide can be observed using synaptophysin-pHluorin as a reporter under moderate medium-frequency stimulation at 10 Hz for 5 s (i.e. 50 APs). In the present dataset, using a different pHluorin reporter (i.e. vGLUT1-pHluorin) that tends to yield faster endocytic responses (as noted before by the Ryan lab) and a high-frequency stimulus (20 Hz), we fail to observe a significant effect. While this cannot be excluded, we would be reluctant to conclude that these differences indicate distinct mechanisms of jasplakinolide action. Alternatively, actin may be of particular importance under conditions of high-frequency stimulation.

      In this regard, the conclusions from the pHluorin experiment would be greatly strengthened by demonstrating that jasplakinolide corrects the reduction of presynaptic actin in mDia1/3 knockdown synapses observed in figures 4E-I.

      Response: As demonstrated in Figure 4-Supplement 1G and in support of a common mechanism of action, we find that application of jasplakinolide rescues reduced presynaptic actin levels in mDia1/3-depleted neurons. The respective data for presynaptic actin (normalized to shCTR + DMSO set to 100) are: shCTR + DMSO = 100 ± 6.3; shmDia1+3 + DMSO = 47.7 ± 4.3; shCTR + Jasp = 150.6 ± 11.9; shmDia1+3 + Jasp = 94.3 ± 11.5. These data are now also quoted in the revised manuscript text.

      Minor points

      (1) There is no rationale provided regarding why different stimulation protocols are sometimes used in the pHluorin/cypHer experiments. In most cases it is 200 APs (40 Hz), however, in some cases, it is 40 APs or 80 APs. Can the authors explain why they used these different protocols?

      Response: The referee noted this correctly. This in part reflects the history of the project, in which initial datasets were acquired using 200 AP trains and pHluorin reporters. To probe whether the phenotypic effects induced by actin perturbations were robust over different stimulation paradigms and optical reporters, additional data using either 40 or 80 AP trains, as well as experiments capitalizing on vGLUT1 or endogenous vGAT monitored by pH-sensitive cypHer-labeled antibodies, were acquired. We hope the referee agrees that these additional data add to the general importance of our study.

      (2) Figure 2 - The reduction in SV density in mDia1/3 knockdown neurons correlates with the results in Figures 1 and 7. However, a functional consequence of this reduction (change in size of RRP or neurotransmitter release, as stated above) would have increased the impact of these experiments.

      Response: We agree with the referee and will address this interesting possibility using electrophysiological recordings in future studies.

      (3) It appears the experimental n in Figure 2 is profiles, rather than experiments. This should be clarified, especially since there is no reference to how many times the experiments in Fig2E-G were performed.

      Response: This point has been clarified in the revised figure legend.

      (4) Figure 6 - The authors state that inhibition of Rac function either via a dominant negative mutant or an inhibitor increases the inhibition of endocytosis via knockdown of mDia1/3. However, both interventions inhibit endocytosis themselves in the control condition. It would be informative to see the full statistical analysis of this data since there does not appear to be a significant additive effect when comparing Rac inhibition with the additional knockdown of mDia1/3.

      Response: In our revised manuscript, we now provide the full statistical analysis in the revised Source Data Table for Figures 6G,H. We observe that Rac1-DN expression indeed further aggravates phenotypes elicited by depletion of mDia1+3, but not vice versa. We have modified the corresponding section in the results section of our revised manuscript accordingly.

      (5) Figure 7 - The increase in endosomes in mDia1/3 knockdown neurons is consistent with previous studies examining pharmacological inhibition of formins (Soykan et al Neuron 2017 93:854). However, it is noted that these structures were absent in the images shown in Figure 2. Similar to the previous point in figure 6, a full reporting of the significance of different conditions is important here, since it appears that the only difference between EHT1864 and its co-incubation with mDia1/3 knockdown neurons is in the number of ELVs (Fig 7H).

      Response: Similar to the example EM images shown in Figure 7, enlarged endocytic structures are also observed in shmDia1+3 depleted synapses shown in Figure 2. However, ELVs and membrane invaginations were not color-coded as the focus in figure 2 is on the reduction of the SV pool. To better illustrate this, we have chosen a more representative example of this phenotype in revised Figure 2.

      Moreover, we now provide the full statistical analysis of EM phenotypes in the revised Source Data Table for Figure 7. We find that Rac1 inhibition indeed significantly aggravates the effects of mDia1+3 loss with respect to the accumulation of membrane invaginations, while the effect on ELVs remains insignificant. However, accumulation of ELVs in the presence of the Rac1 inhibitor EHT1864 is further aggravated upon depletion of mDia1+3. We have modified the corresponding section in the results section of our revised manuscript accordingly.

      We speculate that Rac1 may thus predominantly act at the plasma membrane, whereas mDia1/3 may serve additional functions in SV reformation at the level of ELVs. Clearly, further studies would be needed to test this idea in the future.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We have made substantial revisions to the manuscript, incorporating new data, which led to a renumbering and relabeling of several figures:

      • Figure 3F now features a modified graph color.

      • Figure 4I introduces a new experiment.

      • What was previously labeled as Figure 4I-O is now Figure 4J-P.

      • Figure 5H presents another new experiment.

      • The earlier Figure 5H is now rebranded as Figure 5I.

      • A fresh experiment has been incorporated into Supplement Figure 1a.

      • The former Supplement Figure 1a is now Supplement Figure 1b.

      • Supplement Figure 2d describes an additional new experiment.

      • In accordance with the HUGO gene nomenclature committee (HGNC) recommendations, we've updated the names of genes/proteins in both figures and their accompanying legends.

      Reviewer #1 (Recommendations For The Authors):

      Comment #1. Standard practice would include multiple TNBC cell lines to test the author's hypotheses, but the authors rely only on one cell line in the entire paper, MDA-MB-231 cells. The authors do correlate their findings to patient data, but the inclusion of an additional TNBC cell line would strengthen their findings about the L-DOXR cells and help with the assessment as to how reproducible their original microfluidics system is.

      Response: Thank you for your valuable feedback. We recognize the importance of utilizing multiple TNBC cell lines for rigorous validation and reproducibility. There are several reports highlighting the generation of L-DOXR cells in other types of breast cancer cell lines, such as MCF-7 (Fei et al., 2015), and in other cancer types like the prostate cancer cell line PC-3. These studies utilized a microfluidic device with a concentration gradient of Doxorubicin. With this existing evidence, we are confident that a variety of cancer cell types have the potential to form L-DOXR cells in a doxorubicin gradient. The cited reports support our choice of the MDA-MB-231 cell line for our current study:

      “L-DOXR cells exhibit increased genomic content (4N+) as compared to WT cells. The presence of cells with increased nuclear size and increased genomic content has been demonstrated to be associated with poor clinical outcomes in several types of cancers (Alharbi et al., 2018; Amend et al., 2019; Fei et al., 2015; Imai et al., 1999; Liu et al., 2018; Lv et al., 2014; Mukherjee et al., 2022; O’connor et al., 2002; Saini et al., 2022; Trabzonlu et al., 2023). (Page 5, Line 24)”

      However, we acknowledge the validity of your point regarding the strengthening of our findings with the inclusion of additional TNBC cell lines. We are considering expanding our research in future studies to further validate our findings across multiple TNBC cell lines. Thank you for bringing this to our attention, and we hope our response adequately addresses your concerns.

      Comment #2. It would be helpful to comment on the frequency at which doxorubicin is used clinically to treat TNBC patients. The authors equate their resistance phenotype to all chemotherapies (in patient data and title) but only test doxorubicin. Does NUPR1 overexpression result in resistance to other chemotherapies?

      Response: Thank you for raising these pertinent questions. To address your first point regarding the clinical use of doxorubicin for TNBC patients: At the Samsung Medical Center, the typical chemotherapy regimen for TNBC patients involves administering Neo. AC (Doxorubicin 34 mg + Cyclophosphamide 840 mg per session) four times, followed by Adj. D (Docetaxel 25 mg + 80 mg per session) for another four sessions. This provides insight into the clinical relevance and frequency of Doxorubicin's use in treating TNBC.

      Regarding your second point about NUPR1 overexpression and its broader implications for chemotherapy resistance: Yes, NUPR1 overexpression has been documented to result in resistance to various chemotherapies. A study by Lei Jiang et al. in the Journal of Pharmacy and Pharmacology found that NUPR1 plays a role in YAP-mediated gastric cancer malignancy and drug resistance through the activation of AKT and p21 (Jiang et al., 2021, https://doi.org/10.1093/jpp/rgab010). Additionally, another study by Wang et al. in Cell Death and Disease observed that the transcriptional coregulator NUPR1 is linked to tamoxifen resistance in breast cancer cells (Wang et al., 2021, https://doi.org/10.1038/s41419-021-03442-z). In light of this, while our study primarily focused on doxorubicin, the role of NUPR1 in resistance spans across various chemotherapeutic agents, adding depth to our findings and their broader implications in cancer therapy.

      Comment #3. The authors knockdown NUPR1 in L-DOXR cells, but overexpression of NUPR1 in WT TNBC cells to see if this renders the WT cells more resistant would be an important experiment.

      Response: We appreciate the reviewer's suggestion, which indeed underscores an important aspect of our study. In response, we have incorporated additional experiments in the revised manuscript. Specifically, on page 7 (lines 7-8) and in Supplement Figure 2c, we present data from experiments where we overexpressed Nupr1 in WT-MDA-MB231 cells. Our findings revealed that overexpression of GST-Nupr1 not only attenuates Dox-induced cell death but also mildly enhances cell viability in WT cells even without DOX treatment. This implies that cells expressing Nupr1 exhibit resistance to the cytotoxic effects of DOX. We believe these new data further solidify our conclusions and address the valuable point you raised.

      Comment #4. The similar colors/symbols chosen for the different groups in the xenograft plots are hard to easily interpret without zooming in.

      Response: We modified the xenograft plots as you recommended in Figure 3F.

      Comment #5. There are some grammatical errors throughout the paper. Below is an example: In the opening of the Discussion "TNBC is the most aggressive subtype of breast cancer, and chemotherapy is a mainstay of treatment. However, chemoresistance is common and contributes to the long-term survival of TNBC patients" - this sentence makes it seem like chemoresistance makes TNBC patients survive longer. The following sentence "These cells demonstrated a large phenotype with increased genomic content." is abrupt and doesn't make sense. Consider carefully re-reading the manuscript for grammatical errors.

      Response: Thank you for highlighting the grammatical errors and providing specific examples. We deeply apologize for the oversight. In response to your feedback, we've carefully re-reviewed the manuscript and made the necessary corrections. Based on your example, we've revised the sentences to: “TNBC is the most aggressive subtype of breast cancer, with chemotherapy being a mainstay of treatment. However, the development of chemoresistance frequently occurs and poses significant challenges to the long-term survival prospects of TNBC patients.” “As for the cells in question, they exhibited an enlarged phenotype along with an increased genomic content.”

      We appreciate your meticulous review, and we have made an effort to address and rectify other such errors throughout the manuscript.

      Reviewer #2 (Recommendations for The Authors):

      I recommend the authors to address the following minor issues. Below are specific comments on the manuscript.

      Comments # 1. Thank you for the comment. In CDRA chip, DOXR cells and L-DOXR cells appeared in the mid-DOX region. What is the concentration of DOX in this region? Can the authors calculate the concentrations of DOX in high-, mid-, and low- regions (or ranges of concentrations)?

      Response: Instead of DOX, we used FITC dye to visualize the concentration gradient over the chip, as shown below, because DOX generates very weak fluorescence.

      Author response image 1.

      While our method provides an estimate rather than a precise measurement, owing to the difference in molecular weight between FITC (389.38 g/mol) and DOX (579.98 g/mol), it is still possible to approximate the distribution of DOX concentrations across different regions. We use a formula in which the ratio of the average FITC fluorescence intensity in each region to the highest recorded fluorescence intensity is multiplied by the peak DOX concentration (1.5 μM). This approach gives an estimated average DOX concentration in each region, acknowledging that the diffusion characteristics of FITC and DOX may differ because of their different molecular weights. The formula is: estimated DOX concentration in a region = (average FITC intensity in that region / highest recorded FITC intensity) × 1.5 μM.

      With this formula we can calculate the concentration in each region. High region = 1.161 μM; Mid region = 0.554 μM; Low region = 0.098 μM.
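      The estimate described in this response, the ratio of a region's average FITC intensity to the highest recorded intensity multiplied by the 1.5 μM peak DOX concentration, can be written as a short sketch. The function name and the example intensities are hypothetical and are not the measured values behind the concentrations reported above.

```python
PEAK_DOX_UM = 1.5  # peak DOX concentration quoted in the response (micromolar)


def estimate_dox_um(mean_region_intensity, max_intensity, peak_um=PEAK_DOX_UM):
    """Estimate the mean DOX concentration (micromolar) in a chip region
    from FITC fluorescence, assuming FITC and DOX share the same gradient."""
    if max_intensity <= 0:
        raise ValueError("max_intensity must be positive")
    return mean_region_intensity / max_intensity * peak_um


# Hypothetical intensities in arbitrary units, for illustration only.
example_um = estimate_dox_um(mean_region_intensity=50.0, max_intensity=100.0)
```

      Because FITC and DOX differ in molecular weight, the result is best read as an approximate regional average rather than a calibrated concentration.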

      Comment #2. Is there any other phenotypic difference between DOXR cells and L-DOXR cells besides their size?

      Response: In addition to differences in cell size, L-DOXR cells exhibit several distinct phenotypic characteristics when compared to DOXR cells. These include variations in the cell cycle profile (as detailed in Fig. 2F-H), altered drug efflux capabilities (presented in Fig. 2I-J), and changes in nuclear morphology (illustrated in Fig. S3D). These phenotypic distinctions suggest that L-DOXR cells may have adapted unique mechanisms of resistance and survival, which are comprehensively depicted in the figures mentioned.

      Comment #3. Please add a description of abbreviations when the abbreviation is first used in the manuscript (e.g. NUPR1, HDAC11 etc.).

      Response: We corrected the mistake.

      Comment # 4. Figure 2B is the schematic of the chip, not the dimension of the chip. Please add the dimension of the chip to keep the figure caption as is or change the figure caption.

      Response: Thank you for the correction. We have changed the figure caption to "Schematic of the chip".

      Reviewer #3 (Recommendations for The Authors):

      In this manuscript, Lim and colleagues use an innovative CDRA chip platform to derive and mechanistically elucidate the molecular wiring of doxorubicin-resistant (DOXR) MDA-MB-231 cells. Given their enlarged morphology and polyploidy, they termed these cells as Large-DOXR (L-DOXR). Through comparative functional omics, they deduce the NUPR1/HDAC11 axis to be essential in imparting doxorubicin resistance and, consequently, genetic or pharmacologic inhibition of the NUPR1 to restore sensitivity to the drug. Although innovative, some deficiencies in the present manuscript slightly weaken the primary conclusions. A couple of critical issues are the use of a single cell line model (i.e., MDA-MB-231) for all the phenotypic and functional experiments and absolutely no mechanistic insights into how NUPR1 imparts resistance to doxorubicin. Some questions and comments are listed below for the authors' consideration and response:

      Major:

      Comment #1. The authors treated only the MDA-MB-231 cells with doxorubicin in the CDRA chip. Do other TNBC cell lines (namely, MDA-MB-436, HCC1187, or others) respond similarly to dox treatment, eventually yielding enlarged, aneuploid cells with the resistant phenotype? It is important to show that this phenotype is not confined to a single cell line, particularly given the numerous TNBC models that are commonly used.

      Response: Thank you for your insightful query regarding the generalizability of our findings across different TNBC cell lines. In this initial study, we focused exclusively on MDA-MB-231 cells due to their widespread use as a model for aggressive triple-negative breast cancer and the constraints of time and resources. While we cannot definitively claim that the observed phenotypic changes upon doxorubicin treatment will be identical in other TNBC cell lines such as MDA-MB-436 or HCC1187, we hypothesize that the underlying mechanisms of chemoresistance and cellular response could be similar across various TNBC models. This hypothesis is supported by literature indicating common pathways of drug resistance in TNBC. We believe that our findings lay the groundwork for future studies to explore the response of a broader range of TNBC cell lines to doxorubicin treatment. Such studies would greatly enhance our understanding of the cellular adaptations to chemotherapeutic agents in TNBC and help to validate the potential universal application of our findings.

      Comment #2: Do the L-DOXR cells permanently hold onto the enlarged and polyploid states upon prolonged culture in vitro? Does that change given the presence or withdrawal of the drug? In other words, is the physical state of the resistant cells reversible, or is it passed onto the progeny cells regardless of continued stress from the drug?

      Response: Thank you for your question about the stability of the phenotypic changes in L-DOXR cells. Our observations suggest that the enlarged and polyploid states in L-DOXR cells are not permanently fixed. When cultured in vitro over an extended period without the selective pressure of doxorubicin, we have noted that some cells may revert to a non-polyploid state. However, this reversion does not seem to be a stable change as subsequent generations can present with polyploidy again, even in the absence of the drug. This indicates a potential epigenetic or microenvironmental influence on the phenotypic state of these cells, suggesting a complex interplay between the drug-induced stress and the inherent cellular response mechanisms. Further investigation is needed to fully understand the dynamics of these phenotypic changes and whether they are heritable and/or reversible under different culture conditions.

      Comment #3: In Figures 2F-H, the authors perform DNA-staining-based FACS to estimate the ploidy of the cells. These estimations could be improved using 2D cell cycle analyses using EdU or BrdU co-treatment and staining. This would further allow a clear distinction between S-phase and G0/G1 and M-phase cells in the WT, DOXR, and L-DOXR populations.

      Response: Thank you for the suggestion to enhance the accuracy of our ploidy estimations. We appreciate the advice to implement 2D cell cycle analyses using EdU or BrdU co-treatment and staining, as this could indeed provide a clearer distinction between the various phases of the cell cycle in our WT (wild-type), DOXR (doxorubicin-resistant), and L-DOXR (large doxorubicin-resistant) cell populations. Incorporating these thymidine analogs would allow us to label newly synthesized DNA and thereby accurately delineate cells in the synthesis phase from those in the G0/G1 and M phases. This approach will likely add depth to our understanding of the cell cycle dynamics and the mechanism behind the drug resistance phenotype. We will consider incorporating these techniques in our future experiments to validate and extend the findings reported in this study.

      Comment #4. In Figure 3H, the authors quantitate the number of enlarged cells detected in human specimens of TNBC or normal breast tissues. How were these cells detected simply using the H&E staining, particularly when assessing the genomic content? Were certain size and nuclear staining intensity thresholds used for these categorizations? If so, these should be mentioned in the paper.

      Response: In our study, we identified enlarged cells within human TNBC and normal breast tissue specimens using H&E staining, and their quantitation was carried out using the Colour Deconvolution 2 plugin (Landini G et al., 2020) within the ImageJ software. This method allowed us to analyze the staining intensity and cell size systematically. To ascertain that we were indeed observing cells with increased genomic content, we established specific size and nuclear staining intensity thresholds. Cells exceeding these predetermined thresholds were categorized as 'enlarged'. Additionally, we used continuous serial slides for the human TNBC tissues microarray (BR1301, US Biomax) for more accurate comparisons in Figures 3H, I, and 5H. To strengthen our findings, we verified that NUPR1 expression, which is associated with the observed cell enlargements, was indeed elevated in these same cells from the patient samples. We have detailed these methodological aspects and the criteria for cell categorization in the 'Tissue Microarray and Immunohistochemistry' section of our Materials and Methods to ensure clarity and reproducibility of our results.

      Comment #5: In Figure 3I, the authors label the enlarged cells in the patient tissues as L-DOXR cells. Were these assessments done in dox-treated tumors? Even if that is the case, it'll be unfair to call them resistant to doxorubicin. The axis label "% enlarged cells" might be more accurate.

      Response: We appreciate the reviewer's attention to detail and agree that the terminology used in Figure 3I was inaccurate. The cells identified in patient tissues were labeled based on their morphological resemblance to L-DOXR cells observed in vitro; however, these patient tissue samples were not confirmed to be treated with doxorubicin, nor were the cells confirmed to be resistant. Therefore, we have amended the figure legend to reflect this and now refer to these cells simply as 'enlarged cells'.

      Comment #6: The authors uncovered that NUPR1 expression is dramatically increased in the L-DOXR cells vs the wild-type cells. How does the NUPR1 gene expression and activity compare between L-DOXR and DOXR MDA-MB-231 cells?

      Response: Thank you for the valuable comment. The data are included in figure supplement 3 and we have revised the manuscript as follows: “While DOXR cells exhibited a marked increase in Nupr1 expression compared to the WT cells, this expression was substantially less than that observed in L-DOXR cells, as detailed in figure supplement 3.” (Page 7, Line 3).

      Comment #7: Following from above, the authors show that NUPR1 activity is not necessary for cell survival in the absence of doxorubicin (Fig. 4H). But, does it control the cellular size and polyploid states of the L-DOXR cells? In other words, is there any association between increased size and genomic content of the cells to their sensitivity to doxorubicin? Are cells resistant to other chemotherapeutics as well? Or is the resistant phenotype specific to doxorubicin? The authors causally implicate NUPR1 in driving the dox-resistant phenotype in MDA-MB-231 cells. To fully substantiate this claim, the authors should perform gain-of-function studies, in at least 2-3 TNBC cell lines, to show that over-expression of NUPR1 alone is sufficient to impart doxorubicin resistance. Also, the most critical information missing from the study is how NUPR1 drives resistance to doxorubicin. What is the function of NUPR1 in L-DOXR cells and what gene expression program does it activate to impart the resistant phenotype?

      Response: During loss-of-function and gain-of-function experiments with Nupr1 in L-DOXR cells, we did not notice any specific changes in the cellular size or polyploid state of the cells. Although we cannot rule out the possibility that phenotypically larger cells might also arise in response to chemotherapeutics other than DOX, in the current study we found that a high level of Nupr1 expression is correlated with sensitivity to doxorubicin in L-DOXR cells. Moreover, following the reviewer’s suggestion, we performed a gain-of-function study to determine whether over-expression of NUPR1 alone is sufficient to impart doxorubicin resistance in TNBC cells. Overexpression of GST-NUPR1 attenuated DOX-induced cell death and slightly increased the viability of WT (MDA-MB-231) cells under vehicle treatment, indicating that NUPR1-expressing cells are resistant to the cytotoxic effect of DOX. We have also demonstrated that Nupr1 upregulation in L-DOXR cells is due to suppressed expression of HDAC11 in these cells, as we found that HDAC11 triggers promoter acetylation of Nupr1 in L-DOXR cells. Thus, it is conceivable that increased expression of Nupr1 upon HDAC11 suppression in L-DOXR cells is at least partly responsible for doxorubicin resistance.

      Comment #8: Do the authors speculate the dox-resistant phenotype to be restricted to basal TNBC tumors or even NUPR1-high ER+ breast cancer cells (MCF7 or T47D) would likely be resistant to doxorubicin or other chemotherapeutics?

      Response: Yes, NUPR1-high ER+ breast cancer cells (MCF7 or T47D) would likely be resistant to doxorubicin or other chemotherapeutics as reported elsewhere; Wang, L., Sun, J., Yin, Y. et al. Transcriptional coregualtor NUPR1 maintains tamoxifen resistance in breast cancer cells. Cell Death Dis 12, 149 (2021). https://doi.org/10.1038/s41419-021-03442-z

      Comment #9: The authors suggest that HDAC11 continuously deacetylates the NUPR1 promoter to suppress its expression. Consequently, does the inactivation of HDAC11 in wild-type TNBC cells lead to NUPR1 up-regulation? Is this increase in NUPR1 expression reverted upon inhibition of the HAT machinery (say P300/CBP) in HDAC11-deficient TNBC cells?

      Response: In the revised manuscript (pg 8, lines 14-16 and Fig 5H), we show that, whereas overexpression of HDAC11 suppresses the expression of Nupr1 in both WT and L-DOXR cells, HDAC11 inhibitor treatment enhances Nupr1 expression in WT cells. This is consistent with the unusually low expression of HDAC11 and high level of Nupr1 in L-DOXR cells. Conceivably, the increased Nupr1 expression reflects restored promoter acetylation.

      Minor:

      Comment #10: In Figure 4L, how many animals or tumors were in each of the treatment arms? Were the weights of all the tumors recorded as well? It would be meaningful to add this data, if available. The authors keep changing gene nomenclature throughout the manuscript, listing the gene names in either capital letters or the small-case. This can be made consistent.

      Response: We used 6 mice per group, with one tumor per mouse, due to the tumor size of L-DOXR in the vehicle group. We also added new data showing the weights of the tumors in Figure supplement 2D. We apologize for the inconsistent gene names. Following the reviewer’s suggestion, the names of genes/proteins in the figures and legends have been changed to follow the recommendations of the HUGO Gene Nomenclature Committee (HGNC).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thanks for your comments and suggestions concerning our manuscript entitled “miR-252 targeting temperature receptor CcTRPM to mediate the transition from summer-form to winter-form of Cacopsylla chinensis”. These comments are all of great importance and extremely helpful for revising and improving our manuscript. We have revised the manuscript carefully according to all your comments. Our point-by-point responses to the comments are listed below.

      Reviewer #1 (Recommendations For The Authors):

      1) If the authors wish to improve their phylogenetic analysis, I strongly suggest using their hemipteran sequences alongside the Drosophila homolog and at least all of the human paralogs. This should be generally sufficient to recapitulate the generally accepted TRPM phylogeny. If the authors contend that this is in fact a separate lineage from other insect TRPMs, a phylogeny that is as taxonomically inclusive as possible, and as methodologically rigorous as possible, would be ideal.

      Thanks for your great suggestion. We have redone the phylogenetic analysis in Figure S1B using the CcTRPM sequence with 16 homologs from other species, including 8 human paralogs, 1 Mus musculus homolog, 1 Drosophila homolog, and 6 insect homologs. The relevant description was added in Line 489-491 and Line 1044-1049 of our revised manuscript.

      2) If the authors wish to conclude that this is a cold-sensitive ion channel, I strongly suggest repeating at least the Ca2+ imaging with a cold stimulus. In the absence of this experiment, I think that the conclusions need to be significantly softened/hedged, making it clear that the only evidence of cold sensitivity is indirect (resulting from the knockdown experiments).

      Thanks for your excellent suggestion. We have performed Ca2+ imaging with a cold stimulus of 10°C. As expected, a clear increase in Ca2+ concentration was observed upon treatment with the 10°C cold stimulus, similar to that seen with menthol treatment. We therefore conclude that CcTRPM is a direct cold-sensitive ion channel in C. chinensis. We have also added the Ca2+ imaging result with a cold stimulus of 10°C to Figure 2D and moved the results of Ca2+ imaging with menthol treatment to Figure S2I. The related results and methods were added in Line 193-200, Line 919-923, and Line 1065-1069 of our revised manuscript.

      3) Lines 173 and 181: The method used to identify the putative transmembrane domains was not described (although the 3D model does have the correct TRP structure, these methodological details would be appreciated).

      Thanks for your great suggestion. We used the online tool SMART (Simple Modular Architecture Research Tool) to identify the putative transmembrane domains of CcTRPM, and have added these methodological details in Line 485-487 of the Materials and Methods of our revised manuscript.

      4) Lines 176-178: The authors state that "phylogenetic analysis revealed that CcTRPM was most closely related to the DcTRPM homologue (Diaphorina citri, XP_017299512.2), which was consistent with the evolutionary relationships predicted from the multiple alignment of amino acid sequences." The meaning of this sentence is unclear to me. I'm not sure what it means to be "consistent with the evolutionary relationships predicted from the multiple alignment of amino acid sequences."

      Thanks for your excellent suggestion. We have revised this sentence in Line 176 to 179 of our revised manuscript.

      5) Lines 474-475: The authors state that the NCBI database was used to identify homologous sequences, but there isn't sufficient methodological detail to repeat the search. For example, was this a BLASTP search? Was it taxonomically restricted? What statistical thresholds for homology inference were used? These details would be much appreciated.

      Thanks for your great suggestion. We used BLASTP against the NCBI database to identify homologous sequences, and preferred representative species for which TRPM sequences have been reported. We have added more methodological detail on the phylogenetic analysis in Line 489 to 491 of our revised manuscript.

      6) It would be very interesting, but not critical, to know if menthol and borneol alone have an effect on cuticle thickness.

      Thanks for your excellent suggestion. We did test the effect of menthol and borneol alone on cuticle thickness at the outset. At 25°C, treatment with menthol or borneol alone induced a 30-40% transition of 1st instar nymphs from summer-form to winter-form, but had only a slight effect on cuticle thickness, weaker than that of the 10°C low temperature, because of the opposing effect of 25°C. At 10°C, however, we could not determine whether the effect on cuticle thickness came from the low temperature or directly from menthol and borneol.

      7) It would be interesting, but not critical, to confirm the authors' ab initio protein folding by comparing their model to the AlphaFold2-derived model, either by folding it themselves or extracting it from the AlphaFold Protein Structure Database, if it has already been folded by DeepMind.

      Thanks for your great suggestion. We have predicted the tertiary structure of CcTRPM with AlphaFold2, and the result is shown in Author response image 1. Compared with the result in Figure 2A, the conserved ankyrin repeats (ANK) and six transmembrane domains were highly similar.

      Author response image 1.

      The tertiary structures of CcTRPM predicted with AlphaFold2 software.

      8) Figures 1F-G, 3F, 4A-B, 5G-J, S6C, and S7C-D do not plot replicates (although these are plotted in other figures).

      Thanks for your excellent suggestion. Figure 1F-G is a stacked grouped graph, to which replicates could not be added. We have added the replicates to Figures 3F, 4A-B, 5G-J, S6C, and S7C-D of our revised manuscript.

      9) Figure 5A-C, and associated text: The significance of these findings is somewhat lost on me, coming from a position of general naivety concerning chitin biosynthesis. My interpretation of Figure 5A was that each of these steps was a necessary component of chitin biosynthesis. It was thus surprising that not all of the steps were required. I think it would be exceptionally helpful if the authors spent more time describing this pathway, alternative pathways to generating the intermediate steps, and ultimately, their hypothesis of why only two steps seem critical.

      Thanks for your great suggestion. The chitin biosynthesis pathway in Figure 5A was modified from Doucet and Retnakaran, 2012. De novo biosynthesis of chitin has eight enzymatic steps, including 1 Trehalose, 2 enzymes in Glycolysis, 4 enzymes in Hexosamine pathway, and 1 Chitin synthesis. Glycolysis and the hexosamine pathway are two complex cellular metabolic processes. We suppose there are two reasons why not all of these steps were required: (1) the function of some enzymes may be replaced or supplemented by other enzymes; for example, hexokinase and glucokinase have similar functions. (2) The absence of obvious phenotypic defects might be caused by insufficient RNAi interference efficiency. It is therefore worth studying the functions of these chitin biosynthesis enzymes further using CRISPR-Cas9 in future. We have added more description of the chitin biosynthesis pathway in Line 379-390 of our revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      1) Line 19, should be morphological transition.

      Thanks for your excellent suggestion. We have changed “behavioral transition” to “morphological transition” in Line 19 of our revised manuscript.

      2) Line 21, delete the novel.

      Thanks for your excellent suggestion. We have deleted the word “novel” in Line 21 of our revised manuscript.

      3) Fig. 2B, did authors examine the CcTRPM expression level before 3 d? Given that CcTRPM acts as a cold sensor, it is supposed to respond to temperature change quickly.

      Thanks for your excellent suggestion. We have examined the CcTRPM expression level at 1 d and 2 d after 10°C treatment compared with 25°C treatment. As expected, CcTRPM expression levels were also clearly increased at 1 d and 2 d after 10°C treatment. We have added the relevant results in Figure S2F and the relevant description in Line 184-185, Line 500, and Line 1059-1060 of our revised manuscript.

      4) Fig. 2I, from the figure legend and the text in the panel, it's hard for readers to understand what the authors intend to say. This data is important since knockdown of CcTRPM decreases the winter-form from 90% to 30% at 10℃. Provide more information in the figure legend.

      Thanks for your excellent suggestion. We have added more information in the figure legend of Figure 2I in Line 933-939 of our revised manuscript.

      5) Line 224, ...CcTRPM functions as a molecular switch to modulate the transition from .... The phrase 'molecular switch' is inappropriate because knockdown of CcTRPM partially decreases the form ratio as shown in Fig.2I instead of reversing the effect completely. So, use other words instead of 'molecular switch'.

      Thanks for your excellent suggestion. We have changed “a molecular switch” to “an essential molecular signal” in Line 225 of our revised manuscript.

      6) Fig. 4G, this data is important. It's nice to see that this data is provided.

      Thanks for your excellent suggestion. We have provided the data of Figure 4G in Table S2 of our revised manuscript.

      7) Authors showed that CcTRPM functions as a cold receptor to regulate the transition of C. chinensis from summer-form to winter-form. Does this mean that a heat receptor gene functions oppositely by transiting winter-form into summer-form? Did the authors test the function of a heat TRP in the form transition? At least, discuss this in the discussion part.

      Thanks for your excellent suggestion. The TRPV ion channel has been reported to function as a heat receptor in mammals by David Julius (Caterina et al., 1997; Cao et al., 2013). We therefore suppose that TRPV may function as a heat receptor to induce the transition from winter-form to summer-form in C. chinensis. The relevant tests are ongoing. We have added two references in Line 681-686 and some discussion about the heat receptor in Line 341-345 of our revised manuscript.

      8) Line 433, which tissue was used for transmission electron microscopy?

      Thanks for your excellent suggestion. The thorax was used for transmission electron microscopy, and we have added the information in Line 448 and Line 453 of our revised manuscript.

      9) How is the conservation of miR-252? Does the regulatory role of CcTRPM and miR-252 apply to the psylla family in addition to C. chinensis?

      Thanks for your excellent suggestion. Besides C. chinensis, summer-form and winter-form morphs also exist in other psylla species, such as Cyamophila willieti. Because no genomic information has been reported for most psylla species, we could not evaluate the conservation of miR-252 among psylla species. However, it would be worthwhile and interesting to clarify in future whether the functions of TRPM and miR-252 are conserved.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a valuable study in which the authors provide an expression profile of the human blood fluke, Schistosoma mansoni. A strength of this solid study is in its inclusion of in situ hybridisation to validate the predictions of the transcript analysis.

      We thank the reviewers and the editor for their effort and expertise in reviewing our manuscript. We have made changes based on the reviews and believe this has greatly strengthened our manuscript. We appreciate their insightful comments and suggestions.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, the authors provide a valuable transcriptomic resource for the intermediate free-living transmission stage (miracidium larva) of the blood fluke. The single-cell transcriptome inventory is beautifully supplemented with in situ hybridization, providing spatial information and absolute cell numbers for many of the recovered transcriptomic states. The identification of sex-specific transcriptomic states within the populations of stem cells was particularly unexpected. The work comprises a rich resource to complement the biology of this complex system, however falls short in some technical aspects of the bioinformatic analyses of the generated sequence data.

      (1) Four sequencing libraries were generated and then merged for analysis, however, the authors fail to document any parameters that would indicate that the clustering does not suffer from any batch effects.

      We thank the reviewer for this comment which has given us the opportunity to elaborate on this interesting point. Consequently, we have added evidence to show that the data do not suffer from batch effects between samples (e.g. between sorted samples 1 and 4, and unsorted samples 2 and 3). We now show that there are contributions to all clusters from sorted and unsorted samples and highlight the benefits to using both conditions in a cell atlas with unknown cell types.

      Accordingly, we have now added the following paragraph to line 153:

      There were contributions from sorted and unsorted samples in almost all clusters (except ciliary plates). We found that some cell/tissue types had similar recovery from both methods (e.g. Stem A, Muscle 2, and Tegument), others were preferentially recovered by sorting (e.g. Neuron 1, Neuron 4, and Stem E), and some were depleted by sorting (e.g. Parenchyma 1, Protonephridia, and Ciliary plates) (Supplementary Figure 1, Supplementary Table 4). This variation in recovery, therefore, enabled us to maximise the discovery and inclusion of different cell types in the atlas.

      We have now added a Supplementary Figure 1 showing the contribution of sorted and unsorted cells to the Seurat clusters. We have also included a Supplementary Table 4 detailing the cell number contribution for both conditions and the percentages in order to easily compare differential recovery between cell types.

      These are added to the manuscript.
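
      The per-cluster contribution summary described above can be sketched as follows. This is a minimal illustration only: the function name `contribution_table`, the input layout (one `(cluster, condition)` pair per cell) and the toy counts are assumptions made for the example, not the study's data or code.

```python
from collections import Counter


def contribution_table(cells):
    """Summarise how each condition contributes to each cluster.

    cells: iterable of (cluster, condition) pairs, one pair per cell.
    Returns {cluster: {condition: (cell_count, percent_of_cluster)}}.
    """
    counts = Counter(cells)
    clusters = sorted({cluster for cluster, _ in counts})
    conditions = sorted({condition for _, condition in counts})
    table = {}
    for cluster in clusters:
        # Missing (cluster, condition) pairs count as 0 cells.
        total = sum(counts[(cluster, c)] for c in conditions)
        table[cluster] = {
            c: (counts[(cluster, c)], 100.0 * counts[(cluster, c)] / total)
            for c in conditions
        }
    return table


# Toy, made-up labels (hypothetical numbers, not taken from the manuscript).
toy_cells = (
    [("Stem A", "sorted")] * 50 + [("Stem A", "unsorted")] * 50
    + [("Neuron 1", "sorted")] * 80 + [("Neuron 1", "unsorted")] * 20
    + [("Ciliary plates", "unsorted")] * 10
)
table = contribution_table(toy_cells)
for cluster, row in table.items():
    print(cluster, row)
```

      A cluster with cells from only one condition (here the toy "Ciliary plates") shows up as a 0% contribution from the other. Raw percentages also depend on how many cells each condition contributed overall, so a real comparison would probably normalise for that.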

      (2) Additionally, the authors switch between analysis platforms without a clear motivation or explanation of what the fundamental differences between these platforms are. While in theory, any biologically robust observation should be recoverable from any permutation of analysis parameters, it has been recently documented that the two popular analysis platforms (Seurat - R and scanPy python) indeed do things slightly differently and can give different results (https://www.biorxiv.org/content/10.1101/2024.04.04.588111v1). For this reason, I don't think that one can claim that Seurat fails to find clusters resolved by SAM without running a similar pipeline on the cluster alone as was done with SAM/scanPy here. The manuscript itself needs to be checked carefully for misleading statements in this regard.

      We thank the reviewer for this comment and agree that it’s important to increase the clarity on this matter. We have added additional detail to explain that results of subclustering Neuron 1 using Seurat and SAM/ScanPy were broadly similar, but that we presented the results from the SAM/ScanPy analysis due to the strengths of SAM in detecting small differences in gene expression (Tarashansky et al., 2019 PMID: 31524596). We have included here the UMAP showing subclustering of Neuron 1 in Seurat for comparison.

      Author response image 1.

      UMAP showing subclustering of Neuron 1 cluster in Seurat (SCT normalisation, PC = 19, resolution = 0.3).

      We’ve added this additional text to the ‘Neuron abundance and diversity’ section on line 220:

      We explored whether Neuron 1 could be further subdivided into transcriptionally distinct cells by subclustering (Supplementary Figure 2; Supplementary Table 6) using the self-assembling manifold (SAM) algorithm (Tarashansky et al., 2019) with ScanPy (Wolf et al., 2018), given its reported strength in discerning subtle variation in gene expression (Tarashansky et al., 2019), although a similar topology was subsequently found using Seurat.

      (3) Similarly, the manuscript contains many statements regarding clusters being 'connected to', or forming a 'bridge' on the UMAP projection. One must be very careful about these types of statements, as the relative position of cells on a reduced-dimension cell map can be misleading (see Chari and Pachter 2023). To support these types of interpretations, the authors should provide evidence of gene expression transitions that support connectivity as well as stability estimates of such connections under different parameter conditions. Otherwise, these descriptors hold little value and should be dropped and the transcriptomic states simply defined as clusters with no reference to their positions on the UMAP.

      We thank the reviewer for this thoughtful comment. We agree and have rephrased those statements accordingly e.g. line numbers 218, 439, 543, and 557.

      (4) The underlying support for the clusters as transcriptomically unique identities is not well supported by the dot plots provided. The authors used very permissive parameters to generate marker lists, which hampers the identification of highly specific marker genes. This permissive approach can allow for extensive lists of upregulated genes for input into STRING/GO analyses, this is less useful for evaluating the robustness of the cluster states. Running the Seurat::FindAllMarkers with more stringent parameters would give a more selective set of genes to display and thereby increase the confidence in the reader as to the validity of profiles selected as being transcriptomically unique.

      The Reviewer is correct in noting that we used a permissive approach to enable a better understanding of the biology of each cluster, based on analysing enriched functions. However, we disagree about the suitability of the approach for finding markers. First, the permissive approach produced longer candidate lists, but those with the best AUC scores for each cluster are at the top of the list for each cluster. Second, some of the markers with lower expression also revealed interesting biology (e.g. Notum in the muscles). Furthermore, we used filtering on the marker genes lists to increase the minimum marker gene scores for analyses such as the GO analyses (details in the GO section of the methods). It’s important to stress that our approach also utilised validation by FISH for top marker genes, as well as biologically informative genes that were lower down the marker gene list.
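
      The argument that a permissive cut-off lengthens the marker list without displacing the best-scoring markers can be illustrated with a small sketch. It assumes that markers are ranked by a per-gene AUC (the probability that a cell inside the cluster has higher expression than a cell outside it); the functions `marker_auc` and `rank_markers`, the gene names and the expression values are all hypothetical and are not the authors' pipeline.

```python
def marker_auc(in_cluster, out_cluster):
    """AUC of one gene as a classifier for cluster membership.

    Equals the fraction of (inside, outside) cell pairs in which the inside
    cell has the higher expression; tied pairs count as half a win.
    """
    wins = 0.0
    for x in in_cluster:
        for y in out_cluster:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(in_cluster) * len(out_cluster))


def rank_markers(expr, labels, cluster, min_auc=0.5):
    """Return (gene, AUC) pairs with AUC >= min_auc, best first."""
    in_idx = [i for i, lab in enumerate(labels) if lab == cluster]
    out_idx = [i for i, lab in enumerate(labels) if lab != cluster]
    scored = []
    for gene, values in expr.items():
        auc = marker_auc([values[i] for i in in_idx],
                         [values[i] for i in out_idx])
        if auc >= min_auc:
            scored.append((gene, auc))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: six cells, three hypothetical genes.
labels = ["A", "A", "A", "B", "B", "B"]
expr = {
    "gene_specific": [5, 6, 7, 0, 0, 1],
    "gene_weak": [2, 1, 3, 1, 2, 0],
    "gene_flat": [1, 1, 1, 1, 1, 1],
}
permissive = rank_markers(expr, labels, "A", min_auc=0.5)
strict = rank_markers(expr, labels, "A", min_auc=0.9)
print(permissive)
print(strict)
```

      In this toy case the stricter threshold only truncates the list: the top-ranked gene is the same under both settings, while the weaker candidates survive only in the permissive list.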

      (5) Figure 5B shows a UMAP representation of cell positions with a statement that the clustering disappears. As a visual representation of this phenomenon, the UMAP is a very good tool, however, to make this statement you need to re-cluster your data after the removal of this gene set and demonstrate that the data no longer clusters into A/B and C/D.

      We’ve added Supplementary Figure 13 to show that after removing WSR and ZSR genes and reclustering, the data no longer clusters in A/B and C/D, even at a higher resolution where clusters appear oversplit.

      Also, as a reader, these data beg the question: which genes are removed here? Is there an over-representation of any specific 'types' of genes that could lead to any hypotheses of the function? Perhaps the STRING/GO analyses of this gene set could be informative.

      We have performed GO-enrichment analyses on W-specific genes, Z-specific genes and both together compared to the rest of the genome, but we did not find very informative results (see Supplementary Table 13 that we have now added, line 464). This may be due to the large difference in size: there are approx. 900 Z-specific genes (males two copies, females one copy), but only approx. 30 W-specific genes, many of which have homologs in the Z-specific region of the genome. Instead, we suggest that tissue-specific regulation of gene dosage compensation is the more likely explanation, as reported for other species (Valsecchi et al. 2018).
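
      One way gene-set size can affect an enrichment result is statistical power, which the sketch below illustrates with a one-sided hypergeometric test. This is an assumption about the kind of test involved, not a description of the authors' GO pipeline; the function name `hypergeom_enrichment_p`, the genome size of 10,000 genes and the annotation counts are invented for illustration. Only the approximate set sizes (30 and 900) echo the response.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, annotated_genes, set_size, annotated_in_set):
    """One-sided hypergeometric p-value for over-representation.

    Probability of seeing at least `annotated_in_set` annotated genes when
    `set_size` genes are drawn without replacement from `total_genes`, of
    which `annotated_genes` carry the annotation.
    """
    upper = min(set_size, annotated_genes)
    denom = comb(total_genes, set_size)
    numer = sum(
        comb(annotated_genes, k) * comb(total_genes - annotated_genes, set_size - k)
        for k in range(annotated_in_set, upper + 1)
    )
    return numer / denom


# Hypothetical background: 10,000 genes, 200 of them (2%) carry some GO term.
# Both gene sets below show roughly three-fold enrichment for that term.
p_small = hypergeom_enrichment_p(10_000, 200, 30, 2)    # 2 of 30 genes
p_large = hypergeom_enrichment_p(10_000, 200, 900, 54)  # 54 of 900 genes
print(p_small, p_large)
```

      With a similar fold enrichment, the small set gives a much weaker p-value than the large one, so a list of about 30 genes may return few significant terms even if a real functional bias exists.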

      (6) How do the proportions of cell types characterized via in situ here compare to the relative proportions of clusters obtained? It does not correspond to the percentages of the clusters captured (although this should be quantified in a similar manner in order to make this comparison direct: 10,686/20,478 = ~50% vs. 7%), how do you interpret this discrepancy? While this is mentioned in the discussion, there is no sufficient postulation as to why you have an overabundance of the stem cells compared to their presence in the tissue. While it is true that you could have a negative selection of some cell types, for example as stated the size of the penetration glands exceeds both that of the 10x capabilities (40uM), and the 30uM filters used in the protocol, this does not really address why over half of the captured cells represent 'stem cells'. A more realistic interpretation would be biological rather than merely technical. For example, while the composition of the muscle cells and the number of muscle transcriptomes captured are quite congruent at ~20%, the organism is composed of more than 50% of neurons, but only 15% of the transcriptomic states are assigned to neuronal. Could it be that a large fraction of the stem cells are actually neural progenitors? Are there other large inconsistencies between the cluster sizes and the fraction of expected cells? Could you look specifically at early transcription factors that are found in the neurons (or other cell types) within the various stem cell populations to help further refine the precursor/cell type relationships?

      Yes, it is really interesting that more than 50% of cells in the animal are neurons whereas more than 50% of cells in scRNAseq data are stem cells. This dataset provides a unique opportunity to compare tissue composition in the whole animal to the corresponding single cell RNAseq dataset.

      The table (in Supplementary Table 17) shows the percentage of cells from each tissue type in the miracidium (identified via in situ hybridisation of tissue-type marker genes) and in the scRNAseq to understand this phenomenon.

      This table shows that the single cell protocol used in this study negatively selected for nerves and tegument, and positively selected for stem and parenchyma. The composition of the muscle and protonephridia cells and the number of muscle and protonephridia transcriptomes captured are quite congruent.
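
      The comparison between tissue composition and scRNA-seq composition can be expressed as a log2 recovery ratio, as in the sketch below. The stem-cell cluster count (10,686 of 20,478 cells) and the approximate percentages are the figures quoted in the reviewer's comment, not the values in Supplementary Table 17; the function name and the choice of a log2 ratio are assumptions made for illustration.

```python
from math import log2

# (percent of cells in the animal by in situ, percent of cells in scRNA-seq).
# Approximate figures quoted in the review comment above, used for illustration.
composition = {
    "Stem": (7.0, 100.0 * 10686 / 20478),
    "Muscle": (20.0, 20.0),
    "Neuron": (50.0, 15.0),
}


def recovery_log2_ratio(in_situ_pct, scrna_pct):
    """Positive: over-represented in scRNA-seq; negative: under-represented."""
    return log2(scrna_pct / in_situ_pct)


ratios = {
    tissue: recovery_log2_ratio(in_situ, scrna)
    for tissue, (in_situ, scrna) in composition.items()
}
for tissue, ratio in ratios.items():
    print(f"{tissue}: {ratio:+.2f}")
```

      A value near zero (muscle) indicates congruent recovery, a positive value (stem) positive selection by the protocol, and a negative value (neurons) negative selection.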

      This technical finding is also biologically consistent. For instance, the tegument cells span the body wall muscles, with the cell bodies below and a syncytial layer above. It is not known how the tegument fragments during the dissociation process, and which parts of the cells get packaged by the 10X GEMs. Because of tegumental structure, the cells are likely prone to damage, and we speculate that this is why the tegument cells are under-represented in our 10X data. Unusually shaped fragments may not have been captured in 10X GEMs and, of those that were, damaged or distressed tegument cells/fragments may have been excluded post-sequencing by QC filters including cell calling, mitochondrial percentage and low transcript count (e.g. a tegumental fragment with 100 transcripts would not have passed QC). Stem cells are spherical with a large nucleus:cytoplasm ratio, likely making them more robust during dissociation and more likely to be captured in 10X GEMs.
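
      How post-sequencing QC filters could remove damaged cells or fragments can be sketched as below. The thresholds (`min_transcripts=500`, `max_mito_pct=5.0`), the `Barcode` class and the example barcodes are hypothetical and chosen only for illustration; the response does not state the cut-offs that were used, only that a fragment with 100 transcripts would fail.

```python
from dataclasses import dataclass


@dataclass
class Barcode:
    name: str
    total_transcripts: int
    mito_transcripts: int


def passes_qc(cell, min_transcripts=500, max_mito_pct=5.0):
    """Keep a barcode only if it has enough transcripts and a low mitochondrial share.

    Both thresholds are hypothetical defaults, not the study's settings.
    """
    if cell.total_transcripts == 0 or cell.total_transcripts < min_transcripts:
        return False
    mito_pct = 100.0 * cell.mito_transcripts / cell.total_transcripts
    return mito_pct <= max_mito_pct


barcodes = [
    Barcode("intact_stem_cell", 4000, 80),    # 2% mitochondrial
    Barcode("tegument_fragment", 100, 3),     # too few transcripts
    Barcode("distressed_cell", 2000, 400),    # 20% mitochondrial
]
kept = [b.name for b in barcodes if passes_qc(b)]
print(kept)
```

      Under these assumed thresholds only the intact cell is retained, which is the kind of selective loss the response proposes for tegument.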

      We don’t think that a large fraction of the stem cells are actually neural progenitors because:

      (1) We used previously reported marker genes of different tissue types to identify the single cell RNAseq clusters, e.g. Ago2-1 for stem cells, which has been used in multiple life stages.

      (2) The stem cell transcriptomes express many previously reported stem cell marker genes.

      (3) We found that the stem cells from the single cell data generally had higher numbers of transcripts than the other cell types, which is consistent with the Wang et al. 2013 observation that the RNA marker POPO-1 could distinguish germinal (stem) cells from other cell types as they are RNA rich.

      (4) We also found higher numbers of ribosomal related transcripts in our stem cell transcriptomes, which is consistent with Pan’s observation that part of the distinct morphology of stem cells is densely packed ribosomes in the cytoplasm.

      In order to elaborate on this discussion we have generated new visualisations:

      (1) A UMAP of the stem cell marker ago2-1 (Supplementary figure 10), to further illustrate our evidence in classifying the stem cell clusters

      (2) A co-expression plot of the stem cell marker ago2-1 with neural marker complexin to confirm that there is little coexpression (the most coexpression being in Neuron 1 and Stem F). We identified that 15.56% of cells in the Stem F cluster show some expression of complexin (neural marker), suggesting that a small fraction of Stem F may be early/precursor neurons, but the gene expression indicates that the majority of cells in Stem F are more likely to be stem cells than any other tissue type. There is little to no complexin expression in the other stem clusters.

      (3) Expression plots of the 5 neurogenins (TFs involved in neuronal differentiation) we could identify using WormBase ParaSite in these data. Four of the five showed very little expression, and not in specific clusters. The fifth (Smp_072470) showed slightly more expression, though still sparse, mostly across the stem and neural clusters, but not enough to indicate that any of the stem clusters are neural progenitors.
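
      The percentage quoted in point (2) is the share of cells in one cluster with detectable expression of a marker. A minimal sketch of that calculation follows, assuming per-cell counts for the marker and a per-cell cluster label are available as plain lists. The toy counts and the function name are hypothetical and are not taken from the dataset.

```python
def fraction_expressing(marker_counts, cluster_labels, cluster, threshold=0):
    """Fraction of cells in `cluster` whose marker count exceeds `threshold`."""
    selected = [
        count
        for count, label in zip(marker_counts, cluster_labels)
        if label == cluster
    ]
    if not selected:
        raise ValueError(f"no cells labelled {cluster!r}")
    return sum(count > threshold for count in selected) / len(selected)


# Hypothetical toy data: complexin counts for eight cells and their clusters.
complexin_counts = [0, 2, 0, 0, 1, 0, 5, 3]
labels = [
    "Stem F", "Stem F", "Stem F", "Stem F",
    "Stem A", "Stem A", "Neuron 1", "Neuron 1",
]

stem_f_fraction = fraction_expressing(complexin_counts, labels, "Stem F")
print(f"{100 * stem_f_fraction:.2f}% of Stem F cells express the marker")
```

      The `threshold` argument makes explicit what "some expression" means; a value of 0 counts any cell with at least one transcript.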

      Author response image 2.

      Coexpression UMAP showing the expression of stem cell marker Ago2-1 and neural marker complexin.

      Author response image 3.

      UMAPs showing the expression of five putative neurogenins of S. mansoni.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript the authors have generated a single-cell atlas of the miracidium, the first free-living stage of an important human parasite, Schistosoma mansoni. Miracidia develop from eggs produced in the mammalian (human) host and are released into freshwater, where they can infect the parasite's intermediate snail host to continue the life cycle. This study adds to the growing single-cell resources that have already been generated for other life-cycle stages and, thus, provides a useful resource for the field.

      Strengths:

      Beyond generating lists of genes that are differentially expressed in different cell types, the authors validated many of the cluster-defining genes using in situ hybridization chain reaction. In addition to providing the field with markers for many of the cell types in the parasite at this stage, the authors use these markers to count the total number of various cell types in the organism. Because the authors realized that their cell isolation protocols were biasing the cell types they were sequencing, they applied a second method to help them recover additional cell types.

      Schistosomes have ZW sex chromosomes and the authors make the interesting observation that the stem cells at this stage are already expressing sex (i.e. W)-specific genes.

      Weaknesses:

      The sample sizes upon which the in situ hybridization results and cell counts are based are either not stated (in most cases) or are very small (n=3). This lack of clarity about biological replicates and sample sizes makes it difficult for the reader to assess the robustness of the results and the extremely small sample sizes (when provided) are a missed opportunity to explore the variability of the system, or lack thereof.

      We have now added more details about the methods we used for validating cell type marker genes by in situ hybridisation. We have added to the methods that ‘We carried out at least three in situ hybridisation experiments for each marker gene we validated (each experiment was a biological replicate). From each experiment we imaged (by confocal microscopy) at least 10 miracidia (technical replicates) per marker gene experiment.’ on line 1036.

      In the figure legends we have added the number of miracidia that were screened, and documented the percentage of the screened larvae that showed the in situ gene expression pattern that is seen in the images in the figures, and that we described in the text.

      We manually segmented the nuclei of pan tissue marker genes, and we did this for one miracidium in the case of all tissues, except stem cells where we segmented stem cells in five larvae. Manual segmentation of gene expression in a confocal z-stack is very time consuming. We consider that the variability of different cell and tissue types (stereotypy) between miracidia is beyond the scope of this paper and can be investigated in future work.

      Although assigning transcripts to a given cell type is usually straightforward via in situ experiments, the authors fail to consider the potential difficulty of assigning the appropriate nuclei to cells with long cytoplasmic extensions, like neurons. In the absence of multiple markers and a better understanding of the nervous system, it seems likely that the authors have overestimated the number of neurons and misassigned other cell types based on their proximity to neural projections.

      This is a valid point, and we acknowledge the difficulties of assigning a nucleus to a cell using mRNA expression only and in the absence of a cell membrane marker. We tried to address this issue by labelling the cell membranes using an antibody against beta catenin after the HCR in situ protocol. This method has been used successfully on sections on slides (Schulte et al., 2024), but we failed to get usable results in our miracidia whole-mounts. The beta catenin localisation marked the membranes of the gland cells but didn’t do the same for the neurons or other cell types (see image below).

      Author response image 4.

      Image showing a maximum intensity projection of a subvolume of a confocal z-stack of a miracidium whole-mount in situ hybridisation (by HCR) for paramyosin counterstained with a beta catenin antibody (1:600 concentration of Sigma C2206). The cell membrane of a lateral gland is clearly labelled, but those of the neurons of the brain and the paramyosin+ muscle cells are not.

      Our observation that 57% of the cells in a miracidium are nerves is high compared to the C. elegans hermaphrodite adult, in which 302 out of 959 cells are neurons (Hobert et al., 2016); however, few studies have equivalent data with which to make comparisons. Despite this, and the limitation described above, we believe that we have not overestimated the number of neural cells. During the process of validating the marker genes and closely examining gene expression in hundreds of miracidia, we noted that the nuclei of different tissue types are distinct and recognisable (see figure below). The nuclei of stem, tegument and parenchymal cells are comparatively large and spherical with obvious nucleoli (i). The four nuclei of the apical gland cell are angular, pentagonal in shape and sit adjoining each other (inside red dashed circle, i-iii); those of the two lateral glands are bilaterally symmetrical and surrounded by flask shaped cytoplasm (arrows, iv). The nuclei of the body wall muscle cells are peripheral and flattened on the outer edge (iii). The notum+ muscle cell nuclei are anterior of the apical gland (manuscript Figure 2E). The only other two tissue types are the nerves and protonephridia, and their nuclei are smaller and more compact/condensed. In situ expression of the protonephridia marker suggests that 6 cells make up the protonephridial system (manuscript Figure 4 B&E). Therefore, by process of elimination, the remaining nuclei should belong to neurons. The complexin expression pattern supports this and we counted 209 nuclei that were surrounded by cpx transcript expression. To help the reader interpret this for themselves we have added confocal z-stacks of miracidia where tissue level markers have been multiplexed (supplementary videos 18-20). We counted all tissue type cells individually and the tissue type cell numbers added up to the overall cell count.

      Author response image 5.

      Image showing the diversity of nucleus morphology between tissue types in the miracidium.

      Biologically, it is not surprising that this larva is dominated by neural cells. It must navigate a complex aquatic environment and identify a suitable mollusc host in less than 12 hours. It is a non-feeding vehicle that must deliver the stem cells to a suitable environment where they can develop into the subsequent life cycle stage. Accordingly, the cell type composition reflects this challenge.

      The conclusion that germline genes are expressed in the miracidia stem cells seems greatly overstated in the absence of any follow-up validation. The expression scales for genes like eled and boule are more than 3 orders of magnitude smaller than those used for any of the robustly expressed genes presented throughout the paper. These scales are undefined, so it isn't entirely clear what they represent, but neither of these genes is detected at levels remotely high (or statistically significant) enough to survive filters for cluster-defining genes.

      Given that germ cells often develop early in embryogenesis and arrest the cell cycle until later in development, and that these transcripts reveal no unspliced forms, it seems plausible that the authors are detecting some maternally supplied transcripts that have yet to be completely degraded.

      We agree that the expression of genes such as eled and boule is low. We made this clear in the figure legends and text, and have now added scale information to the figure legends. We did not explore these genes as cluster-defining genes, partly due to their comparatively low levels of expression, but as genes already reported to be important in germ line specification. We found the expression of these genes to be consistent with our hypothesis that the Kappa stem cells may include germ line segregated cells, but our hypothesis does not rest on these lower-expressed genes.

      It is certainly possible that we have detected some maternally supplied transcripts in the miracidia stem cells. However, experiments to distinguish between zygotic and maternal transcripts using metabolic labelling of zygotic transcripts (e.g. Fishman et al. 2023) would be hard in this species due to the hard egg capsule and its ectolecithal embryogenesis. Therefore, this is out of scope for this work, but it would be a very interesting topic to follow up on and develop tools for.

      We have added these sentences to the Discussion ln 746 ‘Intriguingly, the presence of spliced-only copies of the germline defining genes eled and boule could suggest that they are maternal transcripts that have been restricted to the primordial germ cells during embryogenesis, as is the case in Zebrafish embryos (Fishman et al., 2023). An alternative explanation is that unspliced transcripts exist for these lowly expressed genes but their abundance was below our threshold for detection.’

      Reviewer #1 (Recommendations For The Authors):

      Ln 138: specify the version of Seurat used, and reference the primary papers for this software. Also, from the dot plot shown here, these do not all appear to be supported by unique gene sets. How was the final clustering determined? This information is in the methods section, but a summary here could make it more robust for the readership.

      In addition to the details in the methods section, we have added the version and referenced the version-specific primary paper for Seurat when it is first mentioned. We have also summarised the methods used to select the final clustering when we first present the results to aid in clarity.

      We added to line 140 ‘Using Seurat (version 4.3.0) (Hao et al., 2021), 19 distinct clusters of cells were identified, along with putative marker genes best able to discriminate between the populations (Figure 1C & D and Supplementary Table 2 and 3). We used Seurat’s JackStraw and ElbowPlot, along with molecular cross-validation to select the number of principal components, and Seurat’s clustree to select a resolution where clusters were stable (Hao et al., 2021).’

      Ln 147: isn't seven stem cell clusters a lot? See comment in public review.

      We did not have preconceived expectations of the number of stem cell clusters, and were guided by the data and gene expression. In doing so we also discovered that four of those clusters were likely only two ‘biologically or functionally distinct’ clusters, but these split into four clusters based on the expression of genes on the sex-specific regions of the chromosomes, which was both unexpected and interesting.

      Figure 1D: gene model names are un-informative for the general reader. Can you provide any putative gene identities here to render this plot interpretable? For example in the main text you state that Smp-085540 is paramyosin; please use this annotation in all your visual material (as is used in Figure 2A).

      We have added gene names to the dotplots in all figures with the locus identifier (minus the ‘Smp’ prefix) in brackets after the gene name.

      Ln 191:196 Identification of the two muscle clusters as circular and longitudinal muscles is very well supported. However, it would be interesting to look specifically at the genes that are different here. Did the authors attempt to specifically pull out genes differentially expressed between these two groups, or only examine the output of FindAllMarkers at this point?

      We did indeed look specifically for genes differentially expressed between the muscle clusters, the results of which can be found in Supplementary Table 5 (Line 206). This analysis revealed “Wnt-11-1 (circular) and MyoD (longitudinal) were among the most differentially expressed genes”, which were important findings in our understanding of the muscle cells in the miracidium.

      Ln 207: "connected to stem F" - does this refer specifically to their relative positions on the UMAP in Figure 1C? One must be very careful about these types of statements, as the relative position of cells on a reduced-dimension cell map can be misleading (public review).

      We agree, and have rephrased accordingly.

      Ln 209:211: Here the authors switch from Seurat (R) as an analysis package, to SAM (python) for subset analysis of one large neural cluster. The results indicate that there may be small populations of transcriptomically distinct neural subtypes also within the neural1 cluster, but that the vast majority of these cells do not express unique transcriptomic profiles. Also in the supplementary material for this (SF1) there is a question of whether or not there is any clustering according to batch effects.

      In general, I find the neuronal section a little difficult to follow and it is unclear how many unique profiles are present and which are documented with in situ. I would recommend re-running the analysis on the entire neural subset (n1:5: complexin positive) and generating an inventory of putatively unique neural states with the associated in situ validation altogether in a main figure.

      In response to comments above we have both clarified our reasoning for using SAM analysis, and presented more details on possible batch effects. We have gone through the neural system results in order to make it clearer for the reader to follow.

      Ln 236: here the authors introduce a STRING analysis for the first time. Also, this method requires some introduction for the general audience in terms of its goals and general functionality and output.

      We used STRING analysis on some well defined clusters to provide additional clues about function. At the first mention of STRING (neuron 3 results) we have added the following statement to give more introduction to the reader: “STRING analysis of the top 100 markers of Neuron 3 predicted two protein interaction networks with functional enrichment: ….”

      Ln. 280:281. It is unclear why Steger et al is referenced here. In what way does a description of neural and glandular cell transcriptomic similarity in a Cnidarian inform your data on a member of the Platyhelminthes? (which should also be referenced in the introduction: to which phylogenetic lineage does Schistosoma belong).

      We have now added that the Schistosoma belong to the Platyhelminths on the first line of the introduction.

      Ln 295 we have added ‘We expected to find a discrete cluster(s) for the penetration glands, and that it would show similarities to the neural clusters (as glandular cells arise from neuroglandular precursor cells in other animals, such as the sea anemone, Nematostella vectensis, Steger et al., 2022).’

      Ln 339: explain the motivation for generating a further plate-based scRNA of the ciliary plates.

      We wished to include the ciliary plates alongside the gland cells for plate based RNAseq as they are unique to the miracidium stage and wanted to make sure we had captured them in this study.

      Ln 345: Define the tegumental cells for the general reader.

      We have added further description of tegument cells in the introduction and tegument results section (e.g. on lines 61 and 366).

      Ln 365: "this cluster" is imprecise. Which cluster are we looking at here? Also: were flame cells already described morphologically at this stage, or is this the first description of the protonephridial system for this stage of the life cycle?

      We have now clarified which cluster we are talking about in the text. The flame cells have been described using TEM before (Pan, 1980).

      Stem Cells: also here you refer to cells as 'bridge' which refers to the configuration of the UMAP. While this is likely a biological representation of a different differentiation state, the nomination of this based solely on the UMAP representation should be avoided.

      We have rephrased this.

      Figure 5B: What is neuron 6? This was Neuron 3 in Figure 1.

      Thank you for spotting these mistakes in the labelling, we have corrected them now.

      Ln 421:438 - Here you represent a UMAP representation of the cell positions, but state that the clustering disappears. See comment in Public Review.

      Modified accordingly, see response in public review.

      Ln 472 "Cells in stem E, F, and G in silico clusters might be stressed/damaged/dying cells or cells in transcriptionally transitional states." Is there any evidence supporting either of these conclusions?

      We found that 15.56% of the cells in Stem F expressed the neural marker complexin, leading us to consider the possibility that a fraction of these cells may be neural precursors. Stem F also had some cells with a mitochondrial % near the maximum threshold we set, suggesting they could be experiencing some stress. Since we could not identify clear markers for these clusters, their function and a more specific identity, beyond ‘stem’, is not yet known.

      That the two stem cell populations contribute to different parts of the next life cycle stage is interesting. The combined analysis suffers from the same issues as the previous analysis in terms of sample distribution; are the 'grey' sporocyst cells also contributing to the stem A/B (kappa) C/D (delta/phi) clusters? This is not possible to tell from the plot as the miracidia may simply be plotted on the top. A different representation of sample contribution to clusters is warranted.

      We have made an alternative visualisation here to demonstrate that the miracidia cells are not plotted on top of the sporocyst stem cells. Unfortunately this visual is hampered as there is not a straightforward way to split the panels. In the figure below, the left pane shows the miracidia cells, and the right pane shows the sporocyst cells. Below that, we have included the original figure for comparison. It can be clearly seen that there are three miracidia tegument cells in the sporocyst tegument cluster, and one sporocyst cell in the miracidia stem cells (Stem E), but the miracidia A/B and C/D stem cells are not plotted on top of any sporocyst cells.

      Author response image 6.

      Methods: Why is the multiplet rate estimate at >50% for the unsorted sample?

      We have added more detail on this: “The estimated doublet rate was calculated based on 10X loading guidelines and adjusted for our sample concentrations”.

      Reviewer #2 (Recommendations For The Authors):

      (1) The manuscript would benefit from a more careful consideration of what was already known based on previous literature, which would help the authors to better put their results in context. For example, previous work suggested that one of the sporocyst stem cell populations (phi) gives rise to tegument and other temporary larval structures; this appears not to be mentioned here. The model in Figure 7 suggests that two of the stem cell populations are gone at day 15 post-infection; the literature shows that those cells can still be detected at this stage (there are just far fewer of them).

      We have added the definition of Kappa, Delta and Phi as per Wang et al (2018) in the stem cell results p13 ln 428.

      We have amended Figure 7 to include further elements from the Wang et al (2018) paper that show that mother sporocyst stem cells classified as delta and phi are still detectable on day 15 post-infection in mother sporocysts.

      We intentionally didn’t put too much emphasis on fitting our data to the model of Wang et al (2018), because a) it’s a different life cycle stage, b) the single cell data the model was based on was from 35 stem cells and gathered using a different method, and c) more recent data (Diaz, Attenborough et al. 2024) with 119 stem cells from sporocysts did not recover the same populations of stem cells. We therefore linked our data to previous literature where it was relevant but focused on being led by the data we gathered (>10,000 stem cells).

      (2) To add some detail to the public comment about the lack of clarity about sample sizes and biological replicates, and how this leads to questions about the robustness of the results, Figures 4 B and F show the expression pattern for the same parenchyma marker (Smp_318890) in two different samples. The patterns appear quite distinctive. In B, the cell bodies are so clearly labeled that the signal appears oversaturated. In F the cell bodies are barely apparent. Based on the single-cell clustering, it should be possible to distinguish between Parenchyma clusters 1 and 2 based on the levels of this transcript. Careful quantification of signal intensity from multiple samples across multiple experiments might enable the authors to detect such differences.

      The reason the expression patterns look different between panels 4Bii and 4F is that in 4Bii we have manually segmented the nuclei of the parenchymal cells in order to count them, whereas in the images in 4F there is no segmentation. We have now made this clearer in this legend, and also in the legends of Figures 2, 3, and 5. If there was any signal intensity difference between parenchyma 1 and 2 cells based on expression of the marker gene, Smp_318890, it was not obvious. We carried out 6 experiments for parenchyma markers, multiplexing the pan-parenchyma marker, Smp_318890, with markers for parenchyma 2, but we were unable to distinguish between the two populations.

      (3) The authors find that the "somatic" stem cells in miracidia seem to combine attributes of the previously defined delta and phi stem cells from sporocysts. Because the 3 classes of sporocyst stem cells were defined by expression of nanos-2 and fgfrA, using those probes in in-situ experiments could have helped them resolve whether or not the miracidial cells represent precursors that can adopt either fate or if the heterogeneity is already present in miracidia.

      In silico expression of the marker genes for the 3 classes of sporocyst stem cells didn’t support those three classes in the miracidia stem cells (See supplementary table 10). We further subclustered the delta/phi cells to see if we could recover separate delta and phi populations but we were unable to do so. We therefore did not pursue in situ experiments of these genes. We instead prioritised cluster-defining genes in the miracidia stem cell populations rather than cluster defining genes in the sporocyst (defined by Wang et al., 2018), but we still explored these in silico. For example, instead of using klf to define Kappa (Wang et al 2018), we used UPPA to validate the Kappa population as it showed similar expression to klf but higher expression levels and was specific to that population. However, like Wang et al 2018, we did use p53, which is a cluster marker of delta and phi in sporocysts, as it showed clear and high expression in our miracidia delta/phi population. We were guided by our data and our knowledge of the literature. More in depth single cell RNAseq is needed from the mother and daughter sporocyst stages to understand the heterogeneity and fates of these stem populations.

      (4) Scale bars should be included throughout the figures and the scale should be defined either on the figure or in the legend. Similarly, all the scales used for velocity and expression analysis should be defined.

      We have added scale bars to all figures and legends.

      The statements “Gene expression has been log-normalised and scaled using Seurat(v. 4.3.0)”, “Gene expression has been normalised (CPM) and log-transformed using scvelo(v. 0.2.4)”, and “Library size was normalised and gene expression values were log-normalised using SAM (v1.0.1) and Scanpy (v1.8.2)” have been added to the figures as appropriate.
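
      For readers unfamiliar with the expression scales named in these statements, the following is a minimal sketch of log-normalisation for a single cell, written in plain Python rather than with any of the packages above. It assumes the commonly used form in which each gene's count is divided by the cell's total count, multiplied by a scale factor (10,000 is the Seurat default; 1,000,000 gives counts per million) and transformed with log(1 + x). The example counts are hypothetical, and the additional per-gene scaling step mentioned for Seurat is not shown.

```python
import math


def log_normalise(cell_counts, scale_factor=10_000):
    """Return log(1 + count / total * scale_factor) for each gene in one cell."""
    total = sum(cell_counts)
    if total == 0:
        raise ValueError("cell has no counts")
    return [math.log1p(count / total * scale_factor) for count in cell_counts]


# Hypothetical raw counts for four genes in one cell (total = 100).
cell = [0, 3, 7, 90]

per_10k = log_normalise(cell)                          # scale factor 10,000
per_million = log_normalise(cell, scale_factor=1_000_000)  # CPM-style scale

print([round(value, 3) for value in per_10k])
print([round(value, 3) for value in per_million])
```

      Because the scale factor differs between the two conventions, values from a CPM-based scale and a per-10,000 scale are not directly comparable, which is why each legend needs to state the scale it uses.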

      (5) The table entitled In situ hybridization probes (Supplementary Table 15) contains no probe sequences, so any interested reader wishing to use these probes would have to design their own. To ensure the reproducibility of the results presented here, the authors should provide the probe sequences they used.

      In Supplementary Table 15 we have added the Molecular Instruments Lot number of all the probes used. Anyone wanting to repeat the experiment can order the same probes from the company.

      (6) It is unclear how useful the supplemental figures showing the STRING enrichment analyses will be for readers. Unannotated Smp gene identifiers provide no way to help readers digest the information in these hairballs. It would probably be best to replace the Smp names with useful annotations based on their orthologs; if not, these figures could probably be dropped entirely. (Also, the bottom panel of Supplementary Figure 7 has the word "Lorem" embedded on one of the connecting nodes.)

      “Lorem” has been removed.

      Many of the genes in these analyses do not have short descriptions, therefore we have used Smp gene identifiers in the STRING analysis supplementary figures. These ‘Smp_’ numbers can be used to search WormBase ParaSite, where a description can be found and the history of the gene ID traced. This latter function facilitates searching for these genes in the literature and consistency between versions as gene models are updated.

      Minor edits

      (1) Figures 4A-D aren't cited in the text until after 4E-F are. It seems like moving the section on protonephridial cells (line 364) before the section on tegumental cells (line 345) better reflects the order of the figures.

      Thank you for flagging this, we have updated the in-text citations of Figure 4.

      (2) In-text references to Sarfati et al, 2021 should be to Nanes Sarfati, as listed in the references. Poteaux et al 2023 is cited in the text, but not in the reference list.

      Both of these have been fixed.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors track the motion of multiple consortia of Multicellular Magnetotactic Bacteria moving through an artificial network of pores and report a discovery of a simple strategy for such consortia to move fast through the network: an optimum drift speed is attained for consortia that swim a distance comparable to the pore size in the time it takes to align with an external magnetic field. The authors rationalize their observations using dimensional analysis and numerical simulations. Finally, they argue that the proposed strategy could generalize to other species by demonstrating the positive correlation between the swimming speed and alignment time based on parameters derived from literature.

      Strengths:

      The underlying dimensional analysis and model convincingly rationalize the experimental observation of an optimal drift velocity: the optimum balances the competition between the trapping in pores at large magnetic fields and random pore exploration for weak magnetic fields.

      Weaknesses:

      The convex pore geometry studied here creates convex traps for cells, which I expect enhances their trapping. The more natural concave geometries, resulting from random packing of spheres, would create no such traps. In this case, whether a non-monotonic dependence of the drift velocity on the Scattering number would persist is unclear.

      We agree that convex walls increase the time that consortia remain trapped in pores at high magnetic fields. Since the non-monotonic behavior of the drift velocity with the Scattering number arises largely due to these long trapping times, we agree that experiments using concave pores are likely to show a peak drift velocity that is diminished or erased.

      However, we disagree that a random packing of spheres or similar particles provides an appropriate model for natural sediment, which is not composed exclusively of hard particles in a pure fluid. Pore geometry is also influenced by clogging. Biofilms growing within a network of convex pillars in two-dimensional microfluidic devices have been observed to connect neighboring pillars, thereby forming convex pores. Similar pore structures appear in simulations of biofilm growth between spherical particles in three dimensions. Moreover, the salt marsh sediment in which MMB live is more complex than simple sand grains, as cohesive organic particles are abundant. Experiments in microfluidic channels show that cohesive particles clog narrow passageways and form pores similar to those analyzed here. Thus, we expect convex pores to be present and even common in natural sediment where clogging plays a role.

      The concentration of convex pores in the experiments presented here is almost certainly much higher than in nature. Nonetheless, since magnetotactic bacteria continuously swim through the pore space, they are likely to regularly encounter such convexities. Efficient navigation of the pore space thus requires that magnetotactic bacteria be able to escape these traps. In the original version of this manuscript, this reasoning was reduced to only one or two sentences. That was a mistake, and we thank the reviewer for prompting us to expand on this point. As the reviewer notes, this reasoning is central to the analysis and should have been featured more prominently. In the final version, we will devote considerable space to this hypothesis and provide references to support the claims made above.

      The reviewer suggests that the generality of this work depends on our finding a “positive correlation between the swimming speed and alignment [rate] based on parameters derived from literature.” We wish to emphasize that, in addition to predicting this correlation, our theory also predicts the function that describes it. The black line in Figure 3 is not fitted to the parameters found in the literature review; it is a pure prediction.

      Reviewer #2 (Public review):

      The authors have made microfluidic arrays of pores and obstacles with a complex shape and studied the swimming of multicellular magnetotactic bacteria through this system. They provide a comprehensive discussion of the relevant parameters of this system and identify one dimensionless parameter, which they call the scattering number and which depends on the swimming speed and magnetic moment of the bacteria as well as the magnetic field and the size of the pores, as the most relevant. They measure the effective speed through the array of pores and obstacles as a function of that parameter, both in their microfluidic experiments and in simulations, and find an optimal scattering number, which they estimate to reflect the parameters of the studied multicellular bacteria in their natural environment. They finally use this knowledge to compare different species to test the generality of this idea.

      Strengths:

      This is a beautiful experimental approach and the observation of an optimal scattering number (likely reflecting an optimal magnetic moment) is very convincing. The results here improve on similar previous work in two respects: On the one hand, the tracking of bacteria does not have the limitations of previous work, and on the other hand, the effective motility is quantified. Both features are enabled by choices of the experimental system: the use of multicellular bacteria, which are larger than the usual single-celled magnetotactic bacteria, and the design of the obstacle array, which allows the quantification of transition rates due to the regular organization as well as the controlled release of bacteria into this array through a clever mechanism.

      Weaknesses:

      Some of the reported results are not as new as the authors suggest, specifically trapping by obstacles and the detrimental effect of a strong magnetic field have been reported before as has the hypothesis that the magnetic moment may be optimized for swimming in a sediment environment where there is a competition of directed swimming and trapping. Other than that, some of the key experimental choices on which the strength of the approach is based also come at a price and impose some limitations, namely the use of a non-culturable organism and the regular, somewhat unrealistic artificial obstacle array.

      In the “Recommendations for the Authors,” this reviewer drew our attention to a manuscript that absolutely should have been prominently cited. As the reviewer notes, our manuscript meaningfully expands upon this work. We are pleased to learn that the phenomena discussed here are more general than we initially understood. It was an oversight not to have found this paper earlier. The final version will better contextualize our work and give due credit to the authors. We sincerely appreciate the reviewer for bringing this work to our attention.

      We disagree that the use of non-culturable organisms and our unrealistic array should be considered serious weaknesses. While any methodological choice comes with trade-offs, we believe these choices best advance our aims. First, the goal of our research, both within and beyond this manuscript, is to understand the phenotypes of magnetotactic bacteria in nature. While using pure cultures enables many useful techniques, phenotypic traits may drift as strains undergo domestication. We therefore prioritize studying environmental enrichments.

      Clearly, an array of obstacles does not fully represent natural heterogeneity. However, using regular pore shapes allows us to average over enough consortium-wall collisions to enable a parameter-free comparison between theory and experiment. Conducting an analysis like this with randomly arranged obstacles would require averaging over an ensemble of random environments, which is practically challenging given the experimental constraints. Since we find good agreement between theory and experiment in simple geometries, we are now in a position to justify extending our theory to more realistic geometries. Additionally, we note that a microfluidic device composed of a random arrangement of obstacles would also be a poor representation of environmental heterogeneity, as pore shape and network topology differ between two and three dimensions.

      Recommendations for the Authors: 

      Reviewer #1 (Recommendations for the authors):

      My main suggestion is for the authors to describe the limitations of their approach in the case of concave pores.

      As we noted in our public comments, this was a very useful comment to hear from you and one that has been repeated as we have spoken about these results to colleagues. Convexities here represent an experimentally simple way to force bacteria to backtrack through the maze, as they must through natural sediment. We have greatly expanded this discussion to clarify this reasoning (lines 84–105). We provide references to three types of physical processes that may lead to such traps. First, as in figure 1 of Kurz et al, biofilm (white) can fill the spaces between convex pillars to create convexities. Additionally, clogging by cohesive particles can make narrow passageways between convex particles impassable. An example of clogging is shown in figure 6 of Dressaire & Sauret 2017. Finally, air bubbles trapped in the sediment can create pore-scale dead ends that require bacteria to backtrack. The full references are provided in the main text.

      Small points:

      (1) How many trajectories were used to produce Figures 2 b and c?

      We have modified the caption to note that these data represent the measured transition rates of a total of 938 consortia at various Scattering numbers. Each consortium may pass between pores many times.

      (2) Can the authors describe in more detail how Equation (3) is derived? Why doesn’t it depend on the gap size between the pores?

      We have provided a derivation of this equation in Appendix 2 of the new version. This derivation shows that the drift velocity U<sub>drift</sub> is proportional to the pore diameter and to the difference between the transition rates.

      The proportionality constant α depends on how the pores are connected together in space. In the original version, we wanted to highlight the role of the asymmetry of the transition rates, so we imagined a one-dimensional network of pores without gaps. In this case, α = 1. This reasoning was poorly explained in the previous version and we thank the reviewer for pointing this deficiency out. In the new version, we include the gap size and use the layout of pores in a square lattice with gaps, which is shown in figure 1. The proportionality constant for a square lattice in the absence of gaps would be 1/√2. The limitations of photolithography require some gap, which increases the proportionality constant to α = 0.8344.

      We have updated the text, equation (3), and the figures to account for the finite gap sizes.
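
      As a minimal sketch of the proportionality described in this response, the relation can be written as U_drift = α · d · (k₊ − k₋). The symbol d for the pore diameter, the names `k_plus` and `k_minus` for the two transition rates, and the sign convention are assumptions made here for illustration; the full expression, including its dependence on gap size, is the one derived in Appendix 2 of the manuscript.

```python
def drift_velocity(k_plus, k_minus, pore_diameter, alpha=1.0):
    """Drift velocity from asymmetric pore-to-pore transition rates.

    Assumed form, based on the proportionality stated in the response:
        U_drift = alpha * pore_diameter * (k_plus - k_minus)

    k_plus, k_minus : transition rates in the two opposite directions
                      (hypothetical names; units of 1/time)
    pore_diameter   : pore diameter (units of length)
    alpha           : geometry-dependent proportionality constant
    """
    if pore_diameter <= 0:
        raise ValueError("pore_diameter must be positive")
    return alpha * pore_diameter * (k_plus - k_minus)


# Geometry constants quoted in the response.
ALPHA_CHAIN = 1.0             # one-dimensional chain of pores without gaps
ALPHA_LATTICE_GAPS = 0.8344   # square lattice with the fabricated gaps
```

      With equal transition rates the drift vanishes regardless of α, which reflects the point that the drift arises from the asymmetry of the rates; α only rescales it according to how the pores are laid out.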

      (3) I found the second part of the abstract, related to the comparison between diverse bacteria, to be slightly misleading. Upon first reading, my expectation was that the authors carried out experiments with different species.

      We have modified the abstract to make clear that we rely on values taken from a literature review.

      (4) More information is needed on how many trajectories were used to produce the probability densities in Figures 1b-d. How were the densities computed?

      The probability distributions give the probability that a pixel in a pore is covered by a consortium. They reflect between 1.2 and 7 million measurements (depending on the panel) of the instantaneous positions of consortia. We have added a section (Lines 453–469) to Materials and Methods that describes exactly how these distributions were calculated.

      Reviewer #2 (Recommendations for the authors):

      (1) As mentioned under Weaknesses in the Public review, some results are less new than claimed here. The existence of an optimal magnetic moment has been shown by Codutti et al., eLife 13:RP98001, in very similar experiments, where it was also proposed that this may be an evolutionary adaptation to the sediment habitat. The paper here provides additional evidence for this, and with better tracking and quantification, but previous work should be discussed. Likewise, the work by Dekharghani et al. that is mentioned rather suddenly in the Results section appears to be a crucial previous state of the art and could already be mentioned in the introduction.

      We thank the reviewer for bringing this paper, which came out as we were writing this manuscript, to our attention. The hypothesis that there is an optimal phenotype that balances magnetotaxis with obstacle avoidance—and that natural selection could guide organisms to this optimum—goes back to at least 2022. It seems that Codutti et al independently came up with this same hypothesis and provided the first test.

      We have substantively rewritten the introduction (Lines 46–58) to better contextualize our work and give due attention to Dekharghani et al.

      (2) The first paragraph of Results also contains background information and could be moved into the introduction.

      As part of the rewrite to better contextualize our work, we moved the first two paragraphs of results to the introduction.

      (3) I found Figure 1 a bit confusing and it took me some time to understand the geometry. I think the black obstacles are very dominant to the viewer’s eye and draw attention away from the essentially circular shape of the pores. Likewise, I am not sure that cutting the neighboring pores off in a circular fashion in Figures 1b-d was the best choice. The authors should think about whether the presentation can be improved. Likewise, when describing the direction of the field in the text, I would suggest adding that it is along the horizontal direction in Figure 1.

      We have modified the figure and the text as the reviewer suggests.

      (4) That collisions with a pore wall are an important mechanism of changing direction is clear and it is nice to see the paper demonstrate that this mechanism is dominant over rotational diffusion. However, this may not be universal, as (i) rotational diffusion is more important for smaller cells and (ii) interaction with walls can result in all kinds of different behaviors than complete randomization (e.g. swimming along the walls as shown in microfluidic chambers, Ostapenko et al. Phys Rev Lett 2018, Codutti et al. eLife 2022, or reversals, Kuhn et al PNAS 2017). Here, it appears that complete randomization of the direction is an assumption, but this could be tested/quantified by analyzing the trajectories.

      This is an excellent point. We have modified the text to describe qualitatively how these tendencies would shift the Critical Scattering number. We also note in the text that there is evidence of these differences in Fig 3. The Desulfobacterota are shifted upwards in Fig 3 relative to the α-proteobacteria. This shift indicates that Desulfobacterota tend to live at slightly greater scattering numbers of 0.9±0.3 than the α-proteobacteria, which live at scattering number 0.37 ± 0.03. It is likely that this difference reflects taxonomic differences in rotational diffusion and cell-wall interactions.

      It is true that total randomization of the direction is indeed an assumption, and it is stated as such in line 189. We performed all of the numerics to find the solid curves in Fig 2 before we got any experimental data and so, at the time, total randomization seemed like a fair choice. Looking at Fig 2b, it is clear that these numerics systematically overestimate k<sub>−</sub>. We believe that this error is due to the assumption of total randomization.

      As this effect is small and does not change any of the conclusions of the paper, and as Codutti et al were able to publish their paper in the time that we were writing ours, we feel some urgency to move forward.

      (5) From the manuscript it is not fully clear to what extent experiments and simulations are or can be quantitatively compared. For example: is the curve (“fit”) in Figure 2c based on the simulations? Is there an explicit expression or is this just a spline or something like that? Why does Figure 5 (simulation) show the velocity as a function of Sc<sup>−1</sup>and Figure 2 (experiment) as a function of Sc? It looks to me as if a quantitative comparison could be achieved.

      The original version of Figure 2 shows a quantitative comparison between theory and experiment with no fit parameters. The data points are the result of experiments in which consortia are tracked as they move between connected pores. The solid line is found by interpolating a smooth curve through the data from simulations. As we make clear in the new version (Lines 537–551), this blue curve is the most probable smooth curve that explains the simulations.

      We have added the simulations to figure 2 so that a single panel includes the data, the simulations, and the smooth curve. To further make clear that this comparison is quantitative and parameter free, we have added a panel to Figure 2. This panel directly compares the prediction to observation and is independent of the blue curve.

      As was noted (deep within the methods section) in the original version, our numerics can exactly simulate Sc = ∞. Consequently, it was reasonable to simulate parameters that are uniformly spaced in Sc<sup>−1</sup>.

      (6) While I like the idea behind Figure 3, the data shown here is not as convincing as suggested. If one looks at the data without the black line, I think one gets a weaker dependence. The correlation between U<sub>0</sub> and γ<sub>geo</sub> is likely not as strong as it seems. Calculating a correlation coefficient might be helpful here. In any case, the assumptions going into this figure should be discussed more explicitly and the results should in my opinion be phrased more cautiously (I tend to believe what the authors claim, but I don’t think the evidence for this point is very strong).

      We appreciate the reviewer’s skepticism. However, we believe that the data are stronger than one might understand from the previous text. We have rewritten the text (Lines 219–291) and included new analysis, figures, and explanation to make three points clear.

      (a) It is surprising that speed, magnetic moment, and mobility all vary tremendously (between one and three orders of magnitude) across taxa and environment, yet their dimensionless combination Sc is narrowly distributed. We have added a panel to Fig. 3 to show the measured Scattering numbers.

      It is notable that there are no adjusted parameters in the calculation of the Scattering numbers: it is a simple dimensionless combination of phenotypic and environmental parameters. All but one of these parameters (the pore size) are measured either by us or by other authors. The pore radius is likely narrowly distributed. We measure it at our field site and, when it is not reported, we use a value typical of the geological and fluvial environment. Just as the size of sand grains does not vary greatly between the beaches of Australia, Africa, and California, it is a good assumption that the pore spaces that host these magnetotactic bacteria do not vary tremendously in size.

      (b) In the new version we compare the Scattering number statistics to a parameter-free null model of phenotypic diversity. We argue in the text that it is appropriate to bootstrap over the phenotypic diversity of species. This null model provides the correct method to calculate p-values as the variability in the Scattering numbers is neither identically distributed nor normally distributed.

      We use this null model to show that—given the measured phenotypic diversity across species—the probability that fifteen random species would fall within the measured range of Scattering numbers that is consistent with optimal navigation is ∼ 10<sup>−6</sup>. This result is strong evidence that the phenotypic variables exhibit the correlations that are predicted by our analysis.

      (c) The correlation between U<sub>0</sub>/r and γ<sub>geo</sub> is reasonably strong. I think that our choice of axes in Fig 3, which were chosen to fit the legend, makes the data look flatter than they actually are. Here are the same data plotted without the line with tighter axes:

      Author response image 1.

      With the exception of the very first point and the very last point, the data appear to our eyes to be pretty correlated. This impression is borne out by a calculation of the correlation coefficient, which gives 0.77. The p-value is 4 × 10<sup>−4</sup>. We have included these values in the main text to clarify that this correlation is both statistically significant and of primary importance.
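
      The two statistical checks described in points (b) and (c) can be sketched as follows. This is an illustrative reading of the response, not the procedure used in the manuscript: the function names are hypothetical, the definition of the Scattering number is not given in this letter and is therefore passed in by the caller as `combine`, and the step that treats species as independent draws is a simplifying assumption.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def null_fraction_in_range(traits, combine, lo, hi, n_draws=10000, seed=0):
    """Fraction of synthetic species whose combined trait lies in [lo, hi].

    traits  : dict mapping trait name -> list of measured values, one per species
    combine : caller-supplied function of the traits (for example, the
              Scattering number as defined in the manuscript)

    Each synthetic species draws every trait independently, with replacement,
    from the measured values, which removes any correlation between traits.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        sample = {name: rng.choice(values) for name, values in traits.items()}
        if lo <= combine(**sample) <= hi:
            hits += 1
    return hits / n_draws


def prob_all_in_range(fraction, n_species):
    """Probability that n_species independent draws all fall in the range."""
    return fraction ** n_species
```

      Resampling each trait independently is one way to build a null model in which phenotypes are as diverse as observed but uncorrelated; a small value of `prob_all_in_range` would then indicate that the narrow distribution of the combined quantity is unlikely to arise by chance.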

      (7) There is a comment at the end of the discussion that the evolutionary hypothesis could be tested by transferring the magnetotaxis genes to nonmagnetotactic organisms. This would indeed be highly desirable, but this is very difficult as indicated by the successful efforts in that direction (which often are only moderately magnetic/magnetotactic), see Kolinko et al Nature Nanotech 2014, Dziuba et al Nature Nanotech 2024.

      Thank you for highlighting these references, which we have included. We agree that these experiments will be challenging. Our results make a prediction about the evolution of these strains, so it seems worth mentioning this fact. We feel that this manuscript is not the correct space for a detailed description of challenges that we will encounter should we pursue this direction of study.

      (8) A section on how the bacterial samples were obtained could be added in Methods.

      We have done so.

      Additional Changes

      (1) In the original version, we feared that the consortia in the microfluidic device are poorly representative of the natural population. Consequently, we used the values from previous experiments, which we performed using consortia taken from the same pond. Since submitting this manuscript we have undertaken new experiments that allowed us to measure the Scattering number of individual consortia. It turns out the effect is smaller than we worried. We have included these measurements in the new version. We find that even as the most common phenotypes vary over the course of time, the Scattering number remains constant. This result is additional evidence that there is strong selective pressure to optimally navigate.

      As a result of these additions, we have added an author, Julia Hernandez, who contributed to these experiments and analysis.

      (2) We have expanded the table of phenotypic variables in Appendix 1 to make it easier for other researchers to reproduce our analysis.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Hearing and balance rely on specialized ribbon synapses that transmit sensory stimuli between hair cells and afferent neurons. Synaptic adhesion molecules that form and regulate transsynaptic interactions between inner hair cells (IHCs) and spiral ganglion neurons (SGNs) are crucial for maintaining auditory synaptic integrity and, consequently, for auditory signaling. Synaptic adhesion molecules such as neurexin-3 and neuroligin-1 and -3 have recently been shown to play vital roles in establishing and maintaining these synaptic connections (doi: 10.1242/dev.202723 and doi: 10.1016/j.isci.2022.104803). However, the full set of molecules required for synapse assembly remains unclear.

      Karagulan et al. highlight the critical role of the synaptic adhesion molecule RTN4RL2 in the development and function of auditory afferent synapses between IHCs and SGNs, particularly regarding how RTN4RL2 may influence synaptic integrity and receptor localization. Their study shows that deletion of RTN4RL2 in mice leads to enlarged presynaptic ribbons and smaller postsynaptic densities (PSDs) in SGNs, indicating that RTN4RL2 is vital for synaptic structure. Additionally, the presence of "orphan" PSDs (those not directly associated with IHCs) in RTN4RL2 knockout mice suggests a developmental defect in which some SGN neurites fail to form appropriate synaptic contacts, highlighting potential issues in synaptic pruning or guidance. The study also observed a depolarized shift in the activation of CaV1.3 calcium channels in IHCs, indicating altered presynaptic functionality that may lead to impaired neurotransmitter release. Furthermore, postsynaptic SGNs exhibited a deficiency in GluA2/3 AMPA receptor subunits, despite normal Gria2 mRNA levels, pointing to a disruption in receptor localization that could compromise synaptic transmission. Auditory brainstem responses showed increased sound thresholds in RTN4RL2 knockout mice, indicating impaired hearing related to these synaptic dysfunctions.

      The findings reported here significantly enhance our understanding of synaptic organization in the auditory system, particularly concerning the molecular mechanisms underlying IHC-SGN connectivity. The implications are far-reaching, as they not only inform auditory neuroscience but also provide insights into potential therapeutic targets for hearing loss related to synaptic dysfunction.

      We would like to thank the reviewer for appreciating the work and for the advice that helped us to further improve the manuscript. We have carefully addressed all concerns; please see our point-by-point response below and the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      Kargulyan et al. investigate the function of the transsynaptic adhesion molecule RTN4RL2 in the formation and function of ribbon synapses between type I spiral ganglion neurons (SGNs) and inner hair cells. For this purpose, they study constitutive RTN4RL2 knock-out mice. Using immunohistochemistry, they reveal defects in the recruitment of protein to ribbon synapses in the knockouts. Serial block-face EM reveals defects in SGN projections in mutants. Electrophysiological recordings suggest a small but statistically significant depolarized shift in the activation of Cav1.3 Ca<sup>2+</sup> channels. Auditory thresholds are also elevated in the mutant mice. The authors conclude that RTN4RL2 contributes to the formation and function of auditory afferent synapses to regulate auditory function.

      We would like to thank the reviewer for appreciating the work and for the advice that helped us to further improve the manuscript. We have carefully addressed all concerns; please see our point-by-point response below and the revised manuscript.

      Strengths:

      The authors have excellent tools to analyze ribbon synapses.

      Weaknesses:

      However, there are several concerns that substantially reduce my enthusiasm for the study.

      (1) The analysis of the expression pattern of RTN4RL2 in Figure 1 is incomplete. The authors should show a developmental time course of expression up into maturity to correlate gene expression with major developmental milestones such as axon outgrowth, innervation, and refinement. This would allow the development of models supporting roles in axon outgrowth versus innervation or both.

      We agree that it would be valuable to show the developmental time course of RTN4RL2 expression. In response to the reviewer’s comment, we are providing RNAscope data from developmental ages E11.5, E12.5 and E16 in Figure 1. RTN4RL2 shows expression at E11.5/E12.5 both in the spiral ganglion and hair cell region, with first onset in the hair cells. We conclude that RTN4RL2 is expressed highest during fiber growth at embryonic stages and is downregulated during postnatal development maintaining low levels of expression during adulthood.

      (2) It would be important to improve the RNAscope data. Controls should be provided for Figure 1B to show that no signal is observed in hair cells from knockouts. The authors apparently already have the sections because they analyzed gene expression in SGNs of the knock-outs (Figure 1C).

      In Figure 1C gene expression in SGNs was assessed at p40, while the expression in hair cells is provided for p1 animals. Unfortunately, we do not have KO controls for p1 animals. However, as indicated in our manuscript, previously published RNA expression datasets do find RTN4RL2 expression in hair cells. Therefore, we think it is unlikely that our results are unspecific.

      (3) It is unclear from the immunolocalization data in Figure 1D if all type I SGNs express RTN4RL2. Quantification would be important to properly document the presence of RTN4RL2 in all or a subset of type I SGNs. If only a subset of SGNs express RTN4RL2, it could significantly affect the interpretation of the data. For example, SGNs selectively projecting to the pillar or modiolar side of hair cells could be affected. These synapses significantly differ in their properties.

      According to the already published single-cell RNAseq dataset of Shrestha et al., 2018, RTN4RL2 expression does not seem to show a clear type I SGN subtype specificity (Author response image 1). In response to the reviewer’s comment, we have further performed anti-Parvalbumin (PV) and anti-calretinin (CR) immunostainings in mid-modiolar cryosections of RTN4RL2<sup>+/+</sup> and RTN4RL2<sup>-/-</sup> cochleae. Parvalbumin was chosen to label all SGNs and calretinin (CALB2) was chosen primarily as a type Ia SGN marker (Sun et al., 2018). We present the data from all analyzed samples below (Author response image 2). Cell segmentation masks of PV positive cells were obtained using Cellpose 2.0 and the average CR intensity was calculated in those masks. While the distributions of CR intensity and the ratio of CR and PV intensities are slightly shifted in RTN4RL2<sup>-/-</sup> cochleae, we take the data to suggest that the composition of the spiral ganglion by molecular type I SGN subtypes is largely unchanged in RTN4RL2<sup>-/-</sup> mice.

      Author response image 1.

      Author response image 1 cites single cell RNAseq data of Brikha R Shrestha, Chester Chia, Lorna Wu, Sharon G Kujawa, M Charles Liberman, Lisa V Goodrich. Sensory neuron diversity in the inner ear is shaped by activity. Cell. 2018 Aug 23; 174(5):1229-1246.e17. doi: 10.1016/j.cell.2018.07.007

      Author response image 2.

      Calretinin intensity distribution in spiral ganglion of RTN4RL2<sup>+/+</sup> and RTN4RL2<sup>-/-</sup> mice. (A) Mid-modiolar cochlear cryosections from RTN4RL2<sup>+/+</sup> (top) and RTN4RL2<sup>-/-</sup> (bottom) mice immunolabeled against Parvalbumin (PV) and Calretinin (CR). Scale bar = 20 µm. (B) Distribution of CR intensity in PV positive cells (N = 3 for each genotype). (C) Distribution of the ratio of CR and PV intensities (N = 3 for each genotype).

      (4) It is important to show proper controls for the RTN4RL2 immunolocalization data to show that no staining is observed in knockouts.

      Unfortunately, our recent attempts to perform RTN4RL2 immunostainings on cryosections failed and therefore, we decided to remove the RTN4RL2 immunostainings from Figure 1. We have adjusted the results section accordingly.

      (5) The authors state in the discussion that no staining for RTN4RL2 was observed at synaptic sites. This is surprising. Did the authors stain multiple ages? Was there perhaps transient expression during development? Or in axons indicative of a role in outgrowth, not synapse formation?

      We thank the reviewer for the comment. We have now tried RTN4RL2 immunostainings on cryosections at several developmental stages, but unfortunately this time did not succeed in obtaining reproducible and reliable results. Therefore, we decided to also remove the previous immunostainings from Figure 1. We have adjusted the results section as well as removed our statement of not detecting RTN4RL2 near the synaptic regions from the discussion.

      (6) In Figure 2 it seems that images in mutants are brighter compared to wildtypes. Are exposure times equivalent? Is this a consistent result?

      Yes, the samples were prepared in parallel, imaged and analyzed in the same manner.

      No, we did not observe consistent differences in brightness and also did not find it in the exemplary images of figure 2.

      (7) The number of synaptic ribbons for wildtype in Figure 2 is at 10/IHCs, and in Figure 2 Supplementary Figure 2 at 20/IHCs (20 is more like what is normally reported in the literature). The value for mutant similarly drastically varies between the two figures. This is a significant concern, especially because most differences that are reported in synaptic parameters between wild-type and mutants are far below a 2-fold difference.

      The key message is that there is no difference in the numbers of ribbons and synapses between the genotypes for the cochlear apex (~10 ribbons/IHCs, Figure 2 and Figure 2-figure supplement 2) and the middle and base of the cochlea (more ribbons/IHCs, Figure 2-figure supplement 2). Figure 2-figure supplement 3 (now Figure 3) shows that there is a massive reduction of postsynaptic GluA2, while both Figure 2 and Figure 2-figure supplement 2 indicate that the number of synapses is normal. These are two different data sets and while we closely collaborated and also shared the Moser lab protocols and analysis routines, we agree that there is a difference in the absolute synapse count, which most likely was an observer difference and different choice of tonotopic positions of analysis. In Figure 2 only the apical hair cells have been analyzed. The Moser lab, since establishing the immunofluorescence-based quantification of synapse number (Khimich et al., 2005) reported tonotopic differences in synapse counts (focus of Meyer et al., 2009 and reported by others: e.g. Kujawa and Liberman, 2009): apical and basal IHCs have lower synapse numbers than mid-cochlear IHCs.

      (8) The authors report differences in ribbon volume between wild-type and mutant. Was there a difference between the modiolar/pillar region of hair cells? It is known that synaptic size varies across the modiolar-pillar axis. Maybe smaller synapses are preferentially lost?

      We thank the reviewer for the comment. Unfortunately, our already acquired datasets from 3-week-old mice did not allow us to check whether the previously described modiolar-pillar gradient of the ribbon size was collapsed in RTN4RL2<sup>-/-</sup> mice due to the not so well-preserved morphology of the inner hair cells in our preparations. However, since the number of the ribbons is not changed in the RTN4RL2 KO mice, we do not think that the increase in the ribbon size is due to the loss of small ribbons. In response to the reviewer’s comment we have analyzed the modiolar-pillar gradient of the ribbon size in IHCs of the middle turn of the cochlea from a newly acquired dataset of 14-week-old mice. We took the fluorescence intensity of Ctbp2 positive puncta as a proxy for the ribbon size. In these older mice we found a preserved modiolar-pillar gradient of the ribbon size (larger ribbons at the modiolar side). We summarized the results in Author response image 3 below.

      Author response image 3.

      The modiolar-pillar gradient of ribbon size is preserved in RTN4RL2<sup>-/-</sup> IHCs. (A) Maximum intensity projections of approximately 2 IHCs stained against Vglut3 and Ctbp2 from 14-week-old RTN4RL2<sup>+/+</sup> (left) and RTN4RL2<sup>-/-</sup> (right) mice. Scale bar = 5 µm. (B) Synaptic ribbons on the modiolar side show higher fluorescence intensity than the ones on the pillar side of mid-cochlear IHCs in both RTN4RL2<sup>+/+</sup> (left, N=2) and RTN4RL2<sup>-/-</sup> (right, N=2) mice. (C) Average fluorescence intensity of modiolar ribbons per IHC is higher than the average fluorescence intensity of pillar ribbons (paired t-test, p < 0.001).

      (9) The authors show in Figure 2 - Supplement 3 that GluA2/3 staining is absent in the mutants. Are GluA4 receptors upregulated? Otherwise, synaptic transmission should be abolished, which would be a dramatic phenotype. Antibodies are available to analyze GluA4 expression, the experiment is thus feasible. Did the authors carry out recordings from SGNs?

      In response to the reviewer’s comment, we have performed GluA4 stainings in RTN4RL2<sup>-/-</sup> mice and did not detect any GluA4-positive signal in the mutants (new Figure 3-figure supplement 1). Unfortunately, our animal breeding license had expired at the time we received the reviews, which is why our results are from 14-week-old animals. To verify that the absence of GluA4 signal is not due to potential PSD loss in 14-week-old RTN4RL2<sup>-/-</sup> mice, we have additionally performed anti-Ctbp2, anti-Homer1 and anti-Vglut3 stainings in 14-week-old animals. Despite the reduced number, we still observed juxtaposing pre- and postsynaptic puncta. We assume that the reviewer asks for patch-clamp recordings from SGNs, which are, as we are confident the reviewer is aware, technically very challenging and beyond the scope of the present study but an important objective for future studies. In response to the reviewer’s comment, we have added a statement to the discussion pointing to these patch-clamp recordings from SGNs as an important objective for future studies.

      (10) The authors use SBEM to analyze SGN projections and synapses. The data suggest that a significant number of SGNs are not connected to IHCs. A reconstruction in Figure 3 shows hair cells and axons. It is not clear how the outline of hair cells was derived, but this should be indicated. Also, is this a defect in the formation of synapses and subsequent retraction of SGN projections? Or could RTN4RL2 mutants have a defect in axonal outgrowth and guidance that secondarily affects synapses? To address this question, it would be useful to sparsely label SGNs in mutants, for example with AAV vectors expressing GFP, and to trace the axons during development. This would allow us to distinguish between models of RTN4RL2 function. As it stands, it is not clear that RTN4RL2 acts directly at synapses.

      We agree with the reviewer on the value of a developmental study of afferent connectivity but consider this beyond the scope of the present study. In response to the reviewer's comment, we have replaced the IHC outlines with volume-reconstructed IHCs in Figure 3B (now Figure 4B). Moreover, as shown in Figure 3F (now Figure 4F), most if not all type-I SGNs (both with and without ribbon) were unbranched in the mutants just like in wildtype (also shown for a larger sample in Hua et al., 2021), arguing against morphological abnormality during development.

      (11) The authors observe a tiny shift in the operation range of Ca<sup>2+</sup> channels that has no effect on synaptic vesicle exocytosis. It seems very unlikely that this difference can explain the auditory phenotype of the mutant mice.

      We assume that the statement refers to the normal exocytosis of mutant IHCs at the potential of maximal Ca<sup>2+</sup> influx (Figure 3G and H, now Figure 4G and H). We would like to note that this experiment was performed to probe for a deficit of synapse function beyond that of the Ca<sup>2+</sup> channel activation, but did not address the impact of the altered voltage-dependence of Ca<sup>2+</sup> channel activation. In response to the reviewer’s comment, we have now added further discussion to more clearly communicate that for the range of receptor potentials achieved near sound threshold we expect impaired IHC exocytosis as the Ca<sup>2+</sup> channels require slightly more depolarization for activation in the mutant IHCs.

      (12) ABR recordings were conducted in whole-body knockouts. Effects on auditory thresholds could be a secondary consequence of perturbation along the auditory pathway. Conditional knockouts or precisely designed rescue experiments would go a long way to support the authors' hypothesis. I realize that this is a big ask and floxed mice might not be available to conduct the study.

      We thank the reviewer for this helpful comment. Unfortunately, we do not have conditional KO mice at our disposal. We fully agree that such mice will also be important for clarifying the role of IHC vs. SGN expression of RTN4RL2. In response to the reviewer’s comment, we now discuss the shortcoming of using constitutive RTN4RL2<sup>-/-</sup> mice and have added IHC- and SGN-specific deletion of RTN4RL2 as an important objective of future studies.

      Reviewer #3 (Public review):

      In this study, the authors used RNAscope and immunostaining to confirm the expression of RTN4RL2 RNA and protein in hair cells and spiral ganglia. Through RTN4RL2 gene knockout mice, they demonstrated that the absence of RTN4RL2 leads to an increase in the size of presynaptic ribbons and a depolarized shift in the activation of calcium channels in inner hair cells. Additionally, they observed a reduction in GluA2/3 AMPA receptors in postsynaptic neurons and identified additional "orphan PSDs" not paired with presynaptic ribbons. These synaptic alterations ultimately resulted in an increased hearing threshold in mice, confirming that the RTN4RL2 gene is essential for normal hearing. These data are intriguing as they suggest that RTN4RL2 contributes to the proper formation and function of auditory afferent synapses and is critical for normal hearing. However, a thorough understanding of the known or postulated roles of RTN4RL2 is lacking.

      We would like to thank the reviewer for appreciating the work and for the advice that helped us to further improve the manuscript. We have carefully addressed all concerns; please see our point-by-point response below and the revised manuscript.

      While the conclusions of this paper are generally well supported by the data, several aspects of the data analysis warrant further clarification and expansion.

      (1) A quantitative assessment is necessary in Figure 1 when discussing RNA and protein expression. It would be beneficial to show that expression levels are quantitatively reduced in KO mice compared to wild-type mice. This suggestion also applies to Figure 2-supplement 3.D, which examines expression levels.

      The processing of our control and KO samples for RNAscope was not strictly done in parallel and therefore we would like to refrain from quantitative comparison.

      (2) In Figure 2, the authors present a morphological analysis of synapses and discuss the presence of "orphan PSDs." I agree that Homer1 not juxtaposed with Ctbp2 is increased in KO mice compared to the control group. However, in quantifying this, they opted to measure the number of Homer1 juxtaposed with Ctbp2 rather than directly quantifying the number of Homer1 not juxtaposed with Ctbp2. Quantifying the number of Homer1 not juxtaposed with Ctbp2 would more clearly represent "orphan PSDs" and provide stronger support for the discussion surrounding their presence.

      We appreciate the reviewer’s comment. We did not perform this analysis primarily because “orphan” Homer1 puncta, as seen in our immunostainings, are distributed away from hair cells in diverse morphologies and sizes. This makes distinguishing them from unspecific immunofluorescent spots—also present in wild-type samples—challenging. In response to the reviewer’s request, we analyzed the number of “orphan” Homer1 puncta in our previously acquired RTN4RL2<sup>+/+</sup> and RTN4RL2<sup>-/-</sup> samples. Using the surface algorithm in Imaris software, we applied identical parameters across all samples to create surfaces for Homer1-positive puncta (total Homer1 puncta). We quantified “orphan” Homer1 puncta as the difference between total and ribbon-juxtaposing Homer1 puncta and normalized this number to the IHC count. Our results showed 4.3 vs. 26.8 “orphan” Homer1 puncta per IHC in RTN4RL2<sup>+/+</sup> and RTN4RL2<sup>-/-</sup> samples, respectively. We note that variations in acquired volumes between samples may introduce confounding effects.
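
      The arithmetic described above can be sketched as follows. This is only a minimal illustration of the normalization step, not the Imaris workflow itself; the function name and the example counts are hypothetical and are not data from the study.

```python
def orphan_puncta_per_ihc(total_homer1, juxtaposed_homer1, n_ihc):
    """Orphan Homer1 puncta per IHC: (total - ribbon-juxtaposed) / IHC count."""
    if n_ihc <= 0:
        raise ValueError("n_ihc must be positive")
    if juxtaposed_homer1 > total_homer1:
        raise ValueError("juxtaposed puncta cannot exceed total puncta")
    return (total_homer1 - juxtaposed_homer1) / n_ihc


# Hypothetical counts for a single image stack (illustration only).
example = orphan_puncta_per_ihc(total_homer1=150, juxtaposed_homer1=100, n_ihc=10)
print(example)
```

      Because the count is normalized to the number of IHCs but not to the imaged volume, stacks of different depth can still bias the comparison, which is the caveat noted above.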

      (3) In Figure 2, Supplementary 3, the authors discuss GluA2/3 puncta reduction and note that Gria2 RNA expression remains unchanged. However, there is an issue with the lack of quantification for Gria2 RNA expression. Additionally, it is noted that RNA expression was measured at P4. While the timing for GluA2/3 puncta assessment is not specified, if it was assessed at 3 weeks old as in Figure 2's synaptic puncta analysis, it would be inappropriate to link Gria2 RNA expression with GluA2/3 protein expression at P4. If RNA and protein expression were assessed at P4, please indicate this timing for clarity.

      GluA2/3 immunostainings were performed in 1 to 1.5-month-old animals. We apologize for not indicating this before and have now included it in the Figure 3 legend. The processing of our control and KO samples for RNAscope was not strictly done in parallel and therefore we would like to refrain from quantitative comparison.

      (4) In Figure 3, the authors indicate that RTN4RL2 deficiency reduces the number of type 1 SGNs connected to ribbons. Given that the number of ribbons remains unchanged (Figure 2), it is important to clearly explain the implications of this finding. It is already known that each type I SGN forms a single synaptic contact with a single IHC. The fact that the number of ribbons remains constant while additional "orphan PSDs" are present suggests that the overall number of SGNs might need to increase to account for these findings. An explanation addressing this would be helpful.

      In Figure 3 (now Figure 4), we found additional type-I SGNs that are not connected to IHCs, in good agreement with the “orphan PSDs” observed under the light microscope. We also confirmed monosynaptic, unbranched fiber morphology (Figure 3F, now Figure 4F). Together, these results imply an increase of about 20% in the overall number of SGNs, which, however, we did not observe when counting SGN somata.

      (5) In Figure 4F and 5Cii, could you clarify how voltage sensitivity (k) was calculated? Additionally, please provide an explanation for the values presented in millivolts (mV).

      Voltage sensitivity (k) was calculated as the slope factor of the Boltzmann fit to the fractional activation curves: G/G<sub>max</sub> = 1/(1 + exp((V<sub>half</sub> − V<sub>m</sub>)/k)), where G is conductance, G<sub>max</sub> is the maximum conductance, V<sub>m</sub> is the membrane potential, V<sub>half</sub> is the voltage corresponding to the half-maximal activation of Ca<sup>2+</sup> channels and k (slope of the curve) is the voltage sensitivity of Ca<sup>2+</sup> channel activation. Because k divides a voltage difference in the exponent, it is expressed in millivolts. We have now added this to our Materials and Methods section.
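
      To illustrate why k carries units of millivolts, the sketch below assumes the standard Boltzmann form G/G<sub>max</sub> = 1/(1 + exp((V<sub>half</sub> − V<sub>m</sub>)/k)) and recovers V<sub>half</sub> and k from synthetic data by linearizing it as ln(G<sub>max</sub>/G − 1) = (V<sub>half</sub> − V<sub>m</sub>)/k. The function names and parameter values are hypothetical, and this is not the fitting routine used in the study.

```python
import math


def boltzmann(v_m, v_half, k):
    """Fractional activation G/Gmax at membrane potential v_m (all values in mV)."""
    return 1.0 / (1.0 + math.exp((v_half - v_m) / k))


def fit_boltzmann(voltages, activations):
    """Estimate (v_half, k) by least squares on the linearized form
    ln(1/a - 1) = (v_half - v_m) / k, where a = G/Gmax and 0 < a < 1."""
    ys = [math.log(1.0 / a - 1.0) for a in activations]
    n = len(voltages)
    mean_v = sum(voltages) / n
    mean_y = sum(ys) / n
    sxy = sum((v - mean_v) * (y - mean_y) for v, y in zip(voltages, ys))
    sxx = sum((v - mean_v) ** 2 for v in voltages)
    slope = sxy / sxx                    # equals -1/k
    intercept = mean_y - slope * mean_v  # equals v_half/k
    k = -1.0 / slope
    v_half = intercept * k
    return v_half, k


# Synthetic, noise-free example with hypothetical parameters (mV).
voltages = [-50.0, -40.0, -30.0, -20.0, -10.0]
activations = [boltzmann(v, v_half=-30.0, k=7.0) for v in voltages]
v_half_est, k_est = fit_boltzmann(voltages, activations)
print(round(v_half_est, 3), round(k_est, 3))
```

      In this form a smaller k means a steeper activation curve, that is, a higher voltage sensitivity. With noisy recordings a nonlinear least-squares fit would usually be preferable to the linearization shown here.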

      (6) In Figure 6, the author measured the threshold of ABR at 2-4 months old. Since previous figures confirming synaptic morphology and function were all conducted on 3-week-old mice, it would be better to measure ABR at 3 weeks of age if possible.

      ABR measurements for comparisons in a cohort of age-matched mice require fully developed individuals. Three weeks is the minimum age at which the ear is regarded as mature. However, developmental variation within a litter is frequent and affects normal hearing thresholds. From our own experience, we do not regard the ear as fully functional before 6 weeks of age, when hearing thresholds are lowest, indicating full functionality. Since the C57BL/6 background strain has a genetic defect in the Cadherin 23-coding gene (Cdh23) at the ahl locus of mouse chromosome 10, these mice exhibit early onset and progression of age-related hearing loss starting at 5–8 months (Hunter & Willott, 1987). Therefore, we chose a “safe” time window of 2-4 months for stable and unaffected ABR recordings to provide the most representative data.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Please include information on the validation of all the antibodies used in this study, or reference the relevant work where the antibodies were previously validated.

      In response to the reviewer’s comment, we have now included a table listing all primary antibodies used in this study. Where possible, we provide references for knockout (KO) validation. Otherwise, we refer to the manufacturer’s information, as provided in the respective datasheets.

      (2) Figure 2 illustrates the pre- and postsynaptic changes observed in RTN4RL2 knockout (KO) mice. Please specify the age of the mice and the cochlear region depicted and analyzed in Figure 2.

      We thank the reviewer for the comment. IHCs of the apical cochlear region were analyzed in mice at 3 weeks of age. We have now added this to the figure legend.

      (3) The discovery of orphan SGN neurites in RTN4RL2 KO mice is particularly intriguing. I wonder whether the additional Homer1-positive puncta illustrated in Figure 2 are present in these orphan SGN neurites, which would suggest that they may be functional. Conducting immunohistochemistry (IHC) labeling for type I SGN neurites using an anti-Tuj1 antibody, along with Homer1, would help localize the additional Homer1 puncta shown in Figure 2. Additionally, the "extra" Homer1 puncta appear less striking in the data presented in Figure 2-Supplement 2. Quantifying the number of Homer1 puncta in wild-type versus KO mice across different cochlear regions will help visualize the Figure 2-Supplement 2 data and relate the presence of extra neurites to the increased auditory brainstem response (ABR) thresholds observed at all frequencies.

      We thank the reviewer for the comment and agree that localizing orphan PSDs on the SGN neurites would be very useful. Unfortunately, the animal breeding license in the Göttingen lab had expired. At the time we received the reviews, we only had access to 14-week-old animals and could not perform the stainings in animals of an age comparable to the rest of the study (3-4 weeks). The phenotype of extra Homer1 puncta was not as drastic in 14-week-old animals as in the previously stained 3-week-old animals. Nevertheless, we attempted NF200, Homer1 and Vglut3 immunostainings in 14-week-old animals and present representative single imaging planes in Author response image 4. Additionally, we provide exemplary images from 7-week-old RTN4RL2<sup>-/-</sup> mice, in which the orphan Homer1 puncta appear to be located on calretinin-positive neurites.

      Author response image 4.

      Attempts to localize “orphan” Homer1 patches on type I SGN neurites. (A) Single exemplary imaging planes of the apical IHC region from RTN4RL2<sup>+/+</sup> (left) and RTN4RL2<sup>-/-</sup> (right) mice immunolabeled against NF200, Vglut3 and Homer1. White arrows show putative “orphan” Homer1 puncta on NF200-positive neurites. Scale bar = 5 µm. (B) Maximum intensity projections of representative confocal stacks of IHCs from RTN4RL2<sup>-/-</sup> mice immunolabeled against Calretinin and Homer1. Scale bars = 5 µm. White arrows show possible “orphan” Homer1 puncta on Calretinin-positive boutons.

      (4) The authors noted a reduction in the number of GluA2/3-positive puncta in RTN4RL2 KOs, as shown in Figure 2-Supplement 3. However, in the Results section (page 5, line 124), it is unclear whether the authors refer to a reduction in fluorescence intensity or the number of puncta. Please clarify this.

      We thank the reviewer for the comment. We refer to the number and have now added this to the manuscript.

      (5) I find it particularly interesting that, despite the presence of smaller but synaptically engaged Homer1-positive SGN neurites, these appear to lack or present a reduction in the number of GluA2/3 puncta, and that GluA2/3 puncta are observed in non-ribbon juxtaposed neurites. Therefore, I suggest including GluA2/3 (Fig2 supplement 3) data in the main figure. It would be valuable to determine whether the orphan neurites express both Homer1 and GluA2/3, which could indicate that the defect is not solely due to reduced GluA2/3 expression at the formed synapses, but also to the presence of additional orphan synapses. I would also mention in the discussion how the phenotype of the RTN4RL2 KO compares to the GluA2/3 KO and if the lack of GluA2/3 at the AZ could explain the increase in ABR threshold. Quantification of GluA2/3 puncta at the apical, middle, and basal region would also help understand the auditory phenotype of the KO mice.

      We have changed Figure 2-figure supplement 3 to become a main figure (Figure 3) based on the recommendation of the reviewer. We agree that it would be valuable to perform immunohistochemistry combining anti-GluA2/3, anti-Homer1 and anti-Ctbp2 antibodies to see if the “orphan” Homer1 patches house GluA2/3 not juxtaposing synaptic ribbons. Unfortunately, as mentioned above, due to the expiration of our animal breeding and experimentation licenses we did not manage to do those experiments. We have, however, performed stainings with anti-GluA4 antibodies and could not detect GluA4 signal in RTN4RL2<sup>-/-</sup> mice (Figure 3-figure supplement 1). This could potentially explain the more drastic ABR threshold elevation in RTN4RL2<sup>-/-</sup> mice compared to e.g. GluA3 KO mice. We have now made this clearer in our discussion.

      (6) I suggest considering the use of color-blind friendly palettes for figures and graphs in this manuscript to enhance clarity and ensure that the findings are accessible to a wider audience and improve the overall effectiveness of the presentation. Please use color-blind-friendly schemes in Figure 1 and Figure 2 Supplement 3.

      Done.

      (7) Could you please explain what "XX ± Y, SD = W" means in the figure legends?

      The legends report the mean ± SEM (standard error of the mean) followed by the SD (standard deviation). In response to the reviewer’s comment, we have now added an explanation in the “Data analysis and statistics” section of the Materials and Methods.
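
      As a minimal sketch of how the three quantities relate, the snippet below computes the mean, SD and SEM for a list of measurements. It assumes the sample SD (n − 1 denominator) and SEM = SD/√n; the function name and the example values are hypothetical and are not data from the study.

```python
import math
import statistics


def summarize(values):
    """Return (mean, sem, sd) for a sample of at least two measurements."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator (assumed)
    sem = sd / math.sqrt(n)        # standard error of the mean
    return mean, sem, sd


# Hypothetical measurements (illustration only).
mean, sem, sd = summarize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(f"{mean:.2f} ± {sem:.2f}, SD = {sd:.2f}")
```

      The SD describes the spread of the individual measurements, whereas the SEM describes the uncertainty of the mean and shrinks as the sample size grows.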

      (8) Please include information about the ear tested (left or right or both).

      Both ears were tested. Since there was no significant difference between the right and left ears, we did not consider this factor further. We will state this more precisely in the Materials and Methods section.

      Reviewer #3 (Recommendations for the authors):

      (1) Line 90: Why not show this control, it is a nice control.

      Unfortunately, our recent attempts to perform RTN4RL2 immunostaining on cryosections were unsuccessful. Therefore, we decided to remove RTN4RL2 immunostaining from Figure 1 and have adjusted the results section accordingly.

      (2) Line 94: Please provide a reference for these interactions.

      Done.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, the authors demonstrated that YAP/TAZ promotes P-body formation in a series of cancer cell lines. YAP/TAZ modulates the transcription of multiple P-body-related genes, especially repressing the transcription of the tumor suppressor proline-rich nuclear receptor coactivator 1 (PNRC1) through cooperation with the NuRD complex. PNRC1 functions as a critical repressor in YAP-induced biogenesis of P-bodies and tumorigenesis in colorectal cancer (CRC). Reexpression of PNRC1 or disruption of P-bodies attenuated the protumorigenic effects of YAP. Overall, these findings are interesting and the study was well conducted.

      We thank the reviewer for the positive comments for our work.

      Major concerns:

      (1) RNAseq data indicated that Yap has the capacity to suppress the expression of numerous genes. In addition to PNRC1, could there be additional Yap targeting factors involved in Yap-mediated formation of P-bodies?

      Yes, indeed. Additional YAP target genes, such as AJUBA and SAMD4A, are also involved in YAP-mediated formation of P-bodies (Fig. 1B-D). Knockdown of either SAMD4A or AJUBA attenuated the P-body formation induced by overexpression of YAP5SA (Fig. 3A).

      (2) It is still not clear how PNRC1 regulates P-bodies. Knockdown of PNRC1 prevented the reduction of P-bodies caused by Yap knockdown. How do the genes related to P-bodies that are positively regulated by Yap, such as SAMD4A, AJUBA, and WTIP, change in this scenario? Given that the expression of Yap can differ considerably among various cell types, is it possible for P-bodies to be present in tumor cells lacking Yap expression?

      The detailed mechanism of PNRC1’s suppressive effect on P-body formation was explored in depth in the paper by Gaviraghi et al., in which PNRC1 was first identified as a tumor suppressor gene (EMBO, 2018, PMID: 30373810). Gaviraghi et al. revealed that overexpression of PNRC1 leads to translocation of cytoplasmic DCP1A/DCP2 into the nucleolus, which subsequently attenuates rRNA transcription and ribosome biogenesis. Since DCP1A and DCP2 are essential for the formation of P-bodies, loss of cytoplasmic DCP1A/DCP2 also disrupts P-body formation. This background information has been included in the Results and Discussion sections in the manuscript:

      We previously performed RNA-seq analysis of HCT116 cells overexpressing PNRC1. Compared with YAP5SA overexpression (520 differentially expressed genes), overexpression of PNRC1 had a smaller effect on the gene expression profile (147 differentially expressed genes), and the expression of SAMD4A, AJUBA and WTIP was not affected by PNRC1 overexpression.

      In this study, we found that YAP could promote P-body formation in a series of cancer cell lines. During the exploration, we observed that P-bodies hardly existed in the RKO colorectal cancer cell line (Figure 1 for the reviewer). However, the regulatory effect of YAP/TAZ on SAMD4A, AJUBA, and WTIP was still observed (Figure 2 for the reviewer). These data suggest that YAP’s activity could be sufficient but not required for P-body formation. We therefore agree that P-bodies could be present in tumor cells lacking Yap expression.

      Author response image 1.

      Author response image 2.

      (3) The authors demonstrated that CHD4 can bind to Yap target genes, such as CTGF, AJUBA, SAMD4A (Figure 4 - Figure Supplement 1D). Does the NuRD complex repress the expression of these genes? Could the NuRD complex prevent the formation of P-bodies?

      Good point! Following the reviewer’s suggestions, we measured the mRNA levels of AJUBA, WTIP and SAMD4A, as well as P-body formation, in CHD4 knockdown cells. Interestingly, knockdown of CHD4 induced mild downregulation of AJUBA, WTIP and SAMD4A in HCT116 cells (Figure 3 for the reviewer). Of note, the NuRD complex is involved in both transcriptional repression and activation (PNAS 2011, PMID: 21490301; Stem Cell Reports. 2021, PMID: 33961790). As expected, CHD4 knockdown decreased the number of P-bodies in HCT116 cells (new Figure 4-Supplement 1E), which is consistent with the enhanced expression of PNRC1 (Figure 4F).

      Author response image 3.

      Author response image 4.

      (4) YAP/TAZ promotes the formation of P-bodies which contradicts the previous study's conclusion (PMID: 34516278). Please address these inconsistent findings.

      The contradictory observations between our study and the previous one could be due to the different cell lines (HUVEC vs cancer cell lines) and different stimuli (KSHV infection vs normal culture condition or serum stimulation, cell density and stiffness). We have discussed the contradictory observation of the previous study in the Discussion section as follows:

      “In contrast, a recent study, which provided the first link between YAP and P-bodies, implicated YAP as a negative regulator of P-bodies in KSHV-infected HUVECs (Castle et al, 2021). Elizabeth L. Castle et al. reported that virus-encoded Kaposin B (KapB) induces actin stress fiber formation and disassembly of P-bodies, which requires RhoA activity and the YAP transcriptional program (Castle et al, 2021). YAP-enhanced autophagic flux was proposed to participate in KapB-induced P-body disassembly, consistent with the concept that stress granules and P-bodies are cleared by autophagy (Buchan et al, 2013; Castle et al, 2021). However, an increasing number of studies have reported the contradictory role of YAP in autophagy regulation, which suggests that YAP-mediated autophagy regulation is cell type- and context-dependent (Jin et al, 2021; Pei et al, 2022; Totaro et al, 2019; Wang et al, 2020). Furthermore, though YAP is required for the cell proliferation in HUVEC, transformed cell lines often display elevated baseline YAP/TAZ activity compared to normal cells and possess many alterations in growth signaling pathways including autophagy signaling (Nguyen & Yi, 2019; Shen & Stanger, 2015; Zanconato et al, 2016). Thus, the contradictory observations regarding the role of YAP in modulating P-body formation between Elizabeth L. Castle et al.’s study and our study could be due to the different cell contexts and different cell conditions (baseline vs. KSHV infection).”

      Reviewer #2 (Public Review):

      In a study by Shen et al., the authors investigated YAP/TAZ target genes that play a role in the formation of processing bodies (P-bodies). P-bodies are membraneless cytoplasmic granules that contain translationally repressed mRNAs and components of mRNA turnover. GO enrichment analysis of the RNA-Seq data of colorectal cancer cells (HCT116) after YAP/TAZ knockdown showed that the downregulated genes were enriched in P-body resident proteins. Overexpression, knockdown, and ChIP-qPCR analyses showed that SAMD4A, PNRC1, AJUBA, and WTIP are YAP-TEAD target genes that also play a role in P-body biogenesis. Using P-body markers such as DDX6 and DCP1A, the authors showed that the knockdown of YAP in the HCT116 cell line causes a reduction in the number of P-bodies. Similarly, overexpression of constitutively active YAP (YAP 5SA) increased the P-body number. The YAP-TEAD target genes SAMD4A and AJUBA positively regulate P-body formation, because lowering their expression levels using siRNA reduces the number of P-bodies. The other YAP target gene, PNRC1, is a negative regulator of P-body biogenesis and consistently YAP suppresses its expression through the recruitment of the NuRD complex. YAP target genes that modulate P-body formation play prominent roles in oncogenesis. PNRC1 suppression is key to YAP-mediated proliferation, colony formation, and tumorigenesis in HCT116 xenografts. Similarly, SAMD4 and AJUBA knockdown abrogated cell viability. In summary, this study demonstrated that SAMD4, AJUBA, WTIP, and PNRC1 are bona fide YAP-TEAD target genes that play a role in P-body formation, which is also linked to the oncogenesis of colon cancer cells.

      We thank the reviewer for the positive comments for our work.

      Major Strengths:

      The majority of the experiments were appropriately planned so that the generated data could support the conclusions drawn by the authors. The phenotype observed with YAP/TAZ knockdown correlated inversely with YAP5SA overexpression, which is complementary. Where possible, the authors also used point mutations that selectively disrupt protein-protein interactions, such as YAP S94A and PNRC1 W300A. The CRC cell line HCT116 was used throughout the study; additionally, data from other cancer cell lines were used to support the generality of the findings.

      We thank the reviewer for the positive comments regarding the strength and significance of our work.

      Weaknesses:

      The authors did not elucidate the mechanistic link between P-body formation and oncogenesis; therefore, it is unclear why an increase in the number of P-bodies is pro-tumorigenic. AJUBA and SAMD4 may have housekeeping functions and reduce the proliferation of YAP-independent cell lines. Figure 6 - Figure Supplement 4 shows a reduction in cell viability and migration in control HCT116 cell lines upon AJUBA/SAMD4 knockdown. Therefore, it is unclear whether their tumor suppressive role is YAP-dependent. The authors extrapolated and suggested that their findings could be exploited therapeutically, without providing much detail. How do they plan to stimulate the expression of PNRC1? It is not necessary for every scientific finding to lead to a therapeutic benefit; therefore, they can tone down such statements if therapeutic exploitation is not realistic. The authors elucidated a mechanism for PNRC1 repression and one wonders why no attempts were made to understand the mechanism of activation of SAMD4, AJUBA, and WTIP expression.

      We thank the reviewer for pointing out these issues to further improve the quality of our study. As mentioned in the Abstract section, the role of P-bodies in tumorigenesis and tumor progression is not well studied. In this study, we revealed that disruption of P-body formation by knockdown of essential P-body-related genes attenuates YAP-driven oncogenic function in CRC, which provides evidence implicating a pro-tumorigenic role of P-bodies. We agree with the reviewer that the mechanism by which P-body formation promotes tumorigenesis is an important scientific question warranting exploration, and we plan to investigate it in a future study.

      AJUBA is known to act as a signal transducer in oncogenesis and to promote CRC cell survival (Pharmacol Res. 2020, PMID: 31740385; Oncogene. 2017, PMID: 27893714). Furthermore, as the reviewer suggested, we found that knockdown of both AJUBA and SAMD4A suppressed cell proliferation in the YAP-deficient cell line SHP-77, which further implicates the oncogenic role of AJUBA and SAMD4A (Figure 4 for the reviewer). Numerous studies have shown that YAP/TAZ knockdown suppresses the proliferation of HCT116 cells. Thus, not surprisingly, knockdown of AJUBA and SAMD4A also repressed the proliferation of the “parental” control HCT116 cells. Since the molecular mechanistic studies identified AJUBA and SAMD4A as bona fide YAP-TEAD target genes, the co-dependencies of YAP and AJUBA/SAMD4A in the HCT116 cells imply that the pro-tumorigenic function of YAP could depend, to some extent, on activation of AJUBA/SAMD4A (given the large number of YAP target genes).

      Author response image 5.

      Tumor suppressor genes are frequently epigenetically silenced in cancer cells, and PNRC1 appears to be no exception. In our preliminary study, we found that the DNA methyltransferase inhibitor 5-Azacytidine dramatically increased the mRNA level of PNRC1 in HCT116 cells (Figure 5 for the reviewer), which suggests that PNRC1 is epigenetically suppressed by DNA methylation in CRC cells and could be re-activated or re-expressed by DNA methyltransferase inhibitors for cancer treatment.

      Author response image 6.

      YAP/TAZ are well known as transcriptional co-activators, and the mechanism of transcriptional activation of their target genes has been well studied (Cell Stress. 2021, PMID: 34782888). Only years later was the function of YAP/TAZ as transcriptional co-repressors brought to the forefront. Both NuRD and Polycomb repressive complex 2 (PRC2) are involved in the transcriptional repressor function of YAP (Cell Rep. 2015, PMID: 25843714; Cancer Res. 2020, PMID: 32409309). Thus, we focused on exploring the mechanism of PNRC1 repression in this study, rather than the mechanism of activation of SAMD4A, AJUBA, and WTIP expression.

      Reviewer #2 (Recommendations For The Authors):

      Suggested experiments: The suggested experiments were aimed at minimizing the weaknesses of the manuscript. The roles of AJUBA and SAMD4 can be elucidated in a YAP-independent cell line. After knockdown of AJUBA or SAMD4 in a YAP-independent cell line, the effects on proliferation and migration should be determined.

      Following the reviewer’s suggestions, we explored the role of AJUBA and SAMD4A in the YAP-independent cell line SHP-77 (Cancer Cell. 2021, PMID: 34270926). Unfortunately, SHP-77 cells are suspension cells mixed with some loosely adherent cells, and we found that they are not suitable for cell migration assays. By CCK8 assay, we found that knockdown of both AJUBA and SAMD4A suppressed cell proliferation in SHP-77 cells, which further supports an oncogenic role for AJUBA and SAMD4A.

      Author response image 7.

      Experiments directed at elucidating whether the mRNAs of tumor suppressor genes undergo sequestration and decay in P-bodies that ultimately promote tumorigenesis will provide a mechanistic link between P-body formation and tumorigenesis. The enrichment of P-bodies through biochemical methods has been employed in other studies. RNA-seq after P-body enrichment may provide opportunities to unravel the link between P-body formation and tumorigenesis.

      We thank the reviewer for the constructive suggestions to further improve the significance of our study. We do plan to purify P-bodies to further elucidate the mechanisms underlying the pro-tumorigenic role of P-bodies in tumor cells. However, we are newcomers to the P-body field and have encountered many issues in establishing biochemical assays for P-bodies. We hope to solve these technical issues soon and present our new data in the next paper.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors explore associations between plasma metabolites and glaucoma, a primary cause of irreversible vision loss worldwide. The study relies on measurements of 168 plasma metabolites in 4,658 glaucoma patients and 113,040 controls from the UK Biobank. The authors show that metabolites improve the prediction of glaucoma risk based on polygenic risk score (PRS) alone, albeit weakly. The authors also report a "metabolomic signature" that is associated with a reduced risk (or "resilience") for developing glaucoma among individuals in the highest PRS decile (reduction of risk by an estimated 29%). The authors highlight the protective effect of pyruvate, a product of glycolysis, for glaucoma development and show that this molecule mitigates elevated intraocular pressure and optic nerve damage in a mouse model of this disease.

      Strengths:

      This work provides additional evidence that glycolysis may play a role in the pathophysiology of glaucoma. Previous studies have demonstrated the existence of an inverse relationship between intraocular pressure and retinal pyruvate levels in animal models (Harder et al. 2020, PNAS 117(52)) and pyruvate supplementation is currently being explored for neuro-enhancement in patients with glaucoma (De Moraes et al. 2022, JAMA Ophthalmology 140(1)). The study design is rigorous and relies on validated, standard methods. Additional insights gained from a mouse model are valuable.

      We thank the reviewer for these supportive comments.

      Weaknesses:

      Caution is warranted when examining and interpreting the results of this study. Among all participants (cases and controls) glaucoma status was self-reported, determined on the basis of ICD codes or previous glaucoma laser/surgical therapy. This is problematic as it is not uncommon for individuals in the highest PRS decile to have undiagnosed glaucoma (as shown in previous work by some of the authors of this article). The authors acknowledge a "relatively low glaucoma prevalence in the highest decile group" but do not explore how undiagnosed glaucoma may affect their results. This also applies to all controls selected for this study. The authors state that "50 to 70% of people affected [with glaucoma] remain undiagnosed". Therefore, the absence of self-reported glaucoma does not necessarily indicate that the disease is not present. Validation of the findings from this study in humans is, therefore, critical. This should ideally be performed in a well-characterized glaucoma cohort, in which case and control status has been assessed by qualified clinicians.

      We appreciate the comment regarding the challenges of glaucoma ascertainment in UK Biobank. This is a valid limitation, as glaucoma in UK Biobank is based on self-reports and hospital records rather than comprehensive ophthalmologic examinations for all participants. To the best of our knowledge, there is no comparably sized dataset where all participants have undergone standardized glaucoma assessments, comprehensive metabolomic profiling, and high-throughput genotyping. Work is currently ongoing to link UK Biobank data to ophthalmic EMR data, which will help confirm self-reported diagnoses. This work is not complete, and the coverage of the cohort from such linkage is uncertain at present. Nonetheless, several factors speak to the validity of our findings. The top members of the metabolomic signature associated with resilience in the top decile of glaucoma polygenic risk score (PRS), lactate (P=8.8E-12) and pyruvate (P=1.9E-10), had robust values for statistical significance after appropriate adjustment for multiple comparisons, with additional validation for pyruvate in a human-relevant glaucoma mouse model. Strikingly, the glaucoma odds ratio (OR) for subjects in the highest quartile of glaucoma PRS and metabolic risk score (MRS) was 25-fold, using participants in the lowest quartile of glaucoma PRS and MRS as the reference group. An effect size this large for a putative glaucoma determinant has only been seen for intraocular pressure (IOP), which is now largely accepted to be in the causal pathway of the disease.

      The Discussion now contains the following statement: “A second limitation is that glaucoma ascertainment in the UK Biobank is based on self-reported diagnoses and hospital records rather than comprehensive ophthalmologic examinations. Nonetheless, it is reassuring that the prevalence of glaucoma in our sample (~4%) is similar to a directly performed disease burden estimate in a comparable, albeit slightly older, United Kingdom sample (2.7%)(79)”. (Lines 379-382)

      The authors indicate that within the top decile of PRS participants with glaucoma are more likely to be of white ethnicity, while they are more likely to be of Black and Asian ethnicity if they are in the bottom half of PRS. Have the authors explored how sensitive their predictions are to ethnicity? Since their cohort is predominantly of European ancestry (85.8%), would it make sense to exclude other ethnicities to increase the homogeneity of the cohort and reduce the risk for confounders that may not be explicitly accounted for?

      Comparing data in Tables 3 and 4 of the manuscript, we observe that, on a percentage basis, more individuals have glaucoma in the highest 10th percentile of risk compared to the lowest 50th percentile of risk across all ancestral groups. We recently reported that the risk of glaucoma increases with each standard deviation increase in the glaucoma PRS across ancestral groups in the UK Biobank, utilizing a slightly different sample size (see Author response table 1 below).(1) Since the PRS is applicable across ancestral groups, we aim to make our results as generalizable as possible; therefore, we prefer to report our findings for all ethnic groups and not restrict our results to Europeans.

      Author response table 1.

      Performance of the mtGPRS Across Ancestral Groups in the UK Biobank

      Abbreviations: mtGPRS, multitrait analysis of GWAS polygenic risk score; OR, odds ratio; CI, confidence interval.(1)

      UK Biobank ancestry was genetically inferred based on principal component analysis. The OR represents the risk associated with each standard deviation change in mtGPRS and is adjusted for multiple covariates including age, sex, and medical comorbidities.

      In the discussion, we stated that “... we chose to analyze Europeans and non-Europeans together to make the results as generalizable as possible.” (Lines 378-379)

      The authors discuss the importance of pyruvate, and lactate for retinal ganglion cell survival, along with that of several lipoproteins for neuroprotection. However, there is a distinction to be made between locally produced/available glycolysis end products and lipoproteins and those circulating in the blood. It may be useful to discuss this in the manuscript, and for the authors to explore if plasma metabolites may be linked to metabolism that takes place past the blood-retinal barrier.

      As the reviewer points out, it is crucial to interpret the results for lipoproteins within the context of their access to the blood-retinal barrier. Even for smaller metabolites like pyruvate and lactate, it is essential to consider local production versus serum-derived molecules in mediating any neuroprotective effects. Our murine data suggest that exogenous pyruvate contributed to neuroprotection. However, for the other glycolysis-related metabolites (lactate and citrate), we cannot rule out the possibility that locally produced metabolites may also contribute to neuroprotection. None of the lipoproteins identified as potential resilience biomarkers had an adjusted P-value of less than 0.05. Nevertheless, HDL analytes can cross blood-ocular barriers to enter the aqueous humor.(2) Therefore, it is also possible for serum-derived HDL to influence retinal ganglion cell homeostasis. Overall, much more research is needed to clarify the roles of locally produced versus serum-derived factors in conferring resilience to genetic predisposition to glaucoma.

      We have added the following sentences to the discussion:

      “Notably, although our validation data confirm the neuroprotective effects of exogenous pyruvate, it remains possible that endogenously produced pyruvate within ocular tissues may also contribute to RGC protection.” (Lines 329-331)

      “Furthermore, as HDL analytes can cross blood-ocular barriers,(78) there is a plausible route for serum-derived HDL to influence RGC homeostasis. Nonetheless, the relative contributions of circulating lipoproteins versus local synthesis within ocular tissues remain unclear and warrant further investigation.” (Lines 355-358)

      “Incorporating ocular physiology and blood-retinal barrier considerations when interpreting lipoproteins as potential resilience biomarkers will be critical for future studies aimed at understanding and therapeutically targeting increased glaucoma risk.” (Lines 360-363)

      Reviewer #2 (Public review):

      Summary

      The authors have used the UK Biobank data to interrogate the association between plasma metabolites and glaucoma.

      (1) They initially assessed plasma metabolites as predictors of glaucoma: The addition of NMR-derived metabolomic data to existing models containing clinical and genetic data was marginal.

      (2) They then determined whether certain metabolites might protect against glaucoma in individuals at high genetic risk: Certain molecules in bioenergetic pathways (lactate, pyruvate, and citrate) conferred protection.

      (3) They provide support for protection conferred by pyruvate in a murine model.

      Strengths

      (1) The huge sample size supports a powerful statistical analysis and the opportunity for the inclusion of multiple covariates and interactions without overfitting the models.

      (2) The authors have constructed a robust methodology and statistical design.

      (3) The manuscript is well written, and the study is logically presented.

      (4) The figures are of good quality.

      (5) Broadly, the conclusions are justified by the findings.

      We thank the reviewer for these supportive comments.

      Weaknesses

      (1) Although it is an invaluable treasure trove of data, selection bias and self-reporting are inescapable problems when using the UK Biobank data for glaucoma research. The high-impact glaucoma-related GWAS publications (references 26 and 27) referenced in support of the method suffer the same limitations. This doesn't negate the conclusions but must be taken into consideration. The authors might note that it is somewhat reassuring that the proportion of glaucoma cases (4%) is close to what would be expected in a population-based study of 40-69-year-olds of predominantly white ethnicity.

      While there are limitations when open-angle glaucoma (OAG) is ascertained by self-report, as discussed above, we agree with the reviewer that the prevalence of glaucoma is consistent with data from population-based studies of Europeans who are 40-69 years of age. 

      We also want to point out that references 26 and 27 indicate glaucoma self-reports can be an acceptable surrogate for OAG that is ascertained by clinical evaluation. Consider the methodologic details for each study:

      Reference 26 is a 4-stage genome-wide meta-analysis to identify loci for OAG from 21 independent populations. The phenotypic definition of OAG was based on clinical assessment in the discovery stage, and 7286 glaucoma self-reports from the UK Biobank served as an effective replication set.  It is also important to note that 120 out of the 127 discovered OAG loci were nominally replicated in 23andMe, where glaucoma was ascertained entirely by self-report.

      Reference 27 is a genome-wide meta-analysis to identify IOP genetic loci, an important endophenotype for OAG. The study identified 112 loci for IOP. These loci were incorporated into a glaucoma prediction model in the NEIGHBORHOOD study and the UK Biobank. The area under the receiver operator curve was 0.76 and 0.74, respectively, in these studies. While the AUCs were similar, OAG was ascertained clinically in NEIGHBORHOOD and largely by self-report in UK Biobank. 

      Finally, a strength of the UK Biobank is that selection bias is minimized. Patients need not be insured or aligned to the study for any reason aside from being a UK resident. There is, admittedly, a healthy-volunteer bias in the UK Biobank: ambulatory patients who are health conscious and willing to donate their time and provide biological specimens tend to participate. We agree with the reviewer that the use of self-reported cases does not negate the conclusions, and hopefully, future iterations of the UK Biobank in which clinical validation of self-reports is performed will confirm these findings, which already have some validation in a preclinical glaucoma model.

      We added the following sentence to the first action item above regarding our case ascertainment method: “Nonetheless, it is reassuring that the prevalence of glaucoma in our sample (~4%) is similar to a directly performed disease burden estimate in a comparable, albeit slightly older, United Kingdom sample (2.7%).”(3) (Lines 381-383)

      (2) As noted by the authors, a limitation is the predominantly white ethnicity profile that comprises the UK Biobank. 

      (3) Also as noted by the authors, the study is cross-sectional and is limited by the "correlation does not imply causation" issue.

      While the epidemiological arm of our study was cross-sectional, the studies testing the ability of pyruvate to mitigate the glaucoma phenotype in mice with the Lmx1b mutation were prospective.

      We already pointed out in the discussion that pyruvate supplementation reduced glaucoma incidence in a human-relevant genetic mouse model.

      (4) The optimal collection, transport, and processing of the samples for NMR metabolite analysis is critical for accurate results. Strict policies were in place for these procedures, but deviations from protocol remain an unknown influence on the data.

      Comments 4 and 5 are related and will be addressed after comment 5.

      (5) In addition, all UK Biobank blood samples had unintended dilution during the initial sample storage process at UK Biobank facilities. (Julkunen, H. et al. Atlas of plasma NMR biomarkers for health and disease in 118,461 individuals from the UK Biobank. Nat Commun 14, 604 (2023).) Samples from aliquot 3, used for the NMR measurements, suffered from 5-10% dilution. (Allen, Naomi E., et al. Wellcome Open Research 5 (2021): 222.) Julkunen et al. report that "The dilution is believed to come from mixing of participant samples with water due to seals that failed to hold a system vacuum in the automated liquid handling systems. While this issue is likely to have an impact on some of the absolute biomarker concentration values, it is expected to have limited impact on most epidemiological analyses."

      We thank the reviewer for making us aware of the unintended sample dilution issue in aliquot 3, used for NMR metabolomics in UK Biobank participants. While ~98% of samples experienced a 5-10% dilution, this would not affect our reported results, which did not rely on absolute biomarker concentrations. All metabolites in the main tables were probit transformed and used as continuous variables per 1 standard deviation increase. Nonetheless, in supplemental material, we show that the unadjusted median levels of pyruvate (in mmol/L) were higher in participants without glaucoma vs those with glaucoma, both in the population overall and in those in the top 10 percent of glaucoma risk.
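
      The argument that a transformed metabolite is less sensitive to dilution than an absolute concentration can be illustrated with a minimal sketch. It assumes that "probit transformed" means a rank-based inverse normal transform with the offset (rank - 0.5)/n, which is not specified in this response, and the function name `rank_probit` and the concentrations are hypothetical. The sketch only covers a uniform dilution factor; a dilution that varies from sample to sample (5-10%) could still reorder samples with nearly equal concentrations.

```python
from statistics import NormalDist


def rank_probit(values):
    """Rank-based inverse normal transform (assumed form of the probit transform).

    Each value is replaced by the standard normal quantile of (rank - 0.5) / n,
    with tied values sharing their average rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Hypothetical pyruvate concentrations in mmol/L (illustrative, not study data).
original = [0.05, 0.08, 0.15, 0.06, 0.11]
# The same samples after a uniform 8% dilution.
diluted = [v * 0.92 for v in original]

scores_original = rank_probit(original)
scores_diluted = rank_probit(diluted)
```

      Because a uniform dilution preserves the ordering of the samples, both lists of scores are identical, whereas the absolute concentrations all shift.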

      Furthermore, we see no evidence in the literature that unidentified protocol deviations might impact metabolite results in UK Biobank participants. For example, a recent study evaluated the relationship between a weighted triglyceride-raising polygenic score (TG.PS) and type 3 hyperlipidemia (T3HL) in the Oxford Biobank (OBB) and the UK Biobank. In both biobanks, metabolomics was performed on the Nightingale NMR platform. A one standard deviation increase in TG.PS was associated with a 13% and 15.2% increased risk of T3HL in the OBB and UK Biobank, respectively.(4) Replication of the OBB result in the UK Biobank suggests there are no additional concerns regarding the processing of the UK Biobank for NMR metabolomics. Of course, we remain vigilant for protocol deviations that might call our results into question and will seek to validate our findings in other biobanks in future research.

      Impact

      The findings advance personalized prognostics for glaucoma that combine metabolomic and genetic data. In addition, the protective effect of certain metabolites influences further research on novel therapeutic strategies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Given the uncertainty in the proportion of controls with undiagnosed glaucoma, it may be appropriate to include a sensitivity analysis in the manuscript. The authors could then provide the readers with an estimate of how sensitive their predictions are to the proportion of undiagnosed individuals among controls.

      Since UK Biobank participants did not undergo standardized clinical assessments, it is not possible to perform sensitivity analyses as we don’t know which controls might have glaucoma, although we can offer the following comments.

      We are performing a cross-sectional, prospective, detailed glaucoma assessment of participants in the top and bottom 10 percent of genetic predisposition recruited from BioMe at Icahn School of Medicine at Mount Sinai and Mass General Brigham Biobank at Harvard Medical School. We find that 21% of people in the top decile of genetic risk have glaucoma,(5) which compares reasonably well to the 15% of people in the top 10% of genetic risk in the UK Biobank. This underscores the assertion that our definition of glaucoma in the UK Biobank, while not ideal, is a reasonable surrogate for a detailed clinical assessment.

      Currently, 10,077 subjects in the top decile of glaucoma genetic predisposition did not meet our definition of glaucoma. If we assume that the glaucoma prevalence is 3% and 50% of people with glaucoma are undiagnosed, then that would translate to an additional 150 cases misclassified as controls, which could either drive our result to the null, have no impact on our current result or contribute to a false positive result, depending on their pyruvate (and other metabolite) levels.   
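
      The estimate above can be checked with a short calculation. This is a minimal sketch that assumes the figure is obtained by multiplying the number of apparent controls by the assumed prevalence and by the assumed undiagnosed fraction; the function name is hypothetical.

```python
def expected_hidden_cases(n_controls, prevalence, undiagnosed_fraction):
    """Expected number of undiagnosed cases among apparent controls.

    Assumes hidden cases = controls x prevalence x fraction undiagnosed.
    """
    return n_controls * prevalence * undiagnosed_fraction


# Figures quoted in the response: 10,077 controls, 3% prevalence, 50% undiagnosed.
hidden = expected_hidden_cases(10_077, 0.03, 0.50)
hidden_rounded = round(hidden)  # about 150 misclassified controls

# Share of the apparent controls that would be misclassified under these assumptions.
hidden_share = hidden / 10_077
```

      Under these assumptions about 1.5% of the apparent controls in the top decile would be undiagnosed cases, which is the order of magnitude quoted above.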

      We have already addressed the issue of a lack of standardized exams in the UK Biobank and the need for more studies to confirm our findings.

      Reviewer #2 (Recommendations for the authors):

      (1) I am curious about the proposed reason for some individuals having metabolic profiles conferring resilience. Plasma pyruvate levels are normally distributed. Is it simply the case that some individuals with naturally high levels of pyruvate are fortuitously protected against glaucoma? Some sort of self-regulation mechanism seems unlikely.

      Thank you for your insightful question regarding the potential mechanism underlying the association between pyruvate levels and glaucoma resilience. Even modest inter-individual differences can have significant physiological implications, particularly in the context of neurodegeneration and metabolic stress. One possibility is that individuals with naturally higher pyruvate levels may benefit from pyruvate's known neuroprotective and metabolic support functions(6–8), which could confer resilience against the oxidative and bioenergetic challenges associated with glaucoma. Pyruvate is important for cellular metabolism, redox balance, and mitochondrial function, processes that are increasingly implicated in glaucomatous neurodegeneration.(9) Elevated pyruvate levels support mitochondrial ATP production(10), buffer oxidative stress,(11) and impact metabolic flux(12,13) through pathways such as the tricarboxylic acid cycle and NAD+/NADH homeostasis. This is consistent with prior studies suggesting that mitochondrial dysfunction contributes to retinal ganglion cell vulnerability in glaucoma.(14–17) While a direct self-regulation mechanism may seem unlikely, both genetic and environmental factors can influence pyruvate metabolism, which could lead to subtle but clinically meaningful variations in its levels. Our findings are supported by validation in a mouse model, which suggests that the association is unlikely to be fortuitous and that there may be an underlying biological process that merits further mechanistic investigation. Future studies incorporating longitudinal metabolic profiling and functional validation in human-derived models will help better understand this relationship.

      (2) Conceivably, the higher levels of pyruvate and lactate may have resulted from recent exercise and may reflect individuals with high levels of exercise that confers resilience against glaucoma by independent mechanisms such as improved blood flow. Any way to rule that out from the UK Biobank data?

      Thank you for raising this important point. To account for the potential confounding effects of physical activity, we adjusted for metabolic equivalents of task (METs) in our models, a standardized measure of physical activity available in the UK Biobank. By incorporating METs as a covariate, we aimed to minimize the influence of individual exercise levels on plasma pyruvate and lactate levels. This helps us ascertain that the observed associations are not solely attributable to differences in physical activity. However, we do acknowledge that longitudinal analysis of exercise patterns would provide further clarity on this relationship. 

      (3) It may be worth mentioning that the retinal ganglion cells contain a plasma membrane monocarboxylate transporter that supports pyruvate and lactate uptake from the extracellular space.

      Thank you for this extremely insightful suggestion on retinal ganglion cell (RGC) expression of monocarboxylate transporters, which can facilitate the uptake of pyruvate and lactate from the extracellular space. This is relevant for our study, given the high metabolic demands of RGCs and their reliance on both glycolytic and oxidative metabolism for neuroprotection and survival.

      We acknowledged this in the discussion section of the manuscript by adding the following statement: "RGCs express monocarboxylate transporters, which facilitate the uptake of extracellular pyruvate and lactate, improving energy homeostasis, neuronal metabolism, and survival.” (Lines 309-311)

      (4) The mechanism of protection in the mice, at least in part, is likely due to the lower IOP in the pyruvate-treated animals. Did the authors investigate the influence of pyruvate on IOP in the UK Biobank data (about 110,000 individuals had IOP measurements)?

      Thank you for your suggested investigation. We ran the suggested analysis among 68,761 individuals with IOP measurements and metabolomic profiling. Imputed pretreatment IOP values for participants using ocular hypotensive agents were calculated by dividing the measured IOP by 0.7, based on the mean IOP.
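
      The imputation rule described above amounts to a single division. The following is a minimal sketch of that rule, using the factor 0.7 stated in the response; the function name, argument names, and example readings are hypothetical.

```python
def impute_pretreatment_iop(measured_iop_mmhg, on_hypotensive_agent, factor=0.7):
    """Return an imputed pretreatment IOP in mmHg.

    For participants using ocular hypotensive agents, the measured IOP is
    divided by `factor` (0.7 in the response); other values are unchanged.
    """
    if on_hypotensive_agent:
        return measured_iop_mmhg / factor
    return measured_iop_mmhg


# Hypothetical readings (illustrative, not study data).
treated = impute_pretreatment_iop(14.0, on_hypotensive_agent=True)
untreated = impute_pretreatment_iop(14.0, on_hypotensive_agent=False)
```

      Dividing by 0.7 raises a treated reading by roughly 43%, so a measured 14 mmHg on treatment is imputed as about 20 mmHg.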

      We plotted the relationship between IOP and pyruvate levels (probit transformed). We compared participants with pyruvate levels at least 2 standard deviations above the mean (red dashed line, corresponding to a probit-transformed value of 2 and an absolute concentration of 0.15 mmol/L) with those below this threshold. We found a statistically significant difference between the groups (p=0.017) using the Welch two-sample t-test. We have not added this analysis to the manuscript, but readers can find the data here as the reviews are public. We acknowledged and addressed the dilutional issue above, where we utilized probit-transformed metabolite levels analyzed as continuous variables per 1 SD increase, rather than absolute concentrations.
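
      For readers unfamiliar with the Welch two-sample t-test, the following minimal sketch shows how its test statistic and degrees of freedom are computed without assuming equal variances in the two groups. It is not the analysis code used for this response; the function name and the two small samples are hypothetical, and the p-value step (which needs the t-distribution) is omitted.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(sample_a) / len(sample_a)
    se2_b = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Two made-up groups with unequal variances (illustrative, not study data).
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(group_a, group_b)
```

      Unlike the pooled-variance t-test, the degrees of freedom here are generally non-integer and smaller than n_a + n_b - 2, which makes the test more conservative when the groups differ in variance or size, as is the case when a small high-pyruvate group is compared with the rest of the cohort.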

      Author response image 1.

      (5) Line 88: I suggest changing "patients" to "affected individuals". The term "patients" tends to imply that the individual has already been diagnosed, but the idea being conveyed is about underdiagnosis in the population.

      Thank you for your suggestion.

      We have changed "patients" to "affected individuals" in the introduction. (Line 90)

      (6) Line 93: "However, glaucoma is also significantly affected by environmental and lifestyle factors,10-14". Although lifestyle risk factors such as caffeine intake, alcohol, smoking, and air pollution have been reported, the associations are generally weak and inconsistently reported. Consider modifying this notion to stress the emerging evidence around gene-environment interactions (reference 14) rather than environmental factors per se, with the implication that genes + metabolism may be greater than the sum of the parts.

      Thank you for this thoughtful suggestion to highlight gene-environment interactions, where genetic susceptibility may amplify or mitigate the impact of metabolic and environmental influences on glaucoma progression. We have revised the statement to better reflect the synergistic effects of genetics and metabolism rather than considering environmental factors in isolation.

      Revised sentence for inclusion in the introduction of the manuscript: "Glaucoma risk is influenced by both genetic and metabolic factors, with emerging evidence suggesting that gene-environment interactions may play a greater role in conferring disease risk than independent exposures alone.” (Lines 95-97)

      (7) Lines 156-161: In model 4, rather than stating that the very small increase in AUC with the addition of metabolic data compared to clinical and genetic data alone, "modestly enhances the prediction of glaucoma", it may be better interpreted as a marginal difference that was statistically significant due to the very large sample size but not clinically significant.

      Thank you for this comment.

      We have changed “modestly” to “marginally” in the Results section (Line 162) and throughout the manuscript, to reflect that the statistical significance arises in the context of the study’s large sample size.

      NB: We made other minor edits to correct minor grammatical errors, improve clarity, and streamline the revised manuscript. Furthermore, the paragraph regarding slit lamp examination in the Methods was inadvertently omitted but is added back in the revised manuscript (Lines 571-579).

      References:

      (1) Kim J, Kang JH, Wiggs JL, et al. Does Age Modify the Relation Between Genetic Predisposition to Glaucoma and Various Glaucoma Traits in the UK Biobank? Invest Ophthalmol Vis Sci. 2025;66(2):57. doi:10.1167/iovs.66.2.57

      (2) Cenedella RJ. Lipoproteins and lipids in cow and human aqueous humor. Biochim Biophys Acta BBA - Lipids Lipid Metab. 1984;793(3):448-454. doi:10.1016/0005-2760(84)90262-5

      (3) Minassian DC, Reidy A, Coffey M, Minassian A. Utility of predictive equations for estimating the prevalence and incidence of primary open angle glaucoma in the UK. Br J Ophthalmol. 2000;84(10):1159-1161. doi:10.1136/bjo.84.10.1159

      (4) Pieri K, Trichia E, Neville MJ, et al. Polygenic risk in Type III hyperlipidaemia and risk of cardiovascular disease: An epidemiological study in UK Biobank and Oxford Biobank. Int J Cardiol. 2023;373:72-78. doi:10.1016/j.ijcard.2022.11.024

      (5) Zhao H, Pasquale LR, Zebardast N, et al. Screening by glaucoma polygenic risk score to identify primary open-angle glaucoma in two biobanks: An updated report. ARVO 2025 meeting. Published online 2025.

      (6) Zilberter Y, Gubkina O, Ivanov AI. A unique array of neuroprotective effects of pyruvate in neuropathology. Front Neurosci. 2015;9. doi:10.3389/fnins.2015.00017

      (7) Quansah E, Peelaerts W, Langston JW, Simon DK, Colca J, Brundin P. Targeting energy metabolism via the mitochondrial pyruvate carrier as a novel approach to attenuate neurodegeneration. Mol Neurodegener. 2018;13(1):28. doi:10.1186/s13024-018-0260-x

      (8) Gray LR, Tompkins SC, Taylor EB. Regulation of pyruvate metabolism and human disease. Cell Mol Life Sci. 2014;71(14):2577-2604. doi:10.1007/s00018-013-1539-2

      (9) Harder JM, Guymer C, Wood JPM, et al. Disturbed glucose and pyruvate metabolism in glaucoma with neuroprotection by pyruvate or rapamycin. Proc Natl Acad Sci. 2020;117(52):33619-33627. doi:10.1073/pnas.2014213117

      (10) Kim MJ, Lee H, Chanda D, et al. The Role of Pyruvate Metabolism in Mitochondrial Quality Control and Inflammation. Mol Cells. 2023;46(5):259-267. doi:10.14348/molcells.2023.2128

      (11) Wang X, Perez E, Liu R, Yan LJ, Mallet RT, Yang SH. Pyruvate Protects Mitochondria from Oxidative Stress in Human Neuroblastoma SK-N-SH Cells. Brain Res. 2007;1132(1):1-9. doi:10.1016/j.brainres.2006.11.032

      (12) Tilton WM, Seaman C, Carriero D, Piomelli S. Regulation of glycolysis in the erythrocyte: role of the lactate/pyruvate and NAD/NADH ratios. J Lab Clin Med. 1991;118(2):146-152.

      (13) Li X, Yang Y, Zhang B, et al. Lactate metabolism in human health and disease. Signal Transduct Target Ther. 2022;7(1):305. doi:10.1038/s41392-022-01151-3

      (14) Zhang ZQ, Xie Z, Chen SY, Zhang X. Mitochondrial dysfunction in glaucomatous degeneration. Int J Ophthalmol. 2023;16(5):811-823. doi:10.18240/ijo.2023.05.20

      (15) Ju WK, Perkins GA, Kim KY, Bastola T, Choi WY, Choi SH. Glaucomatous optic neuropathy: Mitochondrial dynamics, dysfunction and protection in retinal ganglion cells. Prog Retin Eye Res. 2023;95:101136. doi:10.1016/j.preteyeres.2022.101136

      (16) Jassim AH, Inman DM, Mitchell CH. Crosstalk Between Dysfunctional Mitochondria and Inflammation in Glaucomatous Neurodegeneration. Front Pharmacol. 2021;12. doi:10.3389/fphar.2021.699623

      (17) Yang TH, Kang EYC, Lin PH, et al. Mitochondria in Retinal Ganglion Cells: Unraveling the Metabolic Nexus and Oxidative Stress. Int J Mol Sci. 2024;25(16):8626. doi:10.3390/ijms25168626

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Q1: First of all, the term organoid must be discarded. The authors just seed the endometrial cell mixture which assembles and aggregates into a 3D structure which is then immediately used for analysis. Organoids grow from tissue stem cells and must be passage-able (see their own description in lines 69-71). So, the term organoid must be removed everywhere, to not confuse the organoid field. It is not shown that the whole 3D assembly is passageable, which would be very surprising given the fact that immune and stromal cells do not grow in Matrigel because of the unfavorable growing conditions (which are targeted to epithelial cell growth).

      We appreciate your highlighting these concerns regarding our organoid construction.

      (1) The organoids in our system originated from tissue stem cells.

      We induced adult stem cells derived from endometrial tissue to form organoids in vitro using various small molecules (such as Noggin, EGF, FGF2, WNT-3A and R-Spondin1); this involves a complex self-assembly process rather than mere cellular aggregation. Two days after plating, the system contained single cells and small cell clusters. By the fourth day, the glandular epithelial cells had gradually assembled into glands, while the stromal cells spontaneously organized themselves around the glands. By the eleventh day, the endometrial glands had enlarged, epithelial cells were organized in a paving-stone arrangement, and stromal cells had established an extensive network. (Author response image 1) (Figure 1C)

      (2) The organoids we constructed are passage-able.  

      Most organoids were used for experiments up to the fifth generation, while some were extended to the tenth generation and cryopreserved. (Response Figure 1B, C)

      (3) Immune and stromal cells are present in our system from the primary to the fourth generation. In our study, immune and stromal cells were identified not only from scRNA-seq data (third generation of organoids) (Figure 2A), but also from the morphology using 3D transparent staining and light sheet microscopy imaging (third generation of organoids), with Vimentin marking stromal cells, CD45 designating immune cells, and FOXA2 identifying glands. Further, flow cytometric analysis was applied to verify immune cells within the organoids (third generation of organoids). (Response Figure 1D, E, F)  

      Moreover, immune cells and stromal cells can grow in Matrigel, as also reported by the organoid pioneer Hans Clevers (Hans Clevers et al., Nature Reviews Immunology 2019).

      Author response image 1.

      (A) The growth of endometrial cells was observed under an inverted microscope from day 2 to day 11 after plating. Scale bar = 200 μm. (B) Endometrial organoids of different passages were observed from P1 to P5. Scale bar = 200 μm. (C) Stromal cells formed an extensive network (down). The arrowhead indicates dendritic stromal cells. Scale bar = 100 μm (left), Scale bar = 50 μm (right). (D) Stromal cells marked by vimentin. Nuclei were counterstained with DAPI. The arrow indicates stromal cells. Scale bar = 40 μm (up), Scale bar = 30 μm (down). (E) Immune cells marked by CD45 and endometrial glands marked by FOXA2. Nuclei were counterstained with DAPI. The arrow indicates immune cells. Scale bar = 50 μm. (F) Flow cytometric analysis of T cells and macrophages in the endometrial organoid. Gating strategy used for determining white blood cells (CD45+ cells), T cells (CD45+CD3+ cells) and macrophages (CD45+CD68+CD11b+ cells).

      Q2: Second, the study remains fully descriptive, bombing the reader with a mass of bioinformatic analyses without clear descriptions and take-home messages. The paper is very dense, meaning readers may give up. Moreover, functional validation, except for morphological and immunostaining analyses (which are posed as "functional" but actually are only again expression) is missing, such as in vivo functionality (after transplantation e.g.) and embryo interaction. Importantly, the 3D structure misses the right architecture with a lining luminal epithelium which is present in the receptive endometrium in vivo and needed as the first contact site with the embryo. So, in contrast to what the authors claim, this is not the best model to study embryo interaction, or the closest model to the in vivo state (line 318, line 326).

      Thank you.

      (1) We have made the following improvements. Firstly, we have conducted additional experiments to validate the bioinformatics analysis. Secondly, the structure of the manuscript has been refined to ensure logical coherence and clear transitions between paragraphs. Thirdly, important findings have been emphasized to ensure readers’ comprehension and inspiration. Furthermore, the manuscript was revised by both domestic and international experts to enhance the readability and clarity.

      (2) For the functional validation, in vivo transfer has not been possible so far because of ethical limitations. However, human embryos are able to develop and grow more efficiently when combined with the receptive endometrial organoids we generated (unpublished data).

      (3) As you suggested, we replaced “closest” with “closer”. It is undeniable that the model cannot completely simulate the in vivo implantation process, in which the luminal epithelium of the endometrium contacts the embryo first.

      Q3: Third, receptive endometrial organoids (assembloids; Rawlings et al., eLife 2021) and receptive organoid-derived "open-faced endometrial layer" (Kagawa et al., Nature 2022) have already been described, which is in contrast to what the authors claim in several places that "they are the first" (e.g. lines 87-88, 316-319, etc). These studies used real organoids to achieve their model (and even showed embryo interaction), while in the present study, different cell types are just seeded and assembled. Hence, logically, immune cells are present which are never found in real organoid models. The only original aspect in the present study is the use of hormones to enhance the WOI phenotype. However, crucial information on this original aspect is missing such as concentration of the hormones, refreshment schedule, all 3 hormones added together or separately, and all 3 required?

      Thank you for pointing out these studies on endometrial organoids.

      (1) While we did not explicitly state "the first", we agree that we should be careful with expressions similar to "the first". The wording has been changed to more modest expressions, as follows: “we are far from understanding how embryo implantation occurs during the WOI due to ethical limitations and fewer in vitro receptive endometrial model” and “which confirms that they are closer to the in vivo state”.

      (2) The definition of organoids and the presence of immune cells have been addressed in detail in our response to the first question.

      (3) Regarding the hormone scheme, hormone concentrations are detailed in Supplementary Table S2. Estrogen was added to the basal medium for the initial two days, after which a combination treatment of MPA, cAMP, PRL, hPL, and HCG was administered for the subsequent six days. The medium was refreshed every two days.

      All three pregnancy hormones (PRL, hPL, and HCG) were deemed necessary, as validated by multiple group comparisons: only the organoids treated with all six components of the regimen together exhibited an endometrial receptivity-related gene expression profile (Author response image 2).

      Author response image 2.

      Heatmap showing receptivity related gene expression profile of organoids in each hormone regimen.  

      Q4: Moreover, it is not a "robust" model at all as the authors claim, given the variability of the initial cell mixture (varying from patient to patient). Actually, the reproducibility is not shown. The proportions of the different cell types seeded in the Matrigel droplet will be different with every endometrial biopsy. It would be much better to recombine epithelial (passageable) organoids with stromal and immune cells in a quantified, standardized manner to establish a "robust" model.

      Thanks for your suggestion.  

      Firstly, the constructed endometrial organoids generally consist of epithelial, stromal, and immune cells. However, it is undeniable that the cell proportions may vary slightly among different patients. Secondly, the term "robust" was intended to convey strong support for embryo development, which will be shown in our next study (unpublished data). Therefore, "robust" has been replaced with "alternative". Thirdly, as for "reproducibility", the hormone-treated organoids from different women exhibited similarity to the in vivo receptive endometrium in multi-omics analysis, ERT, and various other experiments.

      Reviewer #2 (Public Review):

      Q1: With endometrial receptivity analysis, they suggest a successful formation of the implantation window in vitro, but this result is difficult to interpret.

      Thanks for your question.  

      We understand that the most effective way to demonstrate endometrial receptivity is embryo implantation, which was conducted simultaneously and will be presented in our next study. In this study, we validated receptivity on the basis of the existing literature.

      (1) At the single-cell transcriptome level, the cellular composition and function of the receptive endometrial organoids were similar to those of the in vivo implantation window (Stephen R. Quake et al, 2020).

      (2) At the whole-organoid level, the receptive endometrial organoids exhibited transcriptomic and proteomic characteristics similar to those of the in vivo mid-secretory endometrium (Andres Salumets 2017, Qi Yu 2018, Triin Laisk 2018, Edson Guimarães Lo Turco 2018, Xiaoyan Chen 2020, Francisco Domínguez 2020, David W. Greening 2021, Norihiro Sugino 2023). The receptive endometrial organoids were also validated by the endometrial receptivity test (ERT), which uses high-throughput sequencing and machine learning to assess endometrial receptivity (Yanping Li et al., 2021).

      (3) At the microstructural level under electron microscope, the receptive endometrial organoids exhibited characteristics of the implantation window, such as pinopodes, glycogen particles, microvilli, and cilia.

      Overall, based on the existing literature, the receptive organoids we constructed closely resemble the in vivo implantation window at the single-cell, whole-organoid, and microstructural levels.

      Q2: Analyzing transcriptome and proteome information of WOI organoids, authors demonstrate a strong response to estrogen and progesterone, but some comparisons are made with CTRL and SEC, and others only with CTRL, which limits the power of some results. In the same way, some genes related to Cilia and pinopodes appear dominant in WOI organoids, but the comparison by electron microscopy is made only against CTRL organoids.  

      In subsequent analysis, WOI organoids showed a marked differentiation from proliferative to secretory epithelium, and from proliferative epithelium to EMT-derived stromal cells than SEC organoids. These statements are based on their upregulation of monocarboxylic acid and lipid metabolism, their enhanced peptide metabolism and mitochondrial energy metabolism, or their pseudotime trajectories. However, other analyses (such as the accumulation of secretory epithelium or decreased proliferative epithelium, the increased ciliated epithelium after hormonal treatment, or the presence of EMT-derived stromal cells) show only small differences between SEC and WOI organoids.

      Thank you for raising these important questions.

      (1) At the organoid level, the differences in transcriptome and proteome between SEC and WOI organoids are not significant. This is understandable because WOI organoids are further induced towards the implantation window based on the secretory phase (i.e. SEC organoids), and both are similar at the overall organoid level.  

      (2) At the single-cell level, the accumulation of secretory epithelium, the decrease in proliferative epithelium, the increase in ciliated epithelium after hormonal treatment, and the presence of EMT-derived stromal cells are fundamental features of the secretory endometrium. Therefore, these features are present in both WOI and SEC organoids. However, the most notable differences lie in the more comprehensive differentiation and more varied cellular functions exhibited by WOI organoids compared to SEC organoids.

      (3) Regarding electron microscopy, we have now quantitatively compared the presence of characteristic structures such as microvilli, cilia, pinopodes and glycogen in the CTRL, SEC and WOI groups. WOI organoids possess longer microvilli and more cilia, glycogen, and pinopodes than SEC organoids (Figure 2H).

      Reviewer #1 (Recommendations For The Authors):

      Q1: Several of the key methods are performed by companies, hence not in detail described and therefore not verifiable which is essential for reviewers and readers.

      We are grateful for the suggestion. Specific methods have now been incorporated into the "Supporting Information" section. (Line91~102, Line 107~123, Line 132~139)

      Q2 - Line 49: It is not shown in the present study whether the WOI organoids are a 'robust' platform.

      - Line 76: There is a study (Dolat L., Valdivia RH., Journal of Cell Science, 2021) that developed a co-culture with endometrial organoids and immune cells (neutrophils) which should be mentioned.

      We have reconsidered the word and now replace 'robust' with 'alternative' (Line 54). We have also followed the reviewer's suggestion and added this citation (Line 82-83) on the co-cultivation of immune cells with endometrial organoids; it was not previously cited mainly because the research model was mouse.

      Q3: Figure 1: Endometrial organoids possess endometrial morphology and function. - The authors should further explain their decision to add PRL, hCG, and hPL to the organoid culture. Why these particular compounds? What is their specific role during the WOI?

      In terms of hormone scheme, estrogen and progesterone promote the transition of endometrial organoids into the secretory phase, and on this basis, pregnancy hormones can further promote their differentiation. PRL promotes immune regulation and angiogenesis during implantation, HCG improves endometrial thickness and receptivity, and HPL promotes the development and function of endometrial glands. Our constructed WOI organoid is in a state conducive to embryo implantation. We aim to develop an in vitro model for embryo implantation study. The detailed explanation of this aspect was initially provided in the Discussion section (Lines 298–313). To enhance the clarity for reviewers and readers regarding the selection of the hormonal regimen, we have now articulated it in the Results section (Lines 124–130).

      When selecting hormone formulations, multiple group comparisons were made. The number, area, and average intensity of organoids in these groups were similar over time. However, compared with the other hormone formulations, the WOI organoids showed an endometrial receptivity-related gene expression profile, with high expression of genes positively correlated with endometrial receptivity and low expression of genes negatively correlated with it (added to Figure S1E, S1F). Hormone dosage was primarily based on peri-pregnancy levels in the maternal body or the local endometrium (Margherita Y. Turco et al., Nature Cell Biology 2017).

      -  Line 108: "the endometrial cells" instead of "endometrial organoid"? Because the authors also refer to the stromal cells.

      We believe you are referring to the sentence “The endometrial organoid, consisting of vesicle-like glands, fibrous stromal cells, and other surrounding cells, developed into a 3D structure with the support of Matrigel”. An organoid is a self-assembled 3D structure that consists of multiple cell types and closely resembles the in vivo tissue or organ. It is highly expandable and retains phenotypic and functional properties. Here, we aim to describe the endometrial organoid, comprising epithelial cells, stromal cells, and other cellular components that assemble to form intricate 3D structures. Hence, the term "endometrial organoid" is more appropriate.

      -  Line 110: "the endometrial glands", do the authors mean the endometrial organoids? The authors also mention they enlarge, which must be quantified.

      We believe you are referring to the sentence “As the organoids grew and differentiated, the endometrial glands enlarged, epithelial cells adopted a paving stone arrangement, and stromal cells formed an extensive network”. Here, we mean that the “endometrial glands” grow progressively within the organoids. We agree with your suggestion and quantified the change in organoid area over time, finding that it increased progressively in all three groups (shown below) (Figure S1E) (Line130-131)

      Author response image 3.

      The dynamic changes of the area of organoids over time in the CTRL, SEC and WOI organoids.

      -  Line 112: E-cadherin is a general epithelial marker, not a glandular marker.

      We agree with your suggestion and now change to ‘The epithelium marker E-cadherin’ (Line110).

      -  Line 116: Which group was used for KI67 and CC3 staining?

      The CTRL organoids were used for Ki67 and CC3 staining. We have modified this expression in the Figure 1E Legend.

      -  Line 123: Organoid size (diameter or area) needs to be quantified to claim that WOI organoids grow slower than SEC/CTRL organoids. The same goes for Ki67+ cells for proliferation. In the legend of Fig 1B, the authors in contrast state that the organoids show a similar growth pattern.

      We are extremely grateful to you for pointing out this problem. We quantitatively analyzed the size of organoids in the three groups. The area increased over time in all groups, with the CTRL group growing most vigorously, followed by the SEC group and then the WOI group, but the differences were not statistically significant. Relevant results have been added to Figure S1E (Line130-131). There were no significant differences in Ki67 expression among these organoids. Therefore, the three groups of organoids showed a similar growth pattern. We decided to delete the statement “Following hormonal stimulation, WOI organoids exhibited slower growth than SEC and CTRL organoids, while CTRL organoids maintained robust proliferative activity (Fig. 1B)”.

      Author response image 4.

      The dynamic changes of the area of organoids over time in the CTRL, SEC and WOI organoids.

      -  Line 126: Fourteen days of organoid treatment is a very long time. Growing organoids may already be dying which should be checked by CC3 staining to prove that organoids are still fully viable.

      Endometrial organoids are vigorous in proliferation and have a long survival period due to the presence of adult stem cells. To address your queries effectively, we conducted CC3 staining on the organoids treated for 14 days, revealing negligible expression levels (shown as below).

      Author response image 5.

      Figure note: The Ki67 and CC3 immunostaining on the organoids after 14-day hormone treatment.

      -  Line 128: Changes in hormone receptors should be supported by RT-qPCR data to be more convincing

      We agree with your suggestion. Here we supplemented the RT-PCR results of hormone receptors as follows (Figure S1D) (Line119-121). PAEP and PGR are associated with progesterone, and OLFM4 and EGR1 are associated with estrogen.

      -  1A: Are authors able to see and characterize decidualized stromal cells as indicated in the illustration?

      Upon the reviewer's inquiry, we carefully observed the morphology of stromal cells in hormone-treated organoids. Regrettably, the morphology of decidualized stromal cells was not ascertainable through light microscopy in our endometrial organoids.

      -  1C: Which treatment condition are the organoids in these images?

      This figure showed the bright-field morphology of the CTRL organoids, which is now noted in the Figure 1C legend.

      -  1D: PAS staining should be quantified to support the claims.

      We agree with your suggestion. The quantitative comparison of PAS staining was conducted in these three groups of organoids (Figure S1G) (Line142-143)

      -  1D: Where are the stromal cells in the model? There should be vimentin-positive cells outside of the glands.

      Figure 1D shows the results of section staining, which is limited in its ability to display stromal cells around the glands. Considering the 3D structure of organoids, we conducted organoid clearing and staining, and observed stromal cells (marked by Vimentin) under a light sheet microscope (shown below). The stromal cells were also presented using this method in the original Figure 2B.

      Author response image 6.

      Exhibition of stromal cell marked by vimentin of CTRL organoid through whole-mount clearing, immunostaining and light sheet microscopy imaging. Nuclei were counterstained with DAPI. The arrowhead indicates stromal cells. Scale bar = 70 μm.

      Figure 2: Developing receptive endometrial organoids in vitro mimicking the implantation window endometrium.

      -  Line 142: CD44 is not an exclusive marker for immune cells. It has been shown to be expressed in glandular secretory epithelial cells (Fonseca et al., 2023). The authors also mention that CD44 is expressed in stromal cells (line 265). Staining for CD45 (or another immune-specific marker) is needed to demonstrate the presence of immune cells. 

      We appreciate your suggestions. We demonstrated the distribution of immune cells in organoids using the organoid clearing technique in combination with light-sheet microscopy imaging, using CD45 as a marker (Figure 2C).

      -  Line 144: What are the proportions of the immune cells? What is the variation between patient samples?

      We assessed the proportion of immune cells by flow cytometry and analyzed the proportions of macrophages and T cells in organoids derived from 8 patients. The proportion of WBC in organoids was about 3%~4% (Figure 2D), among which macrophages were less than 1% and T cells less than 2% (Figure S2E). A very few patients showed large heterogeneity, whereas the proportion of immune cells in most patients was relatively stable.
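      The gating logic described for the flow cytometric analysis (white blood cells as CD45+, T cells as CD45+CD3+, macrophages as CD45+CD68+CD11b+) can be sketched as below. This is only a minimal illustration: the event records, the helper names `make_event` and `gate_fractions`, and the choice to express every population as a fraction of all events are assumptions, and the counts are invented for the example rather than taken from the study.

```python
# Hypothetical flow cytometry events: each event records marker positivity.
# The counts below are illustrative only and are NOT data from the study.

def make_event(cd45=False, cd3=False, cd68=False, cd11b=False):
    """Build one hypothetical event with boolean marker calls."""
    return {"CD45": cd45, "CD3": cd3, "CD68": cd68, "CD11b": cd11b}


def gate_fractions(events):
    """Return WBC, T cell and macrophage counts as fractions of all events."""
    total = len(events)
    if total == 0:
        raise ValueError("no events to gate")
    wbc = [e for e in events if e["CD45"]]                      # CD45+
    t_cells = [e for e in wbc if e["CD3"]]                      # CD45+CD3+
    macrophages = [e for e in wbc if e["CD68"] and e["CD11b"]]  # CD45+CD68+CD11b+
    return {
        "WBC": len(wbc) / total,
        "T cells": len(t_cells) / total,
        "Macrophages": len(macrophages) / total,
    }


events = (
    [make_event() for _ in range(96)]                         # CD45- cells
    + [make_event(cd45=True, cd3=True) for _ in range(2)]     # T cells
    + [make_event(cd45=True, cd68=True, cd11b=True)]          # macrophage
    + [make_event(cd45=True)]                                 # other CD45+ cell
)
fractions = gate_fractions(events)
print(fractions)
```

      Running the same function per patient would give one set of fractions per sample, from which between-patient variation could be summarised.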

      -  Line 161: What is the endometrial receptivity test (ERT)? Not explained at all.

      The Endometrial Receptivity Test (ERT) is a gene expression-based method for assessing endometrial receptivity. It combines high-throughput sequencing and machine learning to analyze the expression of endometrial receptivity-related genes, allowing a relatively accurate assessment of endometrial receptivity. It is currently used in clinical practice to determine endometrial receptivity and guide personalized embryo transfer (Yanping Li et al., J Transl Med 2021). (line179-183)

      -  2A: The authors' dataset is compared to a published dataset. How were they combined? Were they merged, mapped on each other, or integrated? Were all cells employed from the published dataset or specific cell types? Much detail to evaluate the analysis is missing.

      We are very grateful for your comments.  

      (1) The four raw datasets (CTRL, SEC and WOI organoids, and mid-secretory endometrium) underwent batch correction and integration using Harmony. Subsequently, the integrated dataset underwent dimensionality reduction via PCA. The soft k-means clustering algorithm was employed to address batch effects and clustering, utilizing a clustering parameter resolution of 0.5. Finally, the clustering results were visualized using tSNE based on the cell subpopulation classification. (“Methods” Line164-175)

      (2) Figure 2A displays a comparison of glandular and luminal epithelium, secretory epithelium, LGR5 epithelium, EMT-derived stromal cells, ciliated epithelium, and glandular secretory epithelium (shown in Figures S2C~S2D) (Line150-154)
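      For readers unfamiliar with the soft k-means step mentioned in (1), the sketch below shows the basic idea of a soft assignment: each cell receives a weight for every cluster centroid instead of a single hard label. It is a simplified illustration and not the Harmony procedure itself, which includes further terms omitted here; the function name `soft_assign`, the `sigma` parameter, and the two-dimensional toy coordinates are assumptions made for the example.

```python
import math


def soft_assign(point, centroids, sigma=1.0):
    """Soft k-means responsibilities of one cell for each centroid.

    Each weight is proportional to exp(-squared_distance / sigma), so the
    weights sum to 1 and closer centroids receive larger weights.
    """
    sq_dists = [
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for centroid in centroids
    ]
    d_min = min(sq_dists)  # subtract the minimum for numerical stability
    weights = [math.exp(-(d - d_min) / sigma) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


# Toy centroids in a hypothetical 2D embedding (illustration only).
centroids = [(0.0, 0.0), (4.0, 0.0)]
resp_near = soft_assign((0.5, 0.0), centroids)  # close to the first centroid
resp_mid = soft_assign((2.0, 0.0), centroids)   # equidistant from both
print(resp_near, resp_mid)
```

      A cell lying midway between two centroids is shared equally between them, which is what allows such a scheme to blend cells across batches rather than forcing an early hard split.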

      - 2E: Please add the cell type names above the heatmaps to improve readability.

      Thanks to your suggestion, we have added the cell type names above the heatmaps.

      - 2G: The difference between the left and right graphs is not clear from the figure itself. Improve by adding a title and more explanation.

      Thanks for your careful review. We have added the title to the left and right graphs.

      Supplementary Figure 3 is referenced with Figure 2. Supplementary Figure 2 is referenced with Figure 3. The order needs to be changed.

      Thanks for your careful review. We have changed the order.

      - S3B: Typical markers for annotation of the different cell clusters are not included and therefore it is not convincing enough that annotations are correct. E.g. Epithelial markers (EPCAM, CDH1), Stromal cells (VIM, PDGFRA), SOX9+LGR5+ cells (SOX9, LGR5). How were the EMT-derived stromal cells designated? It is not clear from the data whether they are in fact EMT-derived or whether they show epithelial markers as well (stated in line 246).

      We deeply appreciate your suggestion. We provided more details to describe the cell clustering as the following. Single-cell transcriptomics analysis referred to CellMarker, PanglaoDB, Human Cell Atlas, Human Cell Landscape, and scRNASeqDB, and previous endometrium related studies. (W. Wang et al., Nat Med 2020, P. D. Harriet C. Fitzgerald et al., PNAS 2019, K. M. Thomas, M Rawlings et al., eLife 2021, L. Garcia-Alonso et al., Nat Genet 2021) 

      (1) SOX9+LGR5+ cells: SOX9 and LGR5 are both proliferative markers. SOX9 is expressed dispersedly in all clusters. LGR5 is mainly expressed in two clusters, one of which is stem-derived epithelium, while the other expresses LGR5 in a scattered manner. With reference to the markers of SOX9+LGR5+ cells, SOX9+LGR5- cells, and SOX9+ proliferative cells (L. Garcia-Alonso et al., Nat Genet 2021), the cells in this cluster expressed high levels of NUAK2, CNKSR3, FOS and LIF, which was consistent with the expression profiles of SOX9+LGR5+ cells and SOX9+ proliferative cells. However, considering that the number of cells expressing LGR5 was relatively small, this cluster of cells was renamed SOX9+ proliferative epithelium.

      Figure 3: Receptive endometrial organoids recapitulate WOI-associated biological characteristics. - Line 173-174: The WOI organoids should be compared in detail to the SEC organoids in addition to the CTRL organoids, to show that this WOI model and new hormonal treatment is providing better results compared to the SEC organoids and the results obtained in previous studies.

      Thanks for your suggestion. At the organoid level, the differences in transcriptome and proteome between SEC and WOI organoids are not significant. This is understandable because WOI organoids are further induced towards the implantation window based on the secretory phase (i.e. SEC organoids), which prompted us to continue exploring at the single-cell level.

      - Line 190: Quantification of pinopodes is required to claim that they are more densely arranged in WOI organoids. 

      - Line 190-191: Again, is there a difference in pinopode presence between the WOI and SEC organoids to show that the WOI organoids are really distinct and a better model?

      We agree with the reviewer’s suggestion and quantified the pinopodes. The number of pinopodes increased from CTRL to SEC to WOI organoids, with WOI organoids having the most abundant pinopodes under the electron microscope. (Figure 2H) (Line184-186)

      - Line 194: Also here, quantification of the glycogen particles is missing.

      We agree with your suggestion. We have quantified the area of glycogen particles under the electron microscope in the CTRL, SEC and WOI organoids. WOI organoids had the most glycogen particles. (Figure 2H) (Line184-186)

      - 3C: There is no difference between SEC and WOI organoids condition for OLFM4 and PRA/B. What is the purpose then of adding extra hormones if no difference is present?

      Figure 3C indicates that there was no significant difference in OLFM4 and PRA/B levels (reflecting estrogen and progesterone responsiveness) between SEC and WOI organoids at the whole-organoid level. This is understandable because WOI organoids are induced further into the implantation window on the basis of the secretory phase (i.e., SEC organoids), and both are similar at the whole-organoid level. Based on this, we further explored the differences between WOI organoids and SEC organoids at the single-cell level.

      - 3G: A higher magnification is necessary to evaluate cilia staining. From these images, it seems like CTRL organoids also express acetyl-a-tubulin.

      Thanks for your suggestion. The figure has been enlarged and is shown below. The acetyl-α-tubulin staining of WOI organoids differs from that of CTRL organoids in morphology and expression level. The glands of WOI organoids have small green tips (expressing acetyl-α-tubulin) protruding toward the lumen. WOI organoids expressed a higher level of acetyl-α-tubulin than CTRL organoids. (Now replaced with Figure 3G in the revised draft).

      Figure 4: Structural cells construct WOI with functionally dynamic changes

      - Line 211: To which figure are these claims referring to?

      We believe you are referring to the sentence “In terms of energy metabolism, the WOI organoids exhibited upregulation of monocarboxylic acid and lipid metabolism, and hypoxia response”. Up-regulation of monocarboxylic acid and lipid metabolism in WOI organoids is shown in Figure 3B, and up-regulation of hypoxia responses is shown in Figure S3F.

      - In general, it should be stated in the text that CellPhoneDB is a useful tool to investigate ligand-receptor interactions, however, it only proposes potential interactions. To validate such interactions, stainings and functional assays are required.

      Thanks for your suggestion. The CellphoneDB was briefly introduced in the "Methods" section of "Supporting information" originally. Now it has been explained in the line 256-257 of main text.

      We agree that staining and functional assays are required to validate the ligand-receptor interactions. Therefore, we used the proximity ligation assay (PLA) to verify the trend of interaction. (Figure S2J, Line259-261, Line 277-279, Line 285-288)
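      To make concrete why an expression-based tool of this kind can only propose candidate interactions, the sketch below follows the general logic of a permutation-based ligand-receptor score: the score is the mean of the average ligand expression in one cluster and the average receptor expression in another, and its significance is judged against scores obtained after shuffling the cluster labels. This is a simplified illustration under stated assumptions, not the CellPhoneDB implementation; the function names, the cluster labels, and the expression values are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def interaction_score(ligand, receptor, labels, sender, receiver):
    """Mean of average ligand expression in `sender` and receptor expression in `receiver`."""
    lig = [x for x, lab in zip(ligand, labels) if lab == sender]
    rec = [x for x, lab in zip(receptor, labels) if lab == receiver]
    if not lig or not rec:
        return 0.0
    return (mean(lig) + mean(rec)) / 2


def permutation_p_value(ligand, receptor, labels, sender, receiver,
                        n_perm=1000, seed=0):
    """Fraction of label shufflings whose score is at least the observed one."""
    rng = random.Random(seed)
    observed = interaction_score(ligand, receptor, labels, sender, receiver)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between cells and clusters
        if interaction_score(ligand, receptor, shuffled, sender, receiver) >= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical expression values for ten cells (illustration only).
labels = ["epithelium"] * 5 + ["stroma"] * 5
ligand = [5, 6, 5, 6, 5, 0, 0, 1, 0, 0]    # ligand enriched in epithelium
receptor = [0, 0, 0, 1, 0, 4, 5, 4, 5, 4]  # receptor enriched in stroma
observed, p_value = permutation_p_value(ligand, receptor, labels,
                                        "epithelium", "stroma")
print(observed, p_value)
```

      A low p-value here only says that the ligand and receptor are co-enriched in the two clusters; it carries no information about physical proximity or binding, which is why an orthogonal assay such as PLA is needed.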

      - Line 243: Please describe the process of EMT in the endometrium more specifically.

      EMT is a common and crucial biological event in the endometrium during the implantation window. During the EMT process, epithelial cells lose their epithelial characteristics while gaining migratory and invasive properties of fibroblasts.

      During the attachment and adhesion phases of embryo implantation, interactions mediated by trophoblastic factors (e.g. integrins) and maternal ECM factors (e.g. fibronectin) induce the eventual EMT in the trophectoderm. During the peri-implantation period, microRNAs that regulate EMT (e.g. miR-429 and miR-126a-3p) are expressed in the maternal luminal epithelium to different degrees, mediating its transformation as the blastocyst invades the maternal decidua. The endometrial epithelium transforms into epithelioid stromal cells with increased migratory and invasive capacities through the EMT process. The decidual stromal cells migrate away from the implantation site, having acquired increased motility. (Line 265-267)

      - Lines 247-251 and 313-316: the claim that proliferative epithelium transforms into EMT derived stromal cells by pseudotime trajectory is too bold and must be underpinned by other means. Pseudotime analysis only suggests and is by definition biased since the first/originating population must be defined by the operator.

      In addition to pseudotime analysis based on Monocle, RNA rate (velocity) analysis based on scVelo was used to analyze cell evolution. The two analyses corroborate each other if both indicate a transformation from proliferative epithelium to EMT-derived stromal cells. RNA rate analysis determines the direction of differentiation automatically, which can serve as evidence for choosing the starting point of the pseudotime analysis.

      RNA rate analysis showed that the EMT-derived stromal cells were most closely connected to the proliferative epithelium. In addition, the pseudotime plot inferred the proliferative epithelium as the root population. Together, the two analyses support the transformation from proliferative epithelium to EMT-derived stromal cells.

      Author response image 7.

      RNA rate junction diagram (To infer intercellular connectivity)

      Author response image 8.

      Time differentiation of cells

      Discussion

      - Line 300-302: It would be interesting to investigate ATP production and IL8 release in the WOI organoids to validate with findings from in vivo.

      To address this point, we examined ATP production and IL8 release. WOI organoids indeed produced much more ATP and IL8 than CTRL and SEC organoids (Figure S3L) (Line 323-324).

      - Line 313-316: Do the WOI organoids lose polarity and cell-to-cell junctions?

      Transcriptome sequencing revealed downregulation of cell adhesion and RHO GTPase signaling in WOI organoids (Figure 3B). Electron microscopy revealed that the cellular arrangement of WOI organoids was slightly looser than that of CTRL organoids, but the microvilli were still oriented toward the medial side of the glands and did not undergo polarity reversal (shown below).

      Author response image 9.

      Electron micrograph of the CTRL (left), and WOI (right) endometrial organoid. Scale bar = 5 μm.  

      - Line 322: Where is the data that shows that 'a decreased abundance of immune cells', is observed?  

      A decreased abundance of immune cells was observed through single-cell transcriptome sequencing and flow cytometry. The number of immune cells was reduced in WOI organoids compared to CTRL organoids in single-cell sequencing results (Figure 4A). Besides, flow cytometry also showed that the percentage of WBCs in WOI organoids was lower than that in CTRL organoids (Figure S2F).  

      - Line 324: Elaborate more on how the immune cell composition differs from the endometrium.

      The differences in immune cell composition between organoids and endometrium were mainly reflected in the proportion of WBCs, the proportions of immune cell subtypes, and the changes in T cells after entering the implantation window.

      Firstly, the proportion of WBCs in organoids was lower than that in endometrium. Flow cytometry showed that the proportion of WBCs in organoids was about 3%~4% (Figure 2D), whereas the proportion of WBCs in endometrium was about 8% (W. Wang et al., Nat Med 2020). Secondly, the proportions of T cells and macrophages in organoids were about 2%~3% and 1% (Figure 2D), respectively, whereas the proportions of lymphocytes and macrophages in endometrium were 7%~8% and 0.6%~0.7% (W. Wang et al., Nat Med 2020). Finally, after entering the implantation window, T cells in WOI organoids decreased (Figure S2F), while T cells in endometrium increased (W. Wang et al., Nat Med 2020). These three aspects differ between the in vivo and in vitro settings. (Line 347-353)
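The proportions quoted above can be put side by side in a short script. This is only an illustrative sketch: it takes the midpoint of each quoted range, which is our simplification rather than anything done in the study, and it compares organoid T cells with endometrial lymphocytes because those are the categories the response itself pairs. The names `proportions`, `midpoint` and `fold_difference` are hypothetical.

```python
# Percentages quoted in the response, stored as (low, high) ranges.
# Single values are stored as a degenerate range.
proportions = {
    "WBC": {"organoid": (3.0, 4.0), "endometrium": (8.0, 8.0)},
    "T cells / lymphocytes": {"organoid": (2.0, 3.0), "endometrium": (7.0, 8.0)},
    "Macrophages": {"organoid": (1.0, 1.0), "endometrium": (0.6, 0.7)},
}


def midpoint(value_range):
    """Return the midpoint of a (low, high) percentage range."""
    low, high = value_range
    return (low + high) / 2


def fold_difference(entry):
    """Ratio of the endometrium midpoint to the organoid midpoint."""
    return midpoint(entry["endometrium"]) / midpoint(entry["organoid"])


for name, entry in proportions.items():
    print(f"{name}: endometrium/organoid = {fold_difference(entry):.2f}")
```

A ratio above 1 means the cell type is more abundant in endometrium than in organoids; on these midpoints only macrophages fall below 1.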

      Material and Methods

      -  What are the concentrations of all medium components?

      Thank you for the suggestion. The concentrations of all medium components are now detailed in Table S1.

      -  Authors mention 10x while Smartseq2 is mentioned in Dataset S7?

      Thanks for your careful review. Single cell transcriptome sequencing in this study was done using 10X Genomics. Smartseq2 was used to sequence the transcriptome of a gland and its surrounding cells, which can be regarded as small bulk RNA sequencing. A small number of cells are utilized in Smartseq2 to construct a full-length mRNA library with enhanced transcript sequencing coverage, making it particularly well-suited for small-scale samples such as organoids.

      The data in Dataset S7 are acquired from small bulk RNA-seq with Smartseq2.  

      Reviewer #2 (Recommendations For The Authors):

      Q1: The theoretical choice of extra reagents added to the WOI organoids culture (PRL, hCG, and hPL) is theoretically justified, but not experimentally. On what previous studies, or performed experiments, are the choice of conditions used based?

      When selecting hormone formulations, we compared multiple groups. The number, area, and average intensity of organoids in these groups were similar over time. However, compared with the other hormone formulations, the WOI organoids showed a gene expression profile related to endometrial receptivity: genes positively correlated with receptivity were highly expressed, and genes negatively correlated with receptivity were lowly expressed (added to Figure S1E, S1F). Hormone dosage was primarily based on peri-pregnant maternal body or localized endometrium levels (Margherita Y. Turco et al., Nature Cell Biology 2017).

      Q2: Text in line 111 indicates that "stromal cells formed an extensive network", but vimentin fluorescence is not present on any image surrounding organoids in that figure. This assertion could only be supported by the subsequent results in Figure 2B. In addition, it is not indicated what kind of organoids have been used for these experiments

      The stromal cells were arranged around the glands in the 3D structure (as shown in Figure 1C and Figure 2B), for which bright-field high-magnification photography, clearing staining of the organoids, and light microscopy imaging were used, respectively. However, immunostaining of sections involves many steps of fixation, embedding, staining and elution, which make it difficult to preserve the arrangement and morphology of the stromal cells in the slice, so the stromal cells were not intentionally captured in the other images.

      Figure 1C and Figure 2B are both CTRL organoids, which are now noted in the corresponding figure legend section.  

      Q3: It is not clear how glycogen secretion into the lumen is assessed in Figure 1D.

      Glycogen from the subnuclear region of the glandular cells gradually reaches the top of the cells, i.e., the supranuclear region, and is discharged into the glandular lumen as parietal plasma secretion. Glycogen-containing eosinophilic secretion can be seen in the glandular lumen in Figure 1D.

      Q4: Assertions about differences in proliferation between groups are purely subjective; some kind of measurement and analysis would be necessary to be sure that there is differential proliferation based on Figure 1B.

      We are extremely grateful to you for pointing out this problem. We quantitatively analyzed the size of organoids in the three groups. The area increased over time in all three groups, with the CTRL group growing most vigorously, followed by the SEC group and then the WOI group, but the differences were not statistically significant. The relevant results have been added to Figure S1E (Line 130-131).

      Q5: For progesterone receptor expression analysis organoids are cultured for fourteen days. What is the basis for this change in culture time? 

      The choice of time point here is based on the 14-day secretory phase of the female menstrual cycle, when stimulation of the endometrium by estrogen and progesterone reaches its maximum level.

      Q6: "n" number of individuals analysed through single-cell transcriptomics is not indicated.

      One patient's endometrium was simultaneously constructed into CTRL, SEC and WOI organoids, which were then subjected to single-cell transcriptome sequencing. This is described in the Supporting Information (Line 141-142).

      Q7: Where does the classification of EMT-derived stromal cells come from?

      EMT is a common and crucial biological event in the endometrium during the implantation window. During the EMT process, epithelial cells lose their epithelial characteristics while gaining migratory and invasive properties of fibroblasts.

      This cluster of cells expresses both epithelial markers CDH1 and EPCAM, and specifically expresses high levels of the EMT-related stromal cell markers AURKB, HJURP and UBE2C. During endometrial EMT, AURKB upregulates MMP2, VEGFA/Akt/mTOR and Wnt/β-catenin/Myc pathways to induce EMT (Zhen Wang et al., Cancer Manag Res 2020). HJURP also activates Wnt/β-catenin signaling to promote EMT (Y Wei et al., Eur Rev Med Pharmacol Sci 2019, Tianchi Chen et al., Int J Biol Sci 2019). UBE2C is upregulated by estrogen to promote EMT (Yan Liu et al., Mol Cancer Res 2020). Therefore, this cluster was defined as "EMT-derived stromal cells".

      Q8: In the endometrial receptivity test (ERT), endometrium sample data matches with prereceptive endometrium and WOI organoids data matches with a receptive endometrium, but why is there no information about CTRL and SEC organoids?

      We performed ERT on these samples at a time when our hospital had a cooperative project with Yikon Genomics (Jiangsu, China). However, only endometrium and WOI organoids were sent for testing because of limited quotas. Given the end of the cooperation and the risk of batch effects, no further CTRL and SEC organoids were tested. Moreover, the current ERT is a machine learning model based on sequencing data from endometrium samples, and there are still differences in cellular composition between endometrial organoids and endometrium. Thus, the ERT results need to be interpreted in conjunction with other results.

      Q9: When analysing the transcriptome and proteome, some comparisons are made between WOI vs CTRL and SEC, or just WOI vs CTRL. It would be interesting to have all the comparisons since the power of WOI organoids lies in their differences with SEC organoids.

      Thanks for your suggestion. At the organoid level, the differences in transcriptome and proteome between SEC and WOI organoids are not significant. This is understandable because WOI organoids are further induced towards the implantation window based on the secretory phase (i.e. SEC organoids), which prompted us to continue exploring at the single-cell level.

      Q10: Electron microscopy comparisons with respect to pinopods, cilia, and microvilli are only performed between WOI and CTRL. It would be interesting to check it with SEC.

      We have now quantitatively compared the presence of characteristic structures such as microvilli, cilia, pinopodes and glycogen in the CTRL, SEC and WOI organoids. WOI organoids had longer microvilli and increased cilia, glycogen, and pinopodes (Figure 2H).

      Q11: Line 190 states that pinopods are arranged more densely in WOI organoids than in CTRL organoids. Seems to be a subjective observation. Is there an objective method to quantify this?

      We agree with the reviewer’s suggestion and quantified the pinopodes. The number of pinopodes increased from CTRL to SEC to WOI organoids, with WOI organoids having the most abundant pinopodes (Figure 2H) (Line 184-186).

      Q12: Some characteristics are very similar between WOI and SEC organoids (such as the accumulation of secretory epithelium or decreased proliferative epithelium, the increased ciliated epithelium after hormonal treatment, or the presence of EMT-derived stromal cells). The authors should complement the discussion by objectively justifying the use of WOI versus SEC organoids. Would they be useful in more specific cases or at a general level when studying implantation?

      Thanks for your comments. WOI organoids are differentiated from SEC organoids towards the implantation window. Therefore, WOI organoids are suitable for studying peri-implantation physiological changes or exploring pathological mechanisms. SEC organoids can be used when studying only a range of pathological problems such as endometrial secretory phase changes or hormone reactivity. (Line 365-368)

      Q13: ExM media is described in Table S1, but it does not include the concentration of the different reagents in the culture medium, which is the most interesting data about the ExM medium.

      Thank you for the suggestion. The concentrations of all medium components are now detailed in Table S1.

      Q14: It is not specified which organoid pass is used in each experiment. Is it always the same pass?

      Our experiments were conducted using P1~P3 generation endometrial organoids, as specified in the “Supporting Information” Line 54~55.

      Q15: As a protocol for freezing organoids is included in materials and methods, do the authors use freshly cultured organoids or do they cryopreserve them and thaw them for culturing?

      Thanks for your question. We used freshly cultured organoids in the manuscript. We listed the freezing protocol to illustrate that the constructed organoids can be frozen and recovered for special experimental needs and the establishment of sample banks.

      Q16: The most important point: Neither of the two studies that developed human endometrial organoids from tissue biopsies (Boretto et al. 2017 and Turco et al. 2017), observed stromal cell growth in culture. They disappeared between the first and second pass (as indicated by Turco et al. 2017). How do the authors justify the presence of stromal cells in their organoid culture if they rely on the protocols previously described by these research groups? If it is the case that they can only use the initial pass (freshly planted cells from endometrium), it does not make sense to include the freezing of the different passes in materials and methods, since the expansion capacity of the culture would be lost, which implies a major limitation of the model.

      Thanks for your question.  

      (1) We did not follow the protocols of these research groups exactly. To maximize the recovery of both epithelial and stromal cells, we optimized key steps such as tissue digestion and cell strainer filtration. We shortened the digestion time to 20 minutes to protect cells from the digestion solution and to retain some cell aggregates, which are beneficial for maintaining cell stemness and preserving stromal and immune cell clusters. A 40 μm filter membrane was used to isolate the endometrial cells, which may capture both epithelial and stromal cells.

      (2) Our experiments were conducted using P1~P3 generation of freshly constructed organoids. However, we also used recovered organoids when fresh endometrial samples were not available due to the COVID-19 epidemic. The organoids (e.g., P0~P5) still grew vigorously after recovery and could continue to be cultured by passaging (shown below).

      The recovered organoids can be used for special experiments and biobank establishment.

      Author response image 10.

      The endometrial organoids of different passages were observed before cryopreservation and after recovery. Scale bar = 200 μm.

      Q17: It is not clear which organoids include Figure S2F. Does it include the three types of organoids or just WOI organoids?

      This circle diagram showed the functions of upregulated genes in the WOI group compared to CTRL group from combined transcriptome and proteome analysis, which has been labeled in the figure legend section.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The Hedgehog (HH) protein family is important for embryonic development and adult tissue maintenance. Deregulation or even temporal imbalances in the activity of one of the main players in the HH field, sonic hedgehog (SHH), can lead to a variety of human diseases, ranging from congenital brain disorders to diverse forms of cancers. SHH activates the GLI family of transcription factors, yet the mechanisms underlying GLI activation remain poorly understood. Modification and activation of one of the main SHH signalling mediators, GLI2, depends on its localization to the tip of the primary cilium. In a previous study the lab had provided evidence that SHH activates GLI2 by stimulating its phosphorylation on conserved sites through Unc-51-like kinase 3 (ULK3) and another ULK family member, STK36 (Han et al., 2019). Recently, another ULK family member, ULK4, was identified as a modulator of the SHH pathway (Mecklenburg et al. 2021). However, the underlying mechanisms by which ULK4 enhances SHH signalling remained unknown. To address this question, the authors employed complex biochemistry-based approaches and localization studies in cell culture to examine the mode of ULK4 activity in the primary cilium in response to SHH. The study by Zhou et al. demonstrates that ULK4, in conjunction with STK36, promotes GLI2 phosphorylation and thereby SHH pathway activation. Further experiments were conducted to investigate how ULK4 interacts with SHH pathway components in the primary cilium. The authors show that ULK4 interacts with a complex formed between STK36 and GLI2 and hypothesize that ULK4 functions as a scaffold to facilitate STK36 and GLI2 interaction and thereby GLI2 phosphorylation by STK36. Furthermore, the authors provide evidence that ULK4 and STK36 co-localize with GLI2 at the ciliary tip of NIH 3T3 cells, and that ULK4 and STK36 depend on each other for their ciliary tip accumulation. 
Overall, the described ULK4-mediated mechanism of SHH pathway modulation is based on detailed and rigorous Co-IP experiments and kinase assays as well as confocal imaging localization studies. The authors used various mutated and wild-type constructs of STK36 and ULK4 to decipher the mechanisms underlying GLI2 phosphorylation at the tip of the primary cilium. These novel results on SHH pathway activation add valuable insight into the complexity of SHH pathway regulation. The data also provide possible new strategies for interfering with SHH signalling which has implications in drug development (e.g., cancer drugs).

      However, it will be necessary to explore additional model systems, besides NIH3T3, HEK293 and MEF cell cultures, to conclude on the universality of the mechanisms described in this study. Ultimately, it needs to be addressed whether ULK4 modulates SHH pathway activity in vivo. Is there evidence that genetic ablation of ULK4 in animal models leads to less efficient SHH pathway induction? It also remains to be resolved how ULK3 and ULK4 act in distinct or common manners to promote SHH signalling. Another remaining question is, whether cell type- and tissue-specific features exist, that play a role in ULK3- versus ULK4-dependent SHH pathway modulation. In particular for the studies on ciliary tip localization of factors, relevant for SHH pathway transduction, a higher temporal resolution will be needed in the future as well as a deeper insight into tissue/ cell type-specific mechanisms. These caveats, mentioned here, don't have to be addressed in new experiments for the revision of this manuscript but could be discussed.

      We agree with the reviewer that it would be important to investigate in the future the in vivo function of Ulk4 in Shh signaling, the relationship between Ulk3 and Ulk4/Stk36, and possible cell type/tissue specificity of these two kinase systems. This will require generating single and double knockout mice and examining Hh-related phenotypes in different tissues and developmental stages. The precise mechanism by which Ulk4 and Stk36 are translocated to the ciliary tip is also an important and unsolved issue. We include several paragraphs in the “discussion” section to address these outstanding questions for future study.

      Reviewer #2 (Public Review):

      The authors provide solid molecular and cellular evidence that ULK4 and STK36 not only interact, but that STK36 is targeted (transported?) to the cilium by ULK4. Their data helps generate a model for ULK4 acting as a scaffold for both STK36 and its substrate, Gli2, which appear to co-localise through mutual binding to ULK4. This makes sense, given the proposed role of most pseudokinases as non-catalytic signaling hubs. There is also an important mechanistic analysis performed, in which ULK4 phosphorylation in an acidic consensus by STK36 is demonstrated using IP'd STK36 or an inactive 'AA' mutant, which suggests this phosphorylation is direct.

      The major strength of the study is the well-executed combination of logical approaches taken, including expression of various deletion and mutation constructs and the careful (but not always quantified in immunoblot) effects of depleting and adding back various components in the context of both STK36 and ULK3, which broadens the potential impact of the work. The biochemical analysis of ULK4 phosphorylation appears to be solid, and the mutational study at a particular pair of phosphorylation sites upstream of an acidic residue (notably T2023) is further strong evidence of a functional interaction between ULK4/STK36. The possibility that ULK4 requires ATP binding for these mechanisms is not approached, though would provide significant insight: for example it would be useful to ask if Lys39 in ULK4 is involved in any of these processes, because this residue is likely important for shaping the ULK4 substrate-binding site as a consequence of ATP binding; this was originally shown in PMID 24107129 and discussed more recently in PMID: 33147475 in the context of the large amount of ULK4 proteomics data released.

      The reviewer raised an interesting question of whether ATP binding to the pseudokinase domain of Ulk4 might be required for its function, i.e., by regulating the interaction with its binding partner. In a recent study (Preuss et al. 2020; PMID: 33147475), the Lys39 critical for ATP binding was converted to Arg (KR mutation); however, unlike in most kinases, where the KR mutation affects ATP binding, the K39R mutation in the Ulk4 pseudokinase did not affect ATP binding, although it slightly increased ADP binding (PMID: 33147475). Another mutation made by Preuss et al. (PMID: 33147475), N239L, affected protein stability, making it impossible to determine whether this mutation affects ATP binding. Therefore, in the absence of a clear approach to perturb ATP binding without affecting the overall structure of Ulk4, it would be challenging to address whether ATP binding regulates the ability of Ulk4 to bind its substrates. Nevertheless, we discuss the possibility that ATP binding might regulate Ulk4/Stk36 interaction and Shh signaling.

      The discussion is excellent, and raises numerous important future work in terms of potential transportation mechanisms of this complex. It also explains why the ULK4 pseudokinase domain is linked to an extended C-terminal region. Does AF2 predict any structural motifs in this region that might support binding to Gli2?

      The extended C-terminal domain of Ulk4 contains Arm/HEAT repeats (protein-protein interacting domain), which are predicted by AF2 to form alpha helices.

      A weakness in the study, which is most evident in Figure 1, where Ulk4 siRNA is performed in the NIH3T3 model (and effects on Shh targets and Gli2 phosphorylation assessed), is that we do not know if ULK4 protein is originally present in these cells in order to actually be depleted. Also, we are not informed if the ULK4 siRNA has an effect on the 'rescue' by HA-ULK4; perhaps the HA-ULK4 plasmid is RNAi resistant, or if not, this explains why phosphorylation of Gli2 never reaches zero? Given the important findings of this study, it would be useful for the authors to comment on this, and perhaps discuss if they have tried to evaluate endogenous levels of ULK4 (and Stk36) in these cells using antibody-based approaches, ideally in the presence and absence of Shh. The authors note early on the large number of binding partners identified for ULK4, and siRNA may unwittingly deplete some other proteins that could also be involved in ULK4 transport/stability in their cellular model.

      Due to the lack of reliable Ulk4 and Stk36 antibodies, we were unable to confirm knockdown efficiency by western blot analysis. Therefore, we relied on measuring Ulk4 and Stk36 mRNA expression by RT-qPCR to estimate the knockdown efficiency (Fig 1- figure supplement 1). We used mouse Ulk4 shRNA to carry out the knockdown experiments in NIH3T3 and MEF cells, while the human version of Ulk4 (hUlk4) was used for the rescue experiments (Fig 1- figure supplement 2; Fig. 8). We have confirmed that the mUlk4 shRNA targeting sequence is not conserved in hUlk4; therefore, the hULK4 construct is RNAi resistant. The rescue experiments strongly argue that the effect of Ulk4 RNAi on Shh signaling is due to loss of endogenous Ulk4. This argument is further strengthened by the observations that mutations that affected Ulk4 and Stk36 ciliary tip localization also affected Shh signaling, such as Gli2 phosphorylation and Ptch1/Gli expression (Fig. 8).
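The response does not state how knockdown efficiency was derived from the RT-qPCR measurements. One widely used convention is the 2^-ΔΔCt method, and the sketch below assumes that method, a single reference gene, and equal amplification efficiency. All Ct values and function names are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Relative target expression (knockdown vs control) by 2^-ddCt.

    Assumes one reference gene and roughly 100% amplification efficiency.
    """
    delta_ct_kd = ct_target_kd - ct_ref_kd        # normalize knockdown sample
    delta_ct_ctrl = ct_target_ctrl - ct_ref_ctrl  # normalize control sample
    delta_delta_ct = delta_ct_kd - delta_ct_ctrl
    return 2 ** (-delta_delta_ct)


def knockdown_efficiency(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Fraction of target mRNA lost relative to the control sample."""
    return 1 - relative_expression(
        ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl
    )


# Hypothetical Ct values: the target crosses threshold two cycles later
# in the knockdown sample, while the reference gene is unchanged.
efficiency = knockdown_efficiency(
    ct_target_kd=26.0, ct_ref_kd=18.0, ct_target_ctrl=24.0, ct_ref_ctrl=18.0
)
print(f"knockdown efficiency: {efficiency:.0%}")
```

With these made-up values, a shift of two cycles corresponds to a four-fold drop in target mRNA.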

      The sequence of ULK4 siRNAs is not included in the materials and methods as far as I can see.

      We have added the mouse Ulk4 RNAi target sequence in the revised version.

      Reviewer #3 (Public Review):

      In this manuscript, Zhou et al. demonstrate that the pseudokinase ULK4 has an important role in Hedgehog signaling by scaffolding the active kinase Stk36 and the transcription factor Gli2, enabling Gli2 to be phosphorylated and activated.

      Through nice biochemistry experiments, they show convincingly that the N-terminal pseudokinase domain of ULK4 binds Stk36 and the C-terminal Heat repeats bind Gli2.

      Lastly, they show that upon Sonic Hedgehog signaling, ULK4 localizes to the cilia and is needed to localize Stk36 and Gli2 for proper activation.

      This manuscript is very solid and methodically shows the role of ULK4 and STK36 throughout the whole paper, with well controlled experiments. The phosphomimetic and incapable mutations are very convincing as well. I think this manuscript is strong and stands as is, and there is no need for additional experiments.

      Overall, the strengths are the rigor of the methods, and the convincing case they bring for the formation of the ULK4-Gli2-Stk36 complex. There are no weaknesses noted. I think a little additional context for what is being observed in the immunofluorescence might benefit readers who are not familiar with these cell types and structures.

      We thank this reviewer for the positive comments.

      Recommendations For the Authors

      Reviewer #1 (Recommendations For The Authors):

      This elegant study has been thoroughly and thoughtfully designed and the dataset is solid. The biochemistry results are overall very convincing. Some data lack quantification and there needs to be more information on data analyses and statistics. The following suggestions and comments aim at strengthening the manuscript.

      1. Please provide quantification normalized to input for IP experiments (Figures 1 E - F; Figure 8 C). More information on data analyses and statistics should be provided and included as information in the figure legends.

      Thanks for the suggestions. We have performed the quantification and statistical analyses for Figures 1E-G and Figure 8C as requested.
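Normalizing an IP signal to input, as requested above, usually amounts to dividing band intensities and then expressing each condition relative to a control. The sketch below illustrates that arithmetic only. The intensities, condition labels and function names are hypothetical, and it is not the authors' analysis pipeline.

```python
def normalize_to_input(ip_intensity, input_intensity):
    """Return the IP band intensity divided by the matching input band."""
    if input_intensity <= 0:
        raise ValueError("input intensity must be positive")
    return ip_intensity / input_intensity


def relative_to_control(samples, control):
    """Express each input-normalized signal relative to the control sample.

    `samples` maps a condition name to (ip_intensity, input_intensity).
    """
    control_ratio = normalize_to_input(*samples[control])
    return {
        name: normalize_to_input(ip, inp) / control_ratio
        for name, (ip, inp) in samples.items()
    }


# Hypothetical densitometry readings in arbitrary units.
samples = {
    "control": (1200.0, 4000.0),
    "knockdown": (300.0, 4000.0),
    "knockdown + rescue": (1000.0, 4000.0),
}
ratios = relative_to_control(samples, "control")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

Repeating this per independent experiment gives the replicate values on which a statistical test can then be run.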

      1. Did the authors investigate whether overexpressing hULK4 in the control NIH3T3 cells leads to an increase in pS230/232 (related to Figure 1E)? This would nicely support the notion of a promoting effect of ULK4 on GLI2 phosphorylation.

      We did not. We speculated that overexpressing hULK4 may not significantly promote GLI2 phosphorylation because Ulk4 is a pseudokinase and endogenous Stk36 (the kinase partner of Ulk4) is limited.

      1. The CO-IP experiments to show GLI2 activation were performed in NIH3T3 cells, whereas HEK293 cells were used for the experiments shown in Figure 2. Is there a specific reason for switching between cell lines also for experiments shown in Figures 3 C- I? Did the authors repeat some of the key experiments in both cell lines?

      In mammalian cells, Shh-induced activation of GLI2 depends on primary cilia (Han et al., 2019). NIH3T3 cells form primary cilia but HEK293T cells do not. Therefore, we used NIH3T3 cells to examine processes regulated by Shh treatment (e.g., the Shh-induced phosphorylation of GLI2 and STK36). HEK293 cells were used to map the binding domains between ULK4 and STK36/GLI2/SUFU because of their high transfection efficiency.

      1. In Figure 2 D-E the authors nicely showed that hUlk4N-HA interacted with CFP-Stk36 but not with Myc-Gli2/Fg-Sufu whereas hUlk4C-HA formed a complex with Myc-Gli2/Fg-Sufu but not with CFP-Stk36. In Figure 4E the authors showed in their Co-IP experiments that Fg-Stk36 and Myc-Gli2 form a complex independent of SHH treatment. Did the authors see some pull down of Stk36, still in complex with Gli2, using hUlk4C IP and pull down of Gli2, still in complex with Stk36, using hUlk4N IP?

      We did not test that. As we have shown in Figures 4A and 4E, knockdown of endogenous ULK4 nearly abolished the interaction between Myc-Gli2 and Fg-Stk36, suggesting that Ulk4 is the major scaffold that brings Stk36 and Gli2 together, and that there is little if any direct interaction between Gli2 and Stk36.

      1. Another method to verify hULK4-Stk36-Gli2 complex formation (Figure 4) would be helpful. For example, proximity ligation assays, tripartite split GFP assays, or colocalization based on expansion STED immunofluorescence microscopy could be performed to temporally and spatially resolve localization of Ulk4, Stk36 and Gli2 upon SHH stimulation in the primary cilium

      Thanks for the suggestions. We think that our current study, using biochemical and cell biology approaches, has provided sufficient evidence that Ulk4, Stk36 and Gli2 form complexes. We will keep those more sophisticated methods in mind for our future endeavors.

      1. Please provide more representative images of Ulk4, Stk36 and Gli2 localization in NIH3T3 cells or lower magnification overview images showing more than one cell (Figure 5).

      We have provided more representative images in Figure 5- figure supplement 1A-F of the revised manuscript.

      1. Confirmation of the results shown in Figure 5 in a second cell line would strengthen the data.

      We have confirmed the results in MEFs (see Figure 5- figure supplement 1G-J).

      1. Did the authors add immunofluorescence for tubulin as a ciliary base marker to ensure correct assignment of ciliary tip versus ciliary base localization for quantification experiments (Figures 5 - 8)?

      It has been well documented that Gli2 accumulates at the ciliary tip in response to Shh treatment; therefore, we used Gli2 as a marker for the ciliary tip, where both Ulk4 and Stk36 also accumulated. γ tubulin staining could be another marker to distinguish the ciliary tip from the base; however, the antibody combination we have did not allow us to simultaneously stain γ tubulin and acetylated tubulin (Ac-Tub).

      1. SMO localization as a further readout of SHH pathway activation might be considered to be added for some of the key results (e.g., Figure 6). Is SMO trafficking affected after depletion or overexpression of ULK4?

      Due to the lack of a workable antibody to detect endogenous Smo in our hands, we did not determine whether the trafficking of SMO is affected after depletion or overexpression of ULK4. However, a recent study reported that SHH-induced ciliary SMO accumulation was impaired in Ulk4 siRNA treated cells (Mecklenburg et al. 2021). We include this information and its implications in the discussion section.

      1. Do the authors see ULK4 only at the ciliary tip after SHH stimulation or is there also a dynamic time-dependent localization along the ciliary shaft? The image in Figure 6E (dKO + Stk36 WT) seems to show ULK4 also in the shaft.

      Unlike Smo, which is evenly distributed along the axoneme of primary cilia, ULK4 mainly accumulates at ciliary tips upon Shh stimulation. Ulk4 is also located at low levels outside the cilia and sometimes in the ciliary shaft during its transit to the ciliary tip (e.g., see Figure 5- figure supplement 1F1-2; J1-2).

      1. Is the immunofluorescence signal for Ulk4 significantly reduced after shRNA treatment to deplete Ulk4 (Figure 6A)?

      We constructed a cell line that stably expressed ULK4 shRNA. The knockdown efficiency was determined by measuring Ulk4 mRNA expression (Fig 1_figure supplement 1). Because we were unable to obtain a reliable ULK4 antibody for immunostaining, we did not examine whether the ULK4 signal was depleted by Ulk4 shRNA.

      1. The labelled ciliary tip resembles in some cases images seen for ciliary abscission. The authors could use membrane/ciliary membrane markers to ensure "intraciliary" localization of the investigated factors.

      Thanks for the suggestion. We will try that in our future experiments.

      1. How many replicates were used in the three independent quantitative RT-PCR experiments (Figure 1 A-D)?

      We used 3 replicates in each independent quantitative RT-PCR assay.

      1. Please provide p values or statement on no significance for the comparison between Ulk3 single and Ulk3/Ulk4 double knockdown (Figure 1C) and between Stk36 single and Stk36/Ulk4 double knockdown (Figure 1D; Fig1_Figure Supplement 2).

      Thanks for the suggestion. We have added the p values or “ns” as requested.

      1. Figure legends in general are a bit short could have some more detailed information.

      Thank you for the suggestion. We have revised the figure legends as requested.

      1. What do the asterisks present in Figure 4 C-D?

      Thanks for the suggestion. The asterisks in Figure 4C-D indicate the full-length STK36 and the truncated STK36N and STK36C fragments. We have included this information in the figure legend.

      1. The authors state that a previous study described ULK4 as a genetic modifier for holoprosencephaly and that this raised the possibility that ULK4 may participate in HH signal transduction. Primary ciliary localization of ULK4 in mouse neuronal tissue and SHH pathway modulation by ULK4 in cell culture have been shown by Mecklenburg et al. 2021 before. Maybe the authors could rephrase their introduction and discussion accordingly.

      Thanks for the suggestion. We have changed the introduction and discussion accordingly.

      1. Overexpression studies in heterologous systems using tagged proteins can potentially have an influence on their subcellular localization and function. Please discuss this caveat.

      We have mentioned this caveat in the “discussion” section of the revised manuscript. However, we have tried to express the transgene at low levels using the lentiviral vector containing a weak promoter to ensure that the exogenously expressed proteins are still regulated by Hh signaling. We have also confirmed that the tagged Ulk4 and Stk36 can rescue the loss of endogenous genes.

      1. More details in the Methods section should be provided on the SHH induction in NIH3T3 cells, HEK293 cells and MEFs.

      We have revised the methods section on Shh induction.

      1. ULK4 is known to have at least three isoforms that exhibit varying abundance across developmental stages in mice and humans (Lang et al., 2014) (DOI:10.1242/jcs.137604). Can the authors speculate on potential common and distinct functions of the different ULK4 isoforms on SHH pathway modulation based on their present results?

      It is interesting that Ulk4 has multiple isoforms in both mouse and human. Several short isoforms in both mouse and human lack the pseudokinase domain while one short isoform in mouse lacks the C-terminal region essential for Ulk4 ciliary tip localization. We speculate that the C-terminally deleted isoform may not have a function in the Shh pathway based on our results shown in Fig. 7 and 8 but might still have functions in other cellular processes.

      Reviewer #2 (Recommendations For The Authors):

      The paper is well written, and clear throughout, with excellent (up-to-date) citations to the field.

      We thank reviewer #2 for the positive comments.

      Reviewer #3 (Recommendations For The Authors):

      My only quibble is that the immunofluorescence images are a little confusing, especially to people outside of the field. Please include an image of the whole field and improve the captions. Is that a single cell for each cilia? Why are there so few cilia? The DAPI makes it seem like What are we looking at? Are those multiple nuclei in Figure 6? They seem a little small if that's the 5 uM scale bar

      We provide uncropped images of Figure 5E to show the entire cells (below). We have added some context to improve the captions. Most mammalian cells, such as MEF and NIH3T3 cells, contain a single primary cilium; however, multiciliated cells do exist. The DAPI staining indicates the nuclei. The cells shown in Figure 6 have a single nucleus (the scale bar should be 2 µm). Due to the unevenness of DAPI signals in the nuclei, only the strong signals (puncta) are shown for individual nuclei.

      Author response image 1.

      One small typo: GLL2 instead of GLI2 on line 363

      Thanks, we have corrected this spelling mistake.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) Two genes from the Crp/cAMP complex (crp and cyaA) are hypothesized to be key for persistence but key metabolomics and proteomics data are obtained from only one deletion mutant in the crp gene.

      We thank the reviewer for their thoughtful assessment of our manuscript and for providing valuable comments.

      In our study, we have demonstrated that deletion of either cyaA or crp results in the same persistence phenotype. In a previous study, we screened knockout strains of global transcriptional regulators using the aminoglycoside (AG) potentiation assay and found that, across a panel of carbon sources, AG potentiation occurred in tolerant cells derived from most knockout strains, except for ΔcyaA and Δcrp (Mok et al., 2015). This indicated that both genes are critical components of the Crp/cAMP regulatory network in persistence. Because cAMP exerts its effects when bound to its receptor protein Crp, disrupting crp alone should effectively abolish Crp/cAMP complex function (Keseler et al., 2011). Thus, we reasoned that comparing Δcrp to wild-type would be sufficient to capture the key metabolic and proteomic alterations arising from Crp/cAMP perturbation. Given the substantial cost and labor intensity of untargeted metabolomics and proteomics analyses, this experimental design allowed us to extract meaningful insights while maintaining feasibility. Nonetheless, to ensure the robustness of our findings, we have conducted all subsequent validation experiments using both ΔcyaA and Δcrp strains, confirming that the observed metabolic and proteomic changes are consistent across both mutants. We have now provided a concise justification statement in the manuscript (see lines 197-200 in the current manuscript).

      (2) The deletions of cyaA and crp have opposite effects on the concentration of cAMP; a comparison of metabolomics and proteomics data obtained using both mutants might aid in understanding this difference.

      Although this is an interesting outcome, we have already discussed in the manuscript that it is likely due to the feedback regulation of the Crp/cAMP complex on cyaA expression (see Fig. 1 Keseler et al., 2011) (Aiba, 1985; Keseler et al., 2011; Majerfeld et al., 1981). Specifically, perturbation of the Crp/cAMP complex by deleting crp should enhance cyaA promoter (P<sub>cyaA</sub>) activity, leading to increased CyaA protein expression and, consequently, elevated intracellular cAMP levels. To experimentally verify this predicted feedback regulation, we utilized E. coli K-12 MG1655 WT, ΔcyaA, and Δcrp strains harboring the pMSs201 plasmid, which encodes green fluorescent protein (gfp) under the control of the P<sub>cyaA</sub> promoter. This design allowed us to directly assess the effect of Crp/cAMP perturbation on P<sub>cyaA</sub> activity by quantifying gfp expression as a reporter. By comparing the mutant strains to WT, we could determine whether loss of Crp/cAMP function indeed derepresses cyaA expression. As expected, genetic perturbation of Crp/cAMP enhanced P<sub>cyaA</sub> promoter activity, resulting in increased gfp expression (Figure 1-figure supplement 2). This result supports the role of Crp/cAMP in regulating cyaA expression via feedback control. We have now explicitly discussed this rationale in the manuscript and included the corresponding data (see lines 410-418 and Figure 1-figure supplement 2 in the current manuscript).

      (3) Metabolomics, proteomics, and metabolic activity data are obtained at the whole population level rather than at the level of the persister sub-population.

      Performing metabolomic, proteomic, and other assays at the level of the persister subpopulation is inherently challenging in this study and across the persister research field, as it requires isolating a pure persister population. While metabolic inhibitors like rifampin and tetracycline can induce dormancy and antibiotic tolerance in the entire population (Kwan et al., 2013), these treatments generate artificially altered cell states that may not accurately reflect naturally occurring persisters. Fluorescent reporters combined with fluorescence-activated cell sorting (FACS) have been utilized to study persister cells, including in our previous studies (Amato et al., 2013; Orman & Brynildsen, 2013, 2015). However, this approach only enriches for persisters rather than isolating a pure population, as persisters still constitute a small fraction of the sorted cells (Amato et al., 2013; Orman & Brynildsen, 2013, 2015). Despite these limitations, our untargeted metabolomics and proteomics analyses at the whole-population level provide valuable insights into the regulatory mechanisms of the Crp/cAMP complex and its potential role in persister formation. We have rigorously examined the impact of these mechanisms on non-growing cell formation (see Figure 4 in the current manuscript) and persister levels (see Figure 5 in the current manuscript) through flow cytometry and single-gene deletion experiments. We appreciate the reviewer’s comment and have acknowledged and discussed these methodological challenges in our manuscript (see lines 397-406 in the current manuscript).

      Reviewer #2:

      (1) The approaches used here are aimed at the major bacterial population, yet the authors used data reflecting the behavior of the major population to interpret the physiology of persister cells, which comprise less than 1% of that population. How can they pick a needle from the hay without being fooled by spill-over artifacts from the major population? Although it is probably very difficult to isolate and directly assay persister cells, conclusions of the type proposed by the authors cannot be firmly established without such assays. Perhaps introducing cyaA/crp mutations into the best example of persistence, the hipA-7 high-persistence phenotype, may clarify this issue to a certain extent.

      We thank the reviewer for their thoughtful assessment of our manuscript and for providing valuable comments.

      Performing metabolomics and proteomics at the level of the persister subpopulation remains a major challenge in this study and across the persister research field, as it requires isolating a pure persister population. While metabolic inhibitors like rifampin and tetracycline can induce dormancy and antibiotic tolerance in the entire population (Kwan et al., 2013), these treatments generate artificially altered cell states that may not accurately reflect naturally occurring persisters. Similarly, fluorescent reporters combined with fluorescence-activated cell sorting (FACS) have been employed to study persister cells, including in our previous studies (Amato et al., 2013; Orman & Brynildsen, 2013, 2015). However, this approach only results in persister-enriched populations rather than a pure isolate, meaning that persisters still constitute a small fraction of the sorted cells (Amato et al., 2013; Orman & Brynildsen, 2013, 2015). Despite these inherent limitations, our untargeted metabolomics and proteomics analyses at the whole-population level provide valuable insights into the regulatory mechanisms of the Crp/cAMP complex and its potential role in persister formation. Specifically, our data reveal clear indications that Crp/cAMP activity promotes the formation of a non-growing cell subpopulation, while its deletion reduces this effect. We have validated this observation through single-cell analyses (see Figure 4 in the current manuscript). Additionally, our data strongly suggest that energy metabolism plays a critical role in persister cell physiology, and we have rigorously tested this hypothesis using persister assays for single-gene deletions (see Figure 5 in the current manuscript).

      Furthermore, in response to the reviewer’s suggestion, we introduced cyaA and crp deletions into the HipA-7 high-persistence mutant strain. The impact of these deletions in HipA-7 mirrored their effects in the wild-type strain (Figure 1-figure supplement 8), further supporting our conclusions. These data have been provided and discussed in the manuscript (see lines 185-189, and Figure 1-figure supplement 8 in the current manuscript).

      We acknowledge the challenges in directly assaying persister cells, and we have now discussed this in the manuscript (see lines 397-406 in the current manuscript).

      (2) The authors overlooked/omitted a recently published work regarding cyaA and crp (PMID: 35648826). In that work, a deficiency in cyaA or crp confers tolerance to diverse types of lethal stressors, including all lethal antimicrobials tested. How a mutation conferring pan-tolerance to the major bacterial population would lead to a less protective effect with a minor subpopulation? The authors are kind of obligated to discuss such a paradox in the context of their work because that is the most relevant literature for the present work. It is also very interesting if the cyaA/crp deficiency really has an opposing effect on tolerance and persistence. As a note, most of the conclusions from the omics studies of the present work have been reached in that overlooked literature, which addresses mechanisms of tolerance, a major rather than a minor population behavior. That supports comment #1 above. The inability of the authors to observe tolerance phenotype with the cyaA or crp mutant possibly derived from extremely high antimicrobial concentrations used in the study prevents tolerance phenotype from being observed because tolerance is sensitive to antimicrobial concentration while persistence is not.

      (3) The authors overly stressed the effect of cyaA/crp on persister formation but failed to test an alternative explanation of their effect on persister waking up after antimicrobial treatment. If the cyaA/crp-derived persisters are put into deeper sleep during antimicrobial treatment than wildtype-derived persisters, a 16-h recovery growth might have underestimated viable bacteria. This is often the case especially when extremely high concentrations of antimicrobials are used in performing persister assay. Thus, at least a longer incubation time (e.g. 48 and 72h) of agar plates for persister viable count needs to be performed to test such a scenario.

      (4) The rationale for using extremely high drug concentrations to perform persister assay is unclear. There are 2 issues with using extremely high drug concentrations. First, when overly high concentrations are used, drug removal becomes difficult. For example, a two-time wash will not be able to bring drug concentration from > 100 x MIC to below MIC. This is especially problematic with aminoglycoside because drug removal by washing does not work well with this class of compound. Second, overly high concentrations of drug use may make killing so rapidly and severely that may mask the difference from being observed between mutants and the control wild-type strain. In such cases, you would need to kill over a wide range of drug concentrations to find the right window to show a difference. The gentamicin data in the present work is likely the case that needs to be carefully examined. The mutants and the wild-type strain have very different MICs for gentamicin, but a single absolute drug concentration rather than concentrations normalized to MIC was used. This is like to compare a 12-year-old with a 21-year-old to run a 100-meter dash, which is highly inappropriate.

      The reviewer notes that key literature (PMID: 35648826) was overlooked; that study shows that cyaA/crp deficiency confers broad stress tolerance, which appears to contradict the reduction in persister levels reported here. The reviewer also suggests that high drug concentrations may mask tolerance, and recommends longer plate incubation (48–72 h) and drug concentrations normalized to MIC. Because these three comments are interconnected, we address them together.

      We follow a rigorous washing protocol to minimize antibiotic carryover. After treatment, 1 ml of culture is centrifuged at 13,300 RPM (17,000 x g) for 3 minutes, and >950 µl of supernatant is removed without disturbing the pellet. The pellet is resuspended in 950 µl PBS, diluting antibiotics >20-fold. This step is repeated, resulting in a >400-fold cumulative dilution. After the final wash, cells are resuspended in 100 µl PBS, then serially diluted and plated on antibiotic-free agar to ensure consistency and eliminate residual antibiotics. Preliminary experiments are routinely done in our laboratory to confirm the effectiveness of washing procedures. To address concerns that high antibiotic concentrations may mask phenotypic differences—particularly in the gentamicin assay—we conducted additional experiments using MIC-normalized doses (5×, 10×, and the original study concentration) with six wash steps. As shown in Figure 1-figure supplement 6, all concentrations consistently reduced persister levels, supporting our original findings. While 5× MIC ampicillin allowed detection of persisters in mutant strains, their levels remained multiple orders of magnitude lower than in wild-type, maintaining statistical significance. These results, along with updated washing protocols, are now included in the revised manuscript (see lines 176-185 and Figure 1-figure supplement 6 in the current manuscript).
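      The dilution arithmetic in the washing protocol above can be sketched as follows. This is a minimal illustration, not part of the published methods: it assumes ideal mixing, treats the "<50 µl left behind" figure as a worst-case residual volume, and ignores any antibiotic bound to or retained inside the cells. The function names are hypothetical.

```python
def wash_fold(residual_ul: float, added_ul: float) -> float:
    """Fold dilution from one wash: residual liquid is topped up with fresh PBS."""
    if residual_ul <= 0 or added_ul < 0:
        raise ValueError("residual volume must be positive and added volume non-negative")
    return (residual_ul + added_ul) / residual_ul


def cumulative_fold(fold_per_wash: float, n_washes: int) -> float:
    """Cumulative dilution after n identical washes (idealized, no carryover on cells)."""
    return fold_per_wash ** n_washes


def washes_needed(fold_per_wash: float, target_fold: float) -> int:
    """Smallest number of washes whose cumulative dilution exceeds target_fold."""
    if fold_per_wash <= 1:
        raise ValueError("each wash must dilute by more than 1-fold")
    n, total = 0, 1.0
    while total <= target_fold:
        total *= fold_per_wash
        n += 1
    return n


# Worst case from the protocol: 50 µl of supernatant remains, 950 µl PBS is added.
per_wash = wash_fold(residual_ul=50, added_ul=950)
print(per_wash)                      # 20.0
print(cumulative_fold(per_wash, 2))  # 400.0 (two washes)
print(cumulative_fold(per_wash, 6))  # 64000000.0 (six washes)
print(washes_needed(per_wash, 100))  # 2 (washes to dilute a 100x excess below 1x)
```

      Under these idealized assumptions, two washes already give the >400-fold figure quoted above. The sketch does not capture drug that stays associated with the cells, which is why the washing procedure is also checked experimentally.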

      Although we standardize the incubation time of the agar plates for all conditions and strains, most strains form sufficiently large colonies within 16 hours, and longer incubation often leads to large, overlapping colonies that hinder accurate counting. We assure the reviewer that we always leave the plates in the incubator beyond the initial counting period to monitor the emergence of any new colonies. Here, we provide plate images of key strains after antibiotic treatments, demonstrating that extended incubation did not alter CFU levels, as shown in Figure 1-figure supplement 7. We have updated the relevant section in the Materials and Methods to clarify this point and included the plate images in the current manuscript (see lines 181-182 and Figure 1-figure supplement 7 in the current manuscript).

      We acknowledge the significance of the study highlighted by the reviewer (Zeng et al., 2022); however, direct comparisons with our results are challenging due to substantial differences in experimental conditions, antibiotic concentrations, treatment durations, and most importantly, the E. coli strains used. The study of Zeng et al., 2022, utilized strains from the Keio collection, a commercially available E. coli BW25113 mutant library, which may contain unknown background mutations that could influence tolerance phenotypes. While we used the Keio collection for initial screening, we always validate single clean deletions in our lab strain, E. coli MG1655, to ensure robust conclusions. The observed variations in tolerance and persistence between studies can largely be attributed to these methodological differences rather than an inherent paradox. The concentrations of ampicillin (200 µg/mL) and ofloxacin (5 µg/mL) used in our assays are in line with concentrations employed in foundational persister studies (Amato & Brynildsen, 2015; Cui et al., 2016; Hansen et al., 2008; Leszczynska et al., 2013; Lin et al., 2022; Orman & Brynildsen, 2015; Shah et al., 2006). These levels represent >10 × the MIC and are necessary to ensure the elimination of actively growing cells, thus enriching for persister cells that, by definition, survive high bactericidal drug exposure. Our aim is not to model pharmacokinetics per se, but to apply a standardized challenge to distinguish phenotypic persistence. Furthermore, pharmacokinetic and pharmacodynamic clinical data show that antibiotics such as ofloxacin and ampicillin can reach levels far exceeding 10× MIC for extended periods in patients (OFLOXACIN, 2019; Soto et al., 2014).

      To assess how cyaA and crp deletions affect antibiotic responses under conditions similar to those used by Zeng et al. (2022), specifically exponential-phase E. coli BW25113 strains (Keio collection), lower antibiotic concentrations, and short treatments (e.g., 1 hour), we first tested E. coli MG1655 WT, ΔcyaA, and Δcrp strains in late stationary phase using reduced antibiotic concentrations and shorter exposures. Both knockouts showed decreased survival following ampicillin and ofloxacin treatment compared to WT (see Figure 1-figure supplement 6), consistent with our findings in Figure 1 in the manuscript. In exponential phase, the knockout strains exhibited reduced survival after ampicillin treatment but increased survival after ofloxacin treatment relative to WT (see Author response image 2A below), again mirroring the trends in Figure 1. Gentamicin treatment, however, produced variable results in MG1655 knockouts, likely due to the brief 1-hour exposure being insufficient for robust conclusions (Author response image 2A). Notably, when we tested the corresponding Keio knockout strains in the BW25113 background, we observed increased tolerance in exponential-phase cells, reproducing Zeng et al.'s findings under their specific conditions (see Author response image 2B below), although BW25113 and MG1655 exhibited distinct persister phenotypes in exponential phase (Author response image 2A, B). Taken together, these results highlight the sensitivity of antibiotic tolerance and persistence phenotypes to factors such as strain background, antibiotic concentration, and treatment duration. This is now discussed in detail in the revised manuscript, with supporting data provided (see lines 460-476, and Supplement File 6, 7 in the current manuscript).

      Author response image 1.

      Persister levels of E. coli K-12 MG1655 WT, ΔcyaA, and Δcrp strains in late stationary phase. Cells were treated with ampicillin (5× MIC for 4 h), ofloxacin (5× MIC for 2.5 h), and gentamicin (3× MIC for 1 h). Concentrations and treatment durations were selected based on Zeng et al. (2022).

      Author response image 2.

      Persister levels of E. coli K-12 MG1655 (Panel A) and BW25113 (Panel B) WT, ΔcyaA, and Δcrp strains in the exponential growth phase. Cells were treated at mid-exponential phase (OD<sub>600</sub> ~0.25) with ampicillin (5× MIC for 4 h), ofloxacin (5× MIC for 2.5 h), and gentamicin (3× MIC for 1 h). Treatment concentrations and durations were based on conditions described in Zeng et al. (2022).

      Reviewer #3:

      The authors try to draw too many conclusions and it's difficult to identify what their actual findings are. For instance, they do not have any interesting findings with aminoglycosides but include the data and spend a lot of time discussing it, but it is really a distraction. The correlation between the induction of anabolic pathways in the crp mutant in the late stationary phase and the reduction in persisters is potentially very interesting but is buried in the paper with the vast quantities of data, and observations and conclusions that are often not well substantiated.

      We thank the reviewer for their assessment that helped us clarify and strengthen the focus of our manuscript.

      While our study is not focused on aminoglycosides, we believe the related data provide important insights into persister cell physiology. Persisters are traditionally described as metabolically dormant, non-growing cells. However, we consistently observe that aminoglycosides—despite requiring energy-dependent uptake and active protein translation for their activity—can still eliminate persister cells in wild-type E. coli. This finding supports our central hypothesis that persisters may retain a basal level of metabolic activity sufficient to permit aminoglycoside uptake and action during prolonged treatment. We have revised the manuscript to present this point more clearly, ensuring it complements rather than distracts from the main narrative.

      We respectfully emphasize that our conclusions are supported by multiple layers of evidence. Our metabolomics data are corroborated by proteomics and further validated by functional assays, including redox state measurements, growing versus non-growing cell detection, and targeted persister assays. In addition, we performed labor-intensive validations using individually selected Keio mutants treated with antibiotics to quantify persister levels, with key observations further confirmed in single-gene deletions in E. coli MG1655 strains.

      We believe the revisions made in response to all reviewers’ comments have significantly improved the clarity, focus, and overall impact of the manuscript.

      The discussion section is particularly difficult to read and I recommend a large overhaul to increase clarity. For instance, what are the authors trying to conclude in section (iii) of the discussion? That persisters in the stationary phase have higher energy than other cells? Is there data to support that? All sections are similarly lacking in clarity.

      We repeatedly emphasize in the manuscript that while persister survival depends on energy metabolism, this does not imply that persisters have higher metabolic activity than cells in the exponential growth phase. We have clarified this point in the revised manuscript (see lines 67-79, and 442-444 in the current manuscript).

      The large number of mutants characterized is a strength, but the quality of the data provided for those experiments is poor. Did some of these mutants lose fitness in the deep stationary phase in the absence of antibiotics? Did some reach a far lower cfu/ml in the stationary phase? These details are important and without them, it is difficult to interpret the data.

      Although metabolic mutations can affect cell growth, we do not observe substantial differences in cell numbers during the late stationary phase, when persister assays are performed. These knockout strains reach stationary phase fully by that time. We emphasize that we routinely measure cell numbers at this stage using flow cytometry before diluting cultures into fresh media and applying antibiotic treatments. Cell counts for the metabolic mutants are shown in Figure 5-figure supplement 4 in the current manuscript, and no significant growth deficiencies are observed in the late stationary phase. This is consistent with our previous publication (Shiraliyev & Orman, 2023) and findings from Lewis’s group (Manuse et al., 2021), where similar knockout strains showed no drastic impact on growth.

      There is an analysis of persister formation in mutants in the pts/CRP pathway that is not discussed (Zeng et al PNAS 2022, Parsons et al PNAS, 2024).

      These studies are now cited and discussed in the revised manuscript (see lines 459-476).

      The authors do not discuss ROS production and antibiotic killing in these experiments. Presumably, the WT would have a greater propensity to produce ROS in response to antibiotics than the crp mutant, but it survives better. Is ROS not involved in antibiotic killing in these conditions?

      The experimental conditions used here are identical to those in our previously published study on persister cells in the late stationary phase (Orman & Brynildsen, 2015), where we specifically investigated the role of ROS in antibiotic tolerance. In that work, we overexpressed key antioxidant enzymes—catalases (katE, katG) and superoxide dismutases (sodA, sodB and sodC)—at stationary phase. These enzymes were confirmed to be catalytically active through functional assays, yet their overexpression had no measurable effect on persister levels. To further decouple ROS from respiratory activity in that study, we performed anaerobic experiments using nitrate as an alternative terminal electron acceptor. We found that anaerobic respiration actually enhanced persister formation, and inhibition of nitrate reductases using KCN reduced it—again, independent of ROS. These findings provide compelling evidence that it is the respiratory activity itself, rather than ROS production, that influences persister formation in our system.

      We have now included this discussion in the revised manuscript to clarify that ROS are unlikely to be a major factor in antibiotic killing under these conditions (see lines 503-513).

      References

      Aiba, H. (1985). Transcription of the Escherichia coli adenylate cyclase gene is negatively regulated by cAMP-cAMP receptor protein. The Journal of Biological Chemistry, 260(5), 3063–3070.

      Amato, S. M., & Brynildsen, M. P. (2015). Persister Heterogeneity Arising from a Single Metabolic Stress. Current Biology, 25(16), 2090–2098. https://doi.org/10.1016/j.cub.2015.06.034

      Amato, S. M., Orman, M. A., & Brynildsen, M. P. (2013). Metabolic Control of Persister Formation in Escherichia coli. Molecular Cell, 50(4), 475–487. https://doi.org/10.1016/J.MOLCEL.2013.04.002

      Cui, P., Niu, H., Shi, W., Zhang, S., Zhang, H., Margolick, J., Zhang, W., & Zhang, Y. (2016). Disruption of Membrane by Colistin Kills Uropathogenic Escherichia coli Persisters and Enhances Killing of Other Antibiotics. Antimicrobial Agents and Chemotherapy, 60(11), 6867–6871. https://doi.org/10.1128/AAC.01481-16

      Hansen, S., Lewis, K., & Vulić, M. (2008). Role of Global Regulators and Nucleotide Metabolism in Antibiotic Tolerance in Escherichia coli. Antimicrobial Agents and Chemotherapy, 52(8), 2718–2726. https://doi.org/10.1128/AAC.00144-08

      Keseler, I. M., Collado-Vides, J., Santos-Zavaleta, A., Peralta-Gil, M., Gama-Castro, S., Muniz-Rascado, L., Bonavides-Martinez, C., Paley, S., Krummenacker, M., Altman, T., Kaipa, P., Spaulding, A., Pacheco, J., Latendresse, M., Fulcher, C., Sarker, M., Shearer, A. G., Mackie, A., Paulsen, I., … Karp, P. D. (2011). EcoCyc: a comprehensive database of Escherichia coli biology. Nucleic Acids Research, 39(Database), D583–D590. https://doi.org/10.1093/nar/gkq1143

      Kwan, B. W., Valenta, J. A., Benedik, M. J., & Wood, T. K. (2013). Arrested protein synthesis increases persister-like cell formation. Antimicrobial Agents and Chemotherapy, 57(3), 1468–1473. https://doi.org/10.1128/AAC.02135-12

      Leszczynska, D., Matuszewska, E., Kuczynska-Wisnik, D., Furmanek-Blaszk, B., & Laskowska, E. (2013). The Formation of Persister Cells in Stationary-Phase Cultures of Escherichia Coli Is Associated with the Aggregation of Endogenous Proteins. PLoS ONE, 8(1), e54737. https://doi.org/10.1371/journal.pone.0054737

      Lin, J. S., Bekale, L. A., Molchanova, N., Nielsen, J. E., Wright, M., Bacacao, B., Diamond, G., Jenssen, H., Santa Maria, P. L., & Barron, A. E. (2022). Anti-persister and Anti-biofilm Activity of Self-Assembled Antimicrobial Peptoid Ellipsoidal Micelles. ACS Infectious Diseases, 8(9), 1823–1830. https://doi.org/10.1021/acsinfecdis.2c00288

      Majerfeld, I. H., Miller, D., Spitz, E., & Rickenberg, H. V. (1981). Regulation of the synthesis of adenylate cyclase in Escherichia coli by the cAMP — cAMP receptor protein complex. Molecular and General Genetics MGG, 181(4), 470–475. https://doi.org/10.1007/BF00428738

      Manuse, S., Shan, Y., Canas-Duarte, S. J., Bakshi, S., Sun, W.-S., Mori, H., Paulsson, J., & Lewis, K. (2021). Bacterial persisters are a stochastically formed subpopulation of low-energy cells. PLoS Biology, 19(4), e3001194.

      Mok, W. W. K., Orman, M. A., & Brynildsen, M. P. (2015). Impacts of global transcriptional regulators on persister metabolism. Antimicrobial Agents and Chemotherapy, 59(5), 2713–2719.

      OFLOXACIN. (2019). https://dailymed.nlm.nih.gov/dailymed/fda/fdaDrugXsl.cfm?setid=1779c568-d7bb-4bd5-bc29-13bd52ba8a0a&type=display

      Orman, M. A., & Brynildsen, M. P. (2013). Dormancy is not necessary or sufficient for bacterial persistence. Antimicrobial Agents and Chemotherapy, 57(7), 3230–3239.

      Orman, M. A., & Brynildsen, M. P. (2015). Inhibition of stationary phase respiration impairs persister formation in E. coli. Nature Communications, 6(1), 7983.

      Shah, D., Zhang, Z., Khodursky, A. B., Kaldalu, N., Kurg, K., & Lewis, K. (2006). Persisters: a distinct physiological state of E. coli. BMC Microbiology, 6(1), 53. https://doi.org/10.1186/1471-2180-6-53

      Shiraliyev, R. C., & Orman, M. (2023). Metabolic disruption impairs ribosomal protein levels, resulting in enhanced aminoglycoside tolerance. BioRxiv, 2012–2023.

      Soto, E., Shoji, S., Muto, C., Tomono, Y., & Marshall, S. (2014). Population pharmacokinetics of ampicillin and sulbactam in patients with community-acquired pneumonia: evaluation of the impact of renal impairment. British Journal of Clinical Pharmacology, 77(3), 509–521. https://doi.org/10.1111/bcp.12232

      Zeng, J., Hong, Y., Zhao, N., Liu, Q., Zhu, W., Xiao, L., Wang, W., Chen, M., Hong, S., Wu, L., Xue, Y., Wang, D., Niu, J., Drlica, K., & Zhao, X. (2022). A broadly applicable, stress-mediated bacterial death pathway regulated by the phosphotransferase system (PTS) and the cAMP-Crp cascade. Proceedings of the National Academy of Sciences, 119(23). https://doi.org/10.1073/pnas.2118566119

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study introduces and validates the Cyclic Homogeneous Oscillation (CHO) detection method to precisely determine the duration, location, and fundamental frequency of non-sinusoidal neural oscillations. Traditional spectral analysis methods face challenges in distinguishing the fundamental frequency of non-sinusoidal oscillations from their harmonics, leading to potential inaccuracies. The authors implement an underexplored approach, using the auto-correlation structure to identify the characteristic frequency of an oscillation. By combining this strategy with existing time-frequency tools to identify when oscillations occur, the authors strive to solve outstanding challenges involving spurious harmonic peaks detected in time-frequency representations. Empirical tests using electrocorticographic (ECoG) and electroencephalographic (EEG) signals further support the efficacy of CHO in detecting neural oscillations.

      Response:  We thank the reviewer for recognizing the strengths of our method in this encouraging review and for the opportunity to further improve and finalize our manuscript.

      Strengths:

      (1) The paper puts an important emphasis on the 'identity' question of oscillatory identification. The field primarily identifies oscillations through frequency, space (brain region), and time (length, and relative to task or rest). However, more tools that claim to further characterize oscillations by their defining/identifying traits are needed, in addition to data-driven studies about what the identifiable traits of neural oscillations are beyond frequency, location, and time. Such tools are useful for potentially distinguishing between circuit mechanistic generators underlying signals that may not otherwise be distinguished. This paper states this problem well and puts forth a new type of objective for neural signal processing methods.

      Response:  We sincerely appreciate this encouraging summary of the objective of our manuscript.

      (2) The paper uses synthetic data and multimodal recordings at multiple scales to validate the tool, suggesting CHO's robustness and applicability in various real-data scenarios. The figures illustratively demonstrate how CHO works on such synthetic and real examples, depicting in both time and frequency domains. The synthetic data are well-designed, and capable of producing transient oscillatory bursts with non-sinusoidal characteristics within 1/f noise. Using both non-invasive and invasive signals exposes CHO to conditions which may differ in extent and quality of the harmonic signal structure. An interesting followup question is whether the utility demonstrated here holds for MEG signals, as well as source-reconstructed signals from non-invasive recordings.

      Response:  We thank the reviewer for this excellent suggestion.  Indeed, our next paper will focus on applying our CHO method to signals that were source-reconstructed from non-invasive recordings (e.g., MEG and EEG) to extract their periodic activity.

      (3) This study is accompanied by open-source code and data for use by the community.

      Response:  We thank the reviewer for recognizing our effort to widely disseminate our method to the broader community.

      Weaknesses:

      (1) Due to the proliferation of neural signal processing techniques that have been designed to tackle issues such as harmonic activity, transient and event-like oscillations, and non-sinusoidal waveforms, it is naturally difficult for every introduction of a new tool to include exhaustive comparisons of all others. Here, some additional comparisons may be considered for the sake of context, a selection of which follows, biased by the previous exposure of this reviewer. One emerging approach that may be considered is known as state-space models with oscillatory and autoregressive components (Matsuda 2017, Beck 2022). State-space models such as autoregressive models have long been used to estimate the auto-correlation structure of a signal. State-space oscillators have recently been applied to transient oscillations such as sleep spindles (He 2023). Therefore, state-space oscillators extended with auto-regressive components may be able to perform the functions of the present tool through different means by circumventing the need to identify them in time-frequency. Another tool that should be mentioned is called PAPTO (Brady 2022). Although PAPTO does not address harmonics, it detects oscillatory events in the presence of 1/f background activity. Lastly, empirical mode decomposition (EMD) approaches have been studied in the context of neural harmonics and nonsinusoidal activity (Quinn 2021, Fabus 2022). EMD has an intrinsic relationship with extrema finding, in contrast with the present technique. In summary, the existence of methods such as PAPTO shows that researchers are converging on similar approaches to tackle similar problems. The existence of time-domain approaches such as state-space oscillators and EMD indicates that the field of timeseries analysis may yield even more approaches that are conceptually distinct and may theoretically circumvent the methodology of this tool.

      Response:  We thank the reviewer for this valuable insight.  In our manuscript, we acknowledge emerging approaches that employ state-space models or EMD for time-frequency analysis.  However, it's crucial to clarify that the primary focus in our study is on the detection and identification of the fundamental frequency, as well as the onset/offset of non-sinusoidal neural oscillations.  Thus, our emphasis lies specifically on these aspects.  We hope that future studies will use our methods as the basis to develop better methods for time-frequency analysis that will lead to a deeper understanding of harmonic structures.  

      Our Limitation section addresses this issue. Specifically, we recognize that a more sophisticated time-frequency analysis could contribute to improved sensitivity and that the core claim of our study is centered around the concept of increasing specificity in the detection of non-sinusoidal oscillations. We hope that future studies will use this as a basis for improving time-frequency analysis in general. Notably, our open-source code will facilitate these future studies. Specifically, in the first step of our algorithm, the time-frequency estimation can be replaced with any other preferred time-frequency analysis, such as state-space models, EMD, Wavelet transform, Gabor transform, and Matching Pursuit.

      For our own follow-up study, we plan to conduct a thorough review and comparison of emerging approaches employing state-space models or EMD for time-frequency analysis.  In this study, we aim to identify which approach, including the six methods mentioned by the reviewer (Matsuda 2017, Beck 2022, He 2023, Brady 2022, Quinn 2021, and Fabus 2022), can maximize the estimation of the fundamental frequency of non-sinusoidal neural oscillations using CHO.  The insights provided by the reviewer are appreciated, and we will carefully consider these aspects in our follow-up study.  

      In the revision of this manuscript, we are setting the stage for these future studies.  Specifically, we added a discussion paragraph within the Limitation section about the state-space model, and EMD approaches:

      “However, because our CHO method is modular, the FFT-based time-frequency analysis can be replaced with more sophisticated time-frequency estimation methods to improve the sensitivity of neural oscillation detection. Specifically, a state-space model (Matsuda 2017, Beck 2022, He 2023, Brady 2022) or empirical mode decomposition (EMD, Quinn 2021, Fabus 2022) may improve the estimation of the auto-correlation of the harmonic structure underlying non-sinusoidal oscillations. Furthermore, a Gabor transform or matching pursuit-based approach may improve the onset/offset detection of short burst-like neural oscillations (Kus 2013 and Morales 2022).”

      (2) The criteria that the authors use for neural oscillations embody some operating assumptions underlying their characteristics, perhaps informed by immediate use cases intended by the authors (e.g., hippocampal bursts). The extent to which these assumptions hold in all circumstances should be investigated. For instance, the notion of consistent auto-correlation breaks down in scenarios where instantaneous frequency fluctuates significantly at the scale of a few cycles. Imagine an alpha-beta complex without harmonics (Jones 2009). If oscillations change phase position within a timeframe of a few cycles, it would be difficult for a single peak in the auto-correlation structure to elucidate the complex time-varying peak frequency in a dynamic fashion. Likewise, it is unclear whether bounding boxes with a pre-specified overlap can capture complexes that maneuver across peak frequencies.

      Response: We thank the reviewer for this valuable insight into the methodological limitations in the detection of neural oscillations that exhibit significant fluctuations in their instantaneous frequency. Indeed, our CHO method is also limited in its ability to detect oscillations with fluctuating instantaneous frequencies. This is because CHO uses an auto-correlation-based approach to detect neural oscillations that exhibit two or more cycles. If oscillations change phase position within a timeframe of a few cycles, CHO cannot detect the oscillation because the periodicity is not expressed within the auto-correlation. This limitation can be partially overcome by relaxing the detection threshold (see Line 30 of Algorithm 1 in the revised manuscript) for the auto-correlation analysis. However, relaxing the detection threshold also increases the probability of detecting aperiodic activity. To clarify how CHO determines the periodicity of oscillations, and to educate the reader about the tradeoff between detecting oscillations with fluctuating instantaneous frequencies and avoiding the detection of aperiodic activity, we have added pseudo code and a new subsection in the Methods.
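
      The auto-correlation idea described above can be illustrated with a minimal sketch. It is not the CHO implementation: the function names, the biased normalization, and the rule of taking the highest positive peak within the allowed lags are all assumptions made for illustration. The `min_cycles` argument mirrors the two-cycle requirement by only accepting periods that fit at least that many times into the segment.

```python
import numpy as np


def autocorr(x):
    """Biased, normalized autocorrelation of x for lags 0..len(x)-1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[x.size - 1:]
    if ac[0] == 0:  # constant input: no periodicity to measure
        return np.zeros_like(ac)
    return ac / ac[0]


def fundamental_from_autocorr(x, fs, min_cycles=2):
    """Estimate the fundamental frequency (Hz) from the autocorrelation.

    Hypothetical helper: picks the highest positive local maximum at a lag
    short enough that at least `min_cycles` periods fit into the segment.
    Returns None when no such peak exists.
    """
    ac = autocorr(x)
    max_lag = len(ac) // min_cycles
    seg = ac[: max_lag + 1]
    interior = seg[1:-1]
    is_peak = (interior > seg[:-2]) & (interior >= seg[2:]) & (interior > 0)
    lags = np.flatnonzero(is_peak) + 1
    if lags.size == 0:
        return None
    best_lag = lags[np.argmax(seg[lags])]
    return fs / best_lag


# A non-sinusoidal 10 Hz oscillation: fundamental plus two harmonics.
fs = 1000
t = np.arange(fs) / fs
x = (np.sin(2 * np.pi * 10 * t)
     + 0.5 * np.sin(2 * np.pi * 20 * t)
     + 0.25 * np.sin(2 * np.pi * 30 * t))
f0 = fundamental_from_autocorr(x, fs)
```

      In this toy signal the spectrum has peaks at 10, 20 and 30 Hz, whereas the auto-correlation peaks at the 100 ms period of the fundamental, so `f0` is expected to be close to 10 Hz.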

      Author response table 1.

      Algorithm 1

      A new subsection titled “Tradeoffs in adjusting the hyper-parameters that govern the detection in CHO”.

      “The ability of CHO to detect neural oscillations and determine their fundamental frequency is governed by four principal hyper-parameters.  Adjusting these parameters requires understanding their effect on the sensitivity and specificity in the detection of neural oscillations. 

      The first hyper-parameter is the number of time windows (N in Line 5 in Algorithm 1), which is used to estimate the 1/f noise. In our performance assessment of CHO, we used four windows, resulting in estimation periods of 250 ms in duration for each 1/f spectrum. A higher number of time windows results in smaller estimation periods and thus minimizes the likelihood of observing multiple neural oscillations within this time window, which otherwise could confound the 1/f estimation. However, a higher number of time windows and, thus, smaller time estimation periods may lead to unstable 1/f estimates.

      The second hyper-parameter defines the minimum number of cycles of a neural oscillation to be detected by CHO (see Line 23 in Algorithm 1).  In our study, we specified this parameter to be two cycles.  Increasing the number of cycles increases specificity, as it will reject spurious oscillations.  However, increasing the number also reduces sensitivity as it will reject short oscillations.

      The third hyper-parameter is the significance threshold that selects positive peaks within the auto-correlation of the signal.  The magnitude of the peaks in the auto-correlation indicates the periodicity of the oscillations (see Line 26 in Algorithm 1).  Referred to as "NumSTD," this parameter denotes the number of standard errors that a positive peak has to exceed to be selected to be a true oscillation.  For this study, we set the "NumSTD" value to 1.  Increasing the "NumSTD" value increases specificity in the detection as it reduces the detection of spurious peaks in the auto-correlation.  However, increasing the "NumSTD" value also decreases the sensitivity in the detection of neural oscillations with varying instantaneous oscillatory frequencies. 

      The fourth hyper-parameter is the percentage of overlap between two bounding boxes that triggers their merger (see Line 31 in Algorithm 1). In our study, we set this parameter to 75% overlap. Increasing this threshold yields more fragmentation in the detection of oscillations, while decreasing this threshold may reduce the accuracy in determining the onset and offset of neural oscillations.”
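
      The effect of the overlap threshold can be made concrete with a small sketch. This is an illustration under stated assumptions, not the merging rule from Algorithm 1: the `Box` type, the definition of overlap as intersection area divided by the area of the smaller box, and the greedy pairwise merging are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical time-frequency bounding box (seconds, Hz)."""
    t_on: float
    t_off: float
    f_lo: float
    f_hi: float

    @property
    def area(self):
        return max(0.0, self.t_off - self.t_on) * max(0.0, self.f_hi - self.f_lo)


def overlap_fraction(a, b):
    """Intersection area relative to the smaller box (an assumed definition)."""
    dt = min(a.t_off, b.t_off) - max(a.t_on, b.t_on)
    df = min(a.f_hi, b.f_hi) - max(a.f_lo, b.f_lo)
    if dt <= 0 or df <= 0:
        return 0.0
    smaller = min(a.area, b.area)
    return (dt * df) / smaller if smaller > 0 else 0.0


def merge_boxes(boxes, min_overlap=0.75):
    """Greedily merge any pair of boxes whose overlap reaches min_overlap."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                if overlap_fraction(a, b) >= min_overlap:
                    # Replace the pair with the box that encloses both.
                    boxes[i] = Box(min(a.t_on, b.t_on), max(a.t_off, b.t_off),
                                   min(a.f_lo, b.f_lo), max(a.f_hi, b.f_hi))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes


boxes = [Box(0.0, 1.0, 8, 12), Box(0.1, 0.9, 8, 12), Box(0.9, 2.0, 8, 12)]
strict = merge_boxes(boxes, min_overlap=0.75)  # third box stays separate
loose = merge_boxes(boxes, min_overlap=0.05)   # everything collapses into one
```

      With the 75% threshold the nested box is absorbed while the barely touching third box stays separate; with a very low threshold all three collapse into a single long box, which blurs onset and offset in the way the subsection describes.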

      (3) Related to the last item, this method appears to lack implementation of statistical inferential techniques for estimating and interpreting auto-correlation and spectral structure. In standard practice, auto-correlation functions and spectral measures can be subjected to statistical inference to establish confidence intervals, often helping to determine the significance of the estimates. Doing so would be useful for expressing the likelihood that an oscillation and its harmonic has the same autocorrelation structure and fundamental frequency, or more robustly identifying harmonic peaks in the presence of spectral noise. Here, the authors appear to use auto-correlation and time-frequency decomposition more as a deterministic tool rather than an inferential one. Overall, an inferential approach would help differentiate between true effects and those that might spuriously occur due to the nature of the data. Ultimately, a more statistically principled approach might estimate harmonic structure in the presence of noise in a unified manner transmitted throughout the methodological steps.

      Response: We thank the reviewer for sharing this insight on further enhancing our method. Indeed, CHO does not make use of inferential statistics to estimate and interpret the auto-correlation and underlying spectral structure of the neural oscillation. Implementing this approach within CHO would require calculating phase-phase coupling across all cross-frequency bands and bounding boxes. However, as mentioned in the introduction section and Figure 1G-L, phase-phase coupling analysis cannot fully ascertain whether the oscillations are phase-locked and thus are harmonics or, indeed, independent oscillations. This ambiguity, combined with the exorbitant computational complexity of the entailed permutation test and the requirement to perform the analysis across all cross-frequency bands, channels, and trials, makes phase-phase coupling impracticable for determining the fundamental frequency of neural oscillations in real-time and, thus, for use in closed-loop neuromodulation applications. Thus, within our study, we prioritized determining the fundamental frequency without considering the structure of harmonics.

      An inferential approach can be implemented by adjusting the significance threshold that selects positive peaks within the auto-correlation of the signal.  Currently, this threshold is set to represent the approximate confidence bounds of the periodicity of the fundamental frequency.  To clarify this issue, we added additional pseudo code and a new subsection, titled “Tradeoffs in adjusting the hyper-parameters that govern the detection in CHO,” in the Methods section.
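
      As a hedged sketch of how such a threshold might act, the snippet below keeps only positive auto-correlation peaks that exceed a multiple of an approximate standard error. The function name is hypothetical, and the standard error is assumed to be the large-sample white-noise approximation 1/sqrt(n); the manuscript may compute its bound differently.

```python
import numpy as np


def significant_peaks(ac, n_samples, num_std=1.0):
    """Lags of positive local maxima in `ac` that exceed num_std * SE.

    Assumption: SE is approximated by 1/sqrt(n_samples), the usual
    large-sample bound for the autocorrelation of white noise.
    """
    ac = np.asarray(ac, dtype=float)
    threshold = num_std / np.sqrt(n_samples)
    interior = ac[1:-1]
    is_peak = (interior > ac[:-2]) & (interior >= ac[2:]) & (interior > threshold)
    return np.flatnonzero(is_peak) + 1


# Toy autocorrelation: a 100-sample period with linearly decaying envelope.
n = 1000
lags = np.arange(n)
ac = np.cos(2 * np.pi * lags / 100) * (1 - lags / n)

loose = significant_peaks(ac, n, num_std=1.0)
strict = significant_peaks(ac, n, num_std=20.0)
```

      Raising `num_std` leaves fewer accepted peaks (here `strict` is shorter than `loose`), which is the specificity versus sensitivity tradeoff discussed for the "NumSTD" parameter.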

      In future studies, we will investigate the harmonic structure of neural oscillations based on a large data set.  This exploration will help us understand how non-sinusoidal properties may influence the harmonic structure.  Your input is highly appreciated, and we will diligently incorporate these considerations into our research.

      See Author response table 1.

      See also the new subsection titled “Tradeoffs in adjusting the hyper-parameters that govern the detection in CHO”, quoted in full in our response to Weakness (2) above.

      (4) As with any signal processing method, hyperparameters and their ability to be tuned by the user need to be clearly acknowledged, as they impact the robustness and reproducibility of the method. Here, some of the hyperparameters appear to be: a) number of cycles around which to construct bounding boxes and b) overlap percentage of bounding boxes for grouping. Any others should be highlighted by the authors and clearly explained during the course of tool dissemination to the community, ideally in tutorial format through the Github repository.

      Response: We thank the reviewer for this helpful suggestion. In response, we added a new subsection, named “Tradeoffs in adjusting the hyper-parameters that govern the detection in CHO”, that describes the four principal hyper-parameters of CHO. It is quoted in full in our response to Weakness (2) above.

      (5) Most of the validation demonstrations in this paper depict the detection capabilities of CHO. For example, the authors demonstrate how to use this tool to reduce false detection of oscillations made up of harmonic activity and show in simulated examples how CHO performs compared to other methods in detection specificity, sensitivity, and accuracy. However, the detection problem is not the same as the 'identity' problem that the paper originally introduced CHO to solve. That is, detecting a non-sinusoidal oscillation well does not help define or characterize its non-sinusoidal 'fingerprint'. An example problem to set up this question is: if there are multiple oscillations at the same base frequency in a dataset, how can their differing harmonic structure be used to distinguish them from each other? To address this at a minimum, Figure 4 (or a followup to it) should simulate signals at similar levels of detectability with different 'identities' (i.e. different levels and/or manifestations of harmonic structure), and evaluate CHO's potential ability to distinguish or cluster them from each other. Then, does a real-world dataset or neuroscientific problem exist in which a similar sort of exercise can be conducted and validated in some way? If the "what" question is to be sufficiently addressed by this tool, then this type of task should be within the scope of its capabilities, and validation within this scenario should be demonstrated in the paper. This is the most fundamental limitation at the paper's current state.

      Response: Thank you for your insightful suggestion; we truly appreciate it. We recognize that the 'identity' problem requires further studies to develop appropriate methods. Our current approach does not fully address this issue, as it may detect asymmetric non-sinusoidal oscillations with multiple harmonic peaks, without accounting for different shapes of non-sinusoidal oscillations.

      The main reason we could not fully address the “identity” problem results from the general absence of a defined ground truth, i.e., data for which we know the harmonic structure. To overcome this barrier, we would need datasets from well-characterized cognitive tasks or neural disorders.  For example, Cole et al. 2017 showed that the harmonic structure of beta oscillations can explain the degree of Parkinson’s disease, and Hu et al. 2023 showed that the number of harmonic peaks can localize the seizure onset zone. Future studies could use the data from these two studies to study whether CHO can distinguish different harmonic structures of pathological neural oscillations.

      In this paper, we showed the basic identity of neural oscillations, encompassing elements such as the fundamental frequency and onset/offset. Your valuable insights contribute significantly to our ongoing efforts, and we appreciate your thoughtful consideration of these aspects. In response, we added a new paragraph in the Limitation of the discussion section as below:

      “Another limitation of this study is that it does not assess the harmonic structure of neural oscillations. Thus, CHO cannot distinguish between oscillations that have the same fundamental frequency but differ in their non-sinusoidal properties. This limitation stems from the objective of this study, which is to identify the fundamental frequency of non-sinusoidal neural oscillations. Overcoming this limitation requires further studies to improve CHO to distinguish between different non-sinusoidal properties of pathological neural oscillations. The data that is necessary for these further studies could be obtained from the wide range of studies that have linked the harmonic structures in the neural oscillations to various cognitive functions (van Dijk et al., 2010; Schalk, 2015; Mazaheri and Jensen, 2008) and neural disorders (Cole et al., 2017; Jackson et al., 2019; Hu et al., 2023). For example, Cole et al. 2017 showed that a harmonic structure of beta oscillations can explain the degree of Parkinson’s disease, and Hu et al. 2023 showed the number of harmonic peaks can localize the seizure onset zone.”

      References:

      Beck AM, He M, Gutierrez R, Purdon PL. An iterative search algorithm to identify oscillatory dynamics in neurophysiological time series. bioRxiv. 2022. p. 2022.10.30.514422. doi:10.1101/2022.10.30.514422

      Brady B, Bardouille T. Periodic/Aperiodic parameterization of transient oscillations (PAPTO) – Implications for healthy ageing. Neuroimage. 2022;251: 118974.

      Fabus MS, Woolrich MW, Warnaby CW, Quinn AJ. Understanding Harmonic Structures Through Instantaneous Frequency. IEEE Open J Signal Process. 2022;3: 320-334.

      Jones SR, Pritchett DL, Sikora MA, Stufflebeam SM, Hämäläinen M, Moore CI. Quantitative analysis and biophysically realistic neural modeling of the MEG mu rhythm: rhythmogenesis and modulation of sensory-evoked responses. J Neurophysiol. 2009;102: 3554-3572.

      He M, Das P, Hotan G, Purdon PL. Switching state-space modeling of neural signal dynamics. PLoS Comput Biol. 2023;19: e1011395.

      Matsuda T, Komaki F. Time Series Decomposition into Oscillation Components and Phase Estimation. Neural Comput. 2017;29: 332-367.

      Quinn AJ, Lopes-Dos-Santos V, Huang N, Liang W-K, Juan C-H, Yeh J-R, et al. Within-cycle instantaneous frequency profiles report oscillatory waveform dynamics. J Neurophysiol. 2021;126: 1190-1208.

      Reviewer #2 (Public Review):

      Summary:

      A new toolbox is presented that builds on previous toolboxes to distinguish between real and spurious oscillatory activity, which can be induced by non-sinusoidal waveshapes. Whilst there are many toolboxes that help to distinguish between 1/f noise and oscillations, not many tools are available that help to distinguish true oscillatory activity from spurious oscillatory activity induced in harmonics of the fundamental frequency by non-sinusoidal waveshapes. The authors present a new algorithm which is based on autocorrelation to separate real from spurious oscillatory activity. The algorithm is extensively validated using synthetic (simulated) data, and various empirical datasets from EEG, intracranial EEG in various locations and domains (i.e. auditory cortex, hippocampus, etc.).

      Strengths:

      Distinguishing real from spurious oscillatory activity due to non-sinusoidal waveshapes is an issue that has plagued the field for quite a long time. The presented toolbox addresses this fundamental problem which will be of great use for the community. The paper is written in a very accessible and clear way so that readers less familiar with the intricacies of Fourier transform and signal processing will also be able to follow it. A particular strength is the broad validation of the toolbox, using synthetic, scalp EEG, ECoG, and stereotactic EEG in various locations and paradigms.

      Weaknesses:

      At many parts in the results section critical statistical comparisons are missing (e.g. FOOOF vs CHO). Another weakness concerns the methods part which only superficially describes the algorithm. Finally, a weakness is that the algorithm seems to be quite conservative in identifying oscillatory activity which may render it only useful for analysing very strong oscillatory signals (i.e. alpha), but less suitable for weaker oscillatory signals (i.e. gamma).

      Response: We thank Reviewer #2 for the assistance in improving this manuscript.  In the revised manuscript, we have added the missing statistical comparisons, detailed pseudo code, and a subsection that explains the hyper-parameters of CHO.  We also recognize the limitations of CHO in detecting gamma oscillations.  While our results demonstrate beta-band oscillations in ECoG and EEG signals (see Figures 5 and 6), we had no expectation to find gamma-band oscillations during a simple reaction time task.  This is because of the general absence of ECoG electrodes over the occipital cortex, where such gamma-band oscillations may be found. 

      Nevertheless, our CHO method should be able to detect gamma-band oscillations.  This is because if there are gamma-band oscillations, they will be reflected as a bump over the 1/f fit in the power spectrum, and CHO will detect them.  We apologize for not specifying the frequency range of the synthetic non-sinusoidal oscillations.  The gamma band was also included in our simulation. We added the frequency range (1-40 Hz) of the synthetic non-sinusoidal oscillations in the subsection, the caption of Figure 4, and the result section.
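
The idea of a "bump over the 1/f fit" can be illustrated with a minimal sketch. It assumes a hypothetical spectrum (a 1/f background with a narrow bump at 40 Hz), a plain least-squares line in log-log space, and an arbitrary residual cut-off of 0.3; none of these choices is taken from the CHO code.

```python
import math


def fit_loglog_line(freqs, power):
    """Ordinary least-squares line through (log10 f, log10 P)."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in power]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


freqs = list(range(1, 101))  # 1..100 Hz
# Hypothetical spectrum: 1/f background with a narrow bump centred at 40 Hz.
power = [(1.0 / f) * (1.0 + 5.0 * math.exp(-((f - 40) ** 2) / 8.0))
         for f in freqs]

slope, intercept = fit_loglog_line(freqs, power)
# Residual = how far the spectrum sits above the fitted 1/f line (log10 units).
residual = [math.log10(p) - (slope * math.log10(f) + intercept)
            for f, p in zip(freqs, power)]
bump = [f for f, r in zip(freqs, residual) if r > 0.3]  # arbitrary cut-off
peak_hz = freqs[residual.index(max(residual))]
```

In this toy spectrum the largest residual falls at the bump centre, and frequencies far from it stay below the cut-off.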

      Reviewer #1 (Recommendations For The Authors):

      (1) The example of a sinusoidal neural oscillation in Fig 1 seems to still exhibit a great deal of nonsinusoidal behavior. Although it is largely symmetrical, it has significant peak-trough symmetry as well as sharper peak structure than typical sinusoidal activity. Nevertheless, it has less harmonic structure than the example on the left. A more precisely-stated claim might be that non-sinusoidal behavior is not the distinguishing characteristic between the two, but rather the degree of harmonic structure.

      Response: We are grateful for this thoughtful observation. In response, we now recognize that the depicted example showcases pronounced peak-trough symmetry and sharpness, characteristics that might not be typically associated with sinusoidal behavior. We now better understand that the key differentiator between the examples lies not only in their nonsinusoidal behavior but also in their harmonic structure. To reflect this better understanding, we have refined our manuscript to more accurately articulate the differences in harmonic structure, in accordance with your suggestion. Specifically, we revised the caption of Fig 1 in the manuscript as follows:

      The caption of the Fig 1G-L.

      “We applied the same statistical test to a more sinusoidal neural oscillation (G). Since this neural oscillation more closely resembles a sinusoidal shape, it does not exhibit any prominent harmonic peaks in the alpha and beta bands within the power spectrum (H) and time-frequency domain (I).  Consequently, our test found that the phase of the theta-band and beta-band oscillations were not phase-locked (J-L).  Thus, this statistical test suggests the absence of a harmonic structure.”

      (2) The statement "This suggests that most of the beta oscillations detected by conventional methods are simply harmonics of the predominant asymmetric alpha oscillation." is potentially overstated. It is important to constrain this statement to the auditory cortex in which the authors conduct the validation, because true beta still exists elsewhere. The same goes for the beta-gamma claim later on. In general, use of "may be" is also more advisable than the definitive "are".

      Response: We thank the reviewer for this thoughtful feedback. To avoid the potential overstatement of our findings we revised our statement on beta oscillations in the manuscript as follows:

      Discussion:

      “This suggests that most of the beta oscillations detected by conventional methods within auditory cortex may be simply harmonics of the predominant asymmetric alpha oscillation.”

      Reviewer #2 (Recommendations For The Authors):

      All my concerns are medium to minor and I list them as they appear in the manuscript. I do not suggest new experiments or a change in the results, instead I focus on writing issues only.

      a) Line 50: A reference to the seminal paper by Klimesch et al (2007) on alpha oscillations and inhibition would seem appropriate here.

      Response: We added the reference to Klimesch et al. (2007).

      b) Figure 4: It is unclear which length for the simulated oscillations was used to generate the data in panels B-G.

      Response: We generated oscillations that were 2.5 cycles in length and 1-3 seconds in duration. We added this information to the manuscript as follows.

      Figure 4:

      “We evaluated CHO by verifying its specificity, sensitivity, and accuracy in detecting the fundamental frequency of non-sinusoidal oscillatory bursts (2.5 cycles, 1–3 seconds long) convolved with 1/f noise.”

      Results (page 5, lines 163-165):

      “To determine the specificity and sensitivity of CHO in detecting neural oscillations, we applied CHO to synthetic non-sinusoidal oscillatory bursts (2.5 cycles, 1–3 seconds long) convolved with 1/f noise, also known as pink noise, which has a power spectral density that is inversely proportional to the frequency of the signal.”

      Methods (page 20, lines 623-626):

      “While empirical physiological signals are most appropriate for validating our method, they generally lack the necessary ground truth to characterize neural oscillation with sinusoidal or non-sinusoidal properties. To overcome this limitation, we first validated CHO on synthetic non-sinusoidal oscillatory bursts (2.5 cycles, 1–3 seconds long) convolved with 1/f noise to test the performance of the proposed method.”
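
A minimal sketch of what such a synthetic test signal might look like is given below. Everything in it is an assumption made for illustration: the sampling rate, the sawtooth waveshape, the burst amplitude, the sum-of-sinusoids approximation of pink noise, and the choice to add the burst to the noise rather than reproduce the authors' exact construction.

```python
import math
import random


def pink_noise(n_samples, fs, rng, f_min=1.0, f_max=100.0):
    """Approximate 1/f (pink) noise as a sum of random-phase sinusoids.

    Each component at frequency f has amplitude 1/sqrt(f), so its power
    is proportional to 1/f.
    """
    freqs = [f_min + k for k in range(int(f_max - f_min) + 1)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
    out = []
    for i in range(n_samples):
        t = i / fs
        out.append(sum(math.cos(2.0 * math.pi * f * t + p) / math.sqrt(f)
                       for f, p in zip(freqs, phases)))
    return out


def sawtooth_burst(n_samples, fs, freq, onset_s, n_cycles):
    """Non-sinusoidal (asymmetric sawtooth) burst lasting n_cycles cycles."""
    duration_s = n_cycles / freq
    out = [0.0] * n_samples
    for i in range(n_samples):
        t = i / fs - onset_s
        if 0.0 <= t < duration_s:
            phase = (t * freq) % 1.0    # position within the cycle, 0..1
            out[i] = 2.0 * phase - 1.0  # linear ramp from -1 towards 1
    return out


fs = 500                      # sampling rate in Hz (hypothetical)
n = 2 * fs                    # 2 s of signal
rng = random.Random(0)        # fixed seed for reproducibility
noise = pink_noise(n, fs, rng)
burst = sawtooth_burst(n, fs, freq=10.0, onset_s=0.5, n_cycles=2.5)
signal = [3.0 * b + x for b, x in zip(burst, noise)]  # burst added to noise
```

Because the burst onset, frequency, and length are set explicitly, the ground truth needed for sensitivity and specificity estimates is known by construction.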

      c) Figure 5 - supplements: Would be good to re-organize the arrangement of the plots on these figures to facilitate the comparison between FOOOF and CHO (i.e. by presenting for each participant FOOOF and CHO together).

      Response: We combined Figure 5-supplementary figures 1 and 2 into Figure 5-supplementary figure 1, Figure 6-supplementary figures 1 and 2 into Figure 6-supplementary figure 1, and Figure 8-supplementary figures 1 and 2 into Figure 8-supplementary figure 1. 

      Author response image 1.

      Figure 5-supplementary figure 1:

      Author response image 2.

      Figure 6-supplementary figure 1:

      Author response image 3.

      Figure 8-supplementary figure 1:

      d) Statistics: Almost throughout the results section where the empirical results are described statistical comparisons are missing. For instance, in lines 212-213 the statement that CHO did not detect low gamma while FOOOF did is not backed up by the appropriate statistics. This issue is also evident in all of the following sections (i.e. EEG results, On-offsets of oscillations, SEEG results, Frequency and duration of oscillations). I feel this is probably the most important point that needs to be addressed.

      Response: We added statistical comparisons to Figure 5 (ECoG), 6 (EEG), and the results section as follows.

      Author response image 4.

      Validation of CHO in detecting oscillations in ECoG signals. A. We applied CHO and FOOOF to determine the fundamental frequency of oscillations from ECoG signals recorded during the pre-stimulus period of an auditory reaction time task. FOOOF detected oscillations primarily in the alpha- and beta-band over STG and pre-motor area.  In contrast, CHO also detected alpha-band oscillations primarily within STG, and more focal beta-band oscillations over the pre-motor area, but not STG. B. We investigated the occurrence of each oscillation within defined cerebral regions across eight ECoG subjects. The horizontal bars and horizontal lines represent the median and median absolute deviation (MAD) of oscillations occurring across the eight subjects. An asterisk (*) indicates statistically significant differences in oscillation detection between CHO and FOOOF (Wilcoxon rank-sum test, p<0.05 after Bonferroni correction).

      Author response image 5.

      Validation of CHO in detecting oscillations in EEG signals. A. We applied CHO and FOOOF to determine the fundamental frequency of oscillations from EEG signals recorded during the pre-stimulus period of an auditory reaction time task.  FOOOF primarily detected alpha-band oscillations over frontal/visual areas and beta-band oscillations across all areas (with a focus on central areas). In contrast, CHO detected alpha-band oscillations primarily within visual areas and detected more focal beta-band oscillations over the pre-motor area, similar to the ECoG results shown in Figure 5. B. We investigated the occurrence of each oscillation within the EEG signals across seven subjects. An asterisk (*) indicates statistically significant differences in oscillation detection between CHO and FOOOF (Wilcoxon rank-sum test, p<0.05 after Bonferroni correction). CHO exhibited lower entropy values of alpha and beta occurrence than FOOOF across 64 channels. C. We compared the performance of FOOOF and CHO in detecting oscillations across visual and pre-motor-related EEG channels. CHO detected more alpha and beta oscillations in visual cortex than in pre-motor cortex. FOOOF detected more alpha and beta oscillations in visual cortex than in pre-motor cortex.

      We added additional explanations of our statistical results to the “Electrocorticographic (ECoG) results” and “Electroencephalographic (EEG) results” sections.

      “We compared neural oscillation detection rates between CHO and FOOOF across eight ECoG subjects.  We used FreeSurfer to determine the associated cerebral region for each ECoG location. Each subject performed approximately 400 trials of a simple auditory reaction-time task.  We analyzed the neural oscillations during the 1.5-second-long pre-stimulus period within each trial. CHO and FOOOF demonstrated statistically comparable results in the theta and alpha bands despite CHO exhibiting smaller median occurrence rates than FOOOF across eight subjects. Notably, within the beta band, excluding specific regions such as precentral, pars opercularis, and caudal middle frontal areas, CHO's beta oscillation detection rate was significantly lower than that of FOOOF (Wilcoxon rank-sum test, p < 0.05 after Bonferroni correction). This suggests comparable detection rates between CHO and FOOOF in premotor and Broca's areas, while the detection of beta oscillations by FOOOF in other regions, such as the temporal area, may represent harmonics of theta or alpha, as illustrated in Figure 5A and B. Furthermore, FOOOF exhibited a higher sensitivity in detecting delta, theta, and low gamma oscillations overall, although both CHO and FOOOF detected only a limited number of oscillations in these frequency bands.”
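
The comparison described above (Wilcoxon rank-sum test per band, Bonferroni-corrected) can be sketched as follows. The per-subject detection rates are invented for illustration, and the normal approximation without tie or continuity correction is an assumption of this sketch, not necessarily what the authors used.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Returns (z, p). Tied values receive average ranks; no tie or
    continuity correction is applied to the variance.
    """
    pooled = sorted((v, g) for g, grp in enumerate((x, y)) for v in grp)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    expected = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical per-subject detection rates (eight subjects, two bands).
rates = {
    "alpha": ([0.42, 0.38, 0.45, 0.40, 0.36, 0.44, 0.39, 0.41],   # CHO
              [0.43, 0.40, 0.44, 0.42, 0.37, 0.46, 0.38, 0.45]),  # FOOOF
    "beta": ([0.10, 0.12, 0.08, 0.11, 0.09, 0.13, 0.07, 0.14],    # CHO
             [0.30, 0.28, 0.35, 0.31, 0.29, 0.33, 0.27, 0.32]),   # FOOOF
}
bands = list(rates)
raw_p = [rank_sum_test(*rates[b])[1] for b in bands]
adjusted = dict(zip(bands, bonferroni(raw_p)))
significant = {b: p < 0.05 for b, p in adjusted.items()}
```

With these made-up numbers only the beta band remains significant after correction, mirroring the pattern the authors report without reproducing their data.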

      “We assessed the difference in neural oscillation detection performance between CHO and FOOOF across seven EEG subjects.  We used EEG electrode locations according to the 10-10 electrode system and assigned each electrode to the appropriate underlying cortex (e.g., O1 and O2 for the visual cortex). Each subject performed 200 trials of a simple auditory reaction-time task.  We analyzed the neural oscillations during the 1.5-second-long pre-stimulus period. In the alpha band, CHO and FOOOF presented statistically comparable outcomes. However, CHO exhibited a greater alpha detection rate for the visual cortex than for the pre-motor cortex, as shown in Figures 6B and C. The entropy of CHO's alpha oscillation occurrences (3.82) was lower than that of FOOOF (4.15), with a maximal entropy across 64 electrodes of 4.16. Furthermore, in the beta band, CHO's entropy (4.05) was smaller than that of FOOOF (4.15). These findings suggest that CHO may offer a more region-specific oscillation detection than FOOOF.

      As illustrated in Figure 6C, CHO found fewer alpha oscillations in pre-motor cortex (FC2 and FC4) than in occipital cortex (O1 and O2), while FOOOF found more beta oscillation occurrences in pre-motor cortex (FC2 and FC4) than in occipital cortex. However, FOOOF found more alpha and beta oscillations in visual cortex than in pre-motor cortex.

      Consistent with ECoG results, FOOOF demonstrated heightened sensitivity in detecting delta, theta, and low gamma oscillations. 

      Nonetheless, both CHO and FOOOF identified only a limited number of oscillations in delta and theta frequency bands.

      Contrary to the ECoG results, FOOOF found more low gamma oscillations in EEG subjects than in ECoG subjects.”
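
The reported maximal entropy of 4.16 across 64 electrodes matches ln(64) ≈ 4.159, which is consistent with a natural-log Shannon entropy over the per-channel occurrence distribution. The sketch below assumes that formula; the occurrence counts are hypothetical.

```python
import math


def occurrence_entropy(counts):
    """Shannon entropy (natural log) of occurrence counts across channels."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)


n_channels = 64
uniform = [10] * n_channels                 # same count on every channel
focal = [100] * 8 + [1] * (n_channels - 8)  # hypothetical: mostly 8 channels

max_entropy = math.log(n_channels)          # upper bound, reached by `uniform`
h_uniform = occurrence_entropy(uniform)
h_focal = occurrence_entropy(focal)
```

A detector whose occurrences are spread evenly over all channels reaches the upper bound, whereas spatially focal detection yields a lower value, which is the sense in which lower entropy indicates more region-specific detection.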

      e) Line 248: The authors find an oscillatory signal in the hippocampus with a frequency at around 8 Hz, which they refer to as alpha. However, several researchers (including myself) may label this fast theta, according to the previous work showing the presence of fast and slow theta oscillations in the human hippocampus (https://pubmed.ncbi.nlm.nih.gov/21538660/, https://pubmed.ncbi.nlm.nih.gov/32424312/).

      Response: We replaced “alpha” with “fast theta” in the figure and text. We added a citation for Lega et al. 2012.

      f) Line 332: It could also be possible that the auditory alpha rhythms don’t show up in the EEG because a referencing method was used that was not ideal for picking it up. In general, re-referencing is an important preprocessing step that can make the EEG be more susceptible to deep or superficial sources and that should be taken into account when interpreting the data.

      Response: We re-referenced our signals using a common median reference (see Methods section). After close inspection of our results, we found that the EEG topography shown in Figure 6 did not show the auditory alpha oscillation because the alpha power of visual locations greatly exceeded that of those locations that reflect oscillations in the auditory cortex. Further, while our statistical analysis shows that CHO detected auditory alpha oscillations, this analysis also shows that CHO detected significantly more visual alpha oscillations.

      g) Line 463: It seems that the major limitation of the algorithm lies in its low sensitivity which is discussed by the authors. The authors seem to downplay this a bit by saying that the algorithm works just fine at SNRs that are comparable to alpha oscillations. However, alpha is the strongest signal in human EEG which may make the algorithm less suitable for picking up less prominent oscillatory signals, i.e. gamma, theta, ripples, etc. Is CHO only seeing the ‘tip of the iceberg’?

      Response:  We performed the suggested analysis. For the theta band, this analysis generated convincing statistical results for ECoG signals (Figures 5, 6, and the results section). For theta oscillation detection, we found no statistical difference between CHO and FOOOF.  Since FOOOF has a high sensitivity even under low SNRs (as shown in our simulation), our analysis suggests that CHO and FOOOF should perform equally well in the detection of theta oscillation, even when the theta oscillation amplitude is small.

      To validate the ability of CHO to detect oscillations in high-frequency bands (> 40 Hz), such as gamma oscillations and ripples, our follow-up study is applying CHO in the detection of high-frequency oscillations (HFOs) in electrocorticographic signals recorded during seizures.  To this end, our follow-up study analyzed 26 seizures from six patients.  In this analysis, CHO showed similar sensitivity and specificity as the epileptogenicity index (EI), which is the most commonly used method to detect seizure onset times and zones. The results of this follow-up study were presented at the American Epilepsy Society Meeting in December of 2023, and we are currently preparing a manuscript for submission to a peer-reviewed journal.

      In this study, we want to investigate the performance of CHO in detecting the most prominent neural oscillations (e.g., alpha and beta). Future studies will investigate the performance of CHO in detecting more difficult-to-observe oscillations (delta in sleep stages, theta in the hippocampus during memory tasks, and high-frequency oscillations or ripples in seizure or interictal data).

      h) Methods: The methods section, especially the one describing the CHO algorithm, is lacking a lot of detail that one usually would like to see in order to rebuild the algorithm themselves. I appreciate that the code is available freely, but that does not, in my opinion, relieve the authors of their duty to describe in detail how the algorithm works. This should be fixed before publishing.

      Response: We now present pseudo code to describe the algorithms within the new subsection on the hyper-parameterization of CHO.

      See Author response table 1.

      A new subsection titled “Tradeoffs in adjusting the hyper-parameters that govern the detection in CHO.”

      “The ability of CHO to detect neural oscillations and determine their fundamental frequency is governed by four principal hyper-parameters.  Adjusting these parameters requires understanding their effect on the sensitivity and specificity in the detection of neural oscillations. 

      The first hyper-parameter is the number of time windows (N in Line 5 in Algorithm 1), that is used to estimate the 1/f noise.  In our performance assessment of CHO, we used four time windows, resulting in estimation periods of 250 ms in duration for each 1/f spectrum.  A higher number of time windows results in smaller estimation periods and thus minimizes the likelihood of observing multiple neural oscillations within this time window, which otherwise could confound the 1/f estimation.  However, a higher number of time windows and, thus, smaller time estimation periods may lead to unstable 1/f estimates. 

      The second hyper-parameter defines the minimum number of cycles of a neural oscillation to be detected by CHO (see Line 23 in Algorithm 1).  In our study, we specified this parameter to be two cycles.  Increasing the number of cycles increases specificity, as it will reject spurious oscillations.  However, increasing the number also decreases sensitivity, as it will reject short oscillations.

      The third hyper-parameter is the significance threshold that selects positive peaks within the auto-correlation of the signal.  The magnitude of the peaks in the auto-correlation indicates the periodicity of the oscillations (see Line 26 in Algorithm 1).  Referred to as "NumSTD," this parameter denotes the number of standard errors that a positive peak has to exceed to be selected to be a true oscillation.  For this study, we set the "NumSTD" value to 1 (the approximate 68% confidence bounds).  Increasing the "NumSTD" value increases specificity in the detection as it reduces the detection of spurious peaks in the auto-correlation.  However, increasing the "NumSTD" value also decreases the sensitivity in the detection of neural oscillations with varying instantaneous oscillatory frequencies. 
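
As a rough illustration of the "NumSTD" threshold, the sketch below selects positive auto-correlation peaks that exceed a given number of standard errors and reads the fundamental frequency from the first such peak. It is not the CHO implementation: the function names are hypothetical, and the standard error is approximated as 1/sqrt(N), the usual large-sample value for white noise.

```python
import math


def autocorrelation(x, max_lag):
    """Biased, normalised autocorrelation for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    var = sum(v * v for v in centered)
    return [sum(centered[i] * centered[i + lag] for i in range(n - lag)) / var
            for lag in range(max_lag + 1)]


def significant_positive_peaks(acf, n_samples, num_std=1.0):
    """Lags of local maxima in the ACF that exceed num_std standard errors.

    The standard error is approximated as 1/sqrt(n_samples), which is an
    assumption of this sketch.
    """
    threshold = num_std / math.sqrt(n_samples)
    return [lag for lag in range(1, len(acf) - 1)
            if acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
            and acf[lag] > threshold]


fs = 200  # sampling rate in Hz (hypothetical)
x = [math.sin(2 * math.pi * 10 * i / fs) for i in range(fs)]  # 1 s at 10 Hz
acf = autocorrelation(x, max_lag=60)
peaks = significant_positive_peaks(acf, len(x), num_std=1.0)
fundamental_hz = fs / peaks[0]  # first significant peak gives the period
```

Raising `num_std` raises the threshold, so fewer peaks survive; this is the specificity versus sensitivity trade-off described in the paragraph above.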

      The fourth hyper-parameter is the percentage of overlap between two bounding boxes that trigger their merger (see Line 31 in Algorithm 1).  In our study, we set this parameter to 75% overlap.  Increasing this threshold yields more fragmentation in the detection of oscillations, while decreasing this threshold may reduce the accuracy in determining the onset and offset of neural oscillations.”
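
The merging rule for the fourth hyper-parameter might be sketched as follows. The box layout `(t_start, t_end, f_low, f_high)`, the normalisation of overlap by the smaller box, and the greedy pairwise merge are all assumptions made for illustration; the authors' pseudo code may define overlap differently.

```python
def box_area(box):
    """Area of a time-frequency box (t_start, t_end, f_low, f_high)."""
    return (box[1] - box[0]) * (box[3] - box[2])


def overlap_fraction(a, b):
    """Intersection area divided by the area of the smaller box."""
    dt = min(a[1], b[1]) - max(a[0], b[0])
    df = min(a[3], b[3]) - max(a[2], b[2])
    if dt <= 0 or df <= 0:
        return 0.0
    return dt * df / min(box_area(a), box_area(b))


def merge_boxes(boxes, threshold=0.75):
    """Repeatedly merge any pair of boxes whose overlap reaches the threshold."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if overlap_fraction(boxes[i], boxes[j]) >= threshold:
                    a, b = boxes[i], boxes[j]
                    # Replace the pair with the box that encloses both.
                    boxes[i] = (min(a[0], b[0]), max(a[1], b[1]),
                                min(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes


# Two hypothetical alpha-band boxes that half overlap in time.
pair = [(0.0, 1.0, 8.0, 12.0), (0.5, 1.5, 8.0, 12.0)]
kept_apart = merge_boxes(pair, threshold=0.75)  # overlap 0.5 is below 0.75
fused = merge_boxes(pair, threshold=0.5)        # overlap 0.5 reaches 0.5
```

The same pair of boxes stays fragmented at the 75% threshold and fuses at 50%, which illustrates why a higher threshold yields more fragmented detections.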

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The present work establishes 14-3-3 proteins as binding partners of spastin and suggests that this binding is positively regulated by phosphorylation of spastin. The authors show evidence that 14-3-3-spastin binding prevents spastin ubiquitination and final proteasomal degradation, thus increasing the availability of spastin. The authors measured microtubule severing activity in cell lines and axon regeneration and outgrowth as a proxy for spastin activity. By using drugs and peptides that separately inhibit 14-3-3 binding or spastin activity, they show that both proteins are necessary for axon regeneration in cell culture and in vivo models in rats.

      The following is an account of the major strengths and weaknesses of the methods and results.

      Major strengths

      -The authors performed pulldown assays on spinal cord lysates using GST-spastin, then analyzed pulldowns via mass spectrometry and found 3 peptides common to various forms of 14-3-3 proteins. In co-expression experiments in cell lines, recombinant spastin co-precipitated with all 6 forms of 14-3-3 tested.

      -By protein truncation experiments they found that the Microtubule Binding Domain of spastin contained the binding capability to 14-3-3. This domain contained a putative phosphorylation site, and substitutions that cannot be phosphorylated cannot bind to 14-3-3.

      -spastin overexpression increased neurite growth and branching, and so did the phospho null spastin. On the other hand, the phospho mimetic prevents all kinds of neurite development.

      -Overexpression of GFP-spastin shows a turn-over of about 12 hours when protein synthesis is inhibited by cycloheximide. When 14-3-3 is co-overexpressed, GFP-spastin does not show a decrease by 12 hours. When S233A is expressed, a turn-over of 9 hours is observed, indicating that the ability to be phosphorylated increases the stability of the protein.

      -In support of that notion, the phospho-mimetic S233D makes it more stable, lasting as much as the over-expression of 14-3-3.

      -Authors show that spastin can be ubiquitinated, and that in the presence of ubiquitin, spastin-MT severing activity is inhibited.

      -By combining FC-A with spastazoline, the authors claim that the FC-A-induced increase in regeneration is due to increased spastin activity. In various models of neurite outgrowth and regeneration in cell culture and in vivo, the authors show impressive results on the positive effect of FC-A on regeneration, and show that this is abolished when spastin is inhibited.

      Major weaknesses

      -However convincing the pull-downs of the expressed proteins, the evidence would be stronger if a co-immunoprecipitation of the endogenous proteins were included.

      We thank the reviewer for their succinct summary of the main results and strengths of our study. We acknowledge the reviewer's valuable suggestions and agree that performing endogenous co-immunoprecipitation (co-IP) experiments in neurons is crucial for supporting our conclusions. To address this question, cortical neurons were cultured in vitro for an endogenous IP experiment. The cortical neurons were cultured in neurobasal medium supplemented with 2% B27, with cytarabine used to inhibit the proliferation of glial cells. The proteins were then extracted and subjected to immunoprecipitation experiments using antibodies against spastin. The results, as shown in Fig.1C in the revised manuscript, clearly demonstrate that 14-3-3 protein indeed interacts with spastin within neurons.

      -To better establish the impact of spastin phosphorylation in the interaction, there is no indication that the phosphomimetic (S233D) can better bind 14-3-3, and this result is contradicting to the conclusion of the authors that spastin-14-3-3 interaction is necessary for (or increases) spastin function.

      Thank you for your valuable and constructive comments. We agree with your consideration. To reinforce the importance of phosphorylated spastin in this binding model, we conducted additional experiments by transfecting S233D into 293T cells and performed immunoprecipitation experiments (Fig.2H). The results clearly demonstrate that spastin (S233D) exhibits enhanced binding to 14-3-3, indicating that phosphorylation at the S233 site is critical for this interaction. Additionally, we observed that spastin (S233D) maintains its binding to 14-3-3 even in the presence of staurosporine. These data further support and strengthen our conclusions.

      -To fully support the authors' suggestion that 14-3-3 and spastin work in the same pathway to promote regeneration, I believe that some key observations are missing.

      1-There is no evidence showing that 14-3-3 overexpression increases the total levels of spastin, not only its turnover.

      Thank you for your consideration and valuable input. We have previously demonstrated that overexpression of 14-3-3 leads to an increase in the protein levels of spastin in the absence of CHX (Fig.3E&F). Furthermore, we also observed upregulated protein levels of spastin S233D compared to the wild-type (Fig.3G). We have now included these results in the revised manuscript.

      2- There is no indication that increasing the ubiquitination of spastin decreases its levels. To suggest that proteasomal activity is affecting the levels of a protein, one would expect that proteasomal inhibition (with bortezomib or epoxomicin), would increase its levels.

      Thanks for your concern. We believe that this evidence is critical. Indeed, another study by our team is working to elucidate the ubiquitination degradation pathway of spastin. In addition, a previous study has shown that phosphorylation of the S233 site of spastin can affect its protein stability (Spastin recovery in hereditary spastic paraplegia by preventing neddylation-dependent degradation, doi:10.26508/lsa.202000799.). To better support our conclusions, we have supplemented the results in Fig.3L&M. The results showed that the proteasome inhibitor MG132 could significantly increase the protein level of spastin, whereas CHX could significantly decrease the protein level of spastin, and the degradation of spastin is significantly hindered in the presence of both CHX and MG132. This experiment also further showed that ubiquitination of spastin reduced its protein level.

      3- Authors show that S233D increases MT severing activity, and explain that it is related to increased binding to 14-3-3. An alternative explanation is that phosphorylation at S233 by itself could increase MT severing activity. The authors could test if purified spastin S233D alone could have more potent enzymatic activity.

      We appreciate the reviewer’s consideration. After investigating the interaction between 14-3-3 and spastin, we first aimed to determine whether the S233 phosphorylation mutation of spastin influenced its microtubule-severing activity. We found that overexpression of both S233A and S233D mutants resulted in significant microtubule severing (as indicated by a significant decrease in microtubule fluorescence intensity) (Fig.S2). Furthermore, it is noteworthy that S233 is located outside the microtubule-binding domain (MTBD, 270-328 amino acids) and the AAA region (microtubule-severing region, 342-599 amino acids) of spastin. Based on our initial observations, we believe that the phosphorylation of the S233 residue in spastin does not impact its microtubule-severing function. Additionally, under the same experimental conditions, we observed that the green fluorescence intensity of GFP-spastin S233D was significantly higher than that of GFP-spastin S233A. Based on these phenomena, we speculated that phosphorylation of the S233 residue of spastin might affect its protein stability, leading us to conduct further experiments. Furthermore, we fully acknowledge the reviewer's concern; however, due to technical limitations, we were unable to perform an in vitro assay to test the microtubule-severing activity of spastin. We have provided an explanation for this consideration in the revised version.

      -Finally, I consider that there are simpler explanations for the combined effect of FC-A and spastazoline. FC-A mechanism of action can be very broad, since it will increase the binding of all 14-3-3 proteins with presumably all their substrates, hence the pathways affected can rise to the hundreds. The fact that spastazoline abolishes FC-A effect, may not be because of their direct interaction, but because spastin is a necessary component of the execution of the regeneration machinery further downstream, in line with the fact that spastazoline alone prevented outgrowth and regeneration, and in agreement with previous work showing that normal spastin activity is necessary for regeneration.

      We appreciate the considerations raised by the reviewer. It is evident that spastin is not the exclusive substrate protein for 14-3-3, and it is challenging to demonstrate that 14-3-3 promotes nerve regeneration and recovery of spinal cord injury directly through spastin in vivo. However, we have identified the importance of 14-3-3 and spastin in the process of nerve regeneration. Importantly, we have conducted supplementary experiments to support the stabilization of spastin by FC-A treatment within neurons (Fig.4M), as well as the repair process of spinal cord injury in vivo (Fig.5D). The results showed that FC-A treatment in cortical neurons could enhance the stability of spastin protein levels, and we also demonstrated a consistent trend of upregulated protein levels of spastin and 14-3-3 following spinal cord injury. Moreover, the protein levels were significantly elevated in the FC-A group of mice. These results also support that 14-3-3 enhances spastin protein stability to promote spinal cord injury repair. The manuscript was revised accordingly.

      Reviewer #2 (Public Review):

      Summary:

      The idea of harnessing small molecules that may affect protein-protein interactions to promote axon regeneration is interesting and worthy of study. In this manuscript, Liu et al. explore a 14-3-3-spastin complex and its role in axon regeneration.

      Strengths:

      Some of the effects of FC-A on locomotor recovery after spinal cord contusion look interesting.

      Weaknesses:

      The manuscript falls short of establishing that a 14-3-3-spastin complex is important for any FC-A-dependent effects and there are several issues with data quality that make it difficult to interpret the results. Importantly, the effects of the spastin inhibitor have a major impact on neurite outgrowth suggesting that cells simply cannot grow in the presence of the inhibitor and raising serious questions about any selectivity for FC-A - dependent growth. Aspects of the histology following spinal cord injury were not convincing.

      We sincerely appreciate the reviewer for evaluating our manuscript. Given the multitude of substrates that interact with 14-3-3, and considering spastin's indispensable role in neuroregeneration, it is indeed challenging to experimentally establish that FC-A's neuroregenerative effect is directly mediated through spastin in vivo. Therefore, we have provided additional crucial evidence regarding the changes in spastin protein levels following spinal cord injury, as well as the application of FC-A after spinal cord injury. Furthermore, we have made relevant adjustments to the uploaded images to enhance the resolution of the presented figures, as detailed in the subsequent response.

      Reviewer #3 (Public Review):

      Summary: The current manuscript claims that 14-3-3 interacts with spastin and that the 14-3-3/spastin interaction is important to regulate axon regeneration after spinal cord injury.

      Strengths:

      In its present form, this reviewer identified no clear strengths for this manuscript.

      Weaknesses:

      In general, most of the figures lack sufficient quality to allow analyses and support the authors' claims (detailed below). The legends also fail to provide enough information on the figures which makes it hard to interpret some of them. Most of the quantifications were done based on pseudo-replication. The number of independent experiments (that should be defined as n) is not shown. The overall quality of the written text is also low and typos are too many to list. The original nature of the spinal cord injury-related experiments is unclear as the role of 14-3-3 (and spastin) in axon regeneration has been extensively explored in the past.

      We sincerely appreciate the careful consideration and rigorous evaluation provided by the reviewer. In the revised version, we have made an effort to present high-resolution figures and provide more detailed figure legends. Furthermore, we have made relevant adjustments to the statistical methods in accordance with the reviewer's suggestions. The manuscript has also undergone a thorough review and correction process to eliminate any writing-related errors. Please refer to the following response.

      To the best of our knowledge, there have been no clear reports on the efficacy of 14-3-3 in the repair of spinal cord injury. Kaplan A et al. (doi: 10.1016/j.neuron.2017.02.018) reported a reduction in die-back of the corticospinal tract following spinal cord injury using FC-A as a filler in situ in the lesion site. However, the specific effects of FC-A on spinal cord injury, such as motor function and neural reactivity, as well as the expression characteristics of 14-3-3 after spinal cord injury, have not been extensively elucidated. Additionally, prior research on spastin's role in axon regeneration primarily focused on the effects in Drosophila, and its regenerative effects in the central nervous system of adult mammals after injury have not been reported. Therefore, our study provides crucial insights into the importance of 14-3-3 and spastin in the process of spinal cord injury repair in mammals.

      Reviewer #1 (Recommendations For The Authors):

      There are many spelling and grammar errors, please revise. Examples:

      -approach revealed14-3-3

      -We have detected different many 14-3-3 peptides

      -Line 1057 (D) 14-3-3 agnoist FC-A

      -There is a discrepancy between panel names and figure legend in Figure 4.

      -There is another discrepancy between the color coding of treatments in Figure 7. All panels show "injury" in red and FC-A in orange, but in panel E, these are swapped. This is confusing to readers.

      Thank you for the thorough and rigorous review. We have re-colored the relevant chart. The manuscript has also undergone a thorough review to eliminate any writing-related errors.

      Most images from confocal microscopy are blurred or low resolution. They should be sharper for the type of microscopy used.

      We have adjusted and re-uploaded the images with higher resolution. Additionally, we have enlarged the relevant images.

      The list of all peptides retrieved in the Mass-Spec analyses of the GST-spastin pulldown must be publicly available, according to eLife rules.

      Thank you for your suggestion. We have now uploaded the mass spectrometry data.

      To determine where the 14-3-3/spastin protein complex functions in neurons, we double stained hippocampal neurons with spastin and 14-3-3 antibody, and found that 14-3-3 was colocalized with spastin in the entire cell compartment (Figure 1C).

      Colocalization by confocal fluorescence microscopy is not evidence for protein complexes.

      While co-localization experiments may not directly demonstrate protein-protein interactions, they can still provide valuable insights into the cellular localization of the proteins and suggest potential interactions between them. Therefore, we adjusted the statement.

      Fig1F- Co-immunoprecipitation assay results confirmed that all 14-3-3 isoforms could form direct complexes with spastin.

      CoIP in cells overexpressing the proteins is not evidence that it is direct. That they can interact directly with each other can be extracted from the evidence in vitro with purified proteins.

      We agree with this and we have changed our statement accordingly.

      For a broad audience to have a better understanding, the authors have to explain their amino acid substitutions of Serine233, one mimicking phosphorylation (S233D) and the other rendering the protein unable to be phosphorylated at that position (S233A).

      We appreciate the suggestion. We have provided a more detailed explanation in the revised manuscript.

      The panel of neurons in Fig2G is mislabeled, because it is twice spastin S233A, instead of S233D.

      We apologize for this mistake and we have corrected it in the panel.

      FCA may increase the interaction of 14-3-3 with any of its substrates, including spastin. One would appreciate evidence that FCA increases the MT-severing activity of spastin, as assumed by authors

      We appreciate the reviewer’s suggestion. In this study, we overexpressed spastin to investigate its microtubule severing activity. It is important to note that overexpressing spastin significantly exceeds the normal physiological concentration of the protein. Using excessive amounts of FC-A to enhance the interaction between 14-3-3 and spastin in cells can lead to cell toxicity. Therefore, we chose to overexpress 14-3-3 instead of employing excessive FC-A.

      In Fig2F, the interaction of 14-3-3 with Spas-S233D would have been very informative.

      Thank you for the constructive suggestions from the reviewer. We have supplemented the corresponding co-immunoprecipitation experiments (Fig.).

      The functional effect of S233A and S233D does not correlate with a function of 14-3-3 in neurite outgrowth. This is because S233A does not interact with 14-3-3, however, it is as good as WT spastin... meaning that binding of 14-3-3 with spastin is not necessary...

      We appreciate the reviewer's consideration. The observation that spastin WT and S233A both promote axon growth does not align with the physiological state within neurons. This may mask the true effects of S233A or S233D on neuronal axon growth. It is documented that the proper dosage of spastin is essential for neuronal growth and regeneration, as excessive or insufficient amounts can hinder axon growth. Excessive spastin levels can disrupt the overall cellular MTs. Therefore, spastin was moderately expressed by adjusting the transfection dosage and duration. Nevertheless, we were unable to precisely control the expression levels of spastin for both WT and S233A, which also resulted in an overexpression state compared to the physiological state. As a result, the crucial role of spastin S233 in neural growth under physiological conditions may be masked. We have addressed this issue in the revised version of our manuscript.

      In panels 3C and D it is not clear if it does contain 14-3-3.... it seems it does not... but clarify.

      We apologize for any confusion. Since there is endogenous 14-3-3 present in the cells, we utilized spastin S233A and S233D to mimic the binding pattern with 14-3-3 according to the established interaction model. This information has been clarified in the original manuscript.

      Line 217 should indicate Figure 3, not Figure 5

      We have made the corresponding corrections.

      In F3G, it is intriguing that the input blot shows a decrease in Ubiquitin proteins when there is expression of flag ubiquitin...

      We apologize for the error in our presentation. In the control group, we actually overexpressed Flag-ubiquitin and GFP instead of Flag and GFP-spastin. Additionally, to further elucidate the impact of different phosphorylation states on spastin ubiquitination and degradation, we have conducted additional ubiquitination experiments (Fig.3N), which are now included in the revised version of our manuscript.

      S233 mutations seem to affect the effective turnover of spastin, but does not seem to change the levels of the spastin protein...hence, the conclusion that 14-3-3 protects from degradation is overstated.

      We thank the reviewers for the careful review and we have revised the statement accordingly.

      The mode of action of R18 FCA should be introduced earlier in the text.

      Thank you for the reviewer's correction. We have provided a corresponding description of the effects of FC-A and R18 on the interaction between 14-3-3 and spastin in the ubiquitination experiments section of the manuscript.

      Line 296 reads: Our results revealed that levels of 14-3-3 protein remained high even at 30 DPI, indicating that 14-3-3 plays an important role in the recovery of spinal cord injury.

      This is overstated since it can well be that an upregulated protein is inhibitory.

      We thank the reviewers for their consideration and we have made adjustments accordingly.

      It is not clear if 14-3-3 prevents ubiquitination of spastin, then its levels should be higher... it is noteworthy that they did not measure its levels in nerve tissue after injury. For example, in experiments shown in Figure 5A, it would have been very useful the observation of the levels of spastin.

      We appreciate the reviewer's consideration. We have now included the assessment of spastin protein levels following spinal cord injury. Additionally, we have collected the injured spinal cord lysates in mice treated with FC-A for western blot analysis. The results revealed that the expression trend of 14-3-3 protein is largely consistent with spastin after spinal cord injury. Furthermore, the treatment with FC-A was found to enhance the expression of spastin after spinal cord injury (Fig. 5C&D).

      Panel 5G reads "nerve regeneration across the lesion site", but it actually measured NF levels, according to the legend.

      Thanks to the reviewers for the critical review. We have revised the chart accordingly.

      361 "BMS" should be explained in the results section for a better understanding of the results by non-experts.

      Thank you to the reviewers for their suggestions. We have explained this in the results section accordingly.

      Reviewer #2 (Recommendations For The Authors):

      1. The results of the mass spec and co-IP in Figure 1 are unclear.

      a) Are all of the peptides in Fig. 1A from 14-3-3 and were there only 3 14-3-3 peptides that were identified?

      The mass spectrum results did identify only three 14-3-3 peptides, and these three peptides were highly conserved across all isoforms.

      b) The blot in panel B needs to show the input band for spastin and 14-3-3 from the same gel and not spliced so that the level of enrichment can be evaluated in the co-IP.

      We thank the reviewer for the comments. We have presented the whole gel (Fig. 1B).

      c) Further, does an IP for 14-3-3 co-precipitate spastin?

      Thank you for your concern. We appreciate your feedback. Our 14-3-3 antibody is capable of Western blot experiments and recognizes all subtypes (Pan 14-3-3, Cell Signaling Technology, Cat #8312). Unfortunately, it is not suitable for immunoprecipitation (IP) experiments. Therefore, we have employed additional approaches, namely immunoprecipitation and pull-down assays, to further investigate the interaction between 14-3-3 and spastin.

      2. It is difficult to say anything about 14-3-3 - spastin co-localization in hippocampal neurons (1c) since 14-3-3 labels the entire hippocampal neuron so any protein will co-localize.

      We appreciate the comments. The co-localization experiments have provided evidence of the relative expression of both 14-3-3 and spastin in neurons, suggesting their potential interaction within neuronal cells. We have made the necessary revisions to accurately describe the results of the co-localization experiments in the manuscript.

      To further investigate the interaction between 14-3-3 and spastin within neurons, we have conducted additional co-immunoprecipitation (Co-IP) experiments using cortical neuron lysates (Fig.1C).

      3. The molecular weight of 14-3-3 is 25-28 kDa but the band in panel 1B and in subsequent figures is below 15 kDa. Fig. 1F - the spastin band also seems to be low compared to predicted molecular weight and other Western blot reports in the literature so some indication of how the antibody was validated would be important.

      Apologies for the mistakes. We have carefully re-evaluated the western blot images (See Author response image 1). We have confirmed that the molecular weight of the 14-3-3 protein is approximately 33 kDa. In the case of spastin, its molecular weight is around 55-70 kDa. Additionally, the GFP-spastin fusion protein has an estimated molecular weight of approximately 90 kDa. We have conducted a thorough verification and made appropriate adjustments to the molecular weight labels in all western blot images.

      Author response image 1.

      4. Fig 1G is a co-immunoprecipitation and it is not clear what the authors mean by "direct complexes" as claimed in line 150 of the results since this does not show direct binding between 14-3-3 and spastin. None of the assays in Fig. 1 assess "direct" binding between the two proteins and the authors should be clear in their interpretation.

      We agree with the reviewer's comments and have removed the word "direct" from the text.

      5. Fig. 1D - there is no validation that staurosporine (protein kinase inhibitor, not protein kinase as per typo in Line 167) affects the phosphorylation levels of spastin.

      Thank you for your valuable comments. In our group, we have conducted another study that has confirmed the involvement of CAMKII in mediating spastin phosphorylation. Furthermore, we have found that the addition of staurosporine significantly reduces the phosphorylation levels of spastin (unpublished results). In response to the reviewer's comment, we are pleased to provide western blot experiments demonstrating the effect of staurosporine on reducing spastin phosphorylation. The phosphorylation levels of spastin were assessed using a Pan Phospho antibody (Fig.2D).

      6. Fig. 2F - it would be important to test if spastin S233D interacts more robustly with 14-3-3 and if this is insensitive to staurosporine.

      Thank you for your comments. The suggestion provided by the reviewer is highly significant for supporting our conclusion that "phosphorylation of spastin is a prerequisite for its interaction with 14-3-3." Therefore, we have conducted additional immunoprecipitation experiments to further supplement our findings (Fig.2H). The experimental results demonstrate that the binding affinity between spastin S233D and 14-3-3 is stronger compared to spastin WT.

      7. Line 179 "Next, we transfected Ser233 mutation of spastin (spastin S233A or spastin S233D) with flag tagged 14-3-3 and generated Pearson's correlation coefficients. Results revealed that spastin S233D was markedly colocalized with 14-3-3, with minimal colocalization with spastin S233A (Figure 2A-B)." Assuming the authors are referring to supplemental Figure 2, the 14-3-3 covers the entire cell thus I think measures of co-localization are uninterpretable.

      We agree with the reviewer's comment. We realize that 14-3-3θ exhibits a ubiquitous cellular distribution, which renders the measurement of its co-localization coefficients inconclusive. Therefore, we have decided to remove Supplementary Figure 2 from the manuscript.

      8. Line 189 "Consistent with earlier results, spastin promoted neurite outgrowth, as evidenced by both the length and total branches of neurite." - It is unclear what earlier results the authors are referring to. The authors should clarify how they determined the "moderate" expression level.

      We thank the reviewer for the suggestions. The "earlier results" mentioned here refer to previously published articles; we have now added relevant references. Existing literature indicates that an appropriate dosage of spastin is necessary for neuronal growth and regeneration. However, both excessive and insufficient amounts of spastin are detrimental to axonal growth. Excessive spastin disrupts the overall microtubule network within cells. We controlled plasmid transfection dosage and transfection durations to achieve moderate expression. We have provided an explanation of these details in the revised version.

      9. The effects of WT spastin and spastin S233A were similar in spite of the fact that S233A does not bind to 14-3-3, which is inconsistent with the authors' model that spastin-14-3-3 binding promotes growth. Line 191 - the authors mention that spastin S233D was toxic but I do not see any cell death measurements. I assume the bottom right panel in Fig. 2G labelled as spastin S233A is mislabeled and should be S233D.

      In response to comment 8, the transfection of both wild-type (WT) spastin and S233A mutant failed to precisely control the expression levels around the physiological concentration. Consequently, we observed an overexpression of spastin in both cases, which obscured the critical role of S233 phosphorylation in neurite outgrowth. We have addressed this issue in the revised version of the manuscript.

      10. Fig. 3. Does spastin(S233D) bind constitutively to 14-3-3? Why is spastin S233A not less stable than WT spastin based on the authors' model?

      We propose that 14-3-3 is more likely to interact with spastin S233D in a non-constitutive manner. The instability of the S233A protein is attributed to the disruption of its ubiquitination degradation process due to the absence of 14-3-3 binding.

      11. The ubiquitin blot in Fig. 3G is not convincing and not quantified.

      We acknowledge the mislabeling in our figures. In the control group, Flag-Ubiquitin was also overexpressed, and we transfected GFP as a control instead of GFP-spastin. To further enhance the reliability, we conducted additional ubiquitination experiments (Fig.3N), which revealed a significant increase in spastin (S233A) ubiquitination levels compared to the WT group, consistent with previous research findings (Spastin recovery in hereditary spastic paraplegia by preventing neddylation-dependent degradation, doi:10.26508/lsa.202000799). Additionally, we observed that the addition of R18 could partially enhance spastin ubiquitination levels, as quantitatively illustrated in the figure (Fig.3O). This result further underscores the inhibitory role of 14-3-3 in the ubiquitination degradation pathway of spastin.

      12. I do not understand how the glutamate injury fits with the narrative (Fig. 4C).

      Excessive glutamate exposure can induce severe intracellular oxidative stress reactions, leading to the disruption of physiological processes such as mitochondrial energy production. This, in turn, results in the swelling and lysis of neuronal processes, a phenomenon known as neuronal necrosis. During this state, neurite maintenance is obstructed, and neurites exhibit swelling and breakage (Glutamate-induced neuronal death: a succession of necrosis or apoptosis depending on mitochondrial function. Neuron. 1995 Oct;15(4):961-73). We have provided a more comprehensive explanation of this phenomenon in the revised version of our manuscript.

      13. Some commentary about the selectivity of spastazoline to inhibit spastin should be included - it would be helpful if the authors could explain that this is a spastin inhibitor in the manuscript. FC-A still seems to promote growth in the presence of spastazoline suggesting that the FC-A effects are not dependent on spastin (Fig. 4E). The statistical analysis section of the materials and methods indicates that multiple groups were analyzed by one-way ANOVA. This seems unusual since the controls for cellular transfection are different than for small molecules (FC-A) and for peptides such as R18. As such, there is no vehicle control for the FC-A condition and it is difficult to assess the FC-A vs Spastazoline vs FC-A + Spastazoline. The authors should clarify (Fig. 4E-J).

      Thank you for the reviewer’s suggestions. In the revised version, we have provided a more detailed explanation of the specific inhibition of spastin's severing function by spastazoline.

      We observed that FC-A, in combination with spastazoline, still exhibited a certain degree of promotion in neurite growth compared to the injury group under glutamate-injury conditions. Evidently, spastin is not the exclusive substrate for 14-3-3, and FC-A might delay cellular oxidative stress reactions by facilitating the interaction of 14-3-3 with other substrates, such as the FOXO transcription factors as mentioned in the introduction. Nevertheless, our results still demonstrate that the addition of spastazoline significantly diminishes the promoting effect of FC-A on neurite growth, indicating that FC-A affects neuronal growth by impacting spastin.

      Furthermore, in the drug-treated groups, we overexpressed GFP to trace the morphology of neurons. Culture media were exchanged following transfection, and drugs were added during the media exchange. An equivalent amount of DMSO or ethanol was added as a control to rule out the influence of solvents on neurons.

      14. There is a good possibility that spastin is required for all axon regeneration and that there is no selectivity for the FC-A pathway and this is a major issue with the interpretation of the manuscript (Fig 4K-L).

      We acknowledge this point. Clearly, spastin is not the exclusive substrate for 14-3-3, and our experimental evidence does not establish that 14-3-3 solely promotes neuronal regeneration through spastin. Nevertheless, we have identified the significance of 14-3-3 and spastin in the process of neural regeneration. Furthermore, we conducted complementary experiments to support the stability of spastin by FC-A treatment both in vitro and in vivo. We found an enhanced protein expression in cortical neurons after FC-A treatment (Fig.4M). Also, the results indicate a consistent elevation trend in the protein levels of spastin and 14-3-3 following spinal cord injury (Fig.5C&H). Moreover, in the FC-A group of mice, there was a significant increase in spastin protein levels (Fig.5D&I). These results also support that 14-3-3 promotes spinal cord injury repair by enhancing spastin protein stability.

      15. Fig. 5C- it is unclear where the photomicrographs were taken relative to the lesion.

      We obtained tissue sections from the lesion core and the above segments for histological analysis. Given the scarcity of neural compartments at the injury center, we selected tissue slices as close as possible to the lesion core to illustrate the relationship between 14-3-3 and the injured neurons. We have provided an explanation of this in the revised version of the manuscript.

      16. The authors need to provide some evidence that the FC-A and spastazoline compounds are accessing the CNS following IP injection.

      We thank the reviewer for the suggestion. Although direct visualization evidence of FC-A and spastazoline entering the CNS is challenging to obtain, several indicators suggest drug penetration into spinal cord tissue. Firstly, behavioral and electrophysiological experiments in vivo demonstrate that drug injections indeed affect the neural activity of mice. Secondly, following spinal cord injury, the blood-spinal cord barrier is disrupted at the injury site; combined with the fact that both FC-A (molecular weight: 680.82 Da) and spastazoline (molecular weight: 382.51 Da) are small-molecule drugs, this increases the likelihood of these small molecules entering the injured spinal cord tissue. Furthermore, our microtubule staining results indicated that FC-A and spastazoline did influence the acetylation ratio of microtubules. These findings support the drug penetration into spinal cord tissue.

      17. Some quantification of Fig. 5D would be important to support the contention that the lesion site is impacted by FC-A treatment.

      Thank you for the suggestion. We have included quantitative analysis for Figure 5D (Figure) as recommended.

      18. The NF and 5-HT staining in Fig. 5D and in Fig. 7A and B does not clearly define fibers and is not convincing.

      We appreciate the concerns. Because we did not present whole nerve fibers, we employed NF and 5-HT immunoreactive fluorescence intensity, rather than axons per square millimeter, as an indicator to assess the regeneration of nerve fibers, as previously described (Baltan S, et al. J Neurosci. 2011 Mar 16;31(11):3990-9; Iwai M, et al. Stroke. 2010 May;41(5):1032-7; Wang Y, et al. Elife. 2018 Sep 12;7:e39016; Altmann C, et al. Mol Neurodegeneration. 2016 Oct 22;11(1):69).

      Our results showed that in the spinal cord injury group, there was strongly decreased NF-positive staining (with a slight increase in 5-HT). In contrast, the FC-A treatment group exhibited a significantly higher abundance of NF-positive signals (or an increased 5-HT signal) in the lesion site, which also suggests the reparative effect of FC-A on nerves. We also intend to refine our immunohistochemical methods in future experiments.

      Minor Comments:

      1. Line 80-84. To my knowledge the only manuscripts examining the effects of spastin in axon regeneration models include the analysis in Drosophila (i.e. ref 15 and 16) and a study in sciatic nerve that reported an index of functional recovery but did not perform any histology to assess axon regeneration phenotypes. The literature should be more accurately reflected in the introduction.

      We appreciate the suggestions from the reviewer. In the revised version, we have provided further clarification on the novelty of spastin in the spinal cord injury repair process.

      2. Line 73: The meaning of the following statement needs to be clarified: "spastin has two major isoforms, namely M1 and M87, coded form different initial sites."

      We have provided additional elaboration for this statement in the revised version.

      3. Line 216: "Results indicated that GFP-spastin could be ubiquitinated, while inhibiting the binding of 14-3-3/spastin promoted spastin ubiquitination (Figure 5G)." - Should be Fig 3G.

      Sorry about the mistake. We have made the corresponding changes in the revised version.

      4. Line 255: "Briefly, we established a neural injury model as previously described(31)" - the basics of the injury model need to be described in this manuscript.

      In the revised version, we have provided further elaboration on the glutamate-induced neuronal injury model.

      Reviewer #3 (Recommendations For The Authors):

      Figure 1: A- Both legend and text fail to provide detail on this specific panel.

      We have provided a more detailed and comprehensive description of the legend and results in this section.

      B- Is the contribution of non-neuronal cells for co-IPs relevant? Co-IP with isolated neuronal extracts (instead of spinal cord tissue) should be performed.

      We thank the reviewer for the suggestion. To further elucidate their interaction within neurons, cortical neurons were cultured (in Neurobasal medium supplemented with 2% B27, with cytarabine used to inhibit glial cell growth) and lysed for co-IP experiments (Fig. 1C). The results demonstrated the interaction between 14-3-3 and spastin within neurons.

      C- Both spastin and 14-3-3 appear to label the entire neuron with similar intensities throughout the entire cell which is rather unusual. Conditions of immunofluorescence should be improved and z-projections should be provided to support co-localization.

      Thanks for the comment. Our dual-labeling experiments indicated that 14-3-3 exhibits a characteristic pattern of whole-cell distribution. Therefore, this result cannot confirm the interaction between 14-3-3 and spastin within neurons, but it does provide evidence regarding the intracellular distribution patterns of 14-3-3 and spastin. Consequently, we supplemented neuronal endogenous co-IP experiments to further demonstrate the direct interaction between 14-3-3 and spastin within neurons, and we have modified the wording in the revised version accordingly.

      D- xx and yy axis information is either lacking or incomplete.

      We have made the corrections to the figures.

      E- It would be useful to show the conservation between the different 14-3-3 isoforms.

      We appreciate the suggestions. We have included a conservation analysis of 14-3-3 to assist readers in better understanding these results (Fig.1F).

      Figure 2:

      D- The experiment using a general protein kinase inhibitor does not allow concluding that the specific phosphorylation of spastin is sufficient for binding to 14-3-3. An alternative phosphorylated protein might be involved in the process.

      We appreciate the reviewer's consideration. We believe this serves as a prerequisite condition to demonstrate that "14-3-3 binding to spastin requires spastin phosphorylation." In fact, another project in our group has confirmed that CAMK II can mediate spastin phosphorylation, and the addition of staurosporine significantly reduces spastin phosphorylation levels (unpublished results). Here, we provide the western blot experiment showing the decrease in spastin phosphorylation under staurosporine treatment, with phosphorylation levels detected using the Pan Phospho antibody (Fig.2D).

      H and I- Pseudo-replication. Only independent experiments should be plotted and not data on multiple cells obtained in the same experiment. Please indicate the number of independent experiments.

      We appreciate the reviewer's correction. We now have included the mean value of three independent experiments and we have made relevant revisions to the statistical charts.
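The distinction behind this correction can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis: the experiment labels and neurite-length values are invented, and `per_experiment_means` is a hypothetical helper. It collapses per-cell measurements to one mean per independent experiment, so that n is the number of experiments rather than the number of cells.

```python
from collections import defaultdict
from statistics import mean


def per_experiment_means(measurements):
    """Collapse (experiment_id, value) pairs to one mean per experiment."""
    grouped = defaultdict(list)
    for experiment_id, value in measurements:
        grouped[experiment_id].append(value)
    return {exp: mean(vals) for exp, vals in grouped.items()}


# Invented neurite lengths (arbitrary units) from three hypothetical experiments.
cells = [
    ("exp1", 100.0), ("exp1", 110.0), ("exp1", 120.0),
    ("exp2", 90.0), ("exp2", 100.0),
    ("exp3", 130.0), ("exp3", 150.0),
]

means = per_experiment_means(cells)
n_cells = len(cells)        # what pseudo-replication would treat as n
n_independent = len(means)  # the n that should be reported and tested
```

Any downstream statistical test would then be run on the values in `means` for each condition, with the individual cells shown only as background data points if at all.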

      Figure 3:

      The rationale for the hypothesis that spastin S233D transfection might upregulate the expression of spastin relative to WT and spastin S233A is unclear.

      We appreciate the reviewer's consideration. We have supplemented the relevant results, as depicted in Fig. 3G, which demonstrates that 14-3-3 can enhance the protein levels of spastin, and phosphorylated spastin (S233D) exhibits a significantly increased protein level compared to wild-type spastin. These findings indicate that 14-3-3 not only inhibits the degradation of spastin but also increases its protein levels.

      I- pseudo-replication. Please plot and do statistical analysis of independent experiments.

      Thank you for the reviewer's corrections. We have made the necessary revisions.

      Figure 4:

      E-J: I- pseudo-replication. Please plot and do statistical analysis of independent experiments.

      Thank you for the reviewer's corrections. We have made the necessary revisions.

      Figure 5:

      B- Please show individual data points.

      Thank you for the reviewer's corrections. We have made the necessary revisions.

      D- Longitudinal images of spinal cords where spastazoline was used cannot correspond to contusion as there is a very sharp discontinuity between the rostral and caudal spinal cord tissue. A full transection seems to have occurred. Alternatively, technical problems with tissue collection/preservation might have occurred.

      Thank you for the reviewer's consideration. The sharp discontinuity observed in the spastazoline group is not due to modeling issues but rather a result of the drug's effects on the injury site. This is primarily because spastin plays a crucial role not only in neuronal development but also in mitosis. Because stromal cells proliferate very actively at the injury site, spastazoline may inhibit the proliferation of injury site-related stromal cells, thereby impeding the wound healing process following spinal cord injury and resulting in the observed discontinuous injury gap. We have made the corresponding revision.

      E- Images do not have the quality to allow analysis. 5HT staining should not be considered as a clear axonal labeling is not seen. This is also the case for neurofilament staining.

      We appreciate the concerns. Because we did not present whole nerve fibers, we employed NF and 5-HT immunoreactive fluorescence intensity, rather than axons per square millimeter, as an indicator to assess the regeneration of nerve fibers, as previously described (Baltan S, et al. J Neurosci. 2011 Mar 16;31(11):3990-9; Iwai M, et al. Stroke. 2010 May;41(5):1032-7; Wang Y, et al. Elife. 2018 Sep 12;7:e39016; Altmann C, et al. Mol Neurodegeneration. 2016 Oct 22;11(1):69).

      Our results showed that in the spinal cord injury group, there was strongly decreased NF-positive staining (with a slight increase in 5-HT). In contrast, our FC-A treatment group exhibited a significantly higher abundance of NF-positive signals (or an increased 5-HT signal) in the lesion site, which also suggests the reparative effect of FC-A on nerves. We also intend to refine our immunohistochemical methods in future experiments.

      F- Images do not allow analysis. Higher magnifications are needed.

      Thank you for the reviewer's consideration. We have now included higher-magnification images (Fig.5M) to address this concern.

      Figure 7:

      Same issues as in Figure 5.

      A- Images do not have the quality to allow analysis. 5HT staining should not be considered as a clear axonal labeling is not seen.

      B- Images do not have the quality to allow analysis. Neurofilament staining should not be considered as clear axonal labeling is not seen. MBP staining does not have a pattern consistent with myelin staining

      We appreciate the concerns. Because we did not present whole nerve fibers, we employed NF and 5-HT immunoreactive fluorescence intensity, rather than axons per square millimeter, as an indicator of nerve fiber regeneration, as previously described (Baltan S, et al. J Neurosci. 2011 Mar 16;31(11):3990-9; Iwai M, et al. Stroke. 2010 May;41(5):1032-7; Wang Y, et al. Elife. 2018 Sep 12;7:e39016; Altmann C, et al. Mol Neurodegeneration. 2016 Oct 22;11(1):69). In this study, sagittal slices were used. MBP covers the axonal surface, indicating its co-localization with the axons. However, because we did not present intact nerve fibers, we were unable to show the typical myelin staining of MBP.

    1. Author Response:

      The following is the authors' response to the original reviews.

      We were pleased with the overall enthusiastic comments of the reviewers:

      • Reviewer #1: “This manuscript by Mahlandt, et al. presents a significant advance in the manipulation of endothelial barriers with spatiotemporal precision”

      • Reviewer #2: “The immediate and repeatable responses of barrier integrity changes upon light-on and light-off switches are fascinating and impressive.”

      • Reviewer #3: “, these molecular tools will be of broad interest to cell biologists interested in this family of GTPases.”

      We thank the reviewers for their fair and constructive comments that helped us to improve the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      1) This paper is likely to attract a diverse audience. However, the order of data presented in this manuscript can be confusing or challenging to follow for the naive reader. This is because the tool characterization is split into two parts: before the barrier strength assay (selection of optogenetic platform and tool expression) and after (characterization of cell morphology with global and local optogenetic stimulation). Reorganizing the results such that the barrier strength results follows from an understanding of individual cell responses to stimulation may improve the ability of this readership to understand the factors at play in the changes in barrier strength observed when opto-RhoGEFs are activated.

      We appreciate this idea. We initially structured the paper in the proposed order, but then decided to put more focus on the barrier strength results by presenting them already in the second figure. Therefore, we prefer to keep this order of figures.

      2) While the description of the selection of iLID as the study's optogenetic platform is clear, a better job could be done motivating the need for engineering new optogenetic tools for the control of GEF recruitment. Given that iLID-based tools for GEFs of RhoA, Rac1, and Cdc42 already exist, some of which are cited in the introduction, more information on why these tools were not used would be helpful: were these tools tested in endothelial cells and found lacking?

      The original system has the domain structure DHPH-tagRFP-SspB. However, we wanted to work with an SspB-FP-GEF construct, which would allow easy exchange of the FP and the DHPH domain. This modular approach allowed us to generate and compare the mCherry, iRFP647 and HaloTag versions. We do not want to claim that we engineered an entirely new optogenetic tool, but rather that we optimized an existing one with different tags. To make this clearer we added: ‘The membrane tag of the original iLID was changed to an optimized anchor. In addition, we modified the sequence of the domains to SspB, tag, GEF to simplify the exchange of GEF and genetically encoded tag. A set of plasmids with different fluorescent tags was created for more flexibility in co-imaging.’

      3) Comment on the reason behind using DHPH vs. DH domains for each GEF is needed.

      We have previously found (and this is supported by biochemical analysis of GEF activity) that the selected domains provide the best activity. We will add a reference and the following to the text: ‘Their catalytic active DHPH domains were used for ITSN1 and TIAM1 (Reinhard et al., 2019). In case of p63 the DH domain only was used, because the PH domain of p63 inhibits the GEF activity (Van Unen et al., 2015) (Fig. 1E).’

      4) Since multiple Rho GTPases (e.g., RhoA, RhoB, RhoC) exist and Rho is used as the name of the GTPase family, please use RhoA where applicable for clarity.

      Since the RhoGEFp63 will activate RhoA/B/C we would rather not refer to RhoA only. We will clarify this in the text: ‘Three GEFs were selected, ITSN1, TIAM1 and RhoGEFp63, which are known to specifically activate respectively Cdc42, Rac and Rho and their isoforms.’

      5) A brief comment on the use of HeLa cells for protein engineering and characterization (versus the endothelial cells motivated in the introduction) may be helpful.

      We added the following to the text: ‘HeLa cells were used for the tool optimization because of easier handling and higher transfection rate in comparison to endothelial cells.’

      Minor suggestions:

      In figure 1C, line sections showing intensity profiles before and after protein dimerization might further emphasize the change in biosensor localization.

      We are not fans of intensity profiles, because the profile depends strongly on the position of the line and it essentially turns a 2D image into 1D data for a single image. We therefore prefer to keep the quantification shown in panel 1B (which shows data from multiple cells).

      Reviewer #2 (Recommendations For The Authors):

      1) The study has analyzed the effects of light-induced activation of the three optogenetic constructs in endothelial cells on their barrier function (electrical resistance) at high cell density and correlated the findings with the cellular overlap-producing effects on endothelial cells cultured at sparse cell density. It should be tried to show these effects at a cell density where these light-induced effects increase electrical resistance. Lifeact with different chromophores in adjacent cells might be useful.

      We had attempted to measure the overlap in a monolayer by taking advantage of the HaloTag and the variety of dyes available, staining one pool of cells red with the JF 552 nm dye and the other far red with the JF 635 nm dye. However, the cells need at least 24 h to form a monolayer, and by then they had exchanged the dye, so the red and far-red pools could no longer be distinguished.

      Therefore, we used the Lck-mTq2-iLID construct, which already marks the plasma membrane of the cells. We created a mosaic monolayer of cells expressing mScarlet-CaaX and cells expressing Lck-mTq2-iLID + SspB-HaloTag-TIAM(DHPH). We observed an increase in the overlap between cells under this condition. The results have been added to figure 4 - figure supplement 2I&J. To the text we added:

      ‘Additionally, cell-cell membrane overlap increased about 20 %, upon photo-activation of OptoTIAM, in a mosaic expression monolayer (figure 4 - figure supplement 2I,J, Animation 22)’

      2) The authors correctly state that some reports have shown that S1P can increase endothelial barrier function in VE-cadherin independent ways and these are related to Rac and Cdc42. This was also shown for Tie-2 in vitro and even in vivo in the absence of VE-cadherin and should also be mentioned.

      We added the following to the text: ‘Not only S1P promotes endothelial barrier independent from VE-cadherin, also Tie2 can increase barrier resistance in the absence of VE-cadherin (Frye et al. 2015).’

      Since a blocking antibody against VE-cadherin was used, a negative control antibody should be tested which also binds to endothelial cells.

      To visualize the cell-cell junctions in the experiment shown in Supplemental Fig 3.1, we added a non-blocking VE-cadherin antibody that is directly labeled with ALEXA 647 and shows normal junction morphology. These experiments already give an indication that the live labeling antibody of VE-cadherin does not disturb the junction morphology. However, when we added the blocking antibody against VE-cadherin, known to interfere with the trans-interactions of VE-cadherin, a rapid disruption of the junctions is observed.

      Additionally, previous work has shown that the VE-cadherin labeling antibody does not interfere with junction dynamics and function (see Figure 2A in Kroon et al. 2014, ‘Real-time imaging of endothelial cell-cell junctions during neutrophil transmigration under physiological flow’, JoVE). We have added the figures below, showing that addition of the control IgG and VE-cadherin 55-7H1 Abs at the time point marked by the dotted line did not interfere with the resistance, whereas the blocking Ab drastically reduced resistance. We have added this reference to the results: ‘Previous work has shown the specific blocking effect of this antibody in comparison to the VE-cadherin (55-7H1) labeling antibody (Kroon et al., 2014).’

      Author response image 1.

      Reviewer #3 (Recommendations For The Authors):

      Additional comments for the authors:

      1) The introduction is very long and would benefit from a more concise emphasis on the information required to put the work and results in context and understand their importance.

      Comment: we appreciate the comment of the reviewer. However, we wish to introduce the topic and the tools thoroughly and therefore we chose to keep the introduction as it is.

      2) The N-terminal membrane-binding domain does not homogeneously translocate to the plasma membrane, since lck is a raft-associated kinase. Please comment on this.

      In our hands, the Lck is among the most selective and efficient tags for plasma membrane localization (https://doi.org/10.1101/160374). We do observe homogeneous translocation, but our resolution is limited to ~200 nm and so we cannot exclude that the Lck concentrates in structures smaller than 200 nm. Given the robust performance of the lck-based iLID anchor in the optogenetics experiments, we think that the Lck anchor is a good choice.

      3) Figure 1D is not very clear. What does 25 or 36% change mean? If iLID tg is conjugated to these sequences, its cytosolic localization should be reduced versus iLID alone. Is this what the graph wants to express? If so, please, label properly the ordinate axis in the graph (% of non-tagged iLID values?)

      The graph represents the recruitment efficiency of SspB to the plasma membrane for the two different membrane tags that target iLID to the plasma membrane. The recruitment efficiency was measured as the depletion of SspB-mScarlet intensity in the cytosol upon light activation, expressed as a percentage change.

      We added the following to the title of the graph: ‘SspB recruitment efficiency for Plasma Membrane tagged iLID.’

      4) Supplemental figures in the main text. Fig S1D in the text refers to data in Fig S1E and Fig S1E is supposed to be Fig S1F? (page 11).

      That is correct. The mistakes have been corrected (and this is now renamed to figure 1 - figure supplement 1E and 1F).

      5) Figure 3. Contribution of VE-cadherin. Other junctional complexes, such as tight junctions may also intervene. However, these results would also suggest that cell-substrate adhesion rather than cell-cell junctions may modulate the barrier properties, as it has been previously demonstrated for example by imatinib-mediated activation of Rac1 (Aman et al. Circulation 2012). The ECIS system used to measure TEER in the quantitative barrier function assays can modulate these measurements and discriminate between paracellular permeability (Rb) and cell-substrate adhesion (alpha). Please, provide whether the optogenetic modulation of these GTPases does indeed regulate Rb or alpha.

      ‘The measured impedance is made up of two components: capacitance and resistance. At relatively high AC frequencies (> 32,000 Hz) more current capacitively couples directly through the plasma membranes. At relatively low frequencies (≤ 4000 Hz), the current flows in the solution channels under and between adjacent endothelial cells’ (https://www.biophysics.com/whatIsECIS.php).

      Therefore, the high-frequency impedance represents cell-substrate adhesion, whereas the low-frequency impedance responds more strongly to changes in cell-cell junction connections.

      We only measured at 4000 Hz, representing the paracellular permeability. We chose a single frequency to maximize time resolution.

      We have added this extra comment to the legend of the figure: ‘(B) Resistance of a monolayer of BOECs stably expressing Lck-mTurquoise2-iLID, solely as a control (grey), and either SspB-HaloTag-TIAM1(DHPH) (purple), ITSN1(DHPH) (blue) or p63RhoGEF(DH) (green) measured with ECIS at 4000 Hz, representing paracellular permeability, every 10 s.’

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The manuscript by Hage et al. presents interesting results from a foraging behavior in Marmosets that explores the interactions of saccade and lick vigor with pupil dilation and performance as well as a marginal value theory and foraging theory-inspired value-based decision-making model thereof. The results are generally robust and carefully presented and analyses, particularly of vigor, are carefully executed.

      The authors constructed a model that makes two predictions: "In summary, this simple theory made two sets of predictions: in response to an increased cost of harvest, one should work longer, but move with reduced vigor. In response to an increased reward value, as in hunger, one should also work longer, but now move with increased vigor." Their behavioral data meets these predictions. It is not clear if the model was designed and tweaked in order to make those predictions and match the data, or derived from principles. Furthermore, it is not clear what other models would make similar predictions. It would help to assess what is predicted by other simple models, as well as different functional forms for the effort costs in their model.

      We chose this formulation of utility (Eq. 1) because it is a normative approach that ecologists have used to understand the decisions that animals make regarding how far to travel for food, what mode of travel to use, and how long to stay before moving on to another reward opportunity (Richardson and Verbeek 1986; Stephens and Krebs 1986; Bautista et al. 2001). In a typical formulation of the theory, the numerator represents the reward gained (in units of energy), minus the effort expended (also in units of energy). The denominator represents the amount of time spent during that behavior. We represented this idea in Eq. (1) with saccades that produced reward accumulation, and licks that produced reward consumption. Thus, the utility that we are trying to maximize is the rate of energy gained.

      The specific functions that we used to represent the energy acquired through reward acquisition, and the energy expended through effort expenditure, came a priori either from experiment design, or from the measurements we have made in other experiments. We modeled reward accumulation as a linear rise in energy stored because successful saccades produced a linear increase in the food cache. We modeled consumption of the food as a hyperbolic function of the number of licks to represent the fact that as the licking bout began, each successful lick depleted the food, and thus the first few licks produced a greater amount of food consumption than the last few licks. We modeled the effort cost of licking to grow linearly with the number of licks.

      A critical assumption that we made is that the energy spent performing the saccade trials (which grew faster than linearly as a function of the number of trials attempted) grew faster than the time spent attempting those same trials (which grew linearly with the number of trials). This assumption is based on the heuristic that the average rate of energy lost following a large number of attempted trials is greater than the average rate of energy lost following a small number of attempted trials.

      Sensitivity to parameter values: The model’s simplicity provides closed-form solutions across all parameter values, allowing one to make predictions without having to fit the model to the measured data. For example, for all parameter values that produce a real solution (as opposed to imaginary), the optimal number of saccade trials increases with the square root of the cost of licking. Thus, the basic prediction of the model is that in order to maximize the capture rate, an increase in the effort that it takes to harvest the reward should produce a greater willingness to work longer, caching more food. The closed-form solutions are presented in the Mathematica supplementary document.

      Other models of utility: In composing our utility (Eq. 1), we chose to combine reward and effort additively. This is in contrast to other approaches in which effort discounts reward multiplicatively (47–49). Here, let us show that multiplicative interactions may have the limitation that they are incompatible with the observation that reward invigorates movements. To compare additive and multiplicative approaches, let us consider an arbitrary function 𝑈(𝑇) that specifies how effort varies with movement duration. Typically, this is a U-shaped function that describes energy expenditure as a function of movement duration, as in Shadmehr et al. (2016). In the case of multiplicative interaction between reward and effort, we can consider the following representation of utility:

      In the above formulation, reward 𝛼 is discounted hyperbolically with time, and an increase in reward increases the utility of the action. The optimum movement vigor has the duration 𝑇∗ that maximizes this utility. Notably, because increasing reward merely scales this utility, it has no effect on vigor. Thus, a utility in which reward is multiplied by a function of effort generally fails to predict dependence of movement vigor on reward.
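
      The argument above can be sketched as a short derivation. The functional forms below are illustrative assumptions rather than the manuscript's own equations: the multiplicative utility is written with an assumed discount rate 𝛾 and an unspecified effort term f(T), the additive utility is a simplified single-movement capture rate, and the U-shaped effort cost U(T) = a/T + bT uses hypothetical constants a and b.

```latex
% Multiplicative case (assumed form): reward \alpha scales the whole utility.
J_{\mathrm{mult}}(T) = \alpha \, \frac{f(T)}{1 + \gamma T},
\qquad
\frac{dJ_{\mathrm{mult}}}{dT} = \alpha \, \frac{d}{dT}\!\left[\frac{f(T)}{1 + \gamma T}\right] = 0
\;\Longrightarrow\;
T^{*} \text{ does not depend on } \alpha .

% Additive case (simplified capture rate): reward minus effort, divided by time.
J_{\mathrm{add}}(T) = \frac{\alpha - U(T)}{T},
\qquad
\frac{dJ_{\mathrm{add}}}{dT} = 0
\;\Longrightarrow\;
\alpha = U(T^{*}) - T^{*} U'(T^{*}) .

% Example with an assumed U-shaped effort cost U(T) = a/T + bT, \; a, b > 0:
U(T) - T\,U'(T) = \frac{2a}{T}
\;\Longrightarrow\;
T^{*} = \frac{2a}{\alpha} .
```

      Under these assumptions, the optimal duration in the additive case shrinks as reward grows (greater vigor), whereas in the multiplicative case 𝛼 cancels out of the optimality condition.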

      Line 37 page 6; the link of pupil to NE/LC is tenuous. Other modulators systems and circuits may be equally important and should be mentioned (e.g. Reimer, Jacob, Matthew J. McGinley, Yang Liu, Charles Rodenkirch, Qi Wang, David A. McCormick, and Andreas S. Tolias. "Pupil fluctuations track rapid changes in adrenergic and cholinergic activity in cortex." Nature communications 7, no. 1 (2016): 13289.)

      Reimer et al. (2016) used two-photon microscopy to measure activity of ACh and NE projections in layer 1 of mouse visual cortex while tracking pupil diameter fluctuations. During stillness, elevated pupil diameter was followed by cholinergic and noradrenergic axonal activity. Notably, NE activity was larger and had a shorter latency than ACh activity. In primates, Joshi et al. (2016) recorded from LC during a fixation task. Using spike-triggered averaging, they found that a spike in an LC neuron was followed by pupil dilation at a latency of 200-300 ms. Moreover, microstimulation in LC produced pupil dilation at a latency of 500 ms. More recently, Breton-Provencher and Sur (2019) provided causal evidence that LC activity drives pupil size. They optogenetically activated (1 s) or silenced (5 s) locus coeruleus noradrenergic neurons and found a strong increase or a modest decrease in pupil size, respectively; both changes had a slow time scale of 1 second or more. The LC-NA neurons are surrounded by GABA-ergic neurons. Stimulation of the GABA-ergic neurons produced mild, slow constriction. They identified GABA-ergic and NA neurons by photo-tagging and then tried to identify them via spike shape and found that “spike shape of some GABA neurons were not well separated from NA neurons, demonstrating the difficulty of cell-type identification based on spike shape alone.” They noted that a subset of GABAergic neurons received coincident inputs with the NA neurons. When the GABA neurons were excited, the gain of the pupil response to an auditory tone was diminished: pupil size still increased with tone intensity, but with a lower gain. Thus, LC-NA neurons causally drive pupil size, and the GABA neurons that surround them control the gain of the response of LC-NA neurons to arousal stimuli.

      Line 35 page 6-page 7 line 10 emphasizes a cognitive interpretation of the pupil dilations, in relation to effort costs. But there are also more concomitant vigorous movements. Could all of their pupil results be explained by motor correlates? This should be tested and ruled out before making cognitive interpretations.

      Pupil dilation is a proxy for activity in the brainstem neuromodulatory system (Vazey et al., 2018) and is a measure of arousal (Mathot, 2018). Control of pupil size is dependent on spiking of norepinephrine neurons in locus coeruleus (LC-NE): an increase in the activity of these neurons produces pupil dilation (Joshi et al., 2016; Breton-Provencher and Sur, 2019). Some of these neurons show a transient change in their activity when acquisition of reward requires expenditure of physical effort (Bornert and Bouret, 2021). However, the link between effort costs and pupil size appears to go beyond motor control, as a recent paper found that pupil size increases during effortful speech perception (Contadini-Wright et al., 2023). Thus, although in our work increases in pupil size were always associated with increased movement vigor, the results from other studies suggest that economic variables such as cognitive effort in tasks in which there is no concomitant movement also drive an increase in pupil size.

      Page 7, line 37-42: How would the model need to be modified in order to account for this discrepancy with the data? Ideally, this would be tested.

      We comment on potential modifications that can be made to the model that may account for the discrepancy referred to by the reviewer in the discussion section: “Notably, some of the predictions of the theory did not agree with the experimental data. An increased effort cost did not accompany a reduction in the duration of harvest, and hunger did not increase saccade vigor robustly. Indeed, earlier experiments have shown that if the effort cost of harvest increases, animals who expend the effort will then linger longer to harvest more of the reward that they have earned (2). This mismatch between observed behavior and theory highlights some of the limitations of our formulation. For example, our capture rate reflected a single work-harvest period, rather than a long sequence. Moreover, the capture rate did not consider the fact that the food tube had finite capacity, beyond which the food would fall and be wasted. This constraint would discourage a policy of working more but harvesting less. Finally, if we assume that a reduced body weight is a proxy for increased subjective value of reward, it is notable that we observed a robust effect on vigor of licks, but not saccades. A more realistic capture rate formulation awaits simulations, possibly one that describes capture rate not as the ratio of two sums (sum of gains and losses with respect to sum of time), but rather the expected value of the ratio of each gain and loss with respect to time (Bateson et al., 1995 & 1996).”

      Page 9, line 2-11: In this section, it would help to also consider 'baseline' pupil size (in between trials). This would give a signal that is not 'contaminated' by movements, and may reflect control state. Relatedly, changes in control state may impact and confound the movement-related dilation magnitudes due to e.g. floor and ceiling effects on pupil size, which has a strong tendency for reversion to the mean.

      The experiment design included little or no between-trial periods because during the trials the subjects worked (performed saccades to accumulate reward), while after completing a few trials they stopped working and started harvesting through licking. Because primates make saccades during their entire wake state, it is probably not possible to find a significant period in which the subjects do not make any movements. We selected a window of 500 ms around each lick in the harvest period, and each saccade during the work period, and computed the average pupil size per movement, which includes data from both before and after movements. We then computed a within-session z-score by normalizing these measures by the average pupil size acquired for that day.
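
      As a rough illustration of the normalization described above, the following sketch averages pupil size in a window around each movement and then z-scores those averages within a session. The function names, the centred window, and the use of the per-movement means to estimate the session mean and standard deviation are assumptions made for this example; they are not taken from the authors' analysis code.

```python
import numpy as np

def mean_pupil_per_movement(pupil, sample_times, movement_times, window=0.5):
    """Average pupil size in a window centred on each movement onset.

    `window` is the full window width in seconds (0.5 s here, assumed to be
    centred so that it covers samples both before and after the movement).
    """
    half = window / 2.0
    means = []
    for t in movement_times:
        mask = (sample_times >= t - half) & (sample_times <= t + half)
        # Movements with no samples in the window yield NaN rather than an error.
        means.append(pupil[mask].mean() if mask.any() else np.nan)
    return np.asarray(means)

def within_session_zscore(per_movement_means):
    """Z-score each per-movement mean against the session mean and SD (assumed)."""
    mu = np.nanmean(per_movement_means)
    sd = np.nanstd(per_movement_means)
    return (per_movement_means - mu) / sd
```

      Applied to one session, `within_session_zscore(mean_pupil_per_movement(pupil, sample_times, saccade_times))` would give one normalized pupil value per saccade, and the same call with lick times would cover the harvest period.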

      The hunger-related and reward-size related analyses are both heavily confounded since they were not manipulated directly and could co-vary with many latent factors. For example, why might a given Marmoset be lower weight on a given day? Could it affect sleep, stress, activity, or other factors during the preceding 24 hours? If so, could these other variables be driving the results that are interpreted as 'hunger?' Relatedly, since the reward size is determined by the animal's behavior on each trial (how much they worked), factors (internal brain state, external noises, etc.) that alter how much they worked will influence the subsequent reward size. Therefore interpretations about reward expectancy are confounded. Both of these issues should be discussed and manipulations of them proposed (different feeding schedules and reward size-work functions, respectively).

      Weight of the subjects was measured prior to the start of the experiment on each day. The natural fluctuations are typically the result of factors such as time of the experiment and corresponding weight measurement (AM vs PM) relative to the time of feeding on the previous day, day of the week of the experiment (following a weekend vs. during the week), and volume of food given during the previous day. Animals were maintained at 90% of their baseline weight during food restriction, and fluctuations typically occurred within that range (Sedaghat-Nejad et al., 2019). We used weight as a proxy for hunger, and thus value of reward, and the resulting analyses yielded results consistent with predictions made by our model, as seen in Fig. 5. Critically, other factors that may co-vary with lower weights, like those mentioned by the reviewer (sleep conditions, stress levels, and activity levels) often lead to very poor task performance by the subjects. In sharp contrast, the model predicted increased work period, and increased movement vigor for high reward value, both of which we observed when the subject’s weight was low. Thus, a low relative weight did not seem to impair performance, but rather acted as a motivating factor. Subjects were closely monitored for well-characterized stress-related behaviors and impaired attentive states by experimenters, veterinary staff, and caretaker staff, and, in the event of abnormalities, were removed from food restriction and experimentation until behavior stabilized.

      Effect of reward size: As you noted, we did not manipulate reward size directly. Rather, because our emphasis was on quantifying the effect of effort, the subjects received the same increment of reward for each completed trial, but in some sessions this reward was easy to harvest, while in other sessions the reward required greater effort to harvest. Because the reward amount accumulated during the work period, some harvests encountered a small amount of reward, while other harvests encountered a large amount of reward. Indeed, the amount of reward available for harvest depended linearly on the number of successful saccade trials completed during the work period. We found that the vigor of licks grew with the reward magnitude.

      A major issue is a lack of alternative models. The authors seem to have constructed a particular model designed to capture the behavioral patterns they observed in the data. The model fails in some instances, as they point out. Even more importantly, there are no results or discussion about how other plausible models could or couldn't fit the data. The lack of model comparisons makes it difficult to interpret the conclusions or put the results in a broader context.

      To model behavior, we chose a formulation of utility that represented a normative approach that ecologists have used to understand the decisions that animals make regarding how far to travel for food, what mode of travel to use, and how long to stay before moving on to another patch. In the model, the objective of decisions and actions is to maximize the sum of reward acquired, minus the efforts expended, divided by time. This is termed the capture rate. However, there are other models to consider, and thus we added a new section titled Model formulation and Other models of utility.

      Reviewer #2 (Public Review):

      The model proposed in the paper takes a very specific functional form that is neither motivated by the previous literature nor particularly useful for indexing the behavioral tendencies of individual monkeys (or of the same monkey in different contexts). For example, while it is clear that the saccade effort cost will need to outgrow the increase in the utility of the accumulated food for the monkey to start feeding, it is unclear why this needs to be modeled with a fixed quadratic exponent on the number of saccades? Similarly, why do licks deplete the food stash with the specific rate hard-coded in the model?

      We added a section titled Model formulation and Other models of utility to better explain the rationale behind the model.

      We chose this formulation of utility (Eq. 1) because it is a normative approach that ecologists have used to understand the decisions that animals make regarding how far to travel for food, what mode of travel to use, and how long to stay before moving on to another reward opportunity (Richardson and Verbeek, 1986; Stephens and Krebs, 1986; Bautista et al., 2001). In a typical formulation of the theory, the numerator represents the reward gained (in units of energy), minus the effort expended (also in units of energy), while the denominator represents the amount of time spent during that behavior. We represented this idea in Eq. (1) with saccades that produced reward accumulation, and licks that produced reward consumption. Thus, the utility that we aim to maximize is the rate of energy gained.

      The specific functions that we used to represent the energy gained through reward acquisition, and the energy expended through effort expenditure, came either from experiment design, or from the measurements we have made in other experiments. We modeled reward accumulation as a linear rise in energy stored because successful saccades produced a linear increase in the food cache. We modeled consumption of the food as a hyperbolic function of the number of licks to represent the fact that as the licking bout began, each successful lick depleted the food, and thus the first few licks produced a greater amount of food consumption than the last few licks. We modeled the effort cost of licking to grow linearly with the number of licks.

      A critical assumption that we made is that energy expended performing the saccade trials (which grew faster than linearly as a function of the number of trials attempted), grew faster than the time spent attempting those same trials (which grew linearly with the number of trials). This assumption is based on the heuristic that the average rate of energy lost following a large number of attempted trials is greater than the average rate of energy lost following a small number of attempted trials. A quadratic function is one example of such a function, which has the advantage of providing closed form solutions for the optimal policy.

      The model’s simplicity provided closed-form solutions across all parameter values, allowing us to make predictions without having to fit the model to the measured data. Critically, for all parameter values that produce a real solution (as opposed to imaginary), the optimal number of saccade trials increases with the square root of the cost of licking. Thus, the basic prediction of the model is that to maximize the capture rate, regardless of parameter values, an increase in the effort required for harvest should be met with a greater willingness to work. The closed-form solutions are presented in the supplementary document (simulations.nb).
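      As a rough illustration of the rate-of-energy idea described above, the sketch below builds a toy capture-rate function with linear reward accumulation, hyperbolic consumption, a linear lick cost, a quadratic saccade cost, and time that grows linearly with the number of actions. The exact functional forms, the parameter names (`alpha`, `beta`, `lick_cost`, and so on), and all numerical values are assumptions made for illustration; they are not the manuscript's Eq. (1), and the brute-force search stands in for the closed-form solutions the authors report.

```python
# Toy capture-rate model. Every functional form and number here is an
# illustrative assumption, not the authors' Eq. (1) or fitted values.

def capture_rate(n_saccades, n_licks, alpha=1.0, beta=1.0,
                 lick_cost=0.1, saccade_cost=0.01,
                 saccade_time=1.0, lick_time=0.5):
    """Net energy gained per unit time for one work-then-harvest cycle."""
    cached = alpha * n_saccades                       # linear accumulation
    # Hyperbolic consumption: early licks remove more food than late licks.
    eaten = cached * beta * n_licks / (1.0 + beta * n_licks)
    # Lick effort grows linearly, saccade effort faster than linearly.
    effort = lick_cost * n_licks + saccade_cost * n_saccades ** 2
    # Time spent grows linearly with the number of saccades and licks.
    duration = saccade_time * n_saccades + lick_time * n_licks
    return (eaten - effort) / duration


def best_policy(max_saccades=50, max_licks=50, **params):
    """Brute-force search over integer (saccades, licks) pairs."""
    candidates = ((s, l)
                  for s in range(1, max_saccades + 1)
                  for l in range(1, max_licks + 1))
    return max(candidates, key=lambda p: capture_rate(p[0], p[1], **params))
```

      Calling `best_policy(lick_cost=0.1)` and `best_policy(lick_cost=0.3)` gives a quick way to see how the preferred number of saccade trials shifts with harvest effort in this toy version; whether it reproduces the square-root dependence depends on the assumed forms.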

      Finally, the proportion of successful saccades and lick events is assumed to be fixed, even though it is very likely to be directly influenced by movement speed (speed-accuracy trade-off), which is also contained in the model. It would strongly increase the plausibility and potential impact of the model if the authors could clearly state where these hard-coded model terms come from. Ideally, they would formulate the model in more general terms and also consider other functional forms, as briefly suggested in the discussion. This latter point would be particularly important since not all model predictions were actually borne out in the data.

      Thank you for this excellent suggestion. Regarding saccades, contrary to the speed-accuracy trade-off hypothesis, we found that faster saccades were also more accurate (Fig. 3C). Thus, increased pupil size was associated not only with more vigorous saccades, but also with more accurate saccades. Importantly, these vigor-related changes in accuracy were too small to affect the probability of reward: the reward area for the saccades (1.5 deg) was much larger than the changes in endpoint accuracy that were produced by changes in the food tube distance. For example, on average saccade vigor changed from 0.95 to 1.05 when the food tube distance changed from 12 mm to 8 mm. These changes in vigor would produce a reduction in endpoint error of a fraction of a degree (Fig. 3C).

      Regarding licks, we added new data to the manuscript to assess the relationship between vigor of the licks and endpoint accuracy. We saw no consistent relationship, across subjects or effort conditions, between protraction speed and the outcome of a lick, that is, if the lick was successful in making it inside the tube. On average, in subject R we observed an improvement in lick accuracy with increased vigor, and in subject M we saw no change (Fig. 4F). Thus, we used the average success rate of licks, which was roughly 30% for both subjects.

      The authors derive qualitative predictions, by simulating their model with apparently arbitrary parameters. They then test these qualitative predictions with conventional statistics (e.g., t-tests of whether monkeys lick more for high vs low effort trials). The reader wonders why the authors chose this route, instead of formulating their model with flexible parameters and then fitting these to data. This would allow them (and future researchers) to test their model not just qualitatively but also quantitatively, and to compare the plausibility of different functional forms. The authors certainly have enough data and power to do this, given the vast number of sessions the monkey completed.

      The model’s simplicity provides closed-form solutions across all parameter values, allowing one to make predictions without having to fit the model to the measured data. For example, for all parameter values that produce a real solution (as opposed to imaginary), the optimal number of saccade trials increases with the square root of the cost of licking. Thus, the basic prediction of the model is that to maximize the capture rate, an increase in the effort that it takes to harvest the reward should produce a greater willingness to work longer, caching more food. The closed-form solutions are presented in the Mathematica supplementary document.

      The effort manipulation chosen by the authors (distance of food tube) goes hand in hand with a greater need for precision since the monkey's tongue needs to hit an opening of similar size, but now located at a greater distance. This raises the question of whether the monkeys moved slower to enhance its chance of collecting the food (in line with a speed-accuracy trade off). The manuscript would benefit from an explicit test of this possibility, for example by reporting whether for each of the two conditions, the speed of tongue movements on a trial-by-trial basis predicts the probability of food collection? At the very least, the manuscript should explicitly discuss this issue and how it affects the certainty with which effects of tube distance can be linked to anticipated effort cost alone.

      Thank you for the excellent point. We looked for but found no consistent relationship, across subjects or effort conditions, between protraction speed of the tongue and the success probability of a lick (probability of insertion into the tube). Regardless, we agree with you that it is an excellent alternate hypothesis that reductions in lick vigor that accompanied increased distance of the tube may be due to a desire to maintain accuracy, and not a reflection of increased effort cost of reward. To incorporate this idea into the model, we would need a measure of speed-accuracy for the licks, something that we do not have but hope to develop in the future.
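      The trial-by-trial check discussed here can be pictured with a small sketch: sort licks by protraction speed, split them into equal-count bins, and compare the success rate across bins. The function name `success_rate_by_speed`, the choice of three bins, and the example numbers are hypothetical; this is not the analysis reported in the manuscript, only one simple way such a test could be set up.

```python
def success_rate_by_speed(speeds, outcomes, n_bins=3):
    """Split licks into equal-count speed bins and return the success rate
    (fraction of licks that entered the tube) in each bin, slowest first.

    speeds   -- protraction speed of each lick (any consistent unit)
    outcomes -- 1 if the lick entered the tube, 0 otherwise
    """
    if len(speeds) != len(outcomes):
        raise ValueError("speeds and outcomes must have the same length")
    if len(speeds) < n_bins:
        raise ValueError("need at least one lick per bin")
    ranked = sorted(zip(speeds, outcomes))  # slowest to fastest
    n = len(ranked)
    rates = []
    for b in range(n_bins):
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        rates.append(sum(outcome for _, outcome in chunk) / len(chunk))
    return rates


# Made-up example data, for illustration only.
example_speeds = [1, 2, 3, 4, 5, 6]
example_outcomes = [0, 0, 0, 1, 1, 1]
example_rates = success_rate_by_speed(example_speeds, example_outcomes)
```

      A flat profile across bins would be consistent with the absence of a speed-accuracy relationship; a falling profile with speed would point toward a trade-off.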

      However, perhaps the most interesting aspect of our results is that when we increased tube distance, making reward more effortful, there was not only a reduction in lick vigor, but also a reduction in saccade vigor. That is, the decisions and actions during the work period responded to the increased effort cost of reward during the harvest period. These changes accompanied dilation of the pupil, both in the work period and in the harvest period. We now include a paragraph regarding this in the Discussion.

      The manuscript measures pupil dilation in a time period ranging from 250 ms before to 250 ms after saccade onset. However, the pupil changes strongly during saccade execution relative to the preceding baseline, leaving doubts as to whether the aggregated measure blurs several interesting and potentially different effects. It would be more conclusive if the manuscript could report the analyses of pupil size separately for a period prior to saccade onset and during/after the saccade.

      Our goal was to test for general correlations between the state of the pupil and both movement vigor and decisions. We chose a window of 500 ms around saccade onset, as referred to by the reviewer, as it allowed us a large enough time window to measure pupil size outside of the movement itself (~30 ms duration), to accurately capture the state of the animal around initiation and end of a saccade. Critically, pupil tracking during a saccade itself, when using infrared eye tracking techniques, can be prone to slight measurement error in certain cases due to tracking jitter. Thus, averaging across this window, following processing of the signal, results in a more accurate measure of pupil size.
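      For readers who want to picture the split analysis the reviewer asks for, the sketch below averages pupil samples separately before saccade onset and after the saccade has ended, skipping the samples recorded during the movement. The 250 ms half-window and the ~30 ms saccade duration come from the text above; the function name, the data layout (times in ms relative to saccade onset), and the example values are assumptions for illustration.

```python
def mean_pupil_by_epoch(times_ms, pupil, saccade_end_ms=30.0,
                        half_window_ms=250.0):
    """Return (pre_saccade_mean, post_saccade_mean) of pupil size.

    times_ms -- sample times relative to saccade onset (onset = 0)
    pupil    -- pupil size at each sample time
    Samples taken during the saccade (0 .. saccade_end_ms) are excluded.
    """
    pre = [p for t, p in zip(times_ms, pupil)
           if -half_window_ms <= t < 0]
    post = [p for t, p in zip(times_ms, pupil)
            if saccade_end_ms < t <= half_window_ms]

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(pre), mean(post)


# Made-up samples: the value at t = 10 ms falls inside the saccade.
pre_mean, post_mean = mean_pupil_by_epoch(
    [-200, -100, 10, 100, 200], [1.0, 3.0, 9.0, 4.0, 6.0])
```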

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      This manuscript assesses the differences between young and aged chondrocytes. Through transcriptomic analysis and further assessments in chondrocytes, GATA4 was found to be increased in aged chondrocyte donors compared to young donors. Subsequent mechanistic analysis with lentiviral vectors, siRNAs, and a small molecule was used to study the role of GATA4 in young and old chondrocytes. Lastly, an in vivo study was used to assess the effect of GATA4 expression on osteoarthritis progression in a DMM mouse model.

      Strengths:

      This work linked the overexpression of GATA4 to NF-kB signaling pathway activation, alterations to the TGF-b signaling pathway, and found that GATA4 increased the progression of OA compared to the DMM control group. This indicates that GATA4 contributes to the onset and progression of OA in aged individuals.

      The authors thank the reviewer for reviewing our manuscript and providing insightful comments.

      Weaknesses:

      (1) A couple of sentences should be added to the introduction, to emphasize the role GATA4 plays, such as the alterations to the TGF-b signaling pathway and the increased activation of the NF-kB pathway. 

      As suggested, we have expanded on these signaling pathways in the Introduction to highlight the known functions of GATA4. Importantly, no previous study has reported a role for GATA4 in regulating the TGF-β pathway.

      “Many growth factors contribute to the chondro-supportive environment in the knee joint. Particularly, transforming growth factor-b (TGF-b) plays a key role in maintaining chondrocytes and replenishing ECM loss. However, during OA, TGF-b can induce catabolic processes in chondrocytes, resulting in matrix stiffening, osteophytes, and chondrocyte hypertrophy.[10-12]” (Lines 80-84)

      “Mechanistically, upregulation of GATA4 was shown to increase nuclear factor-kB (NF-kB) pathway activation.[14,15]  NF-κB is thought to amplify and potentially propagate cellular senescence during the aging process through the senescence-associated secretory phenotype (SASP), which could contribute to a low-grade state of chronic inflammation.[16]” (Lines 99-102)

      “When GATA4 was overexpressed, we found that there were alterations to the TGF-b signaling pathway and activation of the NF-kB signaling pathway.” (Lines 106-108)

      (2) Figure 1F, the GATA4 histology image should be bigger.

      We have now increased the size of the image in revised Figure 1F.

      (3) Further discussion should be conducted regarding the reasoning as to why GATA4 increases the phosphorylation of SMAD1/5. 

      Thank you. The underlying mechanism of GATA4 activating SMAD1/5 has not been previously investigated. We have now elaborated on this in the discussion and have added more relevant publications.

      “Our study indicated that there was an observed decrease in chondrogenesis and an increase in hypertrophy-related genes following GATA4 overexpression (Figure 2G).” (Lines 572-574)

      “These previous studies and literature review inspired us to explore the potential association between GATA4 levels and the activation of SMAD1/5.” (Lines 587-588)

      “In this study, it was shown that GATA4 was necessary for bone morphogenic protein-6 (BMP-6) mediated IL-6 induction, in which there are multiple GATA binding domains on the IL-6 promoter. This work further showed that GATA4 interacts with SMAD 2, 3 and 4.[55] Studies have suggested that BMP pathways and GATA4 work synergistically to regulate SMAD signaling.[56] This information indicates that the involvement of GATA4 in the TGF-b signaling pathway is complex and further studies should be conducted to better assess this relationship.” (Lines 594-599)

      (4) More information should be included to clarify why GATA4 is thought to be linked to DNA damage and the pathway that is associated with that. 

      We have now included further information in the discussion to clarify the association between DNA damage and GATA4 upregulation.

      “The study by Kang et al. demonstrated that the suppression of p62 following DNA damage leads to GATA4 accumulation due to the lack of autophagy.[13] DNA damage is known to increase with age.[71] Therefore, we believe that DNA damage due to aging is a key driver of the upregulation of GATA4 in old chondrocytes.” (Lines 642-646)

      (5) Please add further information regarding the limitations of the animal study conducted in this work and future plans to assess this. 

      We have included more limitations of the animal study that was conducted in this work and have expanded on the future plans to use inducible GATA4 expression in transgenic mouse lines to study the role of GATA4 overexpression in OA onset and progression.

      “Third, during our in vivo work, the intraarticular injection of GATA4 lentivirus was not chondrocyte-specific. Therefore, the injection also allowed for other cell types to overexpress GATA4. Future work should be conducted using transgenic mouse lines for cartilage-specific inducible overexpression or depletion of Gata4 to further investigate the role of GATA4 in chondrocytes.” (Lines 666-670)

      (6) In Figure 5, GATA4 should be changed to Gata4 in the graphed portions for consistency. 

      Thanks. We have made the necessary adjustments throughout the manuscript.

      Reviewer #2 (Public review):

      (1) While it is convincing that GATA4 expression is elevated in elderly individuals, and that it has a detrimental impact on cartilage health, the authors might want to add further discussion on the variability among individual human donors, especially given the finding that the elevation of GATA4 was not observed in chondrocytes from donor O1 (Figure 1G).

      The authors thank the reviewer for reviewing our manuscript and providing insightful comments.

      As suggested, we have included more discussion on the variability among donors.

      “Although we found that GATA4 was generally increased with aging, some young donors also exhibited increased levels of GATA4, which may be associated with increased DNA damage, as discussed above, or other stressors. Therefore, GATA4 should be used in conjunction with other aging biomarkers, such as the epigenetic clock,[72] to precisely define chondrocyte aging. Future work should examine biological versus chronological aging and epigenetic clock-based assessments to explain the variabilities in GATA4 expression among donors.” (Lines 658-663)

      (2) It might also be worth adding additional discussion on the interplay between senescent chondrocytes and the dysfunctional ECM during aging. As noted by the authors, aging is associated with decreased sGAG content and likely degenerative changes in the collagen II network, so the microniche of chondrocytes, and thus cell-matrix crosstalk through the pericellular matrix, is also altered or impaired. 

      Thank you for this comment. We have included more discussion on the interplay of chondrocyte senescence and dysfunctional ECM during aging, with a specific focus on the microniche of chondrocytes.

      “Additionally, a common hallmark of chondrocyte aging is the alteration of the ECM, including composition change [2] and stiffening.[57] ECM stiffness can directly affect chondrocyte phenotype and proliferation, and contribute to OA.[58] A recent study by Fu et al. associated matrix stiffening with the promotion of chondrocyte senescence.[59] Furthermore, matrix stiffening has been associated with modulating the TGF-b signaling pathway.[60-62] Future studies should investigate the potential of matrix stiffening and the effect of GATA4 on pericellular matrix proteins such as decorin[63,64], biglycan, collagen VI and XV, as these proteins assist with the regulation of biochemical interactions and with the maintenance of the chondrocyte microenvironment.[65] Herein, the TGF-b signaling pathway can further alter the extracellular microenvironment[62], which could promote cellular senescence and subsequently NF-kB pathway activation.” (Lines 600-610)

      (2) If applicable, please also add Y3 and O3 to Figure S1 for visual comparison across individual donors. 

      As suggested, we added Y3 and O3 to the revised Figure S1 for more visual comparisons across individual donors.

      (3) Figure 3C, the molecular weight labels are off. 

      Thanks. We corrected this mistake.

      (4) Line 438 - Please clarify in text that the highest efficiency of siRNA chosen was siRNA2. 

      As suggested, we added the reason for selecting siRNA2.

      “Several GATA4 siRNAs were tested, and the one with the highest efficiency was selected based on RT-qPCR results, which indicated that siRNA2 treatment induced the lowest expression of GATA4 (Supplementary Figure S6).” (Lines 448-450)

      (5) Did the authors test the timeline of sustained knockdown of GATA4 by siRNA?

      We used a 7-day timepoint of chondrogenesis, and RT-qPCR results demonstrated that there was a downregulation of GATA4 expression at this timepoint (Figure 4). In the current in vitro study, we did not examine the efficacy of GATA4 siRNA for longer than 7 days.

      Reviewer #3 (Public review):

      (1) It would be useful to explain why GATA4 was chosen over HIF1a, which was the most differentially expressed. 

      The authors thank the reviewer for reviewing our manuscript and providing insightful comments.

      When we first saw the results, we did consider studying the role of HIF1a in aging because it was the most differentially expressed gene. When we reviewed the relevant literature, we found that HIF1a is commonly upregulated in aged individuals, which is thought to be linked to hypoxia and increased oxidative stress (PMID: 12470896, PMID: 12573436). Further investigation found studies that examined HIF1a in chondrocytes and used in vivo work to investigate its role in osteoarthritis (PMID: 32214220), indicating that HIF1a plays a protective role during OA by suppressing activation of the NF-kB pathway. Moreover, other work has assessed the stabilization of HIF1a by regulating mitophagy and the use of HIF1a as a potential therapeutic target for OA (PMID: 32587244). Since many studies have investigated the correlation between HIF1a expression and OA, we felt that it would be more innovative to look at other molecules, such as GATA4. In addition, as we highlighted in the Introduction and Discussion, testing in cell types other than chondrocytes has shown GATA4 to be associated with DNA damage and senescence, which are both aging hallmarks. Given that the roles of GATA4 in chondrocytes had not been previously studied, we chose GATA4 for this study.

      “Of note, Hypoxia-Inducible Factor 1a (HIF1a) was the most differentially expressed gene predicted to regulate chondrocyte aging. The connection between HIF1a and aging has been previously reported.[32] Furthermore, additional studies have investigated HIF1a in association with OA and assessed its use as a therapeutic target.[33,34] Therefore, we decided to focus on GATA4, which was less studied in chondrocytes but highly associated with cellular senescence, an aging hallmark. However, our selection did not dampen the importance of HIF1α and other molecules listed in Figure 1D in chondrocyte aging. They can be further studied in the future using the same strategy employed in the current work.” (Lines 526-533)

      (2) In Figure 5, it would be useful to demonstrate the non-surgical or naive limbs to help contextualize OARSI scores and knee hyperalgesia changes. 

      Thank you for your comment. Based on prior experience, mice in the sham group have OARSI scores ranging from 0 to 0.5. In the current study, we focused on the DMM control and DMM Gata4 virus groups, so we did not include a sham control group. We recognize that this is a limitation of the study.

      We measured the naive limbs for knee hyperalgesia before DMM surgery and found that the average threshold was 507 g. We have highlighted the threshold measurement in the figure legend: “507 g was the threshold baseline for non-surgery mice (dashed line).” (Lines 499-500)

      (3) While there appear to be GATA4 small-molecule inhibitors in various stages of development that could be used to assess the effects in age-related OA, those experiments are out of scope for the current study. 

      We agree that the results are still preliminary, which is why we placed them in the supplementary materials. However, we feel that the result is informative: it supports the potential of GATA4 as a therapeutic target and may inspire the development of more specific inhibitors. Therefore, if the reviewer agrees, we would like to keep the results in the current study.

      “In particular, our in vitro study demonstrated the potential of using small-molecule GATA4 inhibitors to enhance the quality of cartilage created by old chondrocytes. We can validate the findings in vivo, as well as develop other GATA4 inhibitors.” (Lines 673-675)

      (4) Is GATA4 upregulated in chondrocytes in publicly available databases? 

      Thank you for this question. We have examined the public databases and have found that there is data showing the trend that GATA4 is upregulated in aged or OA chondrocytes in work conducted by Ungethuem et al (PMID: 20858714). In one study by Ramos et al. (PMID: 25054223), we noticed that GATA4 expression levels were the same in both young and old groups, which may be due to the relatively smaller sample size in the young group compared to old group (4 vs 26).

      Work Conducted by Grogan et al. (Unpublished https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE39795)

      Author response image 1.

      Author response image 2.

      Work conducted by Ramos et al. (PMID: 25054223).

      Author response image 3.

      Work conducted by Ungethuem et al (PMID: 20858714).

      (5) In many cases, the figure captions describe the experiment vs. the outcome. It may be more compelling to state the main finding in the figure title, and you might consider changing it from what is stated at present. For example, Figure 2: instead of the impact of overexpression, you may say GATA4 overexpression impairs cartilage formation (as stated in the results).

      Thanks for the suggestion. We have made the following changes to the figure captions as suggested.

      Figure 1: GATA4 is upregulated in aged chondrocytes (Line 373)

      Figure 2: Overexpressing GATA4 impairs the hyaline cartilage formation capacity of young chondrocytes (Lines 408-409)

      Figure 3: GATA4 overexpression activates SMAD1/5  (Line 436)

      Figure 4: Suppressing GATA4 in old chondrocytes promotes cartilage formation and lowers expression of proinflammatory cytokines (Line 467)

      Figure 5: Gata4 overexpression in the knee joints accelerates OA progression in mice. (Line 593)

      (6) It would be useful to provide a little more information about the human tissue donors, if that is available. 

      We have provided more information about the tissue donors in the revised Supplementary Table S1.

      (7) While aging-like changes were observed in young chondrocytes with GATA4 overexpression, it would be interesting to directly evaluate if there is a change in biological versus chronological age in these tissues. Companies like Zymo can provide this biological v chronological age epigenetic clock-based assessments if that is of interest, to say the young chondrocytes are looking "older". 

      Thank you for this information. We agree that it will be important to assess epigenetic changes in GATA4-overexpressing cells. We are contacting the company to learn more about their technology. Meanwhile, we have added this to the future work section of the manuscript.

      “Although we found that GATA4 was generally increased with aging, some young donors also exhibited increased levels of GATA4, which may be associated with increased DNA damage, as discussed above, or other stressors. Therefore, GATA4 should be used in conjunction with other aging biomarkers, such as the epigenetic clock,[72] to precisely define chondrocyte aging. Future work should examine biological versus chronological aging and epigenetic clock-based assessments to explain the variabilities in GATA4 expression among donors.” (Lines 658-663)

      (8) It is not clear the age at which the mice received DMM in the methods, but it is shown in Figure 5. 

      We have added the age at which the mice received the DMM surgery to the methods section.

      “Intraarticular injections were administered to mice between 10-12 weeks of age under general anesthesia to safeguard the well-being of the animals and to minimize procedural discomfort.” (Line 300)

      “One week after viral vector injection, DMM surgery was performed to induce the OA model on mice 11-13 weeks of age.” (Lines 312-313)

      (9) It is not clear which factors were assayed using Luminex, and it would be great to add. 

      Thank you for this comment. We have added a comprehensive list of the proteins assessed using Luminex in a new Supplementary Table S6.

      (10) Also interesting, loss of GATA4 seems to prevent diet-induced obesity in mice and promote insulin sensitivity (potentially via GLP-1 secretion). I wonder if there may be a metabolic axis here too? PMID: 21177287. I may have missed parts of the discussion of the role of GATA4 in metabolism, but it might be an interesting addition to the discussion. 

      In the current study, we have not investigated the role of GATA4 in obesity. As suggested, we have included a discussion of GATA4 in metabolism.

      “Furthermore, GATA4 might be associated with metabolic regulation. A study conducted by Patankar et al. investigated how GATA4 regulates obesity. Specifically, they used intestine-specific Gata4 knockout mice to study diet-induced obesity, showing that the knockout mice were resistant to the high-fat diet, and that glucagon-like peptide-1 (GLP-1) release was increased. These findings indicated a decreased risk for the development of insulin resistance in knockout mice.[44] This work was taken a step further in a subsequent publication, in which the same team investigated the dietary lipid-dependent and independent effects on the development of steatosis and fibrosis in Gata4 knockout mice. The results from this work suggested that the knockdown of Gata4 increases GLP-1 release, in turn suppressing the development of hepatic steatosis and fibrosis, ultimately blocking hepatic de novo lipogenesis.[45] These studies are especially interesting with the rise of GLP-1 based therapy for the treatment of OA.[46,47] Thus, the coupling of GATA4-related metabolic dysfunction and OA should be further investigated.” (Lines 542-553)

      (11) Another potential citation: GATA4 regulates angiogenesis and persistence of inflammation in rheumatoid arthritis PMID: 29717129 - around the inflammatory axis potential in OA? since GATA4 was reported in FLS from OA- PMC11183113.

      Thank you. We have included this work/citation in the discussion section.

      “Further studies have shown that GATA4 regulates angiogenesis and inflammation in fibroblast-like synoviocytes in rheumatoid arthritis, indicating that GATA4 is required for the inflammation induced by IL-1b. This study also demonstrated that GATA4 binds to promoter regions on Vascular Endothelial Growth Factor (VEGF)-A and VEGFC to enhance transcription and regulate angiogenesis.[15]”  (Lines 558-562)

    1. Author response:

      The following is the authors’ response to the original reviews.

      We appreciate your comments and suggestions on our manuscript.

      In particular, we have measured the affinity between the middle tail domain of myosin-5a (Myo5a-MTD) and the actin-binding domain of melanophilin (Mlph-ABD) using microscale thermophoresis and obtained a Kd of ~0.56 µM, which is similar to the Kd of the globular tail domain of myosin-5a (Myo5a-GTD) for the GTD-binding motif of melanophilin (Mlph-GTBM). Moreover, we have performed Western blots of the lysates of transfected cells, showing that the dominant negative construct and the negative control were expressed at similar levels without noticeable degradation.

      We appreciate the editors’ and reviewers’ comments on how the binding of melanophilin to exon-G of myosin-5a and to actin filaments might be regulated. Phosphorylation of melanophilin by protein kinase A is one possible mechanism. We will investigate this issue in a future study.

      We also took this opportunity to correct several minor errors in the manuscript. Textual alterations can be viewed in the “tracked change” version of the manuscript. Below are the comments from the editors and the two reviewers, together with our point-by-point responses.

      eLife assessment

      This study represents a useful description of a third interaction site between melanophilin and myosin-5a which is important in regulating the distribution of pigment granules in melanocytes. While much of the data forms a solid case for this interaction, the inclusion of important controls for the cellular studies and measurement of interaction affinities would have been helpful.

      Public Reviews:

      Reviewer #1 (Public Review):

      Interactions known to be important for melanosome transport include exon F and the globular tail domain (GTD) of MyoVa with Mlph. Motivated by a discrepancy between in vitro and cell culture results regarding necessary interactions for MyoVa to be recruited to the melanosome, the authors used a series of pull-down and pelleting assays experiments to identify an additional interaction that occurs between exon G of MyoVa and Mlph. This interaction is independent of and synergistic with the interaction of Mlph with exon F. However, the interaction of the actin-binding domain of Mlph can occur either with exon G or with the actin filament, but not both simultaneously. These data lead to a modified recruitment model where both exon F and exon G enhance the binding of Mlph to auto-inhibited MyoVa, and then via an unidentified switch (PKA?) the actin-binding domain of Mlph dissociates from MyoVa and interacts with the actin filament to enhance MyoVa processivity.

      The only weakness noted is that the authors could have had a more complete story if they pursued whether PKA phosphorylation/dephosphorylation of Mlph is indeed the switch for the actin-binding domain of Mlph to interact with exon G versus the actin filament.

      We thank Reviewer #1 for the careful reading of the manuscript and appreciation of the study. We agree with the Reviewer that it is important to understand how the actin-binding domain of Mlph switches its interaction between exon-G of Myo5a and the actin filament. We would like to pursue this direction in our future research.

      Reviewer #2 (Public Review):

      The authors identify a third component in the interaction between myosin Va and melanophilin- an interaction between a 32-residue sequence encoded by exon-g in myosin Va and melanophilin's actin-binding domain. This interaction has implications for how melanosome motility may be regulated.

      While this work is largely well done and certainly publishable following needed revisions (e.g. some affinity measurements, necessary controls for the dominant negative experiments), I believe that additional work would be required to make a more compelling case. First, the study provides just one more piece to a well-developed story (the role of exon-F and the GTD in myosin Va: melanophilin (Mlph) interaction), much of which was published 20 years ago by several labs. Second, the study does not demonstrate a physiological significance for their findings other than that exon-G plays an auxiliary role in the binding of myosin Va to Mlph. For example, what dictates the choice between Mlph's actin binding domain (ABD) binding to actin or to exon-G? Is it a PTM or local actin concentration? It is unlikely to be alternative splicing as exon-G is present in all spliced isoforms of myosin Va. And what changes regarding melanosome dynamics in cells between these two alternatives? Similarly, the paper does not provide any in vitro evidence that binding to exon-G instead of actin affects the processivity of a Rab27a/Myosin Va/Mlph transport complex. For example, if the ABD sticks to exon-G instead of actin, does that block Mlph's ability to promote processivity through its interaction with the actin filament during transport? In summary, given that the authors did not directly test their model either in vitro or in cells, I do not think this story represents a significant conceptual advance.

      We thank Reviewer #2 for the careful reading of the manuscript and the suggestions for improving it. As suggested by the reviewer, we have measured the affinity between the middle tail domain of Myo5a (Myo5a-MTD) and Mlph-ABD (Kd ~0.562 µM), which is similar to that between the globular tail domain of Myo5a (Myo5a-GTD) and the GTBM of Mlph. In addition, we have performed additional experiments showing the integrity and the expression level of the dominant negative constructs in the transfected cells.

      We believe more extensive experiments are required to address the other questions raised by the reviewer. For example, what dictates the choice between Mlph's actin binding domain (ABD) binding to actin or to exon-G remains an open question. As we proposed, phosphorylation by protein kinase A is only one possible mechanism. We would like to pursue these questions in our future research.

      Recommendations for the authors:

      The reviewing editor feels strongly that addressing some of the points raised by the reviewers would make this a more compelling manuscript. In particular, a measurement of the affinity of the relevant fragments from melanophilin and myosin-5a would indicate that the interaction might be physiologically relevant. Concerning the dominant negative experiments, the lack of effect of an expressed fragment could be that the expressed fragments were simply degraded or expressed at too low of a level to be competing. The reviewer gives guidelines on how to address this. Reviewer #2 made a point that it would be compelling if the effect of phosphorylation as suggested in the model was tested, but we all agree that this could well be the subject of a later study. In addition, the authors make a very interesting proposal for how protein kinase A could be involved in this regulation as has been suggested previously. Perhaps the use of phosphomimetic mutations could give some insight into this. Such experiments, if consistent with the proposed model, would certainly raise the impact of this study. Finally, a very clear periodicity in hydrophobic amino acids is apparent in the interacting sequences of both Myo5 (yrisLykrMidLmeqLekqdktVrkLkkqLkvFakkIgeLevgqmen) and Mlph (tdeeLseMedrVamtAseVqqAeseIsdIesrIaaLra). This strongly suggests a leucine-zipper-like coiled coil, rather than an interaction mediated solely by charge. Recent and easily accessible software such as AlphaFold multimer might yield important structural insight into the binding configuration and might help rationalize the effect of the mutations herein.
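
      The periodicity noted by the editor can be checked with a few lines of code. The sketch below is only an illustration, not part of the authors' analysis: it assumes that the upper-case letters in the two quoted sequences mark the hydrophobic positions, and the function name `hydrophobic_spacings` is hypothetical. In a canonical coiled-coil heptad, hydrophobic residues sit at the a and d positions, so successive gaps of 3 and 4 residues are the expected signature.

```python
def hydrophobic_spacings(seq: str) -> list[int]:
    """Return the gaps between consecutive upper-case (highlighted) residues."""
    positions = [i for i, residue in enumerate(seq) if residue.isupper()]
    return [b - a for a, b in zip(positions, positions[1:])]


# Sequences exactly as quoted in the editor's comment; upper case is assumed
# to mark the hydrophobic positions.
MYO5A = "yrisLykrMidLmeqLekqdktVrkLkkqLkvFakkIgeLevgqmen"
MLPH = "tdeeLseMedrVamtAseVqqAeseIsdIesrIaaLra"

for name, seq in (("Myo5a", MYO5A), ("Mlph", MLPH)):
    gaps = hydrophobic_spacings(seq)
    # Gaps of 3 or 4 residues are what an a/d heptad pattern would produce.
    heptad_like = sum(gap in (3, 4) for gap in gaps)
    print(f"{name}: gaps={gaps}; {heptad_like}/{len(gaps)} gaps are 3 or 4 residues")
```

      A regular 3/4 alternation is consistent with, but does not prove, a coiled coil; dedicated predictors or a structural model are still needed to confirm it.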

      We thank the editors and the reviewers for their suggestions for improving the manuscript. We have performed several essential experiments to address the concerns raised by the reviewers.

      (1) Regarding the affinity of the relevant fragments from melanophilin and myosin-5a: we have measured the affinity between Mlph-ABD and Myo5a-MTD using MST (Kd ~562 nM) (see revised Figure 3A).

      (2) Regarding the concerns about the dominant negative experiments: we have examined the molecular sizes and expression levels of the Mlph and Myo5a constructs by Western blots. First, we show that all constructs have the correct molecular size in transfected cells (see revised Figure 6C and 7D), indicating that the inability of Myo5a or Mlph truncations to generate dilute-like phenotypes was not due to intracellular degradation of the EGFP fusion protein. Second, by correcting for the percentage of transfected cells, we show that the overall expression levels of the wild-type construct and the mutants are roughly equal. Third, we categorized the expression levels into high and low, and calculated the percentage of cells with the DN phenotype at each expression level. The results are consistent with the percentage of the DN phenotype among all cells expressing the EGFP fusion proteins.

      (3) Regarding the suggestion to investigate the effect of phosphorylation by protein kinase A on the interaction of Mlph-ABD with Myo5a and the actin filament: we understand that it is important to elucidate the mechanism by which the actin-binding domain of Mlph switches its interaction between the exon-G of Myo5a and the actin filament. However, as we proposed, phosphorylation by protein kinase A is one possible mechanism, and more extensive experiments are required to address this question. Therefore, we would like to pursue it in our future research.

      (4) Regarding the suggestion to predict the interaction between the exon-G of myosin-5a and Mlph-ABD using AlphaFold: we have used AlphaFold multimer to predict the Myo5a-MTD/Mlph-ABD interaction. Remarkably, AlphaFold predicted that the binding of Myo5a-MTD to Mlph-ABD is mediated by an antiparallel coiled-coil formed by Myo5a (1430-1467) and Mlph (450-481), just as predicted by the editors. This prediction is also consistent with our finding that the exon-G of Myo5a interacts with Mlph-ABD. However, the predicted model cannot explain our mutagenesis results. We will pursue this point in future research. Nevertheless, we are grateful to the editors for bringing this idea to our attention, because it will help us to design experiments to investigate the nature of the Myo5a-exon-G/Mlph-ABD interaction.

      Reviewer #1 (Recommendations For The Authors):

      Specific minor comments

      Q1: In figs 6-7 an overlay between DAPI and EGFP would be helpful for the reader to see perinuclear distribution.

      As suggested, we have added the merged images of DAPI and EGFP in the revised Figures 6 and 7.

      Q2: The delta symbol in the pdf text was corrupted.

      The corrupted delta symbol has been fixed in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Q1: Please explain in detail early in the text what exon-G is - length, position in the tail, and evidence that it is a coiled coil (CC). Of note, is it only long enough for about 4 heptad repeats? Has it been shown biochemically to form a CC? Is the CC irreversible? What would be the consequence of removing the exon-G CC on the ability of surrounding regions to bind Mlph (exon-F and the GTD)?

      We thank the reviewer for this suggestion. In the revision, we added a new paragraph (the first paragraph in the results section) and revised Figure 1A to introduce the middle tail domain and alternatively spliced exons of Myo5a.

      Exon-G is 32 amino acids in length and is located at the C-terminal region of the middle tail domain, immediately before the globular tail domain. The exon-G region was predicted to form a short coiled-coil by online tools (such as paircoil), but this prediction has not been tested biochemically. Moreover, we do not know whether the exon-G coiled-coil is reversible or not.

      We have not examined the effect of removing the whole exon-G on the interaction between the GTD and Mlph-GTBM. The exon-G (residues 1436-1467) and the GTD core (residues 1498-1877) are separated by a long loop of 31 residues. We therefore expect that removing the exon-G will not affect the GTD/Mlph-GTBM interaction.

      Physically, exon-F is immediately followed by exon-G, and those two regions might interfere with each other. In our preliminary study, we found that removing the whole exon-G abolished the interaction between exon-F and Mlph-EFBD. On the other hand, removing the C-terminal half (residues 1454-1467) of exon-G had little effect on the interaction between exon-F and Mlph-EFBD (see Figure 2C). In this work, we intentionally selected the latter construct for functional analysis of the exon-G/Mlph-ABD interaction, because removing the C-terminal half of exon-G abolishes the interaction with Mlph-ABD, but does not affect the exon-F/Mlph-EFBD interaction.

      Q2: Figures 1-3. While the pulldown experiments demonstrating an interaction between Mlph-ABD residues 446-571 and Myo5a-MTD are a good start, one would like to see affinity measurements to gauge the likelihood that this interaction is physiologically relevant. The same goes for the pulldown experiments demonstrating an interaction between (i) the C-terminal half of exon-G (residues 1453-1467) and the Mlph-ABD, (ii) between residues 1411-1467 (a short peptide containing exon-F and exon-G) and the Mlph-ABD, and (iii) between residues 1436-1467 (a short peptide containing exon-G) and the Mlph-ABD. This would also apply to the pulldowns in 3C-3E where versions of the proteins with charge residue changes were tested.

      We agree with the reviewer that determining the affinities between Mlph-ABD and Myo5a-MTD and their variants would be helpful in understanding the physiological relevance of the exon-G/Mlph-ABD interaction. However, the extensive experiments suggested by the reviewer require many high-quality purified proteins, which are not trivial to obtain.

      Nevertheless, we think it is important to know the affinity between Myo5a-MTD and Mlph-ABD (both wild-type), as this parameter can be used to compare the three interactions between Myo5a and Mlph. Therefore, we have measured the affinity between Myo5a-MTD and Mlph-ABD using microscale thermophoresis (MST). The dissociation constant (Kd) of Myo5a-MTD for Mlph-ABD is 0.562±0.169 µM, which is similar to that between Myo5a-GTD and Mlph-GTBM (~1 µM) (Geething & Spudich (2007) JBC 282:21518). Consistent with the GST pulldown results, MST shows that deletion of the C-terminal half of exon-G (1453-1467) greatly decreases the MST signals (see revised Figure 3A).
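
      To give a feel for what these dissociation constants imply, the sketch below evaluates a simple 1:1 binding isotherm, fraction bound = [L] / ([L] + Kd). This is an illustrative assumption, not the fitting procedure used for the MST data: it treats the free ligand concentration as known, and the function name and the example concentrations are hypothetical.

```python
def fraction_bound(free_ligand_um: float, kd_um: float) -> float:
    """Fraction of binding sites occupied under a simple 1:1 binding model.

    Both arguments are in micromolar. Assumes the free ligand concentration
    is known (e.g. ligand in large excess over the labelled partner).
    """
    if free_ligand_um < 0 or kd_um <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_ligand_um / (free_ligand_um + kd_um)


KD_MTD_ABD_UM = 0.562   # Myo5a-MTD / Mlph-ABD, value quoted in the response
KD_GTD_GTBM_UM = 1.0    # Myo5a-GTD / Mlph-GTBM, approximate literature value

# Hypothetical free-ligand concentrations, chosen only for illustration.
for conc_um in (0.1, 0.562, 1.0, 5.0):
    mtd = fraction_bound(conc_um, KD_MTD_ABD_UM)
    gtd = fraction_bound(conc_um, KD_GTD_GTBM_UM)
    print(f"[L] = {conc_um:5.3f} uM: MTD/ABD {mtd:.2f}, GTD/GTBM {gtd:.2f}")
```

      Under this model, half of the sites are occupied when the free ligand concentration equals Kd, which is why two interactions with Kd values within a factor of two of each other are described as similar.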

      Q3: While the dominant negative (DN) approach to testing functional significance is OK, rescuing dilute/myosin Va null melanocytes with full-length myosin Va containing the various deletions would have been more convincing. Also, the authors must show (i) that the DN constructs are the correct size in transfected cells (i.e. are not degraded), and (ii) that they are expressed at roughly equal levels (either by doing Westerns and correcting for the percent of transfected cells, or by measuring total cellular fluorescence in transfected cells). Without this information, it remains possible that constructs not exhibiting a DN effect are simply degraded or poorly expressed. This applies to all the DN data in Figures 6 and 7.

      We agree with the reviewer that Myo5a null melanocytes are ideal for investigating exon-G function. Unfortunately, we do not have Myo5a null melanocytes derived from dilute mice.

      To confirm the integrity of the overexpressed proteins in the transfected cells, we performed Western blots of those proteins, including EGFP-Mlph-RBD (wild-type and two mutants) and Myo5a-Tail (wild-type and G mutant), in the lysate of the transfected cells. The Western blots show that all those proteins have the correct molecular masses, indicating no degradation of the overexpressed proteins (see revised Figure 6C and 7C). Moreover, by correcting for the percentage of transfected cells, we show that the overall expression levels in each transfected cell of the wild-type construct and the mutants are roughly equal. This information is included in the revised manuscript (Lines 222-225; 237-241).

      Q4: The authors scored the DN phenotype as yes/no but it most likely varies depending on the degree of over-expression. Showing that the degree of melanosome centralization scales with the degree of overexpression, and that the correlation between expression level and phenotype varies depending on the construct would strengthen the results.

      We agree with the reviewer’s prediction that the degree of the DN phenotype should depend on the over-expression level. We analyzed the EGFP signals of transfected cells and found very few cells with a medium expression level. Therefore, we simply categorized the expression levels into high and low, and calculated the percentage of cells with the DN phenotype in each category, as shown in the table below. These results are consistent with the expectation that the degree of the DN phenotype depends on the over-expression level of the transfected constructs.

      Author response table 1.

      Percentage of the EGFP-expressing cells with perinuclear aggregation of melanosomes

      Q5: The conclusion from the data in Figure 8A- "the presence of both exon-F and exon-G is insufficient for binding to the Mlph occupied by Myo5a, but sufficient for binding to the unoccupied Mlph"- should be verified by also doing the experiment in myosin Va knockdown cells.

      We agree. Unfortunately, our RNAi knockdown of Myo5a in melanocytes is not ideal and we do not have Myo5a knockout melanocytes. We will pursue this point in the future.

      Q6: Line 213 "three Mlph-binding regions, i.e., exon-F, exon-F, and GTD (Figure 7A)" has a typo.

      This typo has been corrected.

      Q7: The authors should provide high mag insets for the images in Figure 8.

      As suggested, we have revised Figure 8 by including high mag insets for the images.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      In this manuscript by Napoli et al, the authors study the intracellular function of cytosolic S100A8/A9, a myeloid cell soluble protein that operates extracellularly as an alarmin and whose intracellular function is not well characterized. Here, the authors utilize state-of-the-art intravital microscopy to demonstrate that adhesion defects observed in cells lacking S100A8/A9 (Mrp14-/-) are not rescued by exogenous S100A8/A9, thus highlighting an intrinsic defect. Based on this result, subsequent efforts were employed to characterize the nature of those adhesion defects.

      The authors thank reviewer #1 for his/her insightful comments and suggestions. Please find our point-by-point responses below.

      (1) Ex vivo characterization of the function of S100A8/A9 in adhesion, spreading, and calcium signaling requires at least one rescue experiment to support the direct role of these proteins in the biological processes under study.

      We thank the reviewer for this comment. We agree that rescue experiments would be helpful to confirm the direct role of intracellular S100A8/A9 in adhesion, spreading, and Ca2+ signaling. Although transfection of primary cells, especially neutrophils, poses challenges due to their short half-life, we now have undertaken additional in vitro rescue experiments. Specifically, we used extracellular S100A8/A9 and coated Ibidi flow chambers with E-selectin, ICAM-1 and CXCL1 alone or alongside S100A8/A9, and measured rolling and adhesion of blood neutrophils. Our data reveal that extracellular S100A8/A9 can induce increased adhesion in WT neutrophils but fails to rescue the adhesion defect in Mrp14-/- neutrophils (Author response image 1). This result corroborates our in vivo findings, emphasizing that the observed adhesion defect is due to the lack of intracellular S100A8/A9.

      Author response image 1.

      Extracellular S100A8/A9 does not rescue the adhesion defect in Mrp14-/- neutrophils. Analysis of the number of adherent leukocytes FOV⁻¹ normalized to the WBC of WT and Mrp14-/- mice. Whole blood was harvested through a carotid artery catheter and perfused with a high precision pump at constant shear rate using flow chambers coated with either E-selectin, ICAM-1 and CXCL1 or E-selectin, ICAM-1, CXCL1 and S100A8/A9. [mean+SEM, n=5 mice per group, 12 (WT) and 14 (Mrp14-/-) flow chambers, 2way ANOVA, Sidak’s multiple comparison]. ns, not significant; *p≤0.05, **p≤0.01, ***p≤0.001.

      (2) There is room for improvement in the analysis of signaling pathways presented in Figures 3 H and I. Western blots and analyses are not convincing, in particular for p-Pax.

      We acknowledge the reviewer's concern regarding the clarity of the signaling pathway analysis, particularly the western blots for p-Paxillin. To address this, we have repeated the western blot experiments using murine neutrophils. Our new data confirm the defective paxillin phosphorylation upon CXCL1 stimulation and ICAM-1 binding in the absence of cytosolic S100A8/A9. We have now integrated these new findings with the original data and included the updated results in the manuscript (Figure 3I revised). These enhanced analyses provide a more robust and convincing demonstration of the signaling defects in Mrp14-/- neutrophils.

      (3) At least one western blot showing a knockdown of S100A8/A9 should be included towards the beginning of the result section.

      We appreciate the reviewer's suggestion to include a western blot demonstrating the knockout of S100A8/A9 early in the results section. In a recent publication by our group, we have already demonstrated the absence of S100A8/A9 at the protein level in Mrp14-/- neutrophils via western blotting ([1], please refer to Extended Data Fig. 1h). We agree that visual confirmation of the absence of S100A8/A9 protein is crucial for establishing the validity of our study.

      (4) The Ca2+ measurements at LFA-1 nanoclusters using the Mrp14-/- Lyz2xGCaMP5 are interesting. It is understood that the authors are correcting calcium levels by normalizing by LFA-1 cluster areas and that seems fine to me. The issue is that the total calcium signal seems decreased in Mrp14-/- cells compared to WT cells (Fig. 4E). Why is total Ca2+ low? Please discuss.

      We thank the reviewer for this insightful comment. Indeed, our observations reveal reduced overall Ca2+ levels in Mrp14-/- neutrophils compared to WT neutrophils. Initially, we noticed a general decrease in Ca2+ intensity (Author response image 2A-B) and lifetime in Mrp14-/- neutrophils (Author response image 2C-D). Further analysis indicated that these differences in Ca2+ levels are localized specifically to the LFA-1 nanocluster sites. In contrast, the cytosolic Ca2+ levels outside of the LFA-1 nanocluster areas were comparable between Mrp14-/- and WT neutrophils (Figure 4H-J). This suggests that the reduced total Ca2+ levels observed in Mrp14-/- neutrophils are primarily due to the impaired Ca2+ supply at the LFA-1 nanocluster areas. Our data support the notion that cytosolic S100A8/A9 plays a crucial role in actively supplying Ca2+ to LFA-1 nanoclusters during neutrophil crawling. In the absence of S100A8/A9, the increase in overall Ca2+ levels (summing both inside and outside LFA-1 nanocluster areas) is minimal, further highlighting the specific role of S100A8/A9 in maintaining localized Ca2+ concentrations at these crucial sites.

      Author response image 2.

      Overall Ca2+ levels in WT and Mrp14-/- neutrophils. (A) Representative confocal images of neutrophils from WT Lyz2xGCaMP5 and Mrp14-/- Lyz2xGCaMP5 mice, labeled with Lyz2 td Tomato marker. The images illustrate overall cytosolic Ca2+ levels during neutrophil crawling in flow chambers coated with E-selectin, ICAM-1, and CXCL1 (scale bar=10μm). (B) Quantitative analysis of total cytosolic Ca2+ intensity in single cells from WT Lyz2xGCaMP5 and Mrp14-/- Lyz2xGCaMP5 neutrophils measured over three time intervals: min 0-1, 5-6 and 9-10 [mean+SEM, n=5 mice per group, 56 (WT) and 54 (Mrp14-/-) neutrophils, 2way ANOVA, Sidak’s multiple comparison]. (C) Representative traces and (D) single cell analysis of total Ca2+ lifetime over the first 5 minutes in WT Lyz2xGCaMP5 and Mrp14-/- Lyz2xGCaMP5 neutrophils crawling on E-selectin, ICAM-1, and CXCL1 coated flow chambers recorded with FLIM microscopy [mean+SEM, n=3 mice per group, 111 (WT) and 95 (Mrp14-/-) neutrophils, 2way ANOVA, Sidak’s multiple comparison]. ns, not significant; *p≤0.05, **p≤0.01, ***p≤0.001.

      (5) Even if the calcium level outside LFA-1 nanoclusters is not significant (Figure 4J), the data at min 9-10 in Figure 4J seems to be affected by a single event that may be an outlier. Additional data may be needed here.

      We appreciate the reviewer’s attention to this detail. To address the concern regarding a potential outlier in the Ca2+ level measurements at 9-10 minutes in Figure 4J, we rigorously tested the dataset using the GraphPad outlier calculator. The analysis revealed that no data point was statistically identified as an outlier. Given that the current dataset is robust and the statistical analysis confirms the integrity of the data, we believe that the results accurately reflect the biological variability observed in our experiments. Therefore, we have not added additional data points at this stage but remain open to discussing this further.

      (6) Finally, even though there is less calcium at LFA-1 clusters, that does not necessarily mean that "cytosolic S100A8/A9 plays an important role in Ca2+ "supply" at LFA-1 adhesion spots" as proposed. S100A8/A9 may play an indirect role in calcium availability. The analysis of the subcellular localization of S100A8/A9 at LFA-1 clusters together with calcium dynamics in stimulated WT cells would help support the authors' interpretation, which although possibly correct, seems speculative at this point.

      We thank the reviewer for this insightful comment and fully agree that additional evidence regarding the subcellular localization of S100A8/A9 would strengthen our conclusions. Although live cell imaging of intracellular S100A8/A9 was initially challenging due to technical limitations, we have now performed additional experiments to address this issue. We conducted end-point measurements where we allowed WT neutrophils to crawl on E-selectin, ICAM-1, and CXCL1 coated flow chambers for 10 minutes. Following this, we fixed and permeabilized the cells to stain intracellular S100A9, along with LFA-1 and a cell tracker for segmentation. Confocal microscopy and subsequent single-cell analysis revealed a significant enrichment of S100A8/A9 at LFA-1 positive nanocluster areas compared to the surrounding cytosol (Figure 4K and 4L, new). This finding supports our hypothesis that S100A8/A9 plays a direct role in the localized supply of Ca2+ at LFA-1 adhesion spots, thus facilitating efficient neutrophil crawling under shear stress. These new data have been included in the revised manuscript, providing stronger evidence for our proposed mechanism.

      Reviewer #2:

      Napoli et al. provide a compelling study showing the importance of cytosolic S100A8/9 in maintaining calcium levels at LFA-1 nanoclusters at the cell membrane, thus allowing the successful crawling and adherence of neutrophils under shear stress. The authors show that cytosolic S100A8/9 is responsible for retaining stable and high concentrations of calcium specifically at LFA-1 nanoclusters upon binding to ICAM-1, and imply that this process aids in facilitating actin polymerisation involved in cell shape and adherence. The authors show early on that S100A8/9 deficient neutrophils fail to extravasate successfully into the tissue, thus suggesting that targeting cytosolic S100A8/9 could be useful in settings of autoimmunity/acute inflammation where neutrophil-induced collateral damage is unwanted.

      The authors appreciate reviewer #2's insightful comments and suggestions. Below are our detailed responses:

      (1) Extravasation is shown to be a major defect of Mrp14-/- neutrophils, but the Giemsa staining in Figure 1H seems to be quite unspecific to me, as neutrophils were determined by nuclear shape and granularity. It would have perhaps been more clear to use immunofluorescence staining for neutrophils instead as seen in Supplementary Figure 1A (staining for Ly6G or other markers instead of S100A9).

      We acknowledge the reviewer's concern. However, Giemsa staining is a well-established method in hematology, histology, cytology, and bacteriology, widely recognized for its ability to distinguish leukocyte subsets based on nuclear shape and cytoplasmic characteristics. This method is extensively documented in the literature [2-5]. Its advantages are the easy morphological discrimination of leukocytes based on nuclear and cytoplasmic shape and conformation (Author response image 3).

      Author response image 3.

      Giemsa staining of extravasated leukocyte subsets. (A) Representative image of Giemsa-stained cremaster muscle tissue post-TNF stimulation. The image clearly differentiates leukocyte subsets (white arrow = neutrophils, yellow arrow = eosinophils, red arrow = monocytes). Scale bar = 50µm.

      (2) The representative image for Mrp14-/- neutrophils used in Figure 4K to demonstrate Ripley's K function seems to be very different from that shown above in Figures 4C and 4F.

      The reviewer correctly observed that the cell in Figure 4K is different from those in Figures 4C and 4F. This is intentional, as Figure 4K is meant to show a representative image that accurately reflects the overall results of the experiments. We assure the reviewer that all cells analyzed in Figures 4C and 4F were also included in the analysis for Figure 4K.

      (3) Although the authors have done well to draw a path linking cytosolic S100A8/9 to actin polymerisation and subsequently the arrest and adherence of neutrophils in vitro, the authors can be more explicit with the analysis - for example, is the F-actin co-localized with the LFA-1 nanoclusters? Does S100A8/9 localise to the membrane with LFA-1 upon stimulation? Lastly, I think it would have been very useful to close the loop on the extravasation observation with some in vitro evidence to show that neutrophils fail to extravasate under shear stress.

      We thank the reviewer for this comment and questions. 

      Concerning the co-localization of F-actin with LFA-1 nanoclusters and S100A8/9 localization: We appreciate the reviewer's interest in the co-localization between F-actin and LFA-1. Unfortunately, due to the limitations of our GCaMP5 mouse model (with neutrophils labeled with td-Tomato and eGFP for LyzM and Ca2+), we could only stain for either LFA-1 or F-actin at a time. However, in our F-actin movies, we observed that F-actin predominantly localizes at the rear of the cell, while LFA-1 is more uniformly distributed at the plasma membrane.

      Regarding S100A8/A9 localization, as mentioned in response to Reviewer 1's sixth point, we have now conducted endpoint measurements. We stained neutrophils with cell tracker green CMFDA and LFA-1, allowed them to crawl on E-selectin, ICAM-1, and CXCL1-coated flow chambers, and then performed intracellular S100A9 staining after fixation and permeabilization. Our analysis shows higher S100A9 intensity at LFA-1 positive areas compared to LFA-1 negative areas (Figure 4K and 4L, new). This indicates that S100A8/A9 indeed concentrates Ca2+ at LFA-1 nanoclusters, supporting adhesion and post-arrest modification events under flow.

      Regarding the extravasation defect under shear stress: To address the reviewer's suggestion, we performed transwell migration assays under static conditions. Our results show no significant difference in transmigration between WT and Mrp14-/- neutrophils without flow, indicating that the extravasation defect in Mrp14-/- neutrophils is shear-dependent. This supports our hypothesis that S100A8/A9-mediated Ca2+ supply at LFA-1 nanoclusters is critical under flow conditions (Author response image 4).

      Author response image 4.

      Static transmigration assay. (a) Transmigration of WT and Mrp14-/- neutrophils in static transwell assays (3 µm pore size, 45 min migration time) showing spontaneous migration (PBS) or migration towards CXCL1. [mean+SEM, n=3 mice per group, 2way ANOVA, Sidak’s multiple comparison]. ns, not significant; *p≤0.05, **p≤0.01, ***p≤0.001.

      Additional References

      (1) Pruenster, M., et al., E-selectin-mediated rapid NLRP3 inflammasome activation regulates S100A8/S100A9 release from neutrophils via transient gasdermin D pore formation. Nature Immunology, 2023. 24(12): p. 2021-2031.

      (2) Kuwano, Y., et al., Rolling on E- or P-selectin induces the extended but not high-affinity conformation of LFA-1 in neutrophils. Blood, 2010. 116(4): p. 617-24.

      (3) Porse, B., Mouse Hematology – A Laboratory Manual. European Journal of Haematology, 2010. 84(6): p. 554-554.

      (4) Frommhold, D., et al., Protein C concentrate controls leukocyte recruitment during inflammation and improves survival during endotoxemia after efficient in vivo activation. Am J Pathol, 2011. 179(5): p. 2637-50.

      (5) Braach, N., et al., RAGE Controls Activation and Anti-Inflammatory Signalling of Protein C. PLOS ONE, 2014. 9(2): p. e89422.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for your consideration and insightful comments on our article.

      We have gone through all the reviewers' comments and addressed all their questions and concerns point by point.

      As per their recommendation, we have amended our manuscript by providing more information about the experimental procedure and statistical analysis followed, and removed some analyses with a reduced number of imaging sessions. In addition, as a Resource and Tools article, the claim of our paper has been adjusted to a proof-of-concept paper showing robust and reliable preliminary results. In the meantime, we have provided 3 new Supplementary Figures, including one showing data from all individual animals.

      Reviewer #1 (Public Review):

      The authors apply a new approach to monitor brain-wide changes in sensory-evoked hemodynamic activity after focal stroke in fully conscious rats. Using functional ultrasound (fUS), they report immediate and lasting (up to 5 days) depression of sensory-evoked responses in somatosensory thalamic and cortical regions.

      Strengths: This is a technically challenging and proof-of-concept study that employs new methods to study brain-wide changes in sensory-evoked neural activity, inferred from changes in cerebral blood flow. Despite the minor typos/grammatical errors and small sample size, the authors provide compelling images and rigorous analysis to support their conclusions. Overall, this was a very technically difficult study that was well executed. I believe that it will pave the way for more extensive studies using this methodological approach. Therefore I support this study and my recommendations to improve it are relatively minor in nature and should be simple for the authors to address.

      Weaknesses: The primary weakness of this paper is the small sample sizes. Drawing conclusions based on the small sham control group (n=2) or 5-day stroke recovery group (n=2), is rather tenuous. One way to alleviate some uncertainty with regard to the conclusions would be to state in the discussion that the findings (ie. loss of thalamocortical function after stroke) are perfectly consistent with previous studies that examined thalamocortical function after stroke. The authors missed some of these supporting studies in their reference list (see PMID: 28643802, 1400649). A second issue that can easily be resolved is their analysis of the 69 brain regions. This seems like a very important part of the study and one of the primary advantages of employing efUS. As presented, I had difficulty seeing the data. I think it would be worthwhile to expand Fig 3 (especially 3C) into a full-page figure with an accompanying table in the Supplementary info section describing the % change in CBF for each brain region.

      Other Recommendations for the authors:

      • Since there is variability in spreading depolarizations, was there any trend in the relationship between # SD's and ischemic volume? I know there are few data points but a scatterplot might be of interest.

      • For statistical comparisons of 'response curves' in Fig 3 and 4, what exactly was the primary dependent measure: changes in peak amplitude (%) or area under the curve?

      • There are several typos and minor grammatical errors in the manuscript. Some editing is recommended.

      We thank the reviewer for the comments and suggestions. We have adapted our message to that of a proof-of-concept paper showing robust and reliable preliminary results. We also thank the reviewer for pointing out important references that support our observation and have added them to our article. We have provided a supplementary full-page version of the current Figure 3C (see Supplementary Figure 3).

      Regarding the recommendations, we strongly agree that it would be of interest to link SDs and ischaemia, but unfortunately this can't be done because our experimental design, i.e. narrow cranial window and single static plane, does not allow brain-wide quantification of ischemic volume. This would be possible either by scanning the brain or by using a matrix array (also discussed in the manuscript).

      For statistical analysis of the hemodynamic response curves, we have adapted them to compare the area under the curve (AUC). In addition, we have provided a new Supplementary Figure 4 showing the associated values and statistics.

      We have edited typos and errors.

      Reviewer #2 (Public Review):

      Brunner et al. present a new and promising application of functional ultrasound (fUS) imaging to follow the evolution of perfusion and haemodynamics upon thrombotic stroke in awake rats. The authors leveraged a chemically induced occlusion of the rat Middle Cerebral Artery (MCA) with ferric chloride in awake rats, while imaging cerebral perfusion with fUS at high spatial and temporal resolution (100µm x 110µm x 300µm x 0.8s). The authors also measured the evoked haemodynamic response at different timepoints following whisker stimulation.

      As the fUS setup of the authors is limited to 2D imaging, Brunner and colleagues focused on a single coronal slice where they identified the primary Somatosensory Barrel Field of the Cortex (S1BF), directly perfused by the MCA and relay nuclei of the Thalamus: the Posterior (Po) and the Ventroposterior Medial (VPM) nuclei of the Thalamus. All these regions are involved in the sensory processing of whisker stimulation. By investigating these regions the authors present the hyper-acute effect of the stroke with these main results:

      • MCA occlusion results in a fast and important loss of perfusion in the ipsilesional cortex.

      • Thrombolysis is followed by Spreading Depolarisation measured in the Retrosplenial cortex.

      • Stroke-induced hypo-perfusion is associated with a significant drop in ipsilesional cortical response to whisker stimulation, and a milder one in ipsilesional subcortical relays.

      • Contralesional hemisphere is almost not affected by stroke with the exception of the cortex which presents a mildly reduced response to the stimulation.

      In addition, the authors demonstrate that their protocol allows stroke evolution to be followed for up to five days post-induction. They further show that fUS can estimate the size of the infarcted volume with brightness mode (B-mode), confirming the presence of the identified lesional tissue with post-mortem cresyl violet staining.

      Upon measuring functional response to whisker stimulation 5 days after stroke induction, the authors report that:

      • The ipsilesional cortex presents no response to the stimulation

      • The ipsilesional thalamic relays are less activated than hyper acutely

      • The contralesional cortex and subcortical regions are also less activated 5d after the stroke.

      These observations mainly validate the new method as a way to chronically image the longitudinal sequelae of stroke in awake animals. However, the potentially more intriguing results the authors describe in terms of reorganization of functional activity following stroke appear to be preliminary and underpowered (N = 5 animals were imaged to describe the hyper-acute session, and N = 2 in a five-day follow-up). While highly preliminary, the research model proposed by the authors (where the loss of the infarcted cortex induces reduced activity in connected regions, whether by cortico-thalamic or cortico-cortical loss of excitatory drive) is interesting. This hypothesis would require a greatly expanded, sufficiently powered study to be validated (or disproven).

      We thank the reviewer for the careful and accurate description of our work. We have addressed all the comments, recommendations and concerns raised by providing details of the experimental procedure and statistical analysis followed, and by removing some analyses associated with a reduced number of imaging sessions (at d5, n=2).

      Reviewer #3 (Public Review):

      The authors set out to demonstrate the utility of functional ultrasound for evaluating changes in brain hemodynamics elicited acutely and subacutely by the middle cerebral artery occlusion model of ischemic stroke in awake rats.

      Functional ultrasound affords a distinct set of tradeoffs relative to competing imaging modalities. Acclimatization of rats for awake imaging has proven difficult with most, and the high quality of presented data in awake rats is a major achievement. The major weakness of the approach is in its being restricted to single-slice acquisitions, which also complicates the registration of acquisition across multiple imaging sessions within the same animal. Establishing that awake imaging represents an advancement in relation to studies under anesthesia hinges upon the establishment of the level of stress experienced by the animals in the course of imaging, i.e., requires providing data on the assessment of stress over the course of these long imaging sessions. This is particularly significant given how significant a stressor physical restraint has been established to be in rodent models of stress. Furthermore, assessment of the robustness of these measurements is of particular significance for supporting the wide applicability of this approach to preclinical studies of brain injury: the individual animal data (effect sizes, activation areas, kinetics) should thus be displayed and the statistical analysis expanded. Both within-subject, within/across sessions, and across-subjects variability should be evaluated. Thoughtful comments on the relationship between power doppler signal and cerebral blood volume are important to include and facilitate comparisons to studies recording other blood volume-weighted signals. Finally, the contextualization of the observations with respect to other studies examining acute and subacute changes in brain hemodynamics post focal ischemic stroke in rats is needed. It is also quite helpful, for establishing the robustness of the approach, when the statistical parametric maps are shown in full (i.e. unmasked).

      We would like to thank the reviewer for the comments, recommendations and concerns he/she/they raised. We have addressed all the points to clarify our article and make it more relevant and informative for readers.

      Reviewer #2 (Recommendations For The Authors):

      The work described by Brunner et al is primarily a methodological paper, with potentially interesting, yet not robust enough, novel biological insight into the mechanisms of stroke. Nonetheless, the method employed is interesting and potentially well-validated.

      General comments/suggestions

      1- One potential concern I have is related to the relatively low sample size used, with n=5 for the main results and only n=2 for the follow-up after 5d. I am not sure much can be generalized using only two animals in any research study and this N = 2 dataset should probably be removed entirely from the study. Moreover, I found the statistical methods used were only superficially described, which prevented me from assessing whether the results reported by the authors are biologically relevant or not (including some significant differences in rCBV well below 1% estimated over two individuals).

      We fully agree with the reviewer’s comment and have balanced our claims by presenting this work as a proof-of-concept of brain imaging of multiple aspects of stroke hemodynamics (ischemia, spreading depolarization-like events, cortico-thalamic functions) in awake head-fixed rats. We therefore attenuated our message throughout the manuscript to prevent misunderstanding and overstatement (e.g., Lines 356, 441, 455). We also removed statistics from the analysis at d5 post-stroke; see Figure 4 and the associated paragraph from Line 356.

      2- Based on their investigations, the authors propose a model where the loss of infarcted cortex induces reduced activity in connected regions, whether by cortico-thalamic or cortico-cortical loss of excitatory drive. This is an intriguing framework but this hypothesis would require a more complete, well-powered study to be substantiated.

      I think a clear recognition of the fact that these findings are just preliminary and not validated should be more explicitly reported. I also marginally note here that these results are in contrast with previous reports from the same team where occlusion of the MCA induced increased response to whisker stimulation in anaesthetised rats. These contradictory findings are not discussed in this manuscript.

      As mentioned above, we now state more explicitly that this work is a proof-of-concept and that the findings it describes are preliminary. We attenuated our message throughout the manuscript to prevent misunderstanding and overstatement (e.g., Lines 348, 433, 447). We also removed statistics from the analysis at d5 post-stroke; see Figure 4 and the associated paragraph from Line 348.

      We thank the reviewer for pointing out the missing link with our previous work performed under anesthesia. We have therefore added a discussion point on this contradictory finding (Line 441).

      3- In a previous study from the same group perfusion was imaged in 3D either by means of a motorized probe or by using a 2D matrix arrays. It would be interesting to discuss why a 2D approach was chosen in this study over those previous methods.

      Indeed, brain-wide coverage would be of great interest in such an experimental context. As mentioned by the reviewer, two strategies can be used:

      • One can scan the brain using a motorized probe as performed for different purposes by Sieu et al., Nature Methods, 2015; Hingot, Brodin et al., Theranostics 2020; Macé et al., Neuron 2019 and also by our group in Sans-Dublanc, Chrzanowska et al., Neuron, 2022; Brunner et al. Frontiers in Neuroscience 2022 and Brunner et al., JCBFM 2023. (This list of publications is not exhaustive.)

      • A second approach aims at using a 2D matrix array to capture functions at brain-wide scale. So far, this strategy has been employed in a couple of studies (Rabut et al., Nature Methods, 2019 and Brunner, Grillet et al., Neuron, 2020).

      The strategy of scanning (manually or using a motor) strongly limits the investigation of brain functions, as accurate coverage of the functional regions requires extensive and time-consuming scanning: brain functions must be probed several times to capture a reliable and robust signal for every brain section scanned (see Brunner et al., 2022). Unfortunately, this strategy prevents us from accurately capturing other brain hemodynamics such as the dynamics of the ischemia or the spreading depolarization events.

      On the other hand, volumetric functional ultrasound imaging (vfUSI) would be suited for brain-wide coverage, capturing large-scale brain functions (see Brunner, Grillet et al. Neuron 2020) and hemodynamic events (see Rabut et al., Nature Methods, 2019), but at the cost of lower resolution and frame rate and a larger cranial window. Unfortunately, this technology was not available when this work was conducted.

      Such experimental opportunities have been suggested at the end of the manuscript: “To overcome such limitation, one can extend the size of the cranial window to allow for larger scale imaging either by sequentially scanning the brain27,28,31,32,59,69,71,72, or by using the recently developed volumetric fUS which provides whole-brain imaging capabilities in anesthetized73 and awake rats30.“

      4- Overall the registration scheme seems suboptimal which ultimately questions the specificity of the findings in thalamic regions. It would be interesting to validate this procedure, especially the probe repositioning five days after the stroke.

      Positioning was not a difficult part of this experiment. First, all head posts were implanted in the same position relative to the skull references bregma and lambda. Second, the head fixation ensures the same placement of the headpost for all animals. Finally, fine adjustments of the ultrasound probe position were made using a micromanipulator by finding key landmarks in the µDoppler image. In practice, minimal adjustments were needed to recover the same imaging plane. We provide additional information about the positioning in the Materials and Methods section.

      New text – Line 126: “Positioning.

      The mechanical fixation of the head-post ensures an easy and repeatable positioning of the ultrasound probe across imaging sessions. The ultrasound probe is indeed fixed to a micromanipulator enabling light adjustments. To find the plane of interest (containing both S1BF and thalamic relays: bregma - 3.4mm), we used brain landmarks (e.g., surface of the brain, hippocampus, superior sagittal sinus, large vessels). Note that as the headpost was carefully placed in the same position relative to the skull landmarks (bregma and lambda), the variation in the position of the region of interest was minimal across animals.”

      Second, at d5 post-stroke, we positioned the ultrasound probe over the imaging window as described in the Materials and Methods section and used brain landmarks from the baseline/post-stroke images to optimize the positioning of the brain image. We now describe the procedure in more detail.

      Original text: “First, we used the vascular markers and the shape of the hippocampus31,32 to find back the coronal cross-section imaged during the pre-stroke session. Five days after the MCA occlusion,….”

      New text – Line 360 :“Five days after the MCA occlusion, we first placed the ultrasound probe over the imaging window and adjusted its position (using micromanipulator) to find back the recording plane from Pre-Stroke session using Bmode (morphological mode) and µDoppler imaging using brain vascular landmarks (i.e., vascular patterns, brain surface and hippocampus34,35; see Figure 2B).”

      More detailed questions/comments/suggestions

      Methods

      ARRIVE methodology

      • Point 2b: sample size is not adequately explained, especially the use of n = 2 animals for 5d follow up

      We have clarified the sample size by adding a short paragraph at the beginning of the Results section. We also made Supplementary Table 1 more accurate. New text – Line 239: “Animals

      Report on animal use, experimentation, exclusion criteria can be found in Supplementary Table 1. Rat#1 was excluded after the control session as the imaging window was too anterior to capture both cortical and thalamic responses. Rat#2 was excluded as hemodynamic responses were inconsistent during baseline (pre-stroke) period. Rat#3 showed early post-stroke reperfusion and was excluded from stroke analysis, the control session (pre-stroke) from Rat#3 was analyzed.”

      • Point 7: statistical methods: The quantification used to assess significant differences in stimulation traces is poorly described.

      We have amended the Materials and Methods section about statistics and provided Supplementary Figure 4.

      New text – Line 221: “Activated brain regions were detected from hemodynamic response time-courses using GLM followed by t-test across animals as proposed in Brunner, Grillet et al.,34. The area under the curve (AUC) from hemodynamic response time-courses was computed for individual trials in S1BF, VPM and Po regions, for all the periods of the recording and for all rats included in this work. AUC were compared and analysed using a non-parametric Kruskal-Wallis test corrected for multiple comparison using a Dunn’s test. Tests were performed using GraphPad Prism 10.0.1. “
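
      As an illustrative sketch only (the authors report running their tests in GraphPad Prism, not in code), the two computational steps named in this paragraph, the per-trial area under the curve and the Kruskal-Wallis statistic, could be written as below. The function names, the trapezoidal rule, and the 0.8-s frame spacing used as the default are assumptions made for illustration; the Dunn's post-hoc correction is not implemented here.

```python
# Hypothetical helper names; pure-Python sketch, not the authors' pipeline.

def auc_trapezoid(values, dt=0.8):
    """Area under a sampled time-course (trapezoidal rule, spacing dt in s)."""
    return sum((a + b) / 2.0 * dt for a, b in zip(values, values[1:]))


def midranks(pooled):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(pooled)), key=lambda i: pooled[i])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, pos = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[pos:pos + len(g)])
        total += rank_sum ** 2 / len(g)
        pos += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    correction = 1.0 - ties / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


# Toy numbers, not data from the study.
demo_auc = auc_trapezoid([0.0, 10.0, 0.0])
demo_h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

      In this sketch each group would hold the per-trial AUC values of one recording period for a given region; H is then compared against a chi-square distribution with (number of groups − 1) degrees of freedom.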

      Functional Ultrasound Imaging acquisition

      • References 26 and 28 imply 2.5Hz and 2Hz acquisition rates, respectively. Why does the same method result in a 1.25Hz acquisition rate here? Can you confirm the same spatial resolution in these conditions?

      The spatial resolution is independent of the temporal resolution (frame rate). The spatial resolution depends on the resolution of the compound image, and the temporal resolution is given by the number of compound images used to generate a single Doppler image (exposure time). Increasing the number of compound images decreases the frame rate while increasing the signal-to-noise ratio and sensitivity. In some work, a pause between two frames is used, mostly due to technical limitations in the software (processing time, or execution of a real-time display/processing by the user); however, this reduces the frame rate.

      Author response table 1.

      Comparing with the sequences used in references 26 and 28, we have the following timing parameters

      In this work, we decided to reduce the frame rate to have fewer images but with higher SNR. The 0.3s pause was added for technical reasons in this specific implementation.

      New text – Line 158:“ To obtain a single vascular image we acquired a set of 250 compound images in 0.5s, an extra 0.3s pause is included between each image to have some processing time to display the images for real-time monitoring of the experiment. “
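
      The timing arithmetic implied by this sequence can be checked with a short sketch. The numbers are those quoted above; the constant and function names are hypothetical and this is not the authors' acquisition code.

```python
# Values quoted in the response; names are illustrative only.
N_COMPOUND = 250   # compound images per vascular (Doppler) image
ACQ_TIME_S = 0.5   # time spent acquiring those compound images
PAUSE_S = 0.3      # pause for processing and real-time display


def frame_period(acq_time_s, pause_s):
    """Time between the starts of two consecutive Doppler images, in s."""
    return acq_time_s + pause_s


def frame_rate(acq_time_s, pause_s):
    """Doppler images per second."""
    return 1.0 / frame_period(acq_time_s, pause_s)


period_s = frame_period(ACQ_TIME_S, PAUSE_S)    # about 0.8 s per frame
rate_hz = frame_rate(ACQ_TIME_S, PAUSE_S)       # about 1.25 Hz
compound_rate_hz = N_COMPOUND / ACQ_TIME_S      # 500 compound images per s
frames_per_40s = round(40.0 / period_s)         # frames in a 40-s window
```

      Without the pause the same acquisition would run at 2 Hz, which shows how the 0.3s pause alone accounts for the drop to 1.25 Hz.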

      Activity Maps

      • How is the use of a 40s window motivated?

      The 40s window was chosen to better compare hemodynamic responses to either left or right whisker stimulation, and we centered the period of interest on the start of the stimulation. Original text:” Pre- and post-stroke recordings are reshaped in shorter 40-s sessions, i.e., 50 frames, …”

      New text – Line 206:“ Pre- and post-stroke recordings are reshaped in 40-s sessions, i.e., 50 frames, centered on the start of the stimulation (at 20s), …”
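
      A minimal sketch of this reshaping step, assuming a one-dimensional signal sampled every 0.8 s so that a 40-s session is 50 frames with the stimulation onset at frame index 25 (20 s). The function names are hypothetical, as is the choice to express each session as percent change from its pre-stimulation mean.

```python
def extract_epochs(signal, onset_frames, n_frames=50, pre_frames=25):
    """Cut fixed-length epochs so each stimulation onset sits at pre_frames.

    Epochs that would run past either end of the recording are skipped.
    """
    epochs = []
    for onset in onset_frames:
        start = onset - pre_frames
        stop = start + n_frames
        if start < 0 or stop > len(signal):
            continue
        epochs.append(signal[start:stop])
    return epochs


def percent_change(epoch, pre_frames=25):
    """Express an epoch as % change from its pre-stimulation mean."""
    baseline = sum(epoch[:pre_frames]) / pre_frames
    return [100.0 * (v - baseline) / baseline for v in epoch]


# Toy recording: flat signal with a short increase starting at frame 100.
toy_signal = [100.0] * 200
for i in range(100, 106):
    toy_signal[i] = 120.0

toy_epochs = extract_epochs(toy_signal, onset_frames=[10, 100, 190])
toy_rcbv = percent_change(toy_epochs[0])
```

      Only the onset at frame 100 yields a complete epoch in this toy case; the onsets at frames 10 and 190 are too close to the edges of the recording.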

      • I think the manuscript would benefit from the use of an established, event-based GLM for activity mapping.

      We thank the reviewer for this suggestion. Here we used a z-score for activity mapping, which is largely established in the neuroimaging realm.

      • The statistical thresholds used should account for multiple comparisons.

      We have amended the Materials and Methods section, and figure captions about statistics and provided Supplementary Figure 4.

      Statistical analyses

      • Overall this section is only superficially described, and lacks detailed information.

      We have amended the Materials and Methods section about statistics and provided Supplementary Figure 4.

      New text – Line 221 : “Activated brain regions were detected from hemodynamic response time-courses using GLM followed by t-test across animals as proposed in Brunner, Grillet et al.,34. The area under the curve (AUC) from hemodynamic response time-courses was computed for individual trials in S1BF, VPM and Po regions, for all the periods of the recording and for all rats included in this work. AUC were compared and analysed using a non-parametric Kruskal-Wallis test corrected for multiple comparison using a Dunn’s test. Tests were performed using GraphPad Prism 10.0.1. “

      • Are average rCBV changes referred to in the 40s window?

      The rCBV changes are referring to the pre-stimulation baseline. We have modified the text accordingly (Line 206).

      • Were normality and variance equality requirements verified in the group with n=2?

      Based on the reviewers’ comments on the limited number of recordings at 5d, we have decided to remove this statistical analysis. The manuscript, figure and caption were corrected accordingly.

      • There is no method for cresyl violet staining

      We thank the reviewer for highlighting this omission. We have provided a paragraph in the Materials & Methods section detailing the histology procedure – Line 228:

      “Histopathology

      Rats were killed 24hrs after the occlusion for histological analysis of the infarcted tissue. Rats received a lethal injection of pentobarbital (100mg/kg i.p. Dolethal, Vetoquinol, France). Using a peristaltic pump, they were transcardially perfused with phosphate-buffered saline followed by 4% paraformaldehyde (Sigma-Aldrich, USA). Brains were collected and post-fixed overnight. 50-μm thick coronal brain sections across the MCA territory were sliced on a vibratome (VT1000S, Leica Microsystems, Germany) and analyzed using the cresyl violet (Electron Microscopy Sciences, USA) staining procedure (see Open Lab Book for procedure). Slices were mounted with DPX mounting medium (Sigma-Aldrich, USA) and scanned using a bright-field microscope.”

      Results 1: Real time imaging of stroke induction in awake rats

      • Why is the window so narrow in the anteroposterior direction?

      The imaging window was defined based on the brain regions investigated in this work, meaning the primary somatosensory cortex (S1BF) and the ventroposterior medial thalamic relay (VPM). From the Paxinos atlas, a position of interest is located at Bregma -3.4mm. The cranial window was performed accordingly, and restricted to a couple of mm to avoid unnecessary procedures and brain exposure. We added a new sentence in the Materials & Methods section – Line 116: “This cranial window aims to cover bilateral thalamo-cortical circuits of the somatosensory whisker-to-barrel pathway.”

      • What validation was employed for the habituation protocol? Are animals stressed by the procedure? Do you have cortisol data to show? Are animal weights available throughout the procedure?

      The habituation protocol employed in this work follows recommendations from experts in the field and peers (Martin et al., Journal of Neuroscience Methods, 2002; Martin et al., Neuroimage 2006; Topchiy et al., Behav Brain Res 2009). We have amended the corresponding paragraph in the Materials & Methods section detailing the habituation procedure:

      Original text: “Body restraint and head fixation.

      Rats were habituated to the workbench and to be restrained in a sling suit (Lomir Biomedical inc, Canada), progressively increasing the restraining period from minutes to hours33,34. After the headpost implantation (see below), rats were habituated to be head-fixed while restrained in the sling. The period of fixation was progressively increased from minutes to hours. Water and food gel (DietGel, ClearH2O, USA) were provided along the habituation session. Once habituated, the cranial window for imaging was performed as described below (Figure 1A-C).”

      New text - Line 90:“ Body restraint and head fixation.

      The body restraint and head fixation procedures are adapted from published protocols and setup dedicated for brain imaging of awake rats39–41. Rats were habituated to the workbench and to be restrained in a sling suit (Lomir Biomedical inc, Canada) by progressively increasing restraining periods from minutes (5mins, 10mins, 30mins) to hours (1 and 3hrs) for one or two weeks. The habituation to head-fixation started by short (5 to 30s) and gentle head-fixation of the headpost between fingers. The headpost was then secured between clamps for fixation periods progressively increased following the same procedure as with the sling. For both body restraint and head fixation, the initial struggling and vocalization diminished over sessions. Water and food gel (DietGel, ClearH2O, USA) were provided for all body restraint and head-fixation habituation sessions. Once habituated, the cranial window for imaging was performed as described below (Figure 1A-C).”

      • The observation of contralateral oligemia is based only on RSG traces.

      We provided contralesional perfusion changes for all regions in Supplementary Figure 1.

      • The spatial and temporal distribution of Bmode measured hyperechogenicity is surprising and should be discussed. Reference 29 describes for instance non-overlap with an area of hypo-perfusion. Overlap between hypo-perfused and infarct volumes should be systematically investigated and coregistered with histology. Moreover, reference 40, while using a different model, presents hyperechogenicity at 5h.

      The B-mode images in Figure 2B are presented as an illustration of the potential morphological changes detected at different timepoints. However, our study focuses on functional responses and not on the evolution of the morphological changes. Indeed, these Bmode images remain difficult to interpret as they show a structural reorganization at the level of the ultrasound scatterers which has not been directly linked with tissue infarction, oedema, or other histological conditions.

      Regarding reference 40, the authors found hyper-echogenicity at 5h, a time window that is not covered by our protocol. In reference 29, we indeed detailed a mismatch between the µDoppler images and histopathology. As suggested by the reviewer, seeking other potential mismatches/overlaps between Bmode/µDoppler and histopathology is an interesting field of investigation, but remains outside the scope of this work.

      Results 3: Delayed alteration of the somatosensory thalamocortical pathway

      • These results are underpowered and as such should probably be removed entirely from the paper (or substantiated with greater Ns of animals).

      Based on the reviewers’ comments on the limited number of recordings at 5d, we have decided to remove this statistical analysis. The manuscript, figure and caption were corrected accordingly.

      • If I am not mistaken, reference 28 describes a protocol for awake mouse imaging, and thereby does not introduce any hippocampal landmark allowing effective positioning of the probe.

      We thank the reviewer for this comment. While not used in the figure detailing image registration in reference 28, step 42 (page 17) of the protocol mentions the use of a hippocampal landmark to position the imaged brain relative to the atlas. The hippocampal landmark is also used in Brunner et al., JCBFM 2023; we have added this reference, which is more appropriate to this work (i.e., rat model, digitalized Paxinos atlas, linear ultrasound transducer).

      • Significant difference in ipsilesional VPM with post-stroke period looks spurious.

      We have amended the Materials and Methods section about statistics and provided Supplementary Figure 4.

      Discussion:

      The sentence "might result from the direct loss of the excitatory corticothalamic feedback to the VPM" should be moderated in the absence of electrophysiology support. Such a decrease could be explained by reduced perfusion due to the challenge.

      The reviewer is right, and we believe the tense used in the sentence already balances the claim. However, we clarified how such a result could be better validated.

      Original text: “Further work will need to dissect the complex and long-lasting post-stroke alterations of the functional whisker-to-barrel pathway, including at the neuronal level, as fUS only reports on hemodynamics as a proxy of local neuronal activity27,28,60,66–68“

      New text – Line 445: “Therefore, further studies will be needed to accurately dissect the complex and long-lasting post-stroke alterations of the functional whisker-to-barrel pathway, including at the neuronal level by direct electrophysiology recordings and imaging, as fUS only reports on hemodynamics as a proxy of local neuronal activity30,31,63,74–76.“

      Figure 2

      • Panel B would be more informative if presented as an average.

      The aim of this figure is to show the raw data of a typical case. Averaging µDoppler images wouldn’t be illustrative as individual vessels will not be visible anymore. Because the vessels are in different positions from one animal to another, an average image would be blurred.

      • Panel C lacks contralateral S1BF trace.

      We have provided contralesional perfusion changes for all regions in Supplementary Figure 1.

      • Methods for detection of SDs refer to non-peer-reviewed reference 29, where SD is defined as 50% over baseline level. What is the actual threshold/method used to define a SD in this study?

      We have detailed this procedure further in the Materials & Methods section - Line 195: “The detection of hemodynamic events associated with spreading depolarizations (SDs) was performed based on the temporal analysis of the rCBV signal in the retrosplenial granular (RSGc) and dysgranular (RSD) cortices of the left hemisphere (ipsi-lesional). SDs were defined as transient increase of rCBV signal (+25%) detected with a temporal delay of <10 frames (i.e., 8secs) between the two regions of interest, validating both the hyperemia and spreading features of hemodynamic events associated with spreading depolarizations.”
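
      The detection rule quoted above can be illustrated with a minimal sketch. It assumes each region's rCBV trace is already expressed as percent change from baseline and treats an upward crossing of +25% as the start of a transient; the function names and the pairing logic are hypothetical and may differ from the authors' implementation.

```python
def onset_frames(rcbv_percent, threshold=25.0):
    """Frame indices where the trace crosses the threshold upward."""
    onsets = []
    above = False
    for i, value in enumerate(rcbv_percent):
        if value >= threshold and not above:
            onsets.append(i)
        above = value >= threshold
    return onsets


def detect_sd_events(region_a, region_b, threshold=25.0, max_delay_frames=10):
    """Pair transients seen in both regions less than max_delay_frames apart.

    Returns (onset_in_a, onset_in_b) tuples; a transient present in only one
    region is not counted as an event.
    """
    onsets_b = onset_frames(region_b, threshold)
    events = []
    for a in onset_frames(region_a, threshold):
        for b in onsets_b:
            if abs(b - a) < max_delay_frames:
                events.append((a, b))
                break
    return events


# Toy traces (percent change), not data from the study.
rsgc = [0.0] * 100
rsd = [0.0] * 100
for i in range(20, 30):
    rsgc[i] = 30.0   # transient in the first region
for i in range(24, 34):
    rsd[i] = 30.0    # same event, 4 frames later in the second region
for i in range(70, 76):
    rsgc[i] = 30.0   # isolated transient, no counterpart in the second region

toy_events = detect_sd_events(rsgc, rsd)
```

      Requiring a delayed counterpart in the second region is what encodes the spreading feature; the isolated transient at frame 70 is rejected for that reason.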

      • For panel F, a measure of variance would be more suited to show stereotypic profile across animals as the number of SDs varies between animals.

      Figure 2F indeed shows the average profile of hemodynamic events associated with spreading depolarizations (black line) with the variance (95% confidence interval error bands in gray). We have adjusted the corresponding figure caption to make this information more clear.

      Figure 3

      • The exact stimulation employed is not clear as the methods describe a 1.33 min delay between two whisker pad stimulations, but the figure reports 40s. The description is thereby ambiguous.

      We thank the reviewer for pointing out this potential confusion, which allowed us to correct a mistake:

      • The effective delay between two stimulations delivered to the whisker pads is 40 seconds

      • The effective delay between two stimulations delivered to the same whisker pad is 80 seconds from start to start or 75 seconds from end to start.

      The text was amended accordingly in line 144: “Thus, the effective delay between two stimulations delivered to the same whisker pad is 80 seconds from start to start.“
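
      The three delays quoted above are mutually consistent if the two pads alternate every 40 seconds and each stimulation lasts 5 seconds (80 − 75). The sketch below makes that arithmetic explicit; the 5-second duration is inferred from those two figures, while the choice of which pad comes first and the function name are assumptions.

```python
def stimulation_schedule(n_stims, inter_stim_s=40.0, duration_s=5.0,
                         pads=("left", "right")):
    """Return (pad, start_s, end_s) tuples for alternating pad stimulation."""
    schedule = []
    for k in range(n_stims):
        start = k * inter_stim_s
        schedule.append((pads[k % len(pads)], start, start + duration_s))
    return schedule


schedule = stimulation_schedule(4)
left = [s for s in schedule if s[0] == "left"]

any_pad_delay_s = schedule[1][1] - schedule[0][1]   # between consecutive stims
same_pad_start_to_start_s = left[1][1] - left[0][1]
same_pad_end_to_start_s = left[1][1] - left[0][2]
```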

      • In panel B the choice of colormap and transparency for template overlay is not explained and is confusing given the employed threshold of 1.6. Which mask was used to overlay the activation map on the template? Why black color to represent a supposedly significant difference?

      We thank the reviewer for pointing out this potential confusion. We have adjusted the colormap in Figures 3 and 4.

      • The pre-stroke thalamic response is clearly localized in VPM for left stimulation, while it overlaps VPM and Po for the right stimulation. This questions the accuracy of the employed registration scheme and consequently the choice of these ROIs, which appear quite small as compared to the resolution and this positioning precision.

      We see the reviewer’s point; the apparent difference arises because the brain is slightly tilted. By adjusting the angle for both activity maps (see Author response image 1), we confirm that both maps are very similar, including for the activated areas VPM and Po.

      Author response image 1.

      • It would be interesting to see the same activation maps for all animals in supplementary.

      We have provided the Supplementary Figure 5 that contains both ipsilateral and contralateral responses to whiskers stimulation (from both left and right pads) for all trials and all rats included in this work.

      • Looking at panel C, more cortical regions seem to respond to the stimulation above S1BF.

      The reviewer is right and we have indeed mentioned this point several times in the original manuscript in:

      • the result section: “We also detected significant increase of activity in S2, AuD, Ect (*p<0.0001) and PRh (p<0.001) cortices and VPL nucleus (**p<0.01; the list of acronyms is provided in Supplementary Table 2), brain regions receiving direct efferent projections from the S1BF45,48,49, VPM or Po nuclei50–52.”

      • the caption of Figure 4: “S1BF, S2, AuD, VPM, VPL and Po regions are brain regions significatively activated (all pvalue<0.01; GLM followed by t-test.”

      • the conclusion section : “Functional responses to mechanical whisker stimulation were detected in several regions relaying the information from the whisker to the cortex, including the VPM and Po nuclei of the thalamus, and S1BF, the somatosensory barrel-field cortex. Responses were also observed in the S2 cortex involved in the multisensory integration of the information43,44,61, the auditory cortex as it receives direct efferent projection from S1BF45,61, and the VPL nuclei of the thalamus connected via corticothalamic projections45.“

      • It would be interesting to see bilateral traces as supplementary figures.

      We have provided Supplementary Figure 5, which contains both ipsilateral and contralateral responses to whisker stimulation (from both left and right pads) for all trials and all rats included in this work.

      • In both panels C and D, n=5 is reported, but methods state the use of 7 animals. Please clarify how animals have been used in the different studies

      We have clarified the report on animal use and amended Supplementary Table 1 accordingly.

      • In Panel D, the 95% CI intervals seem particularly narrow. Might this be the result of considering multiple trials as independent events? A GLM analysis would avoid this statistical fallacy.

      We have provided Supplementary Figure 5, which contains both ipsilateral and contralateral responses to whisker stimulation (from both left and right pads) for all trials and all rats included in this work. The statistical analysis has been adjusted (see Materials and Methods) and complemented with Supplementary Figure 4.

      Figure 4 - See comments above for Figure 3

      We have adjusted Figure 3 according to the reviewer’s suggestions.

      Reviewer #3 (Recommendations For The Authors):

      1) Introduction: Given the emphasis on the awake state, it would be helpful to note that a significant portion of strokes occur during sleep - as well as comment on its hemodynamic difference with respect to an awake state.

      We agree with the reviewer that some strokes occur during sleep. However, here the awake state, which has been poorly addressed in the literature, is contrasted with anesthesia, a condition largely used to investigate brain functions after stroke. We added a point and corresponding references about wake-up stroke (see Line 49).

      2) The effects of anesthetics on stroke are quite variable and the literature data on the topic is rather divergent: it would be helpful for the introduction to reflect the large level of discord in the literature and the wide-ranging mechanisms of action of different anesthetics.

      We thank the reviewer for this comment. We have expanded our original sentence in the introduction to better reflect the various effects of anesthetics on stroke (see Line 50).

      3) The reference list (14-17) to other studies of brain hemodynamic changes post ischemic stroke is egregiously short. Please expand. Similarly, the list of citations to other functional ultrasound rodent studies in the literature (23-24) is misleading: other groups have published similar work and ought to be cited.

      We thank the reviewer for this comment and have added complementary references. However, we believe that references 14-17 pointed out by the reviewer refer not only to brain hemodynamic changes but mostly to network and function, as stated in the manuscript. Regarding the references on fUS (23-24) mentioned by the reviewer, we did not limit our citations on functional ultrasound imaging to those 2 articles but cited 15+ from 4 different research groups.

      4) It would be helpful if the authors used "spreading depolarization" the way it has been utilized in the many decades of research on them in the literature, namely, as waves of hyper/hypoactivity in the electrophysiological signals. Please use a distinct term to refer to waves of changes in the hemodynamic state.

      We have amended the terminology used in the manuscript. “Spreading depolarization” has been replaced by “hemodynamic events associated with spreading depolarizations” or similar.

      5) Why is this investigation restricted to male rats?

      As this is a proof of concept, we did not perform experiments in female rats. We agree that further investigation would require animals of both sexes. We added a line in the discussion.

      New text – Line 455:” Finally, it is important to note that this proof-of-concept work did not specifically focus the impact of sex dimorphism on the stroke or early behavioral outcomes following the insult that would greatly enhance the translational value of such preclinical stroke study80.”

      6) Were the animals tested during their active phase? If not, why not, and what are the implications of testing their responses during the sleep phase?

      We think there is a misunderstanding here, as we investigated brain functions in awake head-fixed rats. Therefore, the sleep/active phases were neither investigated nor mentioned in the manuscript.

      7) How is the level of stress monitored/established?

      In this work, we followed established procedures to reduce the stress and discomfort of the rats throughout the experiment. The procedure used is now described in more detail in the Materials and Methods section. However, the level of stress was not monitored, and it would be of interest to consider this in future experiments.

      8) What are the sequelae of stress on brain hemodynamics, especially given 1-4 hour long sessions.

      This is a good remark. While we cannot state how stress impacts brain hemodynamics, the data show that the hemodynamic response functions were stable and robust over hour-long recordings (see control and pre-stroke sessions in Supplementary Figure 5).

      9) How is the animal prepared for stroke induction? In general, the methodological steps surrounding animal handling and preparation are exceedingly terse.

      We provided more details about the handling and preparation of the rats in the Materials and Methods section.

      Original text: “Body restraint and head fixation.

      Rats were habituated to the workbench and to be restrained in a sling suit (Lomir Biomedical inc, Canada), progressively increasing the restraining period from minutes to hours33,34. After the headpost implantation (see below), rats were habituated to be head-fixed while restrained in the sling. The period of fixation was progressively increased from minutes to hours. Water and food gel (DietGel, ClearH2O, USA) were provided along the habituation session. Once habituated, the cranial window for imaging was performed as described below (Figure 1A-C).”

      New text - Line 90:“ Body restraint and head fixation.

      The body restraint and head fixation procedures are adapted from published protocols and setup dedicated for brain imaging of awake rats39–41. Rats were habituated to the workbench and to be restrained in a sling suit (Lomir Biomedical inc, Canada) by progressively increasing restraining periods from minutes (5mins, 10mins, 30mins) to hours (1 and 3hrs) for one or two weeks. The habituation to head-fixation started by short (5 to 30s) and gentle head-fixation of the headpost between fingers. The headpost was then secured between clamps for fixation periods progressively increased following the same procedure as with the sling. For both body restraint and head fixation, the initial struggling and vocalization diminished over sessions. Water and food gel (DietGel, ClearH2O, USA) were provided for all body restraint and head-fixation habituation sessions. Once habituated, the cranial window for imaging was performed as described below (Figure 1A-C).”

      10) What is the reproducibility of the chemo-thrombotic model timeline? What are its limitations?

      We have provided more information on the chemo-thrombotic model and its limitations in the discussion section.

      New text – Line 402:” However, to adequatly and efficiently occlude the vessel of interest, removing a piece of skull remains required. As mentioned in the report on animal use, one rat was excluded from the analysis as the MCA spontaneously reperfuses, thus dropping the success rate of such model.”

      11) What is the motivation behind the 5-days post stroke timepoint selection?

      In addition to demonstrating the feasibility of imaging brain functions at different timepoints following the ischemia, the motivation for performing this delayed session was to capture functional diaschisis, which is known to occur a few days after the initial insult. More frequent imaging sessions covering a longer post-stroke period would be of high interest to better capture the impact of ischemia on both brain hemodynamics and functions.

      12) How predictive is hyperacute hemodynamics imaging of the long-term outcome?

      We thank the reviewer for this question, which remains of major interest in the stroke field. However, predicting long-term outcome would require capturing brain hemodynamics at a larger scale, as performed in Hingot et al., Theranostics 2020 and Brunner et al., JCBFM 2023, a coverage not accessible with the imaging window proposed in this work.

      13) It would be greatly reassuring if the authors presented the statistical parametric maps without masking regions of interest (eg Fig3B).

      We thank the reviewer for pointing out this potential confusion. In the first version of the figure, the colormap used for the activity maps was indeed not optimal. Therefore, we i) adjusted the colormap used in Fig 3 and 4 and ii) provided non-thresholded z-score maps for all rats in Supplementary Figure 5.

      14) Fig 3C is hard to make out.

      We provided a full-page version of Figure 3C in Supplementary Figure 3.

      15) Figs 3,4 should incorporate box and whisker plots of data across all rats scatter plots of individual animal data.

      We are not sure which kind of data the reviewer wants to have displayed here. However, we have provided Supplementary Figure 5, which contains both ipsilateral and contralateral responses to whisker stimulation (from both left and right pads) for all trials and for each individual animal included in this work.

      16) The final panels in Figures 3,4 would more tellingly include the plots of the linear models fitted.

      Based on all reviewers’ comments, we have adjusted and clarified the statistical analysis performed (see Materials and Methods) and complemented it with Supplementary Figure 4.

      17) The frame rate calculations are not adding up unless averaging and pauses are included so some more details should be stated. Are tilted plane waves averaged before compounding as in prior publications?

      Each angle is averaged 6 times before compounding to increase the signal-to-noise ratio, and there is a pause of 0.3s between each Doppler image. See also the question “Functional Ultrasound Imaging acquisition” from reviewer 2. We also provided supplementary and key information about the sequence used in this work.

      We have provided complementary information in the manuscript:

      Original text:” The ultrasound sequence generated by the software is the same as in Macé et al.,26 and Brunner, Grillet et al., Briefly, the ultrafast scanner images the brain with 5 tilted plane-waves (-6°, -3°, +0.5°, +3°, +6°) at a 10-kHz frame rate. The 5 plane-wave images are added to create compound images at a frame rate of 500Hz. Each set of 250 compound images is filtered to extract the blood signal. Finally, the intensity of the filtered images is averaged to obtain a vascular image of the rat brain at a frame rate of 1.25Hz. Then, the acquired images are processed with a dedicated GPU architecture, displayed in real-time for data visualization, and stored for subsequent off-line analysis.”

      New text – Line 146:” The ultrasound sequence generated by the software is adapted from Macé et al.31 and Brunner, Grillet et al.34 Ultrafast images of the brain were generated using 5 tilted plane-waves (-6°, -3°, +0.5°, +3°, +6°). Each plane wave is repeated 6 times and the recorded echoes are averaged to increase the signal to noise ratio. The 5 plane-wave images are added to create compound images at a frame rate of 500Hz. To obtain a single vascular image we acquired a set of 250 compound images in 0.5s, an extra 0.3s pause is included between each image to have some processing time to display the images for real-time monitoring of the experiment. The set of 250 compound images has a mixed information of blood and tissue signal. To extract the blood signal we apply a low pass filter (cut off 15Hz) and an SVD filter that eliminates 20 singular values. This filter aims to select all the signal from blood moving with an axial velocity higher than ~1mm/s. To obtain a vascular image we compute the intensity of the blood signal i.e., Power Doppler image. This image is in first approximation proportional to the cerebral blood volume26,28. Overall, this process enables a continuous acquisition of power Doppler images at a frame rate of 1.25Hz during several hours.”
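
The timing quoted in the revised text can be checked with a short calculation. The sketch below is illustrative only: the constant and function names are ours, and the "minimum firing rate" assumes that every transmit fits back to back within one compound period, which the text does not state.

```python
# Timing figures quoted in the revised Methods text.
N_ANGLES = 5            # tilted plane waves per compound image
N_REPEATS = 6           # repetitions averaged per angle
COMPOUND_RATE_HZ = 500  # compound images per second
N_COMPOUND = 250        # compound images per power Doppler image
PAUSE_S = 0.3           # processing pause between Doppler images


def doppler_frame_rate(n_compound, compound_rate_hz, pause_s):
    """Power Doppler images per second, including the processing pause."""
    acquisition_s = n_compound / compound_rate_hz
    return 1.0 / (acquisition_s + pause_s)


def min_firing_rate(n_angles, n_repeats, compound_rate_hz):
    """Lowest pulse rate that fits all transmits into one compound period.

    Assumption (not from the text): no dead time between transmits.
    """
    return n_angles * n_repeats * compound_rate_hz


frame_rate = doppler_frame_rate(N_COMPOUND, COMPOUND_RATE_HZ, PAUSE_S)
firing_rate = min_firing_rate(N_ANGLES, N_REPEATS, COMPOUND_RATE_HZ)

print(f"Power Doppler frame rate: {frame_rate:.2f} Hz")
print(f"Implied minimum firing rate: {firing_rate / 1000:.0f} kHz")
```

With 250 compound images at 500Hz (0.5s) plus the 0.3s pause, one Doppler image takes 0.8s, which gives the 1.25Hz quoted above.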

      18) Ultrasound data processing: The filtering process should have more description. It would be highly instructive to explain that the power Doppler signal is being used and comment clearly on its relationship to blood volume, commenting on stalled flow mircrovessels/RBC-devoid micrrovessels, and considerations of vessel orientation.

      The compound images contain a mixture of blood and tissue signal. To extract the blood signal, we applied a low pass filter (cut-off 15Hz) and an SVD filter that eliminates 20 singular values. This filter selects all the signal from blood moving with an axial velocity higher than ~1mm/s. To obtain a vascular image we compute the intensity of the blood signal (Power Doppler image). This power Doppler image is, to a first approximation, proportional to the cerebral blood volume.

      This information has been added to the Materials and Methods section of the manuscript.
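
As a rough illustration of the SVD step described above, the following sketch removes the 20 largest singular values from a stack of compound images and computes a power Doppler image. It is not the processing code used in this work: the function names, array layout and synthetic data are assumptions, and the temporal filter is omitted.

```python
import numpy as np


def svd_clutter_filter(frames, n_removed=20):
    """Zero the n_removed largest singular values of a (nz, nx, nt) stack."""
    nz, nx, nt = frames.shape
    casorati = frames.reshape(nz * nx, nt)       # space x time matrix
    u, s, vh = np.linalg.svd(casorati, full_matrices=False)
    s_filtered = s.copy()
    s_filtered[:n_removed] = 0.0                 # largest values: mostly tissue
    blood = (u * s_filtered) @ vh                # rebuild without those components
    return blood.reshape(nz, nx, nt)


def power_doppler(blood):
    """Mean intensity of the filtered signal along the time axis."""
    return np.mean(np.abs(blood) ** 2, axis=-1)


# Synthetic stack (assumed sizes): a strong static "tissue" term plus weak noise
# standing in for the blood signal.
rng = np.random.default_rng(0)
nz, nx, nt = 16, 16, 250
tissue = 100.0 * rng.standard_normal((nz, nx, 1)) * np.ones((1, 1, nt))
frames = tissue + rng.standard_normal((nz, nx, nt))

filtered = svd_clutter_filter(frames, n_removed=20)
pd_image = power_doppler(filtered)
print(pd_image.shape)
```

Singular values are returned in decreasing order by `np.linalg.svd`, so zeroing the first 20 entries discards the most energetic, spatially coherent components, which is where slowly moving tissue is expected to dominate.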

      19) Does the SVD processing have the same cut off (20 singular values) as in prior publications as a standard value, or is that adjusted for each study? There are enough minor differences between sequences that these details are uncertain. Do the overall hemodynamics measurements (Fig 2) include all data acquired, or do they exclude the whisker stimulation events, and if so, how long of a window is excluded? The explanation of the activity maps should be rephrased e.g. "... recordings are segmented in shorter 40-s time windows encompassing the whisker stimulation trials..."

      We agree that these details are important. All of this information has been added to the manuscript:

      • SVD processing: We eliminate 20 singular values as in cited studies.

      • Sequence: we have included more details about the sequence.

      • Processing: all data during the whisker stimulation is used.

      • We have rephrased the explanation about the activity maps.

      20) Discuss the methodology behind histological data shown in Fig. 1.

      We thank the reviewer for highlighting this omission. We have provided a paragraph in the Materials & Methods section detailing the histology procedure (Line 228):

      “Histopathology

      Rats were killed 24hrs after the occlusion for histological analysis of the infarcted tissue. Rats received a lethal injection of pentobarbital (100mg/kg i.p. Dolethal, Vetoquinol, France). Using a peristaltic pump, they were transcardially perfused with phosphate-buffered saline followed by 4% paraformaldehyde (Sigma-Aldrich, USA). Brains were collected and post-fixed overnight. 50-μm thick coronal brain sections across the MCA territory were sliced on a vibratome (VT1000S, Leica Microsystems, Germany) and analyzed using the cresyl violet (Electron Microscopy Sciences, USA) staining procedure (see Open Lab Book for procedure). Slices were mounted with DPX mounting medium (Sigma-Aldrich, USA) and scanned using a bright-field microscope

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      This work combines molecular dynamics (MD) simulations along with experimental elucidation of the efficacy of ATP as biological hydrotrope. While ATP is broadly known as the energy currency, it has also been suggested to modulate the stability of biomolecules and their aggregation propensity. In the computational part of the work, the authors demonstrate that ATP increases the population of the more expanded conformations (higher radius of gyration) in both a soluble folded mini-protein Trp-cage and an intrinsically disordered protein (IDP) Aβ40. Furthermore, ATP is shown to destabilise the pre-formed fibrillar structures using both simulation and experimental data (ThT assay and TEM images). They have also suggested that the biological hydrotrope ATP has significantly higher efficacy as compared to the commonly used chemical hydrotrope sodium xylene sulfonate (NaXS).

      Strengths:

      This work presents a comprehensive and compelling investigation of the effect of ATP on the conformational population of two types of proteins: globular/folded and IDP. The role of ATP as an "aggregate solubilizer" of pre-formed fibrils has been demonstrated using both simulation and experiments. They also elucidate the mechanism of action of ATP as a multi-purpose solubilizer in a protein-specific manner. Depending on the protein, it can interact through electrostatic interactions (for predominantly charged IDPs like Aβ40), or primarily van der Waals' interactions through (for Trp-Cage).

      Weaknesses:

      The weaknesses and suggestions mentioned in my first review have been adequately addressed by the authors in the revised version of the manuscript.

      Thank you very much for your positive feedback and for taking the time to thoroughly review our manuscript. Your thoughtful comments and suggestions have significantly contributed to enhancing the quality of our work.

      We sincerely appreciate your time and efforts in helping us refine our research.

      Reviewer #3 (Public review):

      Since its first experimental report in 2017 (Patel et al. Science 2017), there have been several studies on the phenomenon in which ATP functions as a biological hydrotrope of protein aggregates. In this manuscript, by conducting molecular dynamics simulations of three different proteins, Trp-cage, Abeta40 monomer, and Abeta40 dimer at concentrations of ATP (0.1, 0.5 M), which are higher than those at cellular condition (a few mM), Sarkar et al. find that the amphiphilic nature of ATP, arising from its molecular structure consisting of phosphate group (PG), sugar ring, and aromatic base, enables it to interact with proteins in a protein-specific manner and prevents their aggregation and solubilize if they aggregate. The authors also point out that in comparison with NaXS, which is the traditional chemical hydrotrope, ATP is more efficient in solubilizing protein aggregates because of its amphiphilic nature.

      Trp-cage, featured with hydrophobic core in its native state, is denatured at high ATP concentration. The authors show that the aromatic base group (purine group) of ATP is responsible for inducing the denaturation of helical motif in the native state.

      For Abeta40, which can be classified as an IDP with charged residues, it is shown that ATP disrupts the salt bridge (D23-K28) required for the stability of beta-turn formation.

      By showing that ATP can disassemble preformed protein oligomers (Abeta40 dimer), the authors suggest that ATP is "potent enough to disassemble existing protein droplets, maintaining proper cellular homeostasis," and enhancing solubility.

      Overall, the message of the paper is clear and straightforward to follow. In addition to the previous studies in the literature on this subject. (J. Am. Chem. Soc. 2021, 143, 31, 11982-11993; J. Phys. Chem. B 2022, 126, 42, 8486-8494; J. Phys. Chem. B 2021, 125, 28, 7717-7731; J. Phys. Chem. B 2020, 124, 1, 210-223), the study, which tested using MD simulations whether ATP is a solubilizer of protein aggregates, deserves some attention from the community and is worth publishing.

      Weakness

      My only major concern is that the simulations were performed at unusually high ATP concentrations (100 and 500 mM of ATP), whereas the real cellular concentration of ATP is 1-5 mM.

      I was wondering if there is any report on a titration curve of protein aggregates against ATP, and what is the transition mid-point of ATP-induced solubility of protein aggregates. For instance, urea or GdmCl have long been known as the non-specific denaturants of proteins, and it has been well experimented that their transition mid-points of protein unfolding are in the range of ~(1 - 6) M depending on the proteins.

      The authors responded to my comment on ATP concentration that because of the computational issue in all-atom simulations, they had no option but to employ mM-protein concentrations instead of micromolar concentrations, thus requiring 1000-folds higher ATP concentration, which is at least in accordance with the protein/ATP stoichiometry. However, I believe this is an issue common to all the researchers conducting MD simulations. Even if the system is in the same stoichiometric ratio, it is never clear to me (is it still dilute enough?) whether the mechanism of solubilization of aggregate at 1000 fold higher concentration of ATP remains identical to the actual process.

      Thank you for your thoughtful feedback and for recognizing the value of our study. We appreciate your detailed review and the constructive comments you have provided.

      We appreciate your understanding of the inherent limitations in MD simulations. The use of higher ATP concentrations in our simulations stems from the computational challenges of all-atom MD simulations. Due to the practical constraints of simulating micromolar protein concentrations in atomistic detail, we employed millimolar protein concentrations, which necessitated the use of ATP concentrations that are proportionally higher to maintain appropriate stoichiometry between ATP and proteins.

      We fully agree with your point that this is a common issue faced by researchers in the MD simulation community. While it is challenging to directly replicate physiological ATP concentrations in atomistic simulations, we believe that our approach still captures the fundamental interactions between ATP and proteins. In particular, our focus was on the relative behaviors and mechanistic insights, rather than absolute concentration effects. We based our choice of ATP concentration on maintaining stoichiometric ratios with the protein concentration to ensure that the molecular mechanisms observed remain relevant. We hope our clarification addresses your concerns.

      We would like to share that in an ongoing study focused on the role of ATP in influencing the liquid-liquid phase separation behavior of several intrinsically disordered proteins, we are employing a coarse-grained model. This approach allows us to maintain ATP concentrations within physiologically relevant ranges, as simulating micromolar protein concentrations becomes computationally feasible with this method. We believe that this complementary work will provide additional insights into the behavior of ATP at concentrations more reflective of cellular conditions and further validate the findings from our current study.

      We would also like to emphasize that the complementary experiments presented in this study were conducted at physiologically relevant concentrations for both protein and ATP. The experimental results are in strong agreement with our computational findings, supporting the hypothesis that the mechanisms observed in the simulations closely reflect the actual biological process.

      --—-

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This work combines molecular dynamics (MD) simulations along with experimental elucidation of the efficacy of ATP as a biological hydrotrope. While ATP is broadly known as the energy currency, it has also been suggested to modulate the stability of biomolecules and their aggregation propensity. In the computational part of the work, the authors demonstrate that ATP increases the population of the more expanded conformations (higher radius of gyration) in both a soluble folded mini-protein Trp-cage and an intrinsically disordered protein (IDP) Aβ40. Furthermore, ATP is shown to destabilise the pre-formed fibrillar structures using both simulation and experimental data (ThT assay and TEM images). They have also suggested that the biological hydrotrope ATP has significantly higher efficacy as compared to the commonly used chemical hydrotrope sodium xylene sulfonate (NaXS).

      Strengths:

      This work presents a comprehensive and compelling investigation of the effect of ATP on the conformational population of two types of proteins: globular/folded and IDP. The role of ATP as an "aggregate solubilizer" of pre-formed fibrils has been demonstrated using both simulation and experiments. They also elucidate the mechanism of action of ATP as a multi-purpose solubilizer in a protein-specific manner. Depending on the protein, it can interact through electrostatic interactions (for predominantly charged IDPs like Aβ40), or primarily van der Waals' interactions through (for Trp-Cage).

      Weaknesses:

      The data presented by the authors are sound and adequately support the conclusions drawn by the authors. However, there are a few points that could be discussed or elucidated further to broaden the scope of the conclusions drawn in this work as discussed below:

      (i) The concentration of ATP used in the simulations is significantly higher (500 mM) as compared to those used in the experiments (6-20 mM) or cellular cytoplasm (~5 mM as mentioned by the authors). Since the authors mention already known concentration dependence of the effect of ATP, it is worth clarifying the possible limitations and implications of the high ATP concentrations in the simulations.

      We thank the reviewer for their concern regarding the ATP concentration used in our simulation. The reviewer correctly noted our statement about cellular ATP concentrations being in the range of a few millimolar. We would like to highlight that, in a cellular environment, millimolar ATP concentrations coexist with micromolar protein concentrations in the aqueous phase [1].

      In our study, we focused on the impact of ATP on protein conformational dynamics, primarily simulating a protein monomer within the simulation box. Maintaining a micromolar protein concentration (e.g., 20 μM [1]) for a monomeric protein would require an MD simulation box of significant dimensions (~44x44x44 nm³), which is computationally challenging to simulate at atomistic resolution due to the excessive computational cost and time. We observed a more than 150-fold reduction in simulation performance (with Gromacs version 2018.6) for 20 μM Aβ40 in a 20 mM ATP solution containing 50 mM NaCl, held in a simulation box of ~44x44x44 nm³, compared with the simulation setup employed in our study.

      To ensure computational efficiency, we employed a simulation protocol that would maintain the cellular protein/ATP stoichiometry. Similar to the stoichiometry in the cellular environment (i.e., micromolar protein : millimolar ATP ~ 10³), our simulations maintained a consistent ratio (i.e., millimolar protein : molar ATP ~ 10³). This approach allowed us to use a smaller simulation box while preserving the relevant stoichiometry, enabling us to leverage data within a realistic timeframe.
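
The box size and stoichiometry quoted above follow from a simple concentration-to-volume conversion. The sketch below reproduces the arithmetic; the function names are ours and the calculation is illustrative rather than part of the simulation setup.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
NM3_PER_LITRE = 1e24       # 1 L = 1e-3 m^3 = 1e24 nm^3


def box_edge_nm(n_molecules, conc_molar):
    """Edge (nm) of a cubic box holding n_molecules at conc_molar (mol/L)."""
    volume_l = n_molecules / (conc_molar * AVOGADRO)
    return (volume_l * NM3_PER_LITRE) ** (1.0 / 3.0)


def molecules_in_box(conc_molar, edge_nm):
    """Number of molecules at conc_molar (mol/L) in a cubic box of edge_nm."""
    volume_l = edge_nm ** 3 / NM3_PER_LITRE
    return conc_molar * AVOGADRO * volume_l


edge = box_edge_nm(1, 20e-6)               # one monomer at 20 uM
atp_count = molecules_in_box(20e-3, edge)  # 20 mM ATP in the same box
shrink = edge / box_edge_nm(1, 20e-3)      # edge ratio for a 1000x higher concentration

print(f"box edge: {edge:.1f} nm")
print(f"ATP molecules per protein: {atp_count:.0f}")
print(f"edge shrinks by a factor of {shrink:.1f}")
```

A single monomer at 20 μM needs a cubic box roughly 44 nm on a side, and 20 mM ATP in that box corresponds to about 1000 ATP molecules per protein. Raising both concentrations a thousandfold keeps that ratio while reducing the box edge tenfold.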

      Based on the reviewer comment we have included the explanation in the revised manuscript as “In this study, we opted to maintain the ATP stoichiometry consistent with biological conditions and previous in vitro experiments. Instead of keeping the protein concentration within the micromolar range and ATP concentration at the millimolar level, we chose this approach to avoid the need for an extremely large simulation box, which would greatly reduce computational efficiency by more than 150-fold.” (page 4).

      However, in our experimental measurements we maintained the protein concentration in the micromolar range and the ATP concentration in the millimolar range, consistent with previous in vitro experimental studies [1].

      It seems ATP can stabilise the proteins at low concentrations, but the current work does not address this possible effect. It would be interesting to see whether the effect of ATP on globular proteins and IDPs remains similar even at lower ATP concentrations.

      We thank the reviewer for raising this point. We would like to refer you to the Discussion and Conclusion sections of our manuscript (on page 18), where we have noted ATP’s concentration-dependent actions on protein homeostasis, incorporating insights from previous literature as well: “In our literature survey of ATP's concentration-dependent actions, as detailed in the Introduction section, we observed a dual role where ATP induces protein liquid-liquid phase separation at lower concentrations and promotes protein disaggregation at higher concentrations [2–4]. These versatile functions emphasize ATP's pivotal role in maintaining a delicate balance between protein stability (at low ATP concentrations) and solubility (at high ATP concentrations) for effective proteostasis within cells. Notably, ATP-mediated stabilization primarily targets soluble proteins, particularly those with ATP-binding motifs, while ATP-driven biomolecular solubilization is observed for insoluble proteins, typically lacking ATP-binding motifs.” We explain that ATP stabilizes proteins at lower concentrations, primarily targeting those with ATP-binding motifs, as illustrated by a sequence-dependent analysis. Since the proteins we studied (Trp-cage and Aβ40) do not contain any ATP-binding motifs, ATP-guided protein stabilization is not expected for these proteins. Additionally, we presented a set of simulations for Trp-cage with a comparatively lower concentration of ATP (see Figure 2), which also suggests ATP-driven protein chain elongation. Thus, we believe that ATP’s effect on globular proteins and intrinsically disordered proteins (IDPs) lacking ATP-binding motifs would remain similar at lower ATP concentrations.

      (ii) The authors make a somewhat ambitious statement that the role of ATP as a solubilizer of pre-formed fibrils could be used as a therapeutic strategy in protein aggregation-related diseases. However, it is not clear how it would be so since ATP is a promiscuous substrate in several biochemical processes and any additional administration of ATP beyond normal cellular concentration (~5 mM) could be detrimental.

      We thank the reviewer for this comment. In conjunction with earlier studies on the non-energetic effects of ATP, our study underscores ATP’s anti-aggregation properties and its ability to dissolve preformed aggregates, thereby maintaining regular protein homeostasis within cells and inhibiting protein aggregation-related diseases. Consequently, ATP has been proposed as a probable therapeutic agent in multiple previous reports [5–8]. Patel et al. also noted that as ATP levels decrease with age, this can lead to increased protein aggregation and neurodegenerative decline [1]. Therefore, the problem of excessive protein aggregation in cells may be linked to the reduction of ATP levels with aging [1,8–12]. In such circumstances, we hypothesize that introducing ATP as part of a therapeutic treatment might address the issue of excessive protein aggregation and neurodegenerative diseases.

      (iii) A natural question arises about what is so special about ATP as a solubilizer. The authors have also asked this question but in a limited scope of comparing to a commonly used chemical hydrotrope NaXS. However, a bigger question would be what kind of chemical/physical features make ATP special? For example, (i) if the amphiphilic property is important, what about some standard surfactants? (ii) how would ATP compare to other nucleotides like ADP or GTP? It might be useful to explore such questions in the future to further establish the special role of ATP in this regard.

      We thank the reviewer for recognizing the significance and value of our exploration into the unique properties of ATP as a solubilizer. In response to the reviewer’s comment regarding the specific features that make ATP special, we would like to emphasize our analysis of ATP's region-specific interactions with biomolecules. ATP's unique structure, comprising three distinct moieties- a larger hydrophobic aromatic base, a hydrophilic sugar moiety, and a highly negatively charged phosphate group, enables it to perform multiple modes of interactions, including hydrophobic, hydrogen bonding, and electrostatic interactions with proteins. This combination of interactions leads to its pronounced effect in a protein-specific manner. We believe that, together with its amphiphilic property, the specific chemical structure of ATP makes it an efficient solubilizer. A previous study by Patel et al. demonstrated the efficiency of ATP as a biological hydrotrope compared to other classical chemical hydrotropes (NaXS and NaTO). Our current study further rationalizes ATP’s efficiency through its effective interactions with biomolecules, driven by the chemically distinct parts of the ATP molecule.

      Regarding the reviewer’s point about comparing ATP as a hydrotrope with standard surfactants, we would like to add that hydrotropes are typically amphiphilic molecules that differ from classical surfactants in their low cooperativity of aggregation and their effectiveness at molar concentrations. Hydrotropes tend to accumulate preferentially and non-stoichiometrically around the solute, and their aggregation depends on the presence of solute molecules. Unlike surfactants, hydrotropes do not form any well-defined superstructure on their own.

      In response to the reviewer’s comment on comparing ATP’s effect with other nucleotides like ADP and GTP, we would like to highlight that previous studies have shown GTP to dissolve protein droplets (FUS) with similar efficiency to ATP. However, in cells, the concentration of GTP is much lower than that of ATP, resulting in negligible effects on the solubilization of liquid compartments in vivo. Conversely, ADP and AMP exhibited comparatively lower efficiency in dissolving protein condensates, suggesting the triphosphate moiety plays a considerable role in protein condensate dissolution. Additionally, only TP-Mg had a negligible effect on protein drop dissolution, indicating that the charge density in the ionic ATP side chain alone is insufficient for dissolving protein drops. Together, these findings highlight the efficiency of ATP as a protein aggregate solubilizer, which stems from its specific chemical structure and not merely its amphiphilicity.

      According to the suggestion of the reviewer we have included the discussion in the revised manuscript as “Comparing the effects of ATP with other nucleotides such as ADP and GTP, we emphasize that previous studies have demonstrated GTP can dissolve protein droplets (such as FUS) with efficiency comparable to ATP. However, in vivo, the concentration of GTP is significantly lower than that of ATP, resulting in negligible impact on the solubilization of liquid compartments. In contrast, ADP and AMP show much lower efficiency in dissolving protein condensates, indicating the critical role of the triphosphate moiety in protein condensate dissolution. Furthermore, only TP-Mg exhibited a negligible effect on protein droplet dissolution, suggesting that the charge density in the ionic ATP side chain alone is insufficient for this process. These findings underscore ATP's superior efficacy as a protein aggregate solubilizer, attributed to its specific chemical structure rather than merely its amphiphilicity.” (page 15).

      (iv) In Figure 2F, it seems that in the presence of 0.5 M ATP, the Rg increases (as expected), but the number of native contacts remains almost similar. The reduction in the number of native contacts at higher ATP concentrations is not as dramatic as the increase in Rg. This is somewhat counterintuitive and should be looked into. Normally one would expect a monotonous reduction in the number of native contacts as the protein unfolds (increase in Rg).

      We appreciate the reviewer’s insightful comment. As noted, the presence of 0.5 M ATP results in an increase in the protein’s radius of gyration (Rg) and a decrease in native contacts, indicating that ATP promotes protein chain extension. However, the extent of the changes in Rg and native contacts are not identical. It is important to recognize that even the disruption of a few native contacts can significantly impact protein folding, leading to considerable protein chain extension. Therefore, it is not necessary for the extent of variation in Rg and native contacts to be similar. The appropriate measure is whether the alterations in these two variables are consistent with each other, such that an increase in Rg is accompanied by a decrease in native contacts, and vice versa.
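
      The consistency criterion described in this response (Rg rising while native contacts fall, without the two changing by the same relative amount) can be illustrated with a minimal sketch. The function names and the five-point time series below are hypothetical values chosen for illustration only; they are not data from the manuscript.

```python
import math

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of 3D points (same length unit as the input)."""
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: equal weights when masses are not given
    total = sum(masses)
    # Centre of mass, one component at a time.
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    # Mass-weighted mean squared distance from the centre of mass.
    msd = sum(m * sum((c[k] - com[k]) ** 2 for k in range(3))
              for m, c in zip(masses, coords)) / total
    return math.sqrt(msd)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical (made-up) trajectories: Rg in nm and number of native contacts.
rg_series = [0.70, 0.72, 0.80, 0.95, 1.10]
native_series = [18, 18, 17, 16, 15]

# Relative changes differ in magnitude, yet the trend is consistent (negative correlation).
rel_rg = (rg_series[-1] - rg_series[0]) / rg_series[0]
rel_nc = (native_series[-1] - native_series[0]) / native_series[0]
print(f"relative change in Rg: {rel_rg:+.2f}, in native contacts: {rel_nc:+.2f}")
print(f"correlation: {pearson(rg_series, native_series):.2f}")
```

      In this toy series the relative change in Rg is several times larger than the relative change in native contacts, while the correlation between the two remains negative, which is the sense in which the two observables are called consistent.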

      Reviewer #1 (Recommendations For The Authors):

      (i) There are several references repeated multiple times, e.g. (a) 1, 9, 14, (b) 25, 29, 31, 33. There are more such examples and the authors should fix these.

      We thank the reviewer for pointing this out. We have addressed the issue in the updated manuscript.

      (ii) Specific Gromacs version should be mentioned rather than 20xx.

      In the updated manuscript we have mentioned the particular version of Gromacs software (2018.6) we have employed for our simulation.

      Reviewer #2 (Public Review):

      In this work, Sarkar et al. investigated the potential ability of adenosine triphosphate (ATP) as a solubilizer of protein aggregates by combining MD simulations and ThT/TEM experiments. They explored how ATP influences the conformational behaviors of Trp-cage and β-amyloid Aβ40 proteins. Currently, there are no experiments in the literature supporting their simulation results of ATP on Trp-cage. The simulation protocol employed for the Aβ40 monomer system is conventional MD simulation, while REMD simulation (an enhanced sampling method) is used for the Aβ monomer + ATP system. It is not clear whether the conformational difference is caused by ATP or by the different simulation methods used.

      We thank the reviewer for raising this point. First we note that for Trp-cage, the simulation methods employed in presence and absence of ATP were identical (REMD simulation) and the difference in the free energy surfaces due to introduction of ATP in the solution were evident.

      Nonetheless, to address the referee’s concern that the different simulation methods used to generate the 2D free energy landscapes in the absence and presence of ATP might themselves have produced the observed difference, we carried out a fresh set of REMD simulations of Aβ40 in neat water, followed by adaptive sampling simulation. As shown below in Author response image 1, the free energy profiles obtained from conventional MD simulation (using the DESRES trajectory) and those obtained via REMD simulations for the same system (in neat water) are qualitatively similar. The free energy profiles obtained in the presence of ATP are significantly different from those in neat water, irrespective of the simulation method. This supports the simulations’ observation of an ATP-driven alteration of protein conformation.

      Author response image 1.

      Image represents the 2D free energy profile for Aβ40 monomer in absence of ATP, obtained through A. conventional MD and B. REMD simulation followed by adaptive sampling simulation.

      In the revised manuscript we have included the discussion as “To verify that the effect of ATP on conformational landscape is not an artifact of difference in sampling method (long conventional MD in absence of ATP versus REMD in presence of ATP), we repeated the conformational sampling in absence of ATP via employing REMD, augmented by adaptive sampling (figure S4). We find that the free energy map remains qualitatively similar (figure 4A and S4) irrespective the sampling technique. Comparison of 2D free energy map obtained from REMD simulation in absence of ATP (figure S4) with the one obtained in presence of ATP (figure 4B) also indicates ATP driven protein chain elongation.” on page 7 and updated the method section as “To test the robustness we have also estimated the 2D free energy profile of Aβ40 in absence of ATP by performing a similar REMD simulation followed by adaptive sampling simulation following the similar protocol described above.” on page 20.
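
      For readers unfamiliar with how a 2D free energy map over two collective variables (here Rg and the number of contacts) is commonly built, the following is a minimal sketch of the standard histogram estimate F = k_B T ln(N_max / N_bin). It is an assumption that this estimator resembles the one used in the manuscript; the response does not state the method, and the bin widths, the temperature and the sample values are hypothetical.

```python
import math
from collections import Counter

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def free_energy_2d(samples, x_width, y_width, temperature=300.0):
    """Histogram-based free energy map.

    samples: iterable of (x, y) pairs, e.g. (Rg in nm, number of contacts).
    Returns {(ix, iy): F in kJ/mol}, shifted so the most populated bin is at zero.
    The default temperature is an illustrative assumption.
    """
    counts = Counter((math.floor(x / x_width), math.floor(y / y_width))
                     for x, y in samples)
    n_max = max(counts.values())
    kT = K_B * temperature
    # F = -kT ln(P / P_max) = kT ln(N_max / N); unvisited bins are simply absent.
    return {b: kT * math.log(n_max / n) for b, n in counts.items()}

# Hypothetical samples: a compact state visited four times as often as an extended one.
samples = [(1.05, 105)] * 4 + [(1.55, 85)] * 1
fes = free_energy_2d(samples, x_width=0.1, y_width=10)
for bin_index, f in sorted(fes.items()):
    print(bin_index, round(f, 3))
```

      Two maps built this way from different sampling schemes can then be compared bin by bin; for replica-exchange data, only frames from the replica at the temperature of interest would normally enter the histogram.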

      ThT/TEM experiments should be performed on Aβ40 fibrils rather than on Aβ(16-22) aggregates. Moreover, to elucidate their experimental results that ATP can dissolve preformed Aβ fibrils, the authors need to study the influence of ATP on Aβ fibrils instead of on Aβ dimer in their MD simulations. The novelty of this study is limited. The role of ATP in inhibiting Aβ fibril formation and dissolving preformed Aβ fibrils has been reported in previous experimental and computational studies (Journal of Alzheimer's Disease, 2014, 41: 561; Science 2017, 356, 753-756; J. Phys. Chem. B 2019, 123, 9922−9933; Scientific Reports, 2024, 14: 8134). However, most of those papers are not discussed in this manuscript. Additionally, some details of MD simulations and data analysis are missing in the manuscript, including the initial structures of all the simulations, the method for free energy calculation, the dielectric constant used, etc.

      We thank the reviewer for pointing out additional papers on ATP that were not discussed in the original manuscript. While some of the suggested papers were already cited (Science 2017, 356, 753-756), we had initially excluded the others as we did not find them directly relevant to our focus. However, in this revised version, we have included those references (on page 17 and 18).

      Through a thorough literature review, including the papers suggested by the reviewer, we maintain that our article is novel in its investigation of ATP's role in the protein conformational landscape and its correlation with anti-aggregation effects. While previous reports emphasize ATP's role in inhibiting protein aggregation, our work connects these findings by highlighting ATP's influence starting at the monomeric level, thereby preventing proteins from becoming aggregation-prone.

      In the revised manuscript, we have included this justification as “While previous reports emphasize ATP's role in inhibiting protein aggregation, our work connects these findings by highlighting ATP's influence starting at the monomeric level, thereby preventing proteins from becoming aggregation-prone.” on page 18.

      Regarding the reviewer's concern about the details of the MD simulations, we would like to mention that the Methods section of the current article provides an elaborate explanation of the simulation setup and characterization (on pages 19-21). Regarding the reviewer's comment on the dielectric constant, we would like to emphasize that we performed the simulations with explicit solvent (water molecules), which by default accounts for dielectric effects (unlike many approximate continuum modelling approaches).

      Reviewer #2 (Recommendations For The Authors):

      (1) The convergence of simulations needs to be verified prior to data analysis.

      We thank the reviewer for this suggestion. We have assessed the convergence of the simulations and represented the respective plots in Author response image 2.

      Author response image 2.

      The time profiles of temperature (a, c, e and g) and of the energies, i.e. kinetic energy, potential energy and total energy (b, d, f and h), are shown for Trp-cage in the absence (a-b) and presence (c-d) of 0.5 M ATP and for Aβ40 protein in the absence (e-f) and presence (g-h) of 0.5 M ATP.

      (2) "The precedent experiments investigating protein aggregation in the presence of ATP, had been performed by maintaining the ATP:protein stoichiometric ratio in the range of 0.1 × 10³ to 1.6 × 10³. Likewise, in our simulation with Trp-cage, the ATP:protein ratio of 0.02 × 10³ was maintained.". Clearly, there is a big difference between the ATP:protein ratio in the MD simulations and that in the precedent experiments.

      We thank the reviewer for raising this point. We would like to clarify that for unstructured proteins, including Aβ40, the ATP stoichiometry [1] ranged from 0.1 × 10³ to 1.6 × 10³. In our study, we have maintained the ATP stoichiometry at 0.1 × 10³ for the disordered protein Aβ40. For structured globular mini-protein like Trp-cage, a lower concentration of 0.02 × 10³ was used, consistent with other studies investigating the effects of ATP on globular proteins such as ubiquitin, lysozyme, and malate dehydrogenase, where the ATP stoichiometry ranged [13] from 0.01 × 10³ to 0.03 × 10³.

      In the revised manuscript we have clearly mentioned the point as “The precedent studies reporting the effect of ATP on structured proteins, had been performed by maintaining ATP:protein stoichiometric ratio in the range of 0.01 × 10³ to 0.03 × 10³. Likewise, in our simulation with Trp-cage, the ATP:protein ratio of 0.02 × 10³ was maintained.” on page 4 and “The former experiments investigating protein (unstructured) aggregation in presence of ATP, had been performed by maintaining ATP:protein stoichiometric ratio in the range of 0.1 × 10³ to 1.6 × 10³, similarly we have also maintained ATP/protein stoichiometry 0.1 × 10³ in our investigation ATP’s effect on disordered protein Aβ40.” on page 7.

      However, in our experimental measurements we maintained the protein concentration in the micromolar range and the ATP concentration in the millimolar range, which is consistent with previous in vitro experimental studies [1].

      (3) The snapshots in Figure 2G show that in the absence of ATP, the Trp-cage monomer exhibits only minor conformational changes compared to the NMR structure (PDB: 1L2Y). However, the native contact number of the Trp-cage monomer (~18, Figure 2C) is much smaller than the total contact number (~160, Figure 2B). The authors are suggested to explain this unexpectedly large difference.

      We thank the reviewer for raising this concern about the native and total contact numbers of Trp-cage. The total number of contacts is the cumulative count of intra-protein contacts, where a contact is counted whenever two atoms of the protein come within the cut-off distance (0.8 nm). The native contact number, in contrast, considers only the key contacts between the side chains of amino acids that are not adjacent in the amino acid sequence.
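
      The difference in scale between the two counts can be seen in a minimal sketch that contrasts an atom-pair count with a residue-pair count. The function names, the sequence-separation threshold, the reuse of the 0.8 nm cut-off for residue pairs and the toy coordinates are all illustrative assumptions; this is not the analysis code used in the study, and the study's native-contact definition (side-chain based) is only approximated here.

```python
import math
from itertools import combinations

def total_atom_contacts(atoms, cutoff=0.8):
    """Count every pair of atoms closer than `cutoff` (nm).

    atoms: list of (residue_index, (x, y, z)) tuples.
    """
    return sum(1 for (_, a), (_, b) in combinations(atoms, 2)
               if math.dist(a, b) < cutoff)

def residue_pair_contacts(atoms, cutoff=0.8, min_separation=2):
    """Count residue pairs, at least `min_separation` apart in sequence,
    that have at least one atom pair closer than `cutoff` (nm)."""
    pairs = set()
    for (ri, a), (rj, b) in combinations(atoms, 2):
        if abs(ri - rj) >= min_separation and math.dist(a, b) < cutoff:
            pairs.add((min(ri, rj), max(ri, rj)))
    return len(pairs)

# Toy system: three residues with two atoms each, all within 0.7 nm of one another.
toy_atoms = [
    (0, (0.0, 0.0, 0.0)), (0, (0.1, 0.0, 0.0)),
    (1, (0.4, 0.0, 0.0)), (1, (0.5, 0.0, 0.0)),
    (2, (0.6, 0.0, 0.0)), (2, (0.7, 0.0, 0.0)),
]
print(total_atom_contacts(toy_atoms))    # all 15 atom pairs are within the cut-off
print(residue_pair_contacts(toy_atoms))  # only residues 0 and 2 are non-adjacent: 1 pair
```

      Because the atom-pair count grows roughly with the square of the number of nearby atoms, while the residue-pair count is bounded by the number of non-adjacent residue pairs, a large gap between the two numbers is expected even for a small folded protein.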

      (4) The authors are suggested to calculate the contact numbers of each residue with different parts of ATP (phosphate group, base, and sugar moiety), which will help to reveal the key interactions between ATP and proteins.

      The authors thank the reviewer for this comment. According to the suggestion we have calculated the contact probability of each residue of protein with ATP as depicted in Author response image 3 and 4 for Trp-cage and Aβ40 respectively.

      Author response image 3.

      The figure shows the residue wise contact probability of protein Trp-cage with ATP.

      Author response image 4.

      The image shows the residue wise contact probability of Aβ40 protein with ATP.
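
      A residue-wise contact probability of the kind shown in Author response images 3 and 4 is commonly computed as the fraction of trajectory frames in which any atom of a residue lies within a cut-off of any ATP atom. The sketch below assumes that definition; the cut-off value, the data layout and the function name are hypothetical and are not taken from the manuscript.

```python
import math

def residue_contact_probability(frames, cutoff):
    """Fraction of frames in which each residue contacts the ligand.

    frames: list of (residues, ligand_atoms) tuples, where `residues` is a list
            with one entry per residue, each entry a list of (x, y, z) atom
            coordinates, and `ligand_atoms` is a list of (x, y, z) coordinates.
    cutoff: contact distance in the same unit as the coordinates.
    """
    n_residues = len(frames[0][0])
    hits = [0] * n_residues
    for residues, ligand_atoms in frames:
        for r, atoms in enumerate(residues):
            # A residue is in contact if any of its atoms is near any ligand atom.
            if any(math.dist(a, l) < cutoff for a in atoms for l in ligand_atoms):
                hits[r] += 1
    return [h / len(frames) for h in hits]

# Toy trajectory: two residues, two frames; the ligand approaches residue 0 once.
protein = [[(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]
toy_frames = [
    (protein, [(0.3, 0.0, 0.0)]),   # ligand 0.3 from residue 0, 1.7 from residue 1
    (protein, [(0.0, 5.0, 0.0)]),   # ligand far from both residues
]
print(residue_contact_probability(toy_frames, cutoff=0.5))
```

      Restricting `ligand_atoms` to the phosphate, sugar or base atoms would give the moiety-resolved profiles that the reviewer asked about.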

      For details of ATP’s region-specific interactions with proteins, we refer to the calculation of the preferential binding coefficient and interaction energies depicted in Figure 3 for Trp-cage (on page 6) and in Figures 5 and 8 for Aβ40 protein. These figures illustrate the mode of protein interaction with the chemically distinct regions of ATP and also illuminate ATP’s interaction with different parts of the proteins.

      (5) The authors claimed that "coulombic interaction of ATP with protein predominates in Aβ40 (Figure 5 H)" (Page 10). However, the preferential interaction coefficient in Figure 5G shows that the curve of the phosphate group lies below the other two curves when distance < 1 nm, indicating the relatively weak interactions between the phosphate group and Aβ40. This seems to be in conflict with the results of energy calculation (Figure 5H).

      We thank the reviewer for raising this point. The author would like to emphasize that ATP, with its large and highly charged phosphate group, is highly likely to interact with intrinsically disordered proteins (IDPs) primarily through electrostatic interactions due to their significant charge content. In Figure 5G, it is evident that the preferential binding coefficient reaches a notably high value, indicating strong interaction between the protein and the charged phosphate group of ATP. To address the reviewer's concern regarding the curve showing the highest interaction value only after 1 nm, we would like to highlight the nature of long-range electrostatic potential, which is active in the range of approximately 1-1.2 nm [14–16]. Furthermore, Figure 5H confirms that the electrostatic interaction between the protein and ATP is favorable and predominates over the Lennard-Jones (LJ) interaction.

      (6) There are several issues with citations. For example, references 2, 5, 24, 28, 32, 45, 49 and 53 are the same paper, references 1, 7, and 14 are the same paper, references 12, 15, and 46 are the same paper, and many more. In addition, the title of reference 12/15 is "ATP Controls the Aggregation of Aβ16-22 Peptides" instead of "ATP Controls the Aggregation of Aβ Peptides".

      We thank the reviewer for pointing this out. We have addressed the issue in the updated manuscript.

      (7) References 19 and 20 are cited in the context of "As a potential function of the excess ATP concentration within the cell, a substantial influence on cellular protein homeostasis is observed, particularly in preventing protein aggregation (14-21)" (Page 2). However, there is no mention of "ATP" in ref. 19 and 20.

      Thank you to the reviewer for identifying this mistake. We have corrected the issue in the revised manuscript.

      (8) On page 22: "To perform all the molecular dynamics (MD) simulations GROMACS software of version 20xx software was utilized". Please provide the version of GROMACS software used in this study.

      In the updated manuscript, we have specified the particular version of Gromacs software (2018.6) used for our simulations. (see revised manuscript page 19)

      (9) In Figure 8J, the time-dependent distance of Aβ40 dimer without ATP needs to be provided as a comparison.

      We thank the reviewer for this comment. In the revised manuscript we have updated the calculation of the distance between the Aβ40 protein chains to cover both the absence and the presence of ATP, and have added the sentence “The probability distribution (Figure 8J) illustrates that, in the presence of ATP, the two protein chains, initially part of the dimer, become prone to be moved away from each other.” (page 15).

      (10) The authors should compare ATP-Aβ interactions with NaXS-Aβ interactions to understand why ATP is more efficient than NaXS in inhibiting interprotein interactions.

      The authors thank the reviewer for the concern regarding the ATP-Aβ40 interaction compared to the NaXS-Aβ40 interaction. We would like to highlight our results (Figure 5G and H) which demonstrate the dominance of Coulombic interactions (over LJ interactions) of ATP with the protein. Based on this, we compared the Coulombic interaction energy of ATP and NaXS with the protein Aβ40, as depicted in Figure 9I. We observed that ATP-protein electrostatic interactions occur more favorably than those with NaXS, leading to better action of ATP over NaXS. The favorable electrostatic interaction of ATP with the protein, compared to NaXS, is evident because ATP possesses a large and highly charged triphosphate group that can strongly interact with the protein, whereas NaXS contains a very small sulfonate group with much less charge. Therefore, due to the favorable Coulombic interaction of ATP with the protein over NaXS, ATP acts more efficiently as a hydrotrope. In the revised manuscript we have highlighted the term “Coulombic interaction” in the main text and in the figure caption (Figure 9) as well (in page 15 and 16 of the revised manuscript respectively).

      (11) The word "sollubilizer" in the Abstract is a typo.

      We thank the reviewer for pointing this out. We have made the necessary corrections in the revised manuscript.

      (12) What does "ATP-Mg2+" mean in the manuscript?

      ATP, being polyanionic and possessing a potentially chelating polyphosphate group, binds metal cations with high affinity and hence biologically it occurs to be complexed with the equivalent number of Mg2+ in the form of ATP-Mg [17–19]. Similarly multiple former studies utilized ATP-Mg in their investigations [1,20–22].

      Reviewer #3 (Public Review):

      Summary:

      Since its first experimental report in 2017 (Patel et al. Science 2017), there have been several studies on the phenomenon in which ATP functions as a biological hydrotrope of protein aggregates. In this manuscript, by conducting molecular dynamics simulations of three different proteins, Trp-cage, Abeta40 monomer, and Abeta40 dimer at a high concentration of ATP (0.1, 0.5 M), Sarkar et al. find that the amphiphilic nature of ATP, arising from its molecular structure consisting of phosphate group (PG), sugar ring, and aromatic base, enables it to interact with proteins in a protein-specific manner and prevents their aggregation and solubilize if they aggregate. The authors also point out that in comparison with NaXS, which is the traditional chemical hydrotrope, ATP is more efficient in solubilizing protein aggregates because of its amphiphilic nature.

      Trp-cage, featured with a hydrophobic core in its native state, is denatured at high ATP concentration. The authors show that the aromatic base group (purine group) of ATP is responsible for inducing the denaturation of helical motifs in the native state.

      For Abeta40, which can be classified as an IDP with charged residues, it is shown that ATP disrupts the salt bridge (D23-K28) required for the stability of beta-turn formation.

      By showing that ATP can disassemble preformed protein oligomers (Abeta40 dimer), the authors argue that ATP is "potent enough to disassemble existing protein droplets, maintaining proper cellular homeostasis," and enhancing solubility.

      Overall, the message of the paper is clear and straightforward to follow. I did not follow all the literature, but I see in the literature search, that there are several studies on this subject. (J. Am. Chem. Soc. 2021, 143, 31, 11982-11993; J. Phys. Chem. B 2022, 126, 42, 8486-8494; J. Phys. Chem. B 2021, 125, 28, 7717-7731; J. Phys. Chem. B 2020, 124, 1, 210-223).

      If this study is indeed the first one to test using MD simulations whether ATP is a solubilizer of protein aggregates, it may deserve some attention from the community. But, the authors should definitely discuss the content of existing studies, and make it explicit what is new in this study.

      Strengths:

      The authors showed that due to its amphiphilic nature, ATP can interact with different proteins in a protein-specific manner, a finding more general and specific than merely calling ATP a biological hydrotrope.

      Weaknesses:

      (1) My only major concern is that the simulations were performed at unusually high ATP concentrations (100 and 500 mM of ATP), whereas the real cellular concentration of ATP is 1-5 mM. Even if ATP is a good solubilizer of protein aggregates, the actual concentration should matter. I was wondering if there is a previous report on a titration curve of protein aggregates against ATP, and what is the transition mid-point of ATP-induced solubility of protein aggregates.

      For instance, urea or GdmCl have long been known as the non-specific denaturants of proteins, and it has been well experimented that their transition mid-point of protein unfolding is ~(1 - 6) M depending on the proteins.

      We thank the reviewer for their concern regarding the ATP concentration used in our simulation. The reviewer correctly noted our statement about cellular ATP concentrations being in the range of a few millimolar. We would like to highlight that, in a cellular environment, millimolar ATP concentrations coexist with micromolar protein concentrations in the aqueous phase.

      In our study, we focused on the impact of ATP on protein conformational dynamics, primarily simulating a protein monomer within the simulation box. To maintain a micromolar protein concentration (e.g., 20 μM [1]) for a monomeric protein, a simulation box of significant dimensions (~44x44x44 nm³) would be required. This size would be computationally challenging to simulate at an atomistic resolution due to the excessive computational cost and time.

      To ensure computational efficiency, we employed millimolar protein concentrations instead of micromolar, thus requiring a higher ATP concentration to maintain the cellular protein stoichiometry. Similar to the stoichiometry in the cellular environment (i.e., micromolar protein : millimolar ATP ~ 10³), our simulations maintained a consistent ratio (i.e., millimolar protein : molar ATP ~ 10³). This approach allowed us to use a smaller simulation box while preserving the relevant stoichiometry, enabling us to obtain data within a realistic timeframe.
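
      The box-size argument above can be checked with a short calculation: one molecule at molar concentration c occupies a volume 1/(N_A c), and the cube root of that volume gives the edge of a cubic box holding a single protein. The sketch below is illustrative only; the function names are hypothetical, and the 5 mM protein / 0.5 M ATP pairing is an example consistent with a 0.1 × 10³ ratio, not a box size reported in the manuscript.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
NM3_PER_LITRE = 1e24       # 1 L = 1 dm^3 = (1e8 nm)^3

def box_edge_nm(conc_molar, n_molecules=1):
    """Edge (nm) of a cubic box holding `n_molecules` at concentration `conc_molar` (mol/L)."""
    volume_litre = n_molecules / (AVOGADRO * conc_molar)
    return (volume_litre * NM3_PER_LITRE) ** (1.0 / 3.0)

def molecules_in_box(conc_molar, edge_nm):
    """Number of molecules at `conc_molar` (mol/L) inside a cubic box of edge `edge_nm`."""
    return conc_molar * AVOGADRO * edge_nm ** 3 / NM3_PER_LITRE

# One protein at 20 micromolar: edge of about 43.6 nm, in line with the ~44 nm quoted above.
print(round(box_edge_nm(20e-6), 1))

# Illustrative scaled-up system: one protein at 5 mM needs a far smaller box,
# and 0.5 M ATP in that box corresponds to about 100 ATP molecules per protein.
edge = box_edge_nm(5e-3)
print(round(edge, 2), round(molecules_in_box(0.5, edge)))
```

      Scaling both concentrations by the same factor leaves the ATP:protein ratio unchanged while shrinking the box volume by that factor, which is the trade-off this response describes.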

      Based on the reviewer’s comment, we have included the explanation in the revised manuscript as “In this study, we opted to maintain the ATP stoichiometry consistent with biological conditions and previous in vitro experiments. Instead of keeping the protein concentration within the micromolar range and ATP concentration at the millimolar level, we chose this approach to avoid the need for an extremely large simulation box, which would greatly reduce computational efficiency by more than 150-fold.” (page 4).

      However, in our experimental measurements we maintained the protein concentration in the micromolar range and the ATP concentration in the millimolar range, which is consistent with previous in vitro experimental studies [1].

      (2) The sentence "... a clear shift of relative population of Abeta40 conformational subensemble towards a basin with higher Rg and lower number of contacts in the presence of ATP" is not a precise description of Figures 4A and 4B. It is not clear from the figures whether the Rg of Abeta40 is increased when Abeta40 is subject to ATP. The authors should give a more precise description of what is observed in the result from their simulations or consider a better-order parameter to describe the change in molecular structure.

      We thank the reviewer for this comment. Figures 4A and 4B, which depict the 2D free energy profile of the Aβ40 protein with respect to Rg and the total number of contacts, are presented to pinpoint the alteration of the protein conformational landscape under the influence of ATP. To further elucidate the ATP-driven conformational alteration, overlaid snapshots corresponding to the absence and presence of ATP were also provided. Taken together, we believe that the descriptions of Figures 4A and 4B in the article are appropriate and effectively incorporate the analysis provided in the article.

      In addition, the disruption of beta-sheet from Figure 4E to 4F is not very clear. The authors may want to use an arrow to indicate the region of the contact map associated with this change.

      In the revised manuscript the authors have highlighted the region of the contact map associated with the changes in the beta-sheet propensity with an arrow for each of the plots.

      Although the full atomistic simulations were carried out, the analyses demonstrated in this study are a bit rudimentary and coarse-grained (e.g, Rg is a rather poor order parameter to discuss dynamics involved in proteins). The authors could go beyond and say more about how ATP interacts with proteins and disrupts the stable configurations.

      We thank the reviewer for this comment. We understand the reviewer's concern regarding the choice of the order parameter (Rg), which has been a topic of long-standing debate. However, we would like to note that in the current study, we employed Rg based on recent investigations by the D. E. Shaw Research group [23] (specifically concerning the protein Aβ40 and the Charmm36m force field), which reported an almost negligible Rg penalty compared to experimental values. Experiments characterizing IDPs also use Rg as their metric of choice. We would also like to highlight that previous investigations by our group have carefully benchmarked several features of proteins as well as IDPs using both linear and artificial neural network based dimension reduction techniques and have demonstrated that Rg, in combination with the fraction of native contacts, serves as an optimal feature set [24,25]. Therefore, we believed that Rg would be a suitable order parameter for analyzing the structural behavior of this protein. Additionally, we have also analyzed other relevant characteristics, including the total number of contacts, residue-wise protein contact map, percentage of secondary structure, solvent-accessible surface area, and distances between key interacting residues, to provide a comprehensive understanding.

      The justification of our choice of collective variable has been discussed in the revised manuscript as “Since multiple previous studies has reported benchmarking of several features of proteins as well as IDPs using both linear and artificial neural network based dimension reduction techniques and have demonstrated that Rg, in combination with fraction of native contact serves as optimum features, we have chosen these two metrics for developing the 2D free energy profile.” on page 4.

      (3) Although the amphiphilic character of ATP is highlighted, a similar comment can be made as to GTP. Is GTP, whose cellular concentration is ~0.5 mM, also a good solubilizer of protein aggregates? If not, why? Please comment.

      In response to the reviewer’s comment on comparing ATP’s effect with that of other nucleotides such as GTP, we would like to highlight that previous studies have shown GTP’s ability to dissolve protein droplets (FUS) with similar efficiency to ATP [1,26]. However, in cells, the concentration of GTP is much lower than that of ATP, resulting in negligible effects on the solubilization of liquid compartments in vivo [1].

      According to the suggestion of the reviewer we have included the discussion in the revised manuscript as “Comparing the effects of ATP with other nucleotides such as ADP and GTP, we emphasize that previous studies have demonstrated GTP can dissolve protein droplets (such as FUS) with efficiency comparable to ATP. However, in vivo, the concentration of GTP is significantly lower than that of ATP, resulting in negligible impact on the solubilization of liquid compartments. In contrast, ADP and AMP show much lower efficiency in dissolving protein condensates, indicating the critical role of the triphosphate moiety in protein condensate dissolution. Furthermore, only TP-Mg exhibited a negligible effect on protein droplet dissolution, suggesting that the charge density in the ionic ATP side chain alone is insufficient for this process. These findings underscore ATP's superior efficacy as a protein aggregate solubilizer, attributed to its specific chemical structure rather than merely its amphiphilicity.” (page 15).

      Reviewer #3 (Recommendations For The Authors):

      Spell-check should be carried out throughout the manuscript. e.g., sollubilizer, sollubilizing, ...

      We thank the reviewer for pointing this out. We have made the necessary corrections in the revised manuscript.

      The reference section should be properly organized. There are multiple repetitions of references (e.g., references 28, 30, 32 are the same reference). I see many instances of this.

      We thank the reviewer for pointing this out. We have addressed the issue in the updated manuscript.

      References:

      (1) Patel, A.; Malinovska, L.; Saha, S.; Wang, J.; Alberti, S.; Krishnan, Y.; Hyman, A. A. ATP as a Biological Hydrotrope. Science 2017, 356 (6339), 753–756.

      (2) Ren, C.-L.; Shan, Y.; Zhang, P.; Ding, H.-M.; Ma, Y.-Q. Uncovering the Molecular Mechanism for Dual Effect of ATP on Phase Separation in FUS Solution. Sci Adv 2022, 8 (37), eabo7885.

      (3) Song, J. Adenosine Triphosphate Energy-Independently Controls Protein Homeostasis with Unique Structure and Diverse Mechanisms. Protein Sci. 2021, 30 (7), 1277–1293.

      (4) Liu, F.; Wang, J. ATP Acts as a Hydrotrope to Regulate the Phase Separation of NBDY Clusters. JACS Au 2023, 3 (9), 2578–2585.

      (5) Chu, X.-Y.; Xu, Y.-Y.; Tong, X.-Y.; Wang, G.; Zhang, H.-Y. The Legend of ATP: From Origin of Life to Precision Medicine. Metabolites 2022, 12 (5). https://doi.org/10.3390/metabo12050461.

      (6) Tian, Z.; Qian, F. Adenosine Triphosphate-Induced Rapid Liquid-Liquid Phase Separation of a Model IgG1 mAb. Mol. Pharm. 2021, 18 (1), 267–274.

      (7) Wang, B.; Zhang, L.; Dai, T.; Qin, Z.; Lu, H.; Zhang, L.; Zhou, F. Liquid-Liquid Phase Separation in Human Health and Diseases. Signal Transduct Target Ther 2021, 6 (1), 290.

      (8) Alberti, S.; Dormann, D. Liquid-Liquid Phase Separation in Disease. Annu. Rev. Genet. 2019, 53, 171–194.

      (9) Nair, K. S. Aging Muscle. Am. J. Clin. Nutr. 2005, 81 (5), 953–963.

      (10) Recharging Mitochondrial Batteries in Old Eyes. Near Infra-Red Increases ATP. Exp. Eye Res. 2014, 122, 50–53.

      (11) Goldberg, J.; Currais, A.; Prior, M.; Fischer, W.; Chiruta, C.; Ratliff, E.; Daugherty, D.; Dargusch, R.; Finley, K.; Esparza-Moltó, P. B.; Cuezva, J. M.; Maher, P.; Petrascheck, M.; Schubert, D. The Mitochondrial ATP Synthase Is a Shared Drug Target for Aging and Dementia. Aging Cell 2018, 17 (2). https://doi.org/10.1111/acel.12715.

      (12) Kagawa, Y.; Hamamoto, T.; Endo, H.; Ichida, M.; Shibui, H.; Hayakawa, M. Genes of Human ATP Synthase: Their Roles in Physiology and Aging. Biosci. Rep. 1997, 17 (2), 115–146.

      (13) Ou, X.; Lao, Y.; Xu, J.; Wutthinitikornkit, Y.; Shi, R.; Chen, X.; Li, J. ATP Can Efficiently Stabilize Protein through a Unique Mechanism. JACS Au 2021, 1 (10), 1766–1777.

      (14) Norberg, J.; Nilsson, L. On the Truncation of Long-Range Electrostatic Interactions in DNA. Biophys. J. 2000, 79 (3), 1537–1553.

      (15) Pabbathi, A.; Coleman, L.; Godar, S.; Paul, A.; Garlapati, A.; Spencer, M.; Eller, J.; Alper, J. D. Long-Range Electrostatic Interactions Significantly Modulate the Affinity of Dynein for Microtubules. Biophys. J. 2022, 121 (9), 1715–1726.

      (16) Sastry, M. Nanoparticle Thin Films: An Approach Based on Self-Assembly. In Handbook of Surfaces and Interfaces of Materials; Elsevier, 2001; pp 87–123.

      (17) Wilson, J. E.; Chin, A. Chelation of Divalent Cations by ATP, Studied by Titration Calorimetry. Anal. Biochem. 1991, 193 (1), 16–19.

      (18) Storer, A. C.; Cornish-Bowden, A. Concentration of MgATP2- and Other Ions in Solution. Calculation of the True Concentrations of Species Present in Mixtures of Associating Ions. Biochem. J 1976, 159 (1), 1–5.

      (19) Garfinkel, L.; Altschuld, R. A.; Garfinkel, D. Magnesium in Cardiac Energy Metabolism. J. Mol. Cell. Cardiol. 1986, 18 (10), 1003–1013.

      (20) Hautke, A.; Ebbinghaus, S. The Emerging Role of ATP as a Cosolute for Biomolecular Processes. Biol. Chem. 2023, 404 (10), 897–908.

      (21) Pal, S.; Roy, R.; Paul, S. Deciphering the Role of ATP on PHF6 Aggregation. J. Phys. Chem. B 2022, 126 (26), 4761–4775.

      (22) Pal, S.; Paul, S. ATP Controls the Aggregation of Aβ Peptides. J. Phys. Chem. B 2020, 124 (1), 210–223.

      (23) Robustelli, P.; Piana, S.; Shaw, D. E. Developing a Molecular Dynamics Force Field for Both Folded and Disordered Protein States. Proc. Natl. Acad. Sci. U. S. A. 2018, 115 (21), E4758–E4766.

      (24) Ahalawat, N.; Mondal, J. Assessment and Optimization of Collective Variables for Protein Conformational Landscape: GB1 β-Hairpin as a Case Study. J. Chem. Phys. 2018, 149 (9), 094101.

      (25) Menon, S.; Adhikari, S.; Mondal, J. An Integrated Machine Learning Approach Delineates Entropy-Mediated Conformational Modulation of α-Synuclein by Small Molecule, 2024. https://doi.org/10.7554/elife.97709.1.

      (26) Pandey, M. P.; Sasidharan, S.; Raghunathan, V. A.; Khandelia, H. Molecular Mechanism of Hydrotropic Properties of GTP and ATP. J. Phys. Chem. B 2022, 126 (42), 8486–8494.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      In the presented study, the authors aim to explore the role of nociceptors in the fine particulate matter (FPM) mediated Asthma phenotype, using rodent models of allergic airway inflammation. This manuscript builds on previous studies and identifies transcriptomic reprogramming and an increased sensitivity of the jugular nodose complex (JNC) neurons, one of the major sensory ganglia for the airways, on exposure to FPM along with Ova during the challenge phase. The authors then use QX-314, a selectively permeable form of lidocaine, and TRPV1 knockouts to demonstrate that nociceptor blocking can reduce airway inflammation in their experimental setup. The authors further identify the presence of Gfra3 on the JNC neurons, a receptor for the protein Artemin, and demonstrate their sensitivity to Artemin as a ligand. They further show that alveolar macrophages release Artemin on exposure to FPM.

      We thank the reviewer for their valuable comments, which have significantly enhanced the quality of our manuscript. A point-by-point rebuttal is provided below.

      Strength

      The study builds on results available from multiple previous work and presents important results which allow insights into the mixed phenotypes of Asthma seen clinically. In addition, by identifying the role of nociceptors, they identify potential therapeutic targets which bear high translational potential.

      Weakness

      While the results presented in the study are highly relevant, there is a need for further mechanistic dissection to allow better inferences. Currently certain results seem associative. Also, certain visualisations and experimental protocols presented in the manuscript need careful assessment and interpretation. While Asthma is a chronic disease, the presented results are particularly important to explore Asthma exacerbations in response to acute exposure to air pollutants. This is relevant in today's age of increasing air pollution and increasing global travel.

      Major

      The JNC is a major group of neurons responsible for receiving sensory inputs from the airways. However, the DRG also contains nociceptors and is known to receive afference from the upper airways. An explanation of why the study was restricted to the JNC would be important.

      We acknowledge that some afferents to the upper airways do arise from the DRG, specifically in the upper thoracic segments (T1–T5). We have added a statement in the text to note this subset of nociceptive and spinally mediated pathways. However, the preponderance of evidence indicates that the majority of airway and lung afferents (70–80%, sometimes up to 90%) originate from the jugular–nodose complex (JNC). Given this large imbalance, and because our study focuses on the mechanosensory and chemosensory functions mediated primarily by the JNC, we restricted our analysis to this main vagal pathway. By contrast, DRG innervation, though functionally important for nociception and irritation-related reflexes, accounts for a smaller yet significant (~20–30%) fraction of the total afferent pool. The referenced tracing studies[1,2] support this distribution and are cited to clarify our rationale for emphasizing the JNC in our work.

      Similarly, the role of the Artemin in the study remains associative. The study results present that Artemin sensitize nociceptors to lead to an increased inflammatory response (Supplementary Figure 2), however, both upstream and downstream evidence for this inference needs to be dissected further. For instance, the evidence for the role of Artemin in the model comes from ex vivo experiments with alveolar macrophages, but not in the experimental model created. Blocking or activation experiments could be performed, along with investigating the change in the total number of nociceptors with Artemin exposure. Similarly, the downstream effects of the potential Artemin-mediated JNC stimulation should be explored in the context of this experimental setup. A detailed dissection of the mechanisms is important. Additionally, it is also important to discuss the hypothesis leading to the selection of Artemin as a target, which currently seems arbitrary.

      Our data show that (i) OVA-FPM-exposed AMs secrete Artemin and (ii) exogenous recombinant Artemin can sensitize nociceptors, potentially heightening the inflammatory response. As suggested, we agree that more upstream and downstream evidence is needed for definitive mechanistic insight. In response, we have expanded our experiments to include intravital microscopy, which demonstrates impaired motility of alveolar macrophages and neutrophils in nociceptor-ablated mice, suggesting a bidirectional crosstalk between AMs and nociceptor neurons.

      In future studies, we will perform blocking or activation studies to clarify Artemin’s in vivo effects and confirm its role in modulating airway nociceptors. We also recognize the importance of examining whether Artemin exposure alters the phenotype of these neurons and lung innervation density. As recommended, we plan targeted interventions (e.g., Artemin-neutralizing antibodies or overexpression strategies) to delineate the mechanisms by which Artemin-mediated nociceptor stimulation influences the local inflammatory environment.

      We have expanded our discussion to clarify that Artemin is a recognized growth factor known to sensitize certain sensory neurons, including those responsive to tissue injury and inflammation. This literature-based rationale guided our hypothesis that Artemin might increase nociceptor reactivity in the lung and thereby influence alveolar macrophage function. By combining ex vivo and intravital approaches, we have begun to map these interactions but agree that further in vivo studies are necessary to confirm causality, dissect signal transduction pathways, and fully validate Artemin’s contributions to AM–nociceptor crosstalk. We have revised our manuscript accordingly to highlight these limitations.

      A deeper exploration of the inflammatory parameters could be performed. The multiplex cytokine analysis shows a reduction in certain cytokines like IL-6 and MCP (Figure 3F), which needs to be discussed. Additionally, investigating the change in proportions of the different immune cell populations is important; this is currently restricted to the eosinophil and neutrophil counts in the BAL. This is also important as the study builds on work from Prof. Chang's group, which also identified the expansion of an invariant iNKT cell population by FPM, regulatory in nature. Adding data on airway hyperresponsiveness, if possible, would be a welcome addition, considering Asthma as the disease context.

      We thank the reviewer for highlighting the need for a more comprehensive exploration of inflammatory parameters. To address these concerns:

      (1) Cytokine Analysis: We re-ran all statistical analyses, including the CBA and ELISA assays, and confirmed that TNFα and Artemin are the only differentially expressed cytokines across experimental groups. We have expanded the Discussion to emphasize TNFα’s role in this context.

      (2) Immune Cell Profiling in BALF: Our data show that co-exposure with FPM exacerbates the infiltration of CD45+ cells, eosinophils, neutrophils, T cells and monocytes. Notably, CD45+ cells and neutrophils were the only populations reduced under nociceptor neuron loss-of-function conditions (QX-314-treated or TRPV1-DTA mice, Author response image 1).

      Of note, we also confirmed these data using intravital imaging and in a second line of nociceptor-ablated mice (NaV1.8DTA). We are aware of Prof. Chang’s work suggesting expansion of an invariant iNKT cell population, and we intend to examine this population in future studies.

      (3) Airway Hyperresponsiveness (AHR): We recognize that adding AHR data would strengthen the asthma-related context. Unfortunately, we are not currently equipped to perform AHR measurements, but we intend to include this in future experiments to provide a more complete assessment of airway function.

      Author response image 1.

      The authors could revisit the data presented in terms of visualization. For instance, the pooled data presented in some of the figures is probably leading to a wide variation which makes interpretation more difficult. Presenting data separately for each experimental replicate might help the reader. This is also important considering the possible variation seen between experiments (for instance, in Figure 3A and 3C and 3B and 3D, the neutrophil and eosinophil panels for the same groups seem to have an almost 2-fold difference.). Similarly, in the cytokine analysis, the authors have used a common axis for depicting all cytokine values which leads to difficulties in interpretation (Figure 3F). Analysis of the RNA seq results and the DEGs could be revisited to include pathway analysis etc (Figure 2), and the supplementary information could include detailed lists of the major target genes.

      To address this query, we have completely reformatted all graphs and included both gene lists and lists of enriched pathways for all three comparisons in Supplementary Table 1. We also confirmed our flow cytometry analysis functionally by performing intravital imaging.

      The authors should also consider citing the previous experimental setup used for some particular protocols. For instance, the use of the specified protocol for OVA in a C57 background needs to be justified, as there are various protocols reported in the literature. Additionally, doses used in some experiments seem arbitrary (The FPM and Artemin exposure in Figure 4). Depicting the dose-response curve or citing previous literature for the same would be important. Similarly, different sample sizes seen in experiments should be explained, whether they are due to mortality, failure to exhibit phenotypes, or due to technical failures. The RNA seq experiment mentions only 2 biological replicates in one of the groups which should be addressed either by increasing the sample size or by replicating the experiment. Moreover, nested comparisons in experiments performed for Figure 1 need to be performed. Neurons isolated from each mouse should be maintained and analysed separately to retain biological replicates to better represent the heterogeneity.

      We appreciate the request for clarity regarding the experimental protocols and sample sizes:

      OVA Model in C57BL/6 Mice: We adapted a previously published OVA protocol in C57BL/6 mice[3-5] (PMID: 39661516), which uses two doses of sensitization to compensate for the lower Th2 response compared to BALB/c[6]. We increased the dose of OVA (100 µg) because our initial experiments produced low eosinophil infiltration. Although this dosage is on the higher side, some studies have noted local IFNγ induction in C57BL/6 mice; however, we did not detect IFNγ in our setup.

      FPM and Artemin Doses: We did not perform a full dose-response assay for FPM and Artemin but used 100 ng/mL as reported in prior literature, where TRPA1 and TRPV1 mRNA were upregulated after 18 hours of incubation[7]. This reference has been added for clarity.

      Sample Sizes and Exclusions: One control mouse was excluded from the RNA-seq experiment because a parallel PCA analysis indicated it was an outlier. This was the only exclusion in the study, and this has been indicated in the Methods section of the article.

      Nested Comparisons and Biological Replicates: We reanalyzed the relevant data with a nested one-way ANOVA and updated the figures accordingly. Measurements from the neurons isolated from each mouse were first averaged, so that each mouse is retained as a biological replicate and inter-animal heterogeneity is captured; the analysis was then run on these per-mouse averages.
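
      The per-mouse averaging step can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the study: the group names, mouse IDs, and values are hypothetical, and a plain one-way ANOVA on per-mouse means stands in as a simplified version of the nested test.

```python
from statistics import mean

# Hypothetical per-neuron measurements (arbitrary units), keyed by group and mouse ID.
# These numbers are made up for illustration and are not data from the study.
recordings = {
    "vehicle": {"m1": [0.9, 1.1, 1.0], "m2": [1.2, 1.0], "m3": [0.8, 1.0, 0.9]},
    "OVA": {"m4": [1.4, 1.6], "m5": [1.5, 1.3, 1.7], "m6": [1.2, 1.4]},
    "OVA+FPM": {"m7": [2.1, 2.3, 2.2], "m8": [1.9, 2.1], "m9": [2.4, 2.0]},
}


def per_mouse_means(groups):
    """Collapse each mouse's neurons to one mean so the mouse is the replicate."""
    return {
        group: [mean(cells) for cells in mice.values()]
        for group, mice in groups.items()
    }


def one_way_anova_f(samples):
    """Return (F, df_between, df_within) for a list of per-group samples."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Variation of individual replicates around their own group mean.
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


mouse_level = per_mouse_means(recordings)
f_stat, df_b, df_w = one_way_anova_f(list(mouse_level.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      The point of the design is that the degrees of freedom are set by the number of mice rather than the number of neurons, which avoids treating cells from the same animal as independent observations.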

      The manuscript should be more detailed regarding the statistics employed. Currently, there is a section mentioned in the methods section, but details of corrections employed and specific stats for specific experiments should be described. There are also some minor grammatical errors and incomplete sentences in the manuscript which should be corrected. The authors should also consider a more expansive literature review in the introduction/discussion sections.

      We have updated the figure legends and methods to include more detailed information on the specific statistical tests used for each experiment. In addition, we have fixed minor grammatical errors and incomplete sentences throughout the manuscript. Finally, we have expanded our Introduction and Discussion to include additional references and a broader literature context.

      Reviewer #2 (Public review):

      The authors sought to investigate the role of nociceptor neurons in the pathogenesis of pollution-mediated neutrophilic asthma.

      We thank the reviewer for their valuable comments, which have significantly enhanced the quality of our manuscript. A point-by-point rebuttal is provided below.

      Strength

      The authors utilize TRPV1 ablated mice to confirm effects of intranasally administered QX-314 utilized to block sodium currents. The authors demonstrate that artemin, which is upregulated in alveolar macrophages in response to pollution, sensitizes JNC neurons, thereby increasing their responsiveness to pollution. Ablation or inactivity of nociceptor neurons prevented the pollution-induced increase in inflammation.

      Weakness

      While neutrophilic, the model used doesn't appear to truly recapitulate a Th2/Th17 phenotype. No IL-17A is visible/evident in the BALF within the model (Figure 3F). The relevance of the RNAseq dataset is unclear; none of the identified DEGs were evaluated in the context of mechanism. The authors overall achieved the aim of demonstrating that nociceptor neurons are important to the pathogenesis of pollution-exacerbated asthma. Their results support their conclusions overall, although there are ways the study findings can be strengthened. This work further evaluates how nociceptor neurons contribute to asthma pathogenesis, which is important for consideration while proposing treatment strategies for undertreated asthma endotypes.

      Major

      Utilizing a different model for confirmation, one using house dust mite or Alternaria alternata or similar that is able to induce a true Th2/Th17 type response and that is also more translatable to humans.

      We appreciate the suggestion to use additional allergen models. In a pilot study, we did observe increased Artemin in the BALF of house dust mite–treated mice, although the levels were low under our current dosing schedule (20 µg/dose daily from Day 0–4 and Day 7–9, with sacrifice on Day 10; Author response image 2). Conversely, using an Alternaria alternata model at 100 µg/dose daily from Day 0–2 (sacrificed on Day 3) did not yield a detectable increase in Artemin. We suspect these findings may reflect the specific dose and timing used. We plan to refine our protocols (e.g., longer exposures or higher doses) for HDM and/or Alternaria to better model a Th2/Th17 response and further validate our observations in a setting more translatable to human asthma.

      Author response image 2.

      Additional analysis, maybe pathway analysis on the RNAseq dataset presented in Figure 2. Unclear how these genes are relevant/how they affect functionality. At present it is acceptable to say they are transcriptionally reprogramed, but no protein evaluation is provided which would get more at function, however, the authors do show some functional data in Figure 1, so maybe this could somehow be discussed/related to Figure 2.

      We have expanded our RNA-seq analysis to include gene lists and enriched pathways for all three comparisons in Supplementary Table 1. We have also revised our discussion to align these transcriptomic changes with the functional data shown in Figure 1. While we have not yet performed protein-level validation for all identified genes, the patterns observed in our RNA-seq dataset suggest pathways potentially tied to nociceptor activation and the downstream inflammatory response. We plan to conduct targeted protein analyses in future studies to further substantiate these findings.

      Histology and localization of neutrophils/nociceptor neurons/alveolar macrophages would enhance the study findings.

      We appreciate the reviewer’s suggestion to include histological data showing the distribution of neutrophils, nociceptor neurons, and alveolar macrophages. While we have not yet performed detailed histological staining of these cell types, we have added live in-vivo intravital microscopy data (Figure 4) that illustrate impaired AM and neutrophil motility in nociceptor-ablated mice. We plan to include additional histological analyses in future studies to further localize these cells in the lung tissue.

      Minor:

      The first 3 figures are small and hard to read.

      We have enlarged Figures 1 and 3 in the revised manuscript to improve readability. We have also added the corresponding gene lists and enriched pathways to Supplementary Table 1 for clarity.

      The figures are mislabeled in the text. Figure 2 is discussed twice in two different contexts; the second mention is supposed to be labeled as Figure 2.

      We corrected the mislabeled figures in the text, ensuring that each figure is referenced accurately.

      Figure 4 isn't cited in the text. I think it is supposed to be referenced in the paragraph before the discussion starts and is currently labeled as Figure 1.

      We have updated the text to properly cite Figure 4 in the relevant paragraph before the Discussion begins, rather than labeling it as Figure 1.

      Notating which statistical analysis was used with each figure/subfigure would be beneficial. Also, it's important to notate if the data was analyzed for multiple comparisons.

      We have revised each figure/subfigure legend to specify the statistical tests used, including information on whether corrections for multiple comparisons were applied. This provides a clearer understanding of how each dataset was analyzed.

      Reviewer #3 (Public review):

      Asthma is a complex disease that includes endogenous epithelial, immune, and neural components that respond awkwardly to environmental stimuli. Small airborne particles with diameters in the range of 2.5 micrometers or less, so-called PM2.5, are generally thought to contribute to some forms of asthma. These forms of asthma may have increased numbers of neutrophils and/or eosinophils present in bronchoalveolar lavage fluid and are difficult to treat effectively as they tend to be poorly responsive to steroids. Here, Wang and colleagues build on a recent model that incorporated PM2.5 which was found to have a neutrophilic component. Wang altered the model to provide an extra kick via the incorporation of ovalbumin. Building on their prior expertise linking nociceptors and inflammation, they find that silencing TRPV1-expressing neurons, either pharmacologically or genetically, abrogated inflammation and the accumulation of neutrophils. By examining bronchoalveolar lavage fluid, they found not only that levels of a number of cytokines were increased, but also that artemin, a protein that supports neuronal development and function, was elevated, which did not occur in nociceptor-ablated mice. They also found that alveolar macrophages exposed to PM2.5 particles had increased artemin transcription, suggesting a further link between pollutants, and immune and neural interactions.

      We thank the reviewer for their valuable comments, which have significantly enhanced the quality of our manuscript. A point-by-point rebuttal is provided below.

      Weakness

      There are substantial caveats that must be attached to the suggestions by the authors that targeting nociceptors might provide an approach to the treatment of neutrophilic airway inflammation in pollution-driven asthma in general and wildfire-associated respiratory problems in particular.

      These caveats include the uncertainty of the relevance of the conventional source of PM2.5, to pollution and asthma. According to the National Institute of Standards and Technology (NIST), the standard reference material (SRM) 2786 is a mix obtained from an air intake system in the Czech Republic. It is not clear exactly what is in the mix, and a recent bioRxiv preprint, https://www.biorxiv.org/content/10.1101/2023.08.18.553903v3.full.pdf reveals the presence of endotoxin. Care should thus be taken in interpreting data using particulate matter. Regarding wildfires, there is data that indicates that such exposure is toxic to macrophages. What impact might that then have on the production of cytokines, and artemin, in humans?

      We recognize the potential limitations of using SRM2786 (obtained from a Czech air-intake system) as a model for real-world PM2.5 exposure. Our rationale for choosing SRM2786 is that it is commercially available and represents a broad spectrum of ambient air pollutants, in contrast to more specialized sources like diesel exhaust particles. However, we acknowledge in the discussion the presence of endotoxin in SRM2786, as suggested by recent reports, and agree that this may influence immune responses and should be considered when interpreting our data.

      Regarding wildfire-associated exposure, we are aware that certain components of wildfire smoke can be toxic to macrophages. We do not think this plays a significant role in the current study design, as the numbers of AMs, as determined by flow cytometry and intravital microscopy, are similar when comparing OVA-exposed mice to OVA-FPM-exposed animals. Thus, these results rule out significant AM toxicity by FPM.

      Ultimately, while our findings suggest that modulating nociceptor activity may reduce neutrophilic inflammation, we emphasize that additional research (including different PM2.5 sources, validation of endotoxin levels, and in vivo confirmation in human-relevant models) is necessary before drawing definitive conclusions about treating pollution-driven asthma or wildfire-induced respiratory problems.

      The Introductory paragraph implies links between wildfire events, particulate exposure, and neutrophilic asthma. I am not aware of such a link having been established, in which case the paragraph needs revision. In the paragraph that begins with 'Urban pollution', it is suggested that eosinophilic asthma is treatment responsive in comparison to the neutrophilic form. That may not be the case, and these cellular components may often occur together. In much of the manuscript, there is a mismatch between the text and the figure numbers. For example, in the Results, Figure 2 should be Figure 3 some of the time, and Figure 3 is actually Figure 4, while the reference to Figure 1F-H is Figure 4H. Please check carefully.

      (a) Introduction Paragraph and Wildfire–Neutrophilic Asthma Link

      We have added references to the Introduction to support the link between wildfire smoke, respiratory symptoms, and neutrophilic asthma [8-12].

      (b) Distinction Between Eosinophilic and Neutrophilic Asthma

      We recognize that eosinophilic and neutrophilic airway infiltrates can co-occur in the same individual and that treatment responsiveness can vary considerably. Our intention was to note that conventional asthma therapies (e.g., inhaled corticosteroids) are generally more effective for eosinophilic-driven disease than for neutrophilic phenotypes, but we agree that these inflammatory endotypes often overlap in clinical practice. We have revised the text in the “Urban pollution” section to acknowledge this complexity and to clarify that inflammatory cell populations in asthma are not always discrete.

      (c) Figure Numbering and Text–Figure Mismatch

      We sincerely apologize for the confusion caused by mismatched figure labels and references in the Results section. We have carefully reviewed and corrected all figure references throughout the manuscript to ensure accuracy.

      References

      (1) Kim, S. H. et al. Mapping of the Sensory Innervation of the Mouse Lung by Specific Vagal and Dorsal Root Ganglion Neuronal Subsets. eNeuro 9 (2022). https://doi.org/10.1523/ENEURO.0026-22.2022

      (2) McGovern, A. E. et al. Evidence for multiple sensory circuits in the brain arising from the respiratory system: an anterograde viral tract tracing study in rodents. Brain Struct Funct 220, 3683-3699 (2015). https://doi.org/10.1007/s00429-014-0883-9

      (3) Shen, C.-C., Wang, C.-C., Liao, M.-H. & Jan, T.-R. A single exposure to iron oxide nanoparticles attenuates antigen-specific antibody production and T-cell reactivity in ovalbumin-sensitized BALB/c mice. International journal of nanomedicine, 1229-1235 (2011).  

      (4) Delayre-Orthez, C., De Blay, F., Frossard, N. & Pons, F. Dose-dependent effects of endotoxins on allergen sensitization and challenge in the mouse. Clinical & Experimental Allergy 34, 1789-1795 (2004).  

      (5) Morokata, T., Ishikawa, J. & Yamada, T. Antigen dose defines T helper 1 and T helper 2 responses in the lungs of C57BL/6 and BALB/c mice independently of splenic responses. Immunology letters 72, 119-126 (2000).  

      (6) Li, L., Hua, L., He, Y. & Bao, Y. Differential effects of formaldehyde exposure on airway inflammation and bronchial hyperresponsiveness in BALB/c and C57BL/6 mice. PLoS One 12, e0179231 (2017).  

      (7) Ikeda-Miyagawa, Y. et al. Peripherally increased artemin is a key regulator of TRPA1/V1 expression in primary afferent neurons. Molecular pain 11, s12990-12015-10004-12997 (2015).  

      (8) Baan, E. J. et al. Characterization of Asthma by Age of Onset: A Multi-Database Cohort Study. J Allergy Clin Immunol Pract 10, 1825-1834 e1828 (2022). https://doi.org/10.1016/j.jaip.2022.03.019

      (9) de Nijs, S. B., Venekamp, L. N. & Bel, E. H. Adult-onset asthma: is it really different? Eur Respir Rev 22, 44-52 (2013). https://doi.org/10.1183/09059180.00007112

      (10) Gianniou, N. et al. Acute effects of smoke exposure on airway and systemic inflammation in forest firefighters. J Asthma Allergy 11, 81-88 (2018). https://doi.org/10.2147/JAA.S136417

      (11) Noah, T. L., Worden, C. P., Rebuli, M. E. & Jaspers, I. The Effects of Wildfire Smoke on Asthma and Allergy. Curr Allergy Asthma Rep 23, 375-387 (2023). https://doi.org/10.1007/s11882-023-01090-1

      (12) Wilgus, M. L. & Merchant, M. Clearing the Air: Understanding the Impact of Wildfire Smoke on Asthma and COPD. Healthcare (Basel) 12 (2024). https://doi.org/10.3390/healthcare12030307

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study reports the transcriptomic and proteomic landscape of the oviducts at four different preimplantation periods during natural fertilization, pseudopregnancy, and superovulation. The data presented convincingly supported the conclusion in general, although more analyses would strengthen the conclusions drawn. This work will interest reproductive biologists and clinicians practicing reproductive medicine. 

      We appreciate the concise summary and agree that additional experiments can reinforce the fidelity of predictions made by our robust bioinformatic characterization of the oviduct. Our robust bioinformatic model appears reproducible as similar pathway trends have been produced in all three datasets, lending confidence for future researchers to establish testable hypotheses more effectively.  

      Reviewer #1 (Public review):

      The paper demonstrated through a comprehensive multi-omics study of the oviduct that the transcriptomic and proteomic landscape of the oviduct at 4 different preimplantation periods was dynamic during natural fertilization, pseudopregnancy, and superovulation using three independent cell/tissue isolation and analytical techniques. This work is very important for understanding oviductal biology and physiology. In addition, the authors have made all the results available in a web search format, which will maximize the public's access and foster and accelerate research in the field.

      Strengths:

      (1) The manuscript addresses an important and interesting question in the field of reproduction: how does the oviduct at different regions adapt to the sperm and embryos for facilitating fertilization and preimplantation embryo development and transport?

      (2) Authors used cutting-edge techniques: Integrated multi-modal datasets followed by in vivo confirmation and machine learning prediction.

      (3) RNA-seq, scRNA-seq, and proteomic results are immediately available to the scientific community in a web search format.

      (4) Substantiated results indicate the source of inflammatory responses was the secretory cell population in the IU region when compared to other cell types; sperm modulate inflammatory responses in the oviduct; the oviduct displays immuno-dynamism.

      We sincerely thank you for your thorough and insightful review of our manuscript. Your comprehensive summary accurately captures the essence of our multi-omics study on oviductal biology, highlighting its importance in understanding reproductive physiology. We are particularly grateful for your recognition of our study's strengths. In the revised manuscript, we have added another searchable scRNA-seq dataset to our public website: https://genesearch.org/winuthayanon/Oviduct_pregnancy/. We have also addressed the weaknesses listed below in our revised manuscript.

      Weaknesses:

      (1) The rationale for using the superovulation model is not clear. The oviductal response to sperm and embryos can be studied by comparing mating with normal and vasectomized mice and comparing pregnancy vs pseudopregnancy (induced by mating with vasectomized males). Superovulation causes supraphysiological hormone levels and other confounding conditions.

      We agree with this assessment that superovulation changes hormonal levels and could have a confounding impact on oviduct function. As such, for all experiments involving pseudopregnant datasets, pseudopregnancy was induced by mating females with vasectomized males without superovulation. For our oviductal luminal protein analysis, oviductal fluid was collected from pregnant females with and without superovulation. This allowed us to directly compare the impact of superovulation on protein abundance and profile. In the revised manuscript, we have provided clarifying statements on the use of superovulation in our Methods section, which read

      “Datasets from the natural cycle and SO allowed us to directly compare the impact of exogenous hormone treatments on protein abundance and profile distinct from the physiological levels of hormones”.

      One exception, in which superovulation was used without a “natural mating” group for comparison, is the scRNA-seq dataset. As single-cell libraries should be prepared in a single run to avoid batch effects, we needed to ensure that a sufficient number of females were pregnant for single-cell isolation (we used ~4 mice/time point). Therefore, superovulation was used to synchronize the females and ensure that they were receptive to mating. At the time of our sample collection, single nuclei isolation methods (freeze tissue now, isolate nuclei later) had not been reliable or standardized. We tried to synchronize females using male bedding without superovulation. However, we would still have needed to set up at least 12-15 females per pregnancy timepoint to mate with male mice, totaling ~48-60 mice each night. Due to budget constraints and vivarium space limitations, we were not able to do so. We have included a similar statement to clarify the justification in the revised Methods, which reads,

      “Mating and tissue collection protocols were similar to bulk RNA isolation described above, with the exception that female mice were superovulated using the protocol described previously (73) to ensure sufficient numbers of female mice at each timepoint could be harvested for single cell isolation and library preparation within the same day (n= 3-4 mice/group)”.

      (2) This study involves a very complex dataset with three different models at four time points. If possible, it would be very informative to generate a graphic abstract/summary of their major findings in oviductal responses in different models and time points

      Thank you for this suggestion. We have now included a graphical abstract to accompany the final version of our manuscript.

      (3) The resolution of Figures 3A-3C in the submitted file was not high enough to assess the authors' conclusion.

      We have now used a higher magnification of images in Figures 3A-C in the revised version.

      (4) The authors need to double-check influential transcription factors identified by machine learning. Apparently, some of them (such as Anxa2, Ift88, Ccdc40) are not transcription factors at all.

      We appreciate the recognition of this oversight. In the revised manuscript, we have clarified and stated the distinction between ‘influential transcripts’ and ‘influenced proteins’, which now reads,

      “The top 25 “influential” transcripts (ITs) with the highest attention scores from all the transcription factors present in bulk RNA-seq data were extracted for every potentially influenced protein (IP) in the empirical proteomics datasets”. 
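
      The selection step quoted above can be illustrated with a minimal sketch. It assumes the attention scores are already available as a nested mapping from each protein to its candidate transcripts; the function name, the data layout, and the toy values are hypothetical and are not taken from the manuscript or its code.

```python
from typing import Dict, List, Tuple


def top_influential_transcripts(
    attention: Dict[str, Dict[str, float]], k: int = 25
) -> Dict[str, List[Tuple[str, float]]]:
    """For each protein, keep the k transcripts with the highest attention score.

    `attention` maps protein -> {transcript: score}. This layout is an
    assumption made for illustration only.
    """
    result: Dict[str, List[Tuple[str, float]]] = {}
    for protein, scores in attention.items():
        # Sort transcripts by score, highest first, then truncate to k entries.
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        result[protein] = ranked[:k]
    return result


if __name__ == "__main__":
    # Toy, made-up scores for a single hypothetical protein.
    toy = {"protein_x": {"tf_a": 0.9, "tf_b": 0.1, "tf_c": 0.5}}
    print(top_influential_transcripts(toy, k=2))
```

      With k left at its default of 25, a protein with fewer than 25 candidate transcripts simply keeps all of them, ranked by score.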

      Recommendations for the authors:

      (1) What are the stained debris/nuclei surrounding oocytes/fertilized eggs in Figure 1A? Please indicate in figure legends. 

      We have edited Figure 1A with black arrows that highlight the stained cumulus cells surrounding the ovulated eggs/fertilized eggs, together with a revised Figure legend, which now reads, “Arrows indicate cumulus cells surrounding the eggs/fertilized eggs called cumulus-oocyte complexes”.

      (2) "Then, oviducts were sectioned into IA and IU regions" The Ampulla region is quite a long tube. Could authors provide details about the cutting border between IA and IU regions? 

      We have now included a literature reference defining the number of turns in the coiled mouse oviduct and described how we cut between the IA and IU regions in the Methods section, which reads,

      “We defined the IA region by including the infundibulum and cutting at turn three of the oviductal coil (the end of ampulla) (5). Turn four to eleven was considered the IU region, which was stripped of uterine tissue enveloping the colliculus tubaris of the UTJ region (5)”. 

      (3) "In this experiment, superovulation (SO) using exogenous gonadotropins was used due to technical limitations of sample collection for single-cell processing." It was not clear. What was the technical limitation of sample collections? 

      As indicated in response to the public review above, we have now clarified that we used superovulation for scRNA-seq analysis to ensure that a sufficient number of females were pregnant for single-cell isolation (we used ~4 mice/time point). Superovulation was therefore used to synchronize the females and make sure that they were receptive to mating, thereby providing enough cells for the experiment.

      (4) Ephx2+ cluster (only present at SO 0.5 dpc and SO estrus) was very interesting. Could the author provide more information about this gene and the potential cell type this cluster represents? 

      We appreciate the reviewer’s interest in this cell-type cluster. We have now included the discussion regarding this gene, which reads, “Interestingly, the Ephx2+ cluster was mainly present in the SO 0.5 dpc and SO estrus samples. Ephx2 encodes epoxide hydrolase 2, which converts epoxides to dihydrodiols. Recent findings suggest that EPHX2 may play a role in primary hypertension in humans (52). However, the reproductive-related functions of EPHX2 have not yet been investigated. Therefore, we believe this presents an opportunity for future research to define the role of Ephx2 in the oviduct in response to SO during preimplantation embryo development.” However, as it is beyond the scope of the research provided in this manuscript, we did not further investigate the roles of Ephx2 in our current study. 

      (5) "we elucidated whether exogenous hormone treatment impacts protein secretion in the oviduct. There were 298, 354, and 163 differentially abundant proteins when compared between SO estrus vs. SO 0.5 dpc". Which hormone? FSH/LH? Or high estrogens due to more mature follicles, or more embryos instead of hormones? Again, the rationale for using the superovulation model needs to be better explained with consideration of other possibilities.

      Thank you for pointing this out. We have clarified that “exogenous hormone treatment” was the superovulation (SO), which is now corrected in the statement, which reads, “we elucidated whether SO treatment impacts protein secretion in the oviduct”. 

      The justification for superovulation has now been included in the revised manuscript, as indicated in the responses to reviewers above. A detailed description of the gonadotropin treatment was included in the Materials and Methods section. As the reviewer suggested, we have revised the Discussion to include the caveat that other factors could lead to the biological changes we observed subsequent to SO, which reads,

      “As SO increases the number of mature follicles (therefore, estrogen levels), ovulated eggs, and follicular fluid, it is also likely that these biological alterations could lead to changes in the protein abundance in the oviduct”.

      (6) "we used RNAScope in situ hybridization staining of Tlr2, Ly6g (leukocytes), and Ptprc (common immune cell marker)." Please indicate what cell types Tlr2 marker was for. 

      We have now corrected the statement to include the cell types with Tlr2+ staining, which reads, “we used RNAScope in situ hybridization staining of Tlr2 (epithelium, stroma, and myosalpinx), ”.

      (7) In which cell types are P38 and p-P38 expressed?  

      Based on our scRNA-seq searchable dataset, which has been included in the revised manuscript (https://genesearch.org/winuthayanon/Oviduct_pregnancy/), we found that Mapk14 (encoding P38) was highly expressed in the immune cells in mice (red arrows in the UMAPs below).

      Author response image 1.

      In humans, scRNA-seq data published by Ulrich et al. (PMID: 35320732) showed that MAPK14 was present in most cell types in the Fallopian tubes at low levels (see violin plot below).

      Author response image 2.

      (8) "Our findings showed an influx of Ptprc+ cells to the stromal layer, and subsequently penetration into the epithelial layer in the presence of sperm at 0.5 dpc in the UTJ." The authors didn't have results for tracking the influx of Ptprc+ cells into the stromal layer.

      Thank you for pointing this out. We agree with the reviewer’s assessment, as we did not have results tracking the influx of Ptprc+ cells. We have removed the “influx” statement, which now reads, “Our findings showed that Ptprc+ cells were present in the stromal and epithelial layers in the presence of sperm at 0.5 dpc in the UTJ.”

      Reviewer #2 (Public review):

      The manuscript investigates oviductal responses to the presence of gametes and embryos using a multi-omics and machine learning-based approach. By applying RNA sequencing (RNA-seq), single-cell RNA sequencing (sc-RNA-seq), and proteomics, the authors identified distinct molecular signatures in different regions of the oviduct, proximal versus distal. The study revealed that sperm presence triggers an inflammatory response in the proximal oviduct, while embryo presence activates metabolic genes essential for providing nutrients to the developing embryos. Overall, this study offers valuable insights and is likely to be of great interest to reproductive biologists and researchers in the field of oviduct biology. However, further investigation into the impact of sperm on the immune cell population in the oviduct is necessary to strengthen the overall findings.

      We appreciate the concise summary, strengths, and weaknesses highlighted. We have addressed all comments made by the reviewer concerning superovulation, figure recommendations, and additional analysis in our revised manuscript. We have included a new analysis of scRNA-seq datasets from human Fallopian tube tissues collected from hydrosalpinx patients and healthy subjects by Ulrich et al. (PMID: 35320732). The evaluation of this human data helped distinguish between different inflammatory pathways stimulated by sperm vs. general inflammation, as well as species differences (more details in responses below). In future studies, we will follow up on a detailed description of immune cell types present at 0.5 dpc using FACS analysis. This is mainly due to a lack of expertise and technical limitations in our lab on immune cell investigation. Nevertheless, we have already recruited two immunologists to facilitate our future immune cell studies. We have also provided a clear justification for superovulation, especially in the scRNA-seq analysis in the revised manuscript (please see response to Reviewer 1 above). 

      Recommendations for the authors:

      (1) In Figure 3A and 3B, the authors should provide higher contrast and high-resolution images for the expression of the selected immune cell markers at 0.5 dpc and 0.5 dpp. For better clarity and flow, 0.5 dpc & 0.5 dpp, as well as 1.5 dpc & 1.5 dpp, should be merged into a single panel.  

      Thank you for this suggestion. As shown in the response to Reviewer 1 above, we have now used a higher-magnification image for Fig. 3. We have also changed the panel in the quantification graphs to better reflect the immunofluorescent images and improve clarity and flow.

      (2) The authors demonstrated that sperm induces an inflammatory response in the oviduct by presenting IF for selected immune cells. However, FACS analysis should be included to dissect the various immune cell populations further. 

      We appreciate the recommendation and agree that FACS analysis should provide a more detailed description of the immune cell types present at 0.5 dpc. However, our current work primarily offers initial investigations, confirming that three bioinformatic models (bulk RNA-seq, scRNA-seq, and proteomic analyses) can be validated by IF staining. Our future research using FACS analysis should provide additional characterization of immune cell types at 0.5 dpc in the oviduct.

      (3) In Figure 2, the authors performed proteomic analysis at different stages of implantation. They observed similar alterations in the pro-inflammatory Reactome, as seen with RNA-seq and sc-RNA-seq analyses. It would be interesting to examine the types of proteins induced by embryo presence and how their expression changes at 1.5 and 2.5 dpc. Similarly, are sperm-interacting proteins induced in response to sperm presence at 0.5 dpc? Are these proteins uniquely present in the isthmus compared to the ampulla? 

      We sincerely appreciate the reviewer’s insightful comments regarding the findings in Figure 2 and the potential avenues for further exploration of the proteomic analysis during different stages of embryo preimplantation. We found that during 1.5 dpc, enriched Reactome included Innate Immune System and RHO GTPase (Fig. S4A). In comparison, Reactome at 2.5 dpc were enriched for Keratinization, Metabolism of Protein, and Post-translational Protein Modification (Fig. S4B). Therefore, the pro-inflammatory Reactome profile appeared to have completely subsided at 2.5 dpc. This statement has now been included in the results section, which reads, “Lastly, differential protein abundance at 1.5 dpc and 2.5 dpc indicated the enrichment for Ras Homolog (RHO) GTPase signaling pathway and changes in epithelial remodeling (keratinization) (Fig. S4A and B), respectively. Therefore, the pro-inflammatory Reactome profile appeared to have completely subsided at 2.5 dpc”.

      And yes, we detected sperm-interacting proteins (such as OVGP1, ANXA1, HSPA5, and PDIA6) in our 0.5 dpc proteomic datasets (see the example images below, taken from our dataset: https://genes.winuthayanon.com/winuthayanon/oviduct_proteins/). We noticed that the levels of all of these sperm-interacting proteins were lower at 0.5 dpc compared to other timepoints. We speculate that these proteins bound to the sperm and were washed out together with the sperm during the pre-processing centrifugation prior to mass spectrometry analysis. However, we could not distinguish the original location (ampulla vs. isthmus) of the proteins as the luminal fluid was flushed from the entire oviduct.

      Author response image 3.

      (4) Given that salpingitis is associated with inflammation of the fallopian tubes, the authors should consider comparing the gene signatures from this study with publicly available salpingitis datasets. 

      Thank you for this insightful suggestion. We have reanalyzed the human data from scRNA-seq of Fallopian tube tissues collected from hydrosalpinx (inflamed and dilated tube) and healthy patients by Ulrich et al. (PMID: 35320732). From this published human dataset, we have evaluated GO biological pathways enriched in the differentially expressed genes (DEGs) in hydrosalpinx compared to healthy Fallopian tubes. We have added these new data in the revised Results, Fig. 5 and Supplementary Dataset S5. The new data now read,  

      “Evaluation of human hydrosalpinx Fallopian tubes compared to sperm-induced inflammation genes

      To determine whether sperm-induced inflammatory responses in the mouse oviduct are similar to or different from human inflammation conditions, we reanalyzed publicly available scRNA-seq data from hydrosalpinx samples by Ulrich et al. (50). We found that some of the sperm-induced inflammatory genes identified from our mouse study were present and upregulated in hydrosalpinx samples compared to healthy subjects (Fig. 5A). However, the differentially expressed levels, for example the CCL2 gene, appeared to be marginal between healthy vs. hydrosalpinx samples (Fig. 5B-C and Supplemental Datasets S5). Nevertheless, the top five most enriched GOBPs related to inflammatory responses were Regulation of Complement Activation, Positive Regulation of Macrophage Migration Inhibitory Factor Signaling Pathway, MHC Class II Protein Complex Assembly, Positive Regulation of NK Cell Chemotaxis, and Negative Regulation of Metallopeptidase Activity (Fig. 5D). These GOBPs differed from those identified in sperm-exposed mouse oviducts at 0.5 dpc, which were enriched for neutrophil-related pathways, unlike macrophages or NK cells in hydrosalpinx samples”.

      We have also added a revised Discussion, which now reads, 

      “Lastly, we found that sperm-induced inflammatory conditions in the oviduct were potentially different than those of chronic inflammatory conditions in human Fallopian tubes. The inflammatory responses observed in mice and humans exhibited significant distinction based on immune cell involvement, mechanisms, and context. In mice, acute inflammation after sperm exposure could be primarily characterized by the activation of neutrophils, which serve as the first responders to injury or foreign bodies. In contrast, human Fallopian tubes with hydrosalpinx conditions displayed chronic inflammatory conditions predominantly involving macrophages and NK cells, suggesting a more complex and sustained immune response. It is also possible that inflammation in the oviduct differs between mice and humans. Understanding these species-specific variations is crucial for developing effective therapeutic strategies, as findings from murine models may not accurately translate to human inflammatory conditions due to the distinct immune dynamics at play”.

      (5) In Line 259, the authors should clarify why SO females were chosen for luminal fluid collection at different points. 

      Thank you for pointing this out. We want to clarify that the proteomic analysis of the luminal fluid was performed on naturally mated females both with and without SO. We have revised the statement in the Results section, which now reads,

      “To validate our transcriptomics data at a translational level, LC-MS/MS proteomic analysis was performed on secreted proteins in the oviductal luminal fluid at estrus, 0.5, 1.5, and 2.5 dpc with or without SO. As we also aim to address whether changes in proteomic profiles in the oviduct are governed by hormonal fluctuations, the SO was performed using exogenous gonadotropins. Therefore, the comparison was assessed in the following groups: estrus, 0.5 dpc, 1.5 dpc, 2.5 dpc, SO estrus, SO 0.5 dpc, SO 1.5 dpc, and SO 2.5 dpc”.

      In addition, we have now provided additional clarification in the Method section, which reads,

      “In this context, our SO approach facilitates multi-dimensional analysis comparisons among naturally cycling bulk RNA-seq, SO scRNA-seq, and natural luminal proteomic biological replicates, enhancing confidence between different methods. This experimental design also reflects adaptive responses in the oviduct during natural fertilization and preimplantation development, influenced by PMSG and hCG treatments at both RNA and protein levels. Furthermore, SO is commonly used in female reproduction to synchronize estrus cycles in animals, thus reducing variables at each collection timepoint.”.

      (6) The authors should include scale bars in all fluorescent images. 

      We apologize for this oversight. In all applicable figures, we have provided a scale bar for all immunofluorescent images.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This paper investigates how isoform II of transcription factor RUNX2 promotes cell survival and proliferation in oral squamous cell carcinoma cell lines. The authors used gain and loss of function techniques to provide incomplete evidence showing that RUNX2 isoform silencing led to cell death via several mechanisms including ferroptosis that was partially suppressed through RUNX2 regulation of PRDX2 expression. The study provides useful insight into the underlying mechanism by which RUNX2 acts in oral squamous cell carcinoma, but the conclusions of the authors should be revised to acknowledge that ferroptosis is not the only cause of cell death.

      We appreciate the editor’s positive comments on our work and the valuable suggestions provided by the reviewers. We did find that RUNX2 isoform II knockdown or HOXA10 knockdown could also lead to apoptosis. We have revised our title as follows: “RUNX2 Isoform II Protects Cancer Cells from Ferroptosis and Apoptosis by Promoting PRDX2 Expression in Oral Squamous Cell Carcinoma”. In addition, we have also revised our conclusions in the abstract as follows: “OSCC cancer cells can up-regulate RUNX2 isoform II to inhibit ferroptosis and apoptosis, and facilitate tumorigenesis through the novel HOXA10/RUNX2 isoform II/PRDX2 pathway.” We have added more experiments to better support our conclusions. Please see the following responses to the reviewers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, authors investigated the role of RUNT-related transcription factor 2 (RUNX2) in oral squamous carcinoma (OSCC) growth and resistance to ferroptosis. They found that RUNX2 suppresses ferroptosis through transcriptional regulation of peroxiredoxin-2. They further explored the upstream positive regulator of RUNX2, HOXA10 and found that HOXA10/RUNX2/PRDX2 axis protects OSCC from ferroptosis.

      Strengths:

      The study is well designed and provides a novel mechanism of HOXA10/RUNX2/PRDX2 control of ferroptosis in OSCC.

      Weaknesses:

      According to the data presented in (Figure 2F, Figure 3F and G, Figure 5D and Figure 6E and F), apoptosis seems to be affected in the same amount as ferroptosis by HOXA10/RUNX2/PRDX2 axis, which raises questions on the authors' specific focus on ferroptosis in this study. Reasonably, authors should adapt the title and the abstract in a way that recapitulates the whole data, which is HOXA10/RUNX2/PRDX2 axis control of cell death, including ferroptosis and apoptosis in OSCC.

      We are really grateful for your comments. We agree that these figures do show that isoform II-knockdown or HOXA10-knockdown could induce apoptosis. We have adapted the title and abstract as follows:

      Title: “RUNX2 Isoform II Protects Cancer Cells from Ferroptosis and Apoptosis by Promoting PRDX2 Expression in Oral Squamous Cell Carcinoma”.

      Abstract: “In the present study, we surprisingly find that RUNX2 isoform II is a novel ferroptosis and apoptosis suppressor. RUNX2 isoform II can bind to the promoter of peroxiredoxin-2 (PRDX2), a ferroptosis inhibitor, and activate its expression. Knockdown of RUNX2 isoform II suppresses cell proliferation in vitro and tumorigenesis in vivo in oral squamous cell carcinoma (OSCC). Interestingly, homeobox A10 (HOXA10), an upstream positive regulator of RUNX2 isoform II, is required for the inhibition of ferroptosis and apoptosis through the RUNX2 isoform II/PRDX2 pathway. Consistently, RUNX2 isoform II is overexpressed in OSCC, and associated with OSCC progression and poor prognosis. Collectively, OSCC cancer cells can up-regulate RUNX2 isoform II to inhibit ferroptosis and apoptosis, and facilitate tumorigenesis through the novel HOXA10/RUNX2 isoform II/PRDX2 pathway.”

      In addition, we have performed the rescue experiment showing that PRDX2 overexpression rescues the apoptosis induced by isoform II-knockdown (Figure 4-figure supplement 4) or HOXA10-knockdown (Figure 7-figure supplement 2).

      We have added descriptions of these experiments to the Results sections “RUNX2 isoform II promotes the expression of PRDX2” and “HOXA10 inhibits ferroptosis and apoptosis through RUNX2 isoform II” as follows: “In addition, we found that PRDX2 overexpression could partially reduce the increased apoptosis caused by isoform II-knockdown. (Figure 4-figure supplement 4).” “PRDX2 overexpression also could rescue the increased cellular apoptosis caused by HOXA10 knockdown (Figure 7-figure supplement 2).”

      Comments:

      In the description of the result section related to Figure 3E, the author wrote "In addition, we found that isoform II-knockdown induced shrunken mitochondria with vanished cristae with transmission electron microscopy (Figure 3E). These results suggest that RUNX2 isoform II may suppress ferroptosis." The interpretation provided here is not clear to the reviewer. How shrunken mitochondria and vanished cristae can be linked to ferroptosis?

      We apologize for the inaccurate description. Ferroptotic cells usually exhibit shrunken mitochondria, reduced or absent cristae, and increased membrane density (Dixon et al., 2012). However, the presence of shrunken mitochondria or vanished cristae does not guarantee that ferroptosis has occurred in the cells. Other evidence, such as the increased ROS production and lipid peroxidation accumulation in cells with RUNX2 isoform II knockdown, must also be evaluated, as we show in Figure 3A and 3B. Furthermore, isoform II overexpression suppressed ROS production (Figure 3C) and lipid peroxidation (Figure 3D). We have revised our interpretation as follows: “In addition, we found that isoform II-knockdown induced shrunken mitochondria with vanished cristae with transmission electron microscopy (Figure 3E). This phenomenon along with the above results of ROS production and lipid peroxidation accumulation assays suggests that RUNX2 isoform II may suppress ferroptosis.”

      Dixon, S. J., Lemberg, K. M., Lamprecht, M. R., Skouta, R., Zaitsev, E. M., Gleason, C. E., . . . Stockwell, B. R. (2012). Ferroptosis: an iron-dependent form of nonapoptotic cell death. Cell, 149(5), 1060-1072. doi:10.1016/j.cell.2012.03.042 PMID:22632970

      The electron microscopy images show more elongated mitochondria in the RUNX2 isoform II-KO cells than in RUNX2 isoform II positive cells, which might result from the fusion of mitochondria. These images should be complemented with fluorescent mitochondrial staining of these cells.

      We do find that the TEM images of RUNX2 isoform II-knockdown cells show more elongated mitochondria. The mitochondria undergo cycles of fission and fusion, known as mitochondrial dynamics, which in turn leads to changes in mitochondrial length. Through examining factors related to mitochondrial dynamics, we find that isoform II knockdown could decrease the expression levels of FIS1 (Fission, Mitochondrial 1) (Figure 3-figure supplement 2B) which mediates the fission of mitochondria. Therefore, we speculate that the elongated mitochondria in the isoform II-knockdown cells may be due to the decrease in mitochondrial fission through inhibiting FIS1 expression.

      In addition, we tried our best to perform fluorescent staining of mitochondria to observe mitochondrial morphology. However, due to the quality of our probes and fluorescence microscope, our mitochondrial fluorescence images were not satisfactory. We therefore captured additional electron microscopy images, measured the length of the mitochondria, and performed statistical analyses. We find that isoform II-knockdown cells show significantly more mitochondrial elongation than the control cells (Author response image 1 and Figure 3-figure supplement 2A). Therefore, we believe the conclusion that isoform II knockdown promotes mitochondrial elongation is relatively reliable.

      Author response image 1.

      The new electron microscopy images in RUNX2 isoform II-knockdown cells. RSL3 (a ferroptosis activator) served as a positive control. Scale bar: 1 μm. The calculation and statistical analysis of mitochondrial elongation were added in Figure 3-figure supplement 2A.

      What is the oxygen consumption rate in RUNX2 KO cells?

      We have performed a new mitochondrial stress assay to analyze the oxygen consumption rate (OCR). We find that RUNX2 isoform II knockdown can decrease OCR in OSCC cells. This result has been added to Figure 3-figure supplement 3A and B. It is consistent with our observation of damaged mitochondrial morphology in cells with RUNX2 isoform II knockdown.

      The increase in cell proliferation after RUNX2 overexpression in Figure 2A is not convincing, is there any differences in their migration or invasion capacity?

      We agree that overexpression of isoform II didn’t dramatically enhance OSCC cell proliferation. We consider that it may be due to the existing high level of isoform II in OSCC cells. We have performed wound-healing assay and transwell assay to analyze the migration or invasion capacity of cells with RUNX2 isoform II or isoform I overexpression. We find that isoform II overexpression has no effect on the migration and invasion in OSCC cells (Figure 2-figure supplement 2). This phenomenon suggests that further increasing isoform II cannot improve the migration or invasion capacity of OSCC cells. However, isoform I overexpression suppresses the migration and invasion of cancer cells (Figure 2-figure supplement 2), indicating that the upregulation of isoform I, which is downregulated in OSCC cells, may inhibit tumorigenesis. In addition, we found that the expression level of isoform I was lower in TCGA OSCC patients than that in normal controls (Figure 1D), and patients with higher isoform I showed longer overall survival (Figure 1-figure supplement 1). These results support that isoform I may inhibit tumorigenesis in OSCC cells.

      The in vivo study shows 50% reduction in primary tumor growth after RUNX2 inhibition by shRNA in CAL 27 xenografts, but only one shRNA is shown. Is this one shRNA clone? At least 2 shRNA clones should be used.

      In this in vivo primary tumor growth experiment, we used a CAL 27 stable cell line transfected with an shRNA against RUNX2 isoform II (shisoform II-1). We agree that at least two shRNAs should be used. In this revision, we performed another tumor growth experiment with CAL 27 cells stably transfected with a new shRNA targeting a different region of isoform II (shisoform II-2). As in the previous experiment, CAL 27 cells stably transfected with this new shRNA showed significantly reduced tumor growth and weight in nude mice compared with those transfected with non-specific control shRNA (Figure 2-figure supplement 4A-D).

      Apoptosis and necroptosis seem to be affected in the same amount as ferroptosis by HOXA10/RUNX2/PRDX2 axis. This is evident from experiments in Figure 3E, F and from Figure 6E, F and Figure 3G. Either Fer-1, Z-VAD, or Nec-1 used alone, were not able to fully restore cell proliferation to control cell level, which implies an additive effect of ferroptosis, apoptosis and necrosis. The author should verify potential additive or synergistic effect of the combination of Fer-1 and Z-VAD in these assays after si-RUNX2 in Figure 3 F and G and after si-HOX assays.

      We sincerely appreciate your valuable comments. We have performed a new assay to analyze the potential additive or synergistic effect of the combination of Fer-1 and Z-VAD after RUNX2 isoform II (si-II) or HOXA10 (si-HOX) knockdown. We find that the combination of Fer-1 and Z-VAD is more effective in rescuing cell proliferation than Fer-1 or Z-VAD alone (Figure 3-figure supplement 6 and Figure 6-figure supplement 4).

      What is the effect of PRDX2 or HOXA10 depletion on tumor growth?

      We have performed a new xenograft tumor formation assay in nude mice to analyze the effect of PRDX2 knockdown on tumor growth. We found that CAL 27 cells stably transfected with shRNAs against PRDX2 showed significantly reduced tumor growth and weight in nude mice compared with those transfected with non-specific control shRNA (Figure 4-figure supplement 2A-D). Regarding the effect of HOXA10 depletion on tumor growth, please allow us to cite a study (Guo et al., 2018) which demonstrated that HOXA10 knockout in Fadu cells (a pharyngeal squamous cell carcinoma cell line) could inhibit tumor growth.

      We have added these results to the section of “RUNX2 isoform II promotes the expression of PRDX2” as follows: “In line with the inhibitory effect of isoform II-knockdown on tumor growth, CAL 27 cells stably transfected with anti-PRDX2 shRNAs showed notably reduced tumor growth and weight than those transfected with non-specific control shRNA in nude mice (Figure 4-figure supplement 2A-D).”.

      Guo, L. M., Ding, G. F., Xu, W., Ge, H., Jiang, Y., Chen, X. J., & Lu, Y. (2018). MiR-135a-5p represses proliferation of HNSCC by targeting HOXA10. Cancer Biol Ther, 19(11), 973-983. doi:10.1080/15384047.2018.1450112 PMID:29580143

      What is the clinical relevance of HOXA10 in OSCC patients?

      In Figure 5-figure supplement 1B, we have shown that the expression levels of HOXA10 in TCGA OSCC patients were also significantly higher than those in normal controls. In this revision, we further find that patients with higher HOXA10 expression show significantly shorter overall survival in the TCGA OSCC dataset (Figure 5-figure supplement 2C). In addition, we have analyzed the expression of HOXA10 in our clinical OSCC and adjacent normal tissues, and found that the HOXA10 expression level in OSCC tissues is significantly higher than that in normal controls (Figure 5-figure supplement 2A and B), which is consistent with the results from the TCGA OSCC dataset.

      We have revised our writing in the result “HOXA10 is required for RUNX2 isoform II expression and cell proliferation in OSCC” as follows: “Similarly, HOXA10 expression level of our clinical OSCC tissues is significantly higher than that of adjacent normal tissues (Figure 5-figure supplement 2A and B). Moreover, TCGA OSCC patients with higher expression levels of HOXA10 showed shorter overall survival (Figure 5-figure supplement 2C).”

      Reviewing editor (Public Review):

      This paper reports the role of the Isoform II of RUNX2 in activating PRDX2 expression to suppress ferroptosis in oral squamous cell carcinoma (OSCC).

      The following major issues should be addressed.

      A major postulate of this study is the specific role of RUNX2 isoform II compared to isoform I.

      Figure 1F shows association between patient survival and Iso II expression, but nothing is shown for Iso I, this should be added, in addition the number of patients at risk in each category should be shown.

      We sincerely appreciate your valuable comments. We have added the survival curve of isoform I (exon 2.1) in the new Figure 1-figure supplement 1. In contrast to isoform II, patients with higher isoform I showed longer overall survival. The numbers of patients at risk in each category in the Figure 1F and Figure 1-figure supplement 1 are added.

      The authors test Iso I and Iso II overexpression in CAL27 or SCC-9 model cell lines. In Fig. 2A in CAL27, the overexpression of Iso II is much stronger than Iso I so it seems premature to draw any conclusions. More importantly, however, no Iso I silencing is shown in either of the cell lines nor the xenografted tumours. This is absolutely essential for the authors' hypothesis and should be tested using shRNA in cells and xenografted tumours.

      Thank you for your valuable comments. We agree that the overexpression of isoform I is much stronger than isoform II in CAL 27 cells in Fig. 2A-B. We have done another repeat experiment, which shows similar overexpression of isoform II and I (Figure 2A-figure supplement 1). This repeat experiment also shows that overexpression of FLAG-tagged isoform II significantly promoted the proliferation of OSCC cells. We tried our best to knock down isoform I. However, the specific sequence of isoform I is 317 nt. We designed four anti-isoform I siRNAs, and unfortunately found that none of them could knock down isoform I efficiently (please see Author response image 2). Therefore, we currently cannot knock down isoform I. However, we have tested the overexpression of isoform I. We find that isoform I overexpression inhibits the migration and invasion of cancer cells (Figure 2-figure supplement 2). In addition, we have shown that isoform II overexpression enhanced cell proliferation compared with isoform I overexpression in OSCC cells (Figure 2A). Therefore, we consider that isoform I is not essential for OSCC cell proliferation and tumorigenesis, and we mainly focus on isoform II in this study.

      Author response image 2.

      The knockdown efficiency of RUNX2 isoform I siRNAs (anti-isoform I: si-I-1, si-I-2, si-I-3, si-I-4) in OSCC cells was analyzed by RT-PCR; 18S rRNA served as a loading control. The sequences of the siRNAs are as follows: 5’ GGCCACUUCGCUAACUUGU 3’ (si-I-1), 5’ GUUCCAAAGACUCCGGCAA 3’ (si-I-2), 5’ UGGCUGUUGUGAUGCGUAU 3’ (si-I-3), and 5’ CGGCAGUCGGCCUCAUCAA 3’ (si-I-4).

      A major conclusion of this study is that Iso II expression suppresses ferroptosis. To support this idea, the authors use the inhibitor Ferrostatin-1 (Fer-1). While Fer-1 typically does not lead to a 100% rescue, here the effect is only marginal and as shown in Figures 3F and G only marginally better than Z-VAD or Necrostatin 1. These data do not support the idea that the major cause of cell death is ferroptosis. Instead, Iso II silencing leads to cell death through different pathways. The authors should acknowledge this and rephrase the conclusion of the paper accordingly. Moreover, the authors consistently confound cell proliferation with cell death.

      We agree that RUNX2 isoform II knockdown could also induce apoptosis. We have revised the description in the title, abstract and conclusion as follows:

      Title: “RUNX2 Isoform II Protects Cancer Cells from Ferroptosis and Apoptosis by Promoting PRDX2 Expression in Oral Squamous Cell Carcinoma”.

      Abstract: “In the present study, we surprisingly find that RUNX2 isoform II is a novel ferroptosis and apoptosis suppressor. RUNX2 isoform II can bind to the promoter of peroxiredoxin-2 (PRDX2), a ferroptosis inhibitor, and activate its expression. Knockdown of RUNX2 isoform II suppresses cell proliferation in vitro and tumorigenesis in vivo in oral squamous cell carcinoma (OSCC). Interestingly, homeobox A10 (HOXA10), an upstream positive regulator of RUNX2 isoform II, is required for the inhibition of ferroptosis and apoptosis through the RUNX2 isoform II/PRDX2 pathway. Consistently, RUNX2 isoform II is overexpressed in OSCC, and associated with OSCC progression and poor prognosis. Collectively, OSCC cancer cells can up-regulate RUNX2 isoform II to inhibit ferroptosis and apoptosis, and facilitate tumorigenesis through the novel HOXA10/RUNX2 isoform II/PRDX2 pathway.”.

      Conclusion: “In conclusion, we identified RUNX2 isoform II as a novel ferroptosis and apoptosis inhibitor in OSCC cells by transactivating PRDX2 expression. RUNX2 isoform II plays oncogenic roles in OSCC. Moreover, we also found that HOXA10 is an upstream regulator of RUNX2 isoform II and is required for suppressing ferroptosis and apoptosis through RUNX2 isoform II and PRDX2.”.

      We apologize for confusing cell proliferation with cell death. We have checked the whole manuscript and corrected the mistakes.

      In Fig. 4A the authors investigate GPX1 expression, whereas GPX4 is often the key ferroptosis regulator, this has to be tested. This is important as the authors also test the effect of the GPX4 inhibitor RSL3, however, the authors do not determine IC<sub>50</sub> values of the different cell lines with or without Iso II overexpression or silencing or compared to other RSL3 sensitive or resistant cells. Without this information, no conclusions can be drawn.

      We greatly appreciate the reviewer’s comments. We have performed a new experiment to analyze the effect of isoform II on GPX4 expression. We find that isoform II knockdown decreases the expression of GPX4 mRNA and protein (Figure 4-figure supplement 1A and B), and conversely isoform II overexpression promotes GPX4 expression (Figure 4-figure supplement 1C and D), which is consistent with the inhibition of ferroptosis by RUNX2 isoform II. Knockdown of HOXA10, an upstream positive regulator of RUNX2 isoform II, also inhibited the expression of GPX4 mRNA and protein (Figure 6-figure supplement 1A and B).

      We also performed a new experiment to determine the IC<sub>50</sub> values of cells with or without isoform II overexpression or silencing. We find that isoform II overexpression elevates the IC<sub>50</sub> value of RSL3 (Figure 3-figure supplement 8A), whereas isoform II knockdown decreases it (Figure 3-figure supplement 8B).
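
      For readers unfamiliar with the quantity, the following is a minimal sketch of one common way to estimate an IC<sub>50</sub> from a viability dose-response series: log-linear interpolation between the two doses that bracket 50% viability. It is not the method used in the manuscript, which the response does not describe; the function name and the example values are hypothetical and purely illustrative.

```python
import math
from typing import Sequence


def ic50_by_interpolation(doses: Sequence[float], viability: Sequence[float]) -> float:
    """Estimate IC50 by interpolating on a log10 dose axis.

    Assumptions (illustrative, not taken from the manuscript):
    - doses are positive and sorted in ascending order;
    - viability is expressed as a fraction of the untreated control.
    """
    points = list(zip(doses, viability))
    # Walk over consecutive dose pairs and find the pair that brackets 50%.
    for (d_lo, v_lo), (d_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 0.5 >= v_hi and v_lo != v_hi:
            # Fraction of the way from the lower to the higher dose.
            frac = (v_lo - 0.5) / (v_lo - v_hi)
            log_ic50 = math.log10(d_lo) + frac * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_ic50
    raise ValueError("viability does not cross 50% within the tested dose range")


# Hypothetical dose series (arbitrary units) for illustration only.
example = ic50_by_interpolation([1.0, 100.0], [1.0, 0.0])
```

      Under this sketch, a cell population that is more resistant to RSL3 would cross 50% viability at a higher dose and therefore return a larger value. In practice a four-parameter logistic fit is often preferred when enough dose points are available.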

      We have added descriptions of these experiments to the Results sections “RUNX2 isoform II suppresses ferroptosis”, “RUNX2 isoform II promotes the expression of PRDX2” and “HOXA10 inhibits ferroptosis through RUNX2 isoform II” as follows:

      RUNX2 isoform II suppresses ferroptosis: “Isoform II overexpression could elevate the IC<sub>50</sub> values of RSL3 (Figure 3-figure supplement 8A), in contrast, isoform II-knockdown decreased the IC<sub>50</sub> values of RSL3 (Figure 3-figure supplement 8B).”.

      RUNX2 isoform II promotes the expression of PRDX2: “Firstly, we found that RUNX2 isoform II-knockdown or overexpression could downregulate or upregulate the expression of GPX4 mRNA and protein, respectively (Figure 4-figure supplement 1A-D). In addition to the GPX4, we found that PRDX2 is the most significantly down-regulated gene upon isoform II-knockdown in CAL 27 (Figure 4A).”.

      HOXA10 inhibits ferroptosis through RUNX2 isoform II: “In addition, HOXA10-knockdown could suppress the expression of GPX4 mRNA and protein (Figure 6-figure supplement 1A and B).”.

      In summary, while the authors show that RUNX2 Iso II expression enhances cell survival, the idea that cell death is principally via ferroptosis is not fully established by the data. The authors should modify their conclusions accordingly.

      We agree that RUNX2 isoform II could enhance cell survival by suppressing both ferroptosis and apoptosis. We have revised the description in the abstract and conclusion as follows:

      Abstract: “In the present study, we surprisingly find that RUNX2 isoform II is a novel ferroptosis and apoptosis suppressor. RUNX2 isoform II can bind to the promoter of peroxiredoxin-2 (PRDX2), a ferroptosis inhibitor, and activate its expression. Knockdown of RUNX2 isoform II suppresses cell proliferation in vitro and tumorigenesis in vivo in oral squamous cell carcinoma (OSCC). Interestingly, homeobox A10 (HOXA10), an upstream positive regulator of RUNX2 isoform II, is required for the inhibition of ferroptosis and apoptosis through the RUNX2 isoform II/PRDX2 pathway. Consistently, RUNX2 isoform II is overexpressed in OSCC, and associated with OSCC progression and poor prognosis. Collectively, OSCC cancer cells can up-regulate RUNX2 isoform II to inhibit ferroptosis and apoptosis, and facilitate tumorigenesis through the novel HOXA10/RUNX2 isoform II/PRDX2 pathway.”.

      Conclusion: “In conclusion, we identified RUNX2 isoform II as a novel ferroptosis and apoptosis inhibitor in OSCC cells by transactivating PRDX2 expression. RUNX2 isoform II plays oncogenic roles in OSCC. Moreover, we also found that HOXA10 is an upstream regulator of RUNX2 isoform II and is required for suppressing ferroptosis and apoptosis through RUNX2 isoform II and PRDX2.”

    1. Author Response

      The following is the authors’ response to the current reviews.

      We thank the editors and reviewers for their helpful comments, which have allowed us to improve the manuscript.

      Response to reviewer 2

      We thank the reviewer for this positive feedback, which requires no further revision.

      Response to reviewer 3

      We thank the reviewer for highlighting these additional points and provide further explanations on these below.

      Firstly, we started the analysis from a baseline of year 2000 because the largest international donor (the Global Fund) uses baseline malaria levels in the period 2000-2004 as the basis of their current allocation calculations (The Global Fund, Description of the 2020-2022 Allocation Methodology, December 2019). In the paper we compare our optimal strategy to a simplified version of this method, represented by our “proportional allocation” strategy.

      Even if our simulations started in the year 2015, a direct comparison with the Global Technical Strategy for Malaria 2016-2030 would not be possible due to the different approaches taken. The GTS was developed to progress towards malaria elimination globally and set ambitious targets of at least 90% reduction in malaria case incidence and mortality rates and malaria elimination in at least 35 countries by 2030 compared to 2015. Mathematical modelling at the time suggested that 90% coverage of WHO-recommended interventions (vector control, treatment and seasonal malaria chemoprevention) would be needed to approach this target (Griffin et al. 2016, Lancet Infectious Diseases). The global annual investment requirements to meet GTS targets were estimated at US$6.4 billion by 2020 and US$8.7 billion by 2030 (Patouillard et al. 2017, BMJ Global Health). This strategy therefore considers what resources would be required to achieve a specific global target, but not the optimized allocation of resources.

      Investments into malaria control have consistently been below the estimated requirements for the GTS milestones (World Health Organization 2022, World Malaria Report 2022). In our study, we therefore take a different perspective on how limited budgets can be optimally allocated to a single intervention (insecticide-treated nets) across countries/settings to achieve the best possible outcome for two objectives that are different to the GTS milestones (either minimizing the global case burden, or minimizing both the global case burden and the number of settings not having yet reached a pre-elimination phase). As stated in the discussion, our estimate of allocating 76% of very low budgets to high-transmission settings was similar to the global investment targets estimated for the GTS, where the 20 countries with the highest burden in 2015 were estimated to require 88% of total investments (Patouillard et al. 2017, BMJ Global Health). Nevertheless, we also show that if higher budgets were available, allocating the majority to low-transmission settings co-endemic for P. falciparum and P. vivax would achieve the largest reduction in global case burden. We acknowledge the modelling of a single intervention as one of the key limitations of this analysis, but this simplification was necessary in order to perform the complex optimisation problem. Computationally it would not have been feasible to optimize across a multitude of intervention and coverage combinations.

      A further limitation raised by the reviewer is the lack of cross-species immunity between P. falciparum and P. vivax in our model. While cross-reactivity between antibodies against these two species has been observed in previous studies and the potential implications of this would be important to explore in future work, we did not include it here as little is known to date about the epidemiological interactions between different malaria parasite species (Muh et al. 2020, PLoS Neglected Tropical Diseases).

      Lastly, we did not assume that transmission was homogenous within the four transmission settings in our study (very low, low, moderate, high); transmission dynamics were simulated separately in each country, accounting for heterogeneous mosquito bite exposure. However, results were summarised for the broader transmission settings since many other country-specific factors were not accounted for (see discussion) and the findings should not be used to inform individual country allocation decisions.


      The following is the authors’ response to the original reviews.

      Author response to peer review

      We thank the reviewers for their insightful comments, which raise several important points regarding our study. As the reviewers have recognised, we introduced a number of simplifications in order to perform this complex optimisation problem, such as by restricting the analysis to a single intervention (insecticide-treated nets) and modelling countries at a national level. Despite their clear relevance to the study, computationally it would not have been feasible to run the multitude of scenarios suggested by reviewer 1, which we recognise as a limitation. As such we agree with the assessment that this study primarily represents a thought experiment, based on substantive modelling and aggregate scenario-based analysis, to assess whether current policies are aligned with an optimal allocation strategy or whether there might be a need to consider alternative strategies. The findings are relevant primarily to global funders and should not be used to inform individual country allocation decisions, and also point to avenues for further research. This perspective also underlies our decision to start the analysis from a baseline of year 2000 as opposed to modelling the current 2023 malaria situation: the largest international donor (the Global Fund) uses baseline malaria levels in the period 2000-2004 as the basis of their allocation calculations (The Global Fund, Description of the 2020-2022 Allocation Methodology, December 2019) (1). A simplified version of this method is represented by our “proportional allocation” strategy. We have made several revisions to the manuscript to address the points raised by the reviewers, as detailed below.

      Reviewer #1 (Public Review):

      1. The authors present a back-of-the-envelope exploration of various possible resource allocation strategies for ITNs. They identify two optimal strategies based on two slightly different objective functions and compare 3 simple strategies to the outcomes of the optimal strategies and to each other. The authors consider both P falciparum and P vivax and explore this question at the country level, using 2000 prevalence estimates to stratify countries into 4 burden categories. This is a relevant question from a global funder perspective, though somewhat less relevant for individual countries since countries are not making decisions at the global scale.

      Thank you for this summary of the paper. We agree that our analysis is of relevance to global funders, but is not meant to inform individual country allocation decisions. In the discussion, we now state:

      p. 12 L19: “Therefore, policy decisions should additionally be based on analysis of country-specific contexts, and our findings are not informative for individual country allocation decisions.”

      1. The authors have made various simplifications to enable the identification of optimal strategies, so much so that I question what exactly was learned. It is not surprising that strategies that prioritize high-burden settings would avert more cases.

      Thank you for raising this point. Indeed, several simplifying assumptions were necessary to ensure the computational feasibility of this complex optimization problem. As a result, our study primarily represents a thought experiment to assess whether current policies are aligned with an optimal allocation strategy or whether there might be a need to consider alternative strategies. As now further outlined in the introduction, approaches to this have differed over time and it remains a relevant debate for malaria policy.

      p. 2 L22: “However, there remains a lack of consensus on how best to achieve this longer-term aspiration. Historically, large progress was made in eliminating malaria mainly in lower-transmission countries in temperate regions during the Global Malaria Eradication Program in the 1950s, with the global population at risk of malaria reducing from around 70% of the world population in 1950 to 50% in 2000 (2). Renewed commitment to malaria control in the early 2000s with the Roll Back Malaria initiative subsequently extended the focus to the highly endemic areas in sub-Saharan Africa (3).”

      We believe our findings not only confirm an “expected” outcome – that prioritizing high-burden settings would avert more cases – but also clearly illustrate various consequences of different allocation strategies that are implemented or considered in reality, which may not be so obvious. For example, we found that initially allocating a larger share of the budget to high-transmission countries could be both almost optimal in terms of reducing clinical cases and maximising the number of countries reaching pre-elimination. We also observed a trade-off between reducing burden and reducing the global population at risk (“shrinking the map”) through a focus on near-elimination settings, and estimate the loss in burden reduction when following an elimination target.

      1. Generally, I found much of the text confusing and some concepts were barely explained, such that the logic was difficult to follow.

      Thank you for bringing this to our attention, and we regret to hear the manuscript was confusing to read. We believe that the revisions made as a result of the reviewer comments have now made the manuscript much easier to follow. We additionally passed the manuscript to a colleague to identify confusing passages, and have added a number of sentences to clarify key concepts and improve the structure.

      1. I am not sure why the authors chose to stratify countries by 2000 PfPR estimates and in essence explore a counterfactual set of resource allocation strategies rather than begin with the present and compare strategies moving forward. I would think that beginning in 2020 and modeling forward would be far more relevant, as we can't change the past. Furthermore, there was no comparison with allocations and funding decisions that were actually made between 2000 and 2020ish so the decision to begin at 2000 is rather confusing.

      Thank you for pointing this out. We have now made the rationale for this choice clearer in the manuscript. Our main reason for this was to allow comparison with the Global Fund funding allocation, which is largely based on malaria disease burden in 2000-2004. As stated in the paper, malaria prevalence estimates in the year 2000 are commonly considered to represent a “baseline” endemicity level, before large-scale implementation of interventions in the following decades. In the manuscript, the transmission-related element of the Global Fund allocation algorithm is represented in our “proportional allocation” strategy. Previously this was only mentioned in the methods, but we have now added the following in the results to address this comment of the reviewer:

      p. 6 L12: “Strategies prioritizing high- or low-transmission settings involved sequential allocation of funding to groups of countries based on their transmission intensity (from highest to lowest EIR or vice versa). The proportional allocation strategy mimics the current allocation algorithm employed by the Global Fund: budget shares are mainly distributed according to malaria disease burden in the 2000-2004 period. To allow comparison with this existing funding model, we also started allocation decisions from the year 2000.”
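
      The two families of strategies described in this passage can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the model used in the study: the function names, the per-setting "cost to reach full coverage" input, and the example numbers are all hypothetical.

```python
from typing import Dict, List, Tuple


def proportional_allocation(budget: float, burden: Dict[str, float]) -> Dict[str, float]:
    """Split the budget in proportion to each setting's baseline disease burden."""
    total = sum(burden.values())
    if total <= 0:
        raise ValueError("total burden must be positive")
    return {name: budget * b / total for name, b in burden.items()}


def sequential_allocation(
    budget: float,
    settings: List[Tuple[str, float, float]],
    highest_first: bool = True,
) -> Dict[str, float]:
    """Fund settings one after another, ordered by transmission intensity (EIR).

    Each entry of `settings` is (name, eir, cost_to_full_coverage). The cost
    term is an assumption of this sketch; each setting is funded up to that
    cost before any money moves on to the next setting.
    """
    ordered = sorted(settings, key=lambda s: s[1], reverse=highest_first)
    remaining = budget
    shares: Dict[str, float] = {}
    for name, _eir, need in ordered:
        spend = min(need, remaining)
        shares[name] = spend
        remaining -= spend
    return shares


# Hypothetical inputs for illustration only.
example_settings = [("low", 0.5, 40.0), ("high", 50.0, 80.0), ("moderate", 3.0, 30.0)]
high_first = sequential_allocation(100.0, example_settings, highest_first=True)
low_first = sequential_allocation(100.0, example_settings, highest_first=False)
proportional = proportional_allocation(100.0, {"A": 30.0, "B": 10.0})
```

      With a limited budget, the sequential strategies leave the last settings in the ordering partly or wholly unfunded, whereas the proportional strategy gives every setting a share. That difference is what makes the comparison across budget levels informative.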

      The Global Fund framework additionally considers economic capacity and other specific factors, and we have now also included a direct comparison with the 2020-2022 Global Fund allocation in Supplementary Figure S12 (see Author response image 1).

      We agree that looking at allocation decisions from 2020 onward would also constitute a very interesting question. However, the high dimensionality in scenarios to consider for this would currently make it computationally infeasible to run on the global level. Not only would it have to include all interventions currently implemented and available for malaria at different levels of coverage, but also the option of scaling down existing interventions. Instead, our priority in this paper was to conduct a thought experiment including both P. falciparum and P. vivax on a large geographical scale.

      Author response image 1.

      Impact of the proportional allocation strategy and the 2020-2022 Global Fund allocation on global malaria cases (panel A) and the total population at risk of malaria (panel B) at varying budgets. Both strategies use the same algorithm for budget share allocation based on malaria disease burden in 2000-2004, but the Global Fund allocation additionally involves an economic capacity component and specific strategic priorities.

      1. I realize this is a back-of-the-envelope assessment (although it is presented to be less approximate than it is, and the title does not reveal that the only intervention strategy considered is ITNs) but the number and scope of modeling assumptions made are simply enormous. First, that modeling is done at the national scale, when transmission within countries is incredibly heterogeneous. The authors note a differential impact of ITNs at various transmission levels and I wonder how the assumption of an intermediate average PfPR vs modeling higher and lower PfPR areas separately might impact the effect of the ITNs.

      Thank you for this comment. We agree the title could be more specific and have changed this to “Resource allocation strategies for insecticide-treated bednets to achieve malaria eradication”.

      Regarding the scale of ITN allocation, it is true that allocation at a sub-national scale could affect the results. However, considering this at a national scale is most relevant for our analysis because this is the scale at which global funding allocation decisions are made in practice. A sentence explaining this has been added in the methods.

      p. 15 L8: “The analysis was conducted on the national level, since this scale also applies to funding decisions made by international donors (1).”

      Further considering different geographical scales would also require introducing other assumptions, for example about how different countries would distribute funding sub-nationally, whether specific countries would take cooperative or competitive approaches to tackle malaria within a region or in border areas, and about delays in the allocation of bednets in specific regions. These interesting questions were outside of the scope of this work, but certainly require further investigation.

      1. Second, the effect of ITNs will differ across countries due to variations in vector and human behavior and variation in insecticide resistance and susceptibility to the ITNs. The authors note this as a limitation but it is a little mind-boggling that they chose not to account for either factor since estimates are available for the historical period over which they are modeling.

      Thank you for pointing this out. We did consider this and mentioned it as a limitation. Nevertheless, the complexity of accounting for this should also be recognised; for example, there is substantial uncertainty about the precise relationship between insecticide resistance and the population-level effect of ITNs (Sherrard-Smith et al., 2022, Lancet Planetary Health) (4). Additionally, our simulations extend beyond the 2000-2023 period so further assumptions about future changes to these factors would also be required. Simplifying assumptions are inherent to all mathematical modelling studies and we consider these particular simplifications acceptable given the high-level nature of the analysis.

      1. Third, the assumption that elimination is permanent and nothing is needed to prevent resurgence is, as the authors know, a vast oversimplification. Since resources will be needed to prevent resurgence, it appears this assumption may have a substantial impact on the authors' results.

      Thank you for this comment. In the discussion, we have now expanded on this:

      p. 13 L3: “While our analysis presents allocation strategies to progress towards eradication, the results do not provide insight into allocation of funding to maintain elimination. In practice, the threat of malaria resurgence has important implications for when to scale back interventions.”

      We believe that, from a global perspective, the questions of funding allocation to achieve elimination and to maintain it can currently still be considered separately, given the long timescales involved. The cost of preventing resurgence is not known, and a major difficulty in accounting for it would be identifying the relevant timescale over which to quantify it.

      1. The decision to group all settings with EIR > 7 together as "high transmission" may perhaps be driven by WHO definitions but at a practical level this groups together countries with EIR 10 and EIR 500. Why not further subdivide this group, which makes sense from a technical perspective when thinking about optimal allocation strategies?

      Thank you for pointing this out. The WHO categories used are better interpreted in terms of the corresponding prevalence, which places countries with a prevalence of over 35% in the high transmission category (WHO Guidelines for malaria, 31 March 2022) (5). We felt this was appropriate given that we are looking at theoretical global allocation patterns and do not aim to make recommendations for specific groups of countries or individual countries within sub-Saharan Africa that would be distinguished through the use of higher cut-offs. In our analysis, all 25 countries in the high transmission category were located in sub-Saharan Africa.

      1. The relevance of this analysis for elimination is a little questionable since no one eliminates with ITNs alone, to the best of my understanding.

      Thank you for this comment. We indeed state in the paper that ITNs alone are not sufficient to eliminate malaria. However, we still think that our analysis is relevant for elimination because it takes a more theoretical perspective on reducing transmission using interventions. Starting from the 2000 baseline (or current levels) globally, large-scale transmission reductions such as those achieved by mass ITN distribution still represent the first key step on the path to malaria eradication, as shown in previous modelling work (Griffin et al., 2016, Lancet Infectious Diseases) (6). In the final phase of elimination, the WHO also recommends the addition of more targeted and reactive interventions (WHO Guidelines for malaria, 31 March 2022) (5). Our changes to the title of the article (“Resource allocation strategies for insecticide-treated bednets to achieve malaria eradication”) should now better reflect that we consider ITNs as just one necessary component to achieve malaria eradication.

      Reviewer #2 (Public Review):

      1. Schmit et al. analyze and compare different strategies for the allocation of funding for insecticide-treated nets (ITNs) to reduce the global burden of malaria. They use previously published models of Plasmodium falciparum and Plasmodium vivax malaria transmission to quantify the effect of ITN distribution on clinical malaria numbers and the population at risk. The impact of different resource allocation strategies on the reduction of malaria cases or a combination of malaria cases and achieving pre-elimination is considered to determine the optimal strategy to allocate global resources to achieve malaria eradication.

      Strengths:

      Schmit et al. use previously published models and optimization for rigorous analysis and comparison of the global impact of different funding allocation strategies for ITN distribution. This provides evidence of the effect of three different approaches: the prioritization of high-transmission settings to reduce the disease burden, the prioritization of low-transmission settings to "shrink the malaria map", and a resource allocation proportional to the disease burden.

      Thank you for providing this summary and outline of the strengths of the paper.

      1. Weaknesses:

      The analysis and optimization which provide the evidence for the conclusions and are thus the central part of this manuscript necessitate some simplifying assumptions which may have important practical implications for the allocation of resources to reduce the malaria burden. For example, seasonality, mosquito species-specific properties, stochasticity in low transmission settings, and changing population sizes were not included. Other challenges to the reduction or elimination of malaria such as resistance of parasites and mosquitoes or the spread of different mosquito species as well as other beneficial interventions such as indoor residual spraying, seasonal malaria chemoprevention, vaccinations, combinations of different interventions, or setting-specific interventions were also not included. Schmit et al. clearly state these limitations throughout their manuscript.

      The focus of this work is on ITN distribution strategies; other interventions are not considered. It also provides a global perspective, and analysis of the specific local setting (as also noted by Schmit et al.) and different interventions as well as combinations of interventions should also be taken into account for any decisions.

      Thank you for raising these points. As outlined at the beginning of our response, for computational reasons we indeed had to introduce several simplifying assumptions to solve this complex optimisation problem. Given the factors you highlighted, our study should primarily be interpreted as a thought experiment to assess whether current policies are aligned with an optimal allocation strategy or whether there might be a need to consider alternative strategies. The findings are relevant primarily to global funders and should not be used to inform individual country allocation decisions, which we have further clarified in the manuscript.

      1. Nonetheless, the rigorous analysis supports the authors' conclusions and provides evidence that supports the prioritization of funding of ITNs for settings with high Plasmodium falciparum transmission. Overall, this work may contribute to making evidence-based decisions regarding the optimal prioritization of funding and resources to achieve a reduction in the malaria burden.

      Thank you for this positive assessment of our work.

      Reviewer #1 (Recommendations For The Authors):

      1. L144: last paragraph, the focus on endemic equilibrium: I did not really understand this, when 39 years is mentioned later is that a different analysis? How are cases averted calculated in a time-agnostic endemic equilibrium analysis? Perhaps a little more detail here would be helpful.

      A further explanation of this has been added in the results and methods.

      p. 8 L 22: “To evaluate the robustness of the results, we conducted a sensitivity analysis on our assumption on ITN distribution efficiency. Results remained similar when assuming a linear relationship between ITN usage and distribution costs (Figure S10). While the main analysis involves a single allocation decision to minimise long-term case burden (leading to a constant ITN usage over time in each setting irrespective of subsequent changes in burden), we additionally explored an optimal strategy with dynamic re-allocation of funding every 3 years to minimise cases in the short term.”

      p. 17 L25: “To ensure computational feasibility, 39 years was used as it was the shortest time frame over which the effect of re-distribution of funding from countries having achieved elimination could be observed.”

      p. 18 L 9: “Global malaria case burden and the population at risk were compared between baseline levels in 2000 and after reaching an endemic equilibrium under each scenario for a given budget.”

      1. L148: what is proportional allocation by disease burden and how is that different from prioritizing high-transmission settings?

      Further details have been added in the text.

      p. 6 L12: “Strategies prioritizing high- or low-transmission settings involved sequential allocation of funding to groups of countries based on their transmission intensity (from highest to lowest EIR or vice versa). The proportional allocation strategy mimics the current allocation algorithm employed by the Global Fund: budget shares are mainly distributed according to malaria disease burden in the 2000-2004 period. To allow comparison with this existing funding model, we also started allocation decisions from the year 2000.”
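      The sequential prioritisation described above can be illustrated with a minimal sketch. It is not the authors' optimisation code: the `Setting` class, the `max_useful` cap (a stand-in for the funding level beyond which ITNs give no further gain) and all numbers are hypothetical, chosen only to show how the funding order changes when high- or low-transmission settings are prioritised.

```python
from dataclasses import dataclass


@dataclass
class Setting:
    name: str
    eir: float         # transmission intensity (entomological inoculation rate)
    max_useful: float  # hypothetical funding level beyond which ITNs add no gain


def sequential_allocation(settings, budget, highest_first=True):
    """Fund settings one at a time in EIR order until the budget runs out."""
    ordered = sorted(settings, key=lambda s: s.eir, reverse=highest_first)
    allocation = {s.name: 0.0 for s in settings}
    remaining = budget
    for s in ordered:
        spend = min(s.max_useful, remaining)
        allocation[s.name] = spend
        remaining -= spend
    return allocation


# Hypothetical transmission groups and budget (arbitrary units).
settings = [
    Setting("low", eir=0.5, max_useful=40.0),
    Setting("moderate", eir=3.0, max_useful=30.0),
    Setting("high", eir=50.0, max_useful=60.0),
]

high_first = sequential_allocation(settings, budget=80.0, highest_first=True)
low_first = sequential_allocation(settings, budget=80.0, highest_first=False)
print(high_first)
print(low_first)
```

      With the same budget, the two orderings leave different groups under-funded, which is the contrast between the "high-transmission first" and "shrink the map" strategies.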

      1. L198-9: did low transmission settings get the majority of funding at intermediate and maximum budgets because they have the most population (I think so, based on Fig 1)?

      Yes, this is correct. We state in the results: “the optimized distribution of funding to minimize clinical burden depended on the available global budget and was driven by the setting-specific transmission intensity and the population at risk”.

      1. L206: what is ITN distribution efficiency? This is not explained. What is the 39-year period? Why this duration?

      Further explanations have been added in the results section, which were previously only detailed in the methods:

      p. 8 L 22: “To evaluate the robustness of the results, we conducted a sensitivity analysis on our assumption on ITN distribution efficiency. Results remained similar when assuming a linear relationship between ITN usage and distribution costs (Figure S10).”

      p. 17 L25: “To ensure computational feasibility, 39 years was used as it was the shortest time frame over which the effect of re-distribution of funding from countries having achieved elimination could be observed.”

      1. L218: what is "no intervention with a high budget"? is this a phrasing confusion?

      Yes, this has been changed.

      p. 9 L14: “We estimated that optimizing ITN allocation to minimize global clinical incidence could, at a high budget, avert 83% of clinical cases compared to no intervention.”

      1. L235-7: on comparing these results to previous work on the 20 highest-burden countries: is the definition of "high" similar enough across these studies that this is a relevant comparison?

      We believe this is reasonably comparable, as looking at the 20 highest-burden countries encompasses almost the entire high-transmission group in our work (25 countries in total), on which the comparison is made.

      1. L267-70: I didn't understand this sentence at all.

      Thanks for flagging this. The sentence referred to is: “Allocation proportional to disease burden did not achieve as great an impact as other strategies because the funding share assigned to settings was constant irrespective of the invested budget and its impact, and we did not reassign excess funding in high-transmission settings to other malaria interventions.”

      The previously mentioned added details on the proportional allocation strategy in the manuscript should now make this clearer, together with this clarification:

      p. 11 L17: “In modelling this strategy, we did not reassign excess funding in high-transmission settings to other malaria interventions, as would likely occur in practice.”

      For proportional allocation, a fixed proportion of the budget is calculated for each country based on disease burden, as described in the Global Fund allocation documentation (see Methods). However, since ITNs are the only intervention considered, this leads to a higher budget being allocated than is needed in some countries (i.e. where more funding doesn’t translate into further health gains).
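      A small sketch may help to show why fixed burden-proportional shares can leave funding stranded. It assumes a simplified rule (shares exactly proportional to case burden) rather than the full Global Fund methodology, and the country labels, case counts and saturation points (`max_useful`) are hypothetical.

```python
def proportional_allocation(burden, budget):
    """Split the budget in fixed shares proportional to disease burden."""
    total = sum(burden.values())
    return {country: budget * cases / total for country, cases in burden.items()}


def excess_funding(allocation, max_useful):
    """Funding above the point where more ITN spending yields no further gain."""
    return {c: max(0.0, allocation[c] - max_useful[c]) for c in allocation}


# Hypothetical inputs (arbitrary units).
burden = {"A": 600.0, "B": 300.0, "C": 100.0}    # clinical cases
max_useful = {"A": 50.0, "B": 40.0, "C": 20.0}   # saturation point of ITN funding

allocation = proportional_allocation(burden, budget=100.0)
excess = excess_funding(allocation, max_useful)
print(allocation)
print(excess)
```

      In this toy case country A receives more than ITNs alone can usefully absorb, while B and C stay below their saturation points; because the shares are fixed, the surplus is not passed on.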

      1. L339 EIR range: 80 is high at the country level but areas within countries probably went as high as 500 back in 2000. How does this affect the modeled estimates of ITN impact?

      The question of sub-national differences in transmission has been addressed in the public review comments. Briefly, we consider the national scale to be most relevant for our analysis because this is the scale at which global funding allocation decisions are made in practice. Although, as you correctly point out, the EIR affects ITN impact, it is not possible to conclude what the average effect of this would be on the country level without considering the following factors and introducing further assumptions on these: how would different countries distribute funding sub-nationally? Which countries would take cooperative or competitive approaches to tackle malaria within a region or in border areas? Would there be delays in the allocation of bednets in specific regions? These interesting questions were outside of the scope of this work, but certainly require further investigation.

      1. L347 population size constant: births and deaths are still present, is that right? Unclear from this sentence

      Yes, this is correct. Full details on the model can be found in the Supplementary Materials.

      1. L370 estimating ITN distribution required to achieve simulated population usage: is this a single relationship for all of Africa? Is it based on ITNs distributed 2:1 -> % access -> % usage? So it accounts for allocation inefficiency?

      Yes, this is represented by a single relationship for all of Africa to account for allocation inefficiency and is based on observed patterns across the continent and methodology developed in a previous publication (Bertozzi-Villa et al., 2021, Nature Communications) (7). Full details can be found in the Supplementary Materials (“Relationship between distribution and usage of insecticide-treated nets (ITNs)”, p. 21).

      1. L375: the ITN unit cost is assumed constant across countries and time (I think, it doesn't say explicitly), is this a good assumption?

      Yes, this is correct. We consider this a reasonable assumption within the scope of the paper. While delivery costs likely vary across countries, international funders usually have pooled procurement mechanisms for ITNs (The Global Fund, 2023, Pooled Procurement Mechanism Reference Pricing: Insecticide-Treated Nets).

      1. L399: "single allocation of a constant ITN usage" it is not explained what exactly this means

      Further explanations have been added in the manuscript.

      p. 8 L24: “While the main analysis involves a single allocation decision to minimise long-term case burden (leading to a constant ITN usage over time in each setting irrespective of subsequent changes in burden), we additionally explored an optimal strategy with dynamic re-allocation of funding every 3 years to minimise cases in the short term.”

      Reviewer #2 (Recommendations For The Authors):

      1. Additionally to the public comments, the only major comment is that in this reviewer's opinion, the focus on ITNs as the only intervention should be made clearer at different places in the manuscript (e.g. in the discussion lines 303-304). Otherwise, there are only some minor comments (see below).

      We have now modified the following sentence and also included this suggestion in the title (“Resource allocation strategies for insecticide-treated bednets to achieve malaria eradication”).

      p. 13 L8: “Our analysis demonstrates the most impactful allocation of a global funding portfolio for ITNs to reduce global malaria cases.”

      1. Minor comments:
      2. It may be of interest to compare the maximum budget obtained from the optimization with other estimates of required funding and actual available funding.

      Thank you for this interesting suggestion. Our maximum budget estimates are similar to the required investments projected for the WHO Global Technical Strategy: US$3.7 billion for ITNs in our analysis compared to between US$6.8 and US$10.3 billion total annual resources between 2020 and 2030, of which an estimated 55% would be required for (all) vector control (US$3.7 - US$5.7 billion) (Patouillard et al., 2017, BMJ Global Health) (8). However, it is well known that current spending is far below these requirements: total investments in malaria were estimated to be about US$3.1 billion per year in the last 5 years (World Health Organization, 2022, World Malaria Report 2022) (9).
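      The vector-control range quoted above follows directly from the figures in the response; the short check below reproduces it. The variable names are illustrative only.

```python
# Figures as quoted in the response (US$ billion per year).
total_low, total_high = 6.8, 10.3
vector_control_share = 0.55  # estimated share required for all vector control

vc_low = round(total_low * vector_control_share, 1)
vc_high = round(total_high * vector_control_share, 1)
print(f"Vector control: US${vc_low} to US${vc_high} billion per year")
```

      The lower bound of this range coincides with the US$3.7 billion maximum ITN budget from the analysis.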

      1. Line 177: should "Figure S7" be bold?

      Yes, this has been corrected.

      1. Line 218: what does "no intervention with high budget" mean? Should this simply be "no intervention"?

      This has been changed.

      p. 9 L14: “We estimated that optimizing ITN allocation to minimize global clinical incidence could, at a high budget, avert 83% of clinical cases compared to no intervention.”

      1. In this reviewer's opinion it would be easier for the reader if the weighting term in the objective function would be added in the Materials and Methods section. The weighting could be added without extending the section substantially and the explanation in lines 390-393 may be easier to understand.

      Thank you for this suggestion. We agree and have added this in the main manuscript.

      References

      1. The Global Fund. Description of the 2020-2022 Allocation Methodology. 2019. Available from: https://www.theglobalfund.org/media/9224/fundingmodel_2020-2022allocations_methodology_en.pdf.

      2. Hay SI, Guerra CA, Tatem AJ, Noor AM, Snow RW. The global distribution and population at risk of malaria: past, present, and future. Lancet Infect Dis. 2004;4(6):327-36.

      3. Feachem RGA, Phillips AA, Hwang J, Cotter C, Wielgosz B, Greenwood BM, et al. Shrinking the malaria map: progress and prospects. The Lancet. 2010;376(9752):1566-78.

      4. Sherrard-Smith E, Winskill P, Hamlet A, Ngufor C, N'Guessan R, Guelbeogo MW, et al. Optimising the deployment of vector control tools against malaria: a data-informed modelling study. The Lancet Planetary Health. 2022;6(2):e100-e9.

      5. World Health Organization. WHO Guidelines for malaria, 31 March 2022. Geneva: World Health Organization; 2022. Contract No.: Geneva WHO/UCN/GMP/2022.01 Rev.1.

      6. Griffin JT, Bhatt S, Sinka ME, Gething PW, Lynch M, Patouillard E, et al. Potential for reduction of burden and local elimination of malaria by reducing Plasmodium falciparum malaria transmission: a mathematical modelling study. The Lancet Infectious Diseases. 2016;16(4):465-72.

      7. Bertozzi-Villa A, Bever CA, Koenker H, Weiss DJ, Vargas-Ruiz C, Nandi AK, et al. Maps and metrics of insecticide-treated net access, use, and nets-per-capita in Africa from 2000-2020. Nature Communications. 2021;12(1):3589.

      8. Patouillard E, Griffin J, Bhatt S, Ghani A, Cibulskis R. Global investment targets for malaria control and elimination between 2016 and 2030. BMJ Global Health. 2017;2(2):e000176.

      9. World Health Organization. World malaria report 2022. Geneva: World Health Organization; 2022. Report No.: 9240064893.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) The authors claimed that they examined the arterial and venous identity of the hyperbranched vessels via live imaging analysis of the high glucose-treated Tg(flt1:YFP::kdrl:ras-mCherry) line, and revealed that the hyperbranched ectopic vessels comprised arteries and veins. That's good, of course. However, there are no relevant results in Figure 2. Please revise it.

      Thank you very much for the suggestion. We’ve added this part of the results in Figure 2i and j.

      (2) In Figures 3f and 3g, some of the ECs protruded long and intricate sprouts, and nearly all the ECs within an ISV underwent the outgrowth of filopodia in some extreme cases (Figure 3g), suggesting that the high glucose treatment induced the endothelial differentiation into tip cell-like cells. The findings are surprising and interesting. In order to further confirm the author's conclusion, in situ hybridization experiments are more appropriate to show the expression changes of tip cell-like cell marker genes in the high glucose-treated embryos.

      Thank you very much for your constructive suggestions. We have performed the analysis of single-cell RNA-seq data, and the results showed that the tip cell marker genes such as esm1, apln, and cxcr4a were significantly up-regulated in arterial and capillary ECs after high glucose treatment. The results were integrated into Figure 3 of the revised manuscript.

      (3) Embryos treated with AS1842856 or injected with foxo1a-MO exhibited excessive angiogenesis (Figure 5g-i), suggesting the transcription activity of foxo1 is required to maintain the quiescent state of endothelial cells. Did the downregulation of foxo1a lead to the differentiation of endothelial cells into tip-cell-like cells?

      Thank you very much for the question. We examined our results carefully and marked these tip cell-like cells with arrowheads in Figure 5h of the revised manuscript.

      (4) Foxo1a was significantly downregulated in arterial and capillary ECs after high glucose treatment (Figure 5c-e). More importantly, could overexpression of foxo1a in the high glucose-treated embryos eliminate the hyperangiogenic characteristics?

      Thank you for the great questions. We performed rescue experiments, and the results suggested that the overexpression of foxo1a partially mitigated the excessive angiogenesis induced by high glucose treatment. These results were integrated into Figure 6 of the revised manuscript.

      (5) The authors' results found that foxo1a was enriched in both the predicted binding sites of marcksl1a by ChIP-PCR experiments (Figure 7d). This result is reliable. However, whether these two sites are important for marcksl1a gene transcription needs to be confirmed by relevant experiments, such as luciferase reporter assays.

      We’ve performed the luciferase reporter assays and added these data to Figure 8f and g.

      Reviewer #2:

      Suggested major experiments:

      (1) A previous study (Jorgens et al., Diabetes 64, 2015) reported that high tissue glucose levels increased reactive dicarbonyl methylglyoxal (MG) concentrations in zebrafish embryos and triggered the formation of hyperbranched ISVs. Additionally, they illustrated that MG induced the vascular hyperbranching phenotype via enhancing the phosphorylated VEGFR and pAKT signaling cascade. The authors must examine whether both pVEGFR and pAKT are increased in noncaloric monosaccharide (NMS)-treated embryos. The authors also need to test the crosstalk between VEGFR/AKT signaling and the foxo1a-Marcksl1a pathway in glucose or NMS-treated embryos.

      Thank you very much for your suggestion. We treated the embryos with AS1842856 (foxo1 inhibitor) and Lenvatinib (VEGFR inhibitor), and the results showed that Lenvatinib treatment attenuated the excessive angiogenesis induced by foxo1 inhibition. We also examined the expression level of vegfaa after AS1842856 treatment; the results suggested that foxo1 inhibition did not affect the expression of vegfaa.

      Author response image 1.

      (2) In this manuscript, the authors performed single endothelial cell sequencing in glucose-treated embryos, and found reduced foxo1a expression and upregulated marcksl1a. Based on these data, the authors demonstrated that glucose and NMS induced excessive angiogenesis through the foxo1a-marcksl1a pathway. The authors must conduct endothelial scRNA-seq in NMS-treated embryos, and analyze and compare the datasets with scRNA-seq datasets from glucose-treated endothelial cells, considering the focus of the paper. In addition, ASBs have been suggested as healthy alternatives to sugar-sweetened beverages. The authors also need to examine carefully whether metabolic gene programs are altered in glucose-treated endothelial cells, which was mentioned in the Jorgens et al. paper.

      Thank you very much for your constructive suggestions. We have performed the whole embryo transcriptome sequencing after high D-glucose and L-glucose treatment. We analyzed and compared the differentially expressed genes of control, high D-glucose-treated, and high L-glucose-treated embryos. The results revealed that 1259 and 1074 genes were up-regulated significantly in high D-glucose and high L-glucose treated embryos, respectively, compared with control.

      We also analyzed some metabolism-related genes and found that some genes involved in gluconeogenesis, glycolysis, and oxidative phosphorylation were significantly changed. The results were integrated into supplementary Figures 12 and 13 of the revised manuscript.

      (3) Glucose or NMS treatments induce the hyperbranched endothelial vessels from the dorsal aorta and ISVs but not cardinal veins. In Figure 4i, the arterial and capillary cell population is increased in glucose-treated embryos, but the venous cell population seems to be reduced. The authors need to check whether arterial/venous differentiation and proliferation are affected in glucose- and NMS-treated embryos.

      Thank you for your suggestions. We examined arterial/venous differentiation using the Tg(flt1BAC:YFP::kdrl:ras-mCherry) zebrafish line, in which YFP is mainly expressed in arterial endothelial cells. We found that the endothelial cells of the excessively formed blood vessels induced by high glucose treatment are mainly arterial (Figure 2j). This might explain why the arterial and capillary cell population was increased in glucose-treated embryos.

      (4) The manuscript proposes that excessively branched vessels within ISVs arise from the ectopic activation of quiescent endothelial cells (ECs) into tip cells. To confirm this process, the authors need to detect some specific tip cell markers to demonstrate their ectopic activation.

      Thank you for your constructive suggestions. We have performed the analysis of single-cell RNA-seq data, and the results showed that the tip cell marker genes such as esm1, apln, and cxcr4a were significantly up-regulated in arterial and capillary ECs after high glucose treatment. The results were integrated into Figure 3 of the revised manuscript.

      (5) Disaccharides such as lactose, maltose, and sucrose did not exhibit a notable induction of excessive angiogenic phenotype. However, the specific treatment concentrations utilized in the study were not delineated. Therefore, further investigation is warranted to determine whether increased disaccharide concentrations can cause vascular hyperbranching phenotype.

      Thank you very much for the suggestions. We’ve described the concentrations of monosaccharides and disaccharides in the materials and methods section of the revised manuscript. Following the suggestion, we treated zebrafish embryos with a higher concentration of the disaccharide. The results showed that higher concentrations of disaccharide treatment also caused excessive angiogenesis in zebrafish embryos. These results were integrated into supplementary Figure 8 of the revised manuscript.

      (6) The authors claim that glucose and NMS (such as L-glucose) induce excessive angiogenesis through the foxo1a-marcksl1a pathway. Following exposure to elevated glucose levels, a substantial down-regulation of foxo1a was observed in arterial and capillary endothelial cells. This down-regulation led to the release of foxo1a inhibition on marcksl1a, subsequently resulting in an augmented expression of marcksl1a and the manifestation of a vascular phenotype. Consequently, it is imperative to investigate whether foxo1a overexpression can attenuate marcksl1a expression and mitigate the vascular phenotype induced by monosaccharides. Sufficient data support is needed for the conclusion that monosaccharides induce angiogenesis via the foxo1a-marcksl1a pathway.

      Thank you very much for your constructive suggestions.

      We confirmed the expression of marcksl1a in foxo1a-overexpressed embryos. The results indicated that foxo1a overexpression significantly attenuated marcksl1a expression. The results were integrated into Figure 8c. We also performed the rescue experiments, which indicated that overexpression of foxo1a partially mitigated the excessive angiogenesis induced by high glucose treatment. These results were integrated into Figure 6 of the revised manuscript.

      Minor corrections:

      (1) Figure 2i, j has no corresponding graphs.

      We’ve made the change in Figure 2.

      (2) Figure 2h has no vertical coordinates.

      We’ve made the change in Figure 2.

      (3) All Figures should be referenced within the manuscript.

      We’ve checked our manuscript carefully and made the corrections.

      (4) The concentrations of monosaccharides and disaccharides employed in this study must be distinctly elucidated within the manuscript and annotated using the internationally recognized unit notation.

      We’ve checked our manuscript carefully and described the concentrations of monosaccharides and disaccharides in the revised materials and methods section.

      Reviewer #3:

      (1) A possible limitation of the study is that the mechanism leading to angiogenesis in the retinal circulation and in peripheral vasculature is certainly different as diabetes is associated with excessive angiogenesis in the retina and a defect in angiogenesis in the peripheral circulation as shown by a reduced post-ischemic revascularization (see Silvestre et al.: DOI: 10.1152/physrev.00006.2013).

      Thank you very much for your suggestions. As you said, the peripheral blood vessel model in this study does not fully represent individuals with diabetic retinopathy, which is a limitation. However, the phenotype and mechanism of excessive angiogenesis of peripheral blood vessels in the high glucose model may still provide a reference for excessive angiogenesis in the retina, as the two might share similar etiology and regulatory mechanisms.

      (2) Another limitation is that angiogenesis in the embryo is not fully representative of the excessive angiogenesis observed in the diabetic retinal circulation. It would be of interest to analyse the retinal vascular tree in adult fish submitted to high glucose and to ASB.

      In our future study, we will try to observe the angiogenesis phenotype in the diabetic retina and improve the disease model.

      (3) Line 52: "Endothelial cell dysfunction (ECD)" instead of "Endothelial dysfunction (ECD)".

      We’ve made the correction in the revised manuscript.

      (4) The authors should elaborate more on the observation showing that L-glucose, D-mannose, D-ribose, and L-arabinose, which could not be digested by animals, also induce excessive angiogenesis. Is the effect indirect?

      In the current manuscript, we conducted an in vivo live imaging analysis to show the phenotype of excessive angiogenesis caused by those noncaloric monosaccharides. However, we did not find differences in the phenotypes of embryos treated with noncaloric and caloric monosaccharides. Therefore, we supposed that the mechanisms underlying the phenotypes were similar. The effect might be indirect.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work presents H3-OPT, a deep learning method that effectively combines existing techniques for the prediction of antibody structure. This work is important because the method can aid the design of antibodies, which are key tools in many research and industrial applications. The experiments for validation are solid.

      Comments to Author:

      Several points remain partially unclear, such as:

      1). Which examples constitute proper validation;

      Thank you for your kind reminder. We have revised the description of the validation experiments to make clear which examples constitute proper validation. We changed “Finally, H3-OPT also shows lower Cα-RMSDs compared to AF2 or tFold-Ab for the majority of targets in an expanded benchmark dataset, including all antibody structures from CAMEO 2022” to “Finally, H3-OPT also shows lower Cα-RMSDs compared to AF2 or tFold-Ab for the majority (six of seven) of targets in an expanded benchmark dataset, including all antibody structures from CAMEO 2022”, and added the following sentence to the experimental validation section of our revised manuscript: “AlphaFold2 outperformed IgFold on these targets”.

      2) What the relevance of the molecular dynamics calculations as performed is;

      Thank you for your comment, and I apologize for any confusion. The goal of our molecular dynamics calculations is to compare the differences in binding affinities, an important issue of antibody engineering, between AlphaFold2-predicted complexes and H3-OPT-predicted complexes. Molecular dynamics simulations enable the investigation of the dynamic behaviors and interactions of these complexes over time. Unlike other tools for predicting binding free energy, MM/PBSA or MM/GBSA calculations provide dynamic properties of complexes by sampling conformational space, which helps in obtaining more accurate estimates of binding free energy. In summary, our molecular dynamics calculations demonstrated that the binding free energies of H3-OPT-predicted complexes are closer to those of native complexes. We have included the following sentence in our manuscript to provide an explanation of the molecular dynamics calculations: “Since affinity prediction plays a crucial role in antibody therapeutics engineering, we performed MD simulations to compare the differences in binding affinities between AF2-predicted complexes and H3-OPT-predicted complexes.”.

      3) The statistics for some of the comparisons;

      Thank you for the comment. We have incorporated statistics for some of the comparisons in the revised version of our manuscript and added the following sentence in the Methods section: “We conducted two-sided t-test analyses to assess the statistical significance of differences between the various groups. Statistical significance was considered when the p-values were less than 0.05. These statistical analyses were carried out using Python 3.10 with the Scipy library (version 1.10.1).”.
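      As an illustration of the kind of two-sided test described above, the following is a minimal, dependency-free sketch. The authors report using SciPy, where `scipy.stats.ttest_ind` provides this test; the stand-in below only makes the arithmetic explicit. The choice of Welch's unequal-variance variant, the function names, and the sample values are all assumptions for illustration and are not taken from the manuscript.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, integrating the density over [0, |t|] with Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)


def welch_t_test(a, b):
    """Welch's two-sample t-test (assumed variant); returns (t, two-sided p)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, two_sided_p(t, df)


if __name__ == "__main__":
    # Hypothetical per-target Cα-RMSD values (Å) for two methods.
    method_a = [1.2, 2.5, 3.1, 0.9, 4.0, 2.2]
    method_b = [1.8, 3.0, 3.9, 1.5, 4.6, 2.9]
    t_stat, p_value = welch_t_test(method_a, method_b)
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}, significant at 0.05: {p_value < 0.05}")
```

      When both methods are evaluated on the same targets, a paired test may be the more natural choice; the text above does not specify which variant was used.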

      4) The lack of comparison with other existing methods.

      We appreciate your valuable comments and suggestions. Conducting comparisons with a broader set of existing methods can further facilitate discussions on the strengths and weaknesses of each method, as well as the accuracy of our method. In our study, we conducted a comparison of H3-OPT with many existing methods, including AlphaFold2, HelixFold-Single, ESMFold, and IgFold. We demonstrated that several protein structure prediction methods, such as ESMFold and HelixFold-Single, do not match the accuracy of AlphaFold2 in CDR-H3 prediction. Additionally, we performed a detailed comparison between H3-OPT, AlphaFold2, and IgFold (the latest antibody structure prediction method) for each target.

      We sincerely thank you for the comment and have introduced a comparison with OmegaFold. The results have been incorporated into the relevant sections (Fig 4a-b) of the revised manuscript.

      Author response image 1.

      Public Reviews

      Comments to Author:

      Reviewer #1 (Public Review):

      Summary:

      The authors developed a deep learning method called H3-OPT, which combines the strength of AF2 and PLM to reach better prediction accuracy of antibody CDR-H3 loops than AF2 and IgFold. These improvements will have an impact on antibody structure prediction and design.

      Strengths:

      The training data are carefully selected and clustered, the network design is simple and effective.

      The improvements include smaller average Ca RMSD, backbone RMSD, side chain RMSD, more accurate surface residues and/or SASA, and more accurate H3 loop-antigen contacts.

      The performance is validated from multiple angles.

      Weaknesses:

      1) There are very limited prediction-then-validation cases, basically just one case.

      Thanks for pointing out this issue. The number of prediction-then-validation cases is helpful to show the generalization ability of our model. However, obtaining experimental structures is both costly and labor-intensive. Furthermore, experimental validation cases only capture a limited portion of the sequence space in comparison to the broader diversity of antibody sequences.

      To address this challenge, we have collected different datasets to serve as benchmarks for evaluating the performance of H3-OPT, including our non-redundant test set and the CAMEO dataset. The introduction of these datasets allows for effective assessments of H3-OPT’s performance without biases and tackles the obstacle of limited prediction-then-validation cases.

      Reviewer #2 (Public Review):

      This work provides a new tool (H3-OPT) for the prediction of antibody and nanobody structures, based on the combination of AlphaFold2 and a pre-trained protein language model, with a focus on predicting the challenging CDR-H3 loops with higher accuracy than previously developed approaches. This task is of high value for the development of new therapeutic antibodies. The paper provides an external validation consisting of 131 sequences, with further analysis of the results by segregating the test sets into three subsets of varying difficulty and comparison with other available methods. Furthermore, the approach was validated by comparing three experimentally solved 3D structures of anti-VEGF nanobodies with the H3-OPT predictions.

      Strengths:

      The experimental design to train and validate the new approach has been clearly described, including the dataset compilation and its representative sampling into training, validation and test sets, and structure preparation. The results of the in-silico validation are quite convincing and support the authors' conclusions.

      The datasets used to train and validate the tool and the code are made available by the authors, which ensures transparency and reproducibility, and allows future benchmarking exercises with incoming new tools.

      Compared to AlphaFold2, the authors' optimization seems to produce better results for the most challenging subsets of the test set.

      Weaknesses:

      1) The scope of the binding affinity prediction using molecular dynamics is not that clearly justified in the paper.

      We sincerely appreciate your valuable comment. We have added the following sentence in our manuscript to justify the scope of the molecular dynamics calculations: “Since affinity prediction plays a crucial role in antibody therapeutics engineering, we performed MD simulations to compare the differences in binding affinities between AF2-predicted complexes and H3-OPT-predicted complexes.”.

      2) Some parts of the manuscript should be clarified, particularly the ones that relate to the experimental validation of the predictions made by the reported method. It is not absolutely clear whether the experimental validation is truly a prospective validation. Since the methodological aspects of the experimental determination are not provided here, it seems that this may not be the case. This is a key aspect of the manuscript that should be described more clearly.

      Thank you for the reminder about experimental validation of our predictions. The sequence identities of the wild-type nanobody VH domain and H3 loop, when compared with the best template, are 0.816 and 0.647, respectively. As a result, these mutants exhibited low sequence similarity to our dataset, indicating the absence of prediction bias for these targets. Thus, H3-OPT outperformed IgFold on these mutants, demonstrating our model's strong generalization ability. In summary, the experimental validation actually serves as a prospective validation.

      Thanks for your comments. We have added the following sentences to describe the methodological aspects of the experimental determination: “The protein expression, purification and crystallization experiments were described previously. The proteins used in the crystallization experiments were unlabeled. Upon thawing the frozen protein on ice, we performed a centrifugation step to eliminate any potential crystal nucleus and precipitants. Subsequently, we mixed the protein at a 1:1 ratio with commercial crystal condition kits using the sitting-drop vapor diffusion method facilitated by the Protein Crystallization Screening System (TTP LabTech, mosquito). After several days of optimization, single crystals were successfully cultivated at 21°C and promptly flash-frozen in liquid nitrogen. The diffraction data from various crystals were collected at the Shanghai Synchrotron Radiation Facility and subsequently processed using the aquarium pipeline.”

      3) Some Figures would benefit from a clearer presentation.

      We sincerely thank you for your careful reading. Following your comments, we have made extensive modifications to make our presentation clearer and more convincing (Fig 2c-f).

      Author response image 2.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript introduces a new computational framework for choosing 'the best method' according to the case for getting the best possible structural prediction for the CDR-H3 loop. The authors show their strategy improves on average the accuracy of the predictions on datasets of increasing difficulty in comparison to several state-of-the-art methods. They also show the benefits of improving the structural predictions of the CDR-H3 in the evaluation of different properties that may be relevant for drug discovery and therapeutic design.

      Strengths:

      The authors introduce a novel framework, which can be easily adapted and improved. The authors use a well-defined dataset to test their new method. A modest average accuracy gain is obtained in comparison to other state-of-the art methods for the same task while avoiding testing different prediction approaches.

      Weaknesses:

      1) The accuracy gain is mainly ascribed to easy cases, while the accuracy and precision for moderate to challenging cases are comparable to other PLM methods (see Fig. 4b and Extended Data Fig. 2). That raises the question: how likely is it to be in a moderate or challenging scenario? For example, it is not clear whether the comparison to the solved X-ray structures of anti-VEGF nanobodies represents an easy or challenging case for H3-OPT. The mutant nanobodies seem not to provide any further validation as the single mutations are very far away from the CDR-H3 loop and they do not disrupt the structure in any way. Indeed, RMSD values follow the same trend in H3-OPT and IgFold predictions (Fig. 4c). A more challenging test and interesting application could be solving the structure of a designed or mutated CDR-H3 loop.

      Thank you for your rigorous consideration. When the experimental structure is unavailable, it is difficult to directly determine whether a target is easy to predict or challenging. We have constructed our non-redundant test set so that the number of easy-to-predict targets is comparable to that of the other two groups. Due to the limited availability of experimental antibody structures, especially nanobody structures, accurately predicting CDR-H3 remains a challenge. In our manuscript, we discuss the strengths and weaknesses of AlphaFold2 and other PLM-based methods, and we introduce H3-OPT as a comprehensive solution for antibody CDR3 modeling.

      We also appreciate your comment on experimental structures. We fully agree with your opinion and attempted to solve the experimental structures of seven mutants, including two mutants (Y95F and Q118N) that are close to the CDR-H3 loop. Unfortunately, although we tried seven different reagent kits with a total of 672 crystallization conditions, we were unable to obtain crystals for these mutants. Although the mutants we successfully solved may not have significantly disrupted the structures of the CDR-H3 loops, they still provided valuable insights into the differences between MSA-based methods and MSA-free methods (such as IgFold) for antibody structure modeling.

      We have further conducted a benchmarking study using two examples, PDBID 5U15 and 5U0R, both consisting of 18 residues in CDR-H3, to evaluate H3-OPT's performance in predicting mutated H3 loops. In the first case (target 5U15), AlphaFold2 failed to provide an accurate prediction of the extended orientation of the H3 loop, resulting in a less accurate prediction (Cα-RMSD = 10.25 Å) compared to H3-OPT (Cα-RMSD = 5.56 Å). In the second case (target 5U0R, a mutant of 5U15 in CDR3 loop), AlphaFold2 and H3-OPT achieved Cα-RMSDs of 6.10 Å and 4.25 Å, respectively. Additionally, the Cα-RMSDs of OmegaFold predictions were 8.05 Å and 9.84 Å, respectively. These findings suggest that both AlphaFold2 and OmegaFold effectively captured the mutation effects on conformations but achieved lower accuracy in predicting long CDR3 loops when compared to H3-OPT.
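      For readers less familiar with the metric quoted above, the following minimal sketch shows how a Cα-RMSD over loop residues can be computed. It assumes the predicted and native structures have already been superposed and that only the Cα coordinates of the CDR-H3 residues are passed in; the function name and the coordinates are hypothetical and do not come from the H3-OPT code.

```python
import math


def ca_rmsd(pred_ca, native_ca):
    """Root-mean-square deviation (Å) between matched Cα coordinates.

    Both inputs are lists of (x, y, z) tuples, one per loop residue,
    assumed to be in the same order and already superposed.
    """
    if not pred_ca or len(pred_ca) != len(native_ca):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (p - n) ** 2
        for pred_atom, native_atom in zip(pred_ca, native_ca)
        for p, n in zip(pred_atom, native_atom)
    )
    return math.sqrt(squared / len(pred_ca))


if __name__ == "__main__":
    # Hypothetical three-residue loop.
    predicted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    native = [(0.0, 1.0, 0.0), (3.8, 1.5, 0.0), (7.6, 0.5, 0.0)]
    print(f"Cα-RMSD = {ca_rmsd(predicted, native):.2f} Å")
```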

      2) The proposed method lacks a confidence score or a warning to help guide the users in moderate to challenging cases.

      We appreciate your suggestions and we have trained a separate module to predict confidence scores. We used the MSE loss for confidence prediction, where the label error was calculated as the Cα deviation of each residue after alignment. The inputs of this module are the same as those used for H3-OPT, and it generates a confidence score ranging from 0 to 100.
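      The training target described above can be sketched as follows. This is only an illustration under stated assumptions: the per-residue label is taken as the Euclidean distance between aligned predicted and native Cα atoms, and the loss is a plain mean squared error between predicted and labelled errors. The function names and numbers are hypothetical, and the mapping from predicted error to the 0 to 100 score is not specified in the text and is therefore not shown.

```python
import math


def per_residue_deviation(pred_ca, native_ca):
    """Label error per residue: Cα distance (Å) after alignment."""
    if len(pred_ca) != len(native_ca):
        raise ValueError("coordinate lists must have equal length")
    return [math.dist(p, n) for p, n in zip(pred_ca, native_ca)]


def mse_loss(predicted_errors, label_errors):
    """Mean squared error between predicted and labelled per-residue errors."""
    if not label_errors or len(predicted_errors) != len(label_errors):
        raise ValueError("error lists must be non-empty and of equal length")
    return sum((p - l) ** 2 for p, l in zip(predicted_errors, label_errors)) / len(label_errors)


if __name__ == "__main__":
    # Hypothetical aligned Cα coordinates for a two-residue stretch.
    labels = per_residue_deviation(
        [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
        [(3.0, 4.0, 0.0), (0.0, 0.0, 1.0)],
    )
    # Hypothetical output of the confidence module for the same residues.
    print(labels, mse_loss([4.0, 1.0], labels))
```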

      3) The fact that AF2 outperforms H3-OPT in some particular cases (e.g. Fig. 2c and Extended Data Fig. 3) raises the question: is there still room for improvements? It is not clear how sensitive H3-OPT is to the defined parameters. In the same line, benchmarking against other available prediction algorithms, such as OmegaFold, could shed light on the actual accuracy limit.

      We totally understand your concern. Many papers have suggested that PLM-based models are computationally efficient but may have unsatisfactory accuracy when high-resolution templates and MSA are available (Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies, Ruffolo, J. A. et al, 2023). However, the accuracy of AF2 decreases substantially when the MSA information is limited. Therefore, we directly retained high-confidence structures of AF2 and introduced a PSPM to improve the accuracy of the targets with long CDR-H3 loops and few sequence homologs. The resulting improvement in mean Cα-RMSD shows that there is still room to improve CDR-H3 loop prediction.

      We also appreciate your kind comment on defined parameters. In fact, once a benchmark dataset is established, determining an optimal cutoff value through parameter searching can indeed further improve the performance of H3-OPT in CDR3 structure prediction. However, it is important to note that this optimal cutoff value heavily depends on the testing dataset being used. Therefore, we provide a recommended cutoff value and offer a program interface for users who wish to manually define the cutoff value based on their specific requirements. Here, we showed the average Cα-RMSDs of our test set under different confidence cutoffs and the results have been added in the text accordingly.
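      The cutoff scan described above can be sketched as follows. The sketch assumes a simplified two-way routing in which the AF2 loop is kept when its confidence reaches the cutoff and the PLM-based prediction is used otherwise; template grafting is ignored. The tuple layout, function names, and values are hypothetical and are not the H3-OPT program interface.

```python
from statistics import mean


def select_rmsd(af2_confidence, af2_rmsd, pspm_rmsd, cutoff):
    """Return the RMSD of the loop that would be output at this cutoff."""
    return af2_rmsd if af2_confidence >= cutoff else pspm_rmsd


def mean_rmsd_by_cutoff(targets, cutoffs):
    """Average Cα-RMSD over all targets for each candidate cutoff.

    `targets` is a list of (af2_confidence, af2_rmsd, pspm_rmsd) tuples.
    """
    return {
        cutoff: mean(select_rmsd(conf, af2, pspm, cutoff) for conf, af2, pspm in targets)
        for cutoff in cutoffs
    }


if __name__ == "__main__":
    # Hypothetical benchmark: (AF2 confidence on a 0-100 scale, AF2 RMSD, PSPM RMSD).
    benchmark = [(92.0, 0.9, 1.4), (84.0, 1.6, 1.5), (71.0, 4.2, 3.1), (65.0, 5.0, 3.8)]
    for cutoff, value in mean_rmsd_by_cutoff(benchmark, [60, 70, 80, 90]).items():
        print(f"cutoff {cutoff}: mean Cα-RMSD {value:.2f} Å")
```

      As noted in the response, the cutoff that minimizes the mean on one benchmark depends on that benchmark, so a scan of this kind is best read as a sensitivity check rather than a way to fix a universal value.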

      Author response table 1.

      We also appreciate your reminder, and we have conducted a benchmark against OmegaFold. The results have been included in the manuscript (Fig 4a-b).

      Author response image 3.

      Reviewer #1 (Recommendations For The Authors):

      1) In Fig 3a, please also compare IgFold and H3-OPT (merge Fig. S2 into Fig 3a)

      In Fig 3b, please separate Sub2 and Sub3, and add IgFold's performance.

      Thank you very much for your professional advice. We have made revisions to the figures based on your suggestions.

      Author response image 4.

      2) For the three experimentally solved structures of anti-VEGF nanobodies, what are the sequence identities of the VH domain and H3 loop, compared to the best available template? What is the length of the H3 loop? Which category (Sub1/2/3) do the targets belong to? What is the performance of AF2 or AF2-Multimer on the three targets?

      We apologize for the confusion. The sequence identities of the VH domain and H3 loop are 0.816 and 0.647, respectively, compared with the best template. The CDR-H3 length of each of these nanobodies is 17. According to our classification strategy, these nanobodies belong to Sub1. The confidence scores of these AlphaFold2-predicted loops were all higher than 0.8, and these loops were accepted as the outputs of H3-OPT by CBM.

      3) Is AF2-Multimer better than AF2, when using the sequences of antibody VH and antigen as input?

      Thanks for your suggestions. Many papers have benchmarked AlphaFold2-Multimer for protein complex modeling and demonstrated that its accuracy in predicting protein complexes is far from satisfactory (Benchmarking AlphaFold for protein complex modeling reveals accuracy determinants, Rui Yin, et al., 2022). Additionally, there is no significant difference between AlphaFold2 and AlphaFold2-Multimer in antibody modeling (Structural Modeling of Nanobodies: A Benchmark of State-of-the-Art Artificial Intelligence Programs, Mario S. Valdés-Tresanco, et al., 2023).

      From the data perspective, we employed a non-redundant dataset for training and validation. Since these structures are valuable, considering the antigen sequence would reduce the size of our dataset, potentially leading to underfitting.

      4) For H3 loop grafting, I noticed that only identical target and template H3 sequences can trigger grafting (lines 348-349). How many such cases are in the test set?

      We appreciate your comment from this perspective. There are thirty targets in our database with identical CDR-H3 templates.

      Reviewer #2 (Recommendations For The Authors):

      • It is not clear to me whether the three structures apparently used as experimental confirmation of the predictions have been determined previously in this study or not. This is a key aspect, as a retrospective validation does not have the same conceptual value as a prospective, a posteriori validation. Please note that different parts of the text suggest different things in this regard "The model was validated by experimentally solving three structures of anti-VEGF nanobodies predicted by H3-OPT" is not exactly the same as "we then sought to validate H3-OPT using three experimentally determined structures of anti-VEGF nanobodies, including a wild-type (WT) and two mutant (Mut1 and Mut2) structures, that were recently deposited in protein data bank". The authors are kindly advised to make this point clear. By the way, "protein data bank" should be in upper case letters.

      We thank you for your feedback and fully understand your concerns. To validate the performance of H3-OPT, we initially solved the structures of both the wild-type and mutants of anti-VEGF nanobodies and submitted these structures to the Protein Data Bank. We have corrected “that were recently deposited in protein data bank” into “that were recently deposited in Protein Data Bank” in our revised manuscript.

      • It would be good to clarify the goal and importance of the binding affinity prediction, as it seems a bit disconnected from the rest of the paper. Also, it would be good to include the production MD runs as Sup, Mat.

      Thanks for your valuable comment. We have added the following sentence in our manuscript to clarify the goal and importance of the molecular dynamics calculations: “Since affinity prediction plays a crucial role in antibody therapeutics engineering, we performed MD simulations to compare the differences in binding affinities between AF2-predicted complexes and H3-OPT-predicted complexes.”. The details of production runs have been described in Method section.

      • Has any statistical test been performed to compare the mean Cα-RMSD values across the modeling approaches included in the benchmark exercise?

      Thanks for this kind recommendation. We conducted a statistical test to assess the performance of different modeling approaches and demonstrated significant improvements with H3-OPT compared to other methods (p<0.001). Additionally, we have trained H3-OPT with five random seeds and compared mean Cα-RMSD values with all five models of AF2. Here, we showed the average Cα-RMSDs of H3-OPT and AlphaFold2.

      Author response table 1.

      • In Fig. 2c-f, I think it would be adequate to make the ordering criterion of the data points explicit in the caption or the graph itself.

      We appreciate your comment and suggestion. We have revised the graph in the manuscript accordingly.

      Author response image 5.

      • Please revise Figure S2 caption and/or its content. It is not clear, in parts b and c, which is the performance of H3-OPT. Why weren't some other antibody-specific tools such as IgFold included in this comparison?

      Thanks for your comments. The performance of H3-OPT is not included in Figure S2. Prior to training H3-OPT, we conducted several preliminary studies, and the detailed results are available in the supplementary sections. We showed that AlphaFold2 outperformed other methods (including AI-based methods and TBM methods) and produced sub-angstrom predictions in framework regions. The comparison of IgFold with other methods was discussed in a previous work (Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies, Ruffolo, J. A. et al, 2023). In that study, IgFold largely yielded results comparable to AlphaFold2 but at a lower prediction cost. Additionally, we have conducted a detailed comparison of CDR-H3 loops with IgFold in our main text.

      • It is stated that "The relative binding affinities of the antigen-antibody complexes were evaluated using the Python script...". Which Python script?

      Thank you for your comments, and I apologize for the confusion. This Python script is a module of the AMBER software. We have corrected “The relative binding affinities of the antigen-antibody complexes were evaluated using the python script” into “The relative binding affinities of the antigen-antibody complexes were evaluated using the MMPBSA module of AMBER software”.

      Reviewer #3 (Recommendations For The Authors):

      Does H3-OPT improve the AF2 score on the CDR-H3? It would be interesting to see whether grafted and PSPM loops improve the pLDDT score by using for example AF2Rank [https://doi.org/10.1103/PhysRevLett.129.238101]. That could also be a way to include a confidence score into H3-OPT.

      We are grateful for your question. The previous version of H3-OPT could not provide a confidence score for its output, so we did not know whether H3-OPT improved the AF2 score.

      We appreciate your kind recommendations and have calculated the pLDDT scores of all models predicted by H3-OPT and AF2 using AF2Rank. We showed that the average of pLDDT scores of different predicted models did not match the results of Cα-RMSD values.

      Author response table 3.

      Therefore, we have trained a separate module to predict the confidence score of the optimized CDR-H3 loops. We hope that this module can provide users with reliable guidance on whether to use predicted CDR-H3 loops.

      The test case of Nb PDB id. 8CWU is an interesting example where AF2 outperforms H3-OPT and PLMs. The top AF2 model according to ColabFold (using default options and no template [https://doi.org/10.1038/s41592-022-01488-1]) shows a remarkably good model of the CDR-H3, explaining the low Ca-RMSD in the Extended Data Fig. 3. However, the pLDDT score of the 4 tip residues (out of 12), forming the hairpin of the CDR-H3 loop, pushes down the average value below the CBM cut-off of 80. I wonder if there is a lesson to learn from that test case. How sensitive is H3-OPT to the CBM cut-off definition? Have the authors tried weighting the residue pLDDT score by some structural criteria before averaging? I guess AF2 may have less confidence in hydrophobic tip residues in exposed loops as the solvent context may not provide enough support for the pLDDT score.

      Thanks for your valuable feedback. We showed the average Cα-RMSDs of our test set under different confidence cutoffs and the results have been added in the text accordingly.

      Author response table 4.

      We greatly appreciate your comment on this point. Inspired by your suggestions, we will explore the relationship between cutoff values and structural information in related work. Your feedback is highly valuable and will contribute to the development of our approach.

      A comparison against the new folding prediction method OmegaFold [https://doi.org/10.1101/2022.07.21.500999] is missing. OmegaFold seems to outperform AF2, ESM, and IgFold among others in predicting the CDR-H3 loop conformation (See [https://doi.org/10.3390/molecules28103991] and [https://doi.org/10.1101/2022.07.21.500999]). Indeed, prediction of anti-VEGF Nb structure (PDB WT_QF_0329, chain B in supplementary data) by OmegaFold as implemented in ColabFold [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/beta/omegafold.ipynb] and setting 10 cycles, renders Ca-RMSD 1.472 Å for CDR-H3 (residues 98-115).

      We appreciate your valuable suggestion. We have added the comparison against OmegaFold in our manuscript. The results have been included in the manuscript (Fig 4a-b).

      Author response image 6.

      In our test set, OmegaFold outperformed ESMFold in predicting the CDR-H3 loop conformation. However, it failed to match the accuracy of AF2, IgFold, and H3-OPT. We discussed the difference between MSA-based methods (such as AlphaFold2) and MSA-free methods (such as IgFold) in predicting CDR-H3 loops. Similarly, OmegaFold provided results comparable to those of HelixFold-Single and other MSA-free methods but still failed to match the accuracy of AlphaFold2 and H3-OPT on Sub1.

      The time-consuming step in H3-OPT is the AF2 prediction. However, most of the time is spent in modeling the mAb and Nb scaffolds, which are already very well predicted by PLMs (See Fig. 4 in [https://doi.org/10.3390/molecules28103991]). Hence, why not use e.g. OmegaFold as the first step, whose score also correlates to the RMSD values [https://doi.org/10.3390/molecules28103991]? If that fails, then use AF2 or grafting. Alternatively, use a PLM model to generate a template, remove/mask the CDR loops (at least CDR-H3), and pass it as a template to AF2 to optimize the structure with or without MSA (e.g. using AF2Rank).

      Thanks for your professional feedback. It is true that the speed of MSA searching limits the application of high-throughput structure prediction. Previous studies have demonstrated that deep learning methods perform well on framework residues. We once tried to directly predict the conformations of CDR-H3 loops using PLM-based methods, but this initial version of H3-OPT, which lacked the CBM, could not replicate the accuracy of AF2 in Sub1. Similarly, we showed that IgFold and OmegaFold also provide lower accuracy in Sub1 (average Cα-RMSDs of 1.71 Å and 1.83 Å, respectively, whereas AF2 achieved an average of 1.07 Å). Therefore, the predictions of AlphaFold2 not only produce scaffolds but also provide the highest quality of CDR-H3 loops when high-resolution templates and MSA are available.

      Thank you once again for your kind recommendation. In the current version of H3-OPT, we have highlighted the strengths of H3-OPT in combining the AF2 and PLM models in various scenarios. AF2 can provide accurate predictions for short loops with fewer than 10 amino acids, and PLM-based models show little or no improvement in such cases. In the next version of H3-OPT, as the first step, we plan to replace the AF2 models with other methods if any accurate MSA-free method becomes available in the future.

      Line 115: The statement "IgFold provided higher accuracy in Sub3" is not supported by Fig. 2a.

      We are sorry for our carelessness. We have corrected “IgFold provided higher accuracy in Sub3” into “IgFold provided higher accuracy in Sub3 (Fig. 3a)”.

      Lines 195-203: What is the statistical significance of results in Fig 5a and 5b?

      Thank you for your kind comments. The number of surface residues in AF2 models is significantly higher than in H3-OPT models (p < 0.005). In Fig. 5b, H3-OPT models predicted lower values than AF2 models in terms of various surface properties, including polarity (p < 0.05) and hydrophilicity (p < 0.001).

      Lines 212-213: It is not easy to compare and quantify the differences between electrostatic maps in Fig. 5d. Showing a Dmap (e.g. mapmodel - mapexperiment) would be a better option. Additionally, there is no methodological description of how the maps were generated nor the scale of the represented potential.

      Thank you for pointing this out. We have modified the figure (Fig. 5d) according to your kind recommendation and added following sentences to clarify the methodological description on the surface electrostatic potential:

      “Analysis of surface electrostatic potential

      We generated two-dimensional projections of CDR-H3 loop’s surface electrostatic potential using SURFMAP v2.0.0 (based on GitHub from February 2023: commit: e0d51a10debc96775468912ccd8de01e239d1900) with default parameters. The 2D surface maps were calculated by subtracting the surface projection of H3-OPT or AF2 predicted H3 loops to their native structures.”
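      The subtraction step described above amounts to an element-wise difference between two 2D grids. The sketch below assumes both projections have already been exported as equally sized grids of potential values and uses the model-minus-native direction suggested by the reviewer; it does not call SURFMAP, and the function names and values are hypothetical.

```python
def difference_map(model_map, native_map):
    """Element-wise difference (model minus native) of two 2D grids."""
    if len(model_map) != len(native_map) or any(
        len(m_row) != len(n_row) for m_row, n_row in zip(model_map, native_map)
    ):
        raise ValueError("maps must have identical dimensions")
    return [
        [m - n for m, n in zip(m_row, n_row)]
        for m_row, n_row in zip(model_map, native_map)
    ]


def mean_abs_difference(diff_map):
    """Single-number summary of how far a model map departs from the native map."""
    cells = [abs(value) for row in diff_map for value in row]
    return sum(cells) / len(cells)


if __name__ == "__main__":
    # Hypothetical 2x2 projections of surface electrostatic potential.
    model = [[1.0, 2.0], [3.0, 4.0]]
    native = [[0.5, 2.0], [1.0, 5.0]]
    dmap = difference_map(model, native)
    print(dmap, mean_abs_difference(dmap))
```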

      Author response image 7.

      Lines 237-240 and Table 2: What is the meaning of comparing the average free energy of the whole set? Why free energies should be comparable among test cases? I think the correct way is to compare the mean pair-to-pair difference to the experimental structure. Similarly, reporting a precision in the order of 0.01 kcal/mol seems too precise for the used methodology, what is the statistical significance of the results? Were sampling issues accounted for by performing replicates or longer MDs?

      Thanks for your rigorous advice and for pointing out these issues. We have modified the comparison of free energies across the prediction methods and corrected the precision of these results. The average binding free energy of H3-OPT complexes is lower than that of AF2-predicted complexes, but the difference is not significant (p > 0.05).

      Author response table 4.

      Comparison of binding affinities obtained from MD simulations using AF2 and H3-OPT.

      Thanks for your comments on this perspective. Longer MD simulations often achieve better convergence for the average behavior of the system, while replicates provide insights into the variability and robustness of the results. In our manuscript, each MD simulation had a length of 100 nanoseconds, with the initial 90 nanoseconds dedicated to achieving system equilibrium, which was verified by monitoring RMSD (Root Mean Square Deviation). The remaining 10 nanoseconds of each simulation were used for the calculation of free energy. This approach allowed us to balance the need for extensive sampling with the verification of system stability.

      Regarding MD simulations for CDR-H3 refinement, its successful application highly depends on the starting conformation, the force field, and the sampling strategy [https://doi.org/10.1021/acs.jctc.1c00341]. In particular, the applied plain MD seems a very limited strategy (there is not much information about the simulated times in the supplementary material). Similarly, local structure optimizations with QM methods are not expected to improve a starting conformation that is far from the experimental conformation.

      Thank you very much for your valuable feedback. We fully agree with your insights regarding the limitations of MD simulations. Before training H3-OPT, we showed the challenge of accurately predicting CDR-H3 structures. We then tried to optimize the CDR-H3 loops with computational tools, such as MD simulations and QM methods (detailed information on the MD simulations is provided in the main text). Unfortunately, these methods did not improve the accuracy of AF2-predicted CDR-H3 loops. These results showed that MD simulations and QM methods are not only time-consuming but also failed to optimize the CDR-H3 loops. Therefore, we developed H3-OPT to tackle these issues and improve the accuracy of CDR-H3 prediction for the development of antibody therapeutics.

      Text improvements

      Relevant statistical and methodological parameters are presented in a dispersed manner throughout the text. For example, the number of structures in test, training, and validation datasets is first presented in the caption of Fig. 4. Similarly, the sequence identity % to define redundancy is defined in the caption of Fig. 1a instead of lines 87-88, where authors define "we constructed a non-redundant dataset with 1286 high-resolution (<2.5 Å)". Is the sequence redundancy for the CDR-H3 or the whole mAb/Nb?

      Thank you for pointing out these issues. We have added the number of structures in each subgroup in the caption of Fig. 1a: “Clustering of the filtered, high-resolution structures yielded three datasets for training (n = 1021), validation (n = 134), and testing (n = 131).” and corrected “As data quality has large effects on prediction accuracy, we constructed a non-redundant dataset with 1286 high-resolution (<2.5 Å) antibody structures from SAbDab” into “As data quality has large effects on prediction accuracy, we constructed a non-redundant dataset (sequence identity < 0.8) with 1286 high-resolution (<2.5 Å) antibody structures from SAbDab” in the revised manuscript. The sequence redundancy applies to the whole mAb/Nb.

      The description of ablation studies is not easy to follow. For example, what does removing TGM mean in practical terms (e.g. only AF2 is used, or PSPM is applied if AF2 score < 80)? Similarly, what does removing CBM mean in practical terms (e.g. all AF2 models are optimized by PSPM, and no grafting is done)?

      Thanks for your comments and suggestions. We have corrected “d, Differences in H3-OPT accuracy without the template module. e, Differences in H3-OPT accuracy without the CBM. f, Differences in H3-OPT accuracy without the TGM.” into “d, Differences in H3-OPT accuracy without the template module. This ablation study means only PSPM is used. e, Differences in H3-OPT accuracy without the CBM. This ablation study means input loop is optimized by TGM and PSPM. f, Differences in H3-OPT accuracy without the TGM. This ablation study means input loop is optimized by CBM and PSPM.”.

      Authors should report the values in the text using the same statistical descriptor that is used in the figures to help the analysis by the reader. For example, in lines 223-224 a precision score of 0.75 for H3-OPT is reported in the text (I assume this is the average value), while the median of ~0.85 is shown in Fig. 6a.

      Thank you for your careful checks. We have corrected “After identifying the contact residues of antigens by H3-OPT, we found that H3-OPT could substantially outperform AF2 (Fig. 6a), with a precision of 0.75 and accuracy of 0.94 compared to 0.66 precision and 0.92 accuracy of AF2.” into “After identifying the contact residues of antigens by H3-OPT, we found that H3-OPT could substantially outperform AF2 (Fig. 6a), with a median precision of 0.83 and accuracy of 0.97 compared to 0.64 precision and 0.95 accuracy of AF2.” in the appropriate place in the manuscript.

      Minor corrections

      Lines 91-94: What do length values mean? e.g. is 0-2 Å the RMSD from the experimental structure?

      We appreciate your comment and apologize for any confusion. The RMSD value is indeed measured from the experimental structure. It evaluates the deviation of the predicted CDR-H3 loop from the native structure and also represents the degree of prediction difficulty in AlphaFold2 predictions. We have added the following sentence in the appropriate place in the revised manuscript: “(RMSD, a measure of the difference between the predicted structure and an experimental or reference structure)”.

      Line 120: is the "AF2 confidence score" for the full-length or CDR-H3?

      We greatly appreciate your valuable comment and have corrected “Interestingly, we observed that AF2 confidence score shared a strong negative correlation with Cα-RMSDs (Pearson correlation coefficient =-0.67 (Fig. 2b)” into “Interestingly, we observed that AF2 confidence score of CDR-H3 shared a strong negative correlation with Cα-RMSDs (Pearson correlation coefficient =-0.67 (Fig. 2b)” in the revised manuscript.

      Line 166: Do authors mean "Taken" instead of "Token"?

      We are really sorry for our careless mistakes. Thank you for your reminder.

      Line 258: Reference to Fig. 1 seems wrong, do authors mean Fig. 4?

      We sincerely thank the reviewer for careful reading. As suggested by the reviewer, we have corrected the “Fig. 1” into “Fig. 4”.

      Author response image 7.

      Point out which plot corresponds to AF2 and which one to H3-OPT

      Thanks for pointing out this issue. We have added the legends of this figure in the proper positions in our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Building upon their famous tool for the deconvolution of human transcriptomics data (EPIC), Gabriel et al. implemented a new methodology for the quantification of the cellular composition of samples profiled with Assay for Transposase-Accessible Chromatin sequencing (ATAC-Seq). To build a signature for ATAC-seq deconvolution, they first created a compendium of ATAC-seq data and derived chromatin accessibility marker peaks and reference profiles for 21 cell types, encompassing immune cells, endothelial cells, and fibroblasts. They then coupled this novel signature with the EPIC deconvolution framework based on constrained least-square regression to derive a dedicated tool called EPIC-ATAC. The method was then assessed using real and pseudo-bulk RNA-seq data from human peripheral blood mononuclear cells (PBMC) and, finally, applied to ATAC-seq data from breast cancer tumors to show it accurately quantifies their immune contexture.

      Strengths:

      Overall, the work is of very high quality. The proposed tool is timely; its implementation, characterization, and validation are based on rigorous methodologies and resulted in robust results. The newly-generated, validation data and the code are publicly available and well-documented. Therefore, I believe this work and the associated resources will greatly benefit the scientific community.

      Weaknesses:

      A few aspects can be improved to clarify the value and applicability of the EPIC-ATAC and the transparency of the benchmarking analysis.

      (1) Most of the validation results in the main text assess the methods on all cell types together, by showing the correlation, RMSE, and scatterplots of the estimated vs. true cell fractions. This approach is valuable for showing the overall method performance and for detecting systematic biases and noisy estimates. However, it provides very limited insights regarding the capability of the methods to estimate the individual cell types, which is the ultimate aim of deconvolution analysis. This limitation is exacerbated for rare cell types, which could even have a negative correlation with the ground truth fractions, but not weigh much on the overall RMSE and correlation. I would suggest integrating into the main text and figures an in-depth assessment of the individual cell types. In particular, it should be shown and discussed which cell types can be accurately quantified and which ones are less reliable.

      We thank the reviewer for raising this important point. Discussing the accuracy of EPIC-ATAC in predicting individual cell-type proportions would indeed be valuable in the main text. We have updated the text as follows.

      In the first version of our manuscript, we had a section called “T cell subtypes quantification reveals the ATAC-Seq deconvolution limits for closely related cell types” which highlighted that EPIC-ATAC shows low performances when predicting the proportions of cell types that are closely related, e.g., CD4+ T cell or CD8+ T cell subtypes. The section is now named “Accuracy of ATAC-Seq deconvolution is determined by the abundance and specificity of each cell type” and has been expanded to discuss the accuracy of EPIC-ATAC predictions within each major cell type.

      To do so, we represented in Figure 5A the performances of EPIC-ATAC in each cell type present in the benchmarking datasets from Figures 3 and 4. Additionally, we have kept in the supplementary figures the details of the correlation values and RMSE values within each cell type and for each tool (Supplementary Figures 9 and 10). The following text has been added in the main text to describe these analyses:

      “Accuracy of ATAC-Seq deconvolution is determined by the abundance and specificity of each cell type

      To investigate the impact of cell type abundance on the accuracy of ATAC-Seq deconvolution, we evaluated EPIC-ATAC predictions in each major cell type separately in the different benchmarking datasets (Figure 5A). NK cells, endothelial cells, neutrophils or dendritic cells showed lower correlation values. These values can be explained by the fact that these cell types are low-abundant in our benchmarking datasets (Figure 5A). For the endothelial cells and dendritic cells, the RMSE values associated to these cell types remain low. This suggests that while the predictions of EPIC-ATAC might not be precise enough to compare these cell-type proportions between different samples, the cell-type quantification within each sample is reliable. For the NK cells and the neutrophils, we observed more variability with higher RMSE values in some datasets which suggests that the markers and profiles for these cell types might be improved. Supplementary Figures 9 and 10 detail the performances of each tool when considering each cell type separately in the PBMC and the cancer datasets. As for EPIC-ATAC, the predictions from the other deconvolution tools are more reliable for the frequent cell types.”
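      As a rough illustration of the per-cell-type evaluation described above (not the code used for the EPIC-ATAC benchmark), the sketch below computes a Pearson correlation and an RMSE for each cell type separately. The function names and the toy fractions are hypothetical. The example is chosen so that a low-abundance cell type shows a poor correlation while its RMSE stays small.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return float("nan")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def rmse(x, y):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def per_cell_type_metrics(true_fractions, predicted_fractions):
    """Both inputs map a cell type to its fractions across the same ordered samples."""
    return {
        cell_type: {
            "pearson": pearson(true_fractions[cell_type], predicted_fractions[cell_type]),
            "rmse": rmse(true_fractions[cell_type], predicted_fractions[cell_type]),
        }
        for cell_type in true_fractions
    }


# Toy, made-up fractions across three samples (illustration only).
true_fractions = {
    "B cells": [0.10, 0.20, 0.30],
    "Dendritic cells": [0.01, 0.02, 0.01],
}
predicted_fractions = {
    "B cells": [0.12, 0.18, 0.33],
    "Dendritic cells": [0.02, 0.01, 0.02],
}
metrics = per_cell_type_metrics(true_fractions, predicted_fractions)
```

      In this toy case the dendritic cells have a negative correlation but an RMSE of about 0.01, which mirrors the situation described for rare cell types.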

      (2) In the benchmarking analysis, EPIC-ATAC is compared to several deconvolution methods, most of which were originally developed for transcriptomics data. This comparison is not completely fair unless their peculiarities and the limitations of tweaking them to work with ATAC-seq data are discussed. For instance, some methods (including the original EPIC) correct for cell-type-specific mRNA bias, which is not present in ATAC-seq data and might, thus, result in systematic errors.

      We thank the reviewer for this comment and have updated the results and methods sections as follows:

      We provide in the Materials and methods section, the paragraph “Benchmarking of the EPIC-ATAC framework against other existing deconvolution tools” which describes how each tool included in the benchmark was used in the ATAC-Seq context. We have added a reference to this section in the main text when introducing the first benchmarking analysis.

      For each tool, the main changes consisted in: (i) replacing the initial RNA-Seq profiles and markers by the EPIC-ATAC reference profiles and markers and (ii) providing as input a bulk ATAC-Seq dataset with matched ATAC-Seq features (the same approach as the one used in EPIC-ATAC was considered, see answer to the next comment). Having reference profiles/markers and an ATAC-Seq bulk query with matched features was the only requirement of the different deconvolution models to be able to run on ATAC-Seq data with the default methods parameters, except for quanTIseq. Indeed, this method, like EPIC, corrects its estimations for cell-type-specific mRNA content bias. We have disabled this option for the bulk ATAC-Seq deconvolution.

      We can however not exclude that a hyper parametrization of each tool could have helped to improve their current performances. Also, for RNA-Seq data deconvolution, some of the methods followed specific features filtering, e.g., the quanTIseq framework removes a manually curated list of noisy genes as well as aberrant immune genes identified in the TCGA data and ABIS uses immune-specific housekeeping genes. We can hypothesize that additional filtering could be explored for the ATAC-Seq deconvolution to improve the performance of the tools.

      We have clarified these points in the results section when introducing the benchmarking, in the methods and in the discussion section.

      (3) On a similar note, it could be made more explicit which adaptations were introduced in EPIC, besides the ad-hoc ATAC-seq signature, to make it applicable to this type of data.

      In the first version of the manuscript, we described the changes brought to EPIC to perform bulk ATAC-Seq deconvolution in the Material and methods section in the paragraph “Running EPIC-ATAC on bulk ATAC-Seq data”.  We have moved and completed this paragraph in the results section before the description of the evaluation of EPIC-ATAC in different datasets. The paragraph is the following:

      “EPIC-ATAC integrates the marker peaks and profiles into EPIC to perform bulk ATAC-Seq data deconvolution

      The cell-type specific marker peaks and profiles derived from the reference samples were integrated into the EPIC deconvolution tool (Racle et al., 2017; Racle and Gfeller, 2020). We will refer to this ATAC-Seq deconvolution framework as EPIC-ATAC. To ensure the compatibility of any input bulk ATAC-Seq dataset with the EPIC-ATAC marker peaks and reference profiles, we provide an option to lift over hg19 datasets to hg38 (using the liftOver R package) as the reference profiles are based on the hg38 reference genome. Subsequently, the features of the input bulk matrix are matched to our reference profiles’ features. To match both sets of features, we determine for each peak of the input bulk matrix the distance to the nearest peak in the reference profiles peaks. Overlapping regions are retained and the feature IDs are matched to their associated nearest peaks. If multiple features are matched to the same reference peak, the counts are summed. Before the estimation of the cell-type proportions, we transform the data following an approach similar to the transcripts per million (TPM) transformation which has been shown to be appropriate to estimate cell fractions from bulk mixtures in RNA-Seq data (Racle et al., 2017; Sturm et al., 2019). We normalize the ATAC-Seq counts by dividing counts by the peak lengths as well as samples depth and rescaling counts so that the counts of each sample sum to 10^6. In RNA-Seq based deconvolution, EPIC uses an estimation of the amount of mRNA in each reference cell type to derive cell proportions while correcting for cell-type-specific mRNA bias. For the ATAC-Seq based deconvolution these values were set to 1 to give similar weights to all cell-types quantifications. Indeed ATAC-Seq measures signal at the DNA level, hence the quantity of DNA within each reference cell type is similar.”
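      The count-summing and TPM-like normalization steps in the paragraph above can be sketched as follows. This is a minimal illustration under stated assumptions, not the EPIC-ATAC source code: the function names are hypothetical, peaks are assumed to have already been matched to reference peak IDs, and the division by sequencing depth is absorbed into the final rescaling to one million.

```python
def collapse_to_reference(matched_counts):
    """Sum counts of input features that were matched to the same reference peak.

    matched_counts: iterable of (reference_peak_id, count) pairs (hypothetical format).
    """
    totals = {}
    for reference_peak_id, count in matched_counts:
        totals[reference_peak_id] = totals.get(reference_peak_id, 0) + count
    return totals


def tpm_like_normalize(counts, peak_lengths, scale=1e6):
    """Divide each count by its peak length, then rescale the sample to sum to `scale`."""
    if len(counts) != len(peak_lengths):
        raise ValueError("counts and peak_lengths must have the same length")
    rates = [count / length for count, length in zip(counts, peak_lengths)]
    total = sum(rates)
    if total == 0:
        raise ValueError("sample has no counts in the selected peaks")
    return [scale * rate / total for rate in rates]


# Toy example: two peaks of 100 bp and 400 bp with 10 and 20 reads.
normalized = tpm_like_normalize([10, 20], [100, 400])
```

      With these toy values the shorter peak ends up with twice the normalized signal of the longer one, even though it has fewer raw reads, because the length correction is applied before rescaling.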

      (4) Given that the final applicability of EPIC-ATAC is on real bulk RNA-seq data, whose characteristics might not be completely recapitulated by pseudo-bulk samples, it would be interesting to see EPIC and EPIC-ATAC compared on a dataset with matched, real bulk RNA-seq and ATAC-seq, respectively. It would nicely complement the analysis of Figure 7 and could be used to dissect the commonalities and peculiarities of these two approaches.

      We thank the reviewer for raising this important point. EPIC-ATAC will be applied to real bulk ATAC-Seq data and pseudobulk data cannot indeed fully recapitulate the bulk signals. Recently, a dataset composed of more than 100 samples with matched bulk RNA-Seq, bulk ATAC-Seq as well as matched flow cytometry data has been published by Morandini and colleagues in GeroScience in November 2023. We thus retrieved these data to compare the predictions obtained by EPIC-ATAC on the bulk ATAC-Seq data and the predictions of the original version of EPIC on the bulk RNA-Seq data to the cell-type quantification obtained by flow cytometry. We also assessed whether both modalities could be complementary using a simple approach averaging the predictions obtained from both modalities. The results of these analyses have been summarized in Figure 7C and are described in the main text in the last paragraph of the paper:

      “We compared the predictions obtained using each modality to the flow cytometry cell-type quantifications. EPIC-ATAC predictions were better correlated with the flow cytometry measures for some cell types (e.g., CD8+, CD4+ T cells, NK cells) while this trend was observed with the EPIC-RNA predictions in other cell types (B cells, neutrophils, monocytes) (Figure 7C). We then tested whether the predictions obtained from both modalities could be combined to improve the accuracy of each cell-type quantification. Averaging the predictions obtained from both modalities shows a moderate improvement (Figure 7C), suggesting that the two modalities can complement each other.”
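      The simple averaging of the two modalities can be sketched as below. This is an illustrative assumption about how such an average might be computed (per sample, over the cell types predicted by both tools); the function name and the toy values are hypothetical.

```python
def average_modalities(atac_predictions, rna_predictions):
    """Average the predicted fractions of one sample over cell types present in both inputs."""
    shared = atac_predictions.keys() & rna_predictions.keys()
    return {
        cell_type: (atac_predictions[cell_type] + rna_predictions[cell_type]) / 2
        for cell_type in sorted(shared)
    }


# Toy predictions for a single sample (made-up values).
combined = average_modalities(
    {"B cells": 0.2, "NK cells": 0.1},
    {"B cells": 0.4, "Monocytes": 0.3},
)
```

      Cell types predicted by only one modality are dropped here; another reasonable choice would be to keep them unchanged.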

      Reviewer #2 (Public Review):

      Summary:

      The manuscript expands the current bulk sequencing data deconvolution toolkit to include ATAC-seq. The EPIC-ATAC tool successfully predicts accurate proportions of immune cells in bulk tumour samples and EPIC-ATAC seems to perform well in benchmarking analyses. The authors achieve their aim of developing a new bulk ATAC-seq deconvolution tool.

      Strengths:

      The manuscript describes simple and understandable experiments to demonstrate the accuracy of EPIC-ATAC. They have also been incredibly thorough with their reference dataset collections. The authors have been robust in their benchmarking endeavours and measured EPIC-ATAC against multiple datasets and tools.

      Weaknesses:

      Currently, the tool has a narrow applicability in that it estimates the percentage of immune cells in a bulk ATAC-seq experiment.

      Comments:

      (1) Has any benchmarking been done on the runtime of the tool? Although EPIC-ATAC seems to "win" in benchmarking metrics, sometimes the differences are quite small. If EPIC-ATAC takes forever to run, compared to another tool that is a lot quicker, might some people prefer to sacrifice 0.01 in correlation for a quicker running tool?

      We thank the reviewer for raising this point that was not addressed in the manuscript. We have added a supplementary figure (Supplementary Figure 8) which represents the CPU time used by each tool. The figure shows that all the tools could be run in less than 20 seconds on average. This figure has been mentioned at the end of the benchmarking paragraphs.

      (2) In Figure 3B the data points look a bit squashed in the bottom-left corner. Could the plot be replotted with the data point spread out? There also seems to be some inter-patient variability. Could the authors comment on that?

      We have updated Figure 3B to increase the visibility of the dots in the bottom-left corner. To do so, we have limited the x and y axes to the maximum of the predicted proportions for the y axis and true proportions for the x axis.

      We also acknowledge that the accuracy of the predictions varies across samples. In particular, one sample (Sample4, star shape on Figure 3B) exhibits larger discrepancies between EPIC-ATAC predictions and the ground truth. To understand the lower performance, we have visualized our marker peaks in the five PBMC samples (Figure below). Based on this visualization, we can see that Sample4 might be an outlier sample considering that its cellular composition is similar to that of Sample2 and Sample5, however this sample shows particularly high ATAC-Seq accessibility at the monocytes and dendritic markers. This can explain why EPIC-ATAC overestimates the proportions of the two populations in this case. We have added the previously mentioned figures as a Supplementary Figure (Supplementary Figure 2) and have described it in the results section in the paragraph “EPIC-ATAC accurately estimates immune cell fractions in PBMC ATAC-Seq samples”.

      (3) Could the authors comment on the possibility of expanding EPIC-ATAC into more than a percentage prediction tool? Perhaps EPIC-ATAC could remove the immune cell signal from the bulk ATAC-seq data to "purify" the uncharacterised cells in silico, or generate pseudo-ATAC-seq tracks of the identified cell types.

      We thank the reviewer for this interesting question. As suggested by the reviewer, one approach to purify bulk genomics data using the cell-type proportions estimated by a cell-type deconvolution tool is to subtract the weighted sum of the signal observed in the reference data, weights corresponding to the predicted proportions. We used this approach on the EPIC-ATAC predictions obtained from pseudobulks built from scATAC-Seq data from diverse cancer types coming from the Human Tumor Atlas Network (HTAN) (see also the answer to the first recommendation of Reviewer 1). This dataset allows us to compare for a relatively large number of samples (a maximum of 25 samples in a cancer type cohort) the purified signal to the true signal derived from the single-cell data. The results are presented in the figure below which shows that the correlations between the predicted and true signals are relatively good in most of the cancer types (blue boxplots). However, these correlation levels are lower than the ones obtained when comparing the signal obtained from the entire pseudobulk (red boxplots) with the true signal. This suggests that this purification approach leads to a signal that is less precise and accurate than the signal resulting from the full cell mixture.

      Author response image 1.

      Boxplots of the correlation values obtained from the comparison of the bulk signal and the ground truth signal from the uncharacterized cells in each sample (red) and from the comparison of the predicted signal and the ground truth signal from the uncharacterized cells in each sample (blue).

      Also, note that in our simple approach, negative values can be obtained. The predicted signal will thus be difficult to interpret and to use in downstream analyses. Methods claiming to perform purification of bulk samples use more complex and dedicated algorithms. For example, Symphony (Burdziak et al., 2019) (cited in our introduction) uses single-cell RNA-Seq data in addition to the bulk chromatin accessibility data to infer cluster-specific accessibility profiles. Considering that EPIC was not designed for purification purposes, we decided not to include this analysis in the updated version of the manuscript.
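      The subtraction-based purification discussed above can be sketched as follows. This is a minimal illustration, assuming the bulk signal and the reference profiles are on the same scale and share the same ordered peaks; the function name and the toy values are hypothetical, and any further rescaling of the residual is left out.

```python
def purify_bulk(bulk_signal, reference_profiles, predicted_proportions):
    """Subtract the proportion-weighted reference profiles from a bulk signal.

    bulk_signal: list of signal values, one per peak.
    reference_profiles: dict mapping cell type to a list of values over the same peaks.
    predicted_proportions: dict mapping cell type to its predicted fraction.
    """
    residual = list(bulk_signal)
    for cell_type, fraction in predicted_proportions.items():
        profile = reference_profiles[cell_type]
        for i, value in enumerate(profile):
            residual[i] -= fraction * value
    return residual


# Toy example over two peaks (made-up values).
residual = purify_bulk(
    [10.0, 4.0],
    {"T cells": [20.0, 0.0], "B cells": [0.0, 10.0]},
    {"T cells": 0.25, "B cells": 0.5},
)
```

      The second peak of this toy residual is negative, which illustrates why the purified signal can be hard to interpret in downstream analyses.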

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The original EPIC had two different signatures for application to blood or tumor RNA-seq. It is not clear instead if EPIC-ATAC applies with the same signature and framework to any tissue and disease context. This aspect should be clarified in the text.

      We thank the reviewer for raising this point which was not clear in the previous version of the manuscript. As in the original version of EPIC, in EPIC-ATAC two reference profiles and sets of markers are available, the PBMC reference and the TME reference. We used the PBMC reference profiles and markers to deconvolve the PBMC samples and the TME reference profiles and markers to deconvolve the cancer samples. We have clarified this point in the result section of the main text in the paragraph “ATAC-Seq data from sorted cell populations reveal cell-type specific marker peaks and reference profiles” as follows (added text underlined):

      “The resulting marker peaks specific only to the immune cell types were considered for the deconvolution of PBMC samples (PBMC markers). For the deconvolution of tumor bulk samples, the lists of marker peaks specific to fibroblasts and endothelial cells were added to the PBMC markers. This extended set of markers was further refined based on the correlation patterns of the markers in tumor bulk samples from the diverse solid cancer types from The Cancer Genome Atlas (TCGA) (Corces et al., 2018), i.e., markers exhibiting the highest correlation patterns in the tumor bulk samples were selected using the findCorrelation function from the caret R package (Kuhn, 2008) (Figure 1, box 4, see the Material and methods, section 2). The latter filtering ensures the relevance of the markers in the TME context since cell-type specific TME markers are expected to be correlated in tumor bulk ATAC-Seq measurements (Qiu et al., 2021). 716 markers of immune, fibroblasts and endothelial cell types remained after the last filtering (defined as TME markers). Considering the difference in cell types and the different filtering steps applied on the PBMC and TME markers, we recommend to use the TME markers and profiles to deconvolve bulk samples from tumor samples and the PBMC markers and profiles to deconvolve PBMC samples.”

      We also note that when running EPIC-ATAC using the PBMC markers and the TME markers independently to perform the deconvolution of the cancer datasets, we see that overall the use of the TME markers leads to a better performance (Figure below).

      Figure legend: Correlation and RMSE values obtained when running EPIC-ATAC on each cancer dataset (points) using the PBMC (red) and the TME (blue) markers.

      To demonstrate that the TME markers can be applied to different cancer types, we have completed the evaluation of EPIC-ATAC on tumor samples by considering an additional dataset: the Human Tumor Atlas Network (HTAN) single-cell multiomic (scRNA-Seq and scATAC-Seq) dataset. We have processed this dataset and built scATAC-Seq pseudobulks for 7 cancer types to which EPIC-ATAC was applied. This analysis has been summarized in Figure 4 and Supplementary Figure 4 and shows that EPIC-ATAC is applicable in a diverse set of tissues.

      (2) EPIC and EPIC-ATAC have a valuable feature, which is absent from most deconvolution methods: the estimation of unknown content. It would be informative for the users to understand from the benchmarking analysis whether this feature gives an advantage to EPIC-ATAC with respect to the other approaches.

      Indeed, among the tools that we included in our benchmarking analysis, only EPIC-ATAC and quanTIseq enable users to predict the proportions of cells that are not present in the reference profiles, i.e., the uncharacterized cells. For the other tools we thus fixed the estimated proportions of uncharacterized cells to 0. This approach provides a clear and significant advantage to EPIC-ATAC and to quanTIseq. For this reason, we also provide a version of the benchmarking in which we exclude the uncharacterized cells and rescale the true and estimated cell-type proportions to sum to 1. In this second benchmarking approach, EPIC-ATAC still outperforms some of the other deconvolution tools.
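      The second benchmarking variant, in which uncharacterized cells are excluded and the remaining proportions are rescaled to sum to 1, can be sketched as below. The function name and the `"uncharacterized"` key are hypothetical; the same rescaling would be applied to both the true and the estimated proportions.

```python
def rescale_without_uncharacterized(proportions, uncharacterized_key="uncharacterized"):
    """Drop the uncharacterized fraction and rescale the rest to sum to 1."""
    kept = {
        cell_type: fraction
        for cell_type, fraction in proportions.items()
        if cell_type != uncharacterized_key
    }
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("no characterized cells left to rescale")
    return {cell_type: fraction / total for cell_type, fraction in kept.items()}


# Toy sample in which 60% of the cells are uncharacterized (made-up values).
rescaled = rescale_without_uncharacterized(
    {"T cells": 0.2, "B cells": 0.2, "uncharacterized": 0.6}
)
```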

      We have clarified this point in the results section, in the paragraph “EPIC-ATAC accurately predicts fractions of cancer and non-malignant cells in tumor samples”.

      (3) The selection of the most discriminative markers is very well described in the text and beautifully illustrated in Figure 2. However, it is unclear why UMAP plots are used to represent cell-type similarities and dissimilarities. Would a linear dimensionality reduction approach like PCA be already sufficient to show these groups, especially considering the not-so-extreme dimensionality of the underlying data? In addition, a statistic that could be also considered to compare clusters to the cell type labels in the two scenarios is the Adjusted Rand Index (ARI).

      We thank the reviewer for this relevant comment. We initially used UMAP to facilitate the visualization of the different cell-type groups. However, it is true that the three first axes of the principal component analyses performed based on each set of marker peaks already capture most of the structure in the data and that the use of UMAP can lead to an artificial enhancement of separation between the different groups of cells. We have updated Figure 2B by replacing the UMAP scatter plots by 3D representations of the first three principal components of the PCA and have added in Supplementary Figure 1B the pairwise scatter plots of these first 3 principal components. On the main figures, we have also added the ARI metric comparing the cell-type annotation and the clustering obtained using the first 10 axes of the PCA and model based clustering.
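      For readers unfamiliar with the ARI, a minimal pure-Python sketch of the standard pair-counting formula is given below. It only covers the comparison of two labelings (for example, cell-type annotations versus cluster assignments); the PCA and model-based clustering steps are not reproduced, and the function name is hypothetical. It relies on `math.comb`, which requires Python 3.8 or later.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of items grouped together in both labelings, in A only, and in B only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every item in one group.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Identical groupings with different label names give an ARI of 1.
perfect = adjusted_rand_index(["T", "T", "B", "B"], [1, 1, 0, 0])
# Groupings that split every annotated pair give a negative ARI.
discordant = adjusted_rand_index(["T", "T", "B", "B"], [0, 1, 0, 1])
```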

      (4) In the introduction, it is stated that "the reasonable cost and technical advantages of these protocols foreshadow an increased usage of ATAC-Seq in cancer studies". I would suggest adding a reference to justify this trend. Also, it should be discussed how ATAC-seq deconvolution compares to other types of deconvolution approaches applied to cheaper epigenetic data like methylation one (e.g. epidish, methylcc, tca, minfi).

      We have complemented this sentence with two references to justify the assertion: (i) a review published by Luo, Gribskov and Wang in 2022 showing the increasing number of ATAC-Seq studies in the field of cancer research, and (ii) a protocol paper from Grandi et al. published in 2022 on the state-of-the-art Omni-ATAC protocol for ATAC-sequencing which discusses the broad applicability and the technical advantages of ATAC-sequencing. Also in the preceding sentence, a recent ATAC-Seq protocol that can be applied to FFPE samples has been mentioned, FFPE samples being the most common samples in clinical cancer research.

      We agree with the reviewer on the fact that other epigenetic assays such as methylation assays are cost effective. However, ATAC-sequencing provides additional information on the epigenetic landscape of a sample’s genome and some questions regarding regulatory regions and transcription factor activity cannot be answered with methylation data. Methods that can be applied on ATAC-Seq data specifically are thus needed. Most of the cell-type deconvolution algorithms existing so far are applicable on RNA-Seq or methylation data. These algorithms often use similar methodological concepts, e.g., linear combination of the reference profiles for reference-based methods, which could be used in different modalities. However, methylation-based deconvolution tools often take as input a data format that is specific to methylation data, e.g., two-color microarray data (RGChannelSet R object) for the minfi deconvolution function (estimateCellCounts) or leverage methylation-specific information to perform the deconvolution. For example, methylCC uses a model based on latent variables representing binarized measures of the methylation status of cell-type specific regions (1 or 0 for clearly methylated or unmethylated regions). Such methods are more difficult to adapt than tools based on RNA-Seq data where the signal is quantified using read counts similarly to ATAC-Seq data.

      Nevertheless, some methods such as EPIdish or MethylCIBERSORT have proposed new methylation reference profiles and have used existing models that are not specific to methylation data to deconvolve the bulk data. In our work, we followed a similar approach where we propose new reference profiles specific to chromatin accessibility data, integrate them into an existing method, EPIC, and test them in other existing tools. Note that methylation reference profiles cannot be directly used for ATAC-Seq data deconvolution considering that methylation measures methylation status at CpG sites (dinucleotides) and ATAC-Seq measures the accessibility of regions of hundreds of base pairs.

      An analysis comparing the performance of methylation-based deconvolution and ATAC-Seq based deconvolution would be informative. However, such analysis is beyond the scope of our paper considering that none of the datasets used for our benchmarking provide these two modalities for the same samples.

      In the manuscript, we have completed the references associated to the methylation-based deconvolution tools with the ones mentioned in the previous paragraphs and by the reviewer and have completed the discussion as follows:

      “The comparison of EPIC-ATAC applied on ATAC-Seq data with EPIC applied on RNA-Seq data has shown that both modalities led to similar performances and that they could complement each other. Another modality that has been frequently used in the context of bulk sample deconvolution is methylation. Methylation profiling techniques such as methylation arrays are cost effective (Kaur et al., 2023) and DNA methylation signal is highly cell-type specific (Kaur et al., 2023; Loyfer et al., 2023). Considering that methylation and chromatin accessibility measure different features of the epigenome, additional analyses comparing and/or complementing ATAC-seq based deconvolution with methylation-based deconvolution could be of interest as future datasets profiling both modalities in the same samples become available.”

      (5) In the Results section, some methodological steps could be phrased in a bit more extensive way to let the reader understand the rationale and the actual approach. I recognize there is also a reference to the Methods section, where all methodologies are reported in detail, but some of the sentences are hard to understand due to their synthetic format, e.g.: "markers with potential residual accessibility in human tissues were then filtered out".

      We thank the reviewer for this comment and have followed their recommendation to expand the sentences written in a synthetic format. Text changes and additions are underlined below:

      “To limit batch effects, the collected samples were homogeneously processed from read alignment to peak calling. For each cell type, we derived a set of stable peaks observed across samples and studies, i.e. for each study, peaks detected in at least half of the samples were considered, and for each cell type, only peaks detected jointly in all studies were kept (see Materials and Methods, section 1).”

      “To filter out markers that could be accessible in other human cell-types than those included in our reference profiles, we used the human atlas study (K. Zhang et al., 2021), which identified modules of open chromatin regions accessible in a comprehensive set of human tissues, and we excluded from our marker list the markers overlapping these modules (Figure 1, box 3, see Materials and Methods section 2).”

      “For the deconvolution of tumor bulk samples, the lists of marker peaks specific to fibroblasts and endothelial cells were added to the PBMC markers. This extended set of markers was further refined based on the correlation patterns of the markers in tumor bulk samples from the diverse solid cancer types from The Cancer Genome Atlas (TCGA) (Corces et al., 2018), i.e., markers exhibiting the highest correlation patterns in the tumor bulk samples were selected using the findCorrelation function from the caret R package (Kuhn, 2008) (Figure 1, box 4, see Materials and Methods, section 2).”

      Also, following the comments and recommendations of Reviewer 1, we have: (i) moved the method section describing the adaptation of EPIC to ATAC-Seq data to the results section to provide more details (see answer to the third comment of Reviewer 1), (ii) clarified how the existing tools used in the benchmarking analyses were adapted for ATAC-Seq deconvolution (see answer to the second comment of Reviewer 1), and (iii) detailed how the comparison between our estimations of the infiltration levels in the samples from Kumegawa et al. and the estimations from the original study was performed (see answer to the seventh recommendation of Reviewer 1).
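
      The stable-peak rule quoted in this response (keep peaks detected in at least half of the samples of each study, then keep only peaks found in all studies) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the function name `stable_peaks` and the input layout are hypothetical, and peaks are assumed to be already harmonized to shared identifiers, whereas real peak sets would first need to be matched by genomic overlap.

```python
from collections import Counter


def stable_peaks(studies):
    """Return peaks that are stable across samples and studies for one cell type.

    studies: dict mapping a study name to a list of sets, one set of peak
             identifiers per sample (hypothetical input format).
    """
    per_study = []
    for samples in studies.values():
        # Count in how many samples of this study each peak was detected.
        counts = Counter(peak for sample in samples for peak in sample)
        threshold = len(samples) / 2  # "at least half of the samples"
        per_study.append({peak for peak, n in counts.items() if n >= threshold})
    # Keep only peaks retained in every study.
    return set.intersection(*per_study) if per_study else set()


# Toy example with two studies of three samples each.
toy = {
    "study_A": [{"p1", "p2"}, {"p1", "p3"}, {"p1", "p2"}],
    "study_B": [{"p1"}, {"p1", "p2"}, {"p1"}],
}
stable = stable_peaks(toy)
```

      In the toy input, `p2` passes the within-study filter in `study_A` but is seen in only one of three samples in `study_B`, so it is dropped by the cross-study intersection.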

      (6) In the main text, it is stated that "the list of markers was further refined based on the correlation patterns of the markers in tumor bulk samples from diverse cancer types from The Cancer Genome Atlas". It should be clarified if these are only solid cancers, or if blood cancers were also used.

      We have considered only the solid cancers and have clarified this point in the results section: “This extended set of markers was further refined based on the correlation patterns of the markers in tumor bulk samples from the diverse solid cancer types from The Cancer Genome Atlas”.

      (7) When reporting that "these predictions are consistent with the infiltration level estimations reported in the original publication", it should be mentioned how the infiltration levels were quantified in this publication and how this agreement was quantified. This would be important also to claim in the abstract that "EPIC-ATAC accurately infers the immune contexture of the main breast cancer subtypes".

      We thank the reviewer for this comment and acknowledge that the agreement between the EPIC-ATAC predictions and the infiltration levels quantified in the original publication should be further described in the paper. We have expanded the text in the results section in the paragraph “EPIC-ATAC accurately infers the immune contexture in a bulk ATAC-Seq breast cancer cohort” to clarify this point. Additionally, we have added a panel in Figure 6 (panel A) which shows a good agreement between EPIC-ATAC predictions and the metric used in the original paper to evaluate the infiltration levels of different cell types.

      The added text is underlined below:

      “We applied EPIC-ATAC to a breast cancer cohort of 42 breast ATAC-Seq samples including samples from two breast cancer subtypes, i.e., 35 oestrogen receptor (ER)-positive human epidermal growth factor receptor 2 (HER2)-negative (ER+/HER2-) samples and 7 triple negative (TNBC) tumors (Kumegawa et al., 2023). No cell sorting was performed in parallel to the chromatin accessibility sequencing. For this reason, the authors used a set of cell-type-specific cis-regulatory elements (CREs) identified in scATAC-Seq data from similar breast cancer samples (Kumegawa et al., 2022) and estimated the amount of infiltration of each cell type by averaging the ATAC-Seq signal of each set of cell-type-specific CREs in their samples. We used EPIC-ATAC to estimate the proportions of different cell types of the TME. These predictions were then compared to the metric used by Kumegawa and colleagues in their study to infer levels of infiltration. A high correlation between the two metrics was observed for each cell type (Pearson’s correlation coefficient from 0.5 for myeloid cells to 0.94 for T cells, Figure 6A).”  
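
      The comparison described in the added text can be sketched in a few lines of Python: a per-sample infiltration score is computed as the mean ATAC-Seq signal over a set of cell-type-specific CREs, and that score is then correlated with predicted proportions using Pearson's coefficient. The function names, data layout, and numbers below are hypothetical and purely illustrative; they are not data or code from Kumegawa et al. or from EPIC-ATAC.

```python
from math import sqrt


def cre_score(sample_signal, cre_ids):
    """Mean ATAC-Seq signal of one sample over a set of cell-type-specific CREs."""
    return sum(sample_signal[c] for c in cre_ids) / len(cre_ids)


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy data: three samples, signal per CRE, and made-up predicted proportions.
samples = [
    {"cre1": 2.0, "cre2": 4.0, "cre3": 9.0},
    {"cre1": 4.0, "cre2": 8.0, "cre3": 1.0},
    {"cre1": 6.0, "cre2": 12.0, "cre3": 5.0},
]
t_cell_cres = ["cre1", "cre2"]          # hypothetical T-cell-specific CREs
predicted_t_cells = [0.10, 0.20, 0.30]  # hypothetical predicted proportions

scores = [cre_score(s, t_cell_cres) for s in samples]
r = pearson(scores, predicted_t_cells)
```

      Note that the two quantities are on different scales (an average signal versus a proportion), which is why a correlation, rather than a direct difference, is the natural agreement measure here.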

      (8) It should be made explicit if EPIC-ATAC quantifies mDC, pDC, or their sum.

      In our collection of reference ATAC-Seq samples from which the markers and profiles have been derived, mDCs and pDCs were both included in the dendritic cell category. EPIC-ATAC thus quantifies the total amount of dendritic cells, i.e., mDCs and pDCs combined. We have added a sentence in the main text to clarify this point:

      “To identify robust chromatin accessibility marker peaks of cancer relevant cell types, we collected 564 samples of sorted cell populations from twelve studies including eight immune cell types (B cells […] dendritic cells (DCs) (mDCs and pDCs are grouped in this cell-type category) […] and endothelial (Liu et al., 2020; Xin et al., 2020) cells (Figure 1 box 1, Figure 2A, Supplementary Table 1).”

      Reviewer #2 (Recommendations For The Authors):

      The authors should double-check the naming of tools is done correctly e.g. ChIPSeeker has been spelled incorrectly in some instances throughout the manuscript.

      We thank the reviewer for pointing out this mistake and have corrected it in the main text.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      […] Overall, the conclusions of this study are mostly well supported by the data. The concept of placental aging has been controversial, with several prior studies with conflicting viewpoints on whether placental aging occurs at all, is a normal process during gestation, or rather only a pathologic phenomenon in abnormal pregnancies. This has been rather difficult to study given the difficulty of obtaining serial placental samples in late gestation. The authors used both a mouse model of serial placental sampling and human placental samples obtained at preterm, but non-pathologic deliveries, which is an impressive accomplishment as it provides insight into a previously poorly understood timepoint of pregnancy. The data clearly demonstrate changes in the HIF-1 pathway and cellular senescence at increasing gestational ages in the third trimester, which is consistent with the process of aging in other tissues.

      Weaknesses of this study are that although the authors attribute alterations in HIF-1 pathways in advanced gestation to hypoxia, there are no experiments directly assessing whether the changes in HIF-1 pathways are due to hypoxia in either in vitro or in vivo experiments. HIF-1 has both oxygen-dependent and oxygen-independent regulation, so it is unclear which pathways contribute to placental HIF-1 activity during late gestation, especially since the third-trimester placenta is exposed to significantly higher oxygen levels compared to the early pregnancy environment. Additionally, the placenta is in close proximity to the maternal decidua, which consists of immune and stromal cells, which are also significantly affected by HIF-1. Although the in vitro experimental data in this study demonstrate that HIF-1 induction leads to a placenta senescence phenotype, it is unclear whether the in vivo treatment with HIF-1 induction acts directly on the placenta or rather on uterine myometrium or decidua, which could also contribute to the initiation of preterm labor.

      We thank Reviewer #1 for the thoughtful analysis offered here. We agree that our study has not determined whether placental HIF-1 activation occurring during late gestation is due to oxygen-dependent or oxygen-independent regulation; both possibilities are outlined in paragraph 3 of the Discussion. We used a pharmacological approach in our experiments characterizing the effects of HIF-1 stabilization in trophoblasts because it allows superior command of experimental conditions, but in future studies using hypoxic growth conditions we will determine whether oxygen sensing is a critical component of the aging effects on mitochondrial abundance, metabolism, and cellular senescence in the placenta.

      Reviewer #1 also appropriately highlights the possibility that extra-placental effects of DMOG may contribute to the initiation of preterm labor in our mouse model. Future studies making use of mice with placenta-specific transgenes will allow clarification of the specific contributions of placental HIF-1 signaling to labor onset.

      Reviewer #2 (Public Review):

      […] The major strength of this study is the use of multiple model systems to address the question at hand. The consistency of findings between mouse and human placenta, and the validation of mechanisms in vitro and in vivo modeling are strong support for their conclusions. The rationale for studying the term placentas to understand the abnormal process of preterm birth is clearly explained. Although the idea that hypoxic stress and placental senescence are triggers for labor is not novel, the comprehensiveness of the approach and idea to study the normal aging process are appreciated.

      There are some areas of the manuscript that lack clarity and weaknesses in the methodology worth noting. The rationale for focusing on senescence and HIF-1 is not clear given that other pathways were more significantly altered in the WGCNA analysis. The placental gene expression data were from bulk transcriptomic analyses, yet the authors do not explicitly discuss the limitations of this approach. Although the reader can assume that the authors attribute the mRNA signature of aging to trophoblasts - of which, there are different types - clarification regarding their interpretation of the data and the relevant cell types would strengthen the paper. Additionally, while the inclusion of human placenta data is a major strength, the differences between mouse and human placental structure and cell types make highlighting the specific cells of interest even more important; although there are correlations between mouse and human placenta, there are also many differences, and the comparison is further limited when considering the whole placenta rather than specific cell populations.

      Additional details regarding methods and the reasons for choosing certain readouts are needed. Trophoblasts are sensitive to oxygen tension which varies according to gestational age, and it is unclear if this variable was taken into consideration in this study. Many of the cellular processes examined are well characterized in the literature yet the rationale for choosing certain markers is unclear (e.g., Glb1 for senescence; the transcripts selected as representative of the senescence-associated secretory phenotype; mtDNA lesion rate).

      Overall, the findings presented are a valuable contribution to the field. The authors provide a thoughtful discussion that places their findings in the context of current literature and poses interesting questions for future pursuit. Their efforts to be comprehensive in the characterization of placental aging is a major strength; few placental studies attempt to integrate mouse and human data to this extent, and the validation and presentation of a potential mechanism by which fetal trophoblasts signal to maternal uterine myocytes are exciting.

      Nevertheless, a clear discussion of the methodologic limitations of the study would strengthen the manuscript.

      We thank Reviewer #2 for careful consideration of our data and for the valuable feedback.

      We chose to focus on HIF-1 signaling, mitochondrial function and abundance, and cellular senescence among the pathways that emerged from WGCNA based on our testable hypothesis that these three phenomena could be linked, with HIF-1 upstream of mitochondrial changes and cellular senescence (noted in Lines 166-169 with references to studies on aging establishing this connection in other systems). The other pathways not studied here (FOXO, AMPK, mTOR signaling) are important stress-response mediators which likely play additional key roles in the biology we have begun to describe; extensive future studies are warranted to explore this fully.

      While we focused on establishing new mechanistic insights for aging in the placenta as a whole, localization of the effects described here to specific placental cell populations will be important to clarify in future studies, as is proposed in the Discussion (lines 316-319, which has been updated for emphasis). To our knowledge, no single-cell transcriptomics studies of the placenta have been published to date describing gene expression changes across advancing gestational age in healthy pregnancies, and the quantitative value of immunolocalization studies of candidate proteins in isolation would be limited.

      We do not dispute the limitations of mouse placenta as an imperfect model for the human organ; we have provided parallel data from human specimens wherever possible. We agree that this will continue to be critical in future studies, especially those aiming to achieve cell-type localization of these signaling pathways.

      As mentioned in the response to Reviewer #1, we utilized pharmacological HIF-1 induction in our experimental models rather than manipulation of oxygen tension but acknowledge the value of follow-up studies utilizing hypoxic growth conditions in the Discussion.

      SA-b-Gal activity is a key biomarker of cellular senescence, and this is most commonly assessed histochemically. Unfortunately, detecting b-galactosidase enzyme activity was not possible in the biobanked human specimens we accessed in this study (not collected/stored in a suitable format for histochemical processing), which is why we instead quantified expression of the lysosomal enzyme b-D-galactosidase, encoded by GLB1, the gene responsible for SA-b-Gal activity (Lee BY et al. Senescence-associated β-galactosidase is lysosomal β-galactosidase. Aging Cell 2006 – cited in line 106). A host of other senescence markers exists, but their appearance in senescent cells depends on the cell type and underlying drivers of the senescent phenotype (reference #45), with SA-b-Gal activity among the most universal. Similarly, the specific SASP components depend on cell type and senescence stimulus; we selected the markers in Figure 5H based on their previously established roles as mediators of placental signaling. As noted in the text (lines 120-121 with references to the relevant literature), mtDNA damage has previously been implicated as a driver of age-related loss-of-function in other tissues, which led us to explore whether mtDNA damage accompanies the other signs of mitochondrial dysfunction and dysregulation that were emerging in our data.

      Reviewer #3 (Public Review):

      In this study, Ciampa and colleagues demonstrate that HIF-1α activity is increased with gestation in human and mouse placentas and use several in vitro models to indicate that HIF activation in trophoblasts may release factors (yet to be identified) which promote myometrial contraction. Previous studies have linked placental factors to the preparation of the myometrium for labour (e.g. prostaglandins), but HIF-1α has not been implicated. Due to several issues regarding the experimental design, the results do not currently support the conclusions.

      Major concerns:

      1)  The hypothesis states that placental aging promotes parturition via HIF-1a activation, but the study does not provide any evidence of an aged placenta. Aging is considered a progressive and irreversible loss of functional capacity, inability to maintain homeostasis, and decreased ability to repair the damage. The placenta retains all these abilities throughout pregnancy [PMID: 9462184], and there's no evidence that the placenta functionally declines between 35-39 weeks; otherwise, it wouldn't be able to support fetal development. However, there is evidence of a functional decline in post-term placentas (i.e. >40 weeks in humans), but the authors compare preterm placentas with E17.5 mouse placentas or 39-week human placentas; both these gestational periods are prior to the onset of parturition in most pregnancies (human = 40wkGA, mice=E18.5).

      We thank Reviewer #3 for careful consideration of our manuscript and the thoughtful feedback.

      Our stance that the placenta ages across its normal lifespan is based on the appearance of cellular senescence as an emerging pathway in latter gestational timepoints in the WGCNA, with subsequent validation of cellular senescence markers accumulating in placental samples from the advanced gestational age cohort. Although functional deficits stemming from the appearance of cellular senescence late in pregnancy may not be appreciable at the whole-system level until post-dates, we propose that the subclinical cellular aging that we have detected even before labor onset may be relevant in the setting of a “second hit” stressor — eg, impaired ability to maintain homeostasis, repair damage.

      Future studies will examine functional deficits at the cellular level in response to HIF-1 stabilization (eg. Seahorse assay) and in early- versus late-gestational age primary cells. We hypothesize such studies will reveal impaired resistance to metabolic stressors in the senescent phenotype. Further, there will be value in exploring the impact of senolytics in restoring function to aged tissue.

      In both mouse and human, our selection of placentas that had not yet been exposed to spontaneous labor was deliberate, in order to avoid confounding from the inflammatory effects of labor and delivery itself (due to large swings in perfusion pressure and local ischemia-reperfusion events).

      2)  While the authors provide evidence that HIF-1α activity increases in both the human and mouse placenta as gestation progresses, the mechanistic link between placental HIF-1α and parturition is not strongly supported. For example, most of the evidence is based on in vitro studies showing that conditioned media from trophoblasts treated with CoCl2 increased the contraction of myometrial cells. The specific factor responsible was not identified but the authors allude to pro-inflammatory factors such as cytokines. It was therefore interesting to note that the conditioned media had undergone a filtration step that removes all substances >10kDa, which includes the majority of cytokines and hormones.

      We appreciate the opportunity to clarify that in the filtration step, we retained the >10 kDa fraction, allowing us to clear CoCl2 itself among other <10kDa molecules. A 10kDa cutoff was chosen to allow for retention of cytokines including those previously implicated as signals that can promote contractility in uterine myocytes. As mentioned in the discussion, future studies will aim to identify specific factors within the secretome that are necessary and sufficient to induce the contractile changes.

      3) An alternative explanation is that CoCl2 treatment-induced trophoblast differentiation and the effects on myometrial contraction may be related to differences in secreted factors produced by cytotrophoblasts versus syncytiotrophoblast. Although JAR cells do not spontaneously differentiate, they can be induced to syncytialise upon cAMP stimulation. Ref 39 the authors cite shows this. Indeed, the morphology of the cells in Fig5F that were exposed to CoCl2 indicates that they may be syncytialised. Syncytialised trophoblasts also express markers of senescence including increased SA-β-gal activity and reductions in mitochondrial activity.

      The following is taken from Reference 39, final paragraph:

      For instance, among the tested cell lines the choriocarcinoma cell line BeWo is best suited for studies on syncytial fusion. However, ACH-3P, JAR and Jeg-3 cells react to forskolin treatment with elevated levels of hCG they do not form syncytia [73] and are therefore poor models for syncytialization over a period of 7 days.

      4)  The in vivo experiment showing reduced gestation length in pregnant mice receiving DMOG injection is interesting. However, we cannot exclude the effects of DMOG on non-placental tissues (both maternal and fetal) or the non-specific effects of DMOG (i.e. HIF-1α independent). Furthermore, previous studies using a more direct approach to alter HIF-1α activity in the placenta using trophoblast-specific overexpression of HIF-1α in mice do not lead to changes in gestation length [PMID: 30808910].

      As stated in the response to Reviewer #1, we acknowledge the possibility that extra-placental effects of DMOG may contribute to the initiation of preterm labor in our mouse model. Future studies making use of mice with placenta-specific transgenes will allow clarification of the specific contributions of placental HIF-1 signaling to labor onset.

      Regarding PMID 30808919, as noted in our Discussion (lines 326-335), an important distinction is that the referenced paper studied effects of trophoblast-specific expression of a constitutively active HIF-1 from the beginning of pregnancy, and their findings highlight markedly abnormal placental development in that context. By contrast, we describe effects of HIF-1 stabilization late in pregnancy in a normally developed placenta.

      5)  Lack of appropriate experimental models. E.g. JAR choriocarcinomas are not an ideal model of the human trophoblast as they are malignant. Much better models are available e.g. primary human trophoblasts from term placentas or human trophoblast stem cells from first-trimester placentas. Similarly, the mouse model is also not specific as discussed above.

      We agree with the Reviewer that the JAR cell line has important differences from human trophoblasts; nonetheless, as stated in the Results section (Lines 181-184), they were used in order to model long-term exposure to HIF-1 induction without underlying syncytialization confounding the findings, as would be the case with primary cells.

      6)  Lack of cohesion between the different experimental models. E.g. CoCl2 was used to induce hypoxia/HIF1α in mouse TBs, but DMOG was used in vivo in mice. SA-β Gal staining was carried out in cells but not in mouse or human tissues.

      We used two distinct prolyl hydroxylase inhibitors (CoCl2 and DMOG) in our in vitro studies (Figures 4, 5, and 5 Supplement) to demonstrate reproducibility across models and to help attribute the effects to HIF-1 stabilization rather than off-target effects. DMOG was chosen for the in vivo studies because of its prior use in mice.

      As mentioned in response to Reviewer 2, detecting b-galactosidase enzyme activity was not possible in the biobanked human specimens we accessed in this study (not collected/stored in a suitable format for histochemical processing), which is why we instead quantified expression of the lysosomal enzyme b-D-galactosidase, encoded by GLB1, the gene responsible for SA-b-Gal activity (Lee BY et al. Senescence-associated β-galactosidase is lysosomal β-galactosidase. Aging Cell 2006 – cited in line 106).

      7)  Evidence of senescence and mitochondrial abundance could be strengthened by providing additional markers. E.g. only GLB1 mRNA expression is provided as evidence of senescence, and COX IV protein for mitochondrial abundance in mouse and human placentas.

      As mentioned in response to Reviewer 2, the appearance of other senescence markers depends on the cell type and underlying drivers of the senescent phenotype (reference #45), with SA-b-Gal activity among the most universal. Future studies will further probe which markers accompany cellular senescence in aging placenta to define the senescence phenotype in this setting.

      8)  Given that the main goal of this study was to investigate the role of hypoxia, hypoxia (i.e. low oxygen) was never directly induced and the results were based on chemical inducers of HIF-1α which have multiple off-target effects.

      As mentioned in response to Reviewer 1, we agree that our study has not determined whether placental HIF-1 activation occurring during late gestation is due to oxygen-dependent or oxygen-independent regulation; both possibilities are outlined in paragraph 3 of the Discussion. We used a pharmacological approach in our foundational experiments characterizing the effects of HIF-1 stabilization in trophoblasts because it allows superior command of experimental conditions, but in future studies using hypoxic growth conditions we will determine whether oxygen sensing is a critical component of the aging effects on mitochondrial abundance, metabolism, and cellular senescence in the placenta. We are encouraged by the consistency of the senescence phenotype in JAR cells following administration of two distinct prolyl hydroxylase inhibitors, CoCl2 and DMOG, suggesting that the effects seen are more likely attributable to HIF-1 stabilization (versus off-target effects).

      Reviewer #1 (Recommendations For The Authors):

      This is a very interesting and well-written study that supports the concept of placental aging using a combination of a mouse model, in vitro cell lines, and human placental samples.

      Overall this is an important contribution to our current understanding of placental biology highlighting the role of the HIF-1 pathway and merits publication.

      This study would be strengthened by the following addition:

      - As stated in the Public Review, the authors attribute HIF-1 induction at increased gestation to hypoxia, however, this has not been demonstrated experimentally and HIF-1 has both O2-dependent and independent regulation. The authors could perform an in vitro culture of primary placental cells or JAR cells under hypoxic conditions and assess the HIF-1 pathway/mitochondria activity to provide support for a hypoxia-dependent mechanism.

      We thank Reviewer #1 for the thoughtful analysis offered here. We agree that our study has not determined whether placental HIF-1 activation occurring during late gestation is due to oxygen-dependent or oxygen-independent regulation; both possibilities are outlined in paragraph 3 of the Discussion. We used a pharmacological approach to characterize effects of HIF-1 stabilization in trophoblasts because it allows superior command of experimental conditions, but in future studies using hypoxic growth conditions we will determine whether oxygen sensing is a critical component of the aging effects on mitochondrial abundance, metabolism, and cellular senescence in the placenta.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      1. The rationale for the pursuit of HIF-1 and cellular senescence after initial WGCNA was weakly supported, though this avenue led to interesting and impactful results. The text could provide a stronger rationale for pursuing these pathways as opposed to the top upregulated and downregulated pathways, perhaps by emphasizing previously published work in the field.

      We thank Reviewer #2 for careful consideration of our data and for the valuable feedback.

      We chose to focus on HIF-1 signaling, mitochondrial function and abundance, and cellular senescence among the pathways that emerged from WGCNA based on our testable hypothesis that these three phenomena could be linked, with HIF-1 upstream of mitochondrial changes and cellular senescence (noted in Lines 166-169 with references to studies establishing this connection in other systems). The other pathways not studied here (FOXO, AMPK, mTOR signaling) are important stress-response mediators which likely play additional key roles in the biology we have begun to describe; extensive future studies are warranted to explore this fully.

      2.  Validation of the gene expression data with placental histology and immunolocalization of proteins of interest would bolster the study by identifying the relevant cell types and showing changes in protein levels over time. Additionally, single-cell RNA-seq data from mouse and human placenta are available. Exploration of these published datasets would also be interesting.

      While we focused on establishing new mechanistic insights for aging in the placenta as a whole, localization of the effects described here to specific placental cell populations will be important to clarify in future studies, as is proposed in the Discussion (lines 316-319, which has been updated for emphasis). To our knowledge, no single-cell transcriptomics studies of the placenta have been published to date describing gene expression across advancing gestational age timepoints, and the value of single timepoint “snapshots” that exist in the literature is limited for the purpose of validating the aging mechanisms we have proposed here.

      3. In Figure 2, all of the data have a gestational age-dependent trend except for Fig 2F where the mtDNA lesion rate drops at e15.5. What is the authors' interpretation of these results?

      A testable hypothesis to explain this result is that as mtDNA damage begins to accumulate, cells are initially able to respond via mitophagy, removing those mitochondria with damaged DNA (e15.5), until that response is overwhelmed, allowing the detectable mtDNA lesion rate to spike at e17.5.

      4. In paragraph three of the results, the authors conclude that there is an accumulation of ROS stress, yet there is no direct measurement of ROS. Measuring ROS directly in this setting would strengthen this conclusion (similar to what is done in Figure 5E).

      We interpreted the accumulation of mtDNA damage as a marker of ROS stress, but the Reviewer correctly points out that we did not measure ROS directly in this model. We have updated the language (line 126) to be more accurate.

      5. There is a discrepancy in the length of CoCl2 treatment in primary trophoblasts vs. JAR cells (48 hours vs. 6 days). Treatment with DMOG in JAR cells also differed (4 days). Do the authors have any evidence that longer vs. shorter stabilization of HIF-1 has secondary effects in these cells that could affect the results of the study?

      We preliminarily explored the timecourse of the effects of HIF-1 stabilization in JAR cells, as shown in Fig 5 – Supp 1, and also found that the decline in mitochondrial abundance precedes the appearance of senescence markers (data not shown). JAR cells are a much better model for exploring effects of chronic exposure to HIF-1 stabilization because they do not syncytialize as primary trophoblasts do. For this reason we limited our studies in primary cells to a 48-h timepoint, because the effects of syncytialization would confound longer studies. With the aim of simply validating our CoCl2 findings with a separate prolyl hydroxylase inhibitor, we picked an intermediate timepoint for convenience. The reviewer correctly pinpoints the value of future studies that further dissect the kinetics of these phenomena, which could also potentially identify at which phases the effects are reversible.

      6. The authors evaluated mitochondrial effects in a time course experiment (Figure 5 Supplement 1) and found that the effects of HIF-1 stabilization emerged after three days of treatment, but no such experiment was conducted to determine the timing of senescence with SA-βGal. It would be interesting to correlate the mitochondrial effects and onset of senescence caused by HIF-1 stabilization.

      In future studies we will continue to explore the relative dynamics of HIF-1 stabilization vs mitochondrial effects and senescence. In doing so it will be important to explore other markers of senescence; while SA-βGal is the most universal senescence marker, others (such as p16 or p21 induction), if present, may lend themselves to more precise quantification and a clearer definition of senescence “start time”.

      7. IL-1β is used in experiments testing the effect of JAR-conditioned media on uterine myocytes. The conclusion of this experiment is that conditioned media from JAR cells treated with CoCl2, but not from untreated JAR cells, results in myocyte contraction (Figure 6E) and expression of contraction-associated genes (Figure 6A-D). Although the figure shows that IL-1β + conditioned media increases expression of these genes compared to IL-1β alone, an added control condition where conditioned media is used in the absence of IL-1β would underscore this conclusion and show whether the components in the conditioned media are sufficient to induce gene expression and contraction. There is also no justification for the 10 kDa cutoff in this experiment.

      We did test whether conditioned media could induce contractile changes in myocytes in the absence of IL-1b co-stimulation, and indeed found that the CoCl2-stimulated conditioned media does elicit this effect on its own. We omitted these conditions from the published figure to limit its complexity, but present them here (*, p < 0.05 vs no treatment):

      Author response image 1.

      The filtration step was implemented to concentrate the conditioned media prior to applying it to the myocytes. A 10 kDa cutoff was chosen to ensure retention of most cytokines, especially those previously implicated in contractile switching of uterine myocytes (e.g., IL1b, IL1a, TNFa each approximately 18 kDa, IL6 approximately 21 kDa). The filtration and wash steps also ensured clearance of CoCl2 from the conditioned media before it was applied to myocytes.

      8. Figure 7 shows the use of DMOG in vivo to stabilize HIF-1, which induces preterm labor. Is there a way to inhibit HIF-1 signaling downstream to show that preterm labor in vivo is specifically due to HIF-1 stabilization and not an off-target effect of DMOG? Rescue experiments either in vitro or in DMOG-treated mice using HIF-1 inhibitors would be very compelling, although we recognize these may not be feasible. Regardless, a comment on the translational impact of this study and the potential of targeting the HIF pathway to treat or prevent SPTB should be considered.

      There is considerable research into HIF inhibitors as cancer therapeutics (and FDA approval of a HIF2a inhibitor, belzutifan, for von Hippel Lindau disease). Future studies into the ability of HIF-1 inhibitors to rescue preterm labor are certainly of interest, though translational potential may be limited by systemic toxicity unless a targeted placenta-specific delivery system can be achieved. Genetic approaches using placenta-specific knockout might also be useful, particularly if conditional knockout can be achieved to limit the effects on HIF-1 signaling to late pregnancy, after placental development is complete.

      9. The effect of JAR-conditioned media on uterine myocytes is very interesting. The authors might consider additional discussion of what the putative mediators are and what is suggested in the preterm birth literature (e.g., Sheller-Miller, PMID: 30679631). Assessment of other SASP factors, e.g., using ELISA, would strengthen the study, or at least a rationale for the genes evaluated should be given.

      We agree that follow-up studies should be done to identify which components of the secretome are key for mediating the contractile effect in myocytes, as noted in the Discussion (Lines 271-273), now updated for emphasis and with the suggested references.

      Additional minor comments:

      10.  For Figure 1A, without reading the figure legend it is unclear that the vertical color graph represents different gene clusters; consider labeling the y-axis with 'Gene clusters.' Also, blue and turquoise clusters could be labeled as "upregulated" or "downregulated" for simplicity and clarity.

      Updated, thank you for the suggestions.

      11. For mRNA expression wherever relevant, state in the figure legends and main text the method used (i.e., qPCR) and what the reference timepoint and normalization strategy was. For instance, in Figure 2 (and supplement 1), we were of the impression that the e15.5 and e17.5 values were normalized to e13.5.

      Updated, thank you for the suggestions.

      12.  For Figure 5, can the authors explain in the main text what Mtsox is and how it is a marker for mitochondrial depolarization? In 5E, it would be helpful to mention what TMRE and FCCP are and how they measure mitochondrial ROS.

      Updated, thank you for the suggestions.

      13.  Figure 5 Supplement 2 and Figure 5 Supplement 3 appear to be missing labels indicating black vs. blue vs. red datasets.

      Updated, thank you for the suggestion.

      14.  Figure 7c, what is the n in each group?

      Gestational length data in Figures 7c and 7d each reflect the same n=8 mice per group.

      15.  Minor edits are needed for inconsistent use of terms (pre-term vs. preterm, for example) and grammar.

      Updated, thank you for the suggestion.

      Suggested additions to the Methods section to improve reproducibility:

      16.    Include more detail re: cell culture conditions, including % oxygen.

      Updated, thank you.

      17.  Collagen lattice contraction assay - include details on how measurements of collagen discs were performed. Was this automated?

      Updated, thank you.

      18.  Immunoblots. Details, such as the amount of protein loaded, gel composition, protein extraction method, etc., would be helpful.

      Updated, thank you.

      Reviewer #3 (Recommendations For The Authors):

      Minor comments:

      1.  It is unclear why 2-way ANOVA was performed in figure 3 when there are only 2 groups under comparison: <35 wks vs >39 wks

      As in Figure 2D, multiple genes are analyzed together in Figure 3A using 2-way ANOVA with the two factors being 1) gestational age and 2) individual gene targets (GLB1, HK2, GLUT1). This approach allows us to define the combined effect of gestational age on expression level of all of the genes whose expression is increasing.

      2.  Scale bars missing in some figures - Fig4E, Fig 5D, 5F, Fig5 - Suppl 3C.

      Scale bars were not captured with the original images; we regret this omission.

    1. Author Response

      The following is the authors’ response to the current reviews.

      We thank both reviewers for their detailed and positive assessment of our work.

      To Reviewer #2, we have now explicated the pattern -- (QXQXQX>3)4 where X>3 denotes any length of three or more residues of any composition -- in the first paragraph of the discussion.
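      The pattern described above can be read as a simple string-matching rule. The following is a minimal sketch, assuming that "X" stands for any single residue and that the linker "X>3" means three or more residues, as the verbal definition states; the regular expression and the function name `matches_pattern` are illustrative assumptions, not the authors' implementation, and a match says nothing about whether a sequence actually nucleates.

```python
import re

# Assumed regex reading of (QXQXQX>3)4: "X" is any single residue and
# "X>3" is a linker of three or more residues of any composition.
UNIT = r"Q.Q.Q.{3,}?"
NUCLEUS_PATTERN = re.compile(rf"(?:{UNIT}){{4}}")


def matches_pattern(sequence: str) -> bool:
    """Return True if any stretch of `sequence` fits the assumed pattern."""
    return NUCLEUS_PATTERN.search(sequence.upper()) is not None


if __name__ == "__main__":
    # Hypothetical test sequences: two pure polyQ tracts and one patterned tract.
    examples = [("Q60", "Q" * 60), ("Q20", "Q" * 20), ("patterned", "QNQNQNNN" * 4)]
    for name, seq in examples:
        print(name, matches_pattern(seq))
```

      Because "X" may itself be Q, a sufficiently long pure polyQ tract satisfies the rule at many positions, which is consistent with the statement that the nucleus may involve any subset of 12 Qs matching the pattern.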

      To Reviewer #3, we have made slight modifications to the text in the “Q zippers poison themselves” results section, to attempt to further clarify the mechanism of self-poisoning.

      Briefly, the reviewer questions if an alternative model -- where inhibition involves non-structured rather than Q-zipper containing oligomers -- better explains the data. We provided two lines of evidence that we believe exclude this alternative model. First, we point out in the first paragraph of the “Q zippers poison themselves” section that the cells that unexpectedly lack amyloid in the high concentration regime have negligible levels of AmFRET, indicating that the inhibitory oligomers themselves occur at low concentrations regardless of the total concentration, and are therefore limited by a kinetic barrier. Second, we point out in the third paragraph of the section that the severity of amyloid inhibition with respect to concentration has a sequence dependence that matches the expectation of converging phase boundaries for crystal polymorphs -- specifically, inhibition is most severe for sequences that have a local Q density just high enough to form a Q zipper on both sides of each strand. Inhibition is relaxed for sequences having more or fewer Qs than that threshold. In contrast, disordered oligomerization is not expected to have such a dependence on the precise pattern of Qs and Ns.


      The following is the authors’ response to the original reviews.

      We are pleased that the editors find our study valuable. We find that the reviewers’ criticisms largely arise from misunderstandings inherent to the conceptually challenging nature of the topic, rather than fundamental flaws, as we will elaborate here. We are grateful for the opportunity afforded by eLife to engage reviewers in what we intend to be a constructive public dialogue.

      Response to Reviewer 1

      This review is highly critical but lacks specifics. The reviewer’s criticisms reflect a position that seems to dismiss a critical role for (or perhaps even the existence of) conformational ordering in polyQ amyloid, which is untenable.

      The reviewer states that our objective to characterize the amyloid nucleus “rests on the assertion that polyQ forms amyloid structures to the exclusion of all other forms of solids”. We do not fully agree with this assertion because our findings show that detectable aggregation is rate-limited by conformational ordering, as evidenced by 1) its discontinuous relationship to concentration, 2) its acceleration by a conformational template, and 3) its strict dependence on very specific sequence features that are consistent with amyloid structure but not disordered aggregation.

      We strongly disagree with the reviewer’s subjective statement that we have not critically assessed our findings and that they do not stand up to scrutiny. This statement seems to rest on the perceived contradiction of our findings with that of Crick et al. 2013. Contrary to the reviewer’s assessment, we argue here that the conclusions of Crick et al. do more to support than to refute our findings. Briefly, Crick et al. investigated the aggregation of synthetic Q30 and Q40 peptides in vitro, wherein fibrils assembled from high concentrations of peptide were demonstrated to have saturating concentrations in the low micromolar range. As explained below, this finding of a saturating concentration does not refute our results. More relevant to the present work are their findings that “oligomers” accumulated over an hours-long timespan in solutions that are subsaturated with respect to fibrils, and these oligomers themselves have (nanomolar) critical concentrations. The authors postulated that the oligomers result from liquid–liquid demixing of intrinsically disordered polyglutamine. However, phase separation by a peptide is expected to fix its concentration in both the solute and condensed phases, and, because disordered phase separation is faster than amyloid formation, the postulated explanation removes the driving force for any amyloid phase with a critical solubility greater than that of the oligomers. In place of this interpretation that truly does appear to -- in the reviewer’s words -- “contradict basic physical principles of how homopolymers self-assemble”, we interpret these oligomers as evidence of Q zipper-containing self-poisoned multimers, rounded as an inherent consequence of self-poisoning (Ungar et al., 2005), and plausibly akin to semicrystalline spherulites that have been observed in other polymer crystal and amyloid-forming systems (Crist and Schultz, 2016; Vetri and Foderà, 2015). 
      Importantly, the physical parameters governing the transition between amyloid spherulites and fibrils have been characterized in the case of insulin (Smith et al. 2012), where it was found that spherulites form at lower protein concentrations than fibrils. This mirrors the observation by Crick et al. that fibrils have a higher solubility limit than the spherical oligomers.

      Further rebuttal to the perceived incompatibility of monomeric nucleation with the existence of a critical concentration for amyloid

      We appreciate that the concept of a monomeric nucleus can superficially appear inconsistent with the fact that crystalline solids such as polyQ amyloid have a saturating concentration, but this is only true if one neglects that polyQ amyloids are polymer crystals with intramolecular ordering. The perceived discrepancy is perhaps most easily dispelled by the fact that folded proteins can form crystals, and the folded state of the protein. These crystals have critical concentrations, and the protein subunits within them each have intramolecular crystalline order (in the form of secondary structure). When placed in a subsaturated solution, the protein crystals dissolve into the constituent monomers, and yet those monomers still retain intramolecular order. Our present findings for polyQ are conceptually no different.

      To further extrapolate this simple example to polyQ, one can also draw on the now well-established phenomenon of secondary nucleation, whereby transient interactions of soluble species with ordered species lead to their own ordering (Törnquist et al., 2018). Transience is important here because it implies that intramolecular ordering can in principle propagate even in solutions that are subsaturated with respect to bulk crystallization. This is possible in the present case because the pairing of sufficiently short beta strands (equivalent to “stems” in the polymer crystal literature) will be more stable intramolecularly than intermolecularly, due to the reduced entropic penalty of the former. Our elucidation that Q zipper ordering can occur with shorter strands intramolecularly than intermolecularly (Fig. S4C-D) demonstrates this fact. It is also evident from published descriptions of single molecule “crystals” formed in sufficiently dilute solutions of sufficiently long polymers (Hong et al., 2015; Keller, 1957; Lauritzen and Hoffman, 1960).

      In suggesting that a saturating concentration for amyloid rules out monomeric nucleation, the reviewer assumes that the Q zipper-containing monomer must be stable relative to the disordered ensemble. This is not inherent to our claim. The monomeric nucleating structure need not be more stable than the disordered state, and monomers may very well be disordered at equilibrium at low concentrations. To be clear, our claim requires that the Q zipper-containing monomer is both on pathway to amyloid and less stable than all subsequent species that are on pathway to amyloid. The former requirement is supported by our extensive mutational analysis. The latter requirement is supported by our atomistic simulations showing the Q zipper-containing monomer is stabilized by dimerization (included in our 2021 preprint). Hence, requisite ordering in the nucleating monomer is stabilized by intermolecular interactions. We provide in Author response image 1 an illustration to clarify what we believe to be the discrepancy between our claim and the reviewer’s interpretation.

      Author response image 1.

      That the rate-limiting fluctuation for a crystalline phase can occur in a monomer can also be understood as a consequence of Ostwald’s rule of stages, which describes the general tendency of supersaturated solutes, including amyloid forming proteins (Chakraborty et al., 2023), to populate metastable phases en route to more stable phases (De Yoreo, 2022; Schmelzer and Abyzov, 2017). Our findings with polyQ are consistent with a general mechanism for Ostwald’s rule wherein the relative stabilities of competing polymorphs differ with the number of subunits (De Yoreo, 2022; Navrotsky, 2004). As illustrated in Fig. 6 of Navrotsky, a polymorph that is relatively stable at small particle sizes tends to give way to a polymorph that -- while initially unstable -- becomes more stable as the particles grow. The former is analogous to our early stage Q zipper composed of two short sheets with an intramolecular interface, while the latter is analogous to the later stage Q zipper composed of longer sheets with an intermolecular interface. Subunit addition stabilizes the latter more than the former, hence the initial Q zipper that is stabilized more by intra- than intermolecular interactions will mature with growth to one that is stabilized more by intermolecular interactions.

      We have added a new figure (Fig. 6) to the manuscript to illustrate qualitative features of the amyloid pathway we have deduced for polyQ.

      Rebuttal to the perceived necessity of in vitro experiments

      The overarching concern of this reviewer and reviewing editor is whether in-cell assays can inform on sequence-intrinsic properties. We understand this concern. We believe however that the relative merit of in-cell assays is largely a matter of perspective. The truly sequence-intrinsic behavior of polyQ, i.e. in a vacuum, is less informative than the “sequence-intrinsic” behaviors of interest that emerge in the presence of extraneous molecules from the appropriate biological context. In vitro experiments typically include a tiny number of these -- water, ions, and sometimes a crowding agent meant to approximate everything else. Obviously missing are the myriad quinary interactions with other proteins that collectively round out the physiological solvent. The question is what experimental context best approximates that of a living human neuron under which the pathological sequence-dependent properties of polyQ manifest. We submit that a living yeast cell comes closer to that ideal than does buffer in a test tube.

      The reviewer’s statement that our findings must be validated in vitro ignores the fact -- stressed in our introduction -- that decades of in vitro work have not yet generated definitive evidence for or against any specific nucleus model. In addition to the above, one major problem concerns the large sizes of in vitro systems that obscure the effects of primary nucleation. For example, a typical in vitro experimental volume of, e.g., 1.5 ml is over one billion-fold larger than the femtoliter volume of a cell. This means that any nucleation-limited kinetics of relevant amyloid formation are lost, and any alternative amyloid polymorphs that have a kinetic growth advantage -- even if they nucleate at only a fraction of the rate of relevant amyloid -- will tend to dominate the system (Buell, 2017). Novel approaches are clearly needed to address these problems. We present such an approach, stretch it to the limit (as the reviewer notes) across multiple complementary experiments, and arrive at a novel finding that is fully and uniquely consistent with all of our own data as well as the collective prior literature.
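      The volume comparison above can be checked with a few lines of arithmetic. This is a rough sketch only: the 1.5 ml tube volume comes from the text, whereas the cell volumes of 1, 10 and 100 fl are assumed values chosen to span the "femtoliter" range, and the function name `fold_difference` is hypothetical.

```python
LITERS_PER_ML = 1e-3
LITERS_PER_FL = 1e-15


def fold_difference(tube_ml: float, cell_fl: float) -> float:
    """Ratio of a test-tube volume (ml) to a single-cell volume (fl)."""
    return (tube_ml * LITERS_PER_ML) / (cell_fl * LITERS_PER_FL)


if __name__ == "__main__":
    # Assumed cell volumes; only the 1.5 ml figure is taken from the text.
    for cell_fl in (1.0, 10.0, 100.0):
        ratio = fold_difference(1.5, cell_fl)
        print(f"{cell_fl:g} fl cell: {ratio:.1e}-fold smaller than 1.5 ml")
```

      Under any of these assumed cell volumes the ratio exceeds one billion, in line with the "over one billion-fold" statement.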

      That the preceding considerations are collectively essential to understand relevant amyloid behavior is evident from recent cryoEM studies showing that in vitro-generated amyloid structures generally differ from those in patients (Arseni et al., 2022; Bansal et al., 2021; Radamaker et al., 2021; Schmidt et al., 2019; Schweighauser et al., 2020; Yang et al., 2022). This is highly relevant to the present discourse because each amyloid structure is thought to emanate from a different nucleating structure. This means that in vitro experiments have broadly missed the mark in terms of the relevant thermodynamic parameters that govern disease onset and progression. Note that the rules laid out via our studies are not only consistent with structural features of polyQ amyloid in cells, but also (as described in the discussion) explain why the endogenous structure of a physiologically relevant Q zipper amyloid differs from that of polyQ.

      A recent collaboration between the Morimoto and Knowles groups (Sinnige et al.) investigated the kinetics of aggregation by Q40-YFP expressed in C. elegans body wall muscle cells, using quantitative approaches that have been well established for in vitro amyloid-forming systems of the type favored by the reviewer. They calculate a reaction order of just 1.6, slightly higher than what would be expected for a monomeric nucleus but nevertheless fully consistent with our own conclusions when one accounts for the following two aspects of their approach. First, the polyQ tract in their construct is flanked by short poly-Histidine tracts on both sides. These charges very likely disfavor monomeric nucleation because all possible configurations of a four-stranded bundle position the beginning and end of the Q tract in close proximity, and Q40 is only just long enough to achieve monomeric nucleation in the absence of such destabilization. Second, the protein is fused to YFP, a weak homodimer (Landgraf et al., 2012; Snapp et al., 2003). With these two considerations, our model -- which was generated from polyQ tracts lacking flanking charges or an oligomeric fusion -- predicts that amyloid nucleation by their construct will occur more frequently as a dimer than a monomer. Indeed, their observed reaction order of 1.6 supports a predominantly dimeric nucleus. Like us and others, Sinnige et al. did not observe phase separation prior to amyloid formation. This is important because it not only argues against nucleation occurring in a condensate, it also suggests that the reaction order they calculated has not been limited by the concentration-buffering effect of phase separation.

      While we agree that our conclusions rest heavily on DAmFRET data (for good reason), we do provide supporting evidence from molecular dynamics simulations, SDD-AGE, and microscopy.

      To summarize, given the extreme limitations of in vitro experiments in this field, the breadth of our current study, and supporting findings from another lab using rigorous quantitative approaches, we feel that our claims are justified without in vitro data.

      Rebuttals to other critiques

      We do not deny that flanking domains can modulate the kinetics and stability of polyQ amyloid. However, as stated and referenced in the introduction, they do not appear to change the core structure. We have also added a paragraph concerning flanking domains to the discussion, and acknowledged that “the extent to which our findings will translate in these different contexts remains to be determined.” Nevertheless, that the intrinsic behavior of the polyQ tract itself is central to pathology is evident from the fact that the nine pathologic polyQ proteins have similar length thresholds despite different functions, flanking domains, interaction partners, and expression levels.

      The reviewer states that we found nucleation potential to require 60 Qs in a row. Our data are collectively consistent with nucleation occurring at and above approximately 36 Qs, a point repeated in the paper. The reviewer may be referring to our statement, “Sixty residues proved to be the optimum length to observe both the pre- and post-nucleated states of polyQ in single experiments”. The purpose of this statement is simply to describe the practical consideration that led us to use 60 Qs for the bulk of our assays. We do appreciate that the fraction of AmFRET-positive cells is very low for lengths just above the threshold, especially Q40. It is nevertheless highly significant (p = 0.004 in [PIN+] cells, one-tailed t-test), and we have modified the figure and text to clarify this.

      The reviewer characterizes self-poisoning as the hallmark of crystallization from polymer melts, which would be problematic for our conclusions if self-poisoning were limited to this non-physiological context. In fact the term was first used to describe crystallization from solution (Organ et al., 1989), wherein the phenomenon is more pronounced (Ungar et al., 2005).

      Response to Reviewer 2

      We thank the reviewer for their detailed and helpful critique.

      The reviewer correctly notes that the majority of our manipulations were conducted with 60-residue long tracts (which corresponds to disease onset in early adulthood), and this length facilitates intramolecular nucleation. However, we also analyzed a length series of polyQ spanning the pathological threshold, as well as a synthetic sequence designed explicitly to test the model nucleus structure with a tract shorter than the pathological threshold, and both experiments corroborate our findings.

      The reviewer mentions “several caveats” that come with our result, but their subsequent elaboration suggests they are to be interpreted more as considerations than caveats. We agree that increasing sequence complexity will tend to increase homogeneity, but this is exactly the motivation of our approach. We explicitly set out to determine the minimal complexity sequence sufficient to specify the nucleating conformation, which we ultimately identified in terms of secondary and tertiary structure. We do not specify which parts of a long polyQ tract correspond to which parts of the structure, because, as the reviewer points out, they can occur at many places. Hence, depending on the length of the polyQ tract, the nucleus we describe may have any length of sequence connecting the strand elements. We do not think that the effects of N-residue placement can be interpreted as a confounding influence on hairpin position because the striking even-odd pattern we observe implicates the sides of beta strands rather than the lengths. Moreover, we observe this pattern regardless of the residue used (Gly, Ser, Ala, and His in addition to Asn).

      We thank the reviewer for noting the novelty and plausibility of the self-poisoning connection. We would like to elaborate on our finding that self-poisoning inhibits nucleation (in addition to elongation), as this will be confusing to many readers. While self-poisoning is claimed to inhibit primary nucleation in the polymer crystal literature (Ungar et al., 2005; Zhang et al., 2018), the semantics of “nucleation” in this context warrants clarification. Technically, the same structure can be considered a nucleus in one context but not in another. The Q zipper monomer, even if it is rate-limiting for amyloid formation at low concentrations (and is therefore the “nucleus”), is not necessarily rate-limiting when self-poisoned at high concentrations. Whether it comprises the nucleus in this case depends on the rates of Q zipper formation relative to subunit addition to the poisoned state. If the latter happens slower than Q zipper formation de novo, it can be said that self-poisoning inhibits nucleation, regardless of whether the Q zipper formed. We suspect this to be the mechanism by which preemptive oligomerization blocks nucleation in the case of polyQ, though other mechanisms may be possible.

      We believe the revised text also now incorporates the remaining suggestions of this reviewer, with two exceptions. 1) We retain the phrase “hidden pattern”, because we believe our data argue for a nucleus whose formation requires that Qs occur in a pattern that we now elaborate as (QXQXQX>3)4 where X>3 denotes any length of three or more residues of any composition. In amyloids formed from long polyQ molecules, the nucleus will involve any subset of 12 Qs that match this pattern. 2) We decided not to re-order the manuscript to discuss self-poisoning after establishing the monomer nucleus (even though we agree that doing so would improve the logical flow) because the interpretation of the data with respect to self-poisoning helps to establish critical strand lengths, and self-poisoning creates an anomaly in the DAmFRET data that is difficult to ignore. We add text clarifying that high local concentrations “effectively shifts the rate-limiting step to the growth of a higher order relatively-disordered species”.

      Response to Reviewer 3

      We thank the reviewer for their helpful comments.

      We opted to retain Figures 1A and B because we think they are important for comprehending the subject and objectives of the study. We modified the former to attempt to make it more clear. We have also elaborated on DAmFRET as it is a relatively new approach that may be unfamiliar to many readers. Beyond this, we refer the reviewer and readers to our cited prior work describing the theory and interpretation of DAmFRET. Note that the y-axes of DAmFRET plots are not raw FRET but rather “AmFRET”, a ratio of FRET to total expression level. As explained thoroughly in our cited prior work, the discontinuity of AmFRET with expression level indicates that the high AmFRET-population formed via a disorder-to-order transition. When the query protein is predicted to be intrinsically disordered, the discontinuous transition to high AmFRET invariably (among hundreds of proteins tested in prior published and unpublished work) signifies amyloid formation as corroborated by SDD-AGE and tinctorial assays.
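      The distinction between raw FRET and AmFRET can be illustrated with a short sketch. It assumes only what is stated above, namely that AmFRET is a ratio of FRET to total expression level; the function name `amfret` and the intensity values are made up for illustration and are not taken from the study.

```python
def amfret(fret: float, expression: float) -> float:
    """Ratio of FRET signal to total expression signal (arbitrary units)."""
    if expression <= 0:
        raise ValueError("expression signal must be positive")
    return fret / expression


if __name__ == "__main__":
    # Two hypothetical cells with identical raw FRET but different expression.
    low_expresser = amfret(fret=200.0, expression=1000.0)
    high_expresser = amfret(fret=200.0, expression=4000.0)
    print(low_expresser, high_expresser)
```

      The two hypothetical cells share the same raw FRET signal yet differ in AmFRET, which is why the normalized ratio, rather than raw FRET, is plotted against expression level.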

      When performed using standard flow cytometry as in the present study, every AmFRET measurement corresponds to a cell-wide average, and hence does not directly inform on the distribution of the protein between different stoichiometric species. As there is only one fluorophore per protein molecule, monomeric nuclei have no signal. DAmFRET can distinguish cells expressing monomers from stable dimers from higher order oligomers (see e.g. Venkatesan et al. 2019), and we are therefore quite confident that AmFRET values of zero correspond to cells in which a vast majority of the respective protein is not in homo-oligomeric species (i.e. is monomeric or in hetero-complexes with endogenous proteins). The exact value of AmFRET, even for species with the same stoichiometry, will depend both on the effect of their respective geometries on the proximity of mEos3.1 fluorophores, and on the fraction of protein molecules in the species. Hence, we only attempt to interpret the plateau values of AmFRET (where the fraction of protein in an assembled state approaches unity) as directly informing on structure, as we did in Fig. S3A.

      We believe that AmFRET decreases with longer polyQ because the mass fraction of fluorophore decreases in the aggregate, simply because the extra polypeptide takes up volume in the aggregate.

      Yes, the fraction of positive cells in a discontinuous DAmFRET plot does increase with time. However, given the more laborious data collection and derivation of nucleation kinetics in a system with ongoing translation, especially across hundreds of experiments with other variables, ours is a snapshot measurement to approximately derive the relative contributions of intra- and intermolecular fluctuations to the nucleation barrier, rather than the barrier’s magnitude.

      We have revised the tautological statement by removing “non-amyloid containing”.

      Concerning the correlation of our data with the pathological length threshold -- as we state in the first results section, “Our data recapitulated the pathologic threshold -- Q lengths 35 and shorter lacked AmFRET, indicating a failure to aggregate or even appreciably oligomerize, while Q lengths 40 and longer did acquire AmFRET in a length and concentration-dependent manner”. Hence, most of our experiments were conducted with 60Q not because it resembles the pathological threshold, but rather because it was most convenient for DAmFRET experiments.

      Self-poisoning is a widely observed and heavily studied phenomenon in polymer crystal physics, though it seems not yet to have entered the lexicon of amyloid biologists. We were new to this concept before it emerged as an extremely parsimonious explanation for our results. As described in the text, two pieces of evidence exclude the alternative mechanism suggested by the reviewer -- that non-structured oligomers form and subsequently engage and inhibit the template. Specifically, 1) inhibition occurs without any detectable FRET, even at high total protein concentration, indicating the species do not form in a concentration-dependent manner that would be expected of disordered oligomers; and 2) inhibition itself has strict sequence requirements that match those of Q zippers. Hence our data collectively suggest that inhibition is a consequence of the deposition of partially ordered molecules onto the templating surface.

      We have softened the subheading and text of the relevant section in the discussion to more clearly indicate the speculative nature of our statements concerning the possible role of self-poisoned oligomers in toxicity.

      We stand by our statement 'that kinetically arrested aggregates emerge from the same nucleating event responsible for amyloid formation', as this follows directly from self-poisoning.

      Regarding the arguments for lateral and axial growth, we agree that the data are indirect. However, that polyQ forms lamellar amyloids both in vitro and in vivo is now established, so we do not feel it necessary to rigorously show that here. Nevertheless, we need to include this section primarily because it introduces the fact that ordering in polyQ amyloid occurs in the lateral as well as axial dimensions, and the onset of lateral ordering (lamellar growth) explains the very different behaviors of QU and QB sequences apparent on the DAmFRET plots. Ultimately, the two dimensions of growth are important to understand self-poisoning and maturation of the short nucleating zipper to amyloid.

      References

      Arseni D, Hasegawa M, Murzin AG, Kametani F, Arai M, Yoshida M, Ryskeldi-Falcon B. 2022. Structure of pathological TDP-43 filaments from ALS with FTLD. Nature 601:139–143. doi:10.1038/s41586-021-04199-3

      Bansal A, Schmidt M, Rennegarbe M, Haupt C, Liberta F, Stecher S, Puscalau-Girtu I, Biedermann A, Fändrich M. 2021. AA amyloid fibrils from diseased tissue are structurally different from in vitro formed SAA fibrils. Nat Commun 12:1013. doi:10.1038/s41467-021-21129-z

      Buell AK. 2017. The Nucleation of Protein Aggregates - From Crystals to Amyloid Fibrils. Int Rev Cell Mol Biol 329:187–226. doi:10.1016/bs.ircmb.2016.08.014

      Chakraborty D, Straub JE, Thirumalai D. 2023. Energy landscapes of Aβ monomers are sculpted in accordance with Ostwald’s rule of stages. Sci Adv 9:eadd6921. doi:10.1126/sciadv.add6921

      Crist B, Schultz JM. 2016. Polymer spherulites: A critical review. Prog Polym Sci 56:1–63. doi:10.1016/j.progpolymsci.2015.11.006

      De Yoreo JJ. 2022. Casting a bright light on Ostwald’s rule of stages. Proc Natl Acad Sci USA 119. doi:10.1073/pnas.2121661119

      Hong Y, Yuan S, Li Z, Ke Y, Nozaki K, Miyoshi T. 2015. Three-Dimensional Conformation of Folded Polymers in Single Crystals. Phys Rev Lett 115:168301. doi:10.1103/PhysRevLett.115.168301

      Keller A. 1957. A note on single crystals in polymers: Evidence for a folded chain configuration. Philosophical Magazine 2:1171–1175. doi:10.1080/14786435708242746

      Landgraf D, Okumus B, Chien P, Baker TA, Paulsson J. 2012. Segregation of molecules at cell division reveals native protein localization. Nat Methods 9:480–482. doi:10.1038/nmeth.1955

      Lauritzen JI, Hoffman JD. 1960. Theory of Formation of Polymer Crystals with Folded Chains in Dilute Solution. J Res Natl Bur Stand A Phys Chem 64A:73–102. doi:10.6028/jres.064A.007

      Navrotsky A. 2004. Energetic clues to pathways to biomineralization: precursors, clusters, and nanoparticles. Proc Natl Acad Sci USA 101:12096–12101. doi:10.1073/pnas.0404778101

      Ohhashi Y, Ito K, Toyama BH, Weissman JS, Tanaka M. 2010. Differences in prion strain conformations result from non-native interactions in a nucleus. Nat Chem Biol 6:225–230. doi:10.1038/nchembio.306

      Organ SJ, Ungar G, Keller A. 1989. Rate minimum in solution crystallization of long paraffins. Macromolecules 22:1995–2000. doi:10.1021/ma00194a078

      Radamaker L, Baur J, Huhn S, Haupt C, Hegenbart U, Schönland S, Bansal A, Schmidt M, Fändrich M. 2021. Cryo-EM reveals structural breaks in a patient-derived amyloid fibril from systemic AL amyloidosis. Nat Commun 12:875. doi:10.1038/s41467-021-21126-2

      Sahoo B, Singer D, Kodali R, Zuchner T, Wetzel R. 2014. Aggregation behavior of chemically synthesized, full-length huntingtin exon1. Biochemistry 53:3897–3907. doi:10.1021/bi500300c

      Schmelzer JWP, Abyzov AS. 2017. How do crystals nucleate and grow: ostwald’s rule of stages and beyond In: Šesták J, Hubík P, Mareš JJ, editors. Thermal Physics and Thermal Analysis, Hot Topics in Thermal Analysis and Calorimetry. Cham: Springer International Publishing. pp. 195–211. doi:10.1007/978-3-319-45899-1_9

      Schmidt M, Wiese S, Adak V, Engler J, Agarwal S, Fritz G, Westermark P, Zacharias M, Fändrich M. 2019. Cryo-EM structure of a transthyretin-derived amyloid fibril from a patient with hereditary ATTR amyloidosis. Nat Commun 10:5008. doi:10.1038/s41467-019-13038-z

      Schweighauser M, Shi Y, Tarutani A, Kametani F, Murzin AG, Ghetti B, Matsubara T, Tomita T, Ando T, Hasegawa K, Murayama S, Yoshida M, Hasegawa M, Scheres SHW, Goedert M. 2020. Structures of α-synuclein filaments from multiple system atrophy. Nature 585:464–469. doi:10.1038/s41586-020-2317-6

      Snapp EL, Hegde RS, Francolini M, Lombardo F, Colombo S, Pedrazzini E, Borgese N, Lippincott-Schwartz J. 2003. Formation of stacked ER cisternae by low affinity protein interactions. J Cell Biol 163:257–269. doi:10.1083/jcb.200306020

      Törnquist M, Michaels TCT, Sanagavarapu K, Yang X, Meisl G, Cohen SIA, Knowles TPJ, Linse S. 2018. Secondary nucleation in amyloid formation. Chem Commun 54:8667–8684. doi:10.1039/c8cc02204f

      Ungar G, Putra EGR, de Silva DSM, Shcherbina MA, Waddon AJ. 2005. The Effect of Self-Poisoning on Crystal Morphology and Growth Rates In: Allegra G, editor. Interphases and Mesophases in Polymer Crystallization I, Advances in Polymer Science. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 45–87. doi:10.1007/b107232

      Vetri V, Foderà V. 2015. The route to protein aggregate superstructures: Particulates and amyloid-like spherulites. FEBS Lett 589:2448–2463. doi:10.1016/j.febslet.2015.07.006

      Wild EJ, Boggio R, Langbehn D, Robertson N, Haider S, Miller JRC, Zetterberg H, Leavitt BR, Kuhn R, Tabrizi SJ, Macdonald D, Weiss A. 2015. Quantification of mutant huntingtin protein in cerebrospinal fluid from Huntington’s disease patients. The Journal of Clinical Investigation.

      Yang Y, Arseni D, Zhang W, Huang M, Lövestam S, Schweighauser M, Kotecha A, Murzin AG, Peak-Chew SY, Macdonald J, Lavenir I, Garringer HJ, Gelpi E, Newell KL, Kovacs GG, Vidal R, Ghetti B, Ryskeldi-Falcon B, Scheres SHW, Goedert M. 2022. Cryo-EM structures of amyloid-β 42 filaments from human brains. Science 375:167–172. doi:10.1126/science.abm7285

      Zhang X, Zhang W, Wagener KB, Boz E, Alamo RG. 2018. Effect of Self-Poisoning on Crystallization Kinetics of Dimorphic Precision Polyethylenes with Bromine. Macromolecules 51:1386–1397. doi:10.1021/acs.macromol.7b02745

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public review):

      The first part of the manuscript is not particularly novel, and it would be beneficial to clearly state which aspects of the analyses and derivations are different from previous literature. For example, the derivation that rank-1 RNNs cannot implement selection vector modulation is already present in the Extended Discussion of Pagan et al., 2022 (Equations 42-43). Similarly, it would be helpful to more clearly explain how the proposed pathway-based information flow analysis differs from the circuit diagram of latent dynamics in Dubreuil et al., 2022.

      We thank the reviewer for the insightful comments and for the opportunity to clarify the novelty of our analyses and derivations. As the reviewer pointed out, the major novelty of our work lies in explicitly linking selection mechanisms (proposed in Mante et al. 2013) with circuit-level descriptions of low-rank RNNs (developed in Dubreuil et al. 2022). This is made possible by a set of analyses and derivations that integrate linearized dynamical systems analysis (Mante et al., 2013) with the circuit diagram of latent dynamics (Dubreuil et al. 2022). Specifically, starting from rank-3 RNN models, we first derived the circuit diagram of latent dynamics (Eqs. 18 and 19) by applying the theory developed in Dubreuil et al. 2022. However, without further analysis, there is no explicit link between these latent dynamics and the selection mechanism. In this manuscript, based on the line attractor assumption, we linearized the latent dynamics around the line attractor (Mante et al., 2013), which enabled us to solve the equations explicitly (Eqs. 20 to 27) and derive an explicit formula for the effective coupling of information flow (Fig. 5A). This formula for the effective coupling strength supports an explicit pathway-based definition of selection vector modulation (Fig. 5) and of the selection vector (Fig. 6), the core result of this manuscript. Importantly, the same analysis can be extended to higher-order low-rank RNNs (Eqs. 47-55), suggesting the general applicability of our result. We have revised the manuscript to clearly state the novelty of our work. Please see Lines 292-294.

      Because this set of analyses and derivations integrates many results from the previous literature, it naturally shares many similarities with previous results, as the reviewer pointed out. Below, we compare our work with the previous studies mentioned by the reviewer:

      (1) For example, the derivation that rank-1 RNNs cannot implement selection vector modulation is already present in the Extended Discussion of Pagan et al., 2022 (Equations 42-43). 

      On this point, we fully agree with the reviewer that the derivation of rank-1 RNNs’ limitations in implementing selection vector modulation is not particularly novel. We started from rank-1 RNNs because they are the simplest examples revealing the intriguing link between connectivity properties and the modulation mechanism, and thereby serve as an ideal introduction to the subsequent in-depth discussion for general audiences. In the original manuscript, we cited the Pagan et al. 2023 note but may not have made this explicit enough. Since, as the reviewer pointed out, the derivation has been added to the latest version of the Pagan et al. paper (Pagan et al. 2024), we now cite Pagan et al. 2024 and make clear that the derivation was presented there. Please see Lines 186-188 in the main text.

      (2) Similarly, it would be helpful to more clearly explain how the proposed pathway-based information flow analysis differs from the circuit diagram of latent dynamics in Dubreuil et al., 2022.

      As we explained earlier, the latent dynamics in Dubreuil et al. alone did not provide an explicit link between the circuit diagram and selection mechanisms. Our analysis goes beyond the theory developed in Dubreuil et al. 2022 by integrating the linearized dynamical systems analysis (Mante et al. 2013), eventually providing a previously unknown explicit link between the circuit diagram and selection mechanisms.

      With regard to the results linking selection vector modulation and dimensionality, more work is required to understand the generality of these results, and how practical it would be to apply this type of analysis to neural recordings. For example, it is possible to build a network that uses input modulation and to greatly increase the dimensionality of the network simply by adding additional dimensions that do not directly contribute to the computation. Similarly, neural responses might have additional high-dimensional activity unrelated to the task. My understanding is that the currently proposed method would classify such networks incorrectly, and it is reasonable to imagine that the dimensionality of activity in high-order brain regions will be strongly dependent on activity that does not relate to this task.

      We thank the reviewer for this insightful comment. As the reviewer suggested, we did more work to better understand the generality and applicability of the index proposed in the manuscript.

      First, to test whether the proposed method still works when a significant amount of neural activity variance is irrelevant to the task, we manually added irrelevant neural activity to the trained RNNs (termed redundant RNNs; see Methods for details, Lines 1200-1215). As expected, for these redundant RNNs the correlation between the proposed index and the proportion of selection vector modulation disappeared (Figure 7-figure supplement 4B). In fact, the original version of our manuscript presented an extreme example of this idea in the Discussion, where we designed two RNNs with theoretically identical neural activity patterns, one relying purely on input modulation and the other on selection vector modulation (Figure 7-figure supplement 3). For this extreme example, any activity-based index alone would fail to differentiate between the two mechanisms, which illustrates the challenge of distinguishing selection mechanisms when task-irrelevant neural activity is added.

      Second, we asked why the proposed index works well for the trained RNNs, which is somewhat surprising in the first place, as the reviewer pointed out. One possibility is that for trained RNNs, the task-irrelevant neural activity is minimal. To test this possibility, we conducted in-silico lesion experiments on the trained RNNs. The main idea is that if an RNN contains a large portion of task-irrelevant variance, there will exist a subspace (termed the task-irrelevant subspace) that captures this variance, and removing this subspace will not affect the network’s behavior. Based on this idea, we developed an optimization method to identify such a task-irrelevant subspace for any given RNN (see Methods for details, Lines 1216-1244). The results show that in the originally trained RNNs, the identified task-irrelevant subspace explains only a small portion of the neural activity variance (Figure 7-figure supplement 4, panel C). As a control, when applying the same optimization method to the redundant RNNs, we found that the identified task-irrelevant subspace explains a significantly larger portion of the neural activity variance (Figure 7-figure supplement 4, panel C). Taken together, we conclude that the index works for trained RNNs because most of the neural activity variance in networks trained through backpropagation is task-relevant.

      This set of analyses therefore explains why the proposed index works for trained RNNs and fails for the redundant RNNs. We have added these analyses to the Discussion. See Lines 601-610. As the reviewer pointed out, task-irrelevant neural activity variance is highly likely to exist in high-order brain regions, so the proposed index may not work well in neural recordings. With this understanding, we have toned down the conclusions related to experimentally testable predictions in the main text (e.g., in the Abstract and Introduction). We thank the reviewer again for helping us improve the clarity of our work.

      Finally, a number of aspects of the analysis are not clear. The most important element to clarify is how the authors quantify the "proportion of selection vector modulation" in vanilla RNNs (Figures 7d and 7g). I could not find information about this in the Methods, yet this is a critical element of the study results. In Mante et al., 2013 and in Pagan et al., 2022 this was done by analyzing the RNN linearized dynamics around fixed points: is this the approach used also in this study? Also, how are the authors producing the trial-averaged analyses shown in Figures 2f and 3f? The methods used to produce this type of plot differ in Mante et al., 2013 and Pagan et al., 2022, and it is necessary for the authors to explain how this was computed in this case.

      We thank the reviewer for the valuable comments. Yes, for the proportion of selection vector modulation (Figures 7D and 7G), we employed the method used in Mante et al., 2013. For the trial-averaged analyses shown in Figures 2f and 3f, we followed the procedure used in Mante et al., 2013. In the revised version, we have added the related information. See Lines 852-853 and 872-889. We thank the reviewer again for improving the clarity of our work.

      I am also confused by a number of analyses done to verify mathematical derivations, which seem to suggest that the results are close to identical, but not exactly identical. For example, in the histogram in Figure 6b, or the histogram in Figure 7-figure supplement 3d: what is the source of the small variability leading to some of the indices being less than 1?

      In Figure 6B, the two selection vectors are theoretically equivalent under the mean-field assumption. However, because the RNNs we use have a finite number of neurons, finite-size effects inevitably cause slight deviations from perfect equivalence.

      To verify this, we generated rank-3 RNNs of different sizes in the experiment for Figure 6b (see the Supplementary section “Building rank-3 RNNs with both input and selection vector modulations”). Specifically, for a fixed number of neurons 𝑁, we independently sampled 𝛼, 𝛽 and 𝛾 from a Uniform(0,1) distribution and built an RNN with 𝑁 neurons based on the procedure as in Figure 5C. We then computed the selection vector for the RNN in a given context (for example, context 1) in two ways:

      (1) via linearized dynamical system analysis following Mante et al. (2013), producing the selection vector sc<sup>classical</sup>

      (2) using the theoretical derivation

      Author response image 1.

      Cosine angles between selection vectors computed using the two methods in RNNs of different sizes. Black bars indicate median values.

      We repeated this process 1000 times for each 𝑁 and measured the cosine angle between these two selection vectors. As shown in Author response image 1, as 𝑁 increases, the cosine angles approach 1 more consistently, indicating that the two selection vectors become nearly equivalent in larger RNNs. Conversely, smaller RNNs display more pronounced finite-size effects, which accounts for indices slightly below 1.
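
      The comparison described above rests on two operations: extracting a selection vector from the linearized dynamics, and measuring the cosine angle between two vectors. The sketch below illustrates both on a toy rank-1 network, not on the rank-3 construction of the manuscript. The dynamics form (dx/dt = -x + J tanh(x) + input), the fixed point at the origin, and the helper names `selection_vector` and `cosine` are assumptions made for illustration only.

```python
import numpy as np

def selection_vector(J, x_star):
    """Left eigenvector of the linearized dynamics with eigenvalue closest to zero.

    Assumes dynamics of the form dx/dt = -x + J @ tanh(x) + input,
    linearized around the fixed point x_star (an illustrative assumption).
    """
    gain = 1.0 - np.tanh(x_star) ** 2          # slope of tanh at the fixed point
    M = -np.eye(len(x_star)) + J * gain        # Jacobian: -I + J @ diag(gain)
    eigvals, left_vecs = np.linalg.eig(M.T)    # eigenvectors of M.T are left eigenvectors of M
    k = np.argmin(np.abs(eigvals))             # slowest mode (the line attractor)
    return np.real(left_vecs[:, k])

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy rank-1 network J = m n^T / N with n @ m == N, so the Jacobian at the
# origin has one zero eigenvalue whose left eigenvector is proportional to n.
rng = np.random.default_rng(0)
N = 200
m = rng.standard_normal(N)
a = rng.standard_normal(N)
n = a + (N - a @ m) / (m @ m) * m              # enforce n @ m == N
J = np.outer(m, n) / N

sel = selection_vector(J, np.zeros(N))
alignment = abs(cosine(sel, n))                # eigenvector sign is arbitrary, hence abs
```

      In this toy case the numerically extracted selection vector should align with 𝒏 up to sign, which is the kind of agreement that the cosine angle quantifies in the comparison above.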

      Reviewer 2 (Public review):

      The introduction could have been written in a more accessible manner for any non-expert readers.

      We sincerely thank the reviewer for the constructive feedback on the introduction and have revised it accordingly.

      Reviewer #2 (Recommendations for the authors):

      The level of mastery of the low-rank framework is altogether impressive. I need however to point to a technical detail. The derivations of the information flow assume that the vectors m and vectors I are orthogonal (e.g. in Equation 14). This is not necessarily the case in trained networks, and Figure 2F suggests this is not the case in the trained rank 1 network. In that situation, the overlap between m and I leads to an additional term in the Equation going directly from the input to the output vector (see, e.g., Equation 15 in Beiran et al. Neuron 2023). In general, these kind of overlaps can contribute an additional pathway in higher rank networks too.

      We thank the reviewer for the valuable comments. The derivations presented in Equation 14 do not actually require that the vectors 𝒎 and 𝑰 are orthogonal. Rather, our definition of the task variable differs slightly from the one in Beiran et al. (2023). Consider a rank-1 RNN with a single input channel:

      Author response image 2.

      Difference of the definition of task variable with previous work. (A) Our definition of task variable. (B) Definition of task variable in Beiran et al. 2023.

      As long as 𝒎 and 𝑰 are linearly independent, the state 𝒙(𝑡) can be uniquely written as a linear combination of the two vectors (Author response image 2):

      where and are the task variables associated with 𝒎 and 𝑰, respectively. Substituting this expression into the dynamical equations yields:

      Hence, there is no additional term directly linking the input to the output vector in our formulation. By contrast, in Beiran et al. (2023), the input vector 𝑰 is decomposed into components parallel (𝑰<sub>∥</sub>) and perpendicular (𝑰<sub>⊥</sub>) to 𝒎, and the task variables are defined as (Figure 4-figure supplement 3B):

      This leads to dynamics of the form:

      thus creating an additional direct term from the input to the output vector under their definition.
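
      The two parameterizations contrasted above can be sketched as follows. This is a reconstruction under stated assumptions, not the manuscript's own equations: the dynamics are written in a generic rank-1 form with the state confined to the span of 𝒎 and 𝑰, and the symbols κ<sub>m</sub>, κ<sub>I</sub>, κ, v, σ, τ, φ and u(t) are notation introduced here for illustration.

```latex
% Assumed generic rank-1 RNN with a single input channel u(t)
\begin{align}
\tau \dot{\boldsymbol{x}} &= -\boldsymbol{x}
   + \tfrac{1}{N}\,\boldsymbol{m}\,\boldsymbol{n}^{\top}\phi(\boldsymbol{x})
   + \boldsymbol{I}\,u(t)
\end{align}

% (A) Oblique decomposition along m and I (only linear independence is required)
\begin{align}
\boldsymbol{x}(t) &= \kappa_m(t)\,\boldsymbol{m} + \kappa_I(t)\,\boldsymbol{I} \\
\tau \dot{\kappa}_I &= -\kappa_I + u(t) \\
\tau \dot{\kappa}_m &= -\kappa_m
   + \tfrac{1}{N}\,\boldsymbol{n}^{\top}\phi\!\left(\kappa_m \boldsymbol{m} + \kappa_I \boldsymbol{I}\right)
\end{align}

% (B) Orthogonal decomposition: split I into parts parallel and perpendicular to m
\begin{align}
\boldsymbol{I} &= \sigma\,\boldsymbol{m} + \boldsymbol{I}_{\perp},
   \qquad \sigma = \frac{\boldsymbol{m}^{\top}\boldsymbol{I}}{\boldsymbol{m}^{\top}\boldsymbol{m}} \\
\boldsymbol{x}(t) &= \kappa(t)\,\boldsymbol{m} + v(t)\,\boldsymbol{I}_{\perp} \\
\tau \dot{v} &= -v + u(t) \\
\tau \dot{\kappa} &= -\kappa
   + \tfrac{1}{N}\,\boldsymbol{n}^{\top}\phi(\boldsymbol{x}) + \sigma\,u(t)
\end{align}
```

      In (A), matching the coefficients of 𝒎 and 𝑰 leaves no direct u(t) term in the equation for κ<sub>m</sub>. In (B), the overlap σ between 𝒎 and 𝑰 produces the extra direct term σ u(t), which vanishes only when 𝒎 and 𝑰 are orthogonal.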

      The designed rank 3 network relies on a multi-population structure. This is explained clearly in the methods, but it could be stressed more in the main text to dispel the notion that higher-rank networks may not need a multi-population structure to perform this task (cf Dubreuil et al 2022).

      Thank you for the valuable comments. In the revised version, we emphasize this point by adding the following sentence: “our rank-3 network relies on a multi-population structure, consistent with the notion that higher-rank networks still require a multi-population structure to perform flexible computations (Dubreuil et al. 2022)”. See Lines 238-240.

      (3) An important result in Pagan et al and Mante et al is that the line attractor direction is invariant across contexts. I believe this is explicitly enforced in the models studied here, but this could be made more clear. It would be interesting to discuss the importance of this constraint.

      We thank the reviewer for the valuable comments. In our hand-crafted RNN examples (Figures 3–6), we enforce the choice axis to be identical across the two contexts (Figure R4B). Even in the rank-1 example (Figure 2), where we analyze a trained RNN, the choice axis still shows a substantial overlap between the two contexts (Figure R4A). However, in the trained vanilla RNNs shown in Figure 7, when the regularization term is relatively small, the overlap in the choice axis between contexts is smaller (Figure R4C); that is, the line attractor direction shifts between contexts.

      Author response image 3.

      Cosine angle between the choice axes in two contexts for different RNNs. (A) Rank-1 RNNs in Figure 2. (B) Rank-3 RNNs in Figure 3-6. (C) Vanilla RNNs in Figure 7.

      Our theoretical framework can also accommodate situations where the direction of the choice axis changes. For instance, consider the rank-3 RNN in Figure 6, where the choice axis is defined as with 𝐺 being a diagonal matrix whose elements represent the slopes of each neuron’s activation function. Since these slopes can change across contexts, itself can vary across contexts. Likewise, the input representation direction may be written as , allowing both the choice axis and the input axis to adapt to the context. The selection vector is given by:

      Here, we no longer assume that is context-invariant; rather, we only assume its norm remains the same across contexts. Under this weaker assumption, we still have

      Substituting these into the equations yields the following expressions for input modulation and selection vector modulation:

      Figure 6B: it was not clear to me what exactly is plotted here.

      We thank the reviewer for pointing out the missing explanation. In Figure 6B, we show the distribution of the cosine angles between two ways of computing the selection vector for randomly generated rank-3 RNNs. Specifically, we generated 1000 RNNs according to the procedure in Figure 5C, with each RNN defined by parameters 𝛼, 𝛽 and 𝛾 independently sampled from a Uniform(0,1) distribution. For each RNN, we computed the selection vector in a given context (e.g., context 1 or 2) in two ways:

      (1)  via linearized dynamical system analysis following Mante et al. (2013), producing the selection vector sv<sup>classical</sup> (“classical” in Figure 6B),

      (2)  using the theoretical derivation (“our’s” in Figure 6B)

      We repeated this process 1000 times, measured the cosine angle between the two selection vectors, and plotted the resulting distribution for context 1 (gray) and context 2 (blue) in Figure 6B. The figure shows that the selection vectors computed via the two methods are almost equal, as evidenced by the cosine angles clustering very close to 1.

      We have revised it accordingly. See Lines 1135-1143.

      In Figure 7, how was the effective dimension of vanilla RNNs controlled or varied? The metric used (effective dimension) is relatively non-standard, it would be useful to give some intuition to the reader about it.

      We thank the reviewer for these valuable comments.

      Controlling the effective dimension

      When training vanilla RNNs, we included a regularization term in the loss function of the form

      where w<sub>reg</sub> is a regularization coefficient. By adjusting w<sub>reg</sub>, we can influence the distribution of singular values of the connectivity matrix 𝐽. When w<sub>reg</sub> is larger, the learned 𝐽 tends to have fewer large singular values and hence a lower effective dimension; when w<sub>reg</sub> is small, more singular values remain large, increasing the matrix’s effective dimension.

      Definition and intuition: effective dimension

      Consider a connectivity matrix 𝐽 with singular values σ<sub>1</sub> ≥ σ<sub>2</sub> ≥ … ≥ σ<sub>r</sub> > 0. The matrix’s rank is the number of nonzero singular values. However, rank alone can overlook differences in how quickly those singular values decay. To capture this, we define the effective dimension as:

      edim(𝐽) = Σ<sub>i</sub> (σ<sub>i</sub> / σ<sub>1</sub>)<sup>2</sup>

      Each term lies between 0 and 1, so the effective dimension satisfies:

      1 ≤ edim(𝐽) ≤ rank(𝐽)

      When all nonzero singular values are equal, edim(𝐽) equals the matrix rank. But if some singular values are much smaller than others, effective dimension will be closer to 1. For example:

      -  𝐽<sub>1</sub> has nonzero singular values (1, 0.1, 0.01). Its effective dimension is 1.0101, indicating that most of the variance is captured by the largest singular value.

      -  𝐽<sub>0</sub> has nonzero singular values (1, 0.8, 0.7). Its effective dimension is 2.13, which reflects that multiple singular values contribute significantly.

      Hence, while both 𝐽<sub>1</sub> and 𝐽<sub>0</sub> are rank-3 matrices, their effective dimensions highlight the difference in how each matrix distributes its variance.
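
      The two worked examples can be checked numerically. The sketch below assumes that the effective dimension is the sum of squared singular values, each normalized by the largest one; this assumption reproduces both values quoted above (1.0101 and 2.13). The function name `effective_dimension` and the matrix names `J_steep` and `J_flat` are hypothetical.

```python
import numpy as np

def effective_dimension(J):
    """Sum of squared singular values, each divided by the largest one.

    Assumed definition: it matches the two worked examples in the text.
    Returns 0.0 for an all-zero matrix.
    """
    s = np.linalg.svd(J, compute_uv=False)   # singular values, sorted in descending order
    if s.size == 0 or s[0] == 0.0:
        return 0.0
    return float(np.sum((s / s[0]) ** 2))

# Diagonal matrices whose singular values are the diagonal entries
J_steep = np.diag([1.0, 0.1, 0.01])   # fast-decaying spectrum
J_flat = np.diag([1.0, 0.8, 0.7])     # slowly decaying spectrum

edim_steep = effective_dimension(J_steep)
edim_flat = effective_dimension(J_flat)
edim_equal = effective_dimension(np.eye(3))  # all singular values equal, so edim equals the rank
```

      Both example matrices have rank 3, yet the first has an effective dimension close to 1 and the second one above 2, while the identity matrix attains the upper bound given by its rank.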

      We have added the intuition underlying this concept in Methods (see Lines 1135-1143). We thank the reviewer for improving the clarity of our work. 

      Eqs 19&21: n^T_r should be n^T_dv?

      Thank you for pointing out this mistake. We have fixed it in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Weaknesses: 

      The main weakness in this paper lies in the authors' reliance on a single model to derive conclusions on the role of local antigen during the acute phase of the response by comparing T cells in model antigen-vaccinia virus (VV-OVA) exposed skin to T cells in contralateral skin exposed to DNFB 5 days after the VV-OVA exposure. In this setting, antigen-independent factors may contribute to the difference in CD8+ T cell number and phenotype at the two sites. For example, it was recently shown that very early memory precursors (formed 2 days after exposure) are more efficient at seeding the epithelial TRM compartment than those recruited to skin at later times (Silva et al, Sci Immunol, 2023). DNFB-treated skin may therefore recruit precursors with reduced TRM potential. In addition, TRM-skewed circulating memory precursors have been identified (Kok et al, JEM, 2020), and perhaps VV-OVA exposed skin more readily recruits this subset compared to DNFB-exposed skin. Therefore, when the DNFB challenge is performed 5 days after vaccinia virus, the DNFB site may already be at a disadvantage in the recruitment of CD8+ T cells that can efficiently form TRM. In addition, CD8+ T cell-extrinsic mechanisms may be at play, such as differences in myeloid cell recruitment and differentiation or local cytokine and chemokine levels in VV-infected and DNFB-treated skin that could account for differences seen in TRM phenotype and function between these two sites. Although the authors do show that providing exogenous peptide antigen at the DNFB-site rescues their phenotype in relation to the VV-OVA site, the potential antigen-independent factors distinguishing these two sites remain unaddressed. 
      In addition, there is a possibility that peptide treatment of DNFB-treated skin initiates a second phase of priming of new circulatory effectors in the local draining lymph nodes that are then recruited to form TRM at the DNFB site, and that the effect does not solely rely on TRM precursors at the DNFB-treated skin site at the time of peptide treatment.

      Thank you for pointing out these potential caveats to our work. We have considered the possibility that late application of peptide or cell-extrinsic differences could affect the interpretation of our results. We would like to highlight that in our prior publication on this topic [1], we found that OT-1 responses in mice infected with VV-OVA and VV-N (irrelevant antigen) yielded the same responses as in our VV-OVA/DNFB models. In addition, in both our prior publication and our current manuscript, application of peptide to DNFB-painted sites results in T<sub>RM</sub> with a phenotype similar to those in the VV-OVA site. Thus, we are confident that it is the presence of cognate antigen in the skin that drives the augmented T<sub>RM</sub> fitness that we observe.

      Secondly, although the authors conclusively demonstrate that TGFBRIII is induced by TCR signals and required for conferring increased fitness to local-antigen-experienced CD8+ TRM compared to local antigen-inexperienced cells, this is done in only one experiment, albeit repeated 3 times. The data suggest that antigen encounter during TRM formation induces sustained TGFBRIII expression that persists during the antigen-independent memory phase. It remains unclear why only the antigen encounter in skin, but not already in the draining lymph nodes, induces sustained TGFBRIII expression. Further characterizing the dynamics of TGFBRIII expression on CD8+ T cells during priming in draining lymph nodes and over the course of TRM formation and persistence may shed more light on this question. Probing the role of this mechanism at other sites of TRM formation would also further strengthen their conclusions and enhance the significance of this finding. 

      This is an intriguing point.  We do not understand why expression of TGFbR3 in T<sub>RM</sub> required antigen encounter in the skin if T<sub>RM</sub> at all sites clearly have encountered antigen during priming in the LN.  We speculate that durable TGFbR3 expression may require antigen encounter in the context of additional cues present in the periphery or only once cells have committed to the T<sub>RM</sub> lineage.  A more detailed characterization of the dynamics of TGFbR3 expression in multiple tissues would be informative and represents a promising future direction for this project.  We note that to robustly perform these experiments a reporter mouse would likely be a requirement.

      Reviewer #2 (Public review): 

      Weaknesses: 

      Overall, the authors' conclusions are well supported, although there are some instances where additional controls, experiments, or clarifications would add rigor. The conclusions regarding skin-localized TCR signaling leading to increased skin CD8+ TRM proliferation in-situ and increased TGFBR3 expression would be strengthened by assessing skin CD8+ TRM proliferation and TGFBR3 expression in models of high versus low avidity topical OVA-peptide exposure.

      Thank you for these helpful suggestions.  We did not attempt these experiments because we were concerned that, given the relatively modest expansion differences observed with the APL, resolving differences in TGFbR3 and BrdU would prove unreliable. However, this is something that we could attempt as we continue working on this project.

      The authors could further increase the novelty of the paper by exploring whether TGFBR3 is regulated at the RNA or protein level. To this end, they could perform analysis of their single-cell RNA sequencing data (Figure 1), comparing Tgfbr3 mRNA in DNFB versus VV-treated skin. 

      As discussed above, a more detailed analysis of TGFbR3 regulation is of great interest.  These experiments would likely require the creation of additional tools (e.g. a reporter mouse) to provide robust data.  However, as suggested, we have re-analyzed our scRNAseq looking for expression of Tgfbr3. Pseudobulk analysis of cells isolated from VV or DNFB sites suggests that Tgfbr3 appears to be elevated in antigen-experienced TRM at steady-state (Author response image 1).

      Author response image 1.

      Pseudobulk analysis by average gene expression of Tgfbr3 in cells isolated from either VV or DNFB treated flanks, divided by the average gene expression of Tgfbr3 in naïve CD8 T cells from the same dataset.
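
      The normalization described in this legend can be illustrated with a minimal sketch. It assumes per-cell normalized expression values are already available for each group; the function name `pseudobulk_ratio`, the group labels, and the numbers are hypothetical and serve only to show the arithmetic, not to reproduce the authors' pipeline or data.

```python
from statistics import mean


def pseudobulk_ratio(expression_by_group, reference_group):
    """Divide each group's mean expression by the reference group's mean.

    expression_by_group: dict mapping a group label to a list of per-cell
    expression values for one gene (hypothetical input layout).
    """
    reference_mean = mean(expression_by_group[reference_group])
    if reference_mean == 0:
        raise ValueError("reference group has zero mean expression")
    return {
        group: mean(values) / reference_mean
        for group, values in expression_by_group.items()
        if group != reference_group
    }


# Illustrative values only; these are NOT measurements from the study.
example = {
    "naive": [1.0, 0.0, 1.0, 2.0],
    "VV": [3.0, 2.0, 4.0, 3.0],
    "DNFB": [1.0, 2.0, 2.0, 1.0],
}
ratios = pseudobulk_ratio(example, "naive")
```

      A ratio above 1 in this sketch would indicate higher average expression than in the naïve reference; it says nothing about per-cell heterogeneity, which pseudobulk averaging deliberately collapses.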

      For clarity, when discussing antigen exposure throughout the paper, it would be helpful for the authors to be more precise that they are referring to the antigen in the skin rather than in the draining lymph node. A more explicit summary of some of the lab's previous work focused on CD8+ TRM and the role of TGFb would also help readers better contextualize this work within the existing literature on which it builds. 

      We appreciate this feedback, and we have clarified this in the text.

      For rigor, it would be helpful where possible to pair flow cytometry quantification with the existing imaging data.

      Thank you for these suggestions.  In terms of quantification of number of T<sub>RM</sub> by flow cytometry, we have previously demonstrated as much as a 36-fold decrease in cell count when compared to numbers directly visualized by immunofluorescence [1].  Thus, for enumeration of T<sub>RM</sub> we rely primarily on direct IF visualization and use flow cytometry primarily for phenotyping.

      Additional controls, namely enumerating TRM in the opposite, untreated flank skin of VV-only-treated mice and the treated flank skin of DNFB-only treated mice, would help contextualize the results seen in dually-treated mice in Figure 2.

      Without a source of inflammation (e.g. VV infection or DNFB) we see very few T<sub>RM</sub> in untreated skin.  A representative image is provided (Author response image 2).  A single DNFB stimulation does not recruit any CD8+ T cells to the skin without a prior sensitization [2].

      Author response image 2.

      Representative images of epidermal whole mounts of VV treated flank skin, and an untreated site from the same mouse isolated on day 50 post infection and stained for CD8a.

      In figure legends, we suggest clearly reporting unpaired T tests comparing relevant metrics within VV or DNFB-treated groups (for example, VV-OVA PBS vs VV-OVA FTY720 in Figure 3F).

      Thank you for this suggestion.  The figure legends have been amended.
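
      For readers unfamiliar with the comparison being requested, the following is a minimal sketch of an unpaired t statistic between two groups. The Welch (unequal-variance) variant is an assumption made here for illustration; the response does not state which variant was used. The function name `welch_t` and the input values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error contributed by each group
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t_stat, dof


# Illustrative cell densities only; these are NOT data from the study.
control = [1.0, 2.0, 3.0]
treated = [2.0, 3.0, 4.0]
t_stat, dof = welch_t(control, treated)
```

      The p-value would then be read from a t distribution with `dof` degrees of freedom, a step omitted here to keep the sketch free of external dependencies.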

      Finally, quantifying right and left skin draining lymph node CD8+ T cell numbers would clarify the skin specificity and cell trafficking dynamics of the authors' model. 

      We quantified the numbers of CD8 T cells in left and right skin draining lymph nodes by flow cytometry in mice at day 50 after VV infection and DNFB pull.  We observe similar numbers of cells at both sites (Author response image 3).

      Author response image 3.

      Quantification of total number of CD8+ T cells in left and right inguinal lymph nodes. Each symbol represents paired data from the same individual animal, and this is representative of 3 separate experiments.

      Reviewer #1 (Recommendations for the authors): 

      (1) Figures 1D and S1C demonstrate that 80-90 % of TRM at both VV and DNFB sites express CD103+. In contrast, the sequencing data suggests the TRM at the VV site has much higher Itgae expression. Also, clusters 3 and 4, which express significantly more Itgae than all other clusters, together comprise only ~30% of CD8+ T cells at the VV-infected skin site. How can these discrepancies between transcript and protein expression be explained? 

      Thank you for these excellent comments. T<sub>RM</sub> at both VV and DNFB sites appear to express similarly high levels of CD103 protein, both in the OT-I system as we previously published [1] and in a polyclonal system using tetramers.  We attribute the lower penetrance of Itgae expression in the scRNAseq data to a lack of sensitivity, which is common with this modality.  The relatively increased expression of Itgae in clusters 3 and 4 is interesting and may suggest increased Itgae production/stability.  However, in the absence of any effect on protein expression, we chose not to focus on these mRNA differences.

      (2) For the experiments in Figure 3D, in order to exclude a contribution from circulating memory cells, FTY720 should have been administered during the duration of, not prior to, the initiation of the recall response. The effect of FTY720 wears off quickly, so the current experimental setting likely allows for circulating cells to enter the skin. This concern is mitigated by the results of anti-Thy1.1 mAb treatment, but documenting the experiment as in Figure D will likely be confusing to readers. 

      Thank you for this comment.  We relied on the literature indicating that the half-life of FTY720 in blood is longer than 6 days [3-5].  However, on reviewing this again, there are other reports suggesting a shorter half-life.  Thank you for pointing out this potential caveat.  As mentioned above, we do not think this affects the interpretation of our data, as similar results were obtained with anti-Thy1.1.

      (3) Similar to what is described in the weaknesses section, the data on TGFBRIII expression is lacking. When is TGFBRIII induced? In the LN during primary activation and it is then sustained by a secondary antigen exposure at the peripheral target tissue site? Or is it only induced in the peripheral tissue, and there is interesting biology to uncover in regard to how it is induced by the TCR only after secondary exposure, etc.? 

      Thank you for these comments. As discussed above, a more detailed analysis of TGFbR3 regulation is of great interest.  These experiments would likely require the creation of additional tools (e.g. a reporter mouse) to provide robust data and are part of our future directions.

      (4) As described in the weakness section, there could be TCR-independent differences between the VV-OVA and DNFB sites that lead to phenotypic changes in the TRMs that are formed there, both CD8+ T cell-intrinsic (kinetics; with regard to time after initial priming) and extrinsic (microenvironmental differences due to the nature of the challenge, recruited cell types, cytokines, chemokines, etc.). Since the authors report the use of both VV and VV-ova, we recommend an experimental strategy that controls for this by challenging one site with VV and another with VV-OVA concomitantly, followed by repeating the key experiments reported in this manuscript. 

      As discussed above, we have previously published a very similar experiment using VV-OVA and VV-N infection on opposite flanks [1].

      (5) In Figure 6J please indicate means and provide more of the statistics comparing the groups (such as comparing VV-WT vehicle to VV-KO vehicle etc.), and potentially display on a linear scale as with all of the other figures looking at cells/mm2 to help convince the reader of the conclusions and support the secondary findings mentioned in the text such as "Notably, numbers of Tgfbr3ΔCD8 TRM in cohorts treated with vehicle remained at normal levels indicating that loss of TGFβRIII does not affect TRM epidermal residence in the steady state" despite it looking like there is a decrease when looking at the graph. 

      We appreciate the feedback on the readability of this figure, and so have updated figure 6J to be on a linear scale and added additional helpful statistics to the figure legend. The difference between Tgfbr3<sup>WT</sup> and Tgfbr3<sup>∆CD8</sup> at steady state is an excellent point, and we agree that there could be a trend towards reduction in the huNGFR+ T<sub>RM</sub> across both groups, even without CWHM12 administration. We did not see statistically significant reductions in steady-state Tgfbr3<sup>∆CD8</sup> T<sub>RM</sub>, but the slight reduction in both VV-OVA and DNFB treated flanks suggests that TGFβRIII may play a role in steady-state maintenance of all T<sub>RM</sub>. Perhaps with more sensitive tools to better visualize TGFβRIII expression, we could identify stepwise upregulation of TGFβRIII depending on TCR signal strength, possibly starting in the lymph node. We have also amended our description of this figure in the text, to allow for the possibility that a low amount of TGFβRIII, below the level of detection, could play a role in steady-state maintenance of both local antigen-experienced and bystander T<sub>RM</sub>.

      Minor points: 

      (1) In describing Figure 4B, the term "doublets" for pairs of connected dividing cells is confusing. 

      Thank you for this comment, the term has been revised to “dividing cells” in the text and figure.

      (2) Figure legend 4F: BrdU is not "expressed". 

      Very true, it has been changed to “incorporation”.

      (3) Do CreERT2 and/or huNGFR expressed by transferred OT-I cells act as foreign antigens in C57BL/6 mice, potentially causing elimination of circulating memory cells? If that were the case, this would not necessarily confound the read-out of TRM persistence studied here, since skin TRM are likely protected from at least antibody-mediated deletion and their numbers are not maintained by recruitment of circulating cells at steady-state. However, it would be useful to be aware of this potential limitation of this and similar models. 

      Thank you for raising this important technical concern.  In our prior work [1] and this work, we monitor the levels of transferred OT-I cells in the blood over time.  We have not observed rejection of huNGFR+ cells.  We also note that others using the same system have not observed rejection either [6].

      (4) In Figure 6J, means or medians should be indicated 

      This has been updated in Figure 6J.

      (5) Using the term "antigen-experienced" to specifically refer to TRM at the VV site could be confusing, since those at the DNFB site are also Ag-experienced (in the LN draining the VV skin site). 

      We agree that it is a challenging term, as all T<sub>RM</sub> are memory cells. That is why in the text we refer to T<sub>RM</sub> isolated from the VV site as “local antigen-experienced T<sub>RM</sub>”, to try to distinguish them from bystanders that did not experience local antigen.

      (6) The Title essentially restates what was already reported in the authors' prior study. If the data supporting the TGFBRIII-mediated mechanism is studied in more depth, maybe adding this aspect to the title may be useful? 

      Thank you for this suggestion.  I think the current title is probably most suitable for the current manuscript but we are willing to change it should the editors support an alternative title.

      Reviewer #2 (Recommendations for the authors): 

      (1) Definition of bystander CD8+ TRM: The first paragraph of the introduction defines CD8+ TRM. To improve the clarity of this definition, we suggest being explicit that bystander TRM experience cognate antigen in the SDLNs but, in contrast to other TRM, do not experience cognate antigen in the skin. 

      Thank you, we have clarified this in the text.

      (2) Consider softening the language when comparing the efficiency of CD8+ recruitment of the skin between DNFB and VV-treated flanks. For example, substitute "equal efficiency" with "comparable efficiency" since it is difficult to directly compare the extent of inflammation between viral and hapten-based treatments. 

      We have adjusted this terminology throughout the paper.

      (3) Throughout figure legends, we appreciate the indication of the number of experimental repeats performed. We suggest, either through statistics or supplemental figures, demonstrating the degree of variability between experiments to aid readers in understanding the reproducibility of results. 

      Thank you for this suggestion.  In key figures we show data from individual mice across multiple experiments. Thus, inter-experiment variability is captured in our figures.  

      (4) Figure 1: 

      a) Add control mice treated with either vaccinia virus or DNFB and harvest back skin at day 52 to demonstrate baseline levels of polyclonal and B8R tetramer-positive CD8s in the epidermis. These controls would clarify the background CD8+ expansion that might occur in DNFB-treated mice in the absence of vaccinia virus. 

      This point was addressed above.

      b) Figure 1: It would be helpful to see the %Tet+ population specifically in the CD103+ population, recognizing that the majority of the CD8+ from the skin are CD103+. 

      We did look only at CD103+ CD8 T cells from the skin for our tetramer analysis, so this has been clarified in the figure legend.

      c) Provide a UMAP, very similar to 1H, where CD8+ T cells, vaccinia virus, and DNFB-treated flanks are overlaid.

      Thank you for this suggestion.  A UMAP combining aspects of 1G (cell types from the whole ImmgenT dataset) with 1H (our data) results in a figure that is very difficult to interpret.  Thus, we have separated cell types across the entire ImmgenT data set (e.g. CD8+ T cells) and our data into 2 separate panels.

      d) 1D: left flow plot has numbered axis while the right flow plot does not. 

      Thank you, this has been fixed.

      (5) Figure 2: 

      a) In the figure legend, define what is meant by the grey line present in Figures 2C and 2D. 

      This has been updated in the figure legend.

      b) Edit the Y axis of 2C and 2D to specify the TRM signature score. 

      This has been updated in the figure.

      c) Include panel 1D from 1S into Figure 2 to help clarify for the reader what genes are expressed in the 0 - 5 clusters.

      We appreciate the feedback, but we found the heatmap made the figure look too busy, so we feel comfortable keeping it available within supplemental figure 1.

      d) In body of text explicitly discuss that the TRM module used to calculate a signature score was created using virus infection modules (HSV, LCMV and influenza) and thus some of the transcriptional similarity between the authors vaccinia virus treated CD8+ TRM and the TRM module might be due to viral infection rather than TRM status.

      Thank you for this comment.  We have now emphasized this point in the text.

      (6) Figure 3: 

      a) If there are leftover tissue sections, it would be optimal to show specific staining for CD103. We recognize that this data has been previously published by the lab, but it would be ideal to show it once in this paper. 

      Unfortunately, we do not have leftover tissue sections, so we are unable to measure CD103 by I.F. in these experiments.

      b) If you did collect skin draining lymph nodes in the Thy1.1 depletion model, it would be nice to see flow data showing the depletion effects in the skin draining lymph nodes in addition to the blood. 

      Unfortunately, we did not collect the skin draining lymph nodes, and do not have that data for the relevant experiments.

      c) Figure 3 F & G: Perform a T-test comparing vaccinia virus PBS to FTY720 and isotype to anti-Thy1.1 within the same treatment group. Showing no significance with these two comparisons would strengthen the authors' claims. Statistics can be described in legend. 

      We have included this analysis in the figure legend.

      (7) Figure 4: 

      a) It would be helpful to have the CD69+/CD103+ population in this model discussed/defined more. The CD69 expression seen in 4E is lower than the reviewers would've predicted, and it would be interesting to see CD103 expression as well.

      We have found that CD103 is generally a stronger marker for T<sub>RM</sub> in the skin by flow, as CD69 staining is somewhat less robust in the colors we have chosen.  By way of example, we present gating we did upstream in that experiment, gated previously on live CD45+CD3+CD8+ events (Author response image 4).

      Author response image 4.

      Representative flow cytometric plots showing CD69 and CD103 expression in gated live CD45+CD8+CD90.1+ cells isolated from VV-OVA or DNFB treated flanks.

      (8) Figure 5: 

      a) Define APL and its purpose in both the body of text and the figure legend. 

      We have clarified this in the text and the figure legend.

      b) Using in-vivo BrdU, compare proliferation between high avidity N4 and low avidity Y3 OVA-peptide at the primary recall timepoint. 

      We considered this, but due to the lack of sensitivity of the BrdU incorporation and the relatively subtle phenotype of the Y3, we did not think the assay would be sensitive enough to identify differences.

      (9) Figure 6: 

      a) Compare TGFBR3 expression in CD8+ T cells from mice receiving high avidity N4 versus low avidity Y3 OVA-peptide at the primary recall timepoint. 

      This point was discussed above.

      b) Either 1) examine TGFBR3 mRNA expression in VV vs DNFB skin from scRNA-seq dataset or 2) perform a qPCR on epidermal CD8+ T cells from mice receiving high avidity N4 versus low avidity Y3 at the primary recall timepoint. This would help distinguish whether TGFBR3 regulation occurs at the mRNA versus protein level. 

      This point has been discussed above.

      c) Figure 6A: Not required, but it seems like the TGFBR3 gate could be shifted to the right a bit. 

      The gates were set using FMO.

      d) Figure 6C: What comparison is the asterisk indicating significance referring to?

      It is the Dunnett’s test comparing VV-OVA to DNFB and untreated skin, the figure has been amended to clarify this point.

      e) Figure 6: To increase the rigor of the claim that CWHM12 is creating a TGFb limiting condition, the authors could either 1) perform an ELISA or cell-based assay measuring active TGFb, 2) recapitulate results of 6J using monoclonal antibody against avb6 as done in Hirai et al., 2021, Immunity., or 3) examine Tgfbr3 mRNA expression in your single cell RNAseq data, comparing cluster 0 and cluster 3.

      We are pleased to have the opportunity to show Tgfbr3 mRNA, which is shown above in Author response image 1.

      (10) Material and methods: 

      Specify how the localization of the back skin used for imaging was made consistent between the right and left flanks. 

      We have updated this methodology in the text.

      Literature Cited

      (1) Hirai, T., et al., Competition for Active TGFβ Cytokine Allows for Selective Retention of Antigen-Specific Tissue- Resident Memory T Cells in the Epidermal Niche. Immunity, 2021. 54(1): p. 84-98.e5.

      (2) Manresa, M.C., Animal Models of Contact Dermatitis: 2,4-Dinitrofluorobenzene-Induced Contact Hypersensitivity, in Animal Models of Allergic Disease: Methods and Protocols, K. Nagamoto-Combs, Editor. 2021, Springer US: New York, NY. p. 87-100.

      (3) Müller, H.C., et al., The Sphingosine-1 Phosphate receptor agonist FTY720 dose dependently affected endothelial integrity in vitro and aggravated ventilator-induced lung injury in mice. Pulmonary Pharmacology & Therapeutics, 2011. 24(4): p. 377-385.

      (4) Nofer, J.-R., et al., FTY720, a Synthetic Sphingosine 1 Phosphate Analogue, Inhibits Development of Atherosclerosis in Low-Density Lipoprotein Receptor–Deficient Mice. Circulation, 2007. 115(4): p. 501-508.

      (5) Brinkmann, V., et al., Fingolimod (FTY720): discovery and development of an oral drug to treat multiple sclerosis. Nat Rev Drug Discov, 2010. 9(11): p. 883-97.

      (6) Andrews, L.P., et al., A Cre-driven allele-conditioning line to interrogate CD4<sup>+</sup> conventional T cells. Immunity, 2021. 54(10): p. 2209-2217.e6.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript provides a fundamental contribution to the understanding of the role of intrinsically disordered proteins in circadian clocks and the potential involvement of phase separation mechanisms. The authors convincingly report on the structural and biochemical aspects and the molecular interactions of the intrinsically disordered protein FRQ. This paper will be of interest to scientists focusing on circadian clock regulation, liquid-liquid phase separation, and phosphorylation.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      "Phosphorylation, disorder, and phase separation govern the behavior of Frequency in the fungal circadian clock" is a convincing manuscript that delves into the structural and biochemical aspects of FRQ and the FFC under both LLPS and non-LLPS conditions. Circadian clocks serve as adaptations to the daily rhythms of sunlight, providing a reliable internal representation of local time.

      All circadian clocks are composed of positive and negative components. The FFC contributes negative feedback to the Neurospora circadian oscillator. It consists of FRQ, CK1, and FRH. The FFC facilitates close interaction between CK1 and the WCC, with CK1-mediated phosphorylation disrupting WCC:c-box interactions necessary for restarting the circadian cycle.

      Despite the significance of FRQ and the FFC, challenges associated with purifying and stabilizing FRQ have hindered in vitro studies. Here, researchers successfully developed a protocol for purifying recombinant FRQ expressed in E. coli.

      Armed with full-length FRQ, they utilized spin-labeled FRQ, CK1, and FRH to gain structural insights into FRQ and the FFC using ESR. These studies revealed a somewhat ordered core and a disordered periphery in FRQ, consistent with prior investigations using limited proteolysis assays. Additionally, p-FRQ exhibited greater conformational flexibility than np-FRQ, and CK1 and FRH were found in close proximity within the FFC. The study further demonstrated that under LLPS conditions in vitro, FRQ undergoes phase separation, encapsulating FRH and CK1 within LLPS droplets, ultimately diminishing CK1 activity within the FFC. Intriguingly, higher temperatures enhanced LLPS formation, suggesting a potential role of LLPS in the fungal clock's temperature compensation mechanism.

      Biological significance was supported by live imaging of Neurospora, revealing FRQ foci at the periphery of nuclei consistent with LLPS. The amino acid sequence of FRQ conferred LLPS properties, and a comparison of clock repressor protein sequences in other eukaryotes indicated that LLPS formation might be a conserved process within the negative arms of these circadian clocks.

      In summary, this manuscript represents a valuable advancement with solid evidence in the understanding of a circadian clock system that has proven challenging to characterize structurally due to obstacles linked to FRQ purification and stability. The implications of LLPS formation in the negative arm of other eukaryotic clocks and its role in temperature compensation are highly intriguing.

      Strengths:

      The strengths of the manuscript include the scientific rigor of the experiments, the importance of the topic to the field of chronobiology, and new mechanistic insights obtained.

      Weaknesses:

      This reviewer had questions regarding some of the conclusions reached.

      Recommendations For The Authors:

      The reviewer has a few questions for the authors:

      1) Concerning the reduced activity of sequestered CK1 within LLPS droplets with FRQ, to what extent is this decrease attributed to distinct buffer conditions for LLPS formation compared to non-LLPS conditions?

      We don’t believe that these buffer conditions significantly influence the change in FRQ phosphorylation by CK1 observed at elevated temperatures. The pH and ionic strength of the buffer are in keeping with physiological conditions (300 mM NaCl, 50 mM sodium phosphate, 10 mM MgCl2, pH 7.5); CK1 autophosphorylation is robust and generally increases with temperature under these conditions (Figure 7B). However, as LLPS increases, CK1 autophosphorylation remains high, whereas phosphorylation of FRQ dramatically decreases. In fact, we chose to alter temperature specifically to induce changes in phase behavior under constant buffer conditions. In this way LLPS could be increased, and FRQ phosphorylation evaluated, without altering the solution composition. Thus, we believe that the reduced CK1 kinase activity toward FRQ as a substrate is directly due to the impact of the generated LLPS milieu, i.e. the changes in structural/dynamic properties of FRQ and/or CK1 induced by the phase-separated microenvironment, which could be substantially different from the non-phase-separated buffer environment. For example, previous work on the disordered region of DDX4 [Brady et al. 2017, and Nott et al. 2015] shows that even the water content and the stability of biomolecules such as double-stranded nucleic acids encapsulated within the droplets differ between non-phase-separated and phase-separated DDX4 samples.

      Nott T.J. et al. Phase transition of a disordered nuage protein generates environmentally responsive membraneless organelles. Mol. Cell. 2015 57 936-947.

      Brady J.P. et al. Structural and hydrodynamic properties of an intrinsically disordered region of a germ cell-specific protein on phase separation. PNAS 2017 114 8194-8203.

      In the results section we have clarified the use of temperature to control LLPS, “We compared the phosphorylation of FRQ by CK1 in a buffer that supports phase separation under different temperatures, using the latter as a means to control the degree of LLPS without altering the solution composition.”

      On p. 16 of the discussion we have elaborated on the above point, “We believe that the reduced CK1 kinase activity toward FRQ as a substrate is directly due to the impact of the generated LLPS milieu, i.e. the changes in structural/dynamic properties of FRQ and/or CK1 induced by the phase-separated microenvironment, which could be substantially different from the non-phase-separated buffer environment. For example, previous work on the disordered region of DDX4 (Brady et al., 2017; Nott et al., 2015) shows that even the water content and the stability of biomolecules such as double-stranded nucleic acids encapsulated within the droplets differ between non-phase-separated and phase-separated DDX4 samples. Indeed, the spin-labeling experiments indicate that the dynamics of FRQ have been altered by LLPS (Fig. 7D).”

      2) The DEER technique demonstrated spatial proximity between FRH and CK1 when bound to FRQ in the FFC. Is there evidence suggesting their lack of proximity in the absence of FRQ? Also, how important is this spatial proximity to FFC function?

      We have additional data substantiating that FRH and CK1 do not interact in the absence of FRQ. In the revised paper we have included the results of a SEC-MALS experiment showing that FRH and CK1 elute separately when mixed in equimolar amounts and applied to an analytical S200 column coupled to a MALS detector (Author response image 1 below and Fig. S8). The importance of the FRH and CK1 proximity is currently unknown, but there are reasons to believe that it could have functional consequences. For example, CK1, as recruited by FRQ, phosphorylates the White-Collar Complex (WCC) in the repressive arm of the circadian oscillator [e.g. He et al. Genes Dev. 20, 2552 (2006); Wang et al, Mol. Cell 74, 771 (2019)]. Interactions between the WCC and the FFC are mediated at least in part by FRH binding to White Collar-2 [Conrad et al. EMBO J. 35, 1707 (2016)]. Thus, FRH:FRQ may effectively bridge CK1 to the WCC to facilitate the phosphorylation of the latter by the former.

      He et al. CKI and CKII mediate the FREQUENCY-dependent phosphorylation of the WHITE COLLAR complex to close the Neurospora circadian negative feedback loop. Genes Dev. 2006 20, 2552-2565.

      Wang B. et al. The Phospho-Code Determining Circadian Feedback Loop Closure and Output in Neurospora Mol. Cell 2019 74, 771-784.

      Conrad et al. Structure of the frequency-interacting RNA helicase: a protein interaction hub for the circadian clock. EMBO J. 2016 35, 1707-1719.

      Author response image 1.

      Size-exclusion chromatography- multiangle light scattering (SEC-MALS) of a mixture of purified FRH and CK1. The proteins elute separately as monomers with no evidence of co-migration.

      3) Is there any indication that impairing FRQ's ability to undergo LLPS disrupts clock function?

      We do not currently have direct evidence that LLPS of FRQ is essential for clock function. These experiments are ongoing, but complicated by the fact that changes to FRQ predicted to alter LLPS behavior also have the potential to perturb its many other clock-related functions that include dynamic interactions with partners, dynamic post-translational modification and rates of synthesis and degradation. That said, the intrinsic disorder of FRQ is important for it to act as a protein interaction hub, and large intrinsically disordered regions (IDRs) very often mediate LLPS, as is certainly the case here. In this work, we argue that the ability of FRQ to sequester clock proteins during the TTFL may involve LLPS. Additionally, we show that the phosphorylation state of FRQ, which is a critical factor in clock period determination, depends on LLPS. Given that the conditions under which FRQ phase separates are physiological in nature and that live-cell imaging is consistent with FRQ phase separation in the nucleus, it seems likely that FRQ does phase separate in Neurospora. Furthermore, given that the sequence features of FRQ that mediate phase-separation are conserved not only across FRQ homologs but also in other functionally related clock proteins, it is probable, albeit worthy of further investigation, that LLPS has functional consequences for the clock. See the response to reviewer 3 for more discussion on this topic.

      Minor Points:

      Indeed, we have included a reference to this paper on p. 3: “Emerging studies in plants (Jung, et al., 2020), flies (Xiao, et al., 2021) and cyanobacteria (Cohen, et al., 2014; Pattanayak, et al., 2020) implicate LLPS in circadian clocks, and in Neurospora it has recently been shown that the Period-2 (PRD-2) RNA-binding protein influences frq mRNA localization through a mechanism potentially mediated by LLPS (Bartholomai, et al., 2022).”

      • On page 9, six lines from the top, please insert "of" between "distributions" and "p-FRQ".

      We have corrected this typo.

      Reviewer #2 (Public Review):

      Summary:

      This study presents data from a broad range of methods (biochemical, EPR, SAXS, microscopy, etc.) on the large, disordered protein FRQ relevant to circadian clocks and its interaction partners FRH and CK1, providing novel and fundamental insight into oligomerization state, local dynamics, and overall structure as a function of phosphorylation and association. Liquid-liquid phase separation is observed. These findings have bearings on the mechanistic understanding of circadian clocks, and on functional aspects of disordered proteins in general.

      Strengths:

      This is a thorough work that is well presented. The data are of overall high quality given the difficulty of working with an intrinsically disordered protein, and the conclusions are sufficiently circumspect and qualitative to not overinterpret the mostly low-resolution data.

      Weaknesses:

      None

      Recommendations For The Authors:

      1) Fig.2B: Beyond the SEC part (absorbance vs elution volume), I don't understand this plot, in particular the horizontal lines. They appear to be correlating molecular weight with normalized absorption at 280 nm, but the chromatogram amplitudes are different. Clarify, or modify the plot. There are also some disconnected line segments between 10-11 mL - these seem to be spurious.

      We apologize for the confusion. The horizontal lines are meant only to denote the average molecular weights of the elution peaks, not to correlate with the A280 values. The disconnected lines are the light-scattering molecular weight readouts from which the horizontal lines are derived. The difficulty with the figure is that the full elution traces and the MALS traces across the peaks call for different scales to best depict the relevant features of the data. We have reworked the figure and legend to make the key points clearer.

      2) It could be useful to add AF2 secondary structure predictions, pLDDT, and the helical propensity analysis to the sequence ribbon in Fig.1C.

      Thank you for the suggestion. We have updated the figure to incorporate the pLDDT scores into the linear sequence map, as well as the secondary structure predictions.

      3) Fig.3D: It would be better to show the raw data rather than the fits. At the same time, I appreciate the fact that the authors resisted the temptation to show distance distributions.

      Yes, we agree that it is important to show the raw data; it is included in the supplementary section. Depicting the raw data here unfortunately obscures the differences in the traces and we believe that showing the data as a superposition is quite useful to convey the main differences among the sites. However, we have now explicitly stated in the figure legend that the corresponding raw data traces are given in Figures S5-6.

      4) Fig.5: For all distance distributions, error intervals should be added (typically done in terms of shaded bands around the best-fit distribution). As shown, precision is visually overstated. The error analysis shown in the SI is dubious, as it shows some distances have no error whatsoever (e.g. 6nm in 370C-490C), which is not possible.

      We did previously show the error intervals in the SI, but we agree that it is better to include them here as well, and have done so in the new Figure 5. With respect to the error analysis, we are following the methodology described in the following paper:

      Srivastava, M. and Freed, J., Singular Value Decomposition Method To Determine Distance Distributions in Pulsed Dipolar Electron Spin Resonance: II. Estimating Uncertainty. J. Phys. Chem. A (2019) 123:359-370. doi: 10.1021/acs.jpca.8b07673.

      Briefly, the uncertainty we plot shows the "range" of singular values over which the singular value decomposition (SVD) solution remains converged. For most of the data displayed in this paper we used only the first few singular values (SVs), and the solution remained converged for ± 1 or 2 SVs near the optimum solution. For example, if the optimum solution was 4 SVs then the range in which the solution remained converged is ~3-6 SVs. We plot three lines: the low end of the SV range, the high end of the SV range and the optimum number of SVs. In the SI figures the optimum SV solution is shown in black and the region between the converged solutions with the highest and lowest number of SVs is shaded in red. Owing to the point-wise reconstruction of the distance distribution, the SVD method enables localized uncertainty at each distance value. Therefore, some points will have high uncertainty, whereas others will have low uncertainty. The distance that may appear to have no uncertainty actually has very low uncertainty, which can be seen on close inspection. In these cases, we observe "isosbestic"-type behavior in which the P(r) appears to change little across the acceptable solutions and hence there is only a small range of P(r) values at that particular r. This behavior results from multimodal distributions wherein the change in SVs shifts neighboring peaks to lower and higher distances respectively, producing an apparent cancelation effect. What we believe is most important for the biochemical interpretation, and accurately reflected by this analysis, is the general width of the uncertainty across the distribution and how this impacts the error in both the mean and the overall skewing of the distribution at short or long distances.

      Details of the error treatment as described above have been added to the supplementary methods section.
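
      The range-of-singular-values idea described above can be illustrated with a minimal sketch. It is not the authors' implementation, nor the published method of Srivastava and Freed: it applies a plain truncated SVD uniformly to all distances (the published method works point-wise), uses a toy cosine kernel, and relies on hypothetical names (`toy_kernel`, `truncated_svd_solution`, `uncertainty_band`) and values chosen only for illustration.

```python
import numpy as np


def toy_kernel(t, r):
    """Toy kernel K[i, j] = cos(50 * t_i / r_j**3); an assumption, not the real dipolar kernel."""
    return np.cos(np.outer(t, 50.0 / r**3))


def truncated_svd_solution(K, s, n_sv):
    """Solve s = K @ P for P, keeping only the n_sv largest singular values."""
    U, sigma, Vt = np.linalg.svd(K, full_matrices=False)
    coeffs = (U[:, :n_sv].T @ s) / sigma[:n_sv]
    return Vt[:n_sv].T @ coeffs


def uncertainty_band(K, s, n_low, n_opt, n_high):
    """Return (lower, optimum, upper) P(r) over solutions with n_low..n_high SVs."""
    if not n_low <= n_opt <= n_high:
        raise ValueError("expected n_low <= n_opt <= n_high")
    solutions = np.array(
        [truncated_svd_solution(K, s, n) for n in range(n_low, n_high + 1)]
    )
    optimum = solutions[n_opt - n_low]
    # Point-wise envelope across all solutions in the accepted SV range.
    return solutions.min(axis=0), optimum, solutions.max(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r = np.linspace(2.0, 8.0, 40)    # distances (nm), hypothetical grid
    t = np.linspace(0.0, 2.0, 120)   # time axis (µs), hypothetical grid
    K = toy_kernel(t, r)
    p_true = np.exp(-0.5 * ((r - 5.0) / 0.4) ** 2)  # hypothetical distribution
    s = K @ p_true + 0.01 * rng.standard_normal(t.size)
    lower, optimum, upper = uncertainty_band(K, s, n_low=3, n_opt=4, n_high=6)
    width = upper - lower  # point-wise band width; varies with r
    print("narrowest band at r =", r[np.argmin(width)], "nm")
```

      Because the band is evaluated separately at each r, its width can be large at some distances and close to zero at others, which is the behavior the response describes.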

      5) The Discussion (p.13) states that the SAXS and DEER data show that disorder is greater than in a molten globule and smaller than in a denatured protein. Evidence to support this statement (molten globule DEER/SAXS reference data etc.) should be made explicit.

      We will make the statement more explicit by changing it to the following: “Notably, the shape of the Kratky plots generated from the SAXS data suggest a degree of disorder that is substantially greater than that expected of a molten globule (Kataoka, et al., 1997), but far from that of a completely denatured protein (Kikhney, et al., 2015; Martin, Erik W., et al., 2021). Similarly, the DEER distributions, though non-uniform across the various sites examined, indicate more disorder than that of a molten globule (Selmke et al., 2018) but more order than a completely unfolded protein (van Son et al. 2015).”

      van Son, M., et al. Double Electron−Electron Spin Resonance Tracks Flavodoxin Folding, J. Phys. Chem. B 2015, 119, 13507−13514. doi: 10.1021/acs.jpcb.5b00856.

      Selmke, B. et al. Open and Closed Form of Maltose Binding Protein in Its Native and Molten Globule State As Studied by Electron Paramagnetic Resonance Spectroscopy. Biochemistry 2018, 57, 5507−5512. doi: 10.1021/acs.biochem.8b00322.

      6) Fig. S11B could be promoted to the main paper.

      This comment makes a good point. Figure 8 is now an updated scheme, similar to the previous Fig. S11B. Thank you for the suggestion.

      Minor corrections:

      p.1: "composed from" -> "composed of"

      p.2: TFFLs -> TTFLs

      p.2: "and CK1 via" => "and to CK1 via"

      p.5: "Nickel" -> "nickel"

      p.5: "Size Exclusion Chromatography" -> "Size exclusion chromatography"

      p.5: "Multi Angle Light Scattering" -> "multi-angle light scattering"

      Fig.2 caption: "non-phosphorylated (np-FRQ)" -> "non-phosphorylated FRQ (np-FRQ)"

      Fig. S3: What are the units on the horizontal axis?

      Fig. 5H is too small

      Fig. S8, S9: all distance distribution plots show a spurious "1"

      Fig. 6A has font sizes that are too small to read

      p.11: "cytoplasm facing" -> "cytoplasm-facing"

      p.11: "temperature dependent" -> "temperature-dependent"

      p.12: "substrate-sequestration and product-release" -> "substrate sequestration and product release"

      p.12: "depend highly buffer composition" -> "depend highly on buffer composition"

      We thank the reviewer for finding these errors and for their attention to detail. All of these minor points have been addressed in the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript from Tariq and Maurici et al. presents important biochemical and biophysical data linking protein phosphorylation to phase separation behavior in the repressive arm of the Neurospora circadian clock. This is an important topic that contributes to what is likely a conceptual shift in the field. While I find the connection to the in vivo physiology of the clock to be still unclear, this can be a topic handled in future studies.

      Strengths:

      The ability to prepare purified versions of unphosphorylated FRQ and P-FRQ phosphorylated by CK-1 is a major advance that allowed the authors to characterize the role of phosphorylation in structural changes in FRQ and its impact on phase separation in vitro.

      Weaknesses:

      The major question that remains unanswered from my perspective is whether phase separation plays a key role in the feedback loop that sustains oscillation (for example by creating a nonlinear dependence on overall FRQ phosphorylation) or whether it has a distinct physiological role that is not required for sustained oscillation.

      The reviewer raises the key question regarding data suggesting LLPS and phase separated regions in circadian systems. To date, condensates have been seen in cyanobacteria (Cohen et al, 2014; Pattanayak et al, 2020) where there are foci containing KaiA/C during the night, in Drosophila (Xiao et al, 2021) where PER and dCLK colocalize in nuclear foci near the periphery during the repressive phase, and in Neurospora (Bartholomai et al, 2022) where the RNA binding protein PRD-2 sequesters frq and ck1a transcripts in perinuclear phase separated regions. Because the proteins responsible for the phase separation in cyanobacteria and Drosophila are not known, it is not possible to seamlessly disrupt the separation to test its biological significance (Yuan et al, 2022), so only in Neurospora has it been possible to associate loss of phase separation with clock effects. There, loss of PRD-2, or mutation of its RNA-binding domains, results in a ~3 hr period lengthening as well as loss of perinuclear localization of frq transcripts. A very recent manuscript (Xie et al., 2024) calls into question both the importance and very existence of LLPS of clock proteins at least as regards mammalian cells, noting that it may be an artefact of overexpression in some places where it is seen, and that at normal levels of expression there is no evidence for elevated levels at the nuclear periphery. Artefacts resulting from overexpression plainly cannot be a problem for our study or for Xiao et al. 2021, as in both cases the relevant clock protein, FRQ or PER, was labeled at the endogenous locus and expressed under its native promoter. Also, it may be worth noting that although we called attention to enrichment of FRQ[NeonGreen] at the nuclear periphery, there remained abundant FRQ within the core of the nucleus in our live-cell imaging.

      Cohen SE, et al.: Dynamic localization of the cyanobacterial circadian clock proteins. Curr Biol 2014, 24:1836–1844, https://doi.org/10.1016/j.cub.2014.07.036.

      Pattanayak GK, et al.: Daily cycles of reversible protein condensation in cyanobacteria. Cell Rep 2020, 32:108032, https://doi.org/10.1016/j.celrep.2020.108032.

      Xiao Y, Yuan Y, Jimenez M, Soni N, Yadlapalli S: Clock proteins regulate spatiotemporal organization of clock genes to control circadian rhythms. Proc Natl Acad Sci U S A 2021, 118, https://doi.org/10.1073/pnas.2019756118.

      Bartholomai BM, Gladfelter AS, Loros JJ, Dunlap JC. 2022 PRD-2 mediates clock-regulated perinuclear localization of clock gene RNAs within the circadian cycle of Neurospora. Proc Natl Acad Sci U S A. 119(31):e2203078119. doi: 10.1073/pnas.2203078119.

      Yuan et al., Curr Opin Cell Biol 78: 102129, 2022. https://doi.org/10.1016/j.ceb.2022.102129

      Pancheng Xie, Xiaowen Xie, Congrong Ye, Kevin M. Dean, Isara Laothamatas, S K Tahajjul T Taufique, Joseph Takahashi, Shin Yamazaki, Ying Xu, and Yi Liu (2024). Mammalian circadian clock proteins form dynamic interacting microbodies distinct from phase separation. Proc. Nat. Acad. Sci. USA. In press.

      We have updated the discussion on p. 15 accordingly:

      “Live cell imaging of fluorescently-tagged FRQ proteins is consistent with FRQ phase separation in N. crassa nuclei. FRQ is plainly not homogenously dispersed within nuclei, and the concentrated foci observed at specific positions in the nuclei indicate condensate behavior similar to that observed for other phase separating proteins (Bartholomai, et al., 2022; Caragliano, et al., 2022; Gonzalez, A., et al., 2021; Tatavosian, et al., 2019; Xiao, et al., 2021). While ongoing experiments are exploring more deeply the spatiotemporal dynamics of FRQ condensates in nuclei, the small size of fungal nuclei as well as their rapid movement with cytoplasmic bulk flow through the hyphal syncytium makes these experiments difficult. Of particular interest is drawing comparisons between FRQ and the Drosophila Period protein, which has been observed in similar foci that change in size and subnuclear localization throughout the circadian cycle (Meyer, et al., 2006; Xiao, et al., 2021), although it must be noted that the foci we observed are considerably more dynamic in size and shape than those reported for PER in Drosophila (Xiao, et al., 2021). A very recent manuscript (Xie, et al., 2024) calls into question the importance and very existence of LLPS of clock proteins at least in regards to mammalian cells, noting that it may be an artifact of overexpression in some instances where it is seen, and that at normal levels of expression there is no evidence for elevated levels at the nuclear periphery. Artifacts resulting from overexpression are unlikely to be a problem for our study and that of Xiao et al as in both cases clock proteins were tagged at their endogenous locus and expressed from their native promoters. Although we noted enrichment of FRQmNeonGreen near the nuclear envelope in our live-cell imaging, there remained abundant FRQ within the core of the nucleus.”

      Recommendations For The Authors:

      The data in Fig 6 showing microscopy of Neurospora is suggestive but needs more information/controls. Does the strain that expresses FRQ-mNeonGreen have normal circadian rhythms? How were the cultures handled (in terms of circadian entrainment etc.) for imaging? Do samples taken at different clock times appear different in terms of punctate structures in microscopy? The authors cite the Xiao 2021 paper in Drosophila, but would be good to see if the in vivo picture is fundamentally similar in Neurospora.

      All of the live-cell images we report were from cells grown in constant light; in the dark, strains bearing FRQ[NeonGreen] have normally robust rhythms with a slightly elongated period length as measured by a frq Cbox-luc reporter. Although we are interested, of course, in whether, and if so how, the punctate structures change as a function of circadian time, this is work in progress and beyond the scope of the present study. This said, it is plain to see from the movie included as a Supplemental file here that the puncta we see are moving and fusing/splitting on a scale of seconds whereas those reported in Drosophila by Xiao et al. (Xiao et al, 2021, above) were stable for many minutes; thus the FRQ foci seen in Neurospora are quite a bit more dynamic than those in Drosophila.

      We have updated the results section on p. 11 to provide this information more clearly: “FRQ thus tagged and driven by its own promoter is expressed at physiologically normal levels, and strains bearing FRQmNeonGreen as the only source of FRQ are robustly rhythmic with a slightly longer than normal period length. Live-cell imaging in Neurospora crassa offers atypical challenges because the mycelia grow as syncytia, with continuous rapid nuclei motion during the time of imaging. This constant movement of nuclei is compounded by the very low intranuclear abundance of FRQ and the small size of fungal nuclei, making not readily feasible visualization of intranuclear droplet fission/fusion cycles or intranuclear fluorescent photobleaching recovery experiments (FRAP) that could report on liquid-like properties. Nonetheless, bright and dynamic foci-like spots were observed well inside the nucleus and near the nuclear periphery, which is delineated by the cytoplasm-facing nucleoporin Son-1 tagged with mApple at its C-terminus (Fig. 6D,E, Movie S1). Such foci are characteristic of phase separated IDPs (Bartholomai, et al., 2022; Caragliano, et al., 2022; Gonzalez, A., et al., 2021; Tatavosian, et al., 2019) and share similar patterning to that seen for clock proteins in Drosophila (Meyer, et al., 2006; Xiao, et al., 2021), although the foci we observed are substantially more dynamic than those reported in Drosophila.”

      Another issue where some commentary would be helpful: Fig 7 shows that phase separation behavior is strongly temperature dependent (not biophysically surprising). Is that at odds with the known temperature compensation of the circadian rhythm if LLPS indeed plays a key role in the oscillator?

      We believe that the dependence of CK1-mediated FRQ phosphorylation on temperature, as manifested by FRQ phase separation, is consistent with temperature compensation within the Neurospora circadian oscillator. The phenomenon of temperature compensation by circadian clocks involves the intransigence of the oscillator period to temperature change. Stability of period with temperature change would not necessarily be expected of a generic chemical oscillator, which would run faster (shorter period) at higher temperature owing to Arrhenius behavior of the underlying chemical reactions. Circadian phosphorylation of FRQ is one such chemical process that contributes to the oscillation of FRQ abundance on which the clock is based. Reduced CK1 phosphorylation of FRQ causes both longer periods [Mehra et al., 2009] and loss of temperature compensation (manifested as a reduction of period length at higher temperature) [Liu et al, Nat Comm, 10, 4352 (2019); Hu et al, mBio, 12, e01425 (2021)]. Thus, the ability of increased LLPS formation at elevated temperature to reduce FRQ phosphorylation by CK1 (but not intrinsic CK1 autophosphorylation) would be a means to counter a decreasing period length that would otherwise manifest in an under-compensated system. As further negative feedback on the system, LLPS is also promoted by FRQ phosphorylation itself, which in turn will reduce phosphorylation by CK1. Thus, both increased FRQ phosphorylation and temperature will couple to increased LLPS and mitigate period shortening through reduction of CK1 activity.

      Mehra et al. A Role for Casein Kinase 2 in the Mechanism Underlying Circadian Temperature Compensation. Cell 2009, 137, 749–760.

      Liu et al. FRQ-CK1 interaction determines the period of circadian rhythms in Neurospora. Nat Comm. 2019, 10, 4352.

      Hu et al. FRQ-CK1 Interaction Underlies Temperature Compensation of the Neurospora Circadian Clock. mBio 2021, 12, e01425.
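
      The Arrhenius argument in the response above can be made concrete with a short sketch. It assumes, purely for illustration, that the period of an uncompensated oscillator is inversely proportional to a single Arrhenius rate constant; the activation energy and temperatures are hypothetical and are not taken from the manuscript.

```python
import math

GAS_CONSTANT = 8.314  # J / (mol K)


def arrhenius_rate(activation_energy, temp_kelvin, prefactor=1.0):
    """Rate constant k = A * exp(-Ea / (R * T))."""
    return prefactor * math.exp(-activation_energy / (GAS_CONSTANT * temp_kelvin))


def period_ratio(activation_energy, temp_low, temp_high):
    """Period(temp_high) / Period(temp_low), assuming period is proportional to 1 / k."""
    return arrhenius_rate(activation_energy, temp_low) / arrhenius_rate(
        activation_energy, temp_high
    )


if __name__ == "__main__":
    ea = 50_000.0  # J/mol, hypothetical activation energy
    ratio = period_ratio(ea, 293.15, 303.15)  # 20 °C -> 30 °C
    print(f"uncompensated period ratio: {ratio:.2f}")
    print(f"equivalent Q10 of the rate: {1.0 / ratio:.2f}")
```

      A ratio below 1 means a shorter period at the higher temperature, whereas a temperature-compensated clock keeps this ratio close to 1.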

      We have added Figure 8 to clarify the interpretation of the temperature compensation implications of our work, the legend of which reads:

      “Figure 8: LLPS may play a role in temperature compensation of the clock through modulation of FRQ phosphorylation. Reduced CK1 phosphorylation of FRQ causes both longer periods (Mehra, et al., 2009) and loss of temperature compensation (manifested as a shortening of period at higher temperature) (Hu, et al., 2021; Liu, X., et al., 2019). Thus, the ability of increased LLPS at elevated temperature (larger grey circle) to reduce FRQ phosphorylation by CK1 will counter a shortening period that would otherwise manifest in an under compensated system. As further negative feedback, LLPS is also promoted by increased FRQ phosphorylation, which in turn will reduce phosphorylation by CK1. Thus, both increased FRQ phosphorylation and temperature favor LLPS and reduction of CK1 activity.”

      One minor comment: The chemical structures in Fig 3A have some issues where the "N" and "S" are flipped. Would be good to remake these figures to fix this problem.

      We apologize; the figure has been replaced with an improved version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1:

      (1) In Figure 1, it is curious that the authors only chose E. coli and Staphylococcus sciuri to test the induction of Chi3l1. What about other bacteria? Why does only E. coli but not Staphylococcus sciuri induce Chi3l1 production? It does not prove that the gut microbiome induces the expression of Chi3l1. If it is the effect of LPS, does it trigger a cell death response or inflammatory responses that are known to induce Chi3l1 production? What is the role of peptidoglycan in this experiment? Also, it is recommended to change WT to SPF in the figure and text, as no genetic manipulation was involved in this figure.

      Thank you for your valuable feedback and insightful suggestions. In our study, we tried to identify bacteria from murine gut contents and feces using 16S sequencing. However, only E. coli and Staphylococcus sciuri were identified (Figure 1D). Consequently, our experiments were limited to these two bacterial strains. While we have not tested other bacteria, our data suggest that not all bacteria can induce the expression of Chi3l1. Given that E. coli is Gram-negative and Staphylococcus sciuri is Gram-positive, we hypothesized that the difference in their ability to induce Chi3l1 expression might be due to variations between Gram-negative and Gram-positive bacteria, such as the presence of lipopolysaccharides (LPS).

      To test this hypothesis, we used LPS to induce Chi3l1 expression. Consistent with our hypothesis, LPS successfully induced Chi3l1 expression (Figure 1F&G). Additionally, we observed that Chi3l1 expression is significantly upregulated in specific pathogen-free (SPF) mice compared to germ-free mice (Figure 1A), demonstrating that the gut microbiome induces the expression of Chi3l1.

      Although we have not examined cell death or inflammatory responses, the protective role of Chi3l1 shown in Figure 5 suggests that any such responses would be mild and negligible. Regarding the role of peptidoglycan in the induction of Chi3l1 expression in DLD-1 cells, we have not yet explored this aspect. However, we agree with your suggestion that it would be worthwhile to investigate this in future experiments.

      We have also made the suggested modifications to the labeling (Figure 1A) and the clarification in the revised manuscript accordingly (page 3, Line 95-96; Line 102-106).

      Thank you again for your constructive feedback.

      (2) In Figure 2, the binding between Chi3l1 and PGN needs better characterization, regarding the affinity and how it compares with the binding between Chi3l1 and chitin. More importantly, it is unclear how this interaction could facilitate the colonization of gram-positive bacteria.

      Thank you for your insightful suggestions. We have performed the suggested experiments and included the results in the revised manuscript (Figure 2E-G, page 3-4, Line 132-146).

      Our results indicate that Chi3l1 interacts with PGN in a dose-dependent manner (Figure 2E). In contrast, the binding between Chi3l1 and chitin did not exhibit dose dependency (Figure 2E). These findings suggest a specific and distinct binding mechanism for Chi3l1 with PGN compared to chitin.

      We conducted DLD-1 cell-bacteria adhesion experiments, using GlmM mutant (PGN synthesis mutant) and K12 (wild-type) bacteria to test their adhesion capabilities. The results showed that the adhesion ability of the GlmM mutant to cells significantly decreased (Figure 2F). Additionally, after knocking down Chi3l1 in DLD-1 cells, we observed decreased bacterial adhesion (Figure 2G). These findings suggest that the Chi3l1-PGN interaction plays a crucial role in bacterial adhesion.

      (3) In Figure 3, the abundance of Firmicutes and other gram-positive species is lower in the knockout mice. What is the rationale for choosing Lactobacillus in the following transfer experiments?

      We appreciate your thorough review. Among the Gram-positive bacteria that we have sequenced and analyzed, Lactobacillus occupies the largest proportion. Given the significant presence and established benefits of Lactobacillus, we chose it for the subsequent transfer experiments to leverage its known properties and availability, thereby ensuring the robustness and reproducibility of our findings. This is supported by the study referenced below.

      Lamas B, Richard ML, Leducq V, Pham HP, Michel ML, Da Costa G, Bridonneau C, Jegou S, Hoffmann TW, Natividad JM, Brot L, Taleb S, Couturier-Maillard A, Nion-Larmurier I, Merabtene F, Seksik P, Bourrier A, Cosnes J, Ryffel B, Beaugerie L, Launay JM, Langella P, Xavier RJ, Sokol H. CARD9 impacts colitis by altering gut microbiota metabolism of tryptophan into aryl hydrocarbon receptor ligands. Nat Med. 2016 Jun;22(6):598-605. doi: 10.1038/nm.4102. Epub 2016 May 9. PMID: 27158904; PMCID: PMC5087285.

      (4) FDAA-labeled E. faecalis colonization is decreased in the knockouts. Is it specific for E. faecalis, or it is generally true for all gram-positive bacteria? What about the colonization of gram-negative bacteria?

      Thank you for your insightful suggestions. We have investigated the colonization of a gram-negative bacterium, OP50-mCherry (a strain of E. coli that expresses mCherry), and included the results in the updated manuscript (Supplementary Figure 3B, page 5, Line 197-200). We performed rectal injection of both wildtype and Chil1-/- mice with OP50-mCherry, and found that Chil1-/- mice had much higher colonization of E. coli compared to wildtype mice.

      (5) In Figure 5, the fact that FMT did not completely rescue the phenotype may point to the role of host cells in the processes. The reason that lactobacillus transfer did completely rescue the phenotypes could be due to the overwhelming protective role of lactobacillus itself, as the experiments were missing villin-cre mice transferred with lactobacillus.

      Thank you for your valuable feedback and thorough review. In our study, pretreatment with antibiotics in mice to eliminate gut microbiota demonstrated that IEC∆Chil1 mice exhibited a milder colitis phenotype (Supplementary Figure 4). This suggests that Chi3l1-expressing host cells are likely to play a detrimental role in colitis. Consequently, the failure of FMT to completely rescue the phenotype is likely due to the incomplete preservation of bacteria in the feces during the transfer experiment.

      We agree with your assessment of the protective role of Lactobacillus. This also explains the significant difference in colitis phenotype between Villin-cre and IEC∆Chil1 mice (Figure 5B-E), as Lactobacillus levels are significantly lower in IEC∆Chil1 mice (Figure 4F). Given the severity of colitis in Villin-cre mice at 7 days post-DSS, even if Lactobacillus were transferred back to these mice, this would be unlikely to result in a significant improvement.

      (6) Conflicting literature demonstrating the detrimental roles of Chi3l1 in mouse IBD model needs to be acknowledged and discussed.

      Thank you for your insightful suggestions. We have included additional discussions in the revised manuscript (page 6-7, Line 258-274).

      Reviewer #2 (Public Review):

      (1) Images are of great quality but lack proper quantification and statistical analysis. Statements such as "substantial increase of Chi3l1 expression in SPF mice" (Fig.1A), "reduced levels of Firmicutes in the colon lumen of IEC ∆ Chil1" (Fig.3F), "Chil1-/- had much lower colonization of E.faecalis" (Fig.4G), or "deletion of Chi3l1 significantly reduced mucus layer thickness" (Supplemental Figure 3A-B) are subjective. Since many conclusions were based on imaging data, the authors must provide reliable measures for comparison between conditions, as long as possible, such as fluorescence intensity, area, density, etc, as well as plots and statistical analysis.

      Thank you for your insightful suggestions. We have performed the suggested statistical analysis on most of the figures and included the analysis in the revised manuscript (Figure 1A, Figure 3E&F, Supplementary Figure 3B&C). Given the large quantity of dietary fiber intertwined with bacteria, it is challenging to reliably quantify the bacteria in Figure 4G. However, it is easy to distinguish bacteria from dietary fiber under the microscope. We have exclusively analyzed gut sections from six mice in each group, and the results are consistent between the two groups.

      (2) In the fecal/Lactobacillus transplantation experiments, oral gavage of Lactobacillus to IECChil1 mice ameliorated the colitis phenotype, by preventing colon length reduction, weight loss, and colon inflammation. These findings seem to go against the notion that Chi3l1 is necessary for the colonization of Lactobacillus in the intestinal mucosa. The authors could speculate on how Lactobacillus administration is still beneficial in the absence of Chi3l1. Perhaps, additional data showing the localization of the orally administered bacteria in the gut of Chi3l1 deficient mice would clarify whether Lactobacillus are more successfully colonizing other regions of the gut, but not the mucus layer. Alternatively, later time points of 2% DSS challenge, after Lactobacillus transplantation, would suggest whether the gut colonization by Lactobacillus and therefore the milder colitis phenotype, is sustained for longer periods in the absence of Chi3l1.

      Thank you for your thorough review and insightful suggestions. Since we pretreated mice with antibiotics, the intestinal mucus layer was likely damaged, as reported in a previous study (PMID: 37097253). Therefore, gavaged Lactobacillus cannot colonize the mucus layer. Moreover, existing studies have shown that the protective effect of Lactobacillus is mainly derived from its metabolites or bacterial cell components, rather than from the living bacteria themselves (PMID: 36419205, PMID: 27516254).

      Zhan M, Liang X, Chen J, Yang X, Han Y, Zhao C, Xiao J, Cao Y, Xiao H, Song M. Dietary 5-demethylnobiletin prevents antibiotic-associated dysbiosis of gut microbiota and damage to the colonic barrier. Food Funct. 2023 May 11;14(9):4414-4429. doi: 10.1039/d3fo00516j. PMID: 37097253.

      Montgomery TL, Eckstrom K, Lile KH, Caldwell S, Heney ER, Lahue KG, D'Alessandro A, Wargo MJ, Krementsov DN. Lactobacillus reuteri tryptophan metabolism promotes host susceptibility to CNS autoimmunity. Microbiome. 2022 Nov 23;10(1):198. doi: 10.1186/s40168-022-01408-7. PMID: 36419205.

      Piermaría J, Bengoechea C, Abraham AG, Guerrero A. Shear and extensional properties of kefiran. Carbohydr Polym. 2016 Nov 5;152:97-104. doi: 10.1016/j.carbpol.2016.06.067. Epub 2016 Jun 23. PMID: 27516254.

      Reviewer #3 (Public Review):

      The claim that mucus-associated Ch3l1 controls colonization of beneficial Gram-positive species within the mucus is not conclusive. The study should take into account recent discoveries on the nature of mucus in the colon, namely its mobile fecal association and complex structure based on two distinct mucus barrier layers coming from proximal and distal parts of the colon (PMID: ). This impacts the interpretation of how and where Ch3l1 is expressed and gets into the mucus to promote colonization. It also impacts their conclusions because the authors compare fecal vs. tissue mucus, but most of the mucus would be attached to the feces. Of the mucus that was claimed to be isolated from the WT and IEC Ch3l1 KO, this was not biochemically verified. Such verification (e.g. through Western blot) would increase confidence in the data presented. Further, the study relies upon relative microbial profiling, which can mask absolute numbers, making the claim of reduced overall Gram-positive species in mice lacking Ch3l1 unproven. It would be beneficial to show more quantitative approaches (e.g. Quantitative Microbial Profiling, QMP) to provide more definitive conclusions on the impact of Ch3l1 loss on Gram+ microbes.

      You raise an excellent point about the data interpretation, and we appreciate your insightful suggestions. We have included a discussion of the recent discoveries in the revised manuscript (pages 7-8, lines 304-312). According to these discoveries, the mucus in the proximal colon forms a primary encapsulation barrier around fecal material, while the mucus in the distal colon forms a secondary barrier. Our findings indicate that Chi3l1 is expressed throughout the entire colon, including the proximal, middle, and distal sections (see Author response image 1 below; note that the Chi3l1 detection in colon presented in the manuscript is from the middle section). This suggests that Chi3l1 likely promotes bacterial colonization across the entire colon. Although most mucus is expelled with feces, the constant production of mucus and the minimal presence of Chi3l1 in feces (Figure 4C) indicate that Chi3l1 continuously plays a role in promoting the colonization of microbiota.

      Author response image 1.

      Chi3l1 is expressed in the proximal and distal colon. Immunofluorescence staining of proximal and distal colon sections to detect Chi3l1 (red). Nuclei were stained with DAPI (blue). Scale bars, 50 μm.

      To isolate the mucus layer, we followed the method described in the paper titled "The Antibacterial Lectin RegIIIγ Promotes the Spatial Segregation of Microbiota and Host in the Intestine" (PMID: 21998396). Although we did not find a suitable marker representative of the mucus layer for Western blotting, we performed protein mass spectrometry on the isolated mucus layers and compared the data with established research ("Proteomic Analyses of the Two Mucus Layers of the Colon Barrier Reveal That Their Main Component, the Muc2 Mucin, Is Strongly Bound to the Fcgbp Protein," PMID: 19432394). Our data showed a high degree of overlap with the proteins identified in established studies (see Author response image 2 below).

      Author response image 2.

      Comparison of mucus layer proteins identified by mass spectrometry by our team and by the Hansson team. Mucus layer proteins identified by mass spectrometry by our team and by the Hansson team (PMID: 19432394) are compared.

      Due to a lack of expertise, it has been challenging for us to perform reliable QMP experiments. However, since QMP involves qPCR combined with bacterial sequencing, we conducted 16S rRNA sequencing and confirmed the quantity of certain bacteria by qPCR (revised manuscript, Figure 3B, H, Figure 4E, F, Supplementary Figure 3A). Therefore, our data is reliable to some extent.

      Other weaknesses lie in the execution of the aims, leaving many claims incompletely substantiated. For example, much of the imaging data is challenging for the reader to interpret due to it being unfocused, too low of magnification, not including the correct control, and not comparing the same regions of tissues among different in vivo study groups. Statistical rigor could be better demonstrated, particularly when making claims based on imaging data. These are often presented as single images without any statistics (i.e. analysis of multiple images and biological replicates). These images include the LTA signal differences, FISH images, Enterococcus colonization, and mucus thickness.

      Thank you for your thorough review and insightful suggestions. We have performed the recommended statistical analysis on most of the figures and included the analysis in the revised manuscript (Figure 1A, Figure 3E&F, Supplementary Figure 3B&C). We have also added arrows in Figure 2B to make the figure easier to understand. Additionally, we repeated some key experiments to show the same regions of tissues among different groups. We will upload higher resolution figures during the revision. Thank you again for your constructive feedback.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It is recommended to change WT to SPF in the figure and text, as no genetic manipulation was involved in Figure 1.

      Thank you for your insightful suggestion. We have made the suggested modifications to the labeling (revised manuscript, Figure 1A).

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is well-written, but it would benefit from a critical reading to correct some typos and small grammar issues. Histological and IF images would be more informative if they contained arrows and labels guiding the reader's attention to what the authors want to show. More details about the structures shown in the figures should be included in the legends.

      Thank you for your thorough review and insightful suggestions. We have revised the manuscript to correct noticeable typos and grammar issues. Arrows have been added to Figure 2A&B to make the figures easier to understand. Additionally, we have included a detailed description of the structural similarities and differences between chitin and peptidoglycan in the figure legend (revised manuscript, page 19, lines 730-733).

      Minor points:

      • Page 1, line 36: Please correct "mice models" to "mouse models".

      Thank you for your insightful suggestion. We have made the suggested correction in the revised manuscript (page 1, line 41).

      • Page 3, line 110: "by comparing the structure of chitin with that of peptidoglycan (PGN), a component of bacterial cells walls, we observed that they have similar structures (Fig.2A)". Although both structures are shown side-by-side, no similarities are mentioned or highlighted in the text, figure, or legend.

      Thank you for your insightful suggestion. We have included a detailed description of the structural similarities and differences between chitin and peptidoglycan in the figure legend (revised manuscript, page 19, lines 730-733).

      • Fig.5C and Fig.5G: y axis brings "weight (%)". I believe the authors mean "weight change (%)"?

      We agree with your suggestion and have corrected the labeling accordingly (revised manuscript, Figure 5C and G).

      • Page 8: Genotyping method is described as a protocol. Please modify it.

      Thank you for your constructive suggestion. We have modified the genotyping method in the revised manuscript (page 8, lines 339-349).

      • Please expand on the term "scaffold model" used in the abstract and discussion.

      Thank you for your thorough review. In this model, Chi3l1 acts as a key component of the scaffold. By binding to bacterial cell wall components like peptidoglycan, Chi3l1 helps anchor and organize bacteria within the mucus layer. This interaction facilitates the colonization of beneficial bacteria such as Lactobacillus, which are important for gut health. We have included a fuller description of the scaffold model in the revised manuscript (page 6, lines 248-250).

      • Discussion session often recapitulates results description, which makes the text repetitive.

      Thank you for your constructive suggestion. We have removed unnecessary descriptions of results from the Discussion section of the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      Major comments

      (1) Figure 1A. The staining is very faint, and hard to see. The reader cannot be certain those are Ch311-positive cells. Higher Mag is needed.

      Thank you for your insightful suggestion. We have included higher-resolution images in Figure 1A of the revised manuscript.

      (2) The mucus is produced largely by the proximal colon, is adherent to the feces, and mobile with the feces (PMID: 33093110). Therefore it is important to determine where the Ch311 is being expressed to be released into the lumen. Further Ch3l1 expression studies are needed to be done in both proximal and distal colon.

      Thank you for your thorough review and insightful suggestions. We have addressed this part in our public review. Additionally, we agree with your suggestions and will conduct further studies on Chi3l1 expression in both the proximal and distal colon.

      (3) Figure 1B. The image is out of focus for the Ileum, and the DAPI signal needs to be brought up for the colon. Which part of the colon is this? The UEA1+ cells do not really look like goblet cells. A better image with clearer goblet cells is needed.

      Thank you for your constructive suggestions. In the revised manuscript, we have included higher-resolution images (Figure 1B). The middle colon (approximately 3 to 4 cm distal from the cecum) was harvested for staining. In addition to UEA-1, we used an anti-MUC2 antibody to label goblet cells in this colon segment (see Author response image 3 below). The patterns of goblet cells identified by UEA-1 staining and by the anti-MUC2 antibody are similar. The UEA-1-positive cells shown in Figure 1B are therefore presumed to be goblet cells.

      Author response image 3.

      Goblet cell distribution in the middle colon. Goblet cells in the middle segment of the colon (approximately 3 to 4 cm distal from the cecum) were detected by fluorescence staining with UEA-1 (green) and an anti-MUC2 antibody (red). Scale bar, 50 μm. Representative images are shown from three mice individually stained for each marker.

      (4) Figure 1G. There needs to be some counterstain or contrast imaging to show evidence that cells are present in the untreated sample.

      Thank you for your insightful suggestions. Using an overexposed image, we have annotated the cells present in the untreated sample in the revised manuscript (Figure 1G).

      (5) Figure 3B. Is this absolute quantification? How were the data normalized to allow comparison of microbial loads?

      Thank you for your thorough review. Figure 3B presents absolute quantification data based on the methodology described in the paper titled "The Antibacterial Lectin RegIIIγ Promotes the Spatial Segregation of Microbiota and Host in the Intestine" (PMID: 21998396). Briefly, we amplified a short segment (179 bp) of the 16S rRNA gene using conserved 16S rRNA-specific primers and OP50 (a strain of E. coli) as the template. After gel extraction and concentration measurement, the PCR products were diluted to a gradient of concentrations (0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48 pg/µl). These dilutions were used as templates for qPCR to generate a standard curve relating Ct values to bacterial concentration. The standard curve was then used to calculate the bacterial concentration in the samples. The data presented in Figure 3B represent the weight of bacteria per milligram of sample, calculated as (bacterial concentration × bacterial volume) / (weight of feces or gut content).
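
      For illustration only, the standard-curve calculation described above can be sketched in Python as follows. This is not the authors' analysis code. The dilution series is taken from the response, but the Ct values are synthetic, generated from assumed coefficients (ASSUMED_SLOPE, ASSUMED_INTERCEPT), and all function names, the sample volume, and the sample weight are hypothetical.

```python
import math

# Dilution series stated in the response (pg/µl).
STANDARDS_PG_PER_UL = [0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24, 20.48]

# Assumed, illustrative curve coefficients used only to synthesize Ct values.
ASSUMED_SLOPE = -1.5
ASSUMED_INTERCEPT = 14.0


def fit_standard_curve(concentrations, ct_values):
    """Least-squares fit of Ct = slope * ln(concentration) + intercept."""
    xs = [math.log(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def ct_to_concentration(ct, slope, intercept):
    """Invert the standard curve: concentration (pg/µl) for a sample Ct."""
    return math.exp((ct - intercept) / slope)


def bacterial_load_per_mg(ct, slope, intercept, volume_ul, sample_mg):
    """(bacterial concentration x volume) / (weight of feces or gut content)."""
    return ct_to_concentration(ct, slope, intercept) * volume_ul / sample_mg


# Synthetic Ct values for the standards (hypothetical, not measured data).
example_cts = [
    ASSUMED_SLOPE * math.log(c) + ASSUMED_INTERCEPT for c in STANDARDS_PG_PER_UL
]
slope, intercept = fit_standard_curve(STANDARDS_PG_PER_UL, example_cts)

# Hypothetical sample: Ct of 12.0, 200 µl of extract, 50 mg of feces.
load = bacterial_load_per_mg(12.0, slope, intercept, volume_ul=200.0, sample_mg=50.0)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, load={load:.3f} pg/mg")
```

      Fitting Ct against the natural logarithm of concentration keeps the sketch in the same form as a log-linear qPCR standard curve; a base-10 logarithm would work equally well provided the inversion uses the same base.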

      (6) Figure 3D. The major case is made for a dramatic reduction in Gram+ species, but Figure 1D does not show a dramatic change. Is this difference significant?

      Thank you for your thorough review. We are not sure that we fully understand your question. However, there was no significant difference in Figure 3D. The conclusion that Gram+ species are dramatically reduced is based on the LTA staining, the Firmicutes FISH, the comparison of individual species between WT and KO mice, and the bacterial qPCR results taken together (Figure 3E-H).

      (7) Figures 3E and 3F. These stainings are alone not convincing of reduced Gram+ in the KOs. Some stats are required for these images. An independent complementary method is also needed to quantify these with statistics since this data is so central to the study's conclusions.

      Thank you for your constructive suggestions. We have included statistical analysis in the revised manuscript (Figure 3E and F). Given the large quantity of dietary fiber intertwined with bacteria, it is challenging to make a reliable quantification of bacteria in Figure 3E. However, it is easy to distinguish bacteria from dietary fiber under the microscope. We have exclusively analyzed gut sections from six mice in each group, and the results are consistent with the Firmicutes FISH results. A complementary method, bacterial qPCR, has been employed to quantify these (Figure 4E, F). Due to a lack of expertise, it has been challenging for us to perform reliable QMP experiments.

      (8) Figure 3G. To make quantitative conclusions, the authors need to do quantitative microbial profiling (QMP) of the microbiota. Relative abundance masks absolute numbers, which could be increased. There are qPCR-based QMP platforms the authors could use (PMIDs: 31940382, 33763385).

      Thank you for your constructive suggestions. Due to a lack of expertise, it has been challenging for us to perform reliable QMP experiments. However, since QMP involves qPCR combined with bacterial sequencing, we conducted 16S rRNA sequencing and confirmed the quantity of certain bacteria by qPCR (revised manuscript, Figure 3B, H, Figure 4E, F, Supplementary Figure 3A). In addition to the original bacterial qPCR data presented in the manuscript, we included another bacterial genus, Turicibacter. Consistent with the 16S rRNA sequencing analysis, the qPCR results showed that Turicibacter was more abundant in IECΔChil1 mice than in Villin-cre mice (revised manuscript, Supplementary Figure 3A, page 4, lines 171-173). Therefore, our data are reliable to some extent.

      (9) Figure 4B. The data nicely shows Ch3l1 in mucus. However, no data supports the authors' main claim Ch3h1 binds Gram-positive bacteria in situ. Dual staining of Ch3l1 with Firmicutes probe would be supportive to show this interaction is happening in vivo.

      You raise an excellent point, and we agree with your suggestion that we should confirm Chi3l1 binding to Gram-positive bacteria in situ. During the study, we attempted dual staining of Chi3l1 with a universal bacterial 16S FISH probe several times, but we were unsuccessful. Despite various optimizations of the protocol, we were only able to detect bacteria, not Chi3l1. It appears that the antibody is not suitable for this method.

      (10) Figures 4D - F. Because mucus is associated with feces (PMID: ), the data with feces likely contains both Muc2/mucus and Feces. Therefore, it is unclear what the "mucus" is referring to in these figures. To support the authors' conclusions, there needs to be some validation that mucus was purified in the assays. This must be confirmed at a minimum by PAS staining on SDS PAGE gel (should be very high molecular weight) or Western blot with UEA lectin.

      Thank you for your insightful suggestions. As mentioned in the public review, the mucus layer was isolated following the protocol described in the paper titled "The Antibacterial Lectin RegIIIγ Promotes the Spatial Segregation of Microbiota and Host in the Intestine" (PMID: 21998396). Briefly, after harvesting the middle colon from the mice, we cut open the colon longitudinally. After removing the gut contents, the lumen was vigorously rinsed in PBS while holding one end with forceps. The pellet obtained after centrifuging the rinsate was used as our mucus sample. Fresh feces were collected immediately after the mice defecated in a new, empty cage. We performed Western blot analysis to detect UEA lectin but were unsuccessful.

      However, as noted in the public review, we conducted protein mass spectrometry on the isolated mucus layers and analyzed the data by comparing it with established research ("Proteomic Analyses of the Two Mucus Layers of the Colon Barrier Reveal That Their Main Component, the Muc2 Mucin, Is Strongly Bound to the Fcgbp Protein," PMID: 19432394). Our data showed a high degree of overlap with the proteins identified in these established studies.

      (11) Figure 4E/F: The units of measurement are in pg/cm2, implying picogram per area. Can the authors please explain what this unit is referring to?

      We are grateful for your thorough review. The unit pg/cm² represents picograms per square centimeter. Figures 4E and 4F present absolute quantification data based on the methodology described in the paper titled "The Antibacterial Lectin RegIIIγ Promotes the Spatial Segregation of Microbiota and Host in the Intestine" (PMID: 21998396). Briefly, we harvested a 3 × 0.5 cm section of colon and a 9 × 0.4 cm section of ileum. We then collected the mucus layer as previously described (response to question 10). We measured bacterial concentration as described in the response to question 5 using the equation y = -1.53ln(x) + 13.581, where x represents the bacterial concentration and y represents the Ct value. After obtaining the bacterial concentration, we multiplied it by the volume of the rinsate and divided it by the area of the tissue section to obtain the pg/cm² values used in the figures.
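
      As a minimal sketch of the arithmetic described above, and not the authors' own script, the conversion from a Ct value to pg/cm² could be written in Python as shown below. The curve coefficients and tissue dimensions come from the response; the function names, the assumption that x is in pg/µl (as in the dilution series of response 5), and the rinsate volume in the example are hypothetical.

```python
import math

# Standard curve stated in the response: Ct = -1.53 * ln(x) + 13.581.
SLOPE = -1.53
INTERCEPT = 13.581

# Tissue sections stated in the response (length x width, in cm).
COLON_SECTION_CM = (3.0, 0.5)
ILEUM_SECTION_CM = (9.0, 0.4)


def concentration_from_ct(ct):
    """Solve Ct = SLOPE * ln(x) + INTERCEPT for x (assumed to be pg/µl)."""
    return math.exp((ct - INTERCEPT) / SLOPE)


def load_per_area(ct, rinsate_volume_ul, section_cm):
    """Bacterial load in pg/cm²: concentration x rinsate volume / tissue area."""
    length_cm, width_cm = section_cm
    area_cm2 = length_cm * width_cm
    return concentration_from_ct(ct) * rinsate_volume_ul / area_cm2


# Hypothetical example: a colon mucus sample with Ct = 12.0 in 150 µl of rinsate.
colon_load = load_per_area(12.0, rinsate_volume_ul=150.0, section_cm=COLON_SECTION_CM)
print(f"colon mucus load: {colon_load:.2f} pg/cm^2")
```

      Because the slope is negative, a lower Ct corresponds to a higher bacterial concentration, and normalizing by the section area (1.5 cm² for colon, 3.6 cm² for ileum) is what makes the two tissues comparable.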

      (12) Figure 5E. Normal tissues appear to be from different colon regions from colitis tissues: the "Normal" looks like the proximal colon, while "Colitis" looks like the Distal colon. They cannot be directly compared.

      Thank you for your insightful suggestion. We have now included the updated image in the revised manuscript as Figure 5E to compare the same region of the colons.

      (13) Similarly, in Figure 5I it appears different colon regions are being compared between groups: Proximal colon in the bottom panels, and distal in the top panels. Since the proximal colon is less damaged by DSS, this data could be misleading.

      Thank you for your insightful suggestion. We have now included the updated image in the revised manuscript as Figure 5I to compare the same region of the colons.

      (14) In the DSS studies, are the VillinCre and IEC Chit3l1 mice co-housed littermates?

      Thank you for your insightful suggestion. In the DSS studies, the Villin-Cre and IECΔChil1 mice are not co-housed littermates. However, they are derived from the same lineage and are housed in the same rack within the same room of the animal facility.

      (15) Supplementary Figure 3: Mucus thickness images; are they representative? Stats are needed on multiple mice to support the claim that the mucus is thinner.

      Thank you for your insightful suggestion. The images are representative of four mice per group. We have now included the statistical analysis in the revised manuscript (Supplementary Figure 3C&D).

      Minor

      (1) Introduction: Reference to "mucosal layer": "Mucosal" and "Mucus" are different things. "Mucosal" refers to the epithelium, lamina propria, and muscularis mucosa. "Mucus" refers to the secreted mucus gel, the focus of the authors' study. Therefore, the statement "mucosal layer" is not proper. "Mucosal layer" should be changed to "mucus layer."

      Thank you for your constructive suggestions; we have learned a lot from them. We have replaced “mucosal layer” with “mucus layer” in the revised manuscript.

      (2) Line 366 and related lines: Feces cannot be "dissolved". "Resuspended" is a better term.

      Thank you for your constructive suggestion. We have changed “dissolved” to “resuspended” in the revised manuscript.

      (3) Lines 36-37 and 43-44 are redundant to each other.

      Thank you for your constructive suggestion. We have removed lines 36-37 in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1:

      Summary:

      The authors study age-related changes in the excitability and firing properties of sympathetic neurons, which they ascribe to age-related changes in the expression of KCNQ (Kv7, "M-type") K+ currents in rodent sympathetic neurons, whose regulation by GPCRs has been most thoroughly studied for over 40 years.

      Strengths:

      The strengths include the rigor of the current-clamp and voltage-clamp experiments and the lovely, crisp presentation of the data. The separation of neurons into tonic, phasic and adapting classes is also interesting, and informative. The ability to successfully isolate and dissociate peripheral ganglia from such older animals is also quite rare and commendable! There is much useful detail here.

      Thank you for recognizing the effort we put into presenting the data and analyzing the neuronal populations. I also believe the ability to isolate neurons from old animals is worth communicating to the scientific community.

      Weaknesses:

      Where the manuscript becomes less compelling is in the rapamycin section, which does not provide much in the way of mechanistic insights. As such, the effect is more of an epi-phenomenon of unclear insight, and the authors cannot ascribe a signaling mechanism to it that is supported by data. Thus, this latter part rather undermines the overall impact and central advance of the manuscript. The problem is exacerbated by the controversial and anecdotal nature of the entire mTor/aging field, some of whose findings have very unfortunately had to be recently retracted.

      I would strongly recommend to the authors that they end the manuscript with their analysis of the role of M current/KCNQ channels in the numerous age-related changes in sympathetic neuron function that they elegantly report, and save the rapamycin, and possible mTor action, for a separate line of inquiry that the authors could develop in a more thorough and scholarly way.

      Whereas the description of the data are very nice and useful, the manuscript does not provide much in the way of mechanistic insights. As such, the effect is more of an epi-phenomenon of unclear insight, and the authors cannot ascribe changes in signaling mechanisms, such as that of M1 mAChRs to the phenomena that is supported by data.

      I appreciate the new comment. We had agreed that our rapamycin experiments did not allow us to ascribe the mechanism to the mTOR signaling pathway. The new comment mentions M1 mAChR signaling as another potential signaling mechanism. Our work centered on determining whether aging altered the function of sympathetic motor neurons and on defining the mechanism. We presented evidence showing that the mechanism is a reduction of the M-current. We did not attempt to identify the signaling mechanism linking aging to a reduction in M-current. Therefore, we agree with the reviewer that we do not provide further details on the mechanism and that this remains an open question. However, I find it harsh to say that “the effect is more of an epiphenomenon of unclear insight”. How could we possibly test that the effect of aging on the excitability of these neurons only arises as a secondary effect or that it is not causal? How could we test for sufficiency and necessity of aging? How could we modify the state of aging to test for causality? We would have to reverse aging and show that the effect on the excitability is gone. And that is exactly what we tried to do with the rapamycin experiment.

      Reviewer #1 (Recommendations For The Authors):

      (1) The significance values greater than p < 0.05 do not add anything and distract focus from the results that are meaningful. Fig. 5 is a good example. What does p = 0.7 mean? Or p = 0.6? Does this help the reader with useful information?

      I thank Reviewer 1 for raising this question. We have tried different ways of reporting p values, as we want to ensure rigor and transparency in reporting data. As corresponding author, I favor reporting p values for all statistical comparisons. To help the reader identify what we considered statistically significant, we color coded the p values, with red for p-value<0.05 and black for p-value>0.05. As a reader, seeing a p-value=0.7 allows me to know that the authors performed an analysis comparing these conditions and found the means not to be different. Not presenting the p-value makes me wonder whether the authors even analyzed those groups. In other words, I value being able to assess the data with all p-values shown more than I value not being distracted by non-significant p-values. This is just my preference.

      (2) Fig. 1 is not informative and should be removed.

      I thank Reviewer 1 for the suggestion. In previous drafts of the manuscript, this figure was included only as a panel. However, we decided it was better to guide the reader into the scope of our work. This is part of our scientific style and, therefore, we prefer to keep the figure.

      (3) The emphasis on a particular muscarinic agonist favored by many ion channel physiologists, oxotremorine, is not meaningful (lines 192, 198). The important point is stimulation of muscarinic AChRs, which physiologically are stimulated by acetylcholine. The particular muscarinic agonist used is unimportant. Unless mandated by eLife, "cholinergic type 1 muscarinic receptors" are usually referred to as M1 mAChRs, or even better is "Gq-coupled M1 mAChRs." I don't think that Kruse and Whitten, 2021 were the first to demonstrate the increase in excitability of sympathetic neurons from stimulation of M1 mAChRs. Please try and cite in a more scholarly fashion.

      A) I have modified lines 192 and 198, removing the mention of oxotremorine.

      B) I have modified the nomenclature used to refer to cholinergic type 1 muscarinic receptors.

      C) I cited references on the role of M current in sympathetic motor neuron excitability. I also removed the reference (Kruse and Whitten, 2021) that refers only to the temporal correlation between the decrease of KCNQ current and excitability.

      (4) The authors may want to use the term "M current" (after defining it) as the current produced by KCNQ2&3-containing channels in sympathetic neurons, and reserve "KCNQ" or "Kv7" currents as those made by cloned KCNQ/Kv7 channels in heterologous systems. A reason for this is to exclude currents from KCNQ1-containing channels, which most definitely do not contribute to the "KCNQ" current in these cells. I am not mandating this, but rather suggesting it to conform with the literature.

      Thank you for the suggestion. I have modified the text to use the term M current. I maintain the use of KCNQ only when referring to the KCNQ channel itself, such as in the section describing the abundance of KCNQ2.

      (5) The section in the text on "Aging reduces KCNQ current" is confusing. Can the authors describe their results and their interpretation more directly?

      I am not sure I understand the request. I assumed that points 5 and 6 are related and decided to answer point 6.

      (6) Please explain the meaning of the increase in KCNQ2 abundance with age in Fig. 6G. How is this increase in KCNQ2 expression consistent with an increase in excitability? The explanation of "The decrease in KCNQ current and the increase in the abundance of KCNQ2 protein suggest a potential compensatory mechanism that occurs during aging, which we are actively investigating in an independent study." is rather odd, considering that the entire thesis of this paper is that changes in excitability and firing properties are underlied by changes in KCNQ2/3 channel expression/density. Suddenly, is this not the case?? What about KCNQ3? It would be very enlightening if the authors would just quantify the ratio of KCNQ2:KCNQ3 subunits in M-type channels in young and old mice using simple TEA dose/response curves (see Shapiro et al., JNS, 2000; Selyanko et al., J. Physiol., Hadley et al., Br. J. Pharm., 2001 and a great many more). It is also surprising that the authors did not assess or probe for differences in mAChR-induced suppression of M current between SCG neurons of young and old mice. This would seem to be a fundamental experiment in this line of inquiry.

      A. Please explain the meaning of the increase in KCNQ2 abundance with age in Fig. 6G. How is this increase in KCNQ2 expression consistent with an increase in excitability? The explanation of "The decrease in KCNQ current and the increase in the abundance of KCNQ2 protein suggest a potential compensatory mechanism that occurs during aging, which we are actively investigating in an independent study." is rather odd, considering that the entire thesis of this paper is that changes in excitability and firing properties are underlied by changes in KCNQ2/3 channel expression/density. Suddenly, is this not the case?? Our interpretation is that the decrease in M current is not caused by a decrease in the abundance of KCNQ2 channels. We do not claim that changes in excitability are underlain by a reduction in the expression or density of KCNQ2 channels. On the contrary, our working hypothesis is that the reduction in M current is caused by changes in trafficking, degradation, posttranslational modifications, or cofactors of KCNQ2 or KCNQ3 channels. We have modified the description in the Results section to clarify this concept.

      B. What about KCNQ3? Unfortunately, we did not find an antibody to detect KCNQ3 channels. I have added a sentence to state this.

      C. KCNQ2:KCNQ3 subunits in M-type channels in young and old mice using simple TEA dose/response curves. This is a great idea. Thank you for the suggestion. Is this a necessary experiment for the acceptance of this manuscript?

      D. It is also surprising that the authors did not assess or probe for differences in mAChR-induced suppression of M current between SCG neurons of young and old mice. This would seem to be a fundamental experiment in this line of inquiry. Reviewer 1 is correct. We did not assess for differences in the suppression of M current by mAChR activation. We do not see the connection of this experiment with the scope of the current investigation.

      (7) Why do the authors use linopirdine instead of XE-991? Both are dirty drugs hardly specific to KCNQ channels at 25 uM concentrations, but linopirdine less so. The Methods section lists the source of XE991 used in the study, not linopirdine. Is there an error?

      A. Why do the authors use linopirdine instead of XE-991? After validating KCNQ2/3 inhibition by linopirdine, we found its effect on membrane potential recordings to be reproducible. Linopirdine has also been reported to be reversible, and we wanted to assess the reversibility of its effect on the excitability of young neurons. We did not find the effect to be reversible. We also performed experiments applying XE-991 while recording the membrane potential. XE-991 did not show a clear effect. I was not surprised by this. It is very likely that the pharmacological inhibition of one channel leads to the activation of other channel types. This is highlighted in the work by Kimm, Khaliq, and Bean, 2015. “Further experiments revealed that inhibiting either BK or Kv2 alone leads to recruitment of additional current through the other channel type during the action potential as a consequence of changes in spike shape.” In fact, it was quite remarkable that the aged and young phenotypes were mimicked by targeting KCNQ pharmacologically.

      B. Both are dirty drugs hardly specific to KCNQ channels at 25 uM concentrations, but linopirdine less so. I have added a sentence to point out that linopirdine is less potent than XE-991. It reads: “We want to point out that linopirdine is less potent than XE-991 and that it has been reported to activate TRPV1 channels (Neacsu and Babes, 2010). Despite this limitation, the application of linopirdine to young sympathetic motor neurons led to depolarization and firing of action potentials.”

      C. The Methods section lists the source of XE991 used in the study, not linopirdine. Is there an error? Thank you for pointing this out. I have added the source information for both retigabine and linopirdine to the Methods section; both were missing.

      (8) Can the authors use a more scientific explanation of RTG action than "activating KCNQ channels?" For instance, RTG induces both a negative shift in the voltage dependence of activation and a voltage-independent increase in the open probability, both of which differ in detail between KCNQ2 and KCNQ3 subunits. The authors are free to use these exact words. Thus, the degree of "activation" is very dependent upon voltage at any voltages negative to the saturating voltages for channel activation.

      I have modified the text to reflect your suggestion.

      (9) Methods: did the authors really use "poly-l-lysine-coated coverslips?" Almost all investigators use poly-D-lysine as a coating for mammalian tissue-culture cells and more substantial coatings such as poly-D-lysine + laminin or rat-tail collagen for peripheral neurons, to allow firm attachment to the coverslip.

      That is correct. We used poly-L-lysine-coated coverslips. Sympathetic motor neurons do not adhere to poly-D-Lysine.

      (10) As a suggestion, sampling M-type/KCNQ/Kv7 current at 2 kHz is not advised, as this is far faster than the gating kinetics of the channels. Were the signals filtered?

      That is correct. Currents were sampled at 2 kHz. Data were low-pass filtered at 3 kHz. Our conditions are not far from what is reported by others. Some sample at 10 kHz and even 50 kHz. Others do not report the sampling frequency.

      Reviewer #2:

      Weaknesses:

      None, the revised version of the manuscript has addressed all my concerns.

      I am glad we were able to satisfy previous concerns.

      Reviewer #3:

      The main weakness is that this study is a descriptive tabulation of changes in the electrophysiology of neurons in culture, and the effects shown are correlative rather than establishing causality.

      Allow me to clarify our previous responses and determine how they align with your concerns. In the previous revision, Reviewer 3 wrote: “It is difficult to know from the data presented whether the changes in KCNQ channels are in fact directly responsible for the observed changes in membrane excitability.” The reviewer also suggested the “use of blockers and activators to provide greater relevance.” I assumed these comments were the main concern and that doing such experiments was enough to satisfy the criticism. It is discouraging to see that our experiments did not resolve the reviewer's concern that the findings are correlative.

      If Reviewer 3 is referring to establishing causality between aging and a reduction in M current, I would like to emphasize that such an endeavor is complicated, as there is no clear experiment that would resolve the issue. Our best attempt was to reverse aging with rapamycin, but the recommendation was to remove those experiments.

      … but the specifics of the effects and relevance to intact preparations are unclear. Additional experiments in slice cultures would provide greater significance on the potential relevance of the findings for intact preparations.

      I apologize for missing this point in the previous revision. The proposed experiments would require an upright microscope coupled to an electrophysiology rig. Unfortunately, I do not have the equipment to do these experiments.

      Summary of recommendations from the three reviewers:

      Please make corrections as suggested by reviewer 1 to improve the manuscript. Specifically, reviewer 1 suggests making changes to p values in Figure 5,

      It is not clear what the suggested changes are. The comment from Reviewer 1 says: The significance values greater than p < 0.05 do not add anything and distract focus from the results that are meaningful. If the suggested change is to remove p values > 0.05, I have explained my rationale for keeping those values. If the Journal has a specific format for reporting p-values, I will be happy to make the appropriate changes.

      and the importance of citing original scholarly works related to effects of increase in excitability of sympathetic neurons by M1 receptors, and the terminology for M currents and KCNQ currents. These changes will improve the manuscript and are strongly recommended.

      I cited original papers in that area and changed the terminology for M current. I kept KCNQ when referring to the channel protein or its abundance.

      The section dealing with Aging Reduces KCNQ currents seems to contain a lot of extraneous information especially in the last part of the long paragraph and this section should be rewritten for improved clarity… and - the implications or lack thereof - of the correlation of KCNQ with AP firing rates.

      A. I removed extraneous information in that section. It now reads: Previous work by our group and others demonstrated that cholinergic stimulation leads to a decrease in M current and increases the excitability of sympathetic motor neurons at young ages \cite{RN67,RN68,RN69,RN71, RN72, RN73, RN74, RN75}. The molecular determinants of the M current are channels formed by KCNQ2 and KCNQ3 in these neurons \cite{RN76, RN77, RN70}. Thus, Figure 6A shows a voltage response (measured in current-clamp mode) and a consecutive M current recording (measured in voltage-clamp mode) in the same neuron upon stimulation of cholinergic type 1 muscarinic receptors. It illustrates the temporal correlation between the decrease of M current with the increase in excitability and firing of APs upon activation with oxotremorine. This strong dependence led us to hypothesize that aging decreases M current, leading to a depolarized RMP and hyperexcitability (Figure 6B). For these experiments, we measured the RMP and evoked activity using perforated patch, followed by the amplitude of M current using a whole-cell voltage clamp in the same cell. We also measured the membrane capacitance as a proxy for cell size. Interestingly, M current density was smaller by 29% in middle age (7.5 ± 0.7 pA/pF) and by 55% in old (4.8 ± 0.7 pA/pF) compared to young (10.6 ± 1.5 pA/pF) neurons (Figure 6C-D). The average capacitance was similar in young (30.8 ± 2.2 pF), middle-aged (27.4 ± 1.2 pF), and old (28.8 ± 2.3 pF) neurons (Figure 6E), suggesting that aging is not associated with changes in cell size of sympathetic motor neurons, and supporting the hypothesis that aging alters the levels of M current. Next, we tested the effect on the abundance of the channels mediating M current. Contrary to our expectation, we observed that KCNQ2 protein levels were 1.5 ± 0.1-fold higher in old compared to young neurons (Figure 6F-G). Unfortunately, we did not find an antibody to consistently detect KCNQ3 channels. We concluded that the decrease in M current is not caused by a decrease in the abundance of KCNQ2 protein.
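
      The percentage reductions quoted in the revised text follow from the reported mean current densities. The snippet below is a minimal arithmetic check, not the analysis code used for the manuscript; the function names `current_density` and `percent_reduction` are hypothetical and introduced only for illustration. It assumes current density is the M current amplitude divided by membrane capacitance (pA/pF), as the units in the text indicate.

```python
def current_density(amplitude_pa, capacitance_pf):
    """Current density in pA/pF: amplitude normalized to cell capacitance."""
    return amplitude_pa / capacitance_pf


def percent_reduction(reference, value):
    """Percentage by which `value` is smaller than `reference`."""
    return 100.0 * (reference - value) / reference


# Mean M current densities reported in the text (pA/pF).
young, middle_aged, old = 10.6, 7.5, 4.8

middle_reduction = percent_reduction(young, middle_aged)
old_reduction = percent_reduction(young, old)

print(f"middle-aged vs young: {middle_reduction:.1f}% smaller")
print(f"old vs young: {old_reduction:.1f}% smaller")
```

      Rounded to whole numbers, these values match the 29% and 55% stated in the text. Because capacitance was similar across age groups, normalizing by capacitance should not by itself account for the difference.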

      B. and - the implications or lack thereof - of the correlation of KCNQ with AP firing rates. I am not sure I understand the request regarding the correlation of KCNQ with AP firing rate. I divided the long paragraph.

      The apparent lack of correlation between KCNQ current and KCNQ2 protein needs to be better explained. This is a central part of the study and this result undercuts the premise of the paper.

      Indeed, total KCNQ2 protein abundance increases while M current decreases. We do not claim in our work that changes in excitability are caused by a reduction in the expression or density of KCNQ2 channels. On the contrary, our current working hypothesis is that the reduction in M current is caused by changes in trafficking, degradation, posttranslational modifications, or cofactors of KCNQ2 or KCNQ3 channels. I have modified the description in the Results section and the Discussion to clarify this concept.

      Additionally, the poor specificity of linopirdine for KCNQ should be pointed out in the limitations.

      I have pointed out this limitation. It reads: We want to point out that linopirdine is less potent than XE-991 and that it has been reported to activate TRPV1 channels (Neacsu and Babes, 2010). Despite this limitation, the application of linopirdine to young sympathetic motor neurons led to depolarization and firing of action potentials.

      Finally, the editor notes that the author response should not contain ambiguities in what was addressed in the revision. In the original summary of consolidated revisions that were requested, one clearly and separately stated point (point 4) was that experiments in slice cultures should be strongly considered to extend the significance of the work to an intact brain preparation. The author response letter seems to imply that this was done, but this is not the case. The author response seems to have combined this point with another separate point (point 3) about using KCNQ drugs, and imply that all concerns were addressed. Authors should be clear about what revisions were in fact addressed.

      As corresponding author, and the person directly responsible for the document provided in reply to the reviewers, I apologize for my mistake. After reviewing this comment, I realized I did not respond to the Major points in the Recommendations for the authors section from Reviewer 3. I missed that entire section. My previous responses addressed the Public review of Reviewer 3. When doing so, I did not separate the sentences, and so omitted the request to perform the experiment in slices.


      The following is the authors’ response to the original reviews.

      Reviewer #1

      Summary:

      The authors study age-related changes in the excitability and firing properties of sympathetic neurons, which they ascribe to age-related changes in the expression of KCNQ (Kv7, "M-type") K+ currents in rodent sympathetic neurons, whose regulation by GPCRs has been most thoroughly studied for over 40 years. The authors suggest the ingestion of rapamycin may partially reverse the age-related decrease in M-channel expression. With the rapamycin part included, it is unclear how this work will impact the field of age-related neuronal dysfunction, as the mechanistic information is not strong.

      Strengths:

      The strengths include the rigor of the current-clamp and voltage-clamp experiments, the lovely, crisp presentation of the data, and the expert statistics. The separation of neurons into tonic, phasic, and adapting classes is also interesting, and informative. The writing is also elegant, and crisp. The above is especially true of the manuscript up until the part dealing with the effects of rapamycin, which becomes less compelling.

      We appreciate the thoughtful comments and constructive feedback to improve the impact of the manuscript.

      Weaknesses:

      Where the manuscript becomes less compelling is in the rapamycin section, which does not provide much in the way of mechanistic insights. As such, the effect is more of an epi-phenomenon of unclear insight, and the authors cannot ascribe a signaling mechanism to it that is supported by data. Thus, this latter part rather undermines the overall impact and central advance of the manuscript. The problem is exacerbated by the controversial and anecdotal nature of the entire mTor/aging field, some of whose findings have very unfortunately had to be recently retracted.

      I would strongly recommend to the authors that they end the manuscript with their analysis of the role of M current/KCNQ channels in the numerous age-related changes in sympathetic neuron function that they elegantly report, and save the rapamycin, and possible mTor action, for a separate line of inquiry that the authors could develop in a more thorough and scholarly way.

      We agree with the reviewer in that we cannot ascribe a signaling mechanism to the reversibility observed with rapamycin. Therefore, we are following the recommendation of the reviewer and have removed the rapamycin section.

      We want to emphasize that, in the aging field, any advancement in the knowledge of how drugs such as rapamycin reverse age-associated phenotypes is of crucial importance. These drugs, commonly referred to as aging interventions, include rapamycin, calorie restriction, elamipretide, and metformin. We could have used any of these interventions. And yet, the cellular and molecular mechanisms for each one of these anti-aging drugs are unknown.

      We want to note that, although the nature of the mTOR field is controversial, the effect of rapamycin in extending lifespan and improving health is not. At least these authors have not been able to find retracted papers on that subject or notices from the NIA alerting on this issue. We kindly request the reviewer to provide the references related to rapamycin that were retracted so we can evaluate how that affects the rigor of the premise for our future work.

      As authors, we also find it important to note that we are confident of our observations regarding the effect of rapamycin, and that we are not removing this section because we are retracting our claims. We will use these data to continue our research of the mechanism behind the effect of aging on sympathetic motor neurons.

      Reviewer #2:

      Summary:

      This research shows compelling and detailed evidence showing that aging influences intrinsic membrane properties of peripheral sympathetic motor neurons such that they become more excitable. Furthermore, the authors present convincing evidence that the oral administration of the anti-aging drug Rapamycin partially reversed hyperexcitability in aged neurons. This study also investigates the molecular mechanisms underlying age-associated hyperexcitability in mouse sympathetic motor neurons. In that regard, the authors found an age-associated reduction of an outward current having properties similar to KCNQ2/Q3 potassium current. They suggested a reduction of KCNQ2/Q3 current density in aged neurons as a potential mechanism behind their overactivity.

      Strengths:

      Detailed and rigorous analysis of electrical responses of peripheral sympathetic motor neurons using electrophysiology (perforated patch and whole-cell recordings). Most of the conclusions of this paper are well supported by the data.

      We thank the reviewer for valuing our effort to present a detailed and rigorous analysis.

      Weaknesses:

      (1) The identity of the age-associated reduced current as KCNQ2/Q3 is not corroborated by pharmacology (blocking the current with the specific blocker XE-991).

      We have performed experiments using blockers of KCNQ channels. See responses below.

      (2) The manuscript does not include a direct test of the reduction of KCNQ current as the mechanism behind age-induced hyperexcitability.

      Thank you for raising this point. We have performed experiments blocking KCNQ channels with linopirdine in young neurons and found that the pharmacological reduction of KCNQ current was enough to depolarize the cell and, in some cases, elicit the firing of action potentials. We present the results in a new figure. We also added the description in the Results section.

      Reviewer #3:

      This is a descriptive study of membrane excitability and Na+ and K+ current amplitudes of sympathetic motor neurons in culture. The main findings of the study are that neurons isolated from aged animals show increased membrane excitability manifested as increased firing rates in response to electrical stimulation and changes in related membrane properties including depolarized resting membrane potential, increased rheobase, and spontaneous firing. By contrast, neuron cultures from young mice show little to no spontaneous firing and relatively low firing rates in response to current injection. These changes in excitability correlate with significant reductions in the magnitude of KCNQ currents in aged neurons compared to young neurons. Treating cultures with the immunosuppressive drug, rapamycin, which has known antiaging effects in model animals appears to reverse the firing rates in aged neurons and enhance KCNQ current. The authors conclude that aging promotes hyperexcitability of sympathetic motor neurons.

      The electrophysiological cataloging of the neuronal properties is generally well done, and the experiments are performed using perforated patch recordings which preserve the internal constituents of neurons, providing confidence that the effects seen are not due to washout of regulators from the cells.

      The main weakness is that this study is a descriptive tabulation of changes in the electrophysiology of neurons in culture, and the effects shown are correlative rather than establishing causality. It is difficult to know from the data presented whether the changes in KCNQ channels are in fact directly responsible for the observed changes in membrane excitability.

      We appreciate the constructive criticism. In an attempt to assess whether changes in KCNQ are in fact directly responsible for the changes in membrane excitability, we have performed experiments blocking KCNQ channels with linopirdine in young neurons and found that the pharmacological reduction of KCNQ current was enough to depolarize the cell and, in some cases, elicit the firing of action potentials. Conversely, we activated KCNQ channels in old neurons with retigabine and found that the pharmacological activation was enough to hyperpolarize the membrane potential and stop the firing of action potentials. This effect was reversible. These two experiments provide solid evidence for our statement that age-associated reduction of KCNQ activity is responsible for the hyperexcited state in sympathetic motor neurons. We present the results in a new figure (Figure 8). We also added the description in the Results section.

      Furthermore, a notable omission seems to be the analysis of Ca2+ currents which have been widely linked to alterations in membrane properties in aging.

      We thank the reviewer for the comment. We did omit to include data on our studies of calcium currents. We agree that the study of the effect of calcium currents is relevant as it can influence the afterhyperpolarization. Furthermore, we believe that potential effects on calcium currents need to be studied in relation to other physiological processes that depend on calcium, including excitation-transcription coupling, calcium handling, and neurotransmitter release. Adding this information to this manuscript would only contribute to the tabulation of effects that we observe in sympathetic motor neurons with aging. As our main goal was to determine the ion channels responsible for the hyperexcited state, voltage-gated calcium channels or other calcium sources could have reflected a more indirect mechanism as compared to changes in sodium or potassium currents. We will continue our investigation on calcium currents and report our observations in the future, but for now, we have decided to leave it out of this work.

      As well, additional experiments in slice cultures would provide greater significance on the potential relevance of the findings for intact preparations. Finally, experiments using KCNQ blockers and activators could provide greater relevance that the observed changes in KCNQ are indeed connected to changes in membrane excitability.

      We are happy to report that we have performed these experiments and that the results strengthen the conclusion that changes in KCNQ are connected to changes in membrane excitability.

      Recommendations for the authors:

      We recommend the following essential revisions summarized from the reviews:

      (1) Is the change in KCNQ current responsible for the altered membrane excitability? What happens to membrane excitability when KCNQ is partially blocked (see reviewer 2 comment below)? Conversely, what happens to the excitability of aged neurons if KCNQ is activated (e.g., with retigabine)? (see reviewer 3 comment below). Results of these important experiments are needed to support the argument that KCNQ underlies the alterations in firing and membrane excitability.

      We have responded to this point. Thank you for the suggested experiments. In summary, the new experiments show that blocking KCNQ channels in young neurons leads to depolarization and, in some cases, the firing of action potentials. Conversely, the activation of KCNQ channels in aged neurons leads to hyperpolarization and a cessation of firing. We have added a new figure and reported the results in the Results section.

      (2) Rapamycin experiments are underdeveloped and weak. These should be further developed by examining the effects of KCNQ blockers to see if their effects on membrane excitability are reversed. Also, see comment 2 from reviewer 1.

      We have followed the recommendation by reviewer 1 and removed the section on rapamycin.

      (3) The study should examine voltage-gated calcium currents to determine potential changes in these currents with aging. See reviewer 3 comments.

      We thank the reviewer for the comment. We performed preliminary experiments and found that aging impacts calcium currents. However, we omitted to include the data. In our opinion, the changes in calcium currents are outside the scope of this work, as the changes could be related to physiological processes that go beyond the control of firing. Effects on calcium currents need to be studied in relation to other physiological processes that depend on calcium, including excitation-transcription coupling, calcium handling, and neurotransmitter release. The study of the relationship between changes in calcium currents and those physiological processes would require multiple experiments and detailed analysis. We will continue our investigation on calcium currents and report our observations in the future, but for now, we have decided to leave it out of this work.

      We have also made the suggested edits to the Figures and Legends.

      (2) In Fig.4 panel H, Y-axis must be # AP at 100 pA.

      We corrected the axis in Figure 4H.

      (3) In Legend Fig. 5, the number of cells for each subpopulation (n) needs to be corrected. In plots F-I, n= 9, 7, and 3 seem to be the number of adapting cells for 12-, 64- and 115w-old, respectively, instead of the number of single, phasic, and old cells for 12-week-old mice. A similar correction seems to be needed for 64-week-old and 115-week-old.

      We corrected the n number in Figure 5.

      (4) In Figure 6 panel C, it would be helpful for a reader to align the voltage protocol depicted with the current shown.

      We have aligned the voltage protocol to the current traces.

      (5) In the legend of Figure 7, the description of panel A ends with "Magnitude of voltage step to elicit each trace is shown in black", however in panel A there is no voltage depiction. In the description of panel D, "N = X animals, n=x cells" must be corrected.

      We have modified the legend to clarify. It now reads: “Text at the right of each current trace corresponds to the voltage used to elicit that current.”

      New Figure 8

      Author response image 1.

      Pharmacological inhibition and activation of KCNQ channels mimic the age-dependent phenotype. A. Membrane potential recordings from two young neurons treated with 25 μM linopirdine during the time illustrated by the light gray box. No holding current was applied. B. Left: Summary of the resting membrane potential measured before (light orange) and after (dark orange) the application of linopirdine. Right: Summary of the depolarization produced by linopirdine calculated by subtracting the post-drug voltage from the pre-drug voltage (V). Data points are from N = 2 animals, n = 8 cells, 14-week-old mice. C. Membrane potential recordings from two aged neurons treated with 10 μM retigabine during the time illustrated by the light gray box. No holding current was applied. D. Left: Summary of the resting membrane potential measured before (light purple) and after (dark purple) the application of retigabine. Right: Summary of the hyperpolarization produced by retigabine calculated by subtracting the post-drug voltage from the pre-drug voltage (V). Data points are from N = 2 animals, n = 7 cells, 120-week-old mice. P-values are shown at the top of the graphs.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Joint Public Review

      This study is concerned with the general question as to how pools of synaptic vesicles are organized in presynaptic terminals to support different types of transmitter release, such as fast synchronous and asynchronous release. To address this issue, the authors employed the classical method of loading synaptic vesicle membranes with FM-styryl dyes and assessing dye destaining during repetitive synapse stimulation by live imaging as a readout of the mobilization of vesicles for fusion. Among other findings, the authors provide evidence indicating that there are multiple reserve vesicle pools, that quickly and slowly mobilized reserves do not mix, and that vesicle fusion does not follow a mono-exponential time course, leading to the notion that two separate reserve pools of vesicles - slowly vs. rapidly mobilizing - feed two distinct releasable pools - reluctantly vs. rapidly releasing. These findings are valuable to the field of synapse biology, where the organization of synaptic vesicle pools that support synaptic transmission in different temporal and stimulation regimes has been a focus of intense experimentation and discussion for more than two decades.

      On the other hand, the present study has limitations, so that the authors’ key conclusions remain incompletely supported by the data, and alternative interpretations of the data remain possible. The approach of using bulk FM-styryl dye destaining as a readout of precise vesicle arrangements and pools in a population of functionally very diverse synapses bears problems. In essence, the approach is ’blind’ to many additional processes and confounding factors that operate in the background, from other forms of release to inter-synaptic vesicle exchange. Further, averaging signals over many - functionally very diverse - synapses makes it difficult to distinguish the dynamics of separate vesicle pools within single synapses from a scenario where different kinetics of release originate from different types of synapses with different release probabilities.

      We thank the editors and reviewers for their time and patience, and are happy that they found our results valuable.

      We do not have a clear understanding of what the alternative interpretations might be - beyond those already addressed - but would like to. At present, we believe that the evidence for parallel processing of slowly and quickly mobilized reserve vesicles is solid and hope that people who are open to the possibility will evaluate the reasoning described within our report. The hypothesis that reserves are kept separate because they feed distinct subdivisions of the readily releasable pool remains to be tested.

      Beyond that, we have used FM-dye de-staining as a bulk measurement of sub-synaptic events in the sense that we have made no attempt to measure mobilization of isolated individual vesicles. We do not see how this necessarily leaves viable alternative interpretations, but this is difficult to evaluate without knowing what the alternatives might be. On the other hand, the FM-dye technique has had good resolution at the level of distinguishing between individual synapses since at least Murthy et al. (2001). For our part, we are confident that our analysis in Figure 3 combined with the results in Figures 4-11 shows that the multiple reserve pools co-occur in many individual presynaptic terminals. We did not use electron microscopy to confirm that all of the punctae analyzed in Figure 3 were indeed single synapses, but the reviewers did not recommend this, and we believe there is already enough published about the spatial distribution of synapses in cell culture to be confident that many of the punctae that are smaller than 1.5 µm were individuals.

      Overall, we have attempted to address all of the individual concerns raised by reviewers, and our understanding is that these concerns and our responses will be available on the eLife website. The reviewers were not convinced on every point, but these are cases where the nature of the concern was not clear to us. We hope that people who share these concerns will check out our responses and contact us with any further questions or alternative interpretations.

      (1) The authors sincerely addressed many of the previous concerns, mainly by clarification. The data are consistent with the authors’ hypothesis. The pool concept is somewhat similar to that of Richards et al (2000) and Rey et al (2015). The authors further propose that two reserve pools feed vesicles to two readily-releasable pools independently.

      To clarify further: The possibility that distinct reserve pools feed distinct readily releasable pools is predicted by our working model, and is something that we would like to test in the future, but is not a conclusion of the present study. Instead, in the present study, we tested the prediction that quickly and slowly mobilized reserve vesicles are processed in parallel without making assumptions about the underlying mechanism.

      Unfortunately, the heterogeneity among individual synapses remains a concern as shown in (some of) the raw data (Fig. 3 and supplements).

      We emphasize that we have not attempted to minimize the extensive heterogeneity among synapses, but actually highlight this. In fact, we chose the image in Figure 3 for an example in part because of the lower left region replicated in Figure 3 supplement 2 demonstrating extensive heterogeneity along what appears to be a single axon. We are not the first to notice the heterogeneity (see Waters and Smith, 2002), but we do provide a new possible explanation which, if correct, might be important for understanding biological computation (see our Discussion). At the same time, we believe that our evidence for multiple reserve pools within individual synapses with heterogeneous properties is compelling. We see no contradiction, and indeed, our conclusion that the ratio of slowly to quickly mobilized varies extensively between synapses can only be correct if individual synapses contain multiple types. We hope that people who are interested in our conclusions will evaluate the evidence and reasoning presented in our report.

      Bulk imaging of FM de-staining does not really measure the fraction of non-stained vesicles, which changes dynamically during stimulation, so that the situation calls for an independent readout of stained and non-stained vesicles. Moreover, direct correspondence between two specific stimulation frequencies (with long stimulation) and vesicle pools is not straightforward. These issues make the experimentally measured pools not well-defined.

      We think that the reviewer is suggesting an alternative scenario where decreases in the fractional rate of FM-dye de-staining seen during 1 Hz stimulation might be caused by a large (4-fold) increase in the total size of the reserve pool that dilutes the stained vesicles by mixing. This scenario is consistent with the results in Figures 2 and 4-7, and initially seems plausible because previous studies have shown that many vesicles are not mobilized, and therefore are not stained, during our standard loading protocol of 100 s at 20 Hz (Harata et al., 2001). However, liberation of this "deep reserve" as an explanation for the decrease in fractional destaining is not compatible with the results in Figures 10-11 that rule out mixing. For example, liberation of the deep reserve would cause fractional destaining to appear equally depressed during subsequent 20 Hz stimulation, and Figure 10 shows that this is not the case. The scenario cannot be rescued by postulating that the subsequent 20 Hz stimulation caused the deep reserve to quickly recapture the liberated vesicles because Figure 11D-E shows that fractional de-staining continues to be depressed at the very beginning of a second 1 Hz train that follows the 20 Hz stimulation.
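
      The dilution scenario described above can be illustrated with a toy calculation. This is a sketch under stated assumptions, not the model or analysis used in the study: it assumes a single, perfectly mixed reserve pool from which a fixed number of vesicles is released per stimulus, so that the fraction of remaining stained vesicles lost per stimulus equals the number released divided by the pool size. The function name and the numerical pool sizes are hypothetical.

```python
def fractional_destaining_per_stimulus(released_per_stimulus, pool_size):
    """Fraction of the remaining stained vesicles lost per stimulus.

    Assumes a well-mixed pool: each released vesicle is drawn at random,
    so stained vesicles are lost in proportion to their share of the pool.
    """
    return released_per_stimulus / pool_size


# Hypothetical numbers, chosen only to illustrate a 4-fold pool expansion.
baseline = fractional_destaining_per_stimulus(1.0, 100.0)
diluted = fractional_destaining_per_stimulus(1.0, 400.0)
fold_decrease = baseline / diluted

print(f"baseline fractional destaining per stimulus: {baseline:.4f}")
print(f"after 4-fold pool expansion: {diluted:.4f}")
print(f"fold decrease: {fold_decrease:.1f}")
```

      In this toy model the depression of fractional destaining depends only on pool size, not on stimulation frequency, which is why the scenario predicts an equal depression during subsequent 20 Hz stimulation.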

      (2) The authors’ latest round of responses did not alleviate most of my major previous concerns. The additional data now shown in Fig 3 rely on conceptually the same type of bulk measurements and thus suffer from the same limitations as outlined in the earlier review.

      We believe that the new evidence in Figure 3 for multiple reserve pools at individual synapses is strong when evaluated in combination with the results in Figures 4-11. We do not, at present, see how the fact that FM-dye destaining is used as a bulk measurement at the sub-synaptic level could undercut our logic.

      Moreover, the image of neuronal cultures shown in Fig. 3 might be problematic. It shows very bright staining with large round lumps, which may be indicative of unhealthy cultures.

      Unhealthy cultures are not a concern because we used strict quantitative criteria to assess health that are better than we have seen elsewhere (details below). We think the reviewer might be reacting to the way we rendered the image; i.e., as “overexposed”. We did this to highlight the dimmest punctae, which is a key element of the analysis. The same image rendered with less contrast is now displayed in Author response image 1 (3rd panel from left).

      Author response image 1.

      The image on the left is a reproduction of the example image in Figure 3, which was the average of 120 time lapse raw data images; scale bar is 20 µm. The second image is a replicate except all 69 punctae that were included in the study are occluded by 1.5 µm × 1.5 µm yellow squares. The third image is another replicate except with a different brightness setting. The rightmost image is one of the raw data images with brightness matched to the third image.

      More details (relevance to in vivo is in point 4):

      (1) Identifying unhealthy cultures is straightforward with our technique because synapses in unhealthy cultures destain spontaneously. Our criterion for accepting experiments for further analysis was less than 1.5 % spontaneous rundown/minute. This is a better way to judge health than we have seen elsewhere because it eliminates subjective decisions, and would be equally applicable for microscopes and imaging software of any quality. For our part, we used a 25X objective with a low numerical aperture and low intensity illumination that allowed us to completely avoid photobleaching. The images will look worse to some compared to when acquired with a higher quality microscope, but the absence of photobleaching is an important benefit because it allowed us to avoid complicated corrections.

      (2) Stained areas larger than 1.5 µm across - such as the ones noted by the reviewer - were expressly excluded from our study because they could have been clusters of multiple synapses. The size criteria are detailed in the Legend of Figure 3. Punctae and larger areas that were excluded are the ones that are not occluded by yellow squares in the 2nd image from the left, above; at least two of the largest were likely clusters of synapses that were out of focus. Nevertheless, despite being excluded, it is unlikely that the stained areas larger than 1.5 µm in the image in Figure 3 were characteristic of unhealthy cultures because these areas did not de-stain spontaneously, but instead de-stained in response to 1 and 20 Hz electrical stimulation much like the small punctae that were included in the analysis.

      (3) Electron microscopy results have shown that individual synapses vary >10-fold in size, so a large range of brightness is expected (Murthy et al., 2001). The large range would either make the brighter punctae and clusters appear to be overexposed in a printed image, or render the dimmer punctae invisible. We have opted to present an image with overall brightness adjusted so that the dimmest punctae are visible. This is appropriate because one of the concerns was that analyzing the dimmest punctae would reveal underlying populations where the rate of fractional destaining was constant. In the end, no evidence for underlying populations emerged, which supports the conclusion that the decreases in fractional destaining occur at individual synapses. Note that adjusting brightness for example images was unavoidable; we used the camera in a range that was far below saturation and, because of this, images presented without adjusting brightness would appear to be completely black.

      (4) Primary cell cultures are non-physiological by definition, so the concept of health is intrinsically arbitrary, and relevance to synapses in brains is questioned routinely. However, the new findings in the present report are that: (1) individual hippocampal synapses contain multiple reserve pools; (2) the reserves remain separate but are not distinguishable by the timing of mobilization when the frequency of stimulation is high; and (3) the reserves are nevertheless processed in parallel even when the frequency of stimulation is high. Of these, finding (1) has been reported previously for other synapse types, but findings (2) and (3) were both unexpected, and finding (3) was not compatible with current concepts. Nevertheless, all three findings were predicted by a model that was developed to explain orthogonal results from studies of intact synapses in ex vivo slices that did not fit with current concepts either, as referenced in the Introduction. Because of this, we think that the parallel processing of quickly and slowly mobilized reserve vesicles likely occurs in individual Schaffer collateral synapses in vivo, and is not a cell culture artifact; the alternative would be too much of an unlikely coincidence.

      References

      Harata N, Pyle JL, Aravanis AM, Mozhayeva M, Kavalali ET & Tsien RW (2001). Limited numbers of recycling vesicles in small CNS nerve terminals: implications for neural signaling and vesicular cycling. Trends in Neurosciences 24, 637–43.

      Murthy VN, Schikorski T, Stevens CF & Zhu Y (2001). Inactivity produces increases in neurotransmitter release and synapse size. Neuron 32, 673–82.

      Waters J & Smith SJ (2002). Vesicle pool partitioning influences presynaptic diversity and weighting in rat hippocampal synapses. Journal of Physiology 541, 811–23.


      The following is the authors’ response to the original reviews.

      Reviewer 1

      Mahfooz et al. investigated the time course of synaptic vesicle fusion of cultured mouse hippocampal synapses using FM-styryl dyes. The major finding is that the FM destaining time course deviates from a mono-exponential function during 1 Hz, but not 20 Hz stimulation. The deviation from a mono-exponential function was also seen during a second stimulus train applied after recovery periods of several minutes, or after depletion of the readily-releasable vesicle pool. Furthermore, this "decreased fractional destaining" was unlikely due to long-term synaptic depression, or incomplete dye clearance. Fractional destaining was enhanced when the dye was loaded with 1 Hz compared with 20 Hz stimulation, suggesting that vesicles recycled during 1 Hz stimulation are predominantly sorted into a rapidly mobilized pool. Finally, they show that 20 Hz stimulation does not affect the decrease in fractional destaining induced and recorded during 1 Hz stimulation. Based on these observations, they put forward a model in which slowly and quickly resupplied synaptic vesicles are mobilized in parallel.

      The demonstration that FM destaining time courses deviate from single exponentials during 1 Hz stimulation (Figs 2-3) is a starting point used to rule out simple models where vesicles intermix freely and to introduce a mathematical technique for quantifying the extent of the deviations that is essential for the analysis of later experiments, where curve fitting could not be used. We then:

      1) show that the deviation from simple models is not caused by depletion of the readily releasable pool, as noted by the reviewer;

      2) rule out a number of explanations for the deviation that do not involve reserve pools at all, again as noted;

      3) provide affirmative evidence for the presence of multiple reserve pools by labeling them with distinct colors;

      4) show that the vesicles within the distinct reserve pools do not intermix even when activity is intense enough to drive destaining with single exponential kinetics.

      We believe that the 4th point - documented in Figs 10-11 - is a key element.

      Beyond that, we note that our working model arose from previous studies, as referenced in the Introduction, not from the present results. The model did predict the parallel processing of quickly and slowly mobilized reserves, and the present study was designed to test this prediction. In that sense, the evidence in the current study supports our working model, not the other way around.

      In any case, most readers in the near term will be more interested in the serial versus parallel question, and less in precisely what the present results mean for evaluating our working model. Because of this, we emphasize that evidence for parallel processing of separate reserve pools depends solely on experimental results within the study, and not on modeling. As a consequence, the evidence will continue to be equally strong even if problems with our working model arise later on (lines 382-386).

      We do have additional unpublished evidence for the working model that does not bear directly on the parallel versus serial question. Some of this was removed from an earlier version of the manuscript and some has been newly gathered since the original submission. We will publish the additional evidence at a later point. We decided not to include it in the present manuscript expressly to avoid confusion about the relationship between modeling and the evidence for parallel processing in general.

      The paper addresses an interesting question - the relationship between the resupply and release of synaptic vesicles. The study is based on a lot of data of high quality. Most data are solid. However, some of the major conclusions are not well supported by the data. Moreover, it remains unclear how specific the findings are to the experimental design.

      The following points should be addressed:

      1) Most traces display a decrease in fluorescence intensity before stimulation. Data with a decrease in baseline fluorescence intensity of up to 1.5 % were considered for the analysis (Fig 2-supplement 2). I may have missed it, but were the data corrected for the observed decrease in baseline fluorescence intensity? (In the model shown in Appendix 1 Figure 1, they correct for "rundown"). For instance, are the residuals shown in Fig 2D, E based on corrected data? In case the data would not be corrected for a decrease in baseline fluorescence, would the decay kinetics also deviate from a single exponential after correction?

      We did not correct for rundown - as now noted on lines 96-97 - except in the figure in the Appendix, noted by the reviewer, where the uncorrected and corrected time courses are plotted side by side for easy comparison. However, our study includes an analysis showing that correcting for rundown during 1 Hz stimulation would increase - not decrease - the deviation from a single exponential (2 bars in rightmost panel in Fig 2C, and lines 113-116 of Results), so the absence of a correction does not weaken our conclusions.

      2) The analysis of "fractional destaining" is not clear to me. How many intervals of which length were chosen and why? For instance, the intervals often differ in length, number and do not cover the complete decay (e.g., Fig 2B).

      We calculated fractional destaining from longer intervals at later times because the overall amount of stain was lower, meaning that signal/noise was lower and scatter was greater; the increased scatter at later times could be counteracted by estimating the slope of destaining from longer intervals. An additional benefit is that elongating the later intervals allowed us to plot only 6 bars for 25 min of 1 Hz destaining, which works better visually than 17.

      Increasing the interval length for later times is mathematically sound because the key factor causing distortions related to deviations from linearity is not the length of the interval per se but, instead, the fractional destaining over the interval. The fractional destaining is greater at the start of 1 Hz stimulation, thus requiring shorter intervals.
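
      The point about interval length can be illustrated with a minimal sketch. It assumes, purely for illustration, that destaining within an interval is exponential and that fractional destaining is estimated as the drop in fluorescence across the interval divided by the starting fluorescence and the interval length; the time constants are hypothetical. Under these assumptions the relative error of the estimate depends only on the ratio of interval length to time constant, that is, on how much destaining occurs within the interval.

```python
import math


def interval_estimate(tau, start, length):
    """Fractional destaining per minute estimated from the drop across one interval."""
    f0 = math.exp(-start / tau)
    f1 = math.exp(-(start + length) / tau)
    return (f0 - f1) / (f0 * length)


def relative_error(tau, length):
    """Relative deviation of the interval estimate from the true rate 1/tau."""
    true_rate = 1.0 / tau
    return interval_estimate(tau, 0.0, length) / true_rate - 1.0


# Hypothetical cases: fast destaining with a short interval versus
# 4-fold slower destaining with a 4-fold longer interval.
err_fast_short = relative_error(tau=10.0, length=1.5)
err_slow_long = relative_error(tau=40.0, length=6.0)
# Same long interval applied where destaining is fast, for contrast.
err_fast_long = relative_error(tau=10.0, length=6.0)

print(f"fast, short interval: {err_fast_short:+.1%}")
print(f"slow, long interval:  {err_slow_long:+.1%}")
print(f"fast, long interval:  {err_fast_long:+.1%}")
```

      The first two cases give the same distortion because the amount destained within the interval is the same, whereas applying the long interval while destaining is still fast gives a larger distortion.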

      It would be possible to choose inappropriately long intervals that would distort estimates of the change in fractional destaining. However, we now include Fig 2-supplement 6 - which includes all 17 intervals of 1.5 min - to confirm that any distortions after the first interval were minimal. The Appendix predicts a biologically important distortion for the first interval which we are following up, but this would underestimate the true deviation from quickly mixing pools, so would not be problematic for the present conclusions.

      Sometimes, only the interval right after stimulation onset was considered (e.g., Fig 7, 8).

      Figs 7, 8 in the previous version are now Figs 8, 9.

      This is appropriate because the goal was to estimate the fractional destaining at the very start, before the quickly mobilized fraction has destained.

      How quickly fractional destaining is expected to revert to the lowest value seen after 15 min of 1 Hz stimulation in Fig 2 (and elsewhere) depends very much on assumptions - such as the number of reserve pools, etc. We sought to avoid this kind of additional analysis because we are keen to avoid the impression that our main conclusions depend on the specifics of modeling.

      How sensitive are the changes in fractional destaining to the choice of the intervals?

      Minimally. This can be seen by eye because the magenta lines in Fig 2B fit the data well, but see Fig 2-supplement 6 for a quantitative comparison.

      For instance, would fractional destaining be increased if later intervals would have been chosen for the second 20 Hz stimulus in the experiment shown in Fig 9B?

      Previous Fig 9B is now Fig 10B.

      We cannot be certain, but think it probably would not be different. Neither an increase nor a decrease would be problematic for our conclusions.

      More detail: There is not enough data to evaluate this specifically for Fig 10B because the total amount of stain remaining at later intervals is little, meaning signal/noise is low, which causes extensive experimental scatter. However, synapses were even more extensively destained prior to time course c of Figure 2-supplement 2C, which nevertheless matches time courses a, b, and d.

      I propose fitting all baseline-corrected data with a single and a double-exponential function (as well as single exponential plus line?) and reporting the corresponding time constants (slopes) and amplitudes.

      As noted above, we purposefully do not baseline correct data in a way that would make this possible. However, we do include exponential fits when appropriate, in Fig 2D-E, Fig 2-supplement 1, Fig 2-supplement 7, Fig 2-supplement 8, and Fig 12B.

      Indeed, the absence of any change in the weighting parameter despite substantial changes for both time constants seen after raising the temperature to 35 °C (Fig 2-supplement 8 vs Fig 12B) is notable because it suggests that the contents of the reserve pools are not altered by changing temperature, even though vesicle trafficking is accelerated. Fig 2-supplement 8 is a supplementary figure because the result is outside the scope of the main point, not because the quality is lower than for other figures.

      Beyond that, exponential fits would not be adequate for most of the study because many experiments - including the core experiments in Figs 10-11 - require discontinuous stimulation, such as when we stop stimulating at 1 Hz, rest for minutes, and then start up again at 1 or 20 Hz. And, although widely used, exponentials are non-linear equations after all. Even when they can be used to quantify time courses, the fractional destaining measurement is almost always more informative, in the technical sense, because it avoids complications when estimating the importance of deviations occurring at the two extremes versus deviations in the middle of the time course.

      3) Along the same lines, is the average slow time constant indeed around 40 min? (Are the data shown in Fig 2 S7 based on an average?) If this would be the case, I suggest conducting a control experiment with a recording time > 40 min. Would fitting an exponential or a line to baseline data (without stimulation) also give a similar slow component?

      Fig 2-supplement 7 in the previous version is now Fig 2-supplement 8.

      First, yes, the time course shown in Fig 2-supplement 8 is the mean across preparations. The time courses of the individual preparations were quantified as the median value of the individual ROIs before averaging.

      Second, no, fitting baseline data would give an approximately 3-fold greater time constant (i.e., 120 min) because fractional destaining decreases by about 3-fold when we stop stimulating after 25 min of 1 Hz stimulation (i.e., Fig 2C, 3B, and many others).

      The key point is that fractional destaining decreases greatly over long trains of 1 Hz stimulation.

      For Fig 2, we saw a 2.7+/-0.1-fold decrease before accounting for baseline destaining (lines 106-110), which increased to a 4.4-fold decrease when we did account for baseline destaining (lines 113-116). Overall, the 2.7-fold value is simultaneously a safe minimum boundary, and much greater than the value of 1.0 expected from models where vesicles mix freely.
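
      The contrast between the value of 1.0 expected for freely mixing vesicles and a larger fold decrease can be illustrated with a minimal sketch. It assumes that the destaining time course is a weighted sum of exponentials, used here as a simple stand-in for separate pools, and that fractional destaining is the instantaneous rate of loss divided by the remaining fluorescence. The weights and time constants are hypothetical and are not fitted values from the study.

```python
import math


def fractional_rate(t, pools):
    """Instantaneous fractional destaining, -F'(t) / F(t), for a sum of exponentials.

    pools is a list of (weight, tau_min) pairs; all values are illustrative.
    """
    remaining = sum(w * math.exp(-t / tau) for w, tau in pools)
    loss_rate = sum(w / tau * math.exp(-t / tau) for w, tau in pools)
    return loss_rate / remaining


def fold_decrease(pools, t_end=25.0):
    """Ratio of fractional destaining at the start to that after t_end minutes."""
    return fractional_rate(0.0, pools) / fractional_rate(t_end, pools)


single_pool = [(1.0, 10.0)]             # freely mixing: one exponential
two_pools = [(0.5, 5.0), (0.5, 40.0)]   # hypothetical quick and slow reserves

fold_single = fold_decrease(single_pool)
fold_two = fold_decrease(two_pools)
print(f"single pool: {fold_single:.2f}-fold, two pools: {fold_two:.2f}-fold")
```

      A single exponential gives a fold decrease of 1.0 regardless of its time constant, so any reproducible value well above 1.0 is incompatible with a single freely mixing pool in this simplified picture.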

      Note that future studies will show that even the 4.4-fold value is probably an underestimate because 1 Hz stimulation misses a fast component at the very beginning of the time courses, as predicted in the Appendix.

      4) How specific are the findings to 1 Hz (and 20 Hz) stimulation? From which frequency onward can a decrease in fractional destaining be no longer observed?

      Our logic depends only on the premise that we are able to find some frequency where fractional destaining no longer decreases. We knew that 20 Hz was a good place to start because of previous electrophysiological experiments - frequency jumps (Fig 1 of Wesseling and Lo, 2002 and Fig 2C of Garcia-Perez and Wesseling, 2008), and trains of action potentials followed by osmotic shocks (Fig 2A of Garcia-Perez et al., 2008) - showing that 20 Hz stimulation is enough to nearly completely exhaust the readily releasable pool. This is noted in lines 202-203, and Box 2.

      Would previous stimulation with frequencies <20 Hz interfere with fractional destaining? These control experiments would help assess how general/specific the findings are.

      Yes (Figs 4 and 11A at 1 Hz). Also, we have done experiments at 0.1 Hz, which will be published later; some of these were actually removed from an earlier version of the manuscript because the results are primarily relevant to deciding between particular parallel models, and are not relevant to the conclusion of the present study that quickly and slowly mobilized reserves are processed in parallel.

      Similarly, a major conclusion of the paper - the parallel mobilization of two vesicle pools - is largely based on these two stimulation frequencies. Can they exclude that mixing between the two pools occurs at other frequencies?

      We cannot exclude the possibility of breakdown at a higher frequency, but this would not undercut our conclusions. We do not have plans to try this experiment because: (1) a positive result would be open to concerns about non-physiologically heavy stimulation; and (2) a negative result would be difficult to interpret because of the possibility that the axons cannot follow at higher frequencies.

      6) Some information in the methods section is lacking. For instance, which species is the cell culture based on?

      Mice of both sexes were used. This is now specified in the Methods.

      Reviewer 2

      By using optical monitoring of synaptic vesicles with FM1-43 at hippocampal synapses, the authors try to show the evidence for two parallel reserve pools of synaptic vesicles, which feed the vesicles to the readily releasable pool. The major strength of the study is the use of a quantitative model, which can be readily testable by experiments: in the course of the study, the authors propose the best vesicle pool model, which fits the experimental data "averaged over synapses" nicely. On the other hand, the weak point of the study comes from the optical method and the data: bulk imaging of vesicle dynamics monitored at each synapse is noisy and the signals vary considerably among synapses. Therefore, the average signals over many synapses may not reflect the vesicle dynamics of two reserve pools within a synapse, but something else, such as the different kinetics of release from multiple synapses with different release probability. Nevertheless, a new framework of two reserve pools offers a testable hypothesis of vesicle dynamics, and the use of single vesicle tracking and EM may allow one to give a definitive answer in future studies. Therefore, the study may be of interest to the community of synaptic neurobiology.

      1) The current version includes a new figure (Fig 3) showing that the deviations from single pool models seen in populations are caused by deviations occurring at the level of single synapses. The heterogeneity between synapses actually causes population statistics to underestimate - not overestimate - the mean and median size of the deviations at individual synapses.

      We think the new evidence in Fig 3 and supplements is conclusive without follow-on EM of the same punctae given the substantial body of already published EM on similar cultures. Essentially, the only way to explain the results without invoking multiple reserve pools in individual synapses would be to say that individual synapses ALWAYS come in clumps containing multiple types and are NEVER separated from neighbors by more than 1.5 microns - even when the clumps are separated from each other by 5 microns. There is already clear evidence against this.

      2) No new model is proposed here, see the first response to the first reviewer.

      3) We are not aware of alternative hypotheses that could account for our results, so cannot evaluate if single vesicle tracking and EM could add meaningful additional support.

      1) The existence of non-stained vesicles complicates the interpretation of the data. Because the release by 20 Hz and 1 Hz stimulation do not entirely reflect the release from fast and slow vesicle pools, the estimation of non-stained vesicles using synaptopHluorin (+bafilomycin) and EPSCs would be helpful to examine the fraction of non-stained / stained vesicles over time (with stimulation, the ratio may change dynamically, which may bring complications).

      Non-stained vesicles are not a complication, but instead a key element of our logic which is included in the diagrams in Boxes 1 and 2 and Figure 9. That is, quickly and slowly mobilized reserves can be distinguished at 1 Hz precisely because 1 Hz is not intense enough to exhaust the readily releasable pool (Box 2). The corollary is that stained vesicles must be replaced by non-stained vesicles, because otherwise 1 Hz stimulation would exhaust the readily releasable pool. And this is why FM-dyes (plus a beta-cyclodextrin during washing) are ideal for the current questions whereas other techniques, such as electrophysiology or synaptopHluorin imaging are obviously indispensable for other questions, but could not replace the FM-dyes in the current study. This is now noted on lines 86-89.

      We are aware that synaptopHluorin + bafilomycin could, in principle, accomplish some of the same goals. However, bafilomycin ended up being toxic when applied for tens of minutes, as it would have to be in our experiments. And, we do not see what critical question is not already answered with strong evidence using FM dyes.

      2) Individual synapses show marked differences in the time course of de-staining, suggesting differences in release probability. The averaging of the whole data may reflect "average" behavior of synapses, but for example, bi-exponential time course may reflect high Pr and low Pr synapses, rather than vesicle recruitment.

      The authors may comment on this issue.

      See newly added Fig 3, and responses above.

      3) Some differences are very small (Fig 10, the same amplitude as bleaching time course), and I am not certain if the observed differences are meaningful, given low signal to noise ratio in each synapse.

      Fig 10 in the previous version is Fig 11 in the current version.

      Even if correct, this would not be problematic because 20 Hz stimulation clearly did not cause fractional destaining to return to the initial value when stimulation was resumed at 1 Hz (compare d and f in Fig 11E). In any case, Figs 2C, 3B, 5B, 7B, and Fig 10-supplement 2A all show that the minimum fractional destaining value during 1 Hz stimulation is about 3-fold greater than during subsequent rest intervals, which is not a small difference. Also, note that Fig 2-supplement 3 shows that photobleaching likely did not play a role.

      Reviewer 3

      Reviewer #3 (Recommendations For The Authors):

      This study attempts to conceptualize the long-standing question of vesicle pool organization in presynaptic terminals. Authors used classical FM dye release experiments to support a hypothesis that rapidly and slowly releasing vesicles are mobilized in parallel without intermixing. This modular model is also supported indirectly by the authors’ recent findings of molecular links that connect a subset of vesicles in linear chains (published elsewhere).

      Our study should be seen as a test of the hypothesis that quickly and slowly mobilized reserves are processed in parallel. The evidence is independent of any modeling, and would continue to be equally strong if our working model turns out to be incorrect (lines 382-386).

      The scope of the original model was limited by a number of caveats. The main concerns included a limited data set measured in bulk from a highly heterogeneous synapse population, and a complex interrelationship between vesicle mobilization and the bulk FM dye de-staining kinetics. The second major limitation was measurements being performed at room temperature, which inhibits or alters a number of critical synaptic processes that are being modeled. This includes the efficiency of exo/endocytosis coupling, vesicle mobility and release site refractory period, which are stimulus- and temperature-dependent, but were not accounted for in the original model.

      The present study contains experiments at body temperature (Fig 12 and Fig 12-supplement 1 in the current version) and analyses of individual synapses (especially Fig 3 in the current version). To our knowledge all results are consistent with everything that is known about the efficiency of exo/endocytosis coupling, vesicle mobility and release site refractory periods.

      The authors made strong efforts to address previous concerns. However, the main conceptual point, i.e. linking the bulk FM dye de-staining kinetics with precise arrangement of vesicle pools, is not well supported and is generally highly problematic because it ignores many additional processes and confounding factors.

      For example, vesicle exchange between neighboring synapses constitutes from 15% to over 50% of total recycling vesicle population, and therefore is a major contributing factor to FM dye loss/redistribution, but is not considered in this study. Additionally, this vesicle exchange process undergoes calcium/activity-dependent changes, contributing to difficulty in interpreting the current experiments comparing FM de-staining at different stimulation frequencies.

      We do not see how exchange of vesicles between synapses could be a problem for our logic, so cannot evaluate this without a more detailed description of the concern. Instead, our results rule out random inter-synaptic exchange between quickly and slowly mobilized reserve pools because this would show up in our assays as mixing, which does not occur. We think there are three remaining possibilities:

      1) vesicles are exchanged primarily between quickly mobilized reserve pools

      2) vesicles are exchanged primarily between slowly mobilized reserve pools

      3) vesicles in quickly mobilized reserve pools are targeted to quickly mobilized reserve pools in other synapses and vesicles in slowly mobilized reserve pools are targeted to slowly mobilized reserve pools in other synapses.

      It would be interesting to know which of these is correct, but this is outside the scope of the current study.

      Moreover, other forms of release, such as asynchronous release, contribute a large fraction of released vesicles, but are not factored in. Asynchronous release varies widely in synapse population from 0.1 to >0.4 of synchronous release, but is entirely ignored. Spontaneous release may also contribute to FM dye loss over extended 25 min recordings used.

      Spontaneous release and asynchronous release are not caveats.

      First, spontaneous: We suspect that spontaneous release contributes to the background destaining rate, but this is 3-fold slower than the minimum during 1 Hz stimulation on average (Figs 2C, 3C, 5B etc), so we know that the slowly mobilized reserve is mobilized by low frequency trains of action potentials (lines 410-412). Note that a different outcome - where the rate of destaining decreased to a very low level during long trains of 1 Hz stimulation - would not have been consistent with the idea that slowly mobilized vesicles are only released spontaneously because the remaining fluorescence can always be destained rapidly by increasing the stimulation intensity to 20 Hz (e.g., see examples in Fig 3).

      Second, asynchronous: We know that slowly mobilized reserves must be released synchronously at 35 °C because the asynchronous component is eliminated at this temperature (Huson et al., 2019), without altering the quantity of slowly mobilized reserves that are mobilized by 1 Hz stimulation (lines 350-360 of Results, and 445-452 of Discussion; we can confirm from our own unpublished experiments that the disappearance of asynchronous release at 35 °C is a robust phenomenon in these cell cultures). Asynchronous release of slowly mobilized vesicles might occur at room temperature, but this would not argue against the conclusion that slowly mobilized vesicles are processed in parallel with quickly mobilized.

      Specific comments:

      Points 1-4 are already addressed above.

      5) The notion of the chained vesicles is somewhat confusing: how does the "first" vesicle located at the plasma membrane/release site get released if it is attached to the chain? Wouldn’t this "first" vesicle be non-immediately releasable since it must first be liberated? Since all vesicles shown in Figure 1 have chains attached to them, what vesicle population then gives rise to sub-millisecond release?

      This is not a concern relevant to the present study because none of the conclusions rely on the model in any way (see Introduction, and lines 382-386 of the Discussion). Beyond that: We previously published clear evidence that docked vesicles are tethered to non-docked vesicles (Figure 8 of Wesseling et al., 2019). We see no reason to suspect that a tether to an internal vesicle would prevent the docked vesicle from priming for release.

      7) Model: For fitting de-staining during 20 Hz stimulation, authors state that it was necessary to allow >5-fold Facilitation. This seems to be non-physiologically relevant, since previous studies found only very mild facilitation at room temperature (typically below a factor of 1.5-2.0) and the authors themselves state that, at most, a 1.3 fold facilitation was found.

      If the 1.3-fold facilitation estimate comes from us, it must have been in a different context.

      Most published estimates of facilitation are heavily convolved with simultaneous depression, and there is additionally a saturation mechanism for readily releasable vesicles with high release probability that is not widely known (Garcia-Perez and Wesseling, 2008). The standard method for eliminating the depression is to lower the probability of release by lowering extracellular [Ca2+], which additionally relieves occlusion by the saturation mechanism. And, lowering [Ca2+] uncovers an enormous amount of facilitation at synapses in hippocampal cell culture. For example, see Figure 2B of Stevens and Wesseling (1999), which shows a 7-fold enhancement during 9 Hz stimulation, and Figure 3 of the same study, which shows a linear relationship with frequency. Taken together these two results suggest 15-fold enhancement during 20 Hz stimulation, which far exceeds the 5-fold value needed at inefficient release sites to make our working model fit the FM-dye destaining results.
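
      As a rough check of this extrapolation, here is a minimal sketch that assumes the enhancement scales in direct proportion to stimulation frequency. The proportional form is an assumption about the linear relationship, and E(f) is notation introduced here for the fold-enhancement at frequency f:

      ```latex
      E(20\,\mathrm{Hz}) \approx E(9\,\mathrm{Hz}) \times \frac{20}{9} = 7 \times \frac{20}{9} \approx 15.6
      ```

      If instead only the increment above baseline is assumed to scale with frequency, the estimate becomes 1 + 6 × 20/9 ≈ 14.3. Either way the value is close to 15-fold.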

      References

      Garcia-Perez E, Lo DC & Wesseling JF (2008). Kinetic isolation of a slowly recovering component of short-term depression during exhaustive use at excitatory hippocampal synapses. Journal of Neurophysiology 100, 781–95.

      Garcia-Perez E & Wesseling JF (2008). Augmentation controls the fast rebound from depression at excitatory hippocampal synapses. Journal of Neurophysiology 99, 1770–86.

      Huson V, van Boven MA, Stuefer A, Verhage M & Cornelisse LN (2019). Synaptotagmin-1 enables frequency coding by suppressing asynchronous release in a temperature dependent manner. Scientific Reports 9, 11341.

      Stevens CF & Wesseling JF (1999). Augmentation is a potentiation of the exocytotic process. Neuron 22, 139–46.

      Wesseling JF & Lo DC (2002). Limit on the role of activity in controlling the release-ready supply of synaptic vesicles. Journal of Neuroscience 22, 9708–20.

      Wesseling JF, Phan S, Bushong EA, Siksou L, Marty S, Pérez-Otaño I & Ellisman M (2019). Sparse force-bearing bridges between neighboring synaptic vesicles. Brain Structure and Function 224, 3263–3276.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the authors introduced an essential role of AARS2 in maintaining cardiac function. They also investigated the underlying mechanism, through which AARS2 regulates alanine and PKM2 translation. Accordingly, a therapeutic strategy for cardiomyopathy and MI was provided. Several points need to be addressed to make this article more comprehensive:

      We thank this reviewer for the overall support of our manuscript.

      (1) Include apoptotic caspases in Figure 2B, and Figure 4 B and E as well.

      This is a good point regarding the role of apoptosis signaling in cardiac-specific AARS2 knockout hearts. Since we are focusing on cardiomyocyte phenotypes, co-staining with TUNEL and anti-cTnT directly evaluated the level of cardiomyocyte apoptosis, which was supported by Western blots of control and mutant hearts with anti-Bcl-2 and anti-BAX. TUNEL data accurately represent the biochemical and morphological characteristics of apoptotic cells, and TUNEL is more sensitive than conventional histochemical and biochemical methods. Future studies are needed to address how apoptosis components, including apoptotic caspases, are involved in cardiomyocyte apoptosis in AARS2 mutant hearts.

      (2) It would be better to show the change of apoptosis-related proteins upon the knocking down of AARS2 by small interfering RNA (siRNA).

      Since primary cultures of neonatal cardiomyocytes also contain non-cardiomyocytes, Western blots for apoptosis-related proteins cannot directly assess cardiomyocyte phenotypes. In this work, our data on the elevation of cTnT<sup>+</sup>/TUNEL<sup>+</sup> cardiomyocytes and cardiac fibrosis in AARS2 mutant hearts suggest that AARS2 deficiency induces cardiomyocyte death.

      (3) In Figure 5, the authors performed Mass Spectrometry to assess metabolites of homogenates. I was wondering if the change of other metabolites could be provided in the form of a heatmap.

      Indeed, we assessed other metabolites by mass spectrometry, as shown below. We found that overexpression of AARS2 in either transgenic mouse hearts or neonatal cardiomyocytes caused no consistent changes in the levels of fumarate, succinate, malate, alpha-ketoglutarate (alpha-KG), citrate, oxaloacetate (OAA), ATP, and ADP, suggesting that AARS2 overexpression has a more specific effect on the levels of lactate, pyruvate, and acetyl-CoA.

      Author response image 1.

      (4) The amounts of lactate should be assessed using a lactate assay kit to validate the Mass Spectrometry results.

      We carried out several rounds of mass spectrometry experiments, which showed that lactate is consistently elevated after AARS2 overexpression in neonatal cardiomyocytes, as shown below. We will establish other lactate assays in future studies.

      Author response image 2.

      (5) What is the expression pattern of PKM2 before and after mouse MI? Furthermore, what is the correlation between AARS2 and PKM2?

      Previous studies have shown that the expression level of PKM2 in mice is significantly increased at different time points after cardiac surgery, which may be related to cardiometabolic changes [1]. Our co-IP experiments showed no direct interaction between AARS2 and PKM2 (Figure 6K), while both AARS2 protein and mRNA decreased at 3 days (Figure 1A-B) and 7 days (Author response image 3) after myocardial infarction in mice. Thus, the level of AARS2 is inversely related to that of PKM2 after myocardial infarction.

      Author response image 3.

      (6) In Figure 5, how about the change of apoptosis-related proteins after administration of PKM2 activator TEPP-46?

      It has been shown that TEPP-46 treatment decreased cardiomyocyte death in different models of induced cardiomyocyte apoptosis [2, 3]. We would like to refer to these published works, which show that TEPP-46 treatment improves heart function by inhibiting cardiac injury-induced cardiomyocyte death.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to elucidate the role of AARS2, an alanyl-tRNA synthetase, in mouse hearts, specifically its impact on cardiac function, fibrosis, apoptosis, and metabolic pathways under conditions of myocardial infarction (MI). By investigating the effects of both deletion and overexpression of AARS2 in cardiomyocytes, the study aims to determine how AARS2 influences cardiac health and survival during ischemic stress.

      The authors successfully achieved their aims by demonstrating the critical role of AARS2 in maintaining cardiomyocyte function under ischemic conditions. The evidence presented, including genetic manipulation results, functional assays, and mechanistic studies, robustly supports the conclusion that AARS2 facilitates cardiomyocyte survival through PKM2-mediated metabolic reprogramming. The study convincingly links AARS2 overexpression to improved cardiac outcomes post-MI, validating the proposed protective AARS2-PKM2 signaling pathway.

      This work may have a significant impact on the field of cardiac biology and ischemia research. By identifying AARS2 as a key player in cardiomyocyte survival and metabolic regulation, the study opens new avenues for therapeutic interventions targeting this pathway. The methods used, particularly the cardiomyocyte-specific genetic models and ribosome profiling, are valuable tools that can be employed by other researchers to investigate similar questions in cardiac physiology and pathology.

      Understanding the metabolic adaptations in cardiomyocytes during ischemia is crucial for developing effective treatments for MI. This study highlights the importance of metabolic flexibility and the role of specific enzymes like AARS2 in facilitating such adaptations. The identification of the AARS2-PKM2 axis adds a new layer to our understanding of cardiac metabolism, suggesting that enhancing glycolysis can be a viable strategy to protect the heart from ischemic damage.

      We thank this reviewer for his/her support of our manuscript.

      Strengths:

      (1) Comprehensive Genetic Models: The use of cardiomyocyte-specific AARS2 knockout and overexpression mouse models allowed for precise assessment of AARS2's role in cardiac cells.

      (2) Functional Assays: Detailed phenotypic analyses, including measurements of cardiac function, fibrosis, and apoptosis, provided evidence for the physiological impact of AARS2 manipulation.

      (3) Mechanistic Insights: This study used ribosome profiling (Ribo-Seq) to uncover changes in protein translation, specifically highlighting the role of PKM2 in metabolic reprogramming.

      (4) Therapeutic Relevance: The use of the PKM2 activator TEPP-46 to reverse the effects of AARS2 deficiency presents a potential therapeutic avenue, underscoring the practical implications of the findings.

      Weaknesses:

      (1) Species Limitation: The study is limited to mouse and rat models, and while these are highly informative, further validation in human cells or tissues would strengthen the translational relevance.

      We fully agree with this reviewer that this study is limited to mouse and rat models. It will certainly be important to address how AARS2-PKM2 signaling relates to myocardial infarction in patients in the future.

      (2) Temporal Dynamics: The study does not extensively address the temporal dynamics of AARS2 expression and PKM2 activity during the progression of MI and recovery, which could offer deeper insights into the timing and regulation of these processes.

      Thanks for this critical point. Indeed, we found that both AARS2 protein and mRNA decreased at 3 days (Figure 1A-B) and 7 days (Author response image 3) after myocardial infarction in mice. Others have reported that PKM2 protein increased at different time points after heart surgery in mice [1]. Thus, the level of AARS2 is inversely related to that of PKM2 after myocardial infarction.

      Reviewer #3 (Public Review):

      In the present study, the authors revealed that cardiomyocyte-specific deletion of mouse AARS2 resulted in evident cardiomyopathy with impaired cardiac function, notable cardiac fibrosis, and cardiomyocyte apoptosis. Cardiomyocyte-specific AARS2 overexpression in mice improved cardiac function and reduced cardiac fibrosis after myocardial infarction (MI), without affecting cardiomyocyte proliferation and coronary angiogenesis. Mechanistically, AARS2 overexpression suppressed cardiomyocyte apoptosis and mitochondrial reactive oxygen species production, and changed cellular metabolism from oxidative phosphorylation toward glycolysis in cardiomyocytes, thus leading to cardiomyocyte survival from ischemia and hypoxia stress. Ribo-Seq revealed that AARS2 overexpression increased pyruvate kinase M2 (PKM2) protein translation and the ratio of PKM2 dimers to tetramers that promote glycolysis. Additionally, PKM2 activator TEPP-46 reversed cardiomyocyte apoptosis and cardiac fibrosis caused by AARS2 deficiency. Thus, this study demonstrates that AARS2 plays an essential role in protecting cardiomyocytes from ischemic pressure via fine-tuning PKM2-mediated energy metabolism, and presents a novel cardiac protective AARS2-PKM2 signaling during the pathogenesis of MI. This study provides some new knowledge in the field, and there are still some questions that need to be addressed in order to better support the authors' views.

      We thank this reviewer for his/her overall support of our manuscript.

      (1) WGA staining showed obvious cardiomyocyte hypertrophy in the AARS2 cKO heart. Whether AARS affects cardiac hypertrophy needs to be further tested.

      WGA staining is widely used to measure the size of cardiomyocytes in the literature. Here, we found that the size of mutant cardiomyocytes increased by ~20% after AARS2 knockout. In addition, we found that the heart-to-body weight ratio was increased in AARS2 mutant mice compared with control siblings, as shown below.

      Author response image 4.

      (2) The authors observed that AARS2 can improve myocardial infarction, but does AARS2 have an effect on other heart diseases?

      Thanks for this critical point. We agree with this reviewer that it will be important to address in the future whether overexpression of AARS2 confers cardiac protection in other heart disease models, such as transverse aortic constriction.

      (3) Studies have shown that hypoxia conditions can lead to mitochondrial dysfunction, including abnormal division and fusion. AARS2 also affects mitochondrial division and fusion and interacts with mitochondrial proteins, including FIS and DRP1, the authors are suggested to verify.

      This is a good point. Mitochondrial dysfunction occurs when cardiomyocytes are subjected to hypoxia conditions such as myocardial infarction. Our ribosome sequencing data suggested that overexpression of AARS2 had no effect on the level of FIS1 and DRP2 as shown below. We agree with this reviewer that future studies are needed to clarify potential interactions between AARS2 and FIS/DRP1 proteins.

      Author response image 5.

      (4) The authors only examined the role of AARS2 in cardiomyocytes, and fibroblasts are also an important cell type in the heart. Authors should examine the expression and function of AARS2 in fibroblasts.

      We fully agree with this reviewer that AARS2 may also function in cardiac fibroblasts, since it is expressed in fibroblasts and cardiomyocyte-specific AARS2 knockout led to more fibrosis after myocardial infarction. This certainly warrants future investigation.

      (5) Overexpression of AARS2 can inhibit the production of mtROS and has a protective effect against myocardial ischemia and H/R-induced injury. The occurrence of iron death is also closely related to ROS. Does AARS protect the myocardium by regulating the occurrence of iron death?

      We thank this reviewer for this critical point. Our current data cannot rule out the involvement of iron-mediated death in AARS2-mediated cardiac protection, which warrants future investigation.

      (6) Please revise the English grammar and writing style of the manuscript, spelling and grammatical errors should be excluded.

      We apologize for the spelling and grammatical errors. We have now carefully revised the manuscript.

      (7) Recent studies have shown that a decrease in oxygen levels leads to an increase in AARS2, and that lactate rises rapidly without being oxidized. Both of these factors inhibit oxidative phosphorylation and muscle ATP production by increasing mitochondrial protein lactylation, thereby limiting exercise capacity and preventing the accumulation of reactive oxygen species (ROS). This highlights the key role of protein lactylation in regulating mitochondrial oxidative phosphorylation, and the importance of metabolites such as lactate in regulating cell function through feedback mechanisms; that is, cells adapt to low oxygen through metabolic regulation to reduce ROS production and oxidative damage. Does AARS2 in the heart also act in this way?

      This is an interesting question. Overexpression of AARS2 in muscle has previously been reported to increase PDHA1 lactylation and decrease its activity [4]. We therefore examined whether overexpression of AARS2 in cardiomyocytes has a similar effect on PDHA1 lactylation. However, our results showed no evident increase in lactylated PDHA1 in AARS2-overexpressing cardiomyocytes, as shown below. Future studies are needed to explore whether lactylation of other proteins by AARS2 is involved in its cardiac protection function.

      Author response image 6.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for Improved or Additional Experiments, Data, or Analyses:

      (1) Validation in Human Models: It would be great if, in the future, the authors could conduct experiments with human cardiomyocytes derived from induced pluripotent stem cells (iPSCs) to validate the findings in a human context. This would strengthen the translational relevance of the results.

      We fully agree with this reviewer that this study is limited to mouse and rat models. It will certainly be important to address in the future how AARS2-PKM2 signaling relates to myocardial infarction in patients and/or in human iPSC-derived cardiomyocytes.

      (2) Broader Metabolic Analysis: Perform comprehensive metabolic profiling (e.g., metabolomics) to identify other metabolic pathways influenced by AARS2 overexpression or deficiency. This could provide a more holistic view of the metabolic changes and potential compensatory mechanisms.

      As noted above, we assessed other metabolites by mass spectrometry. We found that overexpression of AARS2 in either transgenic mouse hearts or neonatal cardiomyocytes caused no consistent changes in the levels of fumarate, succinate, malate, alpha-ketoglutarate (alpha-KG), citrate, oxaloacetic acid (OAA), ATP, and ADP, suggesting that AARS2 overexpression has a more specific effect on the levels of lactate, pyruvate, and acetyl-CoA.

      (3) Temporal Dynamics: Investigate the temporal expression and activity of AARS2 and PKM2 during the progression and recovery phases of myocardial infarction. Time-course studies could elucidate the dynamics and regulatory mechanisms involved.

      As noted above, we found that both AARS2 protein and mRNA decreased on the third and seventh days after myocardial infarction in mice. Others have reported that PKM2 protein increased at different time points after heart surgery in mice [1]. Thus, the level of AARS2 is inversely related to that of PKM2 after myocardial infarction.

      (4) Investigate Additional Pathways: Explore the involvement of other signaling pathways and tRNA synthetases that might interact with or complement the AARS2-PKM2 axis. This could uncover broader regulatory networks affecting cardiomyocyte survival and function.

      We thank this reviewer for this critical point. This certainly warrants future investigation.

      (5) Mitochondrial Function Assays: Perform detailed mitochondrial function assays, including measurements of mitochondrial respiration and membrane potential, to further elucidate the role of AARS2 in mitochondrial health and function under stress conditions.

      We fully agree with this reviewer that future studies are needed to address how AARS2 is involved in mitochondrial function.

      (6) Single-Cell Analysis: Utilize single-cell RNA sequencing to examine the heterogeneity in cardiomyocyte responses to AARS2 manipulation, providing insights into cell-specific adaptations and potential differential effects within the heart tissue.

      We fully agree with this reviewer that it will be important to address in the future how AARS2 (cKO or overexpression) regulates cardiomyocyte heterogeneity and function.

      Recommendations for Improving the Writing and Presentation:

      (1) Visual Aids: Include more schematic diagrams to illustrate the proposed mechanisms, especially the AARS2-PKM2 signaling pathway and its impact on metabolic reprogramming. This can help readers better understand complex interactions.

      Below is our working hypothesis on the role of AARS2 in cardiac protection. AARS2 deficiency causes mitochondrial dysfunction by increasing ROS production and apoptosis while decreasing PKM2 function and glycolysis, thus leading to cardiomyopathy in mutant mice. On the other hand, overexpression of AARS2 in mice activates PKM2 and glycolysis while decreasing ROS production and apoptosis, thus improving heart function after myocardial infarction.

      Author response image 7.

      (2) Discussion: Shorten the Discussion and systematically address the significance of the findings, limitations of the study, and potential future directions. This will provide a clearer narrative and context for the results.

      We have now revised the Discussion to highlight the significance of this work and to give a brief perspective on future directions.

      (3) Minor corrections to the text and figures.

      We have now revised the full text carefully.

      (4) Typographical Errors: Carefully proofread the manuscript to correct any typographical errors and ensure consistent use of terminology and abbreviations throughout the text.

      Thanks. Based on the reviewer’s suggestions, we have carefully revised and proofread the whole manuscript.

      Availability of data, code, reagents, research ethics, or other issues:

      (1) Data Presentation: Ensure that all graphs and charts are clearly labeled with appropriate units, scales, and legends. Use color schemes that are accessible to color-blind readers.

      We followed these rules to present the data.

      (2) Supplementary Information: Provide detailed supplementary information, including raw data, experimental protocols, and analysis scripts, to enhance the reproducibility of the study.

      We provided the raw data, experimental protocols, and analysis scripts in the manuscript.

      (3) Data and Code Availability. Data Sharing: Authors should ensure that all raw data, processed data, and relevant metadata are deposited in publicly accessible repositories. Provide clear instructions on how to access these data. Code Availability: Make all analysis code available in a public repository, such as GitHub, with adequate documentation to allow other researchers to replicate the analyses.

      We have deposited RNA-Seq data at ArrayExpress (E-MTAB-13767). We have also uploaded the original data in the supplementary file.

      (4) Research Ethics and Compliance. Ethics Statement: Include a detailed statement on the ethical approval obtained for animal experiments, specifying the institution and ethical review board that granted approval. Conflict of Interest: Clearly state any potential conflicts of interest and funding sources that supported the research to ensure transparency.

      Thanks. The manuscript includes an ethics statement and states conflicts of interest and sources of funding.

      References:

      (1) Y. Tang, M. Feng, Y. Su, T. Ma, H. Zhang, H. Wu, X. Wang, S. Shi, Y. Zhang, Y. Xu, S. Hu, K. Wei, D. Xu, Jmjd4 Facilitates Pkm2 Degradation in Cardiomyocytes and Is Protective Against Dilated Cardiomyopathy, Circulation, 147 (2023) 1684-1704.

      (2) L. Guo, L. Wang, G. Qin, J. Zhang, J. Peng, L. Li, X. Chen, D. Wang, J. Qiu, E. Wang, M-type pyruvate kinase 2 (PKM2) tetramerization alleviates the progression of right ventricle failure by regulating oxidative stress and mitochondrial dynamics, Journal of translational medicine, 21 (2023) 888.

      (3) B. Saleme, V. Gurtu, Y. Zhang, A. Kinnaird, A.E. Boukouris, K. Gopal, J.R. Ussher, G. Sutendra, Tissue-specific regulation of p53 by PKM2 is redox dependent and provides a therapeutic target for anthracycline-induced cardiotoxicity, Science translational medicine, 11 (2019).

      (4) Y. Mao, J. Zhang, Q. Zhou, X. He, Z. Zheng, Y. Wei, K. Zhou, Y. Lin, H. Yu, H. Zhang, Y. Zhou, P. Lin, B. Wu, Y. Yuan, J. Zhao, W. Xu, S. Zhao, Hypoxia induces mitochondrial protein lactylation to limit oxidative phosphorylation, Cell research, 34 (2024) 13-30.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Cystinosis is a rare hereditary disease caused by biallelic loss of the CTNS gene, encoding two cystinosin protein isoforms; the main isoform is expressed in lysosomal membranes where it mediates cystine efflux whereas the minor isoform is expressed at the plasma membrane and in other subcellular organelles. Sur et al proceed from the assumption that the pathways driving the cystinosis phenotype in the kidney might be identified by comparing the transcriptome profiles of normal vs CTNS-mutant proximal tubular cell lines. They argue that key transcriptional disturbances in mutant kidney cells might not be present in non-renal cells such as CTNS-mutant fibroblasts.

      Using cluster analysis of the transcriptomes, the authors selected a single vacuolar H+ATPase (ATP6V0A1) for further study, asserting that it was the "most significantly downregulated" vacuolar H+ATPase (about 58% of control) among a group of similarly downregulated H+ATPases. They then showed that exogenous ATP6V0A1 improved CTNS(-/-) RPTEC mitochondrial respiratory chain function and decreased autophagosome LC3-II accumulation, characteristic of cystinosis. The authors then treated mutant RPTECs with 3 "antioxidant" drugs, cysteamine, vitamin E, and astaxanthin (ATX). ATX (but not the other two antioxidant drugs) appeared to improve ATP6V0A1 expression, LC3-II accumulation, and mitochondrial membrane potential. Respiratory chain function was not studied. RPTEC cystine accumulation was not studied.

      In this manuscript, as an initial step, we have studied the first step in respiratory chain function by performing the Seahorse Mito Stress Test to demonstrate that the genetic manipulation (knocking out the CTNS gene and plasmid-mediated expression correction of ATP6V0A1) impacts mitochondrial energetics. We did not investigate the respirometry-based assays that can identify locations of electron transport deficiency, which we plan to address in a follow-up paper.

      We would like to draw attention to Figure 3D, where cystine accumulation has been studied. This figure demonstrates an increased intracellular accumulation of cystine.

      The major strengths of this manuscript reside in its two primary findings.

      (1) Plasmid expression of exogenous ATP6V0A1 improves mitochondrial integrity and reduces aberrant autophagosome accumulation.

      (2) Astaxanthin partially restores suboptimal endogenous ATP6V0A1 expression.

      Taken together, these observations suggest that astaxanthin might constitute a novel therapeutic strategy to ameliorate defective mitochondrial function and lysosomal clearance of autophagosomes in the cystinotic kidney. This might act synergistically with the current therapy (oral cysteamine) which facilitates defective cystine efflux from the lysosome.

      There are, however, several weaknesses in the manuscript.

      (1) The reductive approach that led from transcriptional profiling to focus on ATP6V0A1 is not transparent and weakens the argument that potential therapies should focus on correction of this one molecule vs the other H+ ATPase transcripts that were equally reduced - or transcripts among the 1925 belonging to at least 11 pathways disturbed in mutant RPTECs.

      The transcriptional profiling studies on ATP6V0A1 have been fully discussed and publicly shared. Table 2 lists the v-ATPase transcripts that are significantly downregulated in cystinosis RPTECs. We have also clarified and justified the choice of further studies on ATP6V0A1, where we state the following: "The most significantly perturbed member of the V-ATPase gene family found to be downregulated in cystinosis RPTECs is ATP6V0A1 (Table 2). Therefore, further attention was focused on characterizing the role of this particular gene in a human in vitro model of cystinosis."

      (2) A precise description of primary results is missing -- the Results section is preceded by or mixed with extensive speculation. This makes it difficult to dissect valid conclusions from those derived from less informative experiments (eg data on CDME loading, data on whole-cell pH instead of lysosomal pH, etc).

      We appreciate the reviewer highlighting areas for further improving the manuscript's readability. In our resubmission, we have revised the Results section to provide a more precise description of the primary findings and to restrict the inferences to the Discussion section.

      (3) Data on experimental approaches that turned out to be uninformative (eg CDME loading, or data on whole-cell pH assessment with BCECF).

      We have provided the data regardless of whether they were informative. Although a lysosome-specific pH measurement would be important, it was not possible in our cells because they were very sick and the assay did not work. Hence we provide pH data obtained with BCECF, which reports whole-cell pH, that is, the overall pH of the cytoplasm and organelles.

      (4) The rationale for the study of ATX is unclear and the mechanism by which it improves mitochondrial integrity and autophagosome accumulation is not explored (but does not appear to depend on its anti-oxidant properties).

      We have provided the rationale for the study of ATX in the Introduction and Results sections, where we state the following: “correction of ATP6V0A1 in CTNS-/- RPTECs and treatment with antioxidants specifically, astaxanthin (ATX) increased the production of cellular ATP6V0A1, identified from a custom FDA-drug database generated by our group, partially rescued the nephropathic RPTEC phenotype. ATX is a xanthophyll carotenoid occurring in a wide variety of organisms. ATX is reported to have the highest known antioxidant activity and has proven to have various anti-inflammatory, anti-tumoral, immunomodulatory, anti-cancer, and cytoprotective activities both in vivo and in vitro”.

      We are still investigating the mechanism by which ATX improves mitochondrial integrity, and this will be the focus of a follow-on manuscript.

      (5) Thoughtful discussion on the lack of effect of ATP6V0A1 correction on cystine efflux from the lysosome is warranted, since this is presumably sensitive to intralysosomal pH.

      In the revised manuscript, we have included a detailed discussion of the plausible reasons why ATP6V0A1 correction has no effect on cystine efflux from the lysosome. We have now added to the Discussion – “However, correcting ATP6V0A1 had no effect on cellular cystine levels, likely because cystinosin is known to have multiple roles beyond cystine transport. Cystinosin is demonstrated to be crucial for activating mTORC1 signaling by directly interacting with v-ATPases and other mTORC1 activators. Cystine depletion using cysteamine does not affect mTORC1 signaling. Our data, along with these observations, further supports that cystinosin has multiple functions and that its cystine transport activity is not mediated by ATP6V0A1.”

      (6) Comparisons between RPTECs and fibroblasts cannot take into account the effects of immortalization on cell phenotype (not performed in fibroblasts).

      The purpose of examining different tissue sources of primary cells in nephropathic cystinosis was to assess if any of the changes in these cells were tissue source specific. We used primary cells isolated from patients with nephropathic cystinosis—RPTECs from patients' urine and fibroblasts from patients' skin—these cells are not immortalized and can therefore be compared. This is noted in the results section - “Specific transcriptional signatures are observed in cystinotic skin-fibroblasts and RPTECs obtained from the same individual with cystinosis versus their healthy counterparts”.

      We next utilized the immortalized RPTEC cell line to create CRISPR-mediated CTNS knockout RPTECs as a resource for studying the pathophysiology of cystinosis. These cells were not compared to the primary fibroblasts.

      (7) This work will be of interest to the research community but is self-described as a pilot study. It remains to be clarified whether transient transfection of RPTECs with other H+ATPases could achieve results comparable to ATP6VOA1. Some insight into the mechanism by which ATX exerts its effects on RPTECs is needed to understand its potential for the treatment of cystinosis.

      In future studies, we will further investigate the effect of ATX on RPTECs for the treatment of cystinosis. This will require Phase 1 and Phase 2 clinical studies, which are beyond the scope of the current manuscript.

      Reviewer #2 (Public Review):

      Sur and colleagues investigate the role of ATP6V0A1 in mitochondrial function in cystinotic proximal tubule cells. They propose that loss of cystinosin downregulates ATP6V0A1, resulting in loss of acidic lysosomal pH, and adversely modulates mitochondrial function and lifespan in cystinotic RPTECs. They further investigate the use of a novel therapeutic, astaxanthin (ATX), to upregulate ATP6V0A1, which may improve mitochondrial function in cystinotic proximal tubules.

      The new information regarding the specific proximal tubular injuries in cystinosis identifies potential molecular targets for treatment. As such, the authors are advancing the field in an experimental model for potential translational application to humans.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) There is a lack of care with precise wording and punctuation, which negatively affects the text. Importantly, the manuscript lacks a clear description of experimental Results. This section begins with speculation, then wanders through experimentation that didn't work (could be deleted). Figure 1A and lines 94-102 could be deleted. Data from CDME loading was found to be a "poor surrogate" for cystinosis and could be deleted from the manuscript or mentioned as a minor point in the discussion. The number of individual patient cell lines used for experimentation is unclear - 8 patients are mentioned on line 109, Figure 2B shows 6 normal fibroblasts, 3 CDME-loaded fibroblasts, and an indeterminate number of normal vs CDME-loaded cells (both colored red). Cluster analysis refers to two large gene clusters - data supporting this key conclusion is not shown. It is unclear why ATP6V0A1 was selected as the most significantly reduced H+ATPase from Table II. Thus, the focus on this particular gene appears to be largely "a hunch".

      In this study, we aim to establish a new concept by using multiple cell types and various assays tailored to each affected organelle, which might be confusing. Therefore, we believe Figure 1a provides a roadmap and helps clarify what to expect from this paper.

      This study began a decade ago, when CDME-mediated lysosomal loading was regularly used as a surrogate in vitro model to study cystinosis tissue injury. That is why CDME was included in the study design. Since we already had the CDME-treated data, and since this article presents a superior in vitro cystinosis model for comparison, we would like to retain these data.

      In the Results and Methods sections, we mentioned “8 patients” with nephropathic cystinosis from whom we collected the RPTECs and fibroblasts. These cystinotic cells are shown as blue and purple dots, respectively, in figure 2B. Normal RPTECs and fibroblasts were purchased from a commercial supplier, and these cells were then treated with CDME to artificially load lysosomes with cystine. Details on the cell types and their procurement can be found in the Methods section under “Study design and Samples”. Normal and CDME-loaded RPTECs are shown as red and orange dots, whereas normal and CDME-loaded fibroblasts are shown as green and yellow dots, respectively, in figure 2B.

      We removed this figure from the manuscript because the data is already detailed in Tables 1 and 2. As a sub-figure, the STRING pathway analysis output was illegible and did not add any new information. However, for your reference, we have now provided this data below.

      Author response image 1.

      STRING pathway analysis using the microarray transcriptomic data from normal vs. cystinotic RPTECs. Using K-means clustering on the genes in these significantly enriched pathways, we identified 2 distinct clusters, shown as red and green nodes. Red nodes are enriched in nucleus-encoded mitochondrial genes and the v-ATPase family, which are crucial for lysosomes and kidney tubular acid secretion. ATP6V0A1, the topmost v-ATPase in our cystinotic transcriptome dataset, is highlighted in cyan. Green nodes are enriched in genes needed for DNA replication.

      (2) It was decided to use transcriptional profiling of CTNS mutant vs wildtype renal proximal tubular cells (RPTECs) as a way to uncover defective secondary molecular pathways that might be upstream drivers of the cystinosis phenotype. Since the kidneys are the first organs to deteriorate in cystinosis, it is postulated that transcriptome differences might be more obvious in kidney cells than in non-renal tissues, such as fibroblasts. A potential pitfall is that the RPTECs were transformed cell lines whereas fibroblasts were not.

      Transcriptional profiling was done on primary cells isolated from patients with nephropathic cystinosis—RPTECs from patients' urine and fibroblasts from patients' skin—these cells are not immortalized and can therefore be compared. This is noted in the results section - “Specific transcriptional signatures are observed in cystinotic skin-fibroblasts and RPTECs obtained from the same individual with cystinosis versus their healthy counterparts”.

      We utilized the immortalized RPTEC cell line to create CRISPR-mediated CTNS knockout RPTECs as a resource for studying the pathophysiology of cystinosis. These cells were not compared to the primary fibroblasts.

      (3) The authors wanted to study intralysosomal pH but could not, so used a pH-sensitive dye that reflects whole cell pH. It would be incorrect to take this measurement as support for their hypothesis that intralysosomal pH is increased. Since these experiments cannot be interpreted, they should be deleted from the manuscript.

      We have now corrected the term to "intracellular pH." Although measuring lysosome-specific pH would be important, it was not feasible in our cells because knocking out the cystinosin gene made them fragile, rendering the assay ineffective. Therefore, we provide data on pH assessment using BCECF, which measures the overall pH of the cytoplasm and organelles. This information is still valuable for understanding whole-cell pH, encompassing both organelle and cytoplasmic pH. We have mentioned this as one of our limitations in the Discussion section.

      (4) The choice of ATX as a potential therapy is puzzling. Its antioxidant properties seem to be irrelevant since two other antioxidants had no effect. The mechanism by which it appears to correct some aspects of the cystinosis phenotype remains unknown and this should be pointed out. A key experiment to assess whether ATX reduces lysosomal cystine accumulation is missing. While the impact of ATX on cystinosis is interesting, the mechanism is unexplored.

      A detailed study on the mechanism by which ATX corrects certain aspects of the cystinosis phenotype is currently underway and will be presented in a follow-up paper. We have measured the effect of ATX and cysteamine, both individually and combined, on cystine accumulation using HPLC, as shown in the figure below. Our results indicate a significant increase in cystine levels with ATX treatment alone, while the combined ATX and cysteamine treatment significantly reduced cystine accumulation to the normal level. This suggests that ATX addresses specific aspects of the cystinosis phenotype through a different mechanism, not by reducing the accumulated cystine levels. When co-administered, ATX and cysteamine have the potential to compensate for each other's shortcomings. We believe that the increase in cystine with ATX alone may be due to interactions between ATX's ketone or hydroxyl groups and cystine's amine or carboxylic groups. Further research on this interaction is ongoing.

      We have now added to the Discussion – “We noticed a significant increase in cystine levels with ATX treatment alone (data not shown in the manuscript), while the combined ATX and cysteamine treatment significantly reduced cystine accumulation to the normal level. This may suggest that when co-administered with cysteamine, they have the potential to complement each other's shortcomings. We believe that the increase in cystine with ATX alone could be due to interactions between ATX's ketone or hydroxyl groups and cystine's amine or carboxylic groups. Further research on this interaction is ongoing.”

      Author response image 2.

      (5) The effects of exogenous ATP6V0A1 are interesting but had no effect on lysosomal cystine efflux, a hallmark of the cystinosis cellular phenotype. A discussion of this issue would be important.

      In the revised manuscript, we have included a detailed discussion on the plausible reasons why ATP6V0A1 correction has no effect on cystine efflux from the lysosome. We have added to the Discussion – “However, correcting ATP6V0A1 had no effect on cellular cystine levels (Figure 7C), likely because cystinosin is known to have multiple roles beyond cystine transport. Cystinosin is demonstrated to be crucial for activating mTORC1 signaling by directly interacting with v-ATPases and other mTORC1 activators. Cystine depletion using cysteamine does not affect mTORC1 signaling (47). Our data, along with these observations, further supports that cystinosin has multiple functions and that its cystine transport activity is not mediated by ATP6V0A1.”

      (6) The arguments on lines 260-273 are not comprehensible. The authors confirm that RPTEC LC3-II levels are increased, a marker of active processing of autophagosome cargo, prior to delivery to lysosomes. Discussion of bafilomycin (not used), mTORC activity, and endocytosis are not directly relevant and wander from interpretation of the LC3-II observation. One possibility is that the 50% decrease in ATP6V0A1 transcript is sufficient to slow the transfer of LC3-II-tagged cargo from autophagosome to lysosome - however, it would be important to offer a plausible explanation for why decreased ATP6V0A1 expression alone does not appear to be the key limitation on lysosomal cystine efflux.

      We have now rephrased our explanation in the Discussion section – “Cystinotic cells are known to have an increased autophagy or reduced autophagosome turnover rate. Autophagic flux in a cell is typically assessed by examining the accumulation of the autophagosome or autophagy-lysosome marker LC3B-II. This accumulation can be artificially induced using bafilomycin, which targets the V-ATPase, thereby inhibiting lysosomal acidification and degradation of its contents. Taken together, the observed innate increase in LC3B-II in cystinotic RPTECs (Figure 5A) without bafilomycin treatment suggests dysfunctional lysosomal acidification and thus could be linked to inhibited v-ATPase activity”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations

      Recommendation #1: Address potential confounds in the experimental design:

      (1a) Confounding factors between baseline to early learning. While the visual display of the curved line remains constant, there are at least three changes between these two phases: 1) the presence of reward feedback (the focus of the paper); 2) a perturbation introduced to draw a hidden, mirror-symmetric curved line; 3) instructions provided to use reward feedback to trace the line on the screen (intentionally deceitful). As such, it remains unclear which of these factors are driving the changes in both behavior and BOLD signals between the two phases. The absence of a veridical feedback phase in which participants received reward feedback associated with the shown trajectory seems like a major limitation.

      (1b) Confounding Factors Between Early and Late Learning. While the authors have focused on interpreting changes from early to late due to the explore-exploit trade-off, there are three additional factors possibly at play: 1) increasing fatigue, 2) withdrawal of attention, specifically related to individuals who have either successfully learned the perturbation within the first few trials or those who have simply given up, or 3) increasing awareness of the perturbation (not clear if subjective reports about perturbation awareness were measured.). I understand that fMRI research is resource-intensive; however, it is not clear how to rule out these alternatives with their existing data without additional control groups. [Another reviewer added the following: Why did the authors not acquire data during a control condition? How can we be confident that the neural dynamics observed are not due to the simple passage of time? Or if these effects are due to the task, what drives them? The reward component, the movement execution, increased automaticity?]

      We have opted to address both of these points above within a single reply, as together they suggest potential confounding factors across the three phases of the task. We would agree that, if the results of our pairwise comparisons (e.g., Early > Baseline or Late > Early) were considered in isolation from one another, then these critiques of the study would be problematic. However, when considering the pattern of effects across the three task phases, we believe most of these critiques can be dismissed. Below, we first describe our results in this context, and then discuss how they address the reviewers’ various critiques.

      Recall that from Baseline to Early learning, we observe an expansion of several cortical areas (e.g., core regions in the DMN) along the manifold (red areas in Fig. 4A, see manifold shifts in Fig. 4C) that subsequently exhibit contraction during Early to Late learning (blue areas in Fig. 4B, see manifold shifts in Fig. 4D). We show this overlap in brain areas in Author response image 1 below, panel A. Notably, several of these brain areas appear to contract back to their original, Baseline locations along the manifold during Late learning (compare Fig. 4C and D). This is evidenced by the fact that many of these same regions (e.g., DMN regions, in Author response image 1 panel A below) fail to show a significant difference between the Baseline and Late learning epochs (see Author response image 1 panel B below, which is taken from supplementary Fig 6). That is, the regions that show significant expansion and subsequent contraction (in Author response image 1 panel A below) tend not to overlap with the regions that significantly changed over the time course of the task (in Author response image 1 panel B below).

      Author response image 1.

      Note that this basic observation above is not only true of our regional manifold eccentricity data, but also in the underlying functional connectivity data associated with individual brain regions. To make this second point clearer, we have modified and annotated our Fig. 5 and included it below. Note the reversal in seed-based functional connectivity from Baseline to Early learning (leftmost brain plots) compared to Early to Late learning (rightmost brain plots). That is, it is generally the case that for each seed-region (A-C) the areas that increase in seed-connectivity with the seed region (in red; leftmost plot) are also the areas that decrease in seed-connectivity with the seed region (in blue; rightmost plot), and vice versa. [Also note that these connectivity reversals are conveyed through the eccentricity data: the horizontal red line in the rightmost plots denotes the mean eccentricity of these brain regions during the Baseline phase, helping to highlight the fact that the eccentricity of the Late learning phase reverses back towards this Baseline level].

      Author response image 2.

      Critically, these reversals in brain connectivity noted above directly counter several of the critiques noted by the reviewers. For instance, this reversal pattern of effects argues against the idea that our results during Early Learning can be simply explained due to the (i) presence of reward feedback, (ii) presence of the perturbation or (iii) instructions to use reward feedback to trace the path on the screen. Indeed, all of these factors are also present during Late learning, and yet many of the patterns of brain activity during this time period revert back to the Baseline patterns of connectivity, where these factors are absent. Similarly, this reversal pattern strongly refutes the idea that the effects are simply due to the passage of time, increasing fatigue, or general awareness of the perturbation. Indeed, if any of these factors alone could explain the data, then we would have expected a gradual increase (or decrease) in eccentricity and connectivity from Baseline to Early to Late learning, which we do not observe. We believe these are all important points when interpreting the data, but which we failed to mention in our original manuscript when discussing our findings.

      We have now rectified this in the revised paper, where we now write in our Discussion:

      “Finally, it is important to note that the reversal pattern of effects noted above suggests that our findings during learning cannot be simply attributed to the introduction of reward feedback and/or the perturbation during Early learning, as both of these task-related features are also present during Late learning. In addition, these results cannot be simply explained due to the passage of time or increasing subject fatigue, as this would predict a consistent directional change in eccentricity across the Baseline, Early and Late learning epochs.”
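
      The logic of this argument can be made concrete with a small sketch. The Python function below is illustrative only and is not the analysis code used in the study; `classify_pattern`, its `tol` parameter, and the example values are all hypothetical. It labels a three-epoch series as a "reversal" when the Early-to-Late change has the opposite sign to the Baseline-to-Early change, and as "monotonic" when both changes share a sign, which is what an explanation based purely on elapsed time or fatigue would predict.

```python
def classify_pattern(baseline, early, late, tol=0.0):
    """Label how a measure (e.g., eccentricity) changes across three epochs.

    All names and thresholds here are hypothetical, for illustration only.
    """
    d_early = early - baseline  # change from Baseline to Early learning
    d_late = late - early       # change from Early to Late learning
    if abs(d_early) <= tol or abs(d_late) <= tol:
        return "inconclusive"   # at least one change is too small to call
    if (d_early > 0) != (d_late > 0):
        return "reversal"       # expansion then contraction, or vice versa
    return "monotonic"          # consistent drift across all three epochs


# Made-up values, not data from the study.
print(classify_pattern(1.0, 2.0, 1.1))  # rises, then returns toward baseline
print(classify_pattern(1.0, 1.5, 2.0))  # rises in both steps
```

      In practice each change would be assessed with a statistical test rather than a fixed tolerance; the sketch only captures the sign logic of the argument.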

      However, having said the above, we acknowledge that one potential factor that our findings cannot exclude is that they are (at least partially) attributable to changes in subjects’ state of attention throughout the task. Indeed, one can certainly argue that Baseline trials in our study don’t require a great deal of attention (after all, subjects are simply tracing a curved path presented on the screen). Likewise, for subjects that have learned the hidden shape, the Late learning trials are also likely to require limited attentional resources (indeed, many subjects at this point are simply producing the same shape trial after trial). Consequently, the large shift in brain connectivity that we observe from Baseline to Early Learning, and the subsequent reversion back to Baseline-levels of connectivity during Late learning, could actually reflect a heightened allocation of attention as subjects are attempting to learn the (hidden) rewarded shape. However, we do not believe that this would reflect a ‘confound’ of our study per se — indeed, any subject who has participated in a motor learning study would agree that the early learning phase of a task is far more cognitively demanding than Baseline trials and Late learning trials. As such, it is difficult to disentangle this ‘attention’ factor from the learning process itself (and in fact, it is likely central to it).

      Of course, one could have designed a ‘control’ task in which subjects must direct their attention to something other than the learning task itself (e.g., a divided-attention paradigm; Taylor & Thoroughman, 2007, 2008) and/or perform a secondary task concurrently (Codol et al., 2018; Holland et al., 2018), but we know that this type of manipulation impairs the learning process itself. Thus, in such a case, it wouldn’t be obvious to the experimenter what they are actually measuring in brain activity during such a task. And, to extend this argument even further, it is true that any sort of brain-based modulation can be argued to reflect some ‘attentional’ process, rather than modulations related to the specific task-based process under consideration (in our case, motor learning). In this regard, we are sympathetic to the views of Richard Andersen and colleagues who have eloquently stated that “The study of how attention interacts with other neural processing systems is a most important endeavor. However, we think that over-generalizing attention to encompass a large variety of different neural processes weakens the concept and undercuts the ability to develop a robust understanding of other cognitive functions.” (Andersen & Cui, 2007, Neuron). In short, it appears that different fields/researchers have differing views on the usefulness of attention as an explanatory construct (see also articles from Hommel et al., 2019, “No one knows what attention is”, and Wu, 2023, “We know what attention is!”), and we personally don’t have a dog in this fight. We only highlight these issues to draw attention (no pun intended) to the fact that it is not trivial to separate these different neural processes during a motor learning study.

      Nevertheless, we do believe these are important points worth flagging for the reader in our paper, as they might have similar questions. To this end, we have now included in our Discussion section the following text:

      “It is also possible that some of these task-related shifts in connectivity relate to shifts in task-general processes, such as changes in the allocation of attentional resources (Bédard and Song, 2013; Rosenberg et al., 2016) or overall cognitive engagement (Aben et al., 2020), which themselves play critical roles in shaping learning (Codol et al., 2018; Holland et al., 2018; Song, 2019; Taylor and Thoroughman, 2008, 2007; for a review of these topics, see Tsay et al., 2023). Such processes are particularly important during the earlier phases of learning when sensorimotor contingencies need to be established. While these remain questions for future work, our data nevertheless suggest that this shift in connectivity may be enabled through the PMC.”

      Finally, we should note that, at the end of testing, we did not assess participants' awareness of the manipulation (i.e., that they were, in fact, being rewarded based on a mirror image path). In hindsight, this would have been a good idea and provided some value to the current project. Nevertheless, based on several of the learning profiles observed (e.g., subjects who exhibited very rapid learning during the Early Learning phase; more on this below), it seems clear that many individuals became aware of a shape approximating the rewarded path. Note that we have included new figures (see our responses below) that give a better example of what fast versus slower learning looks like. In addition, we now note in our Methods that we did not probe participants about their subjective awareness of the perturbation:

      “Note that, at the end of testing, we did not assess participants’ awareness of the manipulation (i.e., that they were, in fact, being rewarded based on a mirror image path of the visible path).”

      Recommendation #2: Provide more behavioral quantification.

      (2a) The authors chose to only plot the average learning score in Figure 1D, without an indication of movement variability. I think this is quite important, to give the reader an impression of how variable the movements were at baseline, during early learning, and over the course of learning. There is evidence that baseline variability influences the 'detectability' of imposed rotations (in the case of adaptation learning), which could be relevant here. Shading the plots by movement variability would also be important to see if there was some refinement of the movement after participants performed at the ceiling (which seems to be the case ~ after trial 150). This is especially worrying given that in Fig 6A there is a clear indication that there is a large difference between subjects' solutions on the task. One subject exhibits almost a one-shot learning curve (reaching a score of 75 after one or two trials), whereas others don't seem to really learn until the near end. What does this between-subject variability mean for the authors' hypothesized neural processes?

      In line with these recommendations, we have now provided much better behavioral quantification of subject-level performance in both the main manuscript and supplementary material. For instance, in a new supplemental Figure 1 (shown below), we now include mean subject (+/- SE) reaction times (RTs), movement times (MTs) and movement path variability (our computation of these measures is now defined in our Methods section).

      As can be seen in the figure, all three of these variables tended to decrease over the course of the study, though we note there was a noticeable uptick in both RTs and MTs from the Baseline to Early learning phase, once subjects started receiving trial-by-trial reward feedback based on their movements. With respect to path variability, it is not obvious that there was a significant refinement of the paths created during late learning (panel D below), though there was certainly a general trend for path variability to decrease over learning.

      Author response image 3.

      Behavioral measures of learning across the task. (A-D) show average participant reward scores (A), reaction times (B), movement times (C) and path variability (D) over the course of the task. In each plot, the black line denotes the mean across participants and the gray banding denotes +/- 1 SEM. The three equal-length task epochs for subsequent neural analyses are indicated by the gray shaded boxes.
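
      For readers unfamiliar with this kind of summary, the mean and +/- 1 SEM band at each trial can be computed as in the minimal sketch below. This is not the code used to produce the figure: the function names, the layout of `scores` (one list of per-trial values per participant), and the use of the sample standard deviation (n - 1 denominator) are assumptions made for illustration.

```python
import math


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for one trial's values."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return mean, float("nan")  # SEM is undefined for a single participant
    # Sample variance with an n - 1 denominator (an assumed convention).
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


def summarize_by_trial(scores):
    """scores[p][t] is participant p's value on trial t (hypothetical layout)."""
    n_trials = len(scores[0])
    return [mean_and_sem([row[t] for row in scores]) for t in range(n_trials)]


# Made-up values for three participants and two trials.
example = [[40.0, 60.0], [50.0, 80.0], [60.0, 100.0]]
for trial, (mean, sem) in enumerate(summarize_by_trial(example), start=1):
    print(f"trial {trial}: mean={mean:.2f}, band=[{mean - sem:.2f}, {mean + sem:.2f}]")
```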

      In addition to these above results, we have also created a new Figure 6 in the main manuscript, which now solely focuses on individual differences in subject learning (see below). Hopefully, this figure clarifies key features of the task and its reward structure, and also depicts (in movement trajectory space) what fast versus slow learning looks like in the task. Specifically, we believe that this figure now clearly delineates for the reader the mapping between movement trajectory and the reward score feedback presented to participants, which appeared to be a source of confusion based on the reviewers’ comments below. As can be clearly observed in this figure, trajectories that approximated the ‘visible path’ (black line) resulted in fairly mediocre scores (see score color legend at right), whereas trajectories that approximated the ‘reward path’ (dashed black line, see trials 191-200 of the fast learner) resulted in fairly high scores. This figure also more clearly delineates how fPCA loadings derived from our functional data analysis were used to derive subject-level learning scores (panel C).

      Author response image 4.

      Individual differences in subject learning performance. (A) Examples of a good learner (bordered in green) and poor learner (bordered in red). (B) Individual subject learning curves for the task. Solid black line denotes the mean across all subjects whereas light gray lines denote individual participants. The green and red traces denote the learning curves for the example good and poor learners denoted in A. (C) Derivation of subject learning scores. We performed functional principal component analysis (fPCA) on subjects’ learning curves in order to identify the dominant patterns of variability during learning. The top component, which encodes overall learning, explained the majority of the observed variance (~75%). The green and red bands denote the effect of positive and negative component scores, respectively, relative to mean performance. Thus, subjects who learned more quickly than average have a higher loading (in green) on this ‘Learning score’ component than subjects who learned more slowly (in red) than average. The plot at right denotes the loading for each participant (open circles) onto this Learning score component.
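
      The derivation in panel C can be approximated with the sketch below, offered under stated assumptions rather than as the authors' pipeline. It applies ordinary PCA to discretized learning curves and extracts only the top component by power iteration, whereas functional PCA typically also represents each curve with smooth basis functions first. The function name `top_component`, the sign convention, and the example curves are hypothetical.

```python
import math


def top_component(curves, n_iter=200):
    """Top principal component of learning curves (rows = subjects).

    Returns (mean_curve, component, subject_scores, variance_explained).
    A simplified stand-in for fPCA; all names are illustrative.
    """
    n, t = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(t)]
    centered = [[c[j] - mean[j] for j in range(t)] for c in curves]
    # Trial-by-trial sample covariance matrix (t x t).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(t)]
           for a in range(t)]
    # Power iteration from a uniform start vector; this assumes the top
    # component is not orthogonal to that start vector.
    v = [1.0 / math.sqrt(t)] * t
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(t)) for a in range(t)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance at all (identical curves)
        v = [x / norm for x in w]
    if sum(v) < 0:
        v = [-x for x in v]  # orient so higher scores mean above-average curves
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(t))
                     for a in range(t))
    total = sum(cov[a][a] for a in range(t))
    explained = eigenvalue / total if total > 0 else 0.0
    scores = [sum(r[j] * v[j] for j in range(t)) for r in centered]
    return mean, v, scores, explained


# Made-up curves: a fast, an intermediate, and a slow learner.
curves = [[40, 70, 75, 75], [40, 50, 65, 75], [40, 42, 45, 60]]
_, _, scores, explained = top_component(curves)
print([round(s, 2) for s in scores], round(explained, 2))
```

      With this sign convention, subjects whose curves sit above the group mean receive positive scores, mirroring the green (positive) and red (negative) bands described in the caption.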

      The reviewers note that there are large individual differences in learning performance across the task. This was clearly our hope when designing the reward structure of this task, as it would allow us to further investigate the neural correlates of these individual differences (indeed, during pilot testing, we sought out a reward structure to the task that would allow for these intersubject differences). The subjects who learn early during the task end up having higher fPCA scores than the subjects who learn more gradually (or learn the task late). From our perspective, these differences are a feature, and not a bug, and they do not negate any of our original interpretations. That is, subjects who learn earlier on average tend to contract their DAN-A network during the early learning phase whereas subjects who learn more slowly on average (or learn late) instead tend to contract their DAN-A network during late learning (Fig. 7).

      (2b) In the methods, the authors stated that they scaled the score such that even a perfectly traced visible path would always result in an imperfect score of 40 points. What happens if a subject scores perfectly on the first try (which seemed to have happened for the green highlighted subject in Fig 6A), but is then permanently confronted with a score of 40 or below? Wouldn't this result in an error-clamp-like (error-based motor adaptation) design for this subject and all other high performers, which would vastly differ from the task demands for the other subjects? How did the authors factor in the wide between-subject variability?

      We think the reviewers may have misinterpreted the reward structure of the task, and we apologize for not being clearer in our descriptions. The reward score that subjects received after each trial was based on how well they traced the mirror-image of the visible path. However, all the participant can see on the screen is the visible path. We hope that our inclusion of the new Figure 6 (shown above) makes the reward structure of the task, and its relationship to movement trajectories, much clearer. We should also note that, even for the highest performing subject (denoted in Fig. 6), it still required approximately 20 trials for them to reach asymptotic performance.

      (2c) The study would benefit from a more detailed description of participants' behavioral performance during the task. Specifically, it is crucial to understand how participants' motor skills evolve over time. Information on changes in movement speed, accuracy, and other relevant behavioral metrics would enhance the understanding of the relationship between behavior and brain activity during the learning process. Additionally, please clarify whether the display on the screen was presented continuously throughout the entire trial or only during active movement periods. Differences in display duration could potentially impact the observed differences in brain activity during learning.

      We hope that our inclusion of the new Supplementary Figure 1 (shown above) addresses the reviewers’ recommendation. Generally, we find that RTs, MTs and path variability all decrease over the course of the task. We think this relates to the early learning phase being more attentionally demanding, and requiring more conscious effort, than the later learning phases.

      Also, yes, the visible path was displayed on the screen continuously throughout the trial, and only disappeared at the 4.5 second mark of each trial (when the screen was blanked and the data was saved off for 1.5 seconds prior to commencement of the next trial; 6 seconds total per trial). Thus, there were no differences in display duration across trials and phases of the task. We have now clarified this in the Methods section, where we now write the following:

      “When the cursor reached the target distance, the target changed color from red to green to indicate that the trial was completed. Importantly, other than this color change in the distance marker, the visible curved path remained constant and participants never received any feedback about the position of their cursor.”

      (2d) It is unclear from plots 6A, 6B, and 1D how the scale of the behavioral data matches with the scaling of the scores. Are these the 'real' scores, meaning 100 on the y-axis would be equivalent to 40 in the task? Why then do all subjects reach an asymptote at 75? Or is 75 equivalent to 40 and the axis labels are wrong?

      As indicated above, we clearly did a poor job of describing the reward structure of our task in our original paper, and we now hope that our inclusion of Figure 6 makes things clear. A ‘40’ score on the y-axis would indicate that a subject has perfectly traced the visible path whereas a perfect ‘100’ score would indicate that a subject has perfectly traced the (hidden) mirror image path.

      The fact that several of the subjects reach asymptote around 75 is likely a byproduct of two factors. Firstly, the subjects performed their movements in the absence of any visual error feedback (they could not see the position of a cursor that represented their hand position), which had the effect of increasing motor variability in their actions from trial to trial. Secondly, there appears to be an underestimation among subjects regarding the curvature of the concealed, mirror-image path (i.e., that the rewarded path actually had an equal but opposite curvature to that of the visible path). This is particularly evident in the case of the top-performing subject (illustrated in Figure 6A) who, even during late learning, failed to produce a completely arched movement.

      (2e) Labeling of Contrasts: There is a consistent issue with the labeling of contrasts in the presented figures, causing confusion. While the text refers to the difference as "baseline to early learning," the label used in figures, such as Figure 4, reads "baseline > early." It is essential to clarify whether the presented contrast is indeed "baseline > early" or "early > baseline" to avoid any misinterpretation.

      We thank the reviewers for catching this error. Indeed, the intended label was Early > Baseline, and this has now been corrected throughout.

      Recommendation #3. Clarify which motor learning mechanism(s) are at play.

      (3a) Participants were performing at a relatively low level, achieving around 50-60 points by the end of learning. This outcome may not be that surprising, given that reward-based learning might have a substantial explicit component and may also heavily depend on reasoning processes, beyond reinforcement learning or contextual recall (Holland et al., 2018; Tsay et al., 2023). Even within our own data, where explicit processes are isolated, average performance is low and many individuals fail to learn (Brudner et al., 2016; Tsay et al., 2022). Given this, many participants in the current study may have simply given up. A potential indicator of giving up could be a subset of participants moving straight ahead in a rote manner (a heuristic to gain moderate points). Consequently, alterations in brain networks may not reflect exploration and exploitation strategies but instead indicate levels of engagement and disengagement. Could the authors plot the average trajectory and the average curvature changes throughout learning? Are individuals indeed defaulting to moving straight ahead in learning, corresponding to an average of 50-60 points? If so, the interpretation of brain activity may need to be tempered.

      We can do one better, and actually give you a sense of the learning trajectories for every subject over time. In the figure below, which we now include as Supplementary Figure 2 in our revision, we have plotted, for each subject, a subset of their movement trajectories across learning trials (every 10 trials). As can be seen in the diversity of these trajectories, the average trajectory and average curvature would do a fairly poor job of describing the pattern of learning-related changes across subjects. Moreover, it is not obvious from looking at these plots the extent to which poor learning subjects (i.e., subjects who never converge on the reward path) actually ‘give up’ in the task — rather, many of these subjects still show some modulation (albeit minor) of their movement trajectories in the later trials (see the purple and pink traces). As an aside, we are also not entirely convinced that straight ahead movements, which we don’t find many of in our dataset, can be taken as direct evidence that the subject has given up.

      Author response image 5

      Variability in learning across subjects. Plots show representative trajectory data from each subject (n=36) over the course of the 200 learning trials. Coloured traces show individual trials over time (each trace is separated by ten trials, e.g., trial 1, 10, 20, 30, etc.) to give a sense of the trajectory changes throughout the task (20 trials in total are shown for each subject).

      We should also note that we are not entirely opposed to the idea of describing aspects of our findings in terms of subject engagement versus disengagement over time, as such processes are related at some level to exploration (i.e., cognitive engagement in finding the best solution) and exploitation (i.e., cognitively disengaging and automating one’s behavior). As noted in our reply to Recommendation #1 above, we now give some consideration of these explanations in our Discussion section, where we now write:

      “It is also possible that these task-related shifts in connectivity relate to shifts in task-general processes, such as changes in the allocation of attentional resources (Bédard and Song, 2013; Rosenberg et al., 2016) or overall cognitive engagement (Aben et al., 2020), which themselves play critical roles in shaping learning (Codol et al., 2018; Holland et al., 2018; Song, 2019; Taylor and Thoroughman, 2008, 2007; for a review of these topics, see Tsay et al., 2023). Such processes are particularly important during the earlier phases of learning when sensorimotor contingencies need to be established. While these remain questions for future work, our data nevertheless suggest that this shift in connectivity may be enabled through the PMC.”

      (3b) The authors are mixing two commonly used paradigms, reward-based learning, and motor adaptation, but provide no discussion of the different learning processes at play here. Which processes were they attempting to probe? Making this explicit would help the reader understand which brain regions should be implicated based on previous literature. As it stands, the task is hard to interpret. Relatedly, there is a wealth of literature on explicit vs implicit learning mechanisms in adaptation tasks now. Given that the authors are specifically looking at brain structures in the cerebral cortex that are commonly associated with explicit and strategic learning rather than implicit adaptation, how do the authors relate their findings to this literature? Are the learning processes probed in the task more explicit, more implicit, or is there a change in strategy usage over time? Did the authors acquire data on strategies used by the participants to solve the task? How does the baseline variability come into play here?

      As noted in our paper, our task was directly inspired by the reward-based motor learning tasks developed by Dam et al., 2013 (Plos One) and Wu et al., 2014 (Nature Neuroscience). What drew us to these tasks is that they allowed us to study the neural bases of reward-based learning mechanisms in the absence of subjects also being able to exploit error-based mechanisms to achieve learning. Indeed, when first describing the task in the Results section of our paper we wrote the following:

      “Importantly, because subjects received no visual feedback about their actual finger trajectory and could not see their own hand, they could only use the score feedback — and thus only reward-based learning mechanisms — to modify their movements from one trial to the next (Dam et al., 2013; Wu et al., 2014).”

      If the reviewers are referring to ‘motor adaptation’ in the context in which that terminology is commonly used — i.e., the use of sensory prediction errors to support error-based learning — then we would argue that motor adaptation is not a feature of the current study. It is true that in our study subjects learn to ‘adapt’ their movements across trials, but this shaping of the movement trajectories must be supported through reinforcement learning mechanisms (and, of course, supplemented by the use of cognitive strategies as discussed in the nice review by Tsay et al., 2023). We apologize for not being clearer in our paper about this key distinction and we have now included new text in the introduction to our Results to directly address this:

      “Importantly, because subjects received no visual feedback about their actual finger trajectory and could not see their own hand, they could only use the score feedback — and thus only reward-based learning mechanisms — to modify their movements from one trial to the next (Dam et al., 2013; Wu et al., 2014). That is, subjects could not use error-based learning mechanisms to achieve learning in our study, as this form of learning requires sensory errors that convey both the change in direction and magnitude needed to correct the movement.”

      With this issue aside, we are well aware of the established framework for thinking about sensorimotor adaptation as being composed of a combination of explicit and implicit components (indeed, this has been a central feature of several of our other recent neuroimaging studies that have explored visuomotor rotation learning, e.g., Gale et al., 2022 PNAS, Areshenkoff et al., 2022 eLife, Standage et al., 2023 Cerebral Cortex). However, there has been comparatively little work done on these parallel components within the domain of reinforcement learning tasks (though see Codol et al., 2018; Holland et al., 2018; van Mastrigt et al., 2023; see also the Tsay et al., 2023 review), and as far as we can tell, nothing has been done to date in the reward-based motor learning area using fMRI. By design, we avoided using descriptors of ‘explicit’ or ‘implicit’ in our study because our experimental paradigm did not allow a separate measurement of those two components of learning during the task. Nevertheless, it seems clear to us from examining the subjects’ learning curves (see Supplementary Figure 2 above) that individuals who learn very quickly are using strategic processes (such as action exploration to identify the best path) to enhance their learning. As we noted in an above response, we did not query subjects after the fact about their strategy use, which admittedly was a missed opportunity on our part.

      Author response image 6.

      With respect to the comment on baseline variability and its relationship to performance, this is an interesting idea and one that was explored in the Wu et al., 2014 Nature Neuroscience paper. Prompted by the reviewers, we have now explored this idea in the current data set by testing for a relationship between movement path variability during baseline trials (all 70 baseline trials, see Supplementary Figure 1D above for reference) and subjects’ fPCA score on our learning task. However, when we performed this analysis, we did not observe a significant positive relationship between baseline variability and subject performance. Rather, we actually found a trend towards a negative relationship (though this was non-significant; r=-0.2916, p=0.0844). Admittedly, we are not sure what conclusions can be drawn from this analysis, and in any case, we believe it to be tangential to our main results. We provide the results (at right) for the reviewers if they are interested. This may be an interesting avenue for exploration in future work.
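
      As a reading aid for the correlation reported above, the following is a minimal sketch of how a Pearson r converts to a t-statistic. It assumes that all 36 subjects entered the correlation (an assumption; the response does not state the n for this test), and the function names are illustrative rather than taken from the authors' analysis code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def r_to_t(r, n):
    """t-statistic (n - 2 degrees of freedom) for a Pearson r from n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Reported correlation; n = 36 is an assumption (see lead-in).
t_stat = r_to_t(-0.2916, 36)
print(f"t({36 - 2}) = {t_stat:.2f}")
```

      A two-tailed p-value would then come from the t distribution with n - 2 degrees of freedom, which needs a statistics library and is left out of this sketch.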

      Recommendation #4: Provide stronger justification for brain imaging methods.

      (4a) Observing how brain activity varies across these different networks is remarkable, especially how sensorimotor regions separate and then contract with other, more cognitive areas. However, does the signal-to-noise ratio in each area/network influence manifold eccentricity and limit the possible changes in eccentricity during learning? Specifically, if a region has a low signal-to-noise ratio, it might exhibit minimal changes during learning (a phenomenon perhaps relevant to null manifold changes in the striatum due to low signal-to-noise); conversely, regions with higher signal-to-noise (e.g., motor cortex in this sensorimotor task) might exhibit changes more easily detected. As such, it is unclear how to interpret manifold changes without considering an area/network's signal-to-noise ratio.

      We appreciate where these concerns are coming from. First, we should note that the timeseries data used in our analysis were z-transformed (mean zero, 1 std) to allow normalization of the signal both over time and across regions (and thus mitigate the possibility that the changes observed could simply reflect mean overall signal changes across different regions). Nevertheless, differences in signal intensity across brain regions — particularly between cortex and striatum — are well-known, though it is not obvious how these differences may manifest in terms of a task-based modulation of MR signals.

      To examine this issue in the current data set, we extracted, for each subject and time epoch (Baseline, Early and Late learning), the raw scanner data (in MR arbitrary units, a.u.) for the cortical and striatal regions and computed (1) the mean signal intensity, (2) the standard deviation of the signal (Std) and (3) the temporal signal-to-noise ratio (tSNR; calculated as mean/Std). Note that in the fMRI connectivity literature tSNR is often the preferred SNR measure as it normalizes the mean signal based on the signal’s variability over time, thus providing a general measure of overall ‘signal quality’. The results of this analysis, averaged across subjects and regions, are shown below.

      Author response image 7.
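
      The three measures described above (mean, Std and tSNR = mean/Std) can be sketched as follows. This is an illustrative sketch only: the toy time series are hypothetical values constructed to match the approximate group averages quoted in this response (cortex: mean ~625, Std ~5; striatum: mean ~450, Std ~2.5), the use of the population standard deviation is an assumption, and the function names are not from the authors' pipeline.

```python
from statistics import fmean, pstdev


def tsnr(timeseries):
    """Temporal SNR: mean of the raw signal divided by its Std over time."""
    return fmean(timeseries) / pstdev(timeseries)


def zscore(timeseries):
    """Rescale a time series to mean zero and unit standard deviation."""
    mu, sd = fmean(timeseries), pstdev(timeseries)
    return [(x - mu) / sd for x in timeseries]


# Hypothetical raw signals (a.u.) with the approximate means and Stds above.
cortex = [625 + 5 * s for s in (-1, 1, -1, 1)]
striatum = [450 + 2.5 * s for s in (-1, 1, -1, 1)]

print("tSNR cortex:  ", tsnr(cortex))
print("tSNR striatum:", tsnr(striatum))

# After z-scoring, the two regions no longer differ in mean or Std.
print(zscore(cortex) == zscore(striatum))
```

      With these toy values the striatal tSNR exceeds the cortical tSNR even though the raw cortical signal is larger, because the cortical Std is proportionally larger still.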

      Note that, as expected, the overall signal intensity (left plot) of cortex is higher than in the striatum, reflecting the closer proximity of cortex to the receiver coils in the MR head coil. In fact, the signal intensity in cortex is approximately 38% higher than that in the striatum (i.e., (~625 - 450)/450). However, the signal variation in cortex is also greater than in the striatum (middle plot), in this case by approximately 100% (i.e., (~5 - 2.5)/2.5). The result of this is that the tSNR (mean/Std) for our data set and the ROI parcellations we used is actually greater in the striatum than in cortex (right plot). Thus, all else being equal, there seems to have been sufficient tSNR in the striatum for us to have detected motor-learning related effects. As such, we suspect the null effects for the striatum in our study actually stem from two sources.

      The first likely source is the relatively lower number of striatal regions (12) as compared to cortical regions (998) used in our analysis, coupled with our use of PCA on these data (which, by design, identifies the largest sources of variation in connectivity). In future studies, this imbalance could be rectified by using finer parcellations of the striatum (even down to the voxel level) while keeping the same parcellation of cortex (i.e., equating the number of ‘regions’ in striatum and cortex). The second likely source is our use of a striatal atlas (the Harvard-Oxford atlas) that divides brain regions based on their neuroanatomy rather than their function. In future work, we plan on addressing this latter concern by using finer, more functionally relevant parcellations of striatum (such as in Tian et al., 2020, Nature Neuroscience). Note that we sought to capture these interrelated possible explanations in our Discussion section, where we wrote the following:

      “While we identified several changes in the cortical manifold that are associated with reward-based motor learning, it is noteworthy that we did not observe any significant changes in manifold eccentricity within the striatum. While the evidence clearly indicates that this region plays a key role in reward-guided behavior (Averbeck and O’Doherty, 2022; O’Doherty et al., 2017), there are several possible reasons why our manifold approach did not identify this collection of brain areas. First, the relatively small size of the striatum may mean that our analysis approach was too coarse to identify changes in the connectivity of this region. Though we used a 3T scanner and employed a widely-used parcellation scheme that divided the striatum into its constituent anatomical regions (e.g., hippocampus, caudate, etc.), both of these approaches may have obscured important differences in connectivity that exist within each of these regions. For example, areas such as the hippocampus and caudate are not homogeneous areas but themselves exhibit gradients of connectivity (e.g., head versus tail) that can only be revealed at the voxel level (Tian et al., 2020; Vos de Wael et al., 2021). Second, while our dimension reduction approach, by design, aims to identify gradients of functional connectivity that account for the largest amounts of variance, the limited number of striatal regions (as compared to cortex) means that their contribution to the total whole-brain variance is relatively small. Consistent with this perspective, we found that the low-dimensional manifold architecture in cortex did not strongly depend on whether or not striatal regions were included in the analysis (see Supplementary Fig. 6). As such, selective changes in the patterns of functional connectivity at the level of the striatum may be obscured using our cortex x striatum dimension reduction approach.

      Future work can help address some of these limitations by using both finer parcellations of striatal cortex (perhaps even down to the voxel level) (Tian et al., 2020) and by focusing specifically on changes in the interactions between the striatum and cortex during learning. The latter can be accomplished by selectively performing dimension reduction on the slice of the functional connectivity matrix that corresponds to functional coupling between striatum and cortex.”

      (4b) Could the authors clarify how activity in the dorsal attention network (DAN) changes throughout learning, and how these changes also relate to individual differences in learning performance? Specifically, on average, the DAN seems to expand early and contract late, relative to the baseline. This is interpreted to signify that the DAN exhibits lesser connectivity followed by greater connectivity with other brain regions. However, in terms of how these changes relate to behavior, participants who go against the average trend (DAN exhibits more contraction early in learning, and expansion from early to late) seem to exhibit better learning performance. This finding is quite puzzling. Does this mean that the average trend of expansion and contraction is not facilitative, but rather detrimental, to learning? [Another reviewer added: The authors do not state any explicit hypotheses, but only establish that DMN coordinates activity among several regions. What predictions can we derive from this? What are the authors looking for in the data? The work seems more descriptive than hypothesis-driven. This is fine but should be clarified in the introduction.]

      These are good questions, and we are glad the reviewers appreciated the subtlety here. The reviewers are indeed correct that the relationship of the DAN-A network to behavioral performance appears to go against the grain of the group-level results that we found for the entire DAN network (which we note is composed of both the DAN-A and DAN-B networks). That is, subjects who exhibited greater contraction from Baseline to Early learning and likewise, greater expansion from Early to Late learning, tended to perform better in the task (according to our fPCA scores). However, on this point it is worth noting that it was mainly the DAN-B network which exhibited group-level expansion from Baseline to Early Learning whereas the DAN-A network exhibited negligible expansion. This can be seen in Author response image 8 below, which shows the pattern of expansion and contraction (as in Fig. 4), but instead broken down into the 17-network parcellation. The red asterisk denotes the expansion from Baseline to Early learning for the DAN-B network, which is much greater than that observed for the DAN-A network (which is basically around the zero difference line).

      Author response image 8.

      Thus, it appears that the DAN-A and DAN-B networks are modulated to a different extent during the task, which likely contributes to the perceived discrepancy between the group-level effects (reported using the 7-network parcellation) and the individual differences effects (reported using the finer 17-network parcellation). Based on the reviewers’ comments, this seems like an important distinction to clarify in the manuscript, and we have now described this nuance in our Results section where we now write:

      “...Using this permutation testing approach, we found that it was only the change in eccentricity of the DAN-A network that correlated with Learning score (see Fig. 7C), such that the more the DAN-A network decreased in eccentricity from Baseline to Early learning (i.e., contracted along the manifold), the better subjects performed at the task (see Fig. 7C, scatterplot at right). Consistent with the notion that changes in the eccentricity of the DAN-A network are linked to learning performance, we also found the inverse pattern of effects during Late learning, whereby the more that this same network increased in eccentricity from Early to Late learning (i.e., expanded along the manifold), the better subjects performed at the task (Fig. 7D). We should note that this pattern of performance effects for the DAN-A — i.e., greater contraction during Early learning and greater expansion during Late learning being associated with better learning — appears at odds with the group-level effects described in Fig. 4A and B, where we generally find the opposite pattern for the entire DAN network (composed of the DAN-A and DAN-B subnetworks). However, this potential discrepancy can be explained when examining the changes in eccentricity using the 17-network parcellation (see Supplementary Figure 8). At this higher resolution level we find that these group-level effects for the entire DAN network are being largely driven by eccentricity changes in the DAN-B network (areas in anterior superior parietal cortex and premotor cortex), and not by mean changes in the DAN-A network. By contrast, our present results suggest that it is the contraction and expansion of areas of the DAN-A network (and not DAN-B network) that are selectively associated with differences in subject learning performance.”

      Finally, re: the reviewers’ comments that we do not state any explicit hypotheses etc., we acknowledge that, beyond our general hypothesis stated at the outset about the DMN being involved in reward-based motor learning, our study is quite descriptive and exploratory in nature. So little work has been done in this research area (i.e., using manifold learning approaches to study motor learning with fMRI) that it would be disingenuous to have any stronger hypotheses than those stated in our Introduction. Thus, to make the exploratory nature of our study clear to the reader, we have added the following text (in red) to our Introduction:

      “Here we applied this manifold approach to explore how brain activity across widely distributed cortical and striatal systems is coordinated during reward-based motor learning. We were particularly interested in characterizing how connectivity between regions within the DMN and the rest of the brain changes as participants shift from learning the relationship between motor commands and reward feedback, during early learning, to subsequently using this information, during late learning. We were also interested in exploring whether learning-dependent changes in manifold structure relate to variation in subject motor performance.”

      We hope these changes now make the intention of our study clear.

      (4c) The paper examines a type of motor adaptation task with a reward-based learning component. This, to me, strongly implicates the cerebellum, given that it has a long-established crucial role in adaptation and has recently been implicated in reward-based learning (see work by Wagner & Galea). Why is there no mention of the cerebellum, and why was it left out of this study, especially given that the authors state in the abstract that they examine cortical and subcortical structures? It's evident from the methods that the authors did not acquire data from the cerebellum or had too small a FOV to fully cover it (34 slices at 4 mm thickness = 136 mm, which is likely a bit short to fully cover the cerebellum in many participants). What was the rationale behind this methodological choice? It would be good to clarify this for the reader. Related to this, the authors need to rephrase their statements on 'whole-brain' connectivity matrices or analyses - it is not whole-brain when it excludes the cerebellum.

      As we noted above, we do not believe this task to be a motor adaptation task, in the sense that subjects are not able to use sensory prediction errors (and thus error-based learning mechanisms) to improve their performance. Rather, because subjects are denied this sensory error feedback, they are only able to use reinforcement learning processes, along with cognitive strategies (nicely covered in Tsay et al., 2023), to improve performance. Nevertheless, we recognize that the cerebellum has been increasingly implicated in facets of reward-based learning, particularly within the rodent domain (e.g., Wagner et al., 2017; Heffley et al., 2018; Kostadinov et al., 2019, etc.). In our study, we did indeed collect data from the cerebellum but did not include it in our original analyses, as we wanted (1) the current paper to build on prior work in the human and macaque reward-learning domain (which focuses solely on striatum and cortex, and which rarely discusses cerebellum; see Averbeck & O’Doherty, 2022 and Klein-Flugge et al., 2022 for recent reviews), and (2) the cerebellum to be a more targeted focus of future work (specifically, we plan on focusing on striatal-cerebellar interactions during learning, which are hypothesized based on the neuroanatomical tract tracing work of Bostan and Strick, etc.). We hope the reviewers respect our decisions in this regard.

      Nevertheless, we acknowledge that our statements about ‘whole-brain’ connectivity, and our vagueness about what we mean by ‘subcortex’, may be confusing for the reader. We have now removed and/or corrected such references throughout the paper (however, note that in some cases it is difficult to avoid reference to “whole-brain”, e.g., “whole-brain correlation map” or “whole-brain false discovery rate correction”, which is standard terminology in the field).

      In addition, we are now explicit in our Methods section that the cerebellum was not included in our analyses.

      “Each volume comprised 34 contiguous (no gap) oblique slices acquired at a ~30° caudal tilt with respect to the plane of the anterior and posterior commissure (AC-PC), providing whole-brain coverage of the cerebrum and cerebellum. Note that for the current study, we did not examine changes in cerebellar activity during learning.”

      (4d) The authors centered the matrices before further analyses to remove variance associated with the subject. Why not run a PCA on the connectivity matrices and remove the PC that is associated with subject variance? What is the advantage of first centering the connectivity matrices? Is this standard practice in the field?

      Centering in some form has become reasonably common in the functional connectivity literature, as there is considerable evidence that task-related (or cognitive) changes in whole-brain connectivity are dwarfed by static, subject-level differences (e.g., Gratton et al., 2018, Neuron). If covariance matrices were ordinary scalar values, then isolating task-related changes could be accomplished simply by subtracting a baseline scan or mean score; but because the space of covariance matrices is non-Euclidean, the actual computations involved in this subtraction are more complex (see our Methods). However, fundamentally (and conceptually) our procedure is simply ordinary mean-centering, but adapted to this non-Euclidean space. Despite the added complexity, there is considerable evidence that such computations, adapted directly to the geometry of the space of covariance matrices, outperform simpler methods, which treat covariance matrices as arrays of real numbers (e.g., naive subtraction; see Dodero et al. and Ng et al., references below). Moreover, our previous work has found that this procedure works quite well to isolate changes associated with different task conditions (Areshenkoff et al., 2021, Neuroimage; Areshenkoff et al., 2022, eLife).

      Although PCA can be adapted to work well with covariance matrix valued data, it would at best be a less direct solution than simply subtracting subjects' mean connectivity. This is because the top components from applying PCA would be dominated by both subject-specific effects (not of interest here), and by the large-scale connectivity structure typically observed in component based analyses of whole-brain connectivity (i.e. the principal gradient), whereas changes associated with task-condition (the thing of interest here) would be buried among the less reliable components. By contrast, our procedure directly isolates these task changes.
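
      To make the idea of "mean-centering adapted to a non-Euclidean space" concrete, the following is a simplified sketch and not the procedure described in our Methods. It assumes a log-Euclidean mean as a stand-in for the subject-level mean covariance and removes it by whitening each matrix with the inverse square root of that mean. All function names are hypothetical.

```python
import numpy as np


def _sym_apply(mat, fn):
    """Apply fn to the eigenvalues of a symmetric matrix (V f(L) V^T)."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fn(vals)) @ vecs.T


def log_euclidean_mean(covs):
    """Mean of SPD matrices taken in the matrix-logarithm domain."""
    logs = [_sym_apply(c, np.log) for c in covs]
    return _sym_apply(np.mean(logs, axis=0), np.exp)


def center_covariances(covs):
    """Whiten each SPD matrix by the subject's mean: M^(-1/2) C M^(-1/2)."""
    mean = log_euclidean_mean(covs)
    whitener = _sym_apply(mean, lambda v: v ** -0.5)
    return [whitener @ c @ whitener for c in covs]


# Toy example: two 'epochs' for one subject (hypothetical 2 x 2 covariances).
epochs = [np.diag([1.0, 4.0]), np.diag([4.0, 1.0])]
centered = center_covariances(epochs)
print(centered[0])
print(centered[1])
```

      In this sketch, a subject whose covariance never changes across epochs is mapped to the identity matrix in every epoch, so only departures from the subject's own mean remain. This is the analogue of subtracting a mean score from scalar data.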

      References cited above:

      Dodero, L., Minh, H. Q., San Biagio, M., Murino, V., & Sona, D. (2015, April). Kernel-based classification for brain connectivity graphs on the Riemannian manifold of positive definite matrices. In 2015 IEEE 12th international symposium on biomedical imaging (ISBI) (pp. 42-45). IEEE.

      Ng, B., Dressler, M., Varoquaux, G., Poline, J. B., Greicius, M., & Thirion, B. (2014). Transport on Riemannian manifold for functional connectivity-based classification. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2014: 17th International Conference, Boston, MA, USA, September 14-18, 2014, Proceedings, Part II 17 (pp. 405-412). Springer International Publishing.

      (4e) Seems like a missed opportunity that the authors just use a single, PCA-derived measure to quantify learning, where multiple measures could have been of interest, especially given that the introduction established some interesting learning-related concepts related to exploration and exploitation, which could be conceptualized as movement variability and movement accuracy. It is unclear why the authors designed a task that was this novel and interesting, drawing on several psychological concepts, but then chose to ignore these concepts in the analysis.

      We were disappointed to hear that the reviewers did not appreciate our functional PCA-derived measure to quantify subject learning. This is a novel data-driven analysis approach that we have previously used with success in recent work (e.g., Areshenkoff et al., 2022, eLife) and, from our perspective, we thought it was quite elegant that we were able to describe the entire trajectory of learning across all participants along a single axis that explained the majority (~75%) of the variance in the patterns of behavioral learning data. Moreover, the creation of a single behavioral measure per participant (what we call a ‘Learning score’, see Fig. 6C) helped simplify our brain-behavior correlation analyses considerably, as it provided a single measure that accounts for the natural auto-correlation in subjects’ learning curves (i.e., that subjects who learn quickly also tend to be better overall learners by the end of the learning phase). It also avoids the difficulty (and sometimes arbitrariness) of having to select specific trial bins for behavioral analysis (e.g., choosing the first 5, 10, 20 or 25 trials as a measure of ‘early learning’, and so on). Of course, one of the major alternatives to our approach would have involved fitting an exponential to each subject’s learning curve and taking measures like learning rate etc., but in our experience these types of models don’t always fit well, or yield robust/reliable parameters at the individual subject level. To strengthen the motivation for our approach, we have now included the following text in our Results:

      “To quantify this variation in subject performance in a manner that accounted for the auto-correlation in learning performance over time (i.e., subjects who learned more quickly tend to exhibit better performance by the end of learning), we opted for a purely data-driven approach and performed functional principal component analysis (fPCA; Shang, 2014) on subjects’ learning curves. This approach allowed us to isolate the dominant patterns of variability in subjects’ learning curves over time (see Methods for further details; see also Areshenkoff et al., 2022).”
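
      For readers unfamiliar with this kind of analysis, the sketch below illustrates the general idea of summarizing each subject's learning curve with a single score. It is only an illustration under stated assumptions: ordinary PCA on discretized curves stands in for functional PCA, the learning curves are simulated exponentials, and the function name `learning_scores` is hypothetical. It is not the analysis code used in the paper.

```python
import numpy as np

def learning_scores(curves):
    """Project each subject's learning curve onto the first principal component.

    curves: array of shape (n_subjects, n_trials).
    Assumption of this sketch: plain PCA on the sampled curves is used as a
    simple stand-in for functional PCA (no basis expansion or smoothing).
    Returns (scores, fraction of variance explained by the first component).
    """
    curves = np.asarray(curves, dtype=float)
    centered = curves - curves.mean(axis=0)        # remove the group-mean curve
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, 0] * s[0]                        # one score per subject
    explained = s[0] ** 2 / np.sum(s ** 2)
    return scores, explained

# Simulated data (hypothetical): 30 subjects, 40 trials, varying learning rates.
rng = np.random.default_rng(0)
trials = np.arange(40)
rates = rng.uniform(0.02, 0.3, size=30)
curves = 1 - np.exp(-np.outer(rates, trials)) + rng.normal(0, 0.02, (30, 40))
scores, explained = learning_scores(curves)
```

      Because the sign of a principal component is arbitrary, the scores may need to be flipped so that larger values correspond to better learning.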

      In any case, the reviewers may be pleased to hear that in current work in the lab we are using more model-based approaches to attempt to derive sets of parameters (per participant) that relate to some of the variables of interest described by the reviewers, but that we relate to much more dynamical (shorter-term) changes in brain activity.

      (4f) Overall Changes in Activity: The manuscript should delve into the potential influence of overall changes in brain activity on the results. The choice of using Euclidean distance as a metric for quantifying changes in connectivity is sensitive to scaling in overall activity. Therefore, it is crucial to discuss whether activity in task-relevant areas increases from baseline to early learning and decreases from early to late learning, or if other patterns emerge. A comprehensive analysis of overall activity changes will provide a more complete understanding of the findings.

      These are good questions and we are happy to explore this in the data. However, as mentioned in our response to query 4a above, it is important to note that the timeseries data for each brain region was z-scored prior to analysis, with the aim of removing any mean changes in activity levels (note that this is a standard preprocessing step when performing functional connectivity analysis, given that mean signal changes are not the focus of interest in functional connectivity analyses).
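
      The effect of z-scoring on mean activity levels can be illustrated with a minimal sketch. The data below are simulated, the epoch boundaries are arbitrary, and z-scoring over the whole timeseries of each region is an assumption of this example rather than a description of the actual preprocessing pipeline.

```python
import numpy as np

def zscore_timeseries(ts):
    """Z-score each region's timeseries (rows = timepoints, columns = regions)."""
    ts = np.asarray(ts, dtype=float)
    return (ts - ts.mean(axis=0)) / ts.std(axis=0)

# Simulated signal (hypothetical): 300 timepoints, 5 regions with very
# different raw baselines and amplitudes.
rng = np.random.default_rng(1)
raw = rng.normal(loc=[100, 250, 40, 900, 10], scale=[5, 20, 2, 50, 1], size=(300, 5))
z = zscore_timeseries(raw)

# Mean z-scored signal per region within three arbitrary 100-timepoint "epochs".
epoch_means = np.stack([z[i:i + 100].mean(axis=0) for i in (0, 100, 200)])
```

      After z-scoring, every region has zero mean and unit variance over the full timeseries, so any residual epoch-wise mean differences are small relative to the raw baseline differences between regions.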

      To further emphasize these points, we have taken our z-scored timeseries data and calculated the mean signal for each region within each task epoch (Baseline, Early and Late learning, see panel A in figure below). The point of showing this data (where each z-score map looks near identical across the top, middle and bottom plots) is to demonstrate just how minuscule the mean signal changes are in the z-scored timeseries data. This point can also be observed when plotting the mean z-score signal across regions for each epoch (see panel B in figure below). Here we find that Baseline and Early learning have a near identical mean activation level across regions (albeit with slightly different variability across subjects), whereas there is a slight increase during Late learning, though it should be noted that our y-axis, which measures in the thousandths, really magnifies this effect.

      To more directly address the reviewers’ comments, using the z-score signal per region we have also performed the same statistical pairwise comparisons (Early > Baseline and Late > Early) as we performed in the main manuscript Fig. 4 (see panel C in Author response image 9 below). In this plot, areas in red denote an increase in activity from Baseline to Early learning (top plot) and from Early to Late learning (bottom plot), whereas areas in blue denote a decrease for those same comparisons. The important thing to emphasize here is that the spatial maps resulting from this analysis are generally quite different from the maps of eccentricity that we report in Fig. 4 in our paper. For instance, in the figure below, we see significant changes in the activity of visual cortex between epochs but this is not found in our eccentricity results (compare with Fig. 4). Likewise, in our eccentricity results (Fig. 4), we find significant changes in the manifold positioning of areas in medial prefrontal cortex (MPFC), but this is not observed in the activation levels of these regions (panel C below). Again, we are hesitant to make too much of these results, as the activation differences denoted as significant in the figure below are likely to be an effect on the order of thousandths of a z-score (e.g., 0.002 > 0.001), but this hopefully assuages reviewers’ concerns that our manifold results are solely attributable to changes in overall activity levels.

      We are hesitant to include the results below in our paper as we feel that they don’t add much to the interpretation (as the purpose of z-scoring was to remove large activation differences). However, if the reviewers strongly believe otherwise, we would consider including them in the supplement.

      Author response image 9.

      Examination of overall changes in activity across regions. (A) Mean z-score maps across subjects for the Baseline (top), Early Learning (middle) and Late learning (bottom) epochs. (B) Mean z-score across brain regions for each epoch. Error bars represent +/- 1 SEM. (C) Pairwise contrasts of the z-score signal between task epochs. Positive (red) and negative (blue) values show significant increases and decreases in z-score signal, respectively, following FDR correction for region-wise paired t-tests (at q<0.05).
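
      The caption above refers to FDR correction of region-wise tests at q<0.05. A minimal sketch of the Benjamini-Hochberg step-up procedure is given below for orientation. Whether this exact variant of FDR correction was used is an assumption, and the p-values are invented for illustration.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of tests rejected under Benjamini-Hochberg FDR."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices of sorted p-values
    thresholds = q * np.arange(1, m + 1) / m       # step-up thresholds i*q/m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()             # largest rank passing its threshold
        reject[order[:k + 1]] = True               # reject all tests up to that rank
    return reject

# Hypothetical region-wise p-values (e.g., from paired t-tests between epochs).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(pvals, q=0.05)
```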

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      The ingenious design in this study achieved the observation of 3D cell spheroids from an additional lateral view and gained more comprehensive information than the traditional one angle of imaging, which extensively extended the methods to investigate cell behaviors in the growth or migration of tumor organoids in the present study. I believe that this study opens an avenue and provides an opportunity to characterize the spheroid formation dynamics from different angles, in particular side-view with high resolution, in other organoids study in the future.

      Thank you for your positive response.

      (1) Figure 1A and B, the images of "First surface mirror" are unclear. The authors should capture a single image of "First surface mirror" by high resolution. The corresponding information on the mirror should also be included in the manuscript.

      Thank you for your kind reminder. To make the content more intuitive, we have added a clear image of the first surface mirror to Fig. 8C.

      (2) The spheroid sizes in this study are 200-300 µm. Is this size a limitation of the device? And which size works best with the device? The size of spheroids suitable for this device should be characterized.

      Thank you very much for your question. As shown in Fig. 1D, the imaging principle indicates that the sample size is theoretically not affected by the device. For larger biological samples or samples exceeding the size of a 35 mm petri dish, a larger container and first surface mirror can be used. However, in practice, it is not recommended to use this device with laboratory microscopes for samples exceeding 4 mm in size.

      Firstly, the working distance of the microscope objective lens is limited by its factory specifications. Secondly, this device is designed to fit a 35 mm petri dish, and the first surface mirror can capture a maximum sample size of 4.5 mm. Fortunately, this size is more than sufficient for cell spheroids.

      (3) Figure 2F. The scale bar covered the imaging and made it unclear. It was difficult to read and evaluate the quality of the images. And it seemed no obvious difference between 5 cm and 15 cm. Please carefully check this data.

      Thank you very much for your question. First, we checked the image scale and coverage issues and made adjustments in the revised version. Secondly, when the light source was placed 5 cm from the sample, the sample itself appeared relatively clear, but the boundary with the background was less distinct. At a distance of 15 cm, the light source not only illuminated the sample effectively but also made the distinction between the spheroid and the background more apparent. To ensure consistency and stability in image capture, we ultimately selected a 15 cm distance between the sample and the light source for imaging.

      (4) Figure 3A. It seemed that the seeded cells were initially located as a ring with a hole in the center. Why not seed the cells evenly in the well?

      Thank you very much for your question. First, the cells were added as a suspension, naturally settling at the bottom of the well during imaging. When seeded in agarose wells, the cells spontaneously aggregated over time, as shown in sVideo4. Our previous study showed that the use of agarose wells offers high fault tolerance and efficiency in cell spheroid culture (Pan, R. et al. Biofabrication, 2024, 16, 035016).

      (5) I wonder whether this design could be extended to fluorescence imaging and, if so, how. Please give an expectation in the discussion.

      Thank you very much for raising this key question regarding the imaging capability of this device. As shown in Author response image 1A, due to the specific nature of fluorescence imaging light sources, it is feasible to perform fluorescence imaging of cell spheroids using a microscope, including the built-in light source. Using 4′,6-diamidino-2-phenylindole (DAPI) staining, we captured fluorescence images of cell spheroids in both bottom-view and side-view modes (Author response image 1B), demonstrating that side-view observation of cell spheroids with this device is indeed feasible.

      Author response image 1.

      (A) The schematic diagram of the principle of fluorescence images of spheroids using an inverted microscope with the side-view observation petri dish/device. (B) Bottom-view and side-view images of a 3D cell spheroid. Scale bar = 500 µm.

      (6) The first sentence in the introduction. "Three-dimensional (3D) spheroids" should be "Three-dimensional (3D) tumor spheroids".

      (7) P11, Line 7, "both lethal and lethal" should be corrected.

      (8) The writing and grammar should be polished.

      Thank you very much for your suggestions to improve the quality of the article. We have made the necessary revisions in the updated version.

      Reviewer #2:

      Summary:

      The author developed a new device to overcome current limitations in the imaging process of 3D spheroidal structures. In particular, they created a system to follow in real-time tumour spheroid formation, fusion and cell migration without disrupting their integrity. The system has also been exploited to test the effects of a therapeutic agent (chemotherapy) and immune cells.

      Strengths:

      The system allows the in situ observation of the 3D structures along the 3 axes (x,y and z) without disrupting the integrity of the spheroids; in a time-lapse manner it is possible to follow the formation of the 3D structure and the spheroids fusion from multiple angles, allowing a better understanding of the cell aggregation/growth and kinetic of the cells.

      Interestingly the system allows the analysis of cell migration/ escape from the 3D structure analyzing not only the morphological changes in the periphery of the spheroids but also from the inner region demonstrating that the proliferating cells in the periphery of the structure are more involved in the migration and dissemination process. The application of the system in the study of the effects of doxorubicin and NK cells would give new insights in the description of the response of tumor 3D structure to killing agents.

      We sincerely thank you for your detailed and supportive review of our manuscript. Your recognition of our system’s capabilities for in situ observation of 3D structures along multiple axes, as well as its potential applications in studying therapeutic effects, is highly encouraging. Your comments on the advantages of this system for analyzing cell migration, morphological changes, and responses to therapeutic agents are especially appreciated.

      Thank you again for your thoughtful feedback and for highlighting the contributions of our work. Your insights have been invaluable in refining the focus and clarity of our study, and we hope that our revisions meet your expectations.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors):

      Minor Points:

      • HEK293T cells are not typically Type 1 IFN-producing cells; it is recommended to use other immune cell lines to validate results obtained with ORMDL3 overexpression in 293T cells. The same applies to A549 alveolar basal epithelial cells.

      Thanks for the reviewer’s insightful comment. In Figure 1C, we overexpressed ORMDL3 in mouse primary BMDMs and stimulated them with poly(I:C) or poly(dG:dC); the results suggest that ORMDL3 inhibits IFN expression in primary BMDMs.

      • Clarify whether TLR3 is expressed in the cell lines used in Figure 1 and whether TLR3 is present in mouse BMDMs.

      Thanks for your suggestions. We tested whether TLR3 is expressed in HEK293T, A549 and BMDM cells. We designed primers for human TLR3 and murine Tlr3, and the results showed that Tlr3 is expressed in BMDMs, whereas TLR3 is not expressed in HEK293T or A549 cells, as shown in Author response image 1.

      Author response image 1.

      PCR amplification of human TLR3 was conducted on cDNA derived from HEK293T and A549 cells (lanes 1 and 2, respectively), and PCR amplification of murine Tlr3 was performed on cDNA from BMDM (lane 3). Human spleen cDNA (lane 4, TAKARA Human MTC™ Panel I, Cat# 636742) served as a positive control, and 18S rRNA was used as an internal control.

      primer sequences:

      human TLR3: forward TTGCCTTGTATCTACTTTTGGGG   reverse TCAACACTGTTATGTTTGTGGGT

      murine Tlr3: forward GTGAGATACAACGTAGCTGACTG   reverse TCCTGCATCCAAGATAGCAAGT

      18S (human/mouse): forward GTAACCCGTTGAACCCCATT   reverse CCATCCAATCGGTAGTAGCG

      • Specify the type of luciferase reporter assay used in Figure 1E.

      Thanks for the reviewer’s insightful comment. The Dual-Luciferase® Reporter (DLR™) Assay System efficiently measures two luciferase signals. In brief, the IFN-reporter luciferase is derived from firefly (Photinus pyralis), while the internal control luciferase is from Renilla (Renilla reniformis or sea pansy). These dual luciferases are measured sequentially from a single sample. In Figure 1E, we measured the luciferase activity of IFN (firefly) and internal control gene TK (Renilla), and their ratio is shown in Figure 1E.
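
      The normalization described above amounts to a simple ratio of the two luminescence readings. A minimal sketch follows; the function names and the luminescence values are hypothetical and serve only to illustrate the arithmetic.

```python
def relative_luciferase_activity(firefly, renilla):
    """Normalize the reporter (firefly) signal to the internal control (Renilla)."""
    if renilla <= 0:
        raise ValueError("Renilla signal must be positive to normalize")
    return firefly / renilla

def fold_change(sample_ratio, control_ratio):
    """Express a normalized sample relative to a normalized control condition."""
    return sample_ratio / control_ratio

# Hypothetical raw luminescence readings (arbitrary units).
control = relative_luciferase_activity(firefly=5000, renilla=250)    # 20.0
treated = relative_luciferase_activity(firefly=30000, renilla=300)   # 100.0
induction = fold_change(treated, control)                            # 5.0
```

      Dividing by the Renilla signal corrects for differences in transfection efficiency and cell number between wells.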

      • Clarify what was knocked down in the A549 stable KD cell line and whether HSV-1 infects and replicates in A549 cells.

      We sincerely appreciate the reviewer’s concern and apologize for any ambiguous descriptions. In Figure 1H, we knocked down ORMDL3 and infected the cell with HSV-1, which shows that ORMDL3 does not affect the infection and replication of HSV-1 in A549.

      • In Figure 2E, provide the rationale for using the same tag (Flag) in overexpression experiments with different molecules such as Flag-ORMDL3 and Flag-RIG-I.

      We thank the reviewer for raising this concern. We co-expressed ORMDL3 and the innate immunity proteins with different tags and obtained the same results as before. ORMDL3-Myc overexpression promoted the degradation of Flag-RIG-I-N only, as shown in the current Figure 2E.

      • Address the low knockdown efficiency shown in Figure 2D and consider whether it is sufficient for drawing conclusions.

      Thanks for the reviewer’s concern. Because the ORMDL3 antibody (Abcam 107639) recognizes all ORMDL family members (ORMDL1, 2 and 3), the knockdown of ORMDL3 may not be apparent at the protein level in Figure 2D. We also measured the knockdown efficiency of ORMDL3 at the mRNA level, which showed that ORMDL3 was silenced efficiently and specifically (Figure S2C).

      • Replace the Tubulin/β-Actin WB control with a more distinguishable band.

      Thanks for the suggestion. Owing to differences in gel concentration, the protein bands sometimes appear fused, but it is still clear that the internal controls are consistent.

      • In Figures 3D/E, the expression level of the Lysine mutant of RIG-I-N is too low. Please provide an explanation or repeat the experiment to achieve comparable expression levels and update the figure accordingly.

      Thanks for the question. The expression of the lysine mutant of RIG-I-N is low; we increased the amount of plasmid used for transfection, but this did not raise its expression level. Although its abundance is low, we provide evidence that it is not degraded by ORMDL3. Previous reports (for example: RNF122 suppresses antiviral type I interferon production by targeting RIG-I CARDs to mediate RIG-I degradation. Proc Natl Acad Sci U S A. 2016 Aug 23;113(34):9581-6; TRIM4 modulates type I interferon induction and cellular antiviral response by targeting RIG-I for K63-linked ubiquitination. J Mol Cell Biol. 2014 Apr;6(2):154-63.) have also shown that lysine mutations can affect RIG-I stability. In addition, we speculate that the 4KR mutant (K146R, K154R, K164R, K172R) may change the conformation of RIG-I, lowering its expression.

      • Explain why there is no difference in MAVS expression levels despite binding with MAVS.

      Thanks for the question. In our experiment, ORMDL3 has no effect on MAVS expression. Our results showed that ORMDL3 interacts with MAVS and promotes the degradation of RIG-I, so only RIG-I level has a significant difference.

      • Verify if Flag-tagged ORMDL3 is present in the IP sample in Figure 3G.

      Thanks for the comment. We reloaded the samples and blotted for Flag, and we found that ORMDL3 cannot be pulled down by RIG-I. We have added the results to Figure 3G.

      • Reload the samples in Figure 4C to clearly identify the correct band for GFP-tagged ORMDL3.

      Thanks for the question. As ORMDL3 is a small protein, we fused it and its fragments to GFP to increase the molecular weight. With our GFP vector, a 26 kDa band is always present for reasons we do not know; this is a technical limitation. Although the GFP-fused protein and the GFP band run very close together, they can still be distinguished as two bands.

      • Rerun the Western blot for Actin IB in Figure 4E, as the ORMDL3-GFP (1-153) full-length appears abnormal.

      Thanks for the question. The actin blot appears abnormal because we first blotted for GFP and then blotted for actin on the same membrane. We reloaded the previous samples and blotted for actin again.

      • Clarify in which figure RIG-I ubiquitination is shown and whether ORMDL3 has E3 ubiquitin ligase activity. Explain how ORMDL3 facilitates USP10 transfer to RIG-I despite no direct interaction.

      Thank you for your question. Figure 3B shows the ubiquitination of RIG-I; ORMDL3 does not have E3 ubiquitin ligase activity. Our results show that although ORMDL3 does not directly interact with RIG-I, it forms a complex with USP10 (Figure 5B, 5C) and disrupts USP10-induced RIG-I stabilization by decreasing the interaction between USP10 and RIG-I (Figure 6A). The detailed mechanism needs further investigation.

      • Provide quantification for Figure 5D. Explain why the bands are not degraded by RIG-I and USP10.

      Thanks for the concern. We quantified the bands and found that overexpression of USP10 increased RIG-I protein abundance. The quantitative gray values are added into the image. USP10 functions to stabilize RIG-I rather than promoting its degradation.

      • Explain the decrease in RIG-I levels in Figure 5E when USP10 levels decrease.

      Thanks for the concern. As shown in the working model (Supplementary Figure 8), USP10 is a deubiquitinase that stabilizes RIG-I by decreasing its K48-linked ubiquitination. So, in Figure 5E, we knocked down USP10 and found a decrease in RIG-I levels, which is consistent with Figure 5D.

      • Clarify whether K48 ubiquitination on RIG-I has decreased in Figure 5F, as this is not clear from the image.

      Thanks for the question. In Figure 5F it is shown that the K48 ubiquitination level of RIG-I significantly decreased (please see the density of the bands in the IP samples).

      • Address whether ORMDL3 reduces RIG-I-N degradation in Figure 5H, as the results do not clearly support this claim.

      Thanks for the concern. We quantified the bands and the results showed that ORMDL3 promotes the degradation of RIG-I-N. The quantitative gray values are added into the image.

      • Reload Flag-ORMDL3 in Figure 6C to determine whether RIG-I-N is restored in the MG132-treated samples.

      Thank you for your question. We quantified the bands and the results showed that RIG-I-N is restored in the MG132-treated samples. The quantitative gray values are added into the image.

      • Correct numerous typos and errors, especially in the Discussion section, to improve readability

      Thanks for the suggestion. We have revised the manuscript carefully to correct these errors.

      Reviewer #2 (Recommendations for the authors):

      (1) In Figure 1G and H, The number of virus-infected cells was observed using a fluorescence microscope. In addition, can the author use other techniques to detect the impact of ORMDL3 on virus replication?

      Thanks for the question. In addition to fluorescence microscopy, we used RT-PCR to quantify the amount of viral mRNA, and the results were added to Figure 1G and H.

      (2) In Figure 3C, ORMDL3 overexpression promotes the degradation of RIG-I-N. ORMDL3 is one of three ORMDL proteins with similar amino acid sequences, does ORMDL1/2 also have this function?

      Thanks for the suggestion. We compared the functions of the ORMDL proteins and found that only ORMDL3 overexpression facilitated RIG-I-N degradation. The results are shown in Figure S2D.

      (3) In Figure 5A, USP10 is not the top protein in the mass spec assay. Did the authors verify the interaction between ORMDL3 and other proteins (for example CAND1)?

      Thanks for your suggestion. We verified that ORMDL3 has no interaction with CAND1 and UFL1 but only interacts with USP10, as Figure S5 shows.

      (4) A scale bar to be added to the images in Figure 1 G, H and Figure 7K.

      Thanks for the suggestion. We have added the scale bars.

      (5) The annotations in Figure 4B, C and E should be aligned.

      Thanks for the suggestion. We have aligned the annotations.

      (6) Provide Statistical methods

      Thanks for the suggestion. We have provided the statistical methods in the materials and methods part.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      In the article titled "Polyphosphate discriminates protein conformational ensembles more efficiently than DNA promoting diverse assembly and maturation behaviors," Goyal and colleagues investigate the role that negatively charged biopolymers, i.e., polyphosphate (polyP) and DNA, play in phase separation of cytidine repressor (CytR) and fructose repressor (FruR). The authors find that both negative polymers drive the formation of metastable protein/polymer condensates. However, polyP-driven condensates form more gel- or solid-like structures over time while DNA-driven condensates tend to dissipate over time. The authors link this disparate condensate behavior to polyP-induced structures within the enzymes. Specifically, they observe the formation of polyproline II-like structures within two tested enzyme variants in the presence of polyP. Together their results provide a unique insight into the physical and structural mechanism by which two unique negatively charged polymers can induce distinct phase transitions with the same protein. This study will be a welcomed addition to the condensate field and provide new molecular insights into how binding partner-induced structural changes within a given protein can affect the mesoscale behavior of condensates. The concerns outlined below are meant to strengthen the manuscript.

      Recommendation:

      We value the reviewer’s positive comments and appreciate time taken to provide detailed feedback that has certainly helped improve our manuscript.

      Major Concerns:

      (1) The biggest concern in this manuscript lies with experiments comparing polyP45, which has a net negative charge of -47, and double-stranded DNA of 45 base pairs (as stated in the methods), which will have a net negative charge of -90. Given the dependence of phase separation and phase transitions on not only net charge but charge density, this is an important factor to consider when comparing the effect of these molecules. It is unclear how or if the authors considered these factors in the design of their experiments. Because of the factor of 2 difference in net charge over the same number of polymer chain components, i.e. a chain of 45 pi vs. a chain of 45 double-stranded base pairs, it is unclear if the results from polyP vs. DNA are directly comparable. One solution would be to repeat all DNA experiments using single-stranded DNA so that the net charge is similar to polyP over the same chain length. Another possibility would be to repeat DNA experiments using a double-stranded DNA of 23 base pairs. This would allow for a nearly equal net charge (-46 vs. -47 for polyP), but the charge density would still be 2X polyP. As it stands now, the perceived differences in DNA vs. polyP behavior may be an artifact arising from the difference in net charge and charge density between DNA and polyP.

      To address the reviewer’s concerns regarding charge density differences between polyP and DNA, we conducted an experiment using a higher DNA concentration (11.24 µM) to obtain charge equivalence between the two experiments (i.e. the total concentration of charges). As shown in Figure S5, even at higher DNA concentration, the condensates undergo progressive dissolution over time. This observation indicates that the differential maturation of condensates, arising from distinct initial protein ensembles, is governed by the intrinsic properties of polyP. Charge density (i.e. the number of charges per unit volume of the polymer), on the other hand, is an intrinsic feature of the polymer which is naturally different between DNA and polyP. In fact, the primary result of our work is our observation that polyP can discern the starting ensembles more efficiently, likely through actively engaging and interacting with the ensemble while DNA appears to be a passive player. The differences are not an artifact as they arise from fundamental features of two natural anionic polymers found within cells. In other words, the outcomes could be very different if the concentration of one polymer dominates over the other (see the response below).
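
      The notion of charge equivalence used here (equal total concentration of charges) can be illustrated with a short calculation. The per-chain charges (-47 for polyP45 and -90 for a 45 bp duplex) are taken from the reviewer's comment above; the function names and the 10 µM polyP concentration are hypothetical, and the sketch makes no claim about the concentrations actually used in the experiments.

```python
import math

def total_charge_concentration(polymer_conc_uM, charges_per_chain):
    """Total concentration of charges (µM) contributed by a polymer."""
    return polymer_conc_uM * abs(charges_per_chain)

def charge_matched_concentration(ref_conc_uM, ref_charges, target_charges):
    """Concentration of a target polymer that carries the same total charge
    as a reference polymer at ref_conc_uM."""
    return ref_conc_uM * abs(ref_charges) / abs(target_charges)

# Per-chain charges quoted in the reviewer's comment.
POLYP45_CHARGE = -47
DSDNA_45BP_CHARGE = -90

# Hypothetical example: DNA concentration matching 10 µM polyP45 in total charge.
dna_equiv = charge_matched_concentration(10.0, POLYP45_CHARGE, DSDNA_45BP_CHARGE)

# Total charge carried by 11.24 µM of 45 bp DNA, assuming -90 per duplex.
dna_charge = total_charge_concentration(11.24, DSDNA_45BP_CHARGE)
```

      Matching total charge in this way equalizes the number of charges in solution but leaves the charge density of each polymer unchanged, which is the distinction drawn in the response.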

      (2) One outstanding question the authors do not address relates to how mixtures of CytR or FruR, DNA, and polyP behave. In the bacterial cytoplasm, these molecules are all in the same compartment (admittedly that compartment is not well mixed due to unique condensate-driven organization). Would the authors expect to see similar effects of polyP and DNA if they were in the same solution? Perhaps the authors could run a set of experiments where they vary the ratios of DNA and polyP to probe how increased levels of "stress", i.e. increased levels of polyP vs. DNA, alter the formation and behavior of enzymatic condensates.

      Following this comment, we investigated the phase separation behavior of CytR WT in the presence of different charge ratios of polyP-DNA mixtures. As seen in Author response image 1, panel A below, the outcomes are highly sensitive to the starting concentrations: at higher charge concentration of polyP (left panel), the OD and ThT fluorescence intensity are high at early time points, then both decrease and increase again. Fluorescence microscopy images (panel B) reveal similar trends, but the more fascinating outcome is the FRAP recovery profiles, which recover extremely fast and fully at the zero time point (panel C) despite aggregation-like tendencies observed in ThT fluorescence assays. However, at longer time points (20 and 40 mins) the FRAP recovery is significantly weaker but recovers to ~65% at 1 hour (panel C). At high relative polyP concentrations with respect to DNA, droplets are formed first which then transition into aggregates (liquid-to-solid transition; middle image in panel A). At relatively high DNA concentrations it appears that both droplets and aggregates co-exist as both OD and ThT fluorescence are moderately high. Given these complex behaviors, we have not included these results in the current manuscript as we still do not fully understand the origins of these differences. In fact, we are planning to extend this study by exploring the combinations in detail to understand the relative roles played by the two polymers in ternary mixtures.

      Author response image 1.

      (3) In Figure 1H, the recovery trace shows the fractional recovery of DM to near WT levels. It is clear from the images that recovery of the bleached region occurs, but the overall fluorescence intensity of DM is much lower than WT, even when accounting for the difference in starting condensate sizes in the Pre-Bleach images. Shouldn't this qualitative difference in total fluorescence be reflected in the quantitative trace?

      In Figure 2H, as the reviewer rightly points out, there is a clear difference in the absolute fluorescence intensity between WT and DM condensates. We would like to clarify that the recovery traces shown in Figure 2I were normalized to the pre-bleach intensity of each individual condensate to reflect fractional recovery. This normalization is intended to highlight the relative mobility of the protein within each condensate, but it does not capture the difference in total fluorescence intensity between WT and DM.
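
      A minimal sketch of this kind of normalization shows why two condensates with different absolute brightness can yield the same fractional recovery trace. The traces below are simulated, and the simple division by the mean pre-bleach intensity is an assumption of this example; practical FRAP analyses often also correct for background and acquisition photobleaching.

```python
import numpy as np

def normalize_frap(intensity, n_prebleach):
    """Normalize a FRAP trace to the mean of its own pre-bleach frames."""
    intensity = np.asarray(intensity, dtype=float)
    prebleach = intensity[:n_prebleach].mean()
    return intensity / prebleach

# Simulated trace (hypothetical): 5 pre-bleach frames, then exponential recovery
# from 20% to 90% of the pre-bleach level.
t = np.arange(60)
recovery = 0.2 + 0.7 * (1 - np.exp(-t / 10.0))
bright = 1000.0 * np.concatenate([np.ones(5), recovery])   # brighter condensate
dim = 0.3 * bright                                         # same kinetics, dimmer

bright_norm = normalize_frap(bright, n_prebleach=5)
dim_norm = normalize_frap(dim, n_prebleach=5)
```

      The two normalized traces coincide, so the fractional recovery reports on mobility only and carries no information about the difference in total fluorescence.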

      (4) A description of the molten-globular variant Y19A FruR should be included in the main text where the variant is introduced. There is currently no additional description of the molten-globular variant in the Supplement as suggested by the manuscript.

      Figure 6A depicts the three-dimensional structure of FruR WT, with tyrosine residues Y19 and Y28, shown in red, forming stacking interactions. In the Y19A mutant, the loss of these interactions results in little changes in secondary structure (as shown in Figure 6E) but disrupts the protein’s tertiary structure, resulting in a molten globular state. The FruR work is now published in JPCB and can be found at https://doi.org/10.1021/acs.jpcb.4c03895, and is also appropriately cited in the revised version (reference 53).

      (5) Throughout the manuscript, the authors discuss polyP and DNA being able (or unable) to "distinguish" between different variants of CytR and FruR. This is confusing and suggests that DNA or polyP can choose to bind one form over another. The authors should re-work the language in this section to better reflect their direct observations for the behavior of protein in CD experiments and condensate behavior in imaging and turbidity experiments.

      We have now modified the text where necessary. The experiments were not done in the presence of both polyP and DNA, but in isolation (protein + polyP or protein + DNA). Hence, our aim is to convey that polyP is the polymer that leads to variable outcomes because of its ability to ‘interact’ differently with the different starting ensembles.

      Minor Concerns:

      (1) For all Figures, please include the number of measurements, i.e., N = ...

      We have updated all figure legends to include the number of measurements, indicated as N = ..., as suggested.

      (2) For all Figures, please place panel labels, i.e., A, B, C, etc., in the same respective location for each panel. As currently mapped out, it is difficult to easily determine which data are associated with each panel because the IDs are in various locations.

      Due to variations in data presentation and spacing within individual plots, it was challenging to place all labels in exactly the same position without obscuring important details. We have therefore maintained the labels as they were before.

      (3) In the introduction, it would be helpful for the authors to specify exactly what is meant by chaperone. Given the context, it seems that the authors refer to the chaperone activity as one that prevents aggregation. Is this correct?

      We refer to chaperone activity specifically as the ability to prevent aggregation of proteins. We have now clarified this definition in the Introduction section of the revised manuscript.

      (4) The results for experiments shown in Figure 3 need additional setup in the text. Were these measurements taken immediately after mixing WT, DM, or P33A with polyP? If so, why do condensates immediately appear and then dissipate before ThT-detected aggregates begin forming? Or were condensates allowed to form and then transferred to a different buffer, after which measurements were taken? Without a brief description of the experimental setup, interpreting the results is difficult.

      The condensates appear immediately after adding polyP to protein solutions, indicating that the condensate phase is kinetically accessible on mixing polyP with DM or the WT. As illustrated in Figures 3A and 3B, for the WT protein, the condensates undergo a liquid-to-solid transition over time, as the solid is likely the most thermodynamically stable phase. Effectively, this work conveys that it is important to examine the time dependence of droplets even after they form, as they may not be the most stable phase.

      (5) Please include images of P33A over the time course of the experiment in Figure 3B.

      We have included representative images of P33A in the presence of polyP over time in Figure 3B in the revised manuscript.

      (6) In Figures 3D, E, G, and H, please plot each measurement separately with mean and standard deviation to enable the reader to see each data point.

      We have now revised Figures 3D, E, G, and H to show individual data points along with the mean and standard deviation.

      (7) In the top paragraph on page 12, "fast-moving molecules" can be replaced with "dynamic molecules", as this offers a better description of the FRAP data.

      We have incorporated the suggested changes.

      (8) In the "Structural changes within the condensates spans over three hours" results section on page 15, the conclusion reads "In summary, we find that both the WT and the DM 'unfold' on forming condensates with polyP..." The way this is written suggests that WT and DM behave in a similar manner. Given the CD data, however, it seems that by 4 hours, DM forms alpha helices while the WT does not. This suggests that while each unfolds, the conformation at 4 hours is different. The summary should reflect these differences.

      We fully agree with the reviewer on this. The summary is now modified to include the fact that the DM forms alpha helices at 4 hours while the WT does not.

      (9) At the end of the first paragraph of the results section "DNA does not discriminate the conformational ensembles" the authors should refer to Figure 2G, where they show the altered morphology of polyP-P33A condensates.

      We have now included the reference to Figure 2G.

      (10) The authors refer to droplets "solubilizing" throughout the manuscript. It seems that dissolve is a better term to use. Solubilize is better associated with individual biomolecules while dissolve is better associated with condensate behavior.

      We thank the reviewer for pointing this out. We have revised the manuscript to replace “solubilize” with “dissolve”.

      (11) In Figures 5L and 5N, please change the Y-axis scale so that each curve is visible on the plot.

      We have adjusted the Y-axis scale in Figures 5L, 5M, and 5N to ensure that each curve is clearly visible and for easier comparison among the variants.

      (12) The authors should show an image of FruR WT and Y19A with DNA for a direct comparison with experiments in which FruR and polyP were used. The addition of turbidity measurements of samples shown in Figure 6D will offer another direct comparison. As written, there is no way for the author to directly compare the effects of polyP and DNA on FruR phase transitions.

      As suggested, we have now included representative images of FruR WT and Y19A with DNA (Figure 6K and 6L) to enable a direct comparison with the FruR–polyP experiments. Also, we have already shown turbidity measurements in Figure 6B and 6C corresponding to the samples shown in Figure 6D.

      Reviewer 2:

      In this study, Goyal et al demonstrate that the assembly of proteins with polyphosphate into either condensates or aggregates can reveal information on the initial protein ensemble. They show that, unlike DNA, polyphosphate is able to effectively discriminate against initial protein ensembles with different conformational heterogeneity, structure, and compactness. The authors further show that the protein native ensemble is vital on whether polyphosphate induces phase separation or aggregation, whereas DNA induces a similar outcome regardless of the initial protein ensemble. This work provides a way to improve our mechanistic understanding of how conformational transitions of proteins may regulate or drive LLPS condensate and aggregate assemblies within biological systems.

      We thank the reviewer for the favorable comments on the manuscript.

      Major Concerns:

      (1) The authors are using bacterial proteins (CytR and FruR) and solely represent polyphosphates as polyP45 (a polyphosphate with 45 Pi units). However, in bacterial systems, polyphosphates can be significantly longer (in the order of 100s to 1000 Pi units). Additionally, the experiments were run at neutral pH (7.0), and though this is fairly appropriate for the cytoplasm, volutin granules (where polyphosphates often accumulate) are typically considered slightly acidic (pH 5.5-6.5). From a physiological perspective, understanding how pH and the length of polyphosphate influence the ability to induce condensates or aggregates could be of importance.

      We appreciate the reviewer’s insightful comments regarding the physiological relevance of polyphosphate length and pH. In our current study, we used polyP45 as it is easily available commercially, and we conducted our experiments at pH 7 to mimic general cytoplasmic conditions. We agree that polyphosphates in bacterial cells can be significantly longer (hundreds to thousands of Pi units) and that conducting experiments in a slightly more acidic environment would be physiologically relevant. We plan to use longer polyP from Regene Tiss Inc. and acidic pH to explore how polyphosphate-induced phase separation of CytR varies with pH as part of a future study. One could imagine doing all the experiments listed in the manuscript at different pH conditions for the different variants, but this could not be a part of the current work, which has a specific focus on the differences in maturation properties depending on the nature of the starting ensemble. However, the pKa of the internal hydroxyl groups is ~2.2 (DOI:10.2147/IJN.S389819), indicating that polyP carries near-identical charges in the pH range 4-7, and hence we expect little change in the charge state of polyP. On the other hand, the protonation states of charged amino acids within CytR could vary with pH, thus influencing its assembly properties.
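
      The charge argument above can be checked with a short Henderson-Hasselbalch calculation. The sketch below assumes a single effective pKa of 2.2 for the internal hydroxyl groups (the value quoted in the response) and treats each group as an independent site, ignoring terminal groups and electrostatic coupling between neighbouring phosphates. The function name is illustrative only.

```python
def fraction_deprotonated(ph: float, pka: float = 2.2) -> float:
    """Fraction of acidic groups carrying a negative charge at a given pH
    (Henderson-Hasselbalch, single independent site; an assumption)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Compare the charged fraction across the pH range discussed (4 to 7).
charged = {ph: fraction_deprotonated(ph) for ph in (4.0, 5.5, 7.0)}
for ph, frac in charged.items():
    print(f"pH {ph}: {frac:.4f} of internal groups deprotonated")
```

      Under these assumptions more than 98% of the internal groups are deprotonated even at pH 4, which is in line with the expectation of little change in the polyP charge state between pH 4 and 7.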

      (2) In the study, the longest metastable condensate induced by polyphosphate lasted approximately 3 hours before resolubilizing. It would be nice if the authors were able to generate a longer-lived condensate phase that would enable further mechanistic studies (e.g., NMR).

      We agree that generating longer-lived condensates would be highly valuable for mechanistic studies. However, the formation and stability of condensates are intrinsic properties of the protein, and optimizing different conditions for a longer-lived condensate phase is beyond the scope of the current study. It is possible that the condensates are long-lived with longer polyP, but it is not clear if this would indeed be the case. We would also like to state here that while it is common to report on the liquid-to-solid transition in condensates, the intrinsic metastability of droplets (when there is no aggregation) is rarely reported. One possibility is to mutationally introduce cysteine residues and induce the formation of disulphide bridges (as done in a recent work, doi: 10.1021/jacs.4c09557) that make the condensate highly stable kinetically; however, this would also complicate the interpretation as the mechanism of condensate formation might be very different. We have therefore reported our results as an observation arising from differences in the nature of the poly-anionic polymers.

      (3) The authors showed that CytR DM (fully folded), CytR WT (minor state folded), and CytR P33A (highly disordered) with polyphosphates lead to longer-lived condensates that resolubilize, shorter-lived condensates that aggregate, and immediate aggregation, respectively. Whereas FruR (folded) and FruR Y19A (molten globular) with polyphosphate induce spontaneous aggregation and short-lived condensates, respectively. I would expect FruR to be more similar to CytR DM and FruR Y19A more similar to CytR WT in terms of structure and conformational dynamics and plasticity, yet they have opposing results. This raises a bit of concern. Meaning, that though polyphosphate discriminates between the different ensembles, is it actually possible to obtain information on the initial ensemble composition?

      In the current study, we show that CytR WT (less structured) and FruR Y19A (molten globule) form short-lived condensates that aggregate. We agree with the reviewer that while CytR DM (fully folded) forms condensates that dissolve over time, FruR WT (fully folded) variant forms aggregates immediately upon polyP addition. The observations show that polyP can discriminate between different protein conformations, in contrast to DNA, which does not show such selectivity. However, we acknowledge that while polyP-induced behavior reflects aspects of protein ensemble properties, it does not provide direct insight into the nature of the initial conformational ensemble.

      (4) In the case of FruR with polyphosphate, no CD for the secondary structure analysis was provided as it was for CytR. It would be useful to see if the polyphosphate-induced structural changes observed for CytR hold true for FruR as well.

      We thank the reviewer for the suggestion. In response, we have performed far-UV CD experiments on FruR variants in the presence of polyP. Similar to the CytR WT, FruR WT shows unfolding upon polyP addition. A similar outcome is noted for the Y19A variant though there is significant residual helix content in the condensate unlike the WT. The CD spectra of FruR variants have been added to Figure 6.

      Minor Concerns/Suggestions:

      Under conclusion, third paragraph, first sentence. This sentence reads, "Our observations thus establish that polyP efficiently discriminates the conformational features of proteins than DNA, contributing to the diverse outcomes."

      We thank the reviewer for pointing this out. The sentence has been revised for clarity. It now reads “Our observations establish that polyP is more sensitive to the conformational features of proteins than DNA, thereby contributing to the diverse outcomes.”

      One experimental suggestion. Seeing that protein dynamics and plasticity seem to play a role. For either CytR WT or DM, it would be interesting to see the influence of temperature. Altering the temperature is a good way to perturb the population distribution of conformation sub-states and to alter kinetics. It may be that at a lower temperature (maybe 5C) for the WT you reduce conformational dynamics and you obtain results more similar to that of the DM. Alternatively, heating the DM would be another option. Obviously, there are additional challenges that may arise with changing the temperature, but if it were to work I think it could add some value.

      We thank the reviewer for the thoughtful suggestion. Due to limitations in our current experimental setup (which the reviewer notes as ‘challenges’), namely that the confocal setup does not have a temperature controller, we will not be able to perform temperature-controlled assays. However, the ‘structure’ of the CytR variants does not vary much between 280 and 298 K, and this is one of the reasons for choosing three variants without altering any other thermodynamic property. If temperature were varied, the dynamics of polyP would also change, and hence the true molecular origins of any differences we might observe would be confounded by the dynamic effects on polyP as well. In this work, we have eliminated any dynamic differences in polyP by performing the experiments at a fixed temperature.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript explores the impact of serotonin on olfactory coding in the antennal lobe of locusts and odor-evoked behavior. The authors use serotonin injections paired with an odor-evoked palp-opening response assay and bath application of serotonin with intracellular recordings of odor-evoked responses from projection neurons (PNs).

      Strengths:

      The authors make several interesting observations, including that serotonin enhances behavioral responses to appetitive odors in starved and fed animals, induces spontaneous bursting in PNs, directly impacts PN excitability, and uniformly enhances PN responses to odors.

      Weaknesses:

      The one remaining issue to be resolved is the theoretical discrepancy between the physiology and the behavior. The authors provide a computational model that could explain this discrepancy and provide the caveat that while the physiological data were collected from the antennal lobe, there could be other olfactory processing stages involved. Indeed other processing stages could be the sites for the computational functions proposed by the model. There is an additional caveat which is that the physiological data were collected 5-10 minutes after serotonin application whereas the behavioral data were collected 3 hours after serotonin application. It is difficult to link physiological processes induced 5 minutes into serotonin application to behavioral consequences 3 hours subsequent to serotonin application. The discrepancy between physiology and behavior could easily reflect the timing of action of serotonin (i.e. differences between immediate and longer-term impact).

      For our behavioral experiments, we waited 3 hours after serotonin injection to allow serotonin to penetrate through the layers of air sacs and the sheath, and for the locusts to calm down and recover their baseline POR activity levels. For the physiology experiments, we noticed that the quality of the patch decreased over time after serotonin introduction. Hence, it was difficult to hold cells for that long. However, the point raised by the reviewer is well-taken. We have performed additional experiments to show that the changes in POR levels to different odorants are rapid and can be observed within 15 minutes of injecting serotonin (Author response image 2) and that the physiological changes in PNs (bursting spontaneous activity, maintenance of temporal firing patterns, and increased odor-evoked responses) persist when the cells are held for a longer duration (i.e. 3 hours, akin to our behavioral experiments). It is worth noting that 3-hour in-vivo intracellular recordings are not easily achievable and come with many experimental constraints. So far, we have managed to record from two PNs that were held for this long, and we add them to this rebuttal to support our conclusions (Author response image 1).

      Author response image 1.

      Spontaneous and odor-evoked responses in individual PNs remain consistent for three hours after serotonin introduction into the recording chamber/bath. (A) Representative intracellular recording showing membrane potential fluctuations in a projection neuron (PN) in the antennal lobe. Spontaneous and odor-evoked responses to four odorants (pink color bars, 4 s duration) are shown before (control) and after serotonin application (5HT). Voltage traces 30 minutes (30min), 1 hour (1h), 2 hours (2h), and 3 hours (3h) after 5HT application are shown to illustrate the persisting effect of serotonin during spontaneous and odor-evoked activity periods. (B) Rasterized spiking activities in two recorded PNs are shown. Spontaneous and odor-evoked responses are shown in all 5 consecutive trials. Note that the odor-evoked response patterns are maintained, but the spontaneous activity patterns are altered after serotonin introduction.

      Author response image 2.

      Palp-opening response (POR) patterns to different odorants remain consistent following serotonin introduction. The probability of PORs is shown as a bar plot for four different odorants: hexanol (green), benzaldehyde (blue), linalool (red), and ammonium (purple). PORs before serotonin injection (solid bars) are compared against response levels after serotonin injection (striped bars). As can be noted, PORs to the four odorants remain consistent when tested 15 minutes and 3 hours after serotonin (5HT) injection.

      Overall, the study demonstrates the impact of serotonin on odor-evoked responses of PNs and odor-guided behavior in locusts. Serotonin appears to have non-linear effects including changing the firing patterns of PNs from monotonic to bursting and altering behavioral responses in an odor-specific manner, rather than uniformly across all stimuli presented.

      We thank the reviewer for again providing very useful feedback for improving our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigate the influence of serotonin on feeding behavior and electrophysiological responses in the antennal lobe of locusts. They find that serotonin injection changes behavior in an odor-specific way. In physiology experiments, they can show that projection neurons in the antennal lobe generally increase their baseline firing and odor responses upon serotonin injection. Using a modeling approach the authors propose a framework on how a general increase in antennal lobe output can lead to odor-specific changes in behavior.

      Strengths:

      This study shows that serotonin affects feeding behavior and odor processing in the antennal lobe of locusts, as serotonin injection increases activity levels of projection neurons. This study provides another piece of evidence that serotonin is a general neuromodulator within the early olfactory processing system across insects and even phyla.

      Weaknesses:

      I still have several concerns regarding the generalizability of the model and interpretation of results. The authors cannot provide evidence that serotonin modulation of projection neurons impacts behavior.

      This is true and likely to be true for any study linking neural responses to behavior. There are multiple circuits and pathways that would be impacted by a neuromodulator like serotonin. What we showed with our physiology is how spontaneous and odor-evoked responses in the very first neural network that receives olfactory sensory neuron input are altered by serotonin. Given the specificity of the changes in behavioral outcomes (i.e. odor-specific increase and decrease in an appetitive behavior) and non-specificity in the changes at the level of individual PNs (general increase in odor-evoked spiking activity), we presented a relatively simple computational model to address the apparent mismatch between neural and behavioral responses (Author response image 4).
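
      The reasoning above, that a uniform increase in PN output can still produce odor-specific behavioral changes, can be illustrated with a toy fixed-weight readout. This is a minimal sketch and not the authors' model: the weights, firing rates, gain factor, and function name are all hypothetical values chosen only to show the sign argument.

```python
def readout(pn_rates, weights):
    """Linear fixed-weight readout of a PN population (hypothetical)."""
    return sum(rate * weight for rate, weight in zip(pn_rates, weights))


# Hypothetical fixed weights: the first two PNs drive the behavior,
# the last two oppose it.
weights = [1.0, 1.0, -1.0, -1.0]

# Hypothetical odor-evoked rates for two odors that activate
# largely non-overlapping PN subsets.
odor_a = [5.0, 4.0, 1.0, 0.0]
odor_b = [0.0, 1.0, 4.0, 5.0]

gain = 1.5  # assumed uniform increase in PN responses

before_a = readout(odor_a, weights)
after_a = readout([gain * r for r in odor_a], weights)
before_b = readout(odor_b, weights)
after_b = readout([gain * r for r in odor_b], weights)

print("odor A drive:", before_a, "->", after_a)
print("odor B drive:", before_b, "->", after_b)
```

      In this toy case the same gain raises the drive for one odor and lowers it for the other, because the sign of the change is set by the fixed weights rather than by the modulation itself.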

      The authors show that odor identity is maintained after 5-HT injection, however, the authors do not show if PN responses to different odors were differently affected after serotonin exposure.

      The PN responses to different odorants changed in a qualitatively similar fashion (Author response image 3).

      Author response image 3.

      PN activity before and after 5HT application is compared for different cell-odor combinations. As can be noted, the changes are qualitatively similar in all cases. After 5HT application, the baseline activity became more bursty, but the odor-evoked response patterns were robustly maintained for all odorants.

      Regarding the model, the authors show that the model works for odors with non-overlapping PN activation. However, only one appetitive, one neutral, and one aversive odor has been tested and modeled here. Can the fixed-weight model also hold for other appetitive and aversive odors that might share more overlap between active PNs? How could the model generate BZA attraction in 5-HT exposed animals (as seen in behavior data in Figure 1) if the same PNs just get activated more?

      Author response image 4.

      Testing the generality of the proposed computational model. To test the generality of the proposed model, we used a published dataset [Chandak and Raman, 2023]: Neural dataset – 89 PN responses to a panel of twenty-two odorants; Behavioral dataset – probability of POR responses to the same twenty-two odorants. We built the model using just the three odorants overlapping between the two datasets: hexanol, benzaldehyde and linalool. The true POR probabilities and the POR probabilities predicted by the model are shown for all twenty-two odorants as a scatter plot. As can be noted, there is a high correlation (0.79) between the true and the predicted values.
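
      For readers who want to reproduce this kind of comparison, the agreement between true and predicted POR probabilities can be summarized with a Pearson correlation coefficient. The sketch below assumes that the reported correlation is a Pearson coefficient, which the response does not state. The numbers are placeholders, not values from the dataset, and the function name is illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Placeholder POR probabilities (one entry per odorant); NOT real data.
true_por = [0.10, 0.25, 0.40, 0.55, 0.80]
predicted_por = [0.15, 0.20, 0.50, 0.45, 0.75]

print(f"correlation = {pearson(true_por, predicted_por):.2f}")
```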

      The authors should still not exclude the possibility that serotonin injections could affect behavior via modulation of other cell types than projection neurons. This should still be discussed, serotonin might rather shut down baseline activation of local inhibitory neurons - and thus lead to the interesting bursting phenotypes, which can also be seen in the baseline response, due to local PN-to-LN feedback.

      As we agreed, there could be other cells that are impacted by serotonin release. Our goal in this study was to characterize how spontaneous and odor-evoked responses in the very first neural network that receives olfactory sensory neuron input are altered by serotonin. Within this circuit, there are local inhibitory neurons (LNs), as correctly indicated by this reviewer. Surprisingly, our preliminary data indicate that LNs are not shut down but also have an enhanced odor-evoked neural response (Author response image 5). Further data would be needed to verify this observation and determine the mechanisms that mediate the changes in PN excitability. Regardless, since PN activity should incorporate the effects of changes in the local neuron responses and is the sole output from the antennal lobe that drives all downstream odor-evoked activity, we focused on PNs in this study.

      Author response image 5.

      Representative traces showing intracellular recordings from a local neuron in the antennal lobe. Five consecutive trials are shown. Note that LNs in the locust antennal lobe are non-spiking. The LN activity before, during, and after the presentation of benzaldehyde and hexanol (colored bar; 4 s) is shown. The left and right panels show LN activity before and after the application of 5HT. As can be noted, 5HT did not shut down odor-evoked activity in this local neuron.

      The authors did not fully tone down their claims regarding causality between serotonin and starved state behavioral responses. There is no proof that serotonin injection mimics starved behavioral responses.

      Specific minor issues:

      It is still unclear how naturalistic the chosen odor concentrations are. This is especially important as behavioral responses to different concentrations of odors are differently modulated after serotonin injection (Figure 2: Linalool and Ammonium). The new method part does not indicate the concentrations of odors used for electrophysiology.

      All odorants were diluted to 0.01-10% concentration by volume in either mineral oil or distilled water. This information is included in the Methods section. For most odorants used in the study, the lower concentrations evoked only a very weak neural response, and the higher concentrations evoked more robust responses. The POR responses for these odorants at the various concentrations chosen are included in Figure 2. Note that the responses to linalool and ammonium remained weak throughout the concentration changes, compared to hexanol and benzaldehyde.

      Did all tested PNs respond to all odorants?

      No, only a subset of them responds to each odorant. These responses have been well characterized in earlier publications [included refs].

      The authors do not show if PN responses to different odors were differently affected after serotonin exposure. They describe that ON responses were robust, but OFF responses were less consistent after 5-HT injection. Was this true across all odors tested? Example traces are shown, but the odor is not indicated in Figure 4A. Figure 4D shows that many odor-PN combinations did not change their peak spiking activity - was this true across odorants? In Figure 5 - are PNs ordered by odor-type exposure?

      Also, Figure 6A only shows example trajectories for odorants - how does the average look? Regarding the data used for the model - can the new dataset from the 82 odor-PN pairs reproduce the activation pattern of the previously collected dataset of 89 pairs?

      What is shown in Figure 6A is the trial-averaged response trajectory combining the activities of all 82 odor-PN pairs. The 82 odor-PN pairs were recorded intracellularly to examine the responses to four odorants before and after 5HT application. The second dataset, involving 89 PN responses to 22 odorants, was collected extracellularly. The two datasets are qualitatively similar in that each odorant activates a unique subset of the neurons.

      The authors toned down their claims that serotonin injection can mimic the starved state behavioral response. However, some sentences still indicate this finding and should also be toned down:

      last sentence of introduction - "In sum, our results provide a more systems-level view of how a specific neuromodulator (serotonin) alters neural circuits to produce flexible behavioral outcomes."

      We believe we showed this with our computational model, which demonstrates how uniform changes in the neural responses could lead to variable and odor-specific changes in behavioral PORs.

      discussion: "Finally, fed locusts injected with serotonin generated similar appetitive responses to food-related odorants as starved locusts indicating the role of serotonin in hunger state-dependent modulation of odor-evoked responses." This claim is not supported.

      Figure 7 shows that the fed locusts had lower POR to hex and bza. The POR responses significantly increased after the 5HT application. However, we have rephrased this sentence to limit our claims to this result. "Finally, fed locusts injected with serotonin generated similar appetitive palp-opening responses to food-related odorants as observed in starved locusts."

      last results: "However, consistent with results from the hungry locusts, the introduction of serotonin increased the appetitive POR responses to HEX and BZA. Intriguingly, the appetitive responses of fed locusts treated with 5HT were comparable or slightly higher than the responses of hungry locusts to the same set of odorants."

      Again this sentence simply describes the result shown in Figure 7.

      In Figure 7 - BZA response seems unchanged in hungry and fed animals and only 5-HT injection enhances the response. There is only one example where 5-HT application and starvation induce the same change in behavior - N=1 is not enough to conclude that serotonin influences food-driven behaviors.

      The reviewer is ignoring the lack of changes to PORs to linalool and ammonium. Taken together, serotonin increased PORs to only two of the four odorants in starved locusts. The responses after 5HT modulation to these four odorants were similar in fed locusts treated with 5HT and starved locusts.

      Also, this seems to be wrongly interpreted in Figure 7: "It is worth noting that responses to LOOL and AMN, non-food related odorants with weaker PORs, remained unchanged in fed locusts treated with 5HT." The authors indicate a significant reduction in POR after 5-HT injection on LOOL response in Figure 7.

      Revised: "It is worth noting that responses to LOOL and AMN, non-food related odorants with weaker PORs, were reduced in fed locusts treated with 5HT."

      Also, the newly added sentence at the end of the discussion does not make sense: "However, since 5HT increased behavioral responses in both fed and hungry locusts, the precise role of 5HT modulation and whether it underlies hunger-state dependent modulation of appetitive behavior still remains to be determined."

      The authors did not test 5-HT injection in starved animals.

      The results shown in Figure 1 compare the POR responses of starved locusts before and after 5HT introduction.

      We again thank the reviewer for useful feedback to further improve our manuscript.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript explores the impact of serotonin on olfactory coding in the antennal lobe of locusts and odor-evoked behavior. The authors use serotonin injections paired with an odor-evoked palp-opening response assay and bath application of serotonin with intracellular recordings of odor-evoked responses from projection neurons (PNs).

      Strengths:

      The authors make several interesting observations, including that serotonin enhances behavioral responses to appetitive odors in starved and fed animals, induces spontaneous bursting in PNs, and uniformly enhances PN responses to odors. Overall, I had no technical concerns.

      Weaknesses:

      While there are several interesting observations, the conclusions that serotonin enhanced sensitivity specifically and that serotonin had feeding-state-specific effects, were not supported by the evidence provided. Furthermore, there were other instances in which much more clarification was needed for me to follow the assumptions being made and inadequate statistical testing was reported.

      Major concerns.

      • To enhance olfactory sensitivity, the expected results would be that serotonin causes locusts to perceive each odor as being at a relatively higher concentration. The authors recapitulate a classic olfactory behavioral phenomenon where higher odor concentrations evoke weaker responses which is indicative of the odors becoming aversive. If serotonin enhanced the sensitivity to odors, then the dose-response curve should have shifted to the left, resulting in a more pronounced aversion to high odor concentrations. However, the authors show an increase in response magnitude across all odor concentrations. I don't think the authors can claim that serotonin enhances the behavioral sensitivity to odors because the locusts no longer show concentration-dependent aversion. Instead, I think the authors can claim that serotonin induces increased olfactory arousal.

      The reviewer makes a valid point. Bath application of serotonin increased POR behavioral responses across all odor concentrations, and concentration-dependent aversion was also not observed. Furthermore, the monotonic relationship between projection neuron responses and the intensity of current injection is altered when serotonin is exogenously introduced (see Author response image 1; see below for more explanation). Hence, our data suggest that serotonin alters the dose-response relationship between neural/behavioral responses and odor intensity. As recommended by the reviewer, we have revised our claim to state that serotonin induces an increase in olfactory arousal. The new physiology data have been added as Supplementary Figure 3 to the revised manuscript.

      • The authors report that 5-HT causes PNs to change from tonic to bursting and conclude that this stems from a change in excitability. However, excitability tests (such as I/V plots) were not included, so it's difficult to disambiguate excitability changes from changes in synaptic input from other network components.

      To confirm that the PN excitability did indeed change after serotonin application, we performed a new set of current-clamp recordings. In these experiments, we monitored the spiking activity of individual PNs as we injected different levels of current (200–1000 pA). Note that locust LNs that provide recurrent inhibition arborize and integrate inputs from a large number of sensory neurons and projection neurons. Therefore, activating a single PN should not activate the local neurons and, through them, the antennal lobe network.

      We found that the total spiking activity monotonically increased with the magnitude of the current injection in all four PNs recorded (Author response image 1). However, after serotonin injection, we found that the spiking activity remained relatively stable and did not systematically vary with the magnitude of the current injection. While the changes in odor-evoked responses may incorporate both excitability changes in individual PNs and recurrent feedback inhibition through GABAergic LNs, these results from our current injection experiments unambiguously indicate that there are changes in excitability at the level of individual PNs. We have added this result to the revised manuscript.

      Author response image 1.

      Current-injection induced spiking activity in individual PNs is altered after serotonin application. (A) Representative intracellular recordings showing membrane potential fluctuations as a function of time for one projection neuron (PN) in the locust antennal lobe. A two-second window when a positive 200-1000 pA current was applied is shown. Firing patterns before (left) and after (right) serotonin application are shown for comparison. Note that the spiking activity changes after the 5HT application. The black bar represents the 20 mV scale. (B) Dose-response curves showing the average number of action potentials (across 5 trials) during the 2-second current pulse before (green) and after (purple) serotonin for each recorded PN. Note that the current intensity was systematically increased from 200 pA to 1000 pA. (C) The mean number of spikes across the four recorded cells during current injection is shown. The color progression represents the intensity of applied current, ranging from 200 pA (leftmost bar) to 1000 pA (rightmost bar). The dose-response trends before (green) and after (purple) 5HT application are shown for comparison. The error bars represent SEM across the four cells.
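
      The summary described in panels (B) and (C) can be sketched as follows. This is only an illustration under stated assumptions, not the analysis code used for the figure: the function name, the data layout, and the spike counts are hypothetical. Spike counts are first averaged over trials within each cell, and the mean and SEM are then taken across cells at each current level.

```python
from math import sqrt
from statistics import mean, stdev


def dose_response(spike_counts):
    """Summarize spike counts per current level.

    spike_counts maps cell id -> {current in pA -> list of per-trial counts}.
    Returns {current: (mean across cells, SEM across cells)}.
    Hypothetical helper, not taken from the manuscript.
    """
    currents = sorted(next(iter(spike_counts.values())).keys())
    summary = {}
    for current in currents:
        # Average over trials within each cell first (panel B).
        per_cell = [mean(trials[current]) for trials in spike_counts.values()]
        # Then take mean and SEM across cells (panel C).
        n_cells = len(per_cell)
        sem = stdev(per_cell) / sqrt(n_cells) if n_cells > 1 else 0.0
        summary[current] = (mean(per_cell), sem)
    return summary


# Made-up counts for two cells, two current levels, five trials each.
example = {
    "pn1": {200: [2, 3, 2, 3, 2], 1000: [10, 11, 9, 10, 10]},
    "pn2": {200: [4, 4, 5, 4, 3], 1000: [14, 13, 15, 14, 14]},
}
result = dose_response(example)
```

      Averaging within cells before computing the SEM treats the cell, rather than the trial, as the unit of replication, which matches the caption's statement that the error bars represent SEM across cells.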

      • There is another explanation for the theoretical discrepancy between physiology and behavior, which is that odor coding is further processed in higher brain regions (i.e., other than the antennal lobe) not studied in the physiological component of this study. This should at least be discussed.

      This is a valid argument. For our model of neural mapping onto behavior to work, we only need the odorant that evokes or suppresses PORs to activate a distinct set of neurons. Having said that, our extracellular recording results (Fig. 6E) indicate that hexanol (high POR) and linalool (low POR) do activate highly non-overlapping sets of PNs in the antennal lobe. Hence, our results suggest that the segregation of neural activity based on behavioral relevance already begins in the antennal lobe. We have added this clarification to the discussion section.

      • The authors cannot claim that serotonin underlies a hunger state-dependent modulation, only that serotonin impacts responses to appetitive odors. Serotonin enhanced PORs for starved and fed locusts, so the conclusion would be that serotonin enhances responses regardless of the hunger state. If the authors had antagonized 5-HT receptors and shown that feeding no longer impacts POR, then they could make the claim that serotonin underlies this effect. As it stands, these appear to be two independent phenomena.

      This is also a valid point. We have clarified this in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigate the influence of serotonin on feeding behavior and electrophysiological responses in the antennal lobe of locusts. They find that serotonin injection changes behavior in an odor-specific way. In physiology experiments, they can show that antennal lobe neurons generally increase their baseline firing and odor responses upon serotonin injection. Using a modeling approach the authors propose a framework on how a general increase in antennal lobe output can lead to odor-specific changes in behavior. The authors finally suggest that serotonin injection can mimic a change in a hunger state.

      Strengths:

      This study shows that serotonin affects feeding behavior and odor processing in the antennal lobe of locusts, as serotonin injection increases activity levels of antennal lobe neurons. This study provides another piece of evidence that serotonin is a general neuromodulator within the early olfactory processing system across insects and even phyla.

      Weaknesses:

      I have several concerns regarding missing control experiments, unclear data analysis, and interpretation of results.

      A detailed description of the behavioral experiments is lacking. Did the authors also provide a mineral oil control and did they analyze the baseline POR response? Is there an increase in baseline response after serotonin exposure already at the behavioral output level? It is generally unclear how naturalistic the chosen odor concentrations are. This is especially important as behavioral responses to different concentrations of odors are differently modulated after serotonin injection (Figure 2: Linalool and Ammonium).

      POR protocol: Sixth instar locusts (Schistocerca americana) of either sex were starved for 24-48 hours before the experiment or taken straight from the colony and fed blades of grass for the satiated condition. Locusts were immobilized by placing them in a plastic tube and securing their body with black electrical tape (see Author response image 2). Locusts were given 20-30 minutes to acclimatize after placement in the immobilization tube. As can be noted, the head of the locust along with the antennae and maxillary palps protruded out of this immobilization tube so that they could be freely moved by the locust. Note that the maxillary palps are sensory organs close to the mouth parts that are used to grab food and help with the feeding process.

      It is worth noting that our earlier studies had shown that the presentation of ‘appetitive odorants’ triggers the locust to open their maxillary palps even when no food is presented (Saha et al., 2017; Nizampatnam et al., 2018; Nizampatnam et al., 2022; Chandak and Raman, 2023). Furthermore, our earlier results indicate that the probability of palp opening varies across different odorants (Chandak and Raman, 2023). We chose four odorants that had a diverse range of palp-opening: supra-median (hexanol), median (benzaldehyde), and sub-median (linalool). Therefore, each locust in our experiments was presented with one concentration of four odorants (hexanol, benzaldehyde, linalool, and ammonium) in a pseudorandomized order. The odorants were chosen based on our physiology results such that they evoked different levels of spiking activities.

      The odor pulse was 4 s in duration and the inter-pulse interval was set to 60 s. The experiments were recorded using a web camera (Microsoft) placed right in front of the locusts. The camera was fully automated with a custom MATLAB script to start recording 2 seconds before the odor pulse and end recording at odor termination. An LED was used to track the stimulus onset/offset. The POR responses were manually scored offline. Responses to each odorant were scored as 0 or 1 depending on whether the palps remained closed or opened. A positive POR was defined as a movement of the maxillary palps during the odor presentation time window as shown on the locust schematic (Main Paper Figure 1).
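
      As a minimal sketch of how such binary scores translate into the POR percentages quoted in this response, the snippet below computes the fraction of positive responses per odorant. The function name and the scores are hypothetical and serve only to illustrate the bookkeeping; they are not the scored data.

```python
def por_rate(scores):
    """Fraction of trials scored 1 (palps opened) in a list of 0/1 scores.

    Hypothetical helper for illustration only.
    """
    if not scores:
        raise ValueError("no scored trials")
    if any(s not in (0, 1) for s in scores):
        raise ValueError("scores must be 0 or 1")
    return sum(scores) / len(scores)


# Made-up scores: one 0/1 entry per locust for each odorant.
scores_by_odor = {
    "hexanol": [1, 1, 0, 1, 0, 1, 0, 1, 0, 0],
    "linalool": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
}
rates = {odor: por_rate(s) for odor, s in scores_by_odor.items()}
```

      With these made-up scores, the rates come out to 0.5 for hexanol and 0.1 for linalool.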

      Author response image 2.

      Pictures showing the behavior experiment setup and representative palp-opening responses in a locust.

      As the reviewer inquired, we performed a new series of POR experiments, where we explored POR responses to mineral oil and hexanol, before and after serotonin injection. For this study, we used 10 locusts that were starved 24-48 hours before the experiment. Note that hexanol was diluted at 1% (v/v) concentration in mineral oil. Our results reveal that locust PORs to hexanol (~50% PORs) were significantly higher than those triggered by mineral oil (~10% PORs). Injection of serotonin increased the POR response rate to hexanol but did not alter the PORs evoked by mineral oil (Author response image 3).

      Author response image 3.

      Serotonin does not alter the palp-opening responses evoked by paraffin oil. The PORs before and after serotonin (5HT) injection are summarized and shown as a bar plot for hexanol and paraffin oil. Striped bars signify the data collected after 5HT injection. Significant differences are identified in the plot (one-tailed paired-sample t-test; *p<0.05).

      Regarding recordings of potential PNs - the authors do not provide evidence that they did record from projection neurons and not other types of antennal lobe neurons. Thus, these claims should be phrased more carefully.

      In the locust antennal lobe, only the cholinergic projection neurons fire full-blown sodium spikes. The GABAergic local neurons only fire calcium ‘spikelets’ (Laurent, TINS, 1996; Stopfer et al., 2003; see Author response image 4 for an example). Hence, we are pretty confident that we are only recording from PNs. Furthermore, because the LN signals are too small, they are not detected in the extracellular recordings from the locust antennal lobe. Hence, we are confident in our claims and conclusions.

      Author response image 4.

      PN vs LN physiological differences. Left: A representative raw voltage trace recorded from a local neuron before, during, and after a 4-second odor pulse is shown. Note that the local neurons in the locust antennal lobe do not fire full-blown sodium spikes but only fire small calcium spikelets. Right: A raw voltage trace recorded from a representative projection neuron is shown for comparison. Sodium spikes are clearly visible during spontaneous and odor-evoked periods. The gray bar represents the 4-second odor pulse. The vertical black bar represents 40 mV.

      The presented model suggests labeled lines in the antennal lobe output of locusts. Could the presented model also explain a shift in behavior from aversion to attraction - such as seen in locusts when they switch from a solitarious to a gregarious state? The authors might want to discuss other possible scenarios, such as that odor evaluation and decision-making take place in higher brain regions, or that other neuromodulators might affect behavioral output. Serotonin injections could affect behavior via modulation of other cell types than antennal lobe neurons. This should also be discussed - the same is true for potential PNs - serotonin might not directly affect this cell type, but might rather shut down local inhibitory neurons.

      There are multiple questions here. First, regarding solitary vs. gregarious states, we are currently repeating these experiments on solitary locusts. Our preliminary results (not included in the manuscript) indicate that the solitary animals have increased olfactory arousal and respond with a higher POR but are less selective and respond similarly to multiple odorants. We are examining the physiology to determine whether the model for mapping neural responses onto behavior could also explain observations in solitary animals.

      Second, this reviewer makes the point raised by Reviewer 1. We agree that odor evaluation and decision-making might take place in higher brain regions. All we could conclude based on our data is that a segregation of neural activity based on behavioral relevance might provide the simplest approach to map a non-specific increase in stimulus-evoked neural responses onto odor-specific changes in behavioral outcome. Furthermore, our results indicate that hexanol and linalool, two odorants that had an increase and decrease in PORs after serotonin injection, had only minimal neural response overlap in the antennal lobe. These results suggest that the formatting of neural activity to support varying behavioral outcomes might already begin in the antennal lobe. We have added this to our discussion.

      Third, regarding serotonin impacting PNs, we performed a new set of current-clamp experiments to examine this issue (Author response image 1). Our results clearly show that projection neuron activity in response to current injections (that should not incorporate feedback inhibition through local neurons) was altered after serotonin injection. Therefore, the observed changes in the odor-evoked neural ensemble activity should incorporate modulation at both individual PN level and at the network level. We have added this to our discussion as well.

      Finally, the authors claim that serotonin injection can mimic the starved state behavioral response. However, this is only shown for one of the four odors that are tested for behavior (HEX), thus the data does not support this claim.

      We note that Hex is the only appetitive odorant in the panel. But, as reviewer 1 has also brought up a similar point, we have toned down our claims and will investigate this carefully in a future study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Was the POR of the locusts towards linalool and ammonium higher than towards a blank odor cartridge? I ask because the locusts appear to be less likely to respond to these odors and so I am concerned that this assay is not relevant to the ecological context of these odors. In other words, perhaps serotonin did not enhance the responses to these odors in this assay, because this is not a context in which locusts would normally respond to these odors.

      The POR response to linalool and ammonium is lower and comparable to that of paraffin oil. Serotonin does not increase POR responses to paraffin oil but does increase response to hexanol (an appetitive odorant). We have clarified this using new data (Author response image 5).

      • It seems to me that Figure 5C is the crux for understanding the potential impact of 5-HT on odor coding, but it is somewhat confusing and underutilized. Is the implication that 5-HT decorrelates spontaneous activity such that when an odor stimulus arrives, the odor-evoked activity deviates to a greater degree? The authors make claims about this figure that require the reader to guess as to the aspect of the figure to which they are referring.

      The reviewer makes an astute observation. Yes, the spontaneous activity in the antennal lobe network before serotonin introduction is not correlated with the ensemble spontaneous activity after serotonin bath application. Remarkably, the odor-evoked responses were highly similar, both in the reduced PCA space and when assayed using high-dimensional ensemble neural activity vectors. Whether the changes in network spontaneous activity have a function in odor detection and recognition is not fully understood and cannot be convincingly answered using our data. But this is something that we had pondered.

      • The modeling component summarized in Figure 6 needs clarification and more detail. Perhaps example traces associated with positive weighting within neural ensemble 1 relative to neural ensemble 2? I struggled to understand conceptually how the model resolved the theoretical discrepancy between physiology and behavior.

      As recommended, here is a plot showing the responses of four PNs that had positive weights to hexanol and linalool. As can be expected, each PN in this group had higher responses to hexanol and no response to linalool. Further, the four PNs that received negative weights had response only to linalool.

      Author response image 5.

      Odor-evoked responses of four PNs that received positive weights in the model (top panel), and four PNs that were assigned negative weights in the model (bottom).

      • Was there a significant difference between the PORs of hungry vs. fed locusts? The authors state that they differ and provide statistics for the comparisons to locusts injected with 5-HT, but then don't provide any statistical analyses of hungry vs. fed animals.

      The POR responses to HEX (an appetitive odorant) were significantly different between the hungry and fed locusts.

      Author response image 6.

      A bar plot summarizing PORs to all four odors for satiated locusts (highlighted with stripes), before (dark shade) and after 5HT injection (lighter shade). To allow comparison, PORs of starved locusts before 5HT injection are plotted as well (without stripes). The significance was determined using a one-tailed paired-sample t-test (*p<0.05).

      • Were any of the effects of 5-HT on odor-evoked PN responses significant? No statistics are provided.

      We examined the distribution of odor-evoked responses in PNs before and after 5HT introduction. We found that the overall distribution was not significantly different between the two (one-tailed paired-sample t-test; p = 0.93).

      Author response image 7.

      Comparison of the distribution of odor-evoked PN responses before (green) and after (purple) 5HT introduction. A one-tailed paired-sample t-test was used to compare the two distributions.
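
      For readers unfamiliar with the paired-sample t-test used throughout this response, the statistic can be sketched as below. This is an illustrative sketch only: the function name and the numbers are hypothetical, and the direction of the test (after minus before) is an assumption, since the direction is not stated in the text.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for paired samples.

    The statistic is computed on the differences after - before.
    Hypothetical helper for illustration only.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for b, a in zip(before, after)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    n = len(diffs)
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (sd / sqrt(n))
    return t, n - 1


# Made-up paired responses for four units.
before = [4.0, 6.0, 5.0, 7.0]
after = [5.0, 8.0, 5.0, 10.0]
t_stat, dof = paired_t_statistic(before, after)
```

      The one-tailed p-value is the upper-tail probability of Student's t distribution with n - 1 degrees of freedom at this statistic. The standard library has no t-distribution, so in practice one would likely call a statistics package such as SciPy (`scipy.stats.ttest_rel` with `alternative="greater"`) rather than compute it by hand.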

      • The authors interchangeably use "serotonin", "5HT" and "5-HT" throughout the manuscript, but this should be consistent.

      This has been fixed in the revised manuscript.

      • On page 2 the authors provide an ecological relevance for linalool as being an additive in pesticides, however, linalool is a common floral volatile chemical. Is the implication that locusts have learned to associate linalool with pesticides?

      Linalool is a terpenoid alcohol that has a floral odor but has also been used as a pesticide and insect repellent [Beier et al., 2014]. As shown in Author response image 2, it evoked the least POR responses amongst a diverse panel of 22 odorants that were tested. We have clarified how we chose odorants based on the prior dataset in the Methods section.

      • In Figure 1, there should be a legend in the figure itself indicating that the black box indicates the absence of POR and the white box indicates presence, rather than just having it in the legend text.

      Done.

      • In Figure 2, the raw data from each animal can be moved to the supplements. The way it is presented is overwhelming and the order of comparisons is difficult to follow.

      Done.

      • For the induction of bursting in PNs by the application of 5-HT, were there any other metrics observed such as period, duration of bursts, or peak burst frequency? The authors rely on ISI, but there are other bursting metrics that could also be included to understand the nature of this observation. In particular, whether the bursts are likely due to changes in intrinsic biophysical properties of the PNs or polysynaptic effects.

      We could use other metrics as the reviewer suggests. Our main point is that the spontaneous activity of individual PNs changed. We have added new current-injection experiments to show that the PN output in response to square pulses of current changes after serotonin application (Author response image 1).

      • Were 4-vinyl anisole, 1-nonanol, and octanoic acid selected as additional odors because they had particular ecological relevance, or was it for the diversity of chemical structure?

      These odorants were selected based on both chemical structure and ecological relevance. The logic behind this was to have a very diverse odor panel that consisted of food odorant – Hexanol, aggregation pheromone – 4-vinyl anisole, sex pheromone – benzaldehyde, acid – octanoic acid, base – ammonium, and alcohol – 1-nonanol. Additionally, we selected these odors based on previous neural and behavioral data on these odorants (Chandak and Raman, 2023; Traner and Raman, 2023; Nizampatnam et al., 2022 & 2018; Saha et al., 2017 & 2013).

      Reviewer #2 (Recommendations For The Authors):

      The electrophysiology dataset combines all performed experiments across all tested different PN-odor pairs. How many odors have been tested in a single PN and how many PNs have been tested for a single odor? This information is not present in the current manuscript. Can the authors exclude that there are odor-specific modulations?

      In total, our dataset includes recordings from 19 PNs. Seven PNs were tested on a panel of seven odorants (4-vinyl anisole, 1-nonanol, octanoic acid, Hex, Bza, Lool, and Amn), and the remaining twelve were tested with the four main odorants used in the study (Hex, Bza, Lool, and Amn). This information has been added to the Methods section.

      How did the authors choose the concentrations of serotonin injections and bath applications - is this a naturalistic amount?

      The serotonin concentration for ephys experiments was chosen based on trial-and-error experiments: 0.01 mM was the highest concentration that did not cause cell death. For the behavioral experiments, we increased the concentration (0.1 M) due to the presence of anatomical structures in the locust's head such as air sacs and sheath, as well as hemolymph, which cause some degree of dilution that we cannot control.

      Behavior experiments were performed 3 hours after injection - ephys experiments 5-10 minutes following bath application. Can the authors exclude that serotonin affects neural processing differently on these different timescales?

      We cannot exclude this possibility. We did ePhys experiments 5-10 minutes after bath application as it would be extremely hard to hold cells for that long.

      A longer delay was required for our behavioral experiments as the locusts tended to be a bit more agitated, with larger spontaneous movements of palps, and also exhibited unprompted vomiting. A 3-hour period allowed the locust to regain its baseline level of movement after 5HT introduction. [This information has been added to the methods section of the revised manuscript]

      Concerning the analysis of electrophysiological data. The authors should correct for changes in the baseline before performing PCA analysis. And how much of the variance is explained by PC1 and PC2?

      We did not correct for baseline changes or subtract baseline as we wanted to show that the odor-evoked neural responses still robustly encoded information about the identity of the odorant.

      The authors should perform dye injections after recordings to visualize the cell type they recorded from. Serotonin might affect also other cell types in the antennal lobe.

      As mentioned above, in the locust antennal lobe only PNs fire full-blown sodium spikes, and LNs only fire calcium spikelets (Author response image 4). Since the LN signals are small, they will be buried under the noise floor when using extracellular recording electrodes for monitoring responses in the antennal lobe.

      Hence we are pretty certain what type of cells we are recording from.

      There were several typos in the manuscript, please check again.

      We have fixed many of the grammatical errors and typos in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      (1) In cardiac and renal transplantation, cold preservation on ice remains a common practice for transporting explanted hearts to recipients because it is a cheap and easily accessible way of preserving organs. While ex-vivo mechanical circulatory platforms have been developed and are increasingly being utilized to prolong organ viability, cold preservation remains widely used. The authors perfused explanted hearts with oxygenated perfusion preservation devices at subnormothermic temperatures (20-23 °C), which is much lower than the temperatures routinely used in clinical cardiopulmonary bypass scenarios (28-32 °C) (in the discussion, the authors allude to SNC80's possible "protective effect" in cardiac bypass). It is unclear how much of the hypometabolic state is related to WB3 administration versus hypothermia. The study would benefit from a comparison of WB3 administration plus hypothermia versus cold preservation alone, in Xenopus and in explanted porcine organs, to show a distinction in biostasis parameters.

      Indeed, we expect that both pharmaceutical interventions and cooling could contribute to a hypometabolic state. To assess this, the controls and the treated groups were exposed to the same temperatures for both the Xenopus (18 °C) and porcine heart experiments (20-23 °C). Therefore, we can conclude that any changes in the treatment group relative to control can be attributed to the introduction of SNC80 or WB3 and not to cooling alone.

      (2) The authors selected SNC80 based on a literature survey where it was identified based on its ability to induce hypothermia and protect against the effects of spinal cord ischemia in rodents. While this makes sense, were other drugs (eg. Puerarin) considered? The induction of hypothermia and spinal cord protective effect of SNC80 may be multifactorial and not necessarily related to its biostatic effects as the authors describe. Please provide some more context into the background of SNC80.

      During our research program, we considered and tested other drugs (>100 existing compounds in Xenopus screens). Although the published hypothermic and tissue protective effects suggested to us that SNC80 should be included in screening, it was not until we observed effects across multiple test parameters, systems, and species that we honed in on SNC80 as a lead compound. We have added additional information to further clarify the background of SNC80 on pgs. 3-4. 

      (3) In most of the models, the primary metric that the authors utilize to characterize metabolic activity is oxygen consumption, which is a somewhat limited indicator. For instance, it does not provide any information on anaerobic metabolic activity. In addition, the ATP/ADP ratio was found to decrease in the organ chips where SNC80 was utilized, but similar findings were not presented for the other models.

      We thank the reviewer for this important point. We have therefore added additional experiments, including the Seahorse Mitostress assay for the four human cell types (Caco-2, Huh7, LSEC and HUVEC) used in the Organ Chip systems. We have added a description and an interpretation of the results in the section "Stasis induction in cultured human cells and tissues" and mention the role of glycolysis and cytosolic reductive carboxylation as compensatory mechanisms. Although the ATP/ADP ratio gave us useful insight into the metabolic activity of Huh7 cells and chips, this method requires transfection and live imaging, which does not suit other models such as Xenopus or whole organs. Additionally, in animal models there may be other confounding factors that might influence ATP/ADP.

      (4) The authors should provide a more detailed explanation of SNC80's mechanisms of interaction with proteins related to transmembrane transport, mitochondrial activity, and metabolic processes. What is the impact of SNC80 on mitochondrial function, particularly ATP production and mitochondrial respiration? Are there changes in mitochondrial membrane potential, electron transport chain activity, or oxidative phosphorylation? In this context, the authors discuss the potential role of NCX1 as a binding target for SNC80 and its various mechanisms in slowing metabolism. However, no experiments have been done to confirm this binding in the present study. Coimmunoprecipitation studies using appropriate antibodies against SNC80 and NCX1 should be considered to demonstrate their direct binding. Additionally, surface plasmon resonance (SPR) or isothermal titration calorimetry (ITC) experiments could be employed to quantify the binding affinity between SNC80 and NCX1, providing further evidence of their interaction. These experiments would elucidate the binding mechanism between SNC80 and NCX1 and reveal more information on the mechanism of action for SNC80. 

      We agree that further definition of the mechanism of action is an important next step for this work; however, it is far beyond the scope of the present study.

      (5) The manuscript notes that histological analysis was conducted, but it seems that only example images are provided, such as Figure 4f. Quantified histological data would provide a more thorough understanding of tissue integrity. 

      We have added quantified histological data to the manuscript (Figure 4f); the quantification was performed by a clinician blinded to the groups and interventions.

      (6) Some of the points mentioned in the discussion and conclusion are rather strong and based on possible associations such as SNC80's potential vasodilatory capacity conferring a cardioprotective effect, and ability to reversibly suppress metabolism across different temperatures and species. Please tone this down and stay limited to the organs studied. Further, the reversibility of the findings may be more objectively assessed by biomarkers with decreased immunofluorescence in response to ischemia such as troponin I for the heart and albumin for the liver. Additionally, an investigation of proteins involved in inflammation, hypoxia, and key cell death pathways using immunohistochemistry analysis can better describe the impact of treatment on apoptosis/necroptosis. 

      We have revised aspects of the Discussion and Conclusion to focus on the organs studied in the present work (pgs. 14-17). We agree that markers of inflammation, hypoxia, and cell death are critical for assessing tissue health post-treatment. We performed PCR to assess such markers (Figure 4e) and found reductions in inflammatory cytokine and injury biomarker levels. Although we agree that immunohistochemistry may be useful, such as for looking at any spatial patterns of injury, PCR offers broader dynamic range and higher sensitivity and therefore was chosen for this assay.

      (7) What could be the underlying cause of the observed increase in intercellular spacing after SNC80 administration in porcine limbs which also seems to be evident in the heart histology samples? This seems to be more prominent in the SNC80 compared to the vehicle group. 

      Since the muscle bundle areas of baseline and treated tissues were essentially the same, the increase in intercellular space in the SNC80-treated tissue suggests a compensatory reduction in muscle fiber diameter. Intracellular metabolite concentrations have been shown to be quite stable over a large range of metabolic activities (Hochachka et al. 1998). As such, a reduction in metabolic activity induced by SNC80 may suggest a reduction in the accumulation of intracellular metabolites. In order to maintain a stable intracellular metabolite concentration, water would have to be expelled from the cells, accounting for the increased intercellular space.

      P W Hochachka, G B McClelland, G P Burness, J F Staples, R K Suarez Comp Biochem Physiol B Biochem Mol Biol 120, 17–26 (1998).

      (8) In the Discussion section, it would be valuable to provide a concise interpretation of the lipidomic data, particularly explaining how changes in acylcarnitine and cholesterol ester levels may relate to tadpole metabolism, hibernation, or other biological processes. 

      An interpretation of the lipidomics data has been summarized in the Discussion (pg. 14).

      (9) What are the limitations or disadvantages of the study? Does SNC80 possess any immunomodulatory properties that might affect the outcomes of organ transplantation? Are there specific organs for which SNC80 may not be a suitable preservation agent, and if so, what are the reasons behind this? 

      This study is limited in two ways. The first is that we characterized the function of the donor pig heart outside of the body, and therefore future work will be required to verify the function and quality of the hearts after they have been transplanted. Secondly, SNC80 is not currently approved for use in clinical settings and during earlier pre-clinical trials of the drug, side effects including seizures were noted and its development was halted. It is hypothesized that these seizures are related to SNC80’s delta opioid activity, so we developed a new, non-opioid analog called WB3, which will be used in future work. We have added a description of the prior seizure findings to the text (pg. 5).

      Based on assessment of tissue biomarkers by PCR, it seems that SNC80 does exhibit immunomodulatory properties. Because organ transplant recipients are treated with strong immunosuppressants to prevent organ rejection, we anticipate that SNC80 would either further support this goal, have little additional effect, or reduce the amount of additional immunosuppressive drugs that would need to be administered. To date, our data do not suggest that there are specific organs for which SNC80 may not be a suitable preservation agent.

      Reviewer #2:

      (1) The authors developed WB3, an analog of the known delta opioid receptor activator SNC80 with three orders of magnitude weaker binding to the delta opioid receptor. This will likely reduce the undesirable effects of SNC80 while preserving the metabolic slowing needed for organ preservation. Yet, most experiments were done with SNC80, not the superior modification, WB3, which is shown in only a limited set of experiments (Figure 3).

      We included the WB3 studies in Xenopus to confirm that the biostatic activity is not mediated through the delta opioid receptor. We have only performed a limited number of experiments with WB3 because we are focused on improving its solubility so that it can be easily dissolved in common organ perfusates without DMSO, which we were able to use in the Xenopus experiments. 

      (2) The heart is one of the most challenging organs to preserve, and some experiments are done to establish the metabolic effects of SNC80. However, the biodistribution study, shown in Figure 2, conspicuously omitted the heart. 

      Thank you for this suggestion. We returned to the biodistribution study dataset and were able to measure uptake by the heart at the 1-hour time point. Uptake in the heart at 1 hour exceeds the levels observed for other tissues at 1 hour and is similar to the level observed in skeletal muscle at 2 hours (plot below). Unfortunately, the heart was not visible in a sufficient number of Xenopus tissue sections to reevaluate uptake at the 2-hour time point. We were also able to re-evaluate the lipidomics data for the heart. Acylcarnitine and cholesterol ester were not significantly different between vehicle and SNC80-treated groups. The lack of change in acylcarnitine is particularly important since its upregulation has been shown to be a marker for cardiovascular disease in humans (Deda et al. 2022). The expanded lipidomics data have been added to Figure 2.

      Deda O, Panteris E, Meikopoulos T, Begou O, Mouskeftara T, Karagiannidis E, Papazoglou AS, Sianos G, Theodoridis G, Gika H. Correlation of serum acylcarnitines with clinical presentation and severity of coronary artery disease. Biomolecules. 2022 Feb 23;12(3):354.

      Author response image 1.

      (3) I do not understand the design of the electrophysiology and contractility experiments with the porcine hearts. How did you defibrillate the hearts after removal and establishing perfusion? Lines 173-175 on Page 7 state: "After defibrillation with epinephrine, the P and QRS waveforms were visible in ECGs from 3 of 4 SNC80-treated hearts (Table S1), suggesting that those hearts regain atrial and ventricular polarization." Please clarify.

      Defibrillation is done with an electric shock. Also, please show the ECG recordings to support your conclusions about "polarization." What did you mean by "polarization"? Depolarization? Repolarization? Or resting potential? To establish a normal physiological state, please show ECG waveforms and present data on basic ECG characteristics: heart rate, PQ and QT intervals, and P and QRS durations. I recommend perfusion of the porcine heart with WB3, not only SNC80.

      Hearts were defibrillated by the application of a 10 to 30 Joule electrical shock delivered from internal paddles positioned at the right atrium (negative) across to the left ventricle (positive). Once rhythm was established, 0.5 ml of 1:1000 epinephrine was administered via the aortic inflow. Electrocardiogram (ECG) showed that both vehicle and SNC80-treated hearts exhibited irregular contractions after perfusate flush and during rewarming prior to defibrillation. After defibrillation (10-30 J electrical shock) followed by epinephrine, a regular heartbeat was established in 3 of 4 SNC80-treated hearts, exhibiting normal P and QRS waveforms (Table S1). That observation suggested that the intrinsic atrial and ventricular muscle fiber contractility was preserved, and the overall conduction system of the heart was viable. The pulse rates of SNC80-treated hearts were at or near normal for porcine hearts (70-120 beats/min) after defibrillation. Vehicle-treated hearts exhibited tachycardia following defibrillation, with all exhibiting pulse rates above the normal range for porcine hearts. We have added clarifying text and definitions (pg. 8). We have only performed a limited number of experiments with WB3 because we are focused on improving its solubility so that it can be easily dissolved in common organ perfusates without DMSO, which we were able to use in the Xenopus experiments.

      (4) Pathology data also raise concerns. The histology images shown in Figure 4f are not quantified, and they show apparently higher levels of tissue disruption in SNC80-treated tissue vs vehicle-treated. The text (lines 169-171) confirms this concern: "In some hearts treated with SNC80, greater waviness of muscle fibers was observed, possibly indicating a state of muscle contraction."

      The histology images shown in Figure 4f were quantified, and the myocardial injury score quantification shows comparable histology between the groups.

      (5) The apparent state of contracture suggests a higher degree of myocardial damage and a high intracellular calcium level in SNC80-treated hearts. 

      The authors suggested that the sodium-calcium exchanger NCX is a possible target of SNC80 and could be responsible for the "hypometabolic state." However, NCX1 is critically important in the extrusion of cytosolic Ca2+ during the diastolic phase. Failure to remove excessive calcium and restore ionic homeostasis would lead to calcium overload and heart failure. 

      The histological assessment does not indicate a higher degree of myocardial damage in SNC80-treated hearts. Our data are not suggestive of high intracellular calcium buildup in SNC80-treated hearts. If that were the case, we would have had difficulty restoring the rhythm of the hearts on the Langendorff apparatus post-preservation, which was not observed.

      (6) I am surprised the authors did not consider using the gold standard assay for measuring mitochondrial function in cells by the Seahorse Cell Mito Stress Test. 

      Thank you for this important point. We have added data from the Seahorse Mitostress assay for the four human cell types (Caco-2, Huh7, LSEC and HUVEC) included in the Organ Chip experiments. We have added a description and an interpretation of the results in the section Stasis induction in cultured human cells and tissues. We now mention the role of glycolysis and cytosolic reductive carboxylation as compensatory mechanisms.   

      Reviewer #3:

      (1) The authors perform a literature search to identify SNC80 as a promising hit. However, the details of the literature search, a list of other potential hits, and the criteria for identification of SNC80 are not described. The hypometabolic effect of SNC80 exposure is well-characterized in the Xenopus model. Furthermore, the authors show that SNC80 localises to the brain, but do not discuss several studies that have pointed to convulsions induced by exposure to high doses of SNC80, and whether this would be apparent in the Xenopus studies. The authors have promising data on the WB3 morpholino that retains or even improves on the hypometabolism phenotype of SNC80 while likely not retaining delta opioid activity. However, this is not functionally demonstrated. Moreover, WB3 is not used in any of the other assays and models used in the study. In the setting of cardiac transplant surgery, co-administration of SNC80 reduces metabolic activity and inflammation, although it is unclear if there is an improvement in recovery of organ function due to SNC80.

      Thank you for raising these important points. We have added details of the process to identify SNC80 (pgs. 3-4) and a discussion of the studies pointing to convulsions with high doses of SNC80 (pg. 5) (which were not observed in Xenopus studies). We have also incorporated measurements of oxygen consumption during WB3 treatment in Xenopus (Figure 3d).

      (2) The reversible induction of hypometabolic status is also demonstrated in two different organ chips. These models could identify the differential response of epithelial cells and vascular cells to drug perfusion, but the authors have mostly focused on the former. Finally, the authors identify specific targets for the hypometabolic effect of SNC80, which is a valuable resource for other screening studies and can form the basis for future work. 

      In the revised manuscript, we have also added data from the Seahorse Mitostress assay for the four human cell types (Caco-2, Huh7, LSEC and HUVEC). We have added a description and an interpretation of the results in the section Stasis induction in cultured human cells and tissues. We highlight the differences in metabolic response from the four cell types to SNC80 treatment. It is important to note that the metabolism-suppressing effects of SNC80 were most potent in the epithelial cells that were originally derived from highly metabolic tumors (Caco-2 and Huh7) versus primary normal endothelial cells (HUVEC and LSEC), which is also consistent with past work suggesting that targeting of the NCX1 channel might offer a way to slow tumor growth (Wan et al. 2022). Because we observed more prominent effects in epithelial cells in 2D assays, we decided to focus the 3D organ chips assays on epithelial cells.

      Wan, H. et al. NCX1 coupled with TRPC1 to promote gastric cancer via Ca2+/AKT/β-catenin pathway. Oncogene (2022) doi:10.1038/s41388-022-02412-9.

      Recommendations for the authors:

      Reviewer #1:

      (1) Line 136, "Based on these intriguing findings with human Organ Chips". No mention of human organ chips was made in the text at this point, suggest rewording.  

      Thank you for identifying this error. We have revised this line (pg. 6).

      (2) Please provide more information on previous studies that have explored other drugs for organ protection, the novelty of the findings of this study, and how the findings of this study compare to prior data. 

      Building on the background of organ preservation drugs provided in the Introduction, we have added details to compare our outcomes to other drugs explored for organ protection (pg. 15).

      (3) The dosing study in Supplemental Figure S1 provides some context on why the authors utilized the 100 µM SNC80 concentration. It would be helpful if the authors could elaborate in the Discussion on the mechanistic rationale for this concentration.

      This dose was chosen to maximize suppression of metabolic and activity parameters, while ensuring reversibility of biostasis. We have clarified this in the Discussion (pg. 14).

      (4) In Supplement Figure S2a, the y-axis measures the relative metabolic rate. It seems from the text that this is a relative measure of oxygen consumption, but it should be clarified accordingly. 

      We have clarified this point in the Methods section.  

      (5) What is the specific time or time frame when the reversed effect of SNC80 is most pronounced or at its peak? 

      When Xenopus are moved to fresh medium after SNC80 treatment, we observe a 15-minute period during which no reversal is evident from motion measurements. After that period, we observe a gradual, linear recovery over 2 hours. We cannot designate a specific period in which the reversal effect is most pronounced from these data.

      (6) WB3 seems to show a faster and stronger impact on swimming in comparison to SNC80. What could be the potential reasons for this difference, and could this have any clinical implications? 

      From our current data, we understand the key difference to be that SNC80 has greater affinity for the delta opioid receptor compared to WB3. Therefore, we hypothesize that by not interacting with the opioid system, WB3 induces faster and stronger impacts on swimming. In mice, it has been shown that SNC80 directly inhibits forebrain GABAergic neurons via activity at their delta opioid receptors, which leads to convulsions (Chung et al. 2015). Although we do not observe seizure-like behavior in Xenopus, drugs that inhibit GABAergic neurons can produce stimulant effects in vivo. Since WB3 has a lower affinity for the delta opioid receptor, it likely produces less stimulation, leading to faster and stronger suppression of swimming behaviors. Additionally, it is possible that WB3 interacts with additional targets we have not yet identified.

      Chung PC, Boehrer A, Stephan A, Matifas A, Scherrer G, Darcq E, Befort K, Kieffer BL. Delta opioid receptors expressed in forebrain GABAergic neurons are responsible for SNC80-induced seizures. Behavioural brain research. 2015 Feb 1;278:429-34.

      (7) Elaborate on the potential significance of SNC80's distribution in the GI tract, gill region, and skeletal muscle. How might this distribution relate to the observed physiological effects? 

      In Xenopus tadpoles, we observe SNC80 uptake in the gill region and GI tract within 1 hour. The multiple possible routes of uptake in Xenopus (skin, gills, and mouth) may account for the relatively rapid physiological effects observed in our experiments. The uptake observed in the muscle may be specifically responsible for the slowed motion observed in Xenopus activity assays. This has been elaborated upon in the text (pg. 5).

      (8) Please use italics where needed, e.g., in vitro, in vivo, etc. 

      This has been updated throughout the article.

      (9) Supplemental Figure S1 - Is there any reason for having 3 replicates for the 100 µM group compared to the 4 replicates in the other groups?

      Each group had 4 replicates; however, a review of the replicates for the 100 µM group suggested the presence of a leak or air bubble in one oxygen measurement vial, which, therefore, had to be excluded from the analysis.

      (10) Figure 3 description - 'c' should be bold. 

      Figure 3 has been updated.

      Reviewer #3:

      Title: The title suggests that several candidate compounds are identified but the study focuses primarily on SNC80. Please consider rephrasing to make it more specific to this molecule. Alternatively, the manuscript would be significantly strengthened if more data were provided for WB3.

      Although the study focuses on SNC80, we introduce an entirely novel molecule, WB3, and therefore, we feel it is more appropriate to indicate that multiple molecules were studied.

      Line 58-59: please cite additional primary literature papers for the different therapeutics discussed. As an example, the authors do not cite or discuss Massen et al PMID: 31743376 which suggests that H2S is able to induce similar hypometabolic effects even at 37C. 

      Thank you for this suggestion. We have expanded our discussion of the primary literature for the therapeutics discussed (pg. 15).

      Line 76 - 77: The authors do not provide any data on the other possible hits from their literature search or methods details on how this was done. No relevant literature has been cited. What criteria were used to finalise SNC80? 

      During our research program, we considered and tested other drugs (>100 existing compounds in Xenopus screens). Although the published hypothermic and tissue-protective effects suggested that SNC80 should be included in screening, it was not until we observed effects across multiple test parameters, systems, and species that we homed in on SNC80 as a lead compound. We have added information to further clarify the background of SNC80 on pgs. 3-4.

      Line 85 and Lines 342-345 in the Discussion: SNC80 is reported to induce convulsions at high doses in rodents and primates - was this also evident in the Xenopus studies? How does the dose used in the Xenopus studies compare with the high dose (ca. 10 mg/kg) used in primate studies (Danielson et al., PMID: 17112570)?

      We did not observe convulsions in SNC80-treated Xenopus. However, we have updated the manuscript to include previous observations of convulsions in rodents and primates treated with SNC80 (pg. 5). Due to a number of differences, it is challenging to directly compare the dosing in Xenopus studies to those in the primate. In the present study, groups of 10 Xenopus were exposed to a 10 mL pool of 100 µM SNC80, which may be absorbed via oral, gill, and skin routes. Primates were dosed with 10 mg/kg delivered intramuscularly. Because these models may result in different drug biodistributions, any direct comparisons would be speculative. Further work in rodent models may help clarify the relevant dosing differences.
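
      For orientation only, the two regimens above can be expressed in the same mass units. The sketch below converts the 100 µM, 10 mL Xenopus bath into the total mass of SNC80 present in the pool and computes the mass delivered by a 10 mg/kg bolus. The molar mass (about 449.6 g/mol) and the body weight are assumptions that do not appear in the response, and the bath figure is the amount available in the medium for 10 animals, not the amount absorbed, so this arithmetic does not make the two regimens directly comparable.

```python
# Assumed molar mass of SNC80 in g/mol; this value is not given in the response.
SNC80_MOLAR_MASS_G_PER_MOL = 449.6


def bath_amount_umol(concentration_um: float, volume_ml: float) -> float:
    """Total drug present in a bath, in micromoles (µmol/L times litres)."""
    return concentration_um * volume_ml / 1000.0


def umol_to_mg(amount_umol: float,
               molar_mass: float = SNC80_MOLAR_MASS_G_PER_MOL) -> float:
    """Convert micromoles to milligrams (µmol x g/mol = µg, then µg to mg)."""
    return amount_umol * molar_mass / 1000.0


def bolus_mg(dose_mg_per_kg: float, body_weight_kg: float) -> float:
    """Mass of drug in a single weight-based bolus, in milligrams."""
    return dose_mg_per_kg * body_weight_kg


if __name__ == "__main__":
    pool_umol = bath_amount_umol(100.0, 10.0)   # 100 µM in 10 mL
    pool_mg = umol_to_mg(pool_umol)
    print(f"Bath: {pool_umol:.2f} µmol total, {pool_mg:.3f} mg shared by 10 tadpoles")

    hypothetical_weight_kg = 5.0                # hypothetical body weight, illustration only
    print(f"Bolus: {bolus_mg(10.0, hypothetical_weight_kg):.1f} mg at 10 mg/kg")
```

      The bath value is an upper bound on exposure, since uptake through the skin, gills, and mouth is unknown, which is the reason the response treats a direct comparison as speculative.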

      Line 117: what does 'double the concentration' mean? Is this with reference to the dose of SNC80? If so, is this sufficient to completely block opioid receptor activity? 

      Yes, we meant that naltrindole was dosed at double the concentration of SNC80. We have clarified this in the text (pg. 5). Prior work in rodent brain tissue has shown that radiolabeled naltrindole binds to saturation at picomolar to nanomolar concentrations (Yamamura et al. 1992). To confirm our initial observations with naltrindole and SNC80, we also tested an SNC80 analog (WB3) with very low delta opioid activity (Figure 3), which showed similar effects.

      Yamamura MS, Horvath R, Toth G, Otvos F, Malatynska E, Knapp RJ, Porreca F, Hruby VJ, Yamamura HI. Characterization of [3H] naltrindole binding to delta opioid receptors in rat brain. Life sciences. 1992 Jan 1;50(16):PL119-24.

      Figure 3c, d: It appears that WB3 is even more effective at rapidly reducing motion and inducing faster recovery, which is an exciting result. However, in 3d it appears that long-term exposure of 8 h has detrimental effects since the heart rate remains depressed. Please clarify.

      Yes, at 8 hours, we observe slow recovery and, in some cases, maintenance of depressed heart rates. This could be because the drug is more lipophilic and might remain in fat tissue for longer times. Although our current goal is to lengthen the time window for heart transplant surgery to 6 hours, we are working on formulating WB3 to optimize safety for longer applications (8+ hours).

      Figure 4: the experiments with the heart transplants are well done, but do not demonstrate an additional protective effect over the current standard of care except for the reduced metabolism. Could the authors discuss this further in the discussion or provide data with WB3, which may show a stronger effect? Scale bars are missing in panel f.

      In addition to reduced metabolism, we also demonstrate reduced expression of inflammation, hypoxia, and cell death-related markers compared to machine perfusion alone (Figure 4e). The potential protective effect of the biostasis-inducing compounds will be further investigated in a planned orthotopic porcine transplant study in which pigs will be followed for 6 hours after weaning off a bypass machine, allowing enough time to assess the potential benefit of biostasis-inducing drugs. Additionally, we have added scale bars (Figure 4f).

      Order of manuscript: Line 136 already refers to the organ-chip data, which is only presented at the end. Please edit. I feel the manuscript would flow better with the organ-chip data presented before the heart transplant data.

      Organ-chip data: this is an important component of the story but is only shown in supplementary figures. Consider showing this data in the main figures, as eLife has no space restrictions. Furthermore, it is unclear if the effluent collected and analysed is from apical or vascular, or both. In any case, the analysis via microscopy-based methods appears restricted to the epithelium. The manuscript would be significantly strengthened by providing some data on the effect of SNC80 on vascular cells. 

      As requested, we have moved the Organ Chips results to a main figure (new Fig. 5). We have added additional experiments, including the Seahorse Mitostress assay for the four human cell types (Caco-2, Huh7, LSEC and HUVEC). We have added a description and an interpretation of the results in the section Stasis induction in cultured human cells and tissues. The 2D assays showed that metabolism-suppressing effects of SNC80 were most potent in the epithelial cells that were originally derived from highly metabolic tumors (Caco-2 and Huh7) versus endothelial cells (HUVEC and LSEC). Based on these results, we decided to focus the 3D organ chips assays on epithelial cells only, and hence only analyzed effluents from the epithelial (apical) channel.

      Methods section for fabrication of oxygen sensors: Please refer to prior papers from your lab (Grant et al., PMID: 35274118) with regards to details of the fabrication of the devices with inbuilt oxygen sensors. 

      The methods used for the fabrication of oxygen sensors will be included in a separate manuscript currently in preparation.  

      Figure S3 and Line 243-244: Please provide the data for untreated control organ chips in panels d and e, a mean value for which is quoted in the main text. The images in panel f are too small for the reader to appreciate the point; please provide zooms. Scale bars are also missing from these images. Please increase the number of replicates for S3f - the liver-chip data has only two replicates, which gives very low power for statistical testing. In general, the number of organ chips used for the data for each panel is missing.

      As mentioned in the captions, Figure S3 (now Figure S5) panels d and e show average albumin production of Liver Chips at day 7-10 of culture. These measurements were performed before any treatment with SNC80 to characterize the chip’s functional metabolism. In panel g, although we only show biological N=2-3, each datapoint corresponds to an average of multiple fields of view (multiple technical replicates). We have now clarified this in the figure legend.

      Figure S4 - I do not quite understand why the perfusion with the vehicle only also affects oxygen release in the liver chip. Is it possible to use a different vehicle? 

      The liver and gut oxygen levels are not on the same y-axis (gut on the left and liver on the right). The oxygen fold change of the liver control chip is below 0.5, which is in the same range as the gut control chip (0 +/- 0.25). There is a natural variation in oxygen consumption over the lifetime of the chips (now Figure 5c), and untreated cells are metabolically active and consuming oxygen. The small drop observed suggests that liver chips may not have reached a stable oxygen consumption rate at the time of the experiment, whereas the gut chips have stabilized.  

      Figure S5c-f: The units on the Y-axis are missing. 

      Panels S5c-d (now Figure S6c-d) depict the percent cytotoxicity and are thus unitless. Panels S5e-h (now Figure S6e-h) show the effluent levels relative to baseline and are also unitless. We have updated the figure caption to clarify this.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Please find below our detailed point-by-point response to the eLife reviewer comments. As suggested by the reviewers, we have 1) replaced most of the bar charts with box plots, 2) highlighted the subcellular regions that are analyzed in the measurement experiments, and 3) rewritten and toned down several subsections of the discussion.

      Reviewer #1 (Recommendations For The Authors):

      I suggest that the authors consider the following points in future versions of this manuscript:

      1). The link between the striking plant phenotype and GXM misregulation is unclear since GXM overexpression doesn't alter plant phenotypes or lignin content (Yuan et al 2014 Plant Science), so misregulation of GXMs in msil2msil4 mutants clearly is not the whole story. The authors should discuss alternative interpretations of their results and other possible targets of MSIL2/4 that might be contributing to the plant phenotype.

      We completely agree with the reviewer that the misregulation of GXMs in msil2/4 is not the whole story. We are currently developing specific strategies to characterize, in an unbiased manner, the full repertoire of MSIL mRNA targets in the stem, in the hope of identifying other targets relevant to the formation of SCW. We have also toned down our discussion concerning the possible impact of glucuronoxylan methylation level on lignin deposition (L546-552).

      2) Similarly, it remains unclear why one particular secondary cell wall enzyme is regulated post-transcriptionally, while so much of the pathway is regulated at the transcriptional level. Please discuss.

      We do not exclude that other genes encoding SCW enzymes are impacted, and this will be the subject of further investigations. We have extended the discussion concerning these points (L486-498).

      3) Thirdly, it seems that MSIL2 and MSIL4 are expressed in tissues that are not synthesizing secondary cell walls. The authors should discuss other possible targets of MSIL2/4 from their work.

      We have extended the discussion concerning the pleiotropic effects of MSIL mutation in Arabidopsis (L416-425). The variability of the msil2/4 phenotype is so large that we expect these proteins to regulate various cellular functions through the binding of specific sets of mRNAs. The mRNA targets specifically involved in these regulations will need to be determined on a case-by-case basis.

      4) The discussion is extremely speculative and introduces new abbreviations (LTAc, XTRe) that are only used in their model (Figure 7). I suggest replacing these with dashed lines and/or question marks in the model, since as currently depicted, it looks as if these could be known gene products, which could be very misleading.

      We have removed the LTAc and XTRe abbreviations from Figure 7, along with the corresponding text in the discussion section.

      5) Similarly, the speculation that cellulose content somehow regulates glucuronoxylan levels via xylan-cellulose interactions, leading to degradation of excess glucuronoxylan after synthesis is, to my knowledge, completely unsupported by any evidence except the correlation between cellulose and xylan levels. Please either support this claim with references or remove it from the discussion.

      We have removed the claim and have rewritten and toned down the text in accordance with Reviewer 1's comments (L499-512).

      6) Bar charts are rarely the most appropriate method for displaying biological data (Streit & Gehlenborg 2014 Nature Methods). Authors should replace bar charts with one of the following options: A) plot all individual datapoints and overlay summary statistics, B) box plots with all individual datapoints shown, C) violin plots (when n is large, i.e. n > 50). R and RStudio are free software that can generate such plots. Several excellent tools exist online to generate such plots via a free, graphical user interface, such as boxplotr (Spitzer et al 2014 Nature Methods): http://shiny.chemgrid.org/boxplotr/ and PlotsOfData (Postma & Goedhart 2019 PLoS Biology): https://huygens.science.uva.nl/PlotsOfData/

      We have replaced the bar charts in Figure 4E, G and Figure 5E with box plots and acknowledged the software used in the corresponding Materials and Methods section.
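
      As a rough illustration of what a box plot with individual datapoints conveys beyond a bar chart, the sketch below computes the usual box plot summary with the Python standard library. The function name, the quartile convention ("inclusive"), the 1.5 × IQR outlier rule, and the sample values are illustrative assumptions; they are not the software or the data used for the revised figures.

```python
import statistics


def box_plot_summary(values):
    """Return the statistics a box plot draws for one group of measurements.

    Quartiles use the 'inclusive' convention of statistics.quantiles;
    points beyond 1.5 x IQR from the box are reported as outliers.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two datapoints")
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # lowest point within the fences
        "whisker_high": max(inside),   # highest point within the fences
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
        "points": data,                # individual datapoints to overlay
    }


if __name__ == "__main__":
    # Hypothetical measurements for a single genotype.
    summary = box_plot_summary([1, 2, 3, 4, 5, 20])
    for key, value in summary.items():
        print(key, value)
```

      A bar showing only the mean of these hypothetical values would hide the single extreme point, whereas the summary above keeps it visible as an outlier.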

      Reviewer #2 (Recommendations For The Authors):

      Minor points:

      Which cells from Fig. 4b were measured for 4c? Some highlighted annotations to delineate the regions that were measured would help.

      We have highlighted in Figure 4B the subcellular regions analyzed in the measurement experiments.

      In line 254, the phrase "not merely affected" in the mutant should be rephrased for clarity

      We have replaced “not merely affected” by “not significantly” (L274).

      Line 317: "we first performed glycome profiling": the data show a monosaccharide profile, not glycome profiling, which usually involves antibody microarrays.

      We have corrected the text according to the reviewer comment (L339-340).

      Reviewer #3 (Recommendations For The Authors):

      Altogether, the study shows clear biological relevance of the MSIL family of RNA-binding proteins, and provides good arguments that the underlying mechanism is control of mRNAs encoding enzymes involved in secondary cell wall metabolism (although concluding on translational control in the abstract is perhaps saying too much - post-transcriptional control will do given the evidence presented). One observation reported in the study makes it vulnerable to alternative interpretation, however, and I think this should be explicitly treated in the discussion:

      The fact that immune responses are switched on in msil2/4 mutants could also mean that MSIL2/4 have biological functions unrelated to cell wall metabolism in wild type plants, and that cell wall defects arise solely as an indirect effect of immune activation (which is known to involve changes in expression of many cell wall-modifying enzymes and components such as pectin methylesterases, xyloglucan endotransglycosylases, arabinogalactan proteins, etc.). Indeed, the literature is rich in examples of gene functions that have been misinterpreted on the basis of knockout studies because constitutive defense activation mediated by immune receptors was not taken into account (see for example Lolle et al., 2017, Cell Host & Microbe 21, 518-529).

      With the evidence presented here, I am actually close to being convinced that the primary defect of msil2/msil4 mutants is directly related to altered cell wall metabolism, and that defense responses arise as a consequence of that, not the other way round. But I do not think that the reverse scenario can be formally excluded with the evidence at hand, and a discussion listing arguments in favor of the direct effect proposed here would be appropriate. Elements that the authors could consider including would be the isolation of a cellulose synthase mutant as a constitutive expressor of jasmonic acid responses (cev1) as a clear example that a primary defect in cell wall metabolism can produce defense activation as a secondary effect. The interaction of MSIL4 with GXM1/3 mRNAs is also helpful to argue for a direct effect, and it would strengthen the argument if more examples of this kind could be included.

      In accordance with Reviewer 3's comments, we have extended the discussion, listing the arguments that, we believe, do not favor a primary effect of the MSIL2/4 proteins on the activation of plant defense pathways (L468-485).

      SUGGESTIONS FOR IMPROVED ANALYSES & MINOR TEXT AND FIGURE CORRECTIONS.

      (1) Unless there is a very good reason to use homology modelling such as SWISS-MODEL (for example ligand-bound proteins), Alphafold2 is now the tool to use for structure prediction. I would at least verify that Alphafold agrees with SWISS-MODEL on the predicted structures shown in Fig 2a.

      We have analyzed the MSIL4 sequence using the Alphafold2 prediction software, and the output of this analysis completely agrees with the SWISS-MODEL prediction. We have added an additional panel showing the Alphafold2 prediction (see Figure 2-figure supplement 1B).

      (2) The plant pictures shown in Figure 2d are not publication quality in terms of resolution, mounting, size. They really should be redone before final publication.

      We thank the reviewer for this important observation, and have improved the resolution of Figure 2D.

      (3) The colocalization in Figure 3d/e would benefit from some statistical analysis of the data: How many foci were examined? How many showed colocalization? Is that fraction statistically significant? It can be done from the images at hand; I do not think that additional data acquisition is necessary.

      We have used an ImageJ plugin to perform colocalization analysis on the microscopy images corresponding to the bottom panel of Figure 3D (heat stress). This analysis confirmed that most of the foci do colocalize (see Author response image 1). However, our initial image data acquisition does not allow us to perform a statistical analysis. We have added a sentence indicating that colocalization is supported by an analysis using an ImageJ plugin.

      Author response image 1.

      (4) Typographical and other writing errors:

      Line 72 "prior to"

      Line 77 "in the Arabidopsis model"

      Line 97 "RBP-mediated..."

      Line 110 "aspects of development"

      Line 128 "little is known" (no yet)

      Line 253 "Col-0"

      Line 346 "previous"

      All the writing errors have been corrected in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Most studies in sensory neuroscience investigate how individual sensory stimuli are represented in the brain (e.g., the motion or color of a single object). This study starts tackling the more difficult question of how the brain represents multiple stimuli simultaneously and how these representations help to segregate objects from cluttered scenes with overlapping objects.

      Strengths

      The authors first document the ability of humans to segregate two motion patterns based on differences in speed. Then they show that a monkey's performance is largely similar; thus establishing the monkey as a good model to study the underlying neural representations.

      Careful quantification of the neural responses in the middle temporal area during the simultaneous presentation of fast and slow speeds leads to the surprising finding that, at low average speeds, many neurons respond as if the slowest speed is not present, while they show averaged responses at high speeds. This unexpected complexity of the integration of multiple stimuli is key to the model developed in this paper.

      One experiment in which attention is drawn away from the receptive field supports the claim that this is not due to the involuntary capture of attention by fast speeds.

      A classifier using the neuronal response and trained to distinguish single-speed from bi-speed stimuli shows a similar overall performance and dependence on the mean speed as the monkey. This supports the claim that these neurons may indeed underlie the animal's decision process.

      The authors expand the well-established divisive normalization model to capture the responses to bi-speed stimuli. The incremental modeling (eq 9 and 10) clarifies which aspects of the tuning curves are captured by the parameters.

      We thank the Reviewer for the thorough summary of the findings and supportive comments.

      Weaknesses

      While the comparison of the overall pattern of behavioral performance between monkeys and humans is important, some of the detailed comparisons are not well supported by the data. For instance, whether the monkey used the apparent coherence simply wasn't tested and a difference between 4 human subjects and a single monkey subject cannot be tested statistically in a meaningful manner. I recommend removing these observations from the manuscript and leaving it at "The difference between the monkey and human results may be due to species differences or individual variability" (and potentially add that there are differences in the task as well; the monkey received feedback on the correctness of their choice, while the humans did not.)

      Thanks for the suggestion. We agree and have modified the text accordingly. We now state on page 8, lines 189-191, "The difference between the monkey and human results may be due to species differences or individual variability. The differences in behavioral tasks may also play a role – the monkey received feedback on the correctness of the choice, whereas human subjects did not."

      A control experiment aims to show that the "fastest speed takes all" behavior is general by presenting two stimuli that move at fast/slow speeds in orthogonal directions. The claim that these responses also show the "fastest speed takes all" is not well supported by the data. In fact, for directions in which the slow speed leads to the largest response on its own, the population response to the bi-speed stimulus is the average of the response to the components (This is fine. One model can explain all direction tuning curve, which also explain averaging at the slower speed stronger directions). Only for the directions where the fast speed stimulus is the preferred direction is there a bias towards the faster speed (Figure 7A). The quantification of this effect in Figure 7B seems to suggest otherwise, but I suspect that this is driven by the larger amplitude of Rf in Figure 8, and the constraint that ws and wf are constant across directions. The interpretation of this experiment needs to be reconsidered.

      The Reviewer raised a good question. Our model with fixed weights for faster and slower components across stimulus directions provided a parsimonious explanation for the whole tuning curve, regardless of whether the faster component elicited a stronger response than the slower component. Because the model can be well constrained by the measured direction-tuning curves, we did not constrain w<sub>s</sub> and w<sub>f</sub> to sum to one, which makes the model more general. The linear weighted summation (LWS) model fits the neuronal responses to the bi-speed stimuli very well, accounting for an average of 91.8% (std = 7.2%) of the response variance across neurons. As suggested by the Reviewer, we now use the normalization model to fit the data with fixed weights across all motion directions. The normalization model also provides a good fit, accounting for an average of 90.5% (std = 7.1%) of the response variance across neurons.

      Note that in the new Figure 8A, at the left side of the tuning curve (i.e., at negative vector average (VA) directions), where the slower component moves in a more preferred direction of the neurons than the faster component, the bi-speed response (red curve) is slightly lower than the average of the component responses (gray curve), indicating a bias toward the weaker faster component. Therefore, the faster-speed bias does not occur only when the faster component moves in the more preferred direction. This can also be seen in the direction-tuning curves of an example neuron that we added to the figure (new Fig. 8B). The peak responses to the slower and faster components were about the same, but the neuron still showed a faster-speed bias. At negative VA directions, the red curve is lower than the response average (gray curve) and is biased toward the weaker (faster) component.

      The faster-speed bias also occurs when the peak response to the slower component is stronger than that to the faster component. As a demonstration, Author response image 1 shows an example MT neuron that has a slow preferred speed (PS = 1.9 deg/s) and was stimulated by two speeds of 1.2 and 4.8 deg/s. The peak response to the faster component (blue) was weaker than that to the slower component (green). However, this neuron showed a strong bias toward the faster component. A normalization model fit with fixed weights for the faster and slower components (black curve) described the neuronal response to both speeds (red) well. This neuron was not included in the neuron population shown in Figure 8 because it was not tested with stimulus speeds of 2.5 and 10 deg/s.

      Author response image 1.

      An example MT neuron was tested with stimulus speeds of 1.2 and 4.8 deg/s. The preferred speed of this neuron was 1.9 deg/s. Fixed weights of 0.59 for the faster component and 0.12 for the slower component described the responses to the bi-speed stimuli well using a normalization model. The neuron showed a faster-speed bias although its peak response to the slower component was higher than that to the faster component.

      We modified the text to clarify these points:

      Page 19, lines 405 – 410, “The bi-speed response was biased toward the faster component regardless of whether the response to the faster component was stronger (in positive VA directions) or weaker (in negative VA directions) than that to the slower component (Fig. 8A). The result from an example neuron further demonstrated that, even when the peak firing rates of the faster and slower component responses were similar, the response elicited by the bi-speed stimuli was still biased toward the faster component (Fig. 8B).”

      Page 19, lines 421 – 427, “Because the model can be well constrained by the measured direction-tuning curves, it is not necessary to require w<sub>s</sub> and w<sub>f</sub> to sum to one, which is more general. An implicit assumption of the model is that, at a given pair of stimulus speeds, the response weights for the slower and faster components are fixed across motion directions. The model fitted MT responses very well, accounting for an average of 91.8% of the response variance (std = 7.2%, N = 21) (see Methods). The success of the model supports the assumption that the response weights are fixed across motion directions.”
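
      The fixed-weight fit described in this response can be illustrated with a minimal sketch. It assumes the LWS model has the form R ≈ w_s·R_s + w_f·R_f with no constant term and no sum-to-one constraint, and it uses synthetic, noise-free tuning curves; the function name `fit_lws`, the tuning-curve shapes, and the variance-explained formula are hypothetical stand-ins, not the procedure from the Methods.

```python
import numpy as np

def fit_lws(r_s, r_f, r_bi):
    """Least-squares weights for r_bi ~ w_s * r_s + w_f * r_f.

    One pair of weights is shared across all motion directions, and the
    weights are NOT forced to sum to one (assumed model form).
    """
    design = np.column_stack([r_s, r_f])
    weights, *_ = np.linalg.lstsq(design, r_bi, rcond=None)
    predicted = design @ weights
    ss_res = np.sum((r_bi - predicted) ** 2)
    ss_tot = np.sum((r_bi - np.mean(r_bi)) ** 2)
    # Fraction of response variance accounted for by the fit.
    return float(weights[0]), float(weights[1]), float(1.0 - ss_res / ss_tot)

# Synthetic direction-tuning curves (hypothetical values, 30-degree steps).
directions = np.deg2rad(np.arange(0, 360, 30))
r_slow = 30.0 * np.exp(2.0 * (np.cos(directions - np.pi / 4) - 1.0))
r_fast = 40.0 * np.exp(2.0 * (np.cos(directions + np.pi / 4) - 1.0))

# Bi-speed response built with known weights that do not sum to one.
r_bi = 0.2 * r_slow + 0.7 * r_fast

w_s, w_f, var_explained = fit_lws(r_slow, r_fast, r_bi)
print(f"w_s={w_s:.2f}, w_f={w_f:.2f}, variance explained={var_explained:.3f}")
```

      Because the two component tuning curves peak at different directions, the full curve constrains both weights independently, which is why the sum-to-one constraint can be dropped in this setting.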

      Reviewer #2 (Public Review):

      Summary:

      This is a paper about the segmentation of visual stimuli based on speed cues. The experimental stimuli are random dot fields in which each dot moves at one of two velocities. By varying the difference between the two speeds, as well as the mean of the two speeds, the authors estimate the capacity of observers (human and non-human primates) to segment overlapping motion stimuli. Consistent with previous work, perceptual segmentation ability depends on the mean of the two speeds. Recordings from area MT in monkeys show that the neuronal population to compound stimuli often shows a bias towards the faster-speed stimuli. This bias can be accounted for with a computational model that modulates single-neuron firing rates by the speed preferences of the population. The authors also test the capacity of a linear classifier to produce the psychophysical results from the MT data.

      Strengths:

      Overall, this is a thorough treatment of the question of visual segmentation with speed cues. Previous work has mostly focused on other kinds of cues (direction, disparity, color), so the neurophysiological results are novel. The connection between MT activity and perceptual segmentation is potentially interesting, particularly as it relates to existing hypotheses about population coding.

      We thank the Reviewer for the summary and comments.

      Weaknesses:

      Page 10: The relationship between (R-Rs) and (Rf-Rs) is described as "remarkably linear". I don't actually find this surprising, as the same term (Rs) appears on both the x- and y-axes. The R^2 values are a bit misleading for this reason.

      The Reviewer is correct that subtracting a common term Rs from R and Rf would introduce correlation between (R-Rs) and (Rf-Rs). To address this concern, we conducted an additional analysis. We showed that, at most speed pairs, the R^2 values between (R-Rs) and (Rf-Rs) based on the data are significantly higher than the R^2 values between (R’-Rs) and (Rf-Rs), in which R’ was a random combination of Rs and Rf. Since the same Rs was commonly subtracted in calculating R^2 (data) and R^2 (simulation), the difference between R^2 (data) and R^2 (simulation) suggests that the response pattern of R contributes to the additional correlation.

      We now acknowledge this confounding factor and describe the new analysis results on page 14, lines 309 – 326. Please also see the response to Reviewer 3 about a similar concern.
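
      The confound itself can be illustrated with a small simulation. This is a simplified sketch, not the control analysis described in the manuscript: it assumes three mutually independent, equal-variance "responses" (hypothetical Gaussian data) and shows that subtracting the shared Rs term alone induces a correlation between (R-Rs) and (Rf-Rs).

```python
import numpy as np

def common_term_correlation(n_samples=20000, seed=0):
    """Correlation between (R - Rs) and (Rf - Rs) for independent R, Rf, Rs."""
    rng = np.random.default_rng(seed)
    # Hypothetical data: the three responses share no structure at all.
    r = rng.normal(size=n_samples)
    r_f = rng.normal(size=n_samples)
    r_s = rng.normal(size=n_samples)
    x = r_f - r_s
    y = r - r_s
    return float(np.corrcoef(x, y)[0, 1])

corr = common_term_correlation()
print(f"correlation induced by the shared Rs term: {corr:.2f}")
```

      With equal variances, the expected correlation is var(Rs) / (2·var) = 0.5, so an R^2 computed on such differences is inflated even when R carries no information about Rf. Comparing R^2 from the data against R^2 from a surrogate R’ that shares the same subtraction is one way to ask whether the data exceed this baseline.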

      Figure 9: I'm confused about the linear classifier section of the paper. The idea makes sense - the goal is to relate the neuronal recordings to the psychophysical data. However the results generally provide a poor quantitative match to the psychophysical data. There is mention of a "different paper" (page 26) involving a separate decoding study, as well as a preprint by Huang et al. (2023) that has better decoding results. But the Huang et al. preprint appears to be identical to the current manuscript, in that neither has a Figure 12, 13, or 14. The text also says (page 26) that the current paper is not really a decoding study, but the linear classifier (Figure 9F) is a decoder, as noted on page 10. It sounds like something got mixed up in the production of two or more papers from the same dataset.

      We apologize for the confusion regarding the reference of Huang et al. (2023, bioRxiv). We referred to an earlier version of this bioRxiv manuscript (version 1), which included decoding analysis. In the bibliography, we provided two URLs for this pre-print. While the second link was correct, the first URL automatically links to the latest version (version 2), which did not have the abovementioned decoding analysis.

      The analysis in Figure 9 applies a classifier to discriminate two-speed from single-speed stimuli, which is a decoding analysis, as the Reviewer pointed out. We revised the Results section about the classifier to make clear what the classifier can and cannot explain (pages 22-23, lines 516-534). We also included a sentence at the end of this section that leads to an additional decoding analysis to extract motion speed(s) from MT population responses (page 23, lines 541-543), “To directly evaluate whether the population neural responses elicited by the bi-speed stimulus carry information about two speeds, it is important to conduct a decoding analysis to extract speed(s) from MT population responses.”

      In any case, I think that some kind of decoding analysis would really strengthen the current paper by linking the physiology to the psychophysics, but given the limitations of the linear classifier, a more sophisticated approach might be necessary -- see for example Zemel, Dayan, and Pouget, 1998. The authors might also want to check out closely related work by Treue et al. (Nature Neuroscience 2000) and Watamaniuk and Duchon (1992).

      We thank the Reviewer for the suggestion and agree that it is useful to incorporate additional decoding analysis that can better link physiology results to psychophysics. The decoding analysis we conducted was motivated by the framework proposed by Zemel, Dayan, and Pouget (1998), and also similar to the idea briefly mentioned in the Discussion of Treue et al. (2000). We have added the decoding analysis to this paper on pages 25-32.  

      What do we learn from the normalization model? Its formulation is mostly a restatement of the results - that the faster and slower speeds differentially affect the combined response. This hypothesis is stated quantitatively in equation 8, which seems to provide a perfectly adequate account of the data. The normalization model in equation 10 is effectively the same hypothesis, with the mean population response interposed - it's not clear how much the actual tuning curve in Figure 10A even matters, since the main effect of the model is to flatten it out by averaging the functions in Figure 10B. Although the fit to the data is reasonable, the model uses 4 parameters to fit 5 data points and is likely underconstrained; the parameters other than alpha should at least be reported, as it would seem that sigma is actually the most important one. And I think it would help to examine how robust the statistical results are to different assumptions about the normalization pool.

      In the linear weighted summation (LWS) model (Eq. 8), the weights Ws and Wf are free parameters. We think the value of the normalization model (Eq. 9) is that it provides an explanation of what determines the response weights. We agree with the Reviewer that using the normalization model (Eq. 9) with 4 parameters to fit 5 data points of the tuning curves to bi-speed stimuli of individual neurons is under-constrained. We, therefore, removed the section using the normalization model to fit overlapping stimuli moving in the same direction at different speeds.

      A better way to constrain the normalization model is to use the full direction-tuning curves of MT neurons in response to two stimulus components moving in different directions at different speeds, as shown in Figure 8. We now use the normalization model (Eq. 9) to fit this data set (also suggested by Reviewer 1), in addition to the LWS model. We now report the median values of the model parameters of the normalization model, including the exponent n, sigma, alpha, and the constant c. We also compared the normalization model fit with the linear summation (LWS) model. We discuss the limitations of our data set and what needs to be done in future studies. The revisions are on page 20, lines 434-467 in the Results, and pages 34-35, lines 818-829 in Discussion.

      Reviewer #3 (Public Review):

      Summary:

      This study concerns how macaque visual cortical area MT represents stimuli composed of more than one speed of motion.

      Strengths:

      The study is valuable because little is known about how the visual pathway segments and preserves information about multiple stimuli. The study presents compelling evidence that (on average) MT neurons represent the average of the two speeds, with a bias that accentuates the faster of the two speeds. An additional strength of the study is the inclusion of perceptual reports from both humans and one monkey participant performing a task in which they judged whether the stimuli involved one vs two different speeds. Ultimately, this study raises intriguing questions about how exactly the response patterns in visual cortical area MT might preserve information about each speed, since such information could potentially be lost in an average response as described here, depending on assumptions about how MT activity is evaluated by other visual areas.

      Weaknesses:

      My main concern is that the authors are missing an opportunity to make clear that the divisive normalization, while commonly used to describe neural response patterns in visual areas (and which fits the data here), fails on the theoretical front as an explanation for how information about multiple stimuli can be preserved. Thus, there is a bit of a disconnect between the goal of the paper - how does MT represent multiple stimuli? - and the results: mostly averaging responses which, while consistent with divisive normalization, would seem to correspond to the perception of a single intermediate speed. This is in contrast to the psychophysical results which show that subjects can at least distinguish one from two speeds. The paper would be strengthened by grappling with this conundrum in a head-on manner.

      We thank the Reviewer for the constructive comments. We agree with the Reviewer that it is important to connect the encoding of multiple speeds with the perception. The Reviewer also raised an important question regarding whether multiple speeds can be extracted from population neural responses, given the encoding rules characterized in this study.

      It is a hard problem to extract multiple stimulus values from the population neural response. Inspired by the theoretical framework proposed by Zemel et al. (1998), we conducted a detailed decoding study to extract motion speed(s) from MT population responses. We used the decoded speed(s) to perform a discrimination task similar to our psychophysics task and compared the decoder's performance with perception. We found that, at X4 speed difference, we could decode two speeds based on MT response, and the decoder's performance was similar to that of perception. However, at X2 speed difference, except at the slowest speeds of 1.25 and 2.5 deg/s, the decoder cannot extract two speeds and cannot differentiate between a bi-speed stimulus and a single log-mean speed stimulus. We have added the decoding analysis to this paper on pages 25-32. We also discuss the implications and limitations of these results (pages 35-36, lines 852-884).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Classifier:

      One question I have is how the classifier's performance scales with the number of neurons used in the analysis. Here that number is set to the number that was recorded, but it is a free parameter in this analysis. Why does the arbitrary choice of 100 neurons match the animals' performance?

      We apologize for the lack of clarity on this point. The decoding using the classifier was based on the neural responses of 100 recorded MT neurons in our data set. The number of 100 neurons was not a free parameter. We need to reconstruct the population neural response based on the responses of the recorded neurons and their preferred speeds (red and black dots in Figure 9A-E).

      We spline-fitted the reconstructed population neural responses (red and black curves in Figure 9A-E). One way to change the number of neurons used for the decoding is to resample N points along the spline-fitted population responses, using N as a free parameter. However, we think it is better to conduct decoding based on the responses from the recorded neurons rather than on interpolated responses. We now clarify on page 22, lines 520-522, that the classification (decoding) was based on the responses of the 100 recorded neurons in our dataset.

      Normalization Model:

      Although the model is phenomenological, a schematic circuit diagram could help the reader understand how this could work (I think this is worthwhile even though the data cannot distinguish among different implementations of divisive normalization).

      Thanks for this suggestion. We agree that a circuit diagram would help the readers understand how the model works. However, as the Reviewer pointed out, our data cannot distinguish between different implementations of the model. For example, divisive normalization can occur on the inputs to MT neurons or on MT neurons themselves. The circuit mechanism of weighting the component responses is not clear either. A schematic circuit diagram then mainly serves to recapitulate the normalization model in Equation 9. We, therefore, choose not to add a schematic circuit diagram at this time. We are interested in developing a circuit model to account for how visual neurons represent multiple stimuli in future studies.

      Another suggestion is that the time courses could be used to constrain the model; the fact that it takes a while after the onset of the slow-speed response for averaging to reveal itself suggests the presence of inertia/hysteresis in the circuit.

      We agree that the time course of MT responses could be used to constrain the model. This is also why we think it is important to document the time course in this paper. We now state in the Results, page 17, lines 354-357:

      “At slow speeds, the very early faster-speed bias suggests a likely role of feedforward inputs to MT on the faster-speed bias. The slightly delayed reduction (normalization) in the bi-speed response relative to the stronger component response also helps constrain the circuit model for divisive normalization.”

      Two-Direction Experiment:

      Applying the normalization model to this dataset could help determine its generality.

      This is a good point. We now apply the normalization model (Eq. 9) to fit this data set with the full direction tuning curves in response to two stimuli moving in different directions at different speeds. Please also see the response to Reviewer 2 about the normalization model fit.

      The results of the normalization model fit are now described on page 20 and Figure 8A, B, D.

      Reviewer #2 (Recommendations For The Authors):

      In terms of impact, I would say that the presentation is geared largely toward people who go to VSS. To broaden the appeal, the authors might consider a more general formulation of the four hypotheses stated at the bottom of page 3. These are prominent ideas in systems neuroscience - population encoding, Bayesian inference, etc.

      We thank the Reviewer for the suggestion. We have revised the Introduction accordingly on pages 3-4, lines 43-69. Please also see the response to Reviewer 3 about the Introduction.

      Figure 5: It might be helpful to show the predictions for different hypotheses. If the response to the transparent stimulus is equal to that of the faster stimulus, you will have a line with slope 1. If it is equal to the response to the slow stimulus, all points will lie on the x-axis. In between you get lines with slopes less than 1.

      In Figures 5F1 and 5F2, we show dotted lines indicating faster-all (i.e., faster-component-take-all), response averaging, and slower-all (i.e., slower-component-take-all) on the X-axis. We show those labels in between Figs. 5F1 and F2.

      Figure 6: The analysis is not motivated by any particular question, and the results are presented without any quantitation. This section could be better motivated or else removed.

      We now better motivate the section about the response time course on page 16, lines 336 – 339: “The temporal dynamics of the response bias toward the faster component may provide a useful constraint on the neural model that accounts for this phenomenon. We therefore examined the timecourse of MT response to the bi-speed stimuli. We asked whether the faster-speed bias occurred early in the neuronal response or developed gradually.”

      On page 17, lines 354-357, we also state that “At slow speeds, the very early faster-speed bias suggests a likely role of feedforward inputs to MT on the faster-speed bias. The slightly delayed reduction (normalization) in the bi-speed response relative to the stronger component response also helps constrain the circuit model for divisive normalization.”

      Equation (9): There appears to be an "S" missing in the denominator.

      We double-checked and did not see a missing "S" in Equation 9, on page 20.  

      Reviewer #3 (Recommendations For The Authors):

      This is an impressive study, with the chief strengths being the computational/theoretical motivation and analyses and the inclusion of psychophysics together with primate neurophysiology. The manuscript is well-written and the figures are clear and convincing (with a couple of suggestions detailed below).

      We thank the Reviewer for the comments.

      Specific suggestions:

      (1) Intro para 3

      "It is conceivable that the responses of MT neurons elicited by two motion speeds may follow one of the following rules: (1) averaging the responses elicited by the individual speed components; (2) bias toward the speed component that elicits a stronger response, i.e. "soft-max operation" (Riesenhuber and Poggio, 1999); (3) bias toward the slower speed component, which may better represent the more probable slower speeds in nature scenes (Weiss et al., 2002); (4) bias toward the faster speed component, which may benefit the segmentation of a faster-moving stimulus from a slower background."

      This would be a good place to point out which of these options is likely to preserve vs. lose information and how.

      It seems to me that only #2 is clearly information-preserving, assuming that there are neurons with a variety of different speed preferences such that different neurons will exhibit different "winners". #1 would predict subjects would perceive only an intermediate speed, whereas #3 would predict perceiving only/primarily the slower speed and #4 would predict only/primarily perceiving the faster speed.

      The difference between "only" and "primarily" would depend on whether the biases are complete or only partial. I acknowledge that the behavioral task in the study is not a "report all perceived speeds" task, but rather a 1 vs 2 speeds task, so the behavioral assay is not a direct assessment of the question I'm raising here, but I think it should still be possible to write about the perceptual implications of these different possibilities for encoding in an informative way.

      Thanks for the suggestions. We have revised this paragraph in the Introduction on pages 3 – 4, lines 43 – 69.

      (2) Analysis clarifications

      The section "Relationship between the responses to bi-speed stimuli and constituent stimulus components" could use some clarification/rearrangement/polish. I had to read it several times. Possibly, rearrangement, simplification/explanation of nomenclature, and building up from a simpler to a more complex case would help. If I understand correctly, the outcome of the analysis is to obtain a weight value for every combination of slow and fast speeds used. The R's in equation 5 are measured responses, observed on the single stimulus and combined stimulus trials. It was not clear to me if the R's reflect average responses or individual trial responses; this should be clarified. Ws = 1- wf so in essence only 1 weight is computed for each combination. Then, in the subsequent sections of the manuscript, the authors explore whether the weight computed for each stimulus combination is the same or does it vary across conditions. If I have this right, then walking through these steps will aid the reader.

      The Reviewer is correct. We now walk through these steps and better state the rationale for this approach. The R's in Equation 5 are trial-averaged responses, not trial-by-trial responses.

      We have clarified these points on page 13.

      To take a particular example, the sentence "Using this approach to estimate the response weights for individual neurons can be inaccurate because, at each speed pair, the weights are determined only by three data points" struck me as a rather backdoor way to get at the question. Is the estimate noisy? Or does the weighting vary systematically across speeds? I think the authors are arguing the latter; if so, it would be valuable to say so.

      We wanted to estimate the weighting for each speed pair and determine whether the weights change with the stimulus speeds. Indeed, we found that the weights change systematically across speed pairs. The issue was not that the estimate was noisy (see below, in the response to the second paragraph of point 3).

      We have clarified this point in the text, on page 13, lines 273 – 280: “Our goal was to estimate the weights for each speed pair and determine whether the weights change with the stimulus speeds. In our main data set, the two speed components moved in the same direction. To determine the weights of w<sub>s</sub> and w<sub>f</sub> for each neuron at each speed pair, we have three data points R, R<sub>s</sub>, and R<sub>f</sub>, which are trial-averaged responses. Since it is not possible to solve for both variables, w<sub>s</sub> and w<sub>f</sub>, from a single equation (Eq. 5) with three data values, we introduced an additional constraint: w<sub>s</sub> + w<sub>f</sub> = 1. While this constraint may not yield the exact weights that would be obtained with a fully determined system, it nevertheless allows us to characterize how the relative weights vary with stimulus speed.”

      (3) Figure 5

      Related to the previous point, Figures 5A-E are subject to a possible confound. When plotting x vs y values, it is critical that the x and y not depend trivially on the same value. Here, the plots are R-Rs and Rf-Rs. Rs, therefore, is contained in both the x and y values. Assume, for the sake of argument, that R and Rf are constants, whereas Rs is drawn from a distribution of random noise. When Rs, by chance, has an extreme negative value, R-Rs and Rf-Rs will be large positive values. The solution to this artificial confound is to split the trials that generate Rs into two halves and subtract one half from R and the other half from Rf. Then, the same noisy draw will not be contributing to both x and y. The above is what is needed if the authors feel strongly about including this analysis.

      The Reviewer is correct that subtracting a common term (Rs) would introduce a correlation between (R-Rs) and (Rf-Rs) (Reviewer 2 also raised this point). R's in Equations 5, 6, 7 (and Figure 5A-E) are trial-averaged responses. So, we cannot address the issue by dividing R’s into two halves. Our results showed that the regression slope (W<sub>f</sub>) changed from near 1 to about 0.5 as the stimulus speeds increased, and the correlation coefficient between (R – Rs) and (R<sub>f</sub> – Rs) was high at slow stimulus speeds. To determine whether these results can be explained by the confounding factor of subtracting a common term Rs, rather than by the pattern of R in representing two speeds, we did an additional analysis. We acknowledged the issue and described the new analysis on page 13, lines 303 – 326:

      “Our results showed that the bi-speed response showed a strong bias toward the faster component when the speeds were slow and changed progressively from a scheme of ‘faster-component-take-all’ to ‘response-averaging’ as the speeds of the two stimulus components increased (Fig. 5F1). We found similar results when the speed separation between the stimulus components was small (×2), although the bias toward the faster component at low stimulus speeds was not as strong as with the ×4 speed separation (Fig. 5A2-F2 and Table 1).

      In the regression between (𝑅 – 𝑅<sub>s</sub>) and (𝑅<sub>f</sub> – 𝑅<sub>s</sub>), 𝑅<sub>s</sub> was a common term and therefore could artificially introduce correlations. We wanted to determine whether our estimates of the regression slope (𝑤<sub>f</sub>) and the coefficient of determination (𝑅<sup>2</sup>) can be explained by this confounding factor. At each speed pair and for each neuron from the data sample of the 100 neurons shown in Figure 5, we simulated the response to the bi-speed stimuli (𝑅<sub>e</sub>) as a randomly weighted sum of 𝑅<sub>f</sub> and 𝑅<sub>s</sub> of the same neuron:

      𝑅<sub>e</sub> = 𝑎𝑅<sub>f</sub> + (1 − 𝑎)𝑅<sub>s</sub>,

      in which 𝑎 was a randomly generated weight (between 0 and 1) for 𝑅<sub>f</sub>, and the weights for 𝑅<sub>f</sub> and 𝑅<sub>s</sub> summed to one. We then calculated the regression slope and the correlation coefficient between the simulated 𝑅<sub>e</sub> - 𝑅<sub>s</sub> and 𝑅<sub>f</sub> - 𝑅<sub>s</sub> across the 100 neurons. We repeated the process 1000 times and obtained the mean and 95% confidence interval (CI) of the regression slope and the 𝑅<sup>2</sup>. The mean slope based on the simulated responses was 0.5 across all speed pairs. The estimated slope (𝑤<sub>f</sub>) based on the data was significantly greater than the simulated slope at slow speeds of 1.25/5, 2.5/10 (Fig. 5F1), and 1.25/2.5, 2.5/5, and 5/10 degrees/s (Fig. 5F2) (bootstrap test, see p values in Table 1). The estimated 𝑅<sup>2</sup> based on the data was also significantly higher than the simulated 𝑅<sup>2</sup> for most of the speed pairs (Table 1). These results suggest that the faster-speed bias at the slow stimulus speeds and the consistent response weights across the neuron population at each speed pair are not analysis artifacts.”
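
      As an illustration only (this is not the analysis code used for the manuscript), the logic of this control can be sketched in Python with synthetic responses. The response values, the random seeds, and the function name `simulate_common_term_slope` are hypothetical; only the procedure (random weights between 0 and 1, regression across 100 neurons, 1000 repeats) follows the description above.

```python
import numpy as np

def simulate_common_term_slope(r_f, r_s, n_repeats=1000, seed=0):
    """Regress (Re - Rs) on (Rf - Rs) when Re is a randomly weighted sum.

    r_f, r_s: trial-averaged responses of each neuron to the faster and
    slower component (1-D arrays of equal length).
    Returns the regression slope and R^2 for every repeat.
    """
    rng = np.random.default_rng(seed)
    x = r_f - r_s
    slopes = np.empty(n_repeats)
    r_squared = np.empty(n_repeats)
    for k in range(n_repeats):
        a = rng.uniform(0.0, 1.0, size=r_f.shape)   # random weight for Rf
        r_e = a * r_f + (1.0 - a) * r_s             # simulated bi-speed response
        y = r_e - r_s
        slope, _intercept = np.polyfit(x, y, 1)
        slopes[k] = slope
        r_squared[k] = np.corrcoef(x, y)[0, 1] ** 2
    return slopes, r_squared

# Hypothetical responses (spikes/s) for 100 neurons; not real data.
data_rng = np.random.default_rng(1)
r_f = data_rng.uniform(5.0, 60.0, size=100)
r_s = data_rng.uniform(5.0, 60.0, size=100)

slopes, r_squared = simulate_common_term_slope(r_f, r_s)
mean_slope = slopes.mean()
ci_low, ci_high = np.percentile(slopes, [2.5, 97.5])
print(f"simulated slope: {mean_slope:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

      Because the simulated R<sub>e</sub> - R<sub>s</sub> equals a(R<sub>f</sub> - R<sub>s</sub>), the expected slope is the mean of the random weight, 0.5, regardless of the tuning values. A slope estimated from data that lies outside the simulated interval therefore cannot be attributed to the common term alone.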

      However, I don't see why the analysis is needed at all. Can't Figure 5F be computed on its own? Rather than computing weights from the slopes in 5A-E, just compute the weights from each combination of stimulus conditions for each neuron, subject to the constraint ws=1-wf. I think this would be simpler to follow, not subject to the noise confound described in the previous point, and likely would make writing about the analysis easier.

      We initially tried the suggested approach to determine the weights of the individual neurons. The weights from each speed combination for each neuron are calculated by: 𝑤<sub>s</sub> = (𝑅<sub>f</sub> - 𝑅) / (𝑅<sub>f</sub> - 𝑅<sub>s</sub>), 𝑤<sub>f</sub> = (𝑅 - 𝑅<sub>s</sub>) / (𝑅<sub>f</sub> - 𝑅<sub>s</sub>), and 𝑤<sub>s</sub> and 𝑤<sub>f</sub> sum to 1. 𝑅, 𝑅<sub>f</sub> and 𝑅<sub>s</sub> are the responses to the same motion direction. Using this approach to estimate response weights for individual neurons can be unreliable, particularly when 𝑅<sub>f</sub> and 𝑅<sub>s</sub> are similar. This situation often arises when the two speeds fall on opposite sides of the neuron's preferred speed, resulting in a small denominator (𝑅<sub>f</sub> - 𝑅<sub>s</sub>) and, consequently, an artificially inflated weight estimate. We therefore used an alternative approach. We estimated the response weights for the neuronal population at each speed pair using linear regression of (𝑅 - 𝑅<sub>s</sub>) against (𝑅<sub>f</sub> - 𝑅<sub>s</sub>). The slope is the weight for the faster component for the population. This approach overcame the difficulty of determining the response weights for single neurons.
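
      The small-denominator problem can be illustrated with a short Python sketch. All numbers below are hypothetical and chosen only to show the effect; the functions `single_neuron_weight` and `population_weight` are illustrative names, not code from the manuscript.

```python
import numpy as np

def single_neuron_weight(r, r_f, r_s):
    """w_f solved from R = w_s*Rs + w_f*Rf under the constraint w_s + w_f = 1."""
    return (r - r_s) / (r_f - r_s)

def population_weight(r, r_f, r_s):
    """Slope of the regression of (R - Rs) against (Rf - Rs) across neurons."""
    slope, _intercept = np.polyfit(r_f - r_s, r - r_s, 1)
    return slope

# One hypothetical neuron with true w_f = 0.7 and a 1 spike/s error in R.
w_true, error = 0.7, 1.0
r_s = 20.0
r_f_far, r_f_near = 40.0, 20.5   # Rf well separated from Rs vs. nearly equal

r_far = w_true * r_f_far + (1 - w_true) * r_s + error
r_near = w_true * r_f_near + (1 - w_true) * r_s + error
w_far = single_neuron_weight(r_far, r_f_far, r_s)     # about 0.75
w_near = single_neuron_weight(r_near, r_f_near, r_s)  # about 2.7, inflated

# Population estimate from 100 hypothetical neurons with the same true weight.
rng = np.random.default_rng(0)
pop_r_f = rng.uniform(5.0, 60.0, size=100)
pop_r_s = rng.uniform(5.0, 60.0, size=100)
pop_r = w_true * pop_r_f + (1 - w_true) * pop_r_s + rng.normal(0.0, 1.0, size=100)
w_pop = population_weight(pop_r, pop_r_f, pop_r_s)
print(w_far, w_near, w_pop)
```

      The same 1 spike/s error shifts the single-neuron estimate only slightly when R<sub>f</sub> and R<sub>s</sub> differ by 20 spikes/s, but inflates it well beyond 1 when they differ by 0.5 spikes/s. The regression pools all neurons, so neurons with a small (R<sub>f</sub> - R<sub>s</sub>) contribute little leverage to the slope.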

      Nevertheless, if the data provide better constraints, it is possible to estimate the response weights for each speed pair for individual neurons. For example, we can calculate the weights for single neurons by using stimuli that move in different directions at two speeds. By characterizing the full direction tuning curves for R, R<sub>f</sub>, and R<sub>s</sub>, we have sufficient data to constrain the response weights for single neurons, as we did for the speed pair of 2.5 and 10°/s in Figure 8. In future studies, we can use this approach to measure the response weights for single neurons at different speed pairs and average the weights across the neuron population.

      We explain these considerations in the Results (pages 13–14, lines 265-326) and Discussion (pages 34-35, lines 818-829).

      (4) Figure 7

      Bidirectional analysis. It would be helpful to have a bit more explanation for why this analysis is not subject to the ws=1-wf constraint. In Figure 7B, a line could be added to show what ws + wf =1 would look like (i.e. a line with slope -1 going from (0,1) to (1,0)); it looks like these weights are a little outside that line but there is still a negative trend suggesting competition.

      For the data set in which the visual stimuli moved in the same direction at different speeds, we included a constraint that W<sub>s</sub> and W<sub>f</sub> sum to 1. This is because one cannot solve for two independent variables (Ws and Wf) using one equation, R = W<sub>s</sub> · R<sub>s</sub> + W<sub>f</sub> · R<sub>f</sub>, with three data values (R, Rs, Rf).

      In the dataset using bi-directional stimuli (now Fig. 8), we can use the full direction tuning curves to constrain the linear weighted summation (LWS) model and the normalization model. We therefore did not need to impose the additional constraint that Ws and Wf sum to one, which makes the fit more general. We now clarify this in the text, on page 19, lines 421-423.
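
      To illustrate why full tuning curves remove the need for the constraint, the following Python sketch fits two unconstrained weights by least squares. It assumes a simplified form of the LWS model with only the two weights and no additional terms; the tuning curves, the weights 0.3 and 0.9, and the function `fit_lws` are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def fit_lws(r_bi, r_s, r_f):
    """Least-squares fit of R(theta) = w_s*Rs(theta) + w_f*Rf(theta).

    Each argument is a tuning curve sampled at the same set of directions.
    With more directions than weights, the system is overdetermined and no
    sum-to-one constraint is required.
    """
    design = np.column_stack([r_s, r_f])
    weights, *_ = np.linalg.lstsq(design, r_bi, rcond=None)
    return weights[0], weights[1]

def tuning_curve(theta, preferred, baseline=5.0, gain=30.0, width=2.0):
    """Hypothetical von Mises-shaped direction tuning curve (spikes/s)."""
    return baseline + gain * np.exp(width * (np.cos(theta - preferred) - 1.0))

theta = np.deg2rad(np.arange(0, 360, 30))           # 12 sampled directions
r_s = tuning_curve(theta, np.deg2rad(150.0))        # slower-component curve
r_f = tuning_curve(theta, np.deg2rad(210.0))        # faster-component curve
r_bi = 0.3 * r_s + 0.9 * r_f                        # weights need not sum to 1

w_s, w_f = fit_lws(r_bi, r_s, r_f)
print(w_s, w_f, w_s + w_f)
```

      With twelve directions and two unknowns, both weights are recovered independently, and their sum can be compared with the W<sub>s</sub> + W<sub>f</sub> = 1 line after the fit rather than being imposed beforehand.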

      As suggested, we added a line showing Ws + Wf = 1 for the LWS model fit (Fig. 8C) and the normalization model fit (Fig. 8D) (also see page 21, lines 482-484). Although 𝑤<sub>s</sub> and 𝑤<sub>f</sub> are not constrained to sum to one in the model fits, the fitted weights are roughly aligned with the dashed lines of Ws + Wf = 1.

      (5) Attention task

      General wording suggestions - a caution against using "attention" as a causal/mechanistic explanation as opposed to a hypothesized cognitive state. For example, "We asked whether the faster-speed bias was due to bottom-up attention being drawn toward the faster stimulus component". This could be worded more conservatively as whether the bias is "still present if attention is directed elsewhere" - i.e. a description of the experimental manipulation.

      We intended to test whether the faster-speed bias can be explained by attention being automatically drawn to the faster component, thereby enhancing the contribution of the faster component to the bi-speed response. We now state it as a possible explanation to be tested. We changed the subtitle of this section to be more conservative: “Faster-speed bias still present when attention was directed away from the RFs”, on page 18, line 363.

      We also modified the text on page 18, lines 364-367: “One possible explanation for the faster-speed bias may be that bottom-up attention is drawn toward the faster stimulus component, enhancing the response to the faster component. To address this question, we asked whether the faster-speed bias was still present if attention was directed away from the RFs.”

      Relatedly, in the Discussion, the section on "Neural mechanisms", the sentence "The faster-speed bias was not due to an attentional modulation" should be rephrased as something like 'the bias survived or was still present despite an attentional modulation requiring the monkey to attend elsewhere'.

      Our motivation for doing the attention-away experiment was to determine whether a bottom-up attentional modulation can explain the faster-speed bias. We now describe the results as suggested by the Reviewer, but we would also like to interpret the implications of the results. In the Discussion, page 34, lines 789-790, we now state: “We found that the faster-speed bias was still present when attention was directed away from the RFs, suggesting that the faster-speed bias cannot be explained by an attentional modulation.”

      (6) "A model that accounts for the neuronal responses to bi-speed stimuli". This section opens with: "We showed that the neuronal response in MT to a bi-speed stimulus can be described by a weighted sum of the neuron's responses to the individual speed components". "Weighted average" would be more appropriate here, given that ws = 1-wf.

      As mentioned above, the added constraint of Ws+Wf = 1 was only a practical solution for determining the weights for the data set using visual stimuli moving in the same direction. More generally, Ws and Wf do not need to sum to one. As such, we prefer the wording of weighted sum.

      (7) "As we have shown previously using visual stimuli moving transparently in different directions, a classifier's performance of discriminating a bi-directional stimulus from a single-direction stimulus is worse when the encoding rule is response-averaging than biased toward one of the stimulus components" - this is important! Can this be worked into the Introduction?

      Yes, we now also mention this point in the Introduction regarding response averaging on page 4, lines 54-57: “While decoding two stimuli from a unimodal response is theoretically possible (Zemel et al., 1998; Treue et al., 2000), response averaging may result in poorer segmentation compared to encoding schemes that emphasize individual components, as demonstrated in neural coding of overlapping motion directions (Xiao and Huang, 2015).” Also, please see the response to point 1 above.

      (8) Minor, but worth catching now - is the use of initials for human participants consistent with best practices approved at your institution?

      Thanks for checking. The letters are not the initials of the human subjects. They are coded characters. We have clarified it in the legend of Figure 1, on page 7, line 168.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the effects of two sensory stimuli (visual and somatosensory) on fMRI responsiveness during absence seizures were investigated in GAERS rats with concurrent EEG recordings. SPM analysis of fMRI showed a significant reduction in whole-brain responsiveness during the ictal period compared to the interictal period under both stimuli, and this phenomenon was replicated in a structurally constrained whole-brain computational model of rat brains.

      The conclusion of this paper is that whole-brain responsiveness to both sensory stimuli is inhibited and spatially impeded during seizures.

      I also suggest the manuscript should be written in a way that is more accessible to readers who are less familiar with animal experiments. In addition, the implementation and interpretation of brain simulations need to be more careful and clear.

      Several sections of the manuscript were clarified and simplified to be more accessible. Also, implementation and interpretations of brain simulations were modified to be more precise.

      Strengths:

      1) ZTE imaging sequence was selected over traditional EPI sequence as the optimal way to perform fMRI experiments during absence seizures.

      2) A detailed classification of stimulation periods is achieved based on the relative position in time of the stimulation period with respect to the brain state.

      3) A whole-brain model embedded with a realistic rat connectome is simulated on the TVB platform to replicate fMRI observations.

      We thank the reviewer for indicating the strengths of our manuscript.

      Weaknesses:

      1) The analysis in this paper does not directly answer the scientific question posed by the authors, which is to explore the mechanisms of the reduced brain responsiveness to external stimuli during absence seizures (in terms of altered information processing), but merely characterizes the spatial involvement of such reduced responsiveness. The same holds for the use of mean-field modeling, which merely reproduces experimental results without explaining them mechanistically as what the authors have claimed at the head of the paper.

      We agree with the reviewer that the manuscript does not specifically address the mechanisms of reduced brain responsiveness. The main scientific question addressed in the manuscript was to compare whole-brain responsiveness to stimuli between ictal and interictal states. The sentence in the manuscript abstract that could lead to misinterpretation, “The mechanism underlying the reduced responsiveness to external stimulus remains unknown.”, was therefore modified to the following: “The whole-brain spatial and temporal characteristics of reduced responsiveness to external stimulus remains unknown”.

      2) The implementations of brain simulations need to be more specific.

      Contribution:

      The contribution of this paper is performing fMRI experiments under a rare condition that could provide fresh knowledge in the imaging field regarding the brain's responsiveness to environmental stimuli during absence seizures.

      Reviewer #2 (Public Review):

      Summary:

      This study examined the possible effect of spike-wave discharges (SWDs) on the response to visual or somatosensory stimulation using fMRI and EEG. This is a significant topic because SWDs often are called seizures and because there is non-responsiveness at this time, it would be logical that responses to sensory stimulation are reduced. On the other hand, in rodents with SWDs, sensory stimulation (a noise, for example) often terminates the SWD/seizure.

      In humans, these periods of SWDs are due to thalamocortical oscillations. A certain percentage of the normal population can have SWDs in response to photic stimulation at specific frequencies. Other individuals develop SWDs without stimulation. They disrupt consciousness. Individuals have an absent look, or "absence", which is called absence epilepsy.

      The authors use a rat model to study the responses to stimulation of the visual or somatosensory systems during and in between SWDs. They report that the response to stimulation is reduced during the SWDs. While some data show this nicely, the authors also report on lines 396-8 "When comparing statistical responses between both states, significant changes (p<0.05, cluster-) were noticed in somatosensory auditory frontal..., with these regions being less activated in interictal state (see also Figure 4)." That statement is at odds with their conclusion.

      We thank the reviewer for noting this discrepancy. The statement should have been written vice versa and it has been corrected as: “When comparing statistical responses between both states, significant changes (p<0.05, cluster-level corrected) were noticed in the somatosensory, auditory and frontal cortices: these regions were less activated in ictal than in interictal state (see also Figure 4).”

      They also conclude that stimulation slows the pathways activated by the stimulus. I do not see any data proving this. It would require repeated assessments of the pathways in time.

      We agree with the reviewer that there are no data showing slowing of the pathways in response to stimulus. However, we are unsure which part of the conclusion section this comment refers to. We did not intend to claim in the manuscript that stimulation slows the activated pathways.

      The authors also study the hemodynamic response function (HRF) and it is not clear what conclusions can be made from the data.

      Hemodynamic response functions were studied for two reasons:

      • To account for possible change in HRF during the detection of activated regions. Indeed, a physiological change in HRF can mask the detection of an activation when the software uses a standard HRF to convolve the design matrix (David et al. 2008).

      • To characterize the shape and polarity of fMRI activations in brain regions that we noticed to be differently activated between ictal and interictal states, and to evaluate whether the alteration in activation was associated with an alteration in hemodynamics.

      The observed HRF decrease (rather than increase) in the cortex when stimulation was applied during SWDs was discussed in section 4.4., where we speculated that neuronal suppression caused by SWDs can prevent responsiveness. In this case, the decreased HRF could be either a consequence or a cause of the observed neuronal suppression. The assumption that the HRF reduction is causal would be supported by a possible vascular steal effect from other activated regions. However, we did not state this in the conclusion section, and therefore the following sentence was added to the conclusions: “Moreover, the detected decreases in the cortical HRF when sensory stimulation was applied during spike-and-wave discharges, could play a role in decreased sensory perception. Further studies are required to evaluate whether this HRF change is a cause or a consequence of the reduced neuronal response”.

      Finally, the authors use a model to analyze the data. This model is novel and while that is a strength, its validation is unclear. The conclusion is that the modeling supports the conclusions of the study, which is useful.

      Details about the model were added.

      Strengths:

      Use of fMRI and EEG to study SWDs in rats.

      Weaknesses:

      Several aspects of the Methods and Results are unclear.

      Reviewer #3 (Public Review):

      Summary:

      This is an interesting paper investigating fMRI changes during sensory (visual, tactile) stimulation and absence seizures in the GAERS model. The results are potentially important for the field and do suggest that sensory stimulation may not activate brain regions normally during absence seizures. However the findings are limited by substantial methodological issues that do not enable fMRI signals related to absence seizures to be fully disentangled from fMRI signals related to the sensory stimuli.

      Strengths:

      Investigating fMRI brain responses to sensory stimuli during absence seizures in an animal model is a novel approach with the potential to yield important insights.

      The use of an awake, habituated model is a valid and potentially powerful approach.

      Weaknesses:

      The major difficulty with interpreting the results of this study is that the duration of the visual and auditory stimuli was 6 seconds, which is very close to the mean seizure duration per Table 1. Therefore the HRF model looking at fMRI responses to visual or auditory stimuli occurring during seizures was simultaneously weighting both seizure activity and the sensory (visual or auditory) stimuli over the same time intervals on average. The resulting maps and time courses claiming to show fMRI changes from visual or auditory stimulation during seizures will therefore in reality contain some mix of both sensory stimulation-related signals and seizure-related signals. The main claim that the sensory stimuli do not elicit the same activations during seizures as they do in the interictal period may still be true. However the attempts to localize these differences in space or time will be contaminated by the seizure-related signals.

      The claims that differences were observed for example between visual cortex and superior colliculus signals with visual stim during seizures vs. interictal are unconvincing due to the above.

      We understand this concern expressed by the reviewer and agree that seizure-related signals must be considered in the analysis when studying stimulation responses. Therefore, in modelling the responses in the SPM framework, we considered both stimulation and seizure-only states as regressors of interest and used seizure-only responses as nuisance regressors to account for error variance. Thereby, the effects caused by the stimulation should, in theory, be separated as much as possible from the effects caused by the seizure itself. Additionally, the cases where stimulations occurred fully inside a seizure (included in Figure 3, “...stimulation during ictal state”) actually had a longer average seizure duration of 45 ± 60 s, which is much longer than the 6 s average duration taken over all seizures.

      However, we acknowledge that there is a potential that some leftover effects from a seizure are still present, and we have noted this caution in the “Physiologic and methodologic considerations” section: “We note a caution that presented maps and time courses showing fMRI changes from visual or whisker stimulation during seizures may contain mixture of both sensory stimulation-related signals and seizure-related signals. To minimize this contamination, we considered in SPM both stimulation and seizure-only states as regressors of interest and used seizure-only responses as nuisance regressors to account for error variance. Thereby, the effects caused by the seizure itself should be separated as much as possible from the effects caused by stimulation.”
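
      The idea of separating stimulation-related from seizure-related signal with a linear model can be sketched in Python as follows. This is a toy ordinary least-squares example, not SPM and not the pipeline used in the manuscript; the repetition time, event timings, gamma-shaped HRF, effect sizes, and all function names are hypothetical assumptions.

```python
import numpy as np

def boxcar(n_scans, events):
    """Return a 0/1 regressor; events is a list of (onset_scan, length_in_scans)."""
    x = np.zeros(n_scans)
    for onset, length in events:
        x[onset:onset + length] = 1.0
    return x

def gamma_hrf(tr, length_s=20.0, shape=4.0):
    """Hypothetical gamma-shaped HRF sampled every TR, normalised to unit sum."""
    t = np.arange(0.0, length_s, tr)
    h = t ** (shape - 1.0) * np.exp(-t)
    return h / h.sum()

def fit_glm(y, regressors):
    """Ordinary least squares with the given regressors plus a constant term."""
    design = np.column_stack(list(regressors) + [np.ones(len(y))])
    betas, *_ = np.linalg.lstsq(design, y, rcond=None)
    return betas

tr, n_scans = 2.0, 300                     # assumed TR of 2 s, 300 volumes
hrf = gamma_hrf(tr)
stim_events = [(onset, 3) for onset in range(10, n_scans, 30)]   # 6 s blocks
seizure_events = [(5, 8), (38, 8), (100, 8), (150, 8), (215, 8), (275, 8)]

stim_reg = np.convolve(boxcar(n_scans, stim_events), hrf)[:n_scans]
seizure_reg = np.convolve(boxcar(n_scans, seizure_events), hrf)[:n_scans]

# Synthetic voxel: positive stimulation effect, negative seizure effect.
rng = np.random.default_rng(0)
signal = 2.0 * stim_reg - 1.0 * seizure_reg + 100.0 + rng.normal(0.0, 0.1, n_scans)

beta_stim, beta_seizure, beta_const = fit_glm(signal, [stim_reg, seizure_reg])
print(beta_stim, beta_seizure, beta_const)
```

      In this sketch the two effects are recovered separately because some stimulation blocks fall inside seizures and others do not, so the regressors are not collinear. When stimulation and seizure periods coincide closely, the estimates become less separable, which is the residual contamination noted in the caution above.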

      The maps shown in Figure 3 do not show clear changes in the areas claimed to be involved.

      We clarified the overall appearance of Figure 3, by enlarging the selected cross sections for better anatomical differentiation and added anterior and posterior directions on all images.

      Reviewer #1 (Recommendations For The Authors):

      1) The implementations of brain simulations need to be more specific: How is the stimulation applied in the mean-field model in terms of its mathematical expression? The state variable of the model is the rate of neuronal firing, but how is it subsequently converted into fMRI responses? How are the statistical plots calculated? How much does this result depend on the model parameter?

      Further details and explanations about the model have now been added to the manuscript. The stimulation of a specific region is simulated as an increase in the excitatory input to the corresponding node. In particular, we use a square function to represent the stimulus (see, for example, panel A in Figure 6–figure supplement 1). As the referee mentions, the model describes the dynamics of the neuronal firing rates. This provides direct information about neuronal activity and responsiveness, so all the statistical analyses of the simulations shown in the paper were performed using the firing rates. For these analyses, no conversion to fMRI was needed.

      To build the statistical maps, an ANOVA (analysis of variance) test was used. The ANOVA test is originally designed to assess the significance of the change in the mean between two samples, and is calculated via an F-test as the ratio of the variance between samples to the variance within samples. In our case, it allowed us to assess the impact of the stimulation on the ongoing neuronal activity by comparing the time series of the firing rate with and without stimulation (this was performed independently for each state). For the results presented in this paper, the ANOVA was performed using the “f_oneway” function of the scipy.stats module in Python.

      Regarding the dependence on the model parameters, the main results obtained in our paper are related to the responsiveness of the system under two quantitatively different types of ongoing dynamics: an asynchronous irregular activity (interictal period) and an oscillatory SWD type of dynamics (ictal period). In particular, we show that for the SWD dynamics the activity evoked by the stimulus is overshadowed by the ongoing activity, which imposes a strong limitation on the response of the system and the propagation of the stimulus. In this sense, the main results of the simulations are very general, and no significant dependence on specific cellular or network parameters was observed within a physiologically relevant range, nor should one be expected. Nevertheless, we point out that, as mentioned in the text, the key parameter that triggers the transition between the two types of dynamics is the strength of the adaptation current (in particular the strength of the spike-triggered adaptation parameter ‘b’ described in the Supplementary information), which in addition has the capacity to control the frequency of the oscillations. In the paper, this parameter was set such that the SWD frequency falls within the range observed in the GAERS (7-12 Hz). We believe that further analysis around the region of transition between states, in particular from a dynamical point of view, could be of relevance for future work.
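
      The ANOVA comparison described above can be sketched as follows. This is a minimal illustration, not the simulation code of the manuscript: the firing-rate time series are replaced by hypothetical Gaussian noise around a fixed rate, and the stimulus timing, amplitude, and the helper `square_stimulus` are assumptions. Only the use of `scipy.stats.f_oneway` follows the text.

```python
import numpy as np
from scipy.stats import f_oneway

def square_stimulus(n_steps, onset, duration, amplitude):
    """Square (boxcar) input: `amplitude` during the stimulus, zero elsewhere."""
    stimulus = np.zeros(n_steps)
    stimulus[onset:onset + duration] = amplitude
    return stimulus

rng = np.random.default_rng(0)
n_steps = 2000

# Hypothetical firing rate of one node (Hz) without stimulation.
rate_no_stim = 5.0 + rng.normal(0.0, 1.0, n_steps)

# Same node with a square stimulus added to its ongoing activity. In the
# actual model the stimulus enters as excitatory input; adding it directly
# to the rate is a simplification made for this sketch.
stimulus = square_stimulus(n_steps, onset=500, duration=500, amplitude=2.0)
rate_stim = 5.0 + rng.normal(0.0, 1.0, n_steps) + stimulus

# One-way ANOVA between the two time series (F-test on the change in mean).
f_stat, p_value = f_oneway(rate_stim, rate_no_stim)
print(f"F = {f_stat:.1f}, p = {p_value:.2e}")
```

      Repeating this comparison for every node, separately for each state, would yield one F value per region from which a statistical map can be assembled.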

      2) In the abstract, what exactly does "typical information flow in functional pathways" mean and which part of the results does this refer to?

      We note that this sentence was overly complicated. By “typical information flow”, we were referring to sensory responsiveness during interictal state. Therefore, we made the following modifications to the abstract: “These results suggest that sensory processing observed during an interictal state can be hindered or even suppressed by the occurrence of an absence seizure, potentially contributing to decreased responsiveness.”

      3) Figure 4 - Figure Supplement 1 performed an analysis of comparing states between 'when stimulation ended a seizure' and 'stimulation during an ictal period'. The authors should explain more clearly in the manuscript what is the reason and significance of considering the state of 'when stimulation ended a seizure'. And how is a seizure considered to be terminated by stimulation rather than ending spontaneously?

      We have now added an explanation to manuscript section 2.5.3 of why this state was also of interest: “The case when stimulation ended a seizure is particularly interesting for studying the spatial and temporal aspects explaining shift from ictal, i.e. non-responsiveness state, to non-ictal, i.e. responsiveness state.” We agree that there is a possibility that seizures ended spontaneously at the same time as the stimulus was applied, but argue that seizures most probably ended due to stimulation, based on results published previously (https://doi.org/10.1016/j.brs.2012.05.009).

      4) In Section 3.1, some detailed descriptions of methods should be moved to Section 2, e.g. how the spatial and temporal SNR is obtained and the description of bad quality data. Also, I suggest the significance of selecting the optimal MRI sequence be stated earlier in the paper, as Section 3.1 cannot be expected from reading the abstract and introduction.

      We moved some technical explanations of SNRs from section 3.1. to section 2.4.1. Significance of the selection of the MRI sequence is also now stated earlier in the introduction section: “For this purpose, the functionality of ZTE sequence was first piloted, and selected over traditional EPI sequence for its lower acoustic noise and reduced magnetic susceptibility artefacts. The selected MRI sequence thus appeared optimal for awake EEG-fMRI measurements.”

      Some minor issues:

      1) How is ROI defined in this paper? What type of atlas is used?

      Anatomical ROIs were drawn based on the Paxinos and Watson rat brain atlas, 7th edition. A region was selected if statistically significant activations were detected inside that region, based on the activation maps. We clarified the definition of ROI as follows: “Anatomical ROIs, based on Paxinos atlas (Paxinos and Watson rat brain atlas 7th edition), were drawn on the brain areas where statistical differences were seen in activation maps.”

      2) Section 4.3.2, "In addition, some responses were seen in the somatosensory cortex during the seizure state, which may be due to the fact that the linear model used did not completely remove the effect of the seizure itself" What is the reason for the authors to make such comments?

      This claim was made because we saw a similar trend of responses (deactivation) in the somatosensory cortex in the F-contrast maps when comparing the “stimulation during ictal state” maps to the "seizure map", leading us to assume that the effect of the seizure was still apparent in the maps (even though “seizure only” states were used as nuisance regressors). However, as this claim is highly speculative, we have decided to delete this sentence from the manuscript.

      3) Abbreviations such as SPM, HRF, CBF, etc. are not defined in the manuscript.

      Definitions for these abbreviations were added.

      4) Supplementary information-AdEx mean-field model, 've and vi', e and i should be subscripted.

      Subscripts were added.

      Reviewer #2 (Recommendations For The Authors):

      Below are more detailed questions and concerns. Many questions are about the Methods, which seem to be written by a specialist. However, there are also questions about the experimental approach and conclusions.

      One of the strengths of the study is the use of fMRI and EEG. However, to allow rats to be still in the magnet, isoflurane was used, and then as soon as rats recovered they were imaged. However isoflurane has effects on the brain long after the rats have appeared to wake up. Moreover, to train rats to be still, repetitive isoflurane sessions had to be used. Repetitive isoflurane should have a control of some kind, or be discussed as a limitation.

      The repetitive use of isoflurane is indeed an important limiting factor that was not yet discussed in the manuscript. We have added the following sentences to the “Physiologic and methodologic considerations” section:

      “As the used awake habituation and imaging protocol didn’t allow us to avoid the usage of isoflurane during the preparation steps, we cannot rule out the possible effect of using repetitive anesthesia on brain function. However, duration (~15 min) and concentration of anesthesia (~1.5%) during these steps were still moderate, whereas extended durations (1-3 h) of either single or repetitive isoflurane exposures have been used in previous studies where long-term effects on brain function have been observed (Long II et al., 2016; Stenroos et al., 2021). Moreover, there was a 5-15 min waiting period between the cessation of anesthesia and initiation of fMRI scan, to avoid the potential short-term effects of isoflurane that has been found to be most prominent during the 5 min after isoflurane cessation (Dvořáková et al., 2022).”

      An assumption of the study is that interictal periods are normal. However, they may not be. A control is necessary. One also wants to know how often GAERS have spontaneous spike-wave discharges (SWDs), what the authors call seizures. The reason is that the more common the SWDs, the less likely interictal periods are normal. It seems from the Methods that rats were selected if they had frequent seizures so many could be captured in a recording session. Those without frequent seizures were discarded.

      A good control would be a normal rat that has spontaneous SWDs, since almost all rat strains have them, especially with age and in males (PMID: 7700522). However, whether they are frequent enough might be a problem. Alternatively, animals could be studied with rare seizures to assess the normal baseline, and compared to interictal states in GAERS.

      We appreciate this concern raised by the Reviewer. Even though it would be interesting to study different strains and the dependence on SWD frequency, the aim of this study was to compare interictal vs ictal states in this specific animal model. We also understand that interictal periods do not necessarily represent a “normal” state, and we therefore went through the manuscript again to remove any claims to that effect.

      About the mechanisms of SWDs, the authors should update their language which seems imprecise and lacks current citations (starting on line 71):

      "Although the origin of absence seizures is not fully understood, current studies on rat models of absence seizures suggest that they arise from atypical excitatory-inhibitory patterns in the barrel field of the somatosensory cortex (Meeren et al. 2002; Polack et al. 2007) and lead to synchronous cortico-thalamic activity (Holmes, Brown, and Tucker 2004)."

      Some of the best explanations for SWDs that I know of are from the papers of John Huguenard. His reviews are excellent. They discuss the mechanisms of thalamocortical oscillations.

      We have reformatted the sentences discussing the mechanism of SWDs and included the explanations provided by manuscripts from Huguenard and McCafferty et al.: “Although the origin of absence seizures is not fully understood, current studies on rat models of absence seizures suggest that they arise from excitatory drive in the barrel field of the somatosensory cortex (Meeren et al. 2002; Polack et al. 2007, 2009, David et al., 2008) and then propagate to other structures (David et al., 2008) including thalamus, knowing to play an essential role during the ictal state (Huguenard, 2019). Notably, the thalamic subnetwork is believed to play a role in coordinating and spacing SWDs via feedforward inhibition together with burst firing patterns. These lead to the rhythms of neuronal silence and activation periods that are detected in SWD waves and spikes (McCafferty et al., 2018; Huguenard, 2019).”

      The following also is not precise:

      "Although seizures are initially triggered by hyperactive somatosensory cortical neurons, the majority of neuronal populations are deactivated rather than activated during the seizure, resulting in an overall decrease in neuronal activity during SWD (McCafferty et al. 2023)." What neuronal populations? Cortex? Which neurons in the cortex? Those projecting to the thalamus? What about thalamocortical relay cells? Thalamic gabaergic neurons?

      Lines 85-8: "In addition, a previous fMRI study on GAERS, which measured changes in cerebral blood volume, found both deactivated and activated brain areas during seizures (David et al. 2008)." Which areas and conditions led to reduced activity? Increased activity? How was it surmised?

      "concurrent stimuli and therefore could contribute to the alterations in behavioral responsiveness" - This idea has been raised before by others (Logothetis, Barth). Please discuss these as the background for this study.

      The particular section was modified to the following:

      “Previous results on GAERS have indicated that, during an absence seizure, hyperactive electrophysiological activity in the somatosensory cortex can contribute to bilateral and regular SWD firing patterns in most parts of the cortex. These patterns propagate to different cortical areas (retrosplenial, visual, motor and secondary sensory), basal ganglia, cerebellum, substantia nigra and thalamus (David et al. 2008; Polack et al. 2007). Although SWDs are initially triggered by hyperactive somatosensory cortical neurons, neuronal firing rates, especially in majority of frontoparietal cortical and thalamocortical relay neurons, are decreased rather than increased during SWD, resulting in an overall decrease in activity in these neuronal populations (McCafferty et al. 2023). Previous fMRI studies have demonstrated blood volume or BOLD signal decreases in several cortical regions including parietal and occipital cortex, but also, quite surprisingly, increases in subcortical regions such as thalamus, medulla and pons (David et al., 2008; McCafferty et al., 2023). In line with these findings, graph-based analyses have shown an increased segregation of cortical networks from the rest of the brain (Wachsmuth et al. 2021). Altogether, alterations in these focal networks in the animal models of epilepsy impairs cognitive capabilities needed to process specific concurrent stimuli during SWD and therefore could contribute to the lack of behavioral responsiveness (Chipaux et al. 2013; Luo et al. 2011; Meeren et al. 2002; Studer et al. 2019), although partial voluntary control in certain stimulation schemes can be still present (Taylor et al., 2017).”

      Please discuss the mean-field model more. What are its assumptions? What is its validation? Do other models also provide the same result?

      We have now extended the discussion and explanation of the mean-field model, both in the main text and in the Supplementary information. The mean-field model is a statistical tool to estimate the mean activity of large neuronal populations, and as such its main assumptions are centered around the size of the population analyzed and the characteristic times of the neuronal dynamics under study. It has been shown that the formalism is valid for characteristic times of neuronal dynamics with a lower bound on the order of a few milliseconds and for population sizes on the order of thousands of neurons (see El Boustani and Destexhe, Neural computation 2009; and Di Volo et al, Neural computation 2019), with both conditions satisfied in the simulations made for this work. Regarding the validation, the model has been extensively validated and used for simulating different brain states (Di Volo et al. 2019; Goldman et al. 2023), signal propagation in cortical circuits (Zerlaut et al, 2018) and to perform whole-brain simulations (Goldman et al, 2023). The standard validation of the mean-field model consists of comparing it with the activity obtained from the corresponding spiking neural network. For completeness we show in Author response image 1 an example of the SWD type of dynamics obtained from a spiking neural network together with the one obtained from the mean-field. This figure has now been added to the Supplementary information of the paper. Regarding the extension of the results to other models, we think that the generality of our results is an interesting point of our work. The main results obtained from our simulations relate to the responsiveness of the system during two different types of ongoing activity: in the interictal state the stimulation evokes a significant variation in the ongoing activity that is propagated to other regions, while in the SWD state the evoked activity is overshadowed by the ongoing activity, which imposes a strong limit on the responsiveness of the system and the propagation of the signal. In this sense, the results of the simulations are very general and should be extensible to other models. Of course, the advantage of using a model like ours is the capability of reproducing the different states, its applicability to large-scale simulations, and the fact that it is built from biologically relevant single-cell models (AdEx).

      Author response image 1.

      Comparison of the SWD dynamics in the mean-field model and the underlying spiking neural network of AdEx neurons. A) Raster plot (top) and mean firing rate (bottom) from an SWD type of dynamics obtained from the spiking-network simulations. The network is made of 8000 excitatory neurons and 2000 inhibitory neurons. Neurons in the network are randomly connected with probability p=0.05 for inhibitory-inhibitory and excitatory-inhibitory connections, and p=0.06 for excitatory-excitatory connections. Cellular parameters correspond to the ones used in the mean-field, with spike-triggered adaptation for excitatory neurons set to b=200 pA. We show the results for excitatory (green) and inhibitory (red) neurons. B) Mean firing rate obtained from a single mean-field model. We see that, although the amplitude of oscillations is larger in the spiking network, the mean-field can correctly capture the general dynamics and frequency of the oscillations.

      Line 11: "rats were equally divided by gender." Given n=11, does that mean 5 males and 6 females or the opposite?

      Out of 11 animals, 6 were males, and 5 females. This is now mentioned in the manuscript.

      What was the type of food?

      Type of food was added to the manuscript (Extrudat, vitamin-fortified, irradiated > 25 kGy)

      What were the electrodes?

      This was provided in the manuscript. Carbon fiber filament was produced by World Precision Instruments. The tips of this filament were spread to brush-like shape to increase the contact surface above the skull.

      "low noise zero echo time (ZTE) MRI sequence"- please explain for the non-specialist or provide references.

      Reference added.

      Lines 148-150: "The length of habituation period was selected based on pilot experiments and was sufficient for rats to be in low-stress state and produce absence seizures inside the magnet." How do the authors know the rats were in a low-stress state?

      This claim was based on two factors. First, at the end of the habituation protocol, the motion of the animals was considerably decreased, according to a previous study using a similar restraint/habituation protocol (DOI: 10.3389/fnins.2018.00548). In that study, the decreased motion was also correlated with decreased blood corticosterone levels, which returned to baseline levels (indicating a low-stress state) after 4 days of habituation. Second, when epileptic rodents are continuously recorded for 24 h, most SWDs occur during a state of passive wakefulness or drowsiness (Lannes et al. 1988, Coenen et al. 1991). Either way, as we do not have a way to provide direct evidence of a low-stress state, we modified the sentence to the following:

      “The length of habituation period was selected based on pilot experiments to provide low-motion data therefore giving rats a better chance to be in a low-stress state and thus produce absence seizures inside the magnet.”

      Lines 150-2: "Respiration rate and motion were monitored during habituation sessions using a pressure pillow and video camera to estimate stress level." What were the criteria for a high stress level?

      Criteria for high (or low) stress levels were based mostly on motion levels, according to a previous study (DOI: 10.1016/s0149-7634(05)80005-3). Still, as we did not measure stress directly, we modified the sentence to the following:

      “Pressure pillow and video camera were used to estimate physiological state, via breathing rate, and motion level, respectively.”

      Lines 152-3: "During the last habituation session, EEG was measured to confirm that the rats produced a sufficient amount of absence seizures (10 or more per session)." If 10 min, the rats would basically be seizing the entire session, leading to doubt about what the interictal state was.

      The last habituation session lasted 60 min and the fMRI scan 45 min. Given that rats produced ~40-50 seizures during an fMRI scan, they produced on average ~1 seizure/min, with each seizure lasting 5-6 s on average, giving ~45 s periods for interictal states. A threshold of 10 or more seizures was used to give statistically meaningful findings, based on pilot experiments.
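
      The arithmetic in this reply can be checked with a short sketch. It assumes, purely for illustration, that seizures are evenly spaced across the scan, which the response does not claim; the function name is hypothetical. Under that idealization the mean gap comes out in the same range as the approximate ~45 s figure quoted above.

```python
# Back-of-envelope estimate of the mean interictal gap.
# Assumption (not from the source): seizures are evenly spaced over the scan.

def mean_interictal_gap_s(scan_min: float, n_seizures: int, seizure_s: float) -> float:
    """Mean time between the end of one seizure and the start of the next."""
    cycle_s = scan_min * 60.0 / n_seizures  # seizure onset to next onset
    return cycle_s - seizure_s

# Values quoted in the response: 45 min scan, ~40-50 seizures, 5-6 s per seizure.
gap_many = mean_interictal_gap_s(45, 50, 6.0)  # most seizures, longest duration
gap_few = mean_interictal_gap_s(45, 40, 5.0)   # fewest seizures, shortest duration
print(f"mean interictal gap: {gap_many:.1f} s to {gap_few:.1f} s")
```

      Real seizures cluster irregularly, so individual interictal periods can be much shorter or longer than this mean.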

      Line 153: "Total of 2-5 fMRI experiments were conducted per rat within a 1-3-week period." What was the schedule for each animal? A table would be useful. If it varied, how do the authors know this was justified?

      Please see Figure 1–figure supplement 2 for examples of habituation timelines for individual rats.

      We found an error in the statement of 2-5 fMRI experiments; it should be 3-5 fMRI experiments. This was corrected. Our aim was to acquire 12-14 sessions per stimulation condition, and once a sufficient number of sessions had been acquired, some of the animals were not used further. Two of the animals that were found to have good quality EEG and produced sufficient numbers of SWDs were kept and briefly retrained for the later experiments with the second stimulation condition. This was done to replace animals that had to be excluded from the second stimulation condition due to bad quality EEG or a lost implant. Extended use of some animals could theoretically bring slight variation to the results, but could actually be an advantage, as these animals were already well trained and provided low-motion data.

      "Before and after each habituation session, rats were given a treat of sugar water and/or chocolate cereals as positive reinforcement. " How much and what was the concentration of sugar water; chocolate cereal?

      Rats were given 3 chocolate cereals and/or 1% sugar water. This was added to the manuscript now.

      Line 188: "We relied on pilot calibration of the heated water to maintain the body temperature" Please explain.

      Sentence was clarified:

      “We relied on pilot calibration of the temperature of heated water circulating inside animal bed to maintain the normal body temperature of ~37 °C”

      Line 190: "After manual tuning and matching of the transmit-receive coil, shimming and anatomical imaging" Please explain for the non-specialist.

      Sentence was simplified:

      “After routine preparation steps in the MRI console were done”

      Lines 199-201: "Anatomical imaging was conducted with a T1-FLASH sequence (TR: 530 ms, TE: 4 ms, flip angle 18°, bandwidth 39,682 kHz, matrix size 128 x 128, 51 slices, field-of-view 32 x 32 mm², resolution 0.25 x 0.25 x 0.5 mm³). fMRI was performed with a 3D ZTE sequence (TR: 0.971 ms, TE: 0 ms, flip angle 4°, pulse length 1 µs, bandwidth 150 kHz, oversampling 4, matrix size 60 x 60 x 60, field-of-view 30 x 30 x 60 mm³, resolution of 0.5 x 0.5 x 1 mm³, polar under sampling factor 5.64 nr. of projections 2060 resulting to a volume acquisition time of about 2 s). A total of 1350 volumes (45 min) were acquired." Please explain for the non-specialist.

      These technical parameters are provided for the sake of repeatability. Section was however clarified as the following and citation was added:

      Anatomical imaging was conducted with a T1-FLASH sequence (repetition time: 530 ms, echo time: 4 ms, flip angle 18°, bandwidth 39,682 kHz, matrix size 128 x 128, 51 slices, field-of-view 32 x 32 mm², spatial resolution 0.25 x 0.25 x 0.5 mm³). fMRI was performed with a 3D ZTE sequence (repetition time: 0.971 ms, echo time: 0 ms, flip angle 4°, pulse length 1 µs, bandwidth 150 kHz, oversampling 4, matrix size 60 x 60 x 60, field-of-view 30 x 30 x 60 mm³, spatial resolution of 0.5 x 0.5 x 1 mm³, polar undersampling factor 5.64, number of projections 2060, resulting in a volume acquisition time of about 2 s (see Wiesinger & Ho, 2022 for parameter explanations)). A total of 1350 volumes (45 min) were acquired.
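
      For readers less familiar with these parameters, the quoted ZTE timings can be cross-checked with a short sketch. It assumes one projection is acquired per repetition time with no dead time between projections or volumes, which is an illustrative simplification; the function names are hypothetical.

```python
# Timing check for the ZTE fMRI protocol, using only values quoted in the text.
# Assumption (not from the source): one projection per TR, no dead time.

TR_MS = 0.971         # repetition time per projection, in ms
N_PROJECTIONS = 2060  # projections per volume
N_VOLUMES = 1350      # volumes per scan

def volume_time_s(tr_ms: float, n_projections: int) -> float:
    """Acquisition time of a single volume in seconds."""
    return tr_ms * n_projections / 1000.0

def scan_time_min(volume_s: float, n_volumes: int) -> float:
    """Total scan duration in minutes."""
    return volume_s * n_volumes / 60.0

vol_s = volume_time_s(TR_MS, N_PROJECTIONS)
total_min = scan_time_min(vol_s, N_VOLUMES)
print(f"volume time: {vol_s:.3f} s, scan time: {total_min:.1f} min")
```

      Under these assumptions the numbers agree with the stated volume acquisition time of about 2 s and the 45 min scan length.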

      "Visual (n=14 sessions, 5 rats) and somatosensory whisker (n=14 sessions, 4 rats)" - Please explain how multiple sessions were averaged for a single rat. Please justify the use of different numbers of sessions per rat.

      All the sessions belonging to the same stimulus scheme (multiple sessions per rat) were entered together as sessions in the SPM analysis, along with all the stimulus conditions belonging to these sessions. Justification for using a different number of sessions per rat was given above.

      Lines 205-206: "For the visual stimulation, light pulses (3 Hz, 6 s total length, pulse length 166 ms) were produced by a blue led, and light was guided through two optical fibers to the front of the rat's eyes." What wavelength of blue? Why blue? Is the stimulation strong? Weak?

      Wavelength was 470 nm and brightness 7065 mcd with a current of 20 mA. Blue was selected as it is in the frequency range that rats can differentiate, and this color has been used in previous literature (https://doi.org/10.1016/j.neuroimage.2020.117542, https://doi.org/10.1016/j.jneumeth.2021.109287).
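
      The visual stimulation timing quoted by the reviewer (3 Hz, 6 s block, 166 ms pulses) can be laid out as a pulse train. The sketch below assumes the first pulse starts at block onset, which the text does not state; the function names are hypothetical.

```python
# Pulse train for one visual stimulation block, from the quoted parameters.
# Assumption (not from the source): the first pulse starts at block onset.

FREQ_HZ = 3.0     # pulse repetition rate
BLOCK_S = 6.0     # total block length
PULSE_MS = 166.0  # length of each light pulse

def pulse_onsets_ms(freq_hz: float, block_s: float) -> list:
    """Onset time of every pulse in the block, in ms from block onset."""
    period_ms = 1000.0 / freq_hz
    n_pulses = int(round(freq_hz * block_s))
    return [i * period_ms for i in range(n_pulses)]

def duty_cycle(freq_hz: float, pulse_ms: float) -> float:
    """Fraction of each period during which the LED is on."""
    return pulse_ms * freq_hz / 1000.0

onsets = pulse_onsets_ms(FREQ_HZ, BLOCK_S)
print(f"{len(onsets)} pulses per block, duty cycle {duty_cycle(FREQ_HZ, PULSE_MS):.3f}")
```

      With these values the LED is on for roughly half of each period.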

      Line 212: "Stimulation parameters were based on previous rat stimulation fMRI studies to produce robust responses" What is a robust response? One where a lot of visual cortical voxels are activated?

      Sentence was corrected as the following:

      “Stimulation parameters were based on previous rat stimulation fMRI studies and chosen to activate voxels widely in visual and somatosensory pathways, correspondingly.”

      Line 245: "Seizures were confirmed as SWDs if they had a typical regular pattern, had at least double the amplitude compared to baseline signal..." What was the "typical" pattern? What baseline signal was it compared to? Was the baseline measured as an amplitude? Peak to trough?

      Sentence was corrected to the following:

      “Seizures were confirmed as SWDs if they had a typical regular spike and wave pattern with 7-12 Hz frequency range and had at least double the amplitude compared to baseline signal. All other signals were classified as baseline i.e. signal absent of a distinctive 7-12 Hz frequency power but spread within frequencies from 1 to 90 Hz.”
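
      The two criteria in this definition (a dominant 7-12 Hz rhythm and at least double the baseline amplitude) can be expressed as a simple check. This is a minimal sketch, not the authors' detection procedure: amplitude is taken as peak-to-peak, which the text leaves unspecified, the dominant frequency comes from a naive discrete Fourier transform, and all function names are hypothetical.

```python
import math

SWD_BAND_HZ = (7.0, 12.0)  # frequency range quoted in the text
AMPLITUDE_RATIO = 2.0      # "at least double the amplitude" of baseline

def dominant_frequency(signal, fs):
    """Frequency (Hz) of the strongest non-DC component, via a naive DFT."""
    n = len(signal)
    mean = sum(signal) / n
    best_k, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        re = sum((x - mean) * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        im = sum((x - mean) * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    return best_k * fs / n

def peak_to_peak(signal):
    """Amplitude measure assumed here: maximum minus minimum."""
    return max(signal) - min(signal)

def looks_like_swd(segment, baseline, fs):
    """True if the segment meets both the frequency and amplitude criteria."""
    freq = dominant_frequency(segment, fs)
    in_band = SWD_BAND_HZ[0] <= freq <= SWD_BAND_HZ[1]
    large = peak_to_peak(segment) >= AMPLITUDE_RATIO * peak_to_peak(baseline)
    return in_band and large

# Synthetic one-second example: a 9 Hz rhythm at three times the baseline amplitude.
fs = 200
t = [i / fs for i in range(fs)]
swd = [3.0 * math.sin(2 * math.pi * 9 * x) for x in t]
base = [math.sin(2 * math.pi * 30 * x) for x in t]
print(looks_like_swd(swd, base, fs))
```

      A pure sine wave stands in for the spike-and-wave morphology here; a practical detector would also need to test for the regular spike-and-wave shape the definition mentions.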

      "using rigid, affine, and SYN registrations" Please explain for the non-specialist.

      Corrected as the following:

      “using rigid, affine (linear) and SYN (non-linear) registrations”

      Line 274-5: "However, there were also intermediate cases where the seizure started or ended during the stimulation block (Figure 1 - Figure Supplement 1). These intermediate cases were modeled as confounds" Why confounds? They could be very interesting because the stimulation may not be affected if timed at the end of the seizure. What was the definition of start and end? Defining the onset and end of seizures is tricky.

      We agree that these cases are also highly interesting. Indeed, all the intermediate cases were also analyzed separately but not included in the manuscript (other than the case when stimulation immediately ended a seizure), as no statistically significant findings were obtained when comparing these cases to the baseline. For example, stimulation applied towards the end of a seizure produced weakened responses that were still stronger than those to stimulation applied fully during a seizure (indicating some responsiveness after the cessation of the seizure). As these intermediate cases led to results with higher variance, we treated them as confounds in the general linear model (i.e. removing unwanted variance from the results of interest).

      Definition of onset and end of seizure can be difficult in some cases. When looking at the signal itself, especially towards the end of seizure the amplitude of SWDs can get weaker and thus the shift from seizure to baseline signal can be more problematic to differentiate. However, when looking at the power spectrum the boundaries were more easily detectable. Thus, in the definitions of onsets and ends of seizure we relied on both the signal and power spectrum (stated in the manuscript).

      "in the SPM analysis" Please explain for the non-specialist.

      Definition of SPM together with a link to software site was added.

      Line 276: "of fMRI data (see 2.5.3.) and thus explained variance that was not accounted for by the main effects of interest. " Please clarify.

      Clarified as:

      “Intermediate cases, where the seizure started or ended during the stimulation block (Figure 1–figure supplement 1), were considered as confounds of no-interest in the SPM analysis of fMRI data and the explained variance caused by the confounds were reduced from the main effects of interests”

      Line 277: "Additionally, a contrast..." What is meant?

      This paragraph in section 2.5.3 was rewritten as a whole for clarity.

      Line 278-9: "...was given to two cases: i) when stimulation ended a seizure (0-2 s between stimulation start and seizure end)..." Again, how is the seizure onset and end defined?

      See the comment above.

      Lines 281-2: "Stimulations that did not fully coincide with a seizure were considered as nuisance regressors in the second level analysis." What is meant by nuisance regressor?

      Reference to SPM 12 manual was given for technical terms referring to analysis software.

      Lines 283-8: "Motion periods were also included as multiple regressors (not convolved with a basis function) to be used as nuisance regressors. Stimulations that coincided with a motion above 0.3% of the voxel size were not considered stimulation inputs. Stimulation and seizure inputs were convolved with "3 gamma distribution basis functions" (i.e. 3rd order gamma) in SPM (option: basis functions, gamma functions, order: 3), to account for temporal and dispersion variations in the hemodynamic response. The choice of 3rd order gamma was based on the expectation that time-to peak and shape of HRFs of seizure could vary across voxels (David et al. 2008)." Please explain the technical terms.

      Reference for SPM 12 manual was given for technical terms referring to analysis software, and HRF was defined.

      "BAMS rat connectome" - Please explain the technical terms.

      Modified as:

      “…connection matrix of the rat nervous system (BAMS rat connectome, Bota, Dong, and Swanson 2012).”

      Results

      After removing problematic animals and sessions, was there sufficient power? There probably wasn't enough to determine sex differences.

      After removing problematic sessions, we found statistically significant (multiple-comparison corrected) results in both activation maps and hemodynamic responses. There were not enough animals to detect sex differences statistically (p>0.05).

      Figure 2 - I don't understand "tSNR" here. What is the point here?

      B vs C. Are these different brain areas or the same but SNR was adjusted?

      D. Where is FD explained? I think explaining what the parts of the figure show would be helpful.

      tSNR, the temporal signal-to-noise ratio, demonstrates the behavior of noise through time. Readers who plan to replicate the awake fMRI protocol with the single-loop coil may be interested in data quality and in the ability of the coil to separate signal from noise, as this is one of the most important factors in fMRI designs, where small signal changes have to be distinguished from the background noise.

      B and C illustrate the same brain area, but B was acquired with high resolution anatomical scanning (T1 FLASH), and C was acquired with low resolution ZTE scanning. We clarified the figure legend to the following:

      “…spatial signal-to-noise ratios of an illustrative high resolution anatomical T1-FLASH (B), and low resolution ZTE image (C)”

      FD was explained in section 2.5.1. Some parts of the explanation were clarified: “Framewise displacement (FD) (Figure 2E) was calculated as follows. First, the differential of successive motion parameters (x, y, z translation, roll, pitch, yaw rotation) was calculated. Then absolute value was taken from each parameter and rotational parameters were divided by 5 mm (as estimate of the rat brain radius) to convert degrees to millimeters (Power et al. 2012). Lastly, all the parameters were summed together.”
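
      The framewise displacement steps described above can be sketched as follows. The rotation-to-millimeter conversion is an assumption here: following Power et al. (2012), each rotation is converted to an arc length on a sphere (angle in radians times radius), using the 5 mm rat brain radius given in the text. The function name and the example motion values are hypothetical.

```python
import math

BRAIN_RADIUS_MM = 5.0  # rat brain radius estimate quoted in the text

def framewise_displacement(motion, radius_mm=BRAIN_RADIUS_MM):
    """Framewise displacement per volume.

    motion: sequence of (x, y, z, roll, pitch, yaw) tuples, with translations
    in mm and rotations in degrees. The first volume has FD = 0 by convention.
    """
    fd = [0.0]
    for prev, curr in zip(motion, motion[1:]):
        # Absolute difference of each successive motion parameter.
        deltas = [abs(c - p) for p, c in zip(prev, curr)]
        translation = sum(deltas[:3])
        # Assumed conversion: arc length on a sphere of the given radius.
        rotation = sum(math.radians(d) * radius_mm for d in deltas[3:])
        fd.append(translation + rotation)
    return fd

# Hypothetical motion estimates for three volumes.
example_motion = [
    (0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0, 0.0, 0.0, 0.0),  # 0.1 mm shift along x
    (0.1, 0.0, 0.0, 1.0, 0.0, 0.0),  # 1 degree of roll
]
print(framewise_displacement(example_motion))
```

      Summing absolute differences makes FD a conservative scalar: motion along several axes adds up rather than cancelling.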

      Table 1 has no statistical comparisons.

      Table 1 is purely an illustration of stimulation and seizure occurrence. There is no specific interest to compare stimulation types (in what state of seizure it occurred) as it does not provide any meaningful inferences to the study.

      Statistical activation maps - it is not clear how this was done.

      Creation of statistical maps is explained in section 2.5.3.

      Line 384-5: "In addition, some responses were observed in the somatosensory cortex during a seizure state, probably due to incomplete nuisance removal of the effect of the seizure itself by the linear model used." I don't see why the authors would not suggest that the result is logical given that stimuli should activate the somatosensory cortex.

      Sentence was modified as the following:

      “In addition, responses were observed in the somatosensory cortex during a seizure state”

      Fig 3 "F-contrast maps." Please explain.

      Creation of statistical maps is explained in section 2.5.3.

      HRF- please define. The ROI selection is unclear - it "was based on statistical differences seen in activation maps." But how were ROIs drawn? Also, why were HRFs examined at the end of seizures?

      HRF was defined, and definitions of HRF and ROI were moved from results section 3.3. to method section 2.5.3.

      Definition of ROI was clarified:

      “Anatomical ROIs, based on Paxinos atlas (Paxinos and Watson rat brain atlas 7th edition), were drawn on the brain areas where statistical differences were seen in activation maps.”

      HRFs were additionally estimated at the end of seizures, as it was of specific interest to study the brain state shift from ictal to interictal. This shift also yielded statistically significant findings, in the sense that brain responses differed from those to ictal stimulation.

      Line 421: "Interestingly, the response amplitude was higher when the stimulation ended a seizure compared to when it did not" Why is this interesting?

      Word “interestingly” was changed to “additionally” to avoid any inferences in the results section.

      Line 427: "Notably, HRFs amplitudes were both negatively and positively signed during the ictal state, depending on the brain region." Why is this notable?

      Word “notably” was removed to avoid any inferences in the results section.

      Please explain the legends of Figures 4 and 6 more clearly.

      Figure 4, and figure 4 – figure supplement 1, legends were clarified:

      “HRFs was calculated in selected ROI, belonging to visual or somatosensory area, by multiplying gamma basis functions (Figure 1–figure supplement 1, B) with their corresponding average beta values over a ROI and taking a sum of these values.”
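
      The reconstruction described in this legend (a sum of gamma basis functions, each weighted by its ROI-averaged beta value) can be illustrated with a minimal sketch. The shape parameters and beta values below are hypothetical placeholders, not SPM's actual basis set or the study's estimates.

```python
import math

def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density, used here as a generic basis function."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)

def hrf_estimate(times, betas, shapes):
    """Weighted sum of gamma basis functions evaluated at each time point."""
    return [sum(b * gamma_pdf(t, k) for b, k in zip(betas, shapes)) for t in times]

# Hypothetical values: three basis functions and ROI-averaged betas.
shapes = [2.0, 4.0, 8.0]
betas = [1.0, -0.4, 0.2]
times = [0.5 * i for i in range(41)]  # 0 to 20 s in 0.5 s steps

hrf = hrf_estimate(times, betas, shapes)
peak_time = times[hrf.index(max(hrf))]
print(f"estimated HRF peaks at {peak_time} s")
```

      Because the estimate is linear in the betas, negative weights on some basis functions can produce the negatively signed or biphasic responses discussed in the results.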

      Using the comments above as a guide, please revise the Discussion to be more precise and more clear about what was shown and what can be concluded in light of limitations. Please ensure the literature is cited where appropriate.

      Some parts of the discussion and conclusion sections were modified.

      Reviewer #3 (Recommendations For The Authors):

      Minor comments:

      Formatting: fMRI maps in Figures 3 and 5 should be more clearly labeled, indicating anterior and posterior directions on all images, and the cross sections should be enlarged to enable anatomical areas to be more clearly differentiated.

      Anterior and posterior directions were added, and cross sections were enlarged.

      The Methods section 2.41 and other places in the text, and Figure 2 - Figure Supplement 1 say that there was less artifact on the EEG with ZTE than with GE-EPI. However the EEG shown in Figure 2 - Figure Supplement 1 Part C shows much more artifact in the left (ZTE) trace than the right (GE-EPI) trace. This apparent contradiction should be resolved.

      The figure was actually demonstrating the relative change to the signal when MRI sequences were on, and by this standard, the ZTE produced both less amplitude and frequency changes than EPI. In the example figure, the baseline fluctuations in the EEG trace in the left were higher in amplitude than in the right, and this could potentially lead to misconception of ZTE producing more noise. Figure legend was clarified to highlight relative change:

      “ZTE also caused relatively less artificial noise on EEG signal, keeping both amplitude of the signal and frequencies relatively more intact, which improved live detection of absence seizures.”

      Figure 2 - Supplement 1, part B horizontal axis should provide units.

      Units were added.

      Figure 2 - Supplement 1, legend last sentence says arrows mark the beginning of each "sequence." Is this a typo and should this instead say "each seizure"?

      It should state “each fMRI sequence”; this was corrected.

      Line 307, Methods "to reveal brain areas where ictal stimulation provided higher amplitude response than interictal" - should this be reversed, ie weren't the authors analyzing a contrast to determine where interictal signals were higher than ictal signals?

      This should be reversed, and was corrected, thank you for noting this.

      Figure 6 - Figure Supplement 1, the scales are very different for many of the plots so they are hard to compare. Especially in the ictal periods (D, E, F) it is hard to see if any changes are happening during ictal stimulation similar to interictal stimulation due to very different scales. The activity related to SWD is so large that it overshadows the rest and perhaps should be subtracted out.

      We point out that Figure 6 - Figure Supplement 1 reproduces in greater detail the results shown in Figure 6 of the main text, where all signals are plotted on the same scale. The difference between the scales used in this figure is intended, and its purpose is to highlight the large differences observed in the ongoing activity and the evoked response between the two states (ictal and interictal). In interictal periods the ongoing activity is characterized by fluctuations around a baseline level whose variance is highly affected by the application of the stimulus. In contrast, ictal periods are characterized by large oscillations, with periods of high and synchronized activity followed by periods of nearly no activity, where the effect of the stimulus on the dynamics is overshadowed by the ongoing dynamics (both from local and from afferent nodes), as the referee mentions, and which imposes a strong limit on the responsiveness of the system and the propagation of the signal.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable contribution studies factors that impact molecular exchange between dense and dilute phases of biomolecular condensates through continuum models and coarse-grained simulations. The authors provide solid evidence that interfacial resistance can cause molecules to bounce off the interface and limit mixing. Results like these can inform how experimental results in the field of biological condensates are interpreted.

      We would like to sincerely thank the editors for spending time on our manuscript and for the very positive assessment of our work. We have carefully considered and addressed the reviewers’ comments in the point-by-point response below and have revised our manuscript accordingly.

      Reviewer #1 (Public Review):

      Summary:

      In this paper by Zhang, the authors build a physical framework to probe the mechanisms that underlie the exchange of molecules between coexisting dense and dilute liquid-like phases of condensates. They first propose a continuum model, in the context of a FRAP-like experiment where the fluorescently labeled molecules inside the condensate are bleached at t=0 and the recovery of fluorescence is measured. Through this model, they identify how the key timescales of internal molecular mixing, replenishment from dilute phase, and interface transfer contribute to molecular exchange timescale. Motivated by a recent experiment reported by some of the co-authors previously (Brangwynne et al. in 2019) finding strong interfacial resistance in in-vitro protein droplets of LAF-1, they seek to understand the microscopic features contributing to the interfacial conductance (inversely proportional to the resistance). To check, they perform coarse-grained MD simulations of sticker-spacer self-associative polymers and report how conductance varies significantly even across the few explored sequences. Further, by looking at individual trajectories, they postulate that "bouncing" - i.e., molecules that approach the interface but are not successfully absorbed - is a strong contributor to this mass transfer limitation. Consistent with their predictions, sequences that have more free unbound stickers (i.e., for example through imbalance sequence sticker stoichiometries) have higher conductances and they show a simple linear scaling between the number of unbound stickers and conductance. Finally, they predict a droplet-size-dependent transition in recovery time behavior.

      Strengths:

      (1) This paper is well-written overall and clear to understand.

      (2) By combining coarse-grained simulations, continuum modeling, and comparison to published data, the authors provide a solid picture of how their proposed framework relates to molecular exchange mechanisms that are dominated by interface resistance and LAF-1 droplets.

      (3) The choice of different ways to estimate conductance from simulation and reported data are thoughtful and convincing in their near agreement (although a little discussion of why and when they differ would be merited as well).

      We would like to thank the reviewer for the positive evaluation of our work. Indeed, we are grateful to the reviewer for this thoughtful, detailed, and constructive report, which has helped us strengthen the manuscript.

      Weaknesses:

      (1) Almost the entirety of this paper is motivated by a previously reported FRAP experiment on a particular LAF-1 droplet in vitro. There are a few major concerns I have with how the original data is used, how these results may generalize, and the lack of connection of predictions with any other experiments (published or new).

      a. The mean values of cdense, cdilute, diffusivities, etc. are taken from Taylor et al. to rule in the importance of interfacial mass transfer limits. While this may be true, the values originally inferred (in the 2019 paper that this paper is strongly built off) report extremely large confidence intervals/inferred standard errors. The authors should accordingly report all their inferences with correct standardized errors or confidence intervals, which in turn, allow us to better understand these data.

      Yes, agreed. We have now included the standard errors of the parameters from Taylor et al. (2019), and reported the corresponding standard errors for the timescales and interface conductance using error propagation. We have modified Fig. 1C right panel as well as the text in the figure caption:

      “(Right) Expected recovery times and if the slowest recovery process was either the flux from the dilute phase or diffusion within the droplet, respectively, with and taken from Taylor et al. (2019). While the timescale associated with interface resistance is unknown, the measured recovery time is much longer than and , suggesting the recovery is limited by flux through the interface, with an interface conductance of  (Below Figure 1)”
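
      The error propagation mentioned in the response above can be sketched generically. The snippet below is a minimal first-order (delta-method) illustration that assumes independent inputs with small relative errors. The function name `propagate`, the example timescale form tau = R^2 / D, and all numbers are illustrative assumptions, not the expressions or parameter values used in the manuscript.

```python
import math

def propagate(f, values, errors, rel_step=1e-6):
    """First-order propagation of independent standard errors through f.

    values, errors: sequences of parameter means and their standard errors.
    Returns (f(values), propagated standard error).
    """
    center = f(*values)
    variance = 0.0
    for i, (v, s) in enumerate(zip(values, errors)):
        h = abs(v) * rel_step or rel_step  # finite-difference step
        hi = list(values)
        hi[i] = v + h
        lo = list(values)
        lo[i] = v - h
        dfdx = (f(*hi) - f(*lo)) / (2 * h)  # central-difference derivative
        variance += (dfdx * s) ** 2  # independent errors add in quadrature
    return center, math.sqrt(variance)

# Hypothetical diffusive timescale tau = R^2 / D with made-up inputs.
tau, tau_err = propagate(lambda R, D: R**2 / D, [1.0, 2.0], [0.1, 0.2])
```

      If the inferred parameters were correlated, the covariance terms would also have to be included; this sketch omits them.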

      b. The generalizability of this model is hard to gauge when all comparisons are made to a single experiment reported in a previous paper.

      i. Conceptually, the model is limited to single-component sticker-spacer polymers undergoing phase separation, which is already a very simplified model of condensates. For example, LAF1 droplets in the cell have no perceptible interfacial mass limitations, as also reported in Taylor et al. 2019, so it is unclear how these mechanisms relate to living systems as opposed to specific biochemistry experiments. The authors therefore need to discuss the implications and limitations of their model in the living context, where there are multiple species, finite-size effects, and active processes at play.

      We thank the reviewer for the critical comment. To address this point, we have included a paragraph in the Discussion regarding in vivo situations:

      “In this work, we focused on the exchange dynamics of in vitro single-component condensates. How is the picture modified for condensates inside cells? It has been shown that Ddx4-YFP droplets in the cell nucleus exhibit negligible interface resistance Taylor et al. (2019), which raises the question whether interface resistance is relevant to natural condensates in vivo. Future quantitative FRAP and single-molecule tracking experiments on different types of droplets in the cell will address this question. One complication is that condensates in cells are almost always multi-component, which can increase the complexity of the exchange dynamics. Interestingly, formation of multiple layers or the presence of excess molecules of one species coating the droplet is likely to increase interface resistance. A notable example is the Pickering effect, in which adsorbed particles partially cover the interface, thereby reducing the accessible area and the overall condensate surface tension, slowing down the exchange dynamics Folkmann et al. (2021). The development of theory and modeling for the exchange dynamics of multi-component condensates is currently underway. (Lines 323-334)”

      ii. Second, can the authors connect their model to make predictions of the impact of perturbations to LAF-1 on exchange timescales? For example, are mutants (which change the number or positioning of "stickers") expected to show particular trends in conductances or FRAP timescales? Since LAF-1 is a relatively well-studied protein in vitro, can the authors further contrast their expectations with already published datasets that explore these perturbations, even if they don't generate new data?

      Our model is intended to address interface exchange dynamics at the conceptual level. The underlying mechanism for the large interface resistance of LAF-1 droplets could be more complicated than explored in our work. Studying the impact of perturbations to LAF-1 on exchange timescales would likely require substantially more sophisticated molecular dynamics simulations. We undertook an extensive search for FRAP experiments on LAF-1 droplets in which the whole droplet is photobleached, but were not able to find another dataset. We would be grateful if the reviewer, should they be aware of such data, could point us to it.

      iii. A key prediction of the interface limitation model is the size-dependent crossover in FRAP dynamics. Can the authors reanalyze published data on LAF-1 (albeit of different-size droplets) to check their predictions? At the least, is the crossover radius within experimentally testable limits?

      Based on our prediction, the crossover radius for LAF-1 droplets is around 70 𝜇m. We have added a sentence in the text to point this out:

      “We also predict the crossover for LAF-1 droplets to be around 𝑅 = 71 𝜇m, which in principle can be tested experimentally. (Lines 285-286)”

      Unfortunately, most of the FRAP experiments in Taylor et al. (2019) are partial FRAP experiments, in which only part of the dense phase is photobleached. The recovery time for such experiments primarily reflects the internal mixing speed of the dense phase rather than the exchange dynamics at the interface or transport from the dilute phase.
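
      The idea of a size-dependent crossover can be illustrated with a minimal sketch. It assumes, as a simplification not taken verbatim from the manuscript, that the interface-limited recovery time grows linearly with droplet radius (a * R) while the diffusion-limited time grows quadratically (b * R**2). The coefficients `a` and `b` and the function names are hypothetical, and the numbers are in arbitrary units.

```python
def crossover_radius(a, b):
    """Radius where a*R (interface-limited) equals b*R**2 (diffusion-limited)."""
    if a <= 0 or b <= 0:
        raise ValueError("coefficients must be positive")
    return a / b

def limiting_process(radius, a, b):
    """Name the slower (rate-limiting) process at a given radius."""
    return "interface" if a * radius > b * radius**2 else "diffusion"

# Made-up coefficients in arbitrary units, chosen only for illustration.
r_star = crossover_radius(a=6.0, b=2.0)
small = limiting_process(1.0, a=6.0, b=2.0)
large = limiting_process(10.0, a=6.0, b=2.0)
```

      Under these assumptions, droplets smaller than the crossover radius are interface-limited and larger ones are diffusion-limited, which is the qualitative pattern described for the size-dependent transition.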

      c. The authors nicely relate the exchange timescale to various model parameters. Is LAF-1 the only protein for which the various dilute/dense concentrations/diffusivities are known? Given the large number of FRAP and other related studies, can the authors report on a few other model condensate protein systems? This will help broaden the reach of this model in the context of other previously reported data. If such data are lacking, a discussion of this would be important.

      Yes, indeed, we have found numerous publications with FRAP experiments performed on whole droplets of various proteins. However, none of these provides a complete set of parameters that would allow a quantitative analysis. Part of the reason is that it is nontrivial to measure the partition coefficient (cden/cdil) accurately. We have added a sentence in the Discussion to encourage future quantitative experiments and analyses of condensate exchange dynamics:

      “We hope that our study will motivate further experimental investigations into the anomalous exchange dynamics of LAF-1 droplets and potentially other condensates, and the mechanisms underlying interface resistance. (Lines 320-322)”

      To broaden the audience for this work in the hope of stimulating such studies, we have also modified the title and abstract so that they will be more visible to the FRAP community:

      “The exchange dynamics of biomolecular condensates (Line 1)”

      “A hallmark of biomolecular condensates formed via liquid-liquid phase separation is that they dynamically exchange material with their surroundings, and this process can be crucial to condensate function. Intuitively, the rate of exchange can be limited by the flux from the dilute phase or by the mixing speed in the dense phase. Surprisingly, a recent experiment suggests that exchange can also be limited by the dynamics at the droplet interface, implying the existence of an “interface resistance”. Here, we first derive an analytical expression for the timescale of condensate material exchange, which clearly conveys the physical factors controlling exchange dynamics. We then utilize sticker-spacer polymer models to show that interface resistance can arise when incident molecules transiently touch the interface without entering the dense phase, i.e., the molecules “bounce” from the interface. Our work provides insight into condensate exchange dynamics, with implications for both natural and synthetic systems. (Lines 16-26)”

      (2) The reported sticker-spacer simulations, while interesting, represent a very small portion of the parameter space. Can the authors - through a combination of simulation, analyses, or physical reasoning, comment on how the features of their underlying microscopic model (sequence length, implicit linker length, relative stoichiometry of A/B for a given length, overall concentration, sequence pattern properties like correlation length) connect to conductance? This will provide more compelling evidence relating their studies beyond the cursory examination of handpicked sequences. A more verbose description of some of the methods would be appreciated as well, including specifically how to (a) calculate the bond lifetime of isolated A-B pair, and (b) how equilibration/convergence of MD simulations is established.

      In our simulations, the interface conductance is essentially controlled by the fraction of unbound stickers, the encounter rate of a pair of unbound stickers, the dilute- and dense-phase concentrations, and the width of the interface. As a result, weaker binding strength and/or deviation of the A:B stoichiometry from 1:1 results in a higher interface conductance. A6B6 polymers, which have long blocks of stickers of the same type (compared to (A2B2)3 and (A3B3)2), have a lower dilute-phase concentration and a narrower interface, and therefore a lower conductance. Sequence length and implicit linker length can have more complex effects, which are beyond the scope of the current study. We have now provided an explicit expression for 𝜅 in Equation (14) and added a discussion sentence in the text:

      “More generally, we find that the interface conductance of the sticker-spacer polymers is controlled by the encounter rate of a pair of unbound stickers and the availability of these stickers, which in turn depends on the sticker-sticker binding strength, the dilute- and dense-phase polymer concentrations, and the width of the interface:

      where 𝓃 is the number of monomers in a polymer,  is the global stoichiometry (i.e., ), and are the fractions of unbound A/B monomers in the dilute and dense phases. (Lines 208-214)”

      We have also added a few sentences in Appendix 2 to describe how we calculate the bond lifetime of an isolated A-B pair and how equilibration in simulations is established.

      “Briefly, the bond lifetime of an isolated pair is obtained by simulating a bound pair of A-B stickers in a box and recording the time when they first separate by the cutoff distance of the attractive interaction nm. The mean bond lifetime 𝜏 is found by averaging results of 1000 replicates with different random seeds. (Lines 642-645)”
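
      The averaging step described in the quoted passage can be sketched as follows. This is a minimal illustration that assumes each replicate yields a list of A-B separation distances sampled at a fixed time step `dt`. The function names, the `>=` cutoff convention, and the handling of replicates that never separate are assumptions, not details taken from the manuscript.

```python
import statistics

def first_separation_time(distances, cutoff, dt):
    """Time at which a distance trace first reaches or exceeds the cutoff.

    Returns None if the pair never separates within the trace.
    """
    for step, d in enumerate(distances):
        if d >= cutoff:
            return step * dt
    return None

def mean_bond_lifetime(traces, cutoff, dt):
    """Mean and standard error of first-separation times over replicates."""
    times = [first_separation_time(tr, cutoff, dt) for tr in traces]
    # Dropping unbroken replicates biases the mean downward; in practice the
    # runs should be long enough that every replicate separates.
    times = [t for t in times if t is not None]
    if not times:
        raise ValueError("no replicate separated within its trace")
    mean = statistics.fmean(times)
    sem = statistics.stdev(times) / len(times) ** 0.5 if len(times) > 1 else 0.0
    return mean, sem

# Three made-up replicate traces (arbitrary units).
traces = [[0.1, 0.2, 1.5], [0.1, 1.2], [0.3, 0.3, 0.3, 2.0]]
mean_tau, sem_tau = mean_bond_lifetime(traces, cutoff=1.0, dt=0.5)
```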

      “To test if the system has reached equilibrium, we compare the dense- and dilute-phase concentrations derived from the first and second halves of the recorded data. The agreement indicates that the system has reached equilibrium. (Lines 586-589)”
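
      The first-half versus second-half comparison described in the quoted passage can be sketched as a simple check. The function name `halves_agree` and the 5% relative tolerance are illustrative assumptions; the manuscript excerpt does not state a numerical criterion.

```python
import statistics

def halves_agree(series, rel_tol=0.05):
    """Compare the means of the first and second halves of a time series.

    Returns True when the two half-means differ by at most rel_tol
    relative to the larger of the two magnitudes.
    """
    if len(series) < 2:
        raise ValueError("need at least two samples")
    mid = len(series) // 2
    first = statistics.fmean(series[:mid])
    second = statistics.fmean(series[mid:])
    scale = max(abs(first), abs(second))
    return scale == 0 or abs(first - second) <= rel_tol * scale

# Made-up concentration traces: one stationary, one still drifting.
stationary = halves_agree([1.0, 1.02, 0.98, 1.01, 0.99, 1.0])
drifting = halves_agree([1.0, 1.1, 1.2, 1.5, 1.6, 1.7])
```

      Agreement between halves is a necessary rather than sufficient sign of equilibration, so a check like this is best applied to both the dense- and dilute-phase concentrations.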

      (3) A lot of the main text repeats previously published models (continuum ones in Taylor et al. 2019 and Hubatsch et al., 2021, amongst others) and the idea of interface resistance being limiting was already explored quantitatively in Taylor 2019 (including approximate estimates of mass transfer limitations) - this is fine in context. While the authors do a good job of referring to past work in context, the main results of this paper, in my reading, are:

      - a simplified physical form relating conductance timescales.

      - sticker-spacer simulations probing microscopic origins.

      - analysis of size-dependent FRAP scaling.

      I am stating this not as a major weakness, but, rather - I would recommend summarizing and categorizing the sections to make the distinctions between previously reported work and current advances sufficiently clear.

      We thank the reviewer for a clear summary of the contributions of our work. We have highlighted our main contributions in multiple places:

      “Here, we first derive an analytical expression for the timescale of condensate material exchange, which clearly conveys the physical factors controlling exchange dynamics. We then utilize sticker-spacer polymer models to show that interface resistance can arise when incident molecules transiently touch the interface without entering the dense phase, i.e., the molecules “bounce” from the interface. (Lines 21-25)”

      “In the following, we first derive an analytical expression for the timescale of condensate material exchange, which conveys a clear physical picture of what controls this timescale. We then utilize a “sticker-spacer” polymer model to investigate the mechanism of interface resistance. We find that a large interface resistance can occur when molecules bounce off the interface rather than being directly absorbed. We finally discuss characteristic features of the FRAP recovery pattern of droplets when the exchange dynamics is limited by different factors. (Lines 65-70)”

      “Specifically, we first derived an analytical expression for the exchange rate, which conveys the clear physical picture that this rate can be limited by the flux of molecules from the dilute phase, by the speed of mixing inside the dense phase, or by the dynamics of molecules at the droplet interface. Motivated by recent FRAP measurements Taylor et al. (2019) that the exchange rate of LAF-1 droplets can be limited by interface resistance, which contradicts predictions of conventional mean-field theory, we investigated possible physical mechanisms underlying interface resistance using a “sticker-spacer” model. Specifically, we demonstrated via simulations a notable example in which incident molecules have formed all possible internal bonds, and thus bounce from the interface, giving rise to a large interface resistance. Finally, we discussed the signatures in FRAP recovery patterns of the presence of a large interface resistance. (Lines 291-300)”

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors have obtained an analytical expression that provides intuition about regimes of interfacial resistance that depend on droplet size. Additionally, through simulations, the authors provide microscopic insight into the arrangement of sticky and non-sticky functional groups at the interface. The authors introduce bouncing dynamics for rationalizing quantity recovery timescales.

      I found several sections that felt incomplete or needed revision and additional data to support the central claim and make the paper self-contained and coherent.

      We thank the reviewer for spending time on our manuscript and for the helpful critical comments.

      First, the analytical theory operates with diffusion coefficients for dilute and dense phases. For the dilute phase, this is fine. For the dense phase, I have doubts that dynamics can be described as diffusive. Most likely, dynamics is highly subdiffusive due to crowded, entangled, and viscoelastic environments of densely packed interactive biomolecules. Some explanation and justification are in order here.

      The reviewer is correct in noting that molecules within a condensate can move subdiffusively due to the viscoelastic nature of the condensate. However, subdiffusion only occurs at short time and small length scales; the motion of molecules becomes diffusive at longer time and larger length scales. The crossover time here is the terminal relaxation time, which has been measured to be on the order of milliseconds to seconds for typical condensates (see Alshareedah, Ibraheem, et al. "Determinants of viscoelasticity and flow activation energy in biomolecular condensates" Science Advances 10.7, 2024). We have also previously found that, for sticker-spacer polymers, this relaxation time is determined by the time it takes for a sticker to switch to a new partner (see Ronceray et al. (2022) in References), and is therefore largely determined by the bond lifetime of a sticker pair. The crossover length scale is expected to be comparable to the size of a molecule based on the theory of polymer disentanglement. Importantly, in order for the bleached droplet to recover its fluorescence, the bleached molecules must travel for a much longer time and over a much larger distance than the crossover time and length. It is therefore expected that the molecules move diffusively on the relevant timescale of a FRAP experiment, albeit with a diffusion coefficient that reflects crowding and entanglement on short time and length scales.

      The second major issue is that I did not find a clean comparison of simulations with the derived analytical expression. Simulations test various microscopic properties on the value of k, which is important. But how do we know that it is the same quantity that appears in the expressions? Also, how can we be sure that analytical expressions can guide simulations and experiments as claimed? The authors should provide sound evidence of the predictive aspect of their derived expressions.

      We thank the reviewer for raising this critical issue. We agree with the reviewer that we did not perform an explicit simulation to validate the developed theory, which left a gap between our theory and simulations. The main reason is that simulating an in silico “FRAP experiment” on a 3D droplet is computationally very costly. Nevertheless, following the reviewer’s suggestion, we have now performed such a simulation, in which we “bleached” a small A6B6 droplet and measured its recovery time. The good agreement between simulation and theory helps validate our overall combined computational and analytical approach. We have incorporated the new simulation and results into the manuscript. Two new sections, including new figures (Figure 4 and Appendix 2 Figure 4), have been added: “Direct simulation of droplet FRAP” in the main text (lines 232-261) and “Details of simulation and theory of FRAP recovery of an A6B6 droplet” in Appendix 2 (lines 665-715).

      Are the plots in Figure 4 coming from experiment, theory, and simulation? I could not find any information either in the text or in the caption.

      Figure 4 (now Figure 5) shows theoretical results computed with the parameters of the simulated A6B6 system. We have added the following sentences to clarify:

      “We compare the measured FRAP recovery time for the small droplet (green circle) to theoretical predictions from Equation (6) (gray) and Equations (1) - (4) (black) in Figure 5A. (Lines 255-257)”

      “Figure 5. FRAP recovery patterns for large versus small droplets can be notably different for condensates with a sufficiently large interface resistance. (A) Expected relaxation time as a function of droplet radius for in silico “FRAP experiments” on the A6B6 system. The interface resistance dominates recovery times for smaller droplets, whereas dense-phase diffusion dominates recovery times for larger droplets. Green circle: FRAP recovery time obtained from direct simulation of an A6B6 droplet of radius 37 nm. Black curve: the recovery time as a function of droplet radius from a single exponential fit of the exact solution of the recovery curve from Equations (1) - (4). Gray curve: the recovery time predicted by Equation (6). Yellow, blue, and red curves: the recovery time when dense-phase, dilute-phase, and interface flux limit the exchange dynamics, i.e., the first, second, and last term in Equation (6), respectively. Parameters matched to the simulated A6B6 system in the slab geometry: (B) Time courses of fluorescence profiles for A6B6 droplets of radius  (top) and  (bottom); red is fully bleached, green is fully recovered. These concentration profiles are the numerical solutions of Equations (1) - (3) using the parameters in (A). (Below Figure 5)”
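
      The caption above refers to extracting a recovery time from a single-exponential fit. A minimal sketch of one way to do this is given below. It assumes a normalized recovery curve of the form f(t) = 1 - exp(-t / tau) and uses a log-linear least-squares fit through the origin, which is a simplification and not necessarily the fitting procedure used by the authors. The function name and the synthetic data are hypothetical.

```python
import math

def fit_recovery_time(times, recovered_fraction):
    """Least-squares estimate of tau for f(t) = 1 - exp(-t / tau).

    Linearizes to -ln(1 - f) = t / tau and fits a line through the origin.
    Points with f >= 1 are skipped because the logarithm is undefined there.
    """
    num = den = 0.0
    for t, f in zip(times, recovered_fraction):
        if f < 1.0:
            num += t * -math.log(1.0 - f)
            den += t * t
    if num <= 0.0:
        raise ValueError("no usable recovery data")
    return den / num  # inverse of the fitted slope 1 / tau

# Synthetic, noise-free curve with a made-up tau of 4.0 (arbitrary units).
ts = [0.5 * i for i in range(1, 21)]
fs = [1.0 - math.exp(-t / 4.0) for t in ts]
tau_hat = fit_recovery_time(ts, fs)
```

      With noisy data the log transform over-weights late time points where 1 - f is small, so a nonlinear fit of the untransformed curve may be preferable.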

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      The goal of Knudsen-Palmer et al. was to define a biological set of rules that dictate the differential RNAi-mediated silencing of distinct target genes, motivated by facilitating the long-term development of effective RNAi-based drugs/therapeutics. To achieve this, the authors use a combination of computational modeling and RNAi function assays to reveal several criteria for effective RNAi-mediated silencing. This work provides insights into how (1) cis-regulatory elements influence the RNAi-mediated regulation of genes; (2) genes can "recover" from RNAi-silencing signals in an animal; and (3) pUGylation occurs exclusively downstream of the dsRNA trigger sequence, suggesting 3º siRNAs are not produced. In addition, the authors show that the speed at which RNAi-silencing is triggered does not correlate with the longevity of the silencing. These insights are significant because they suggest that if we understand the rules by which RNAi pathways effectively silence genes with different transcription/processing levels then we can design more effective synthetic RNAi-based therapeutics targeting endogenous genes. The conclusions of this study are mostly supported by the data, but there are some aspects that need to be clarified.

      We thank the reviewer for their kind words and for appreciating the practical utility of our approach and discoveries. 

      (1) The methods do not describe the "aged RNAi plates feeding assay" in Figure 2E. The figure legend states that "aged RNAi plates" were used to trigger weaker RNAi, but the detail explaining the experiment is insufficient. How aged is aged? If the goal was to effectively reduce the dsRNA load available to the animals, why not quantitatively titrate the dsRNA provided? Were worms previously fed on the plates, or was simply a lawn of bacteria grown until presumably the IPTG on the plate was exhausted?

      We have expanded our Methods section to state that the plates were left at 4ºC for about 4 months before bacteria were added and the assay was performed; one possible reason for the weaker knockdown is that the IPTG in the aged RNAi plates is less effective. However, it is worth noting that the robustness of a feeding RNAi assay can vary from culture to culture and/or between batches of plates. We therefore always perform RNAi assays with wild-type animals alongside test strains to gauge the strength of the RNAi assay for a given culture and batch of plates. We called the data in Figure 2E “weak” because the response of wild-type animals was weak, as evidenced by weak twitching in levamisole. Despite this reduced effect, we observed 100% penetrance in wild-type animals, enabling us to sensitively detect the reduced responses of the mutants.

      (2) Is the data presented in Figure 2F completed using the "aged RNAi plates" to achieve the partial silencing of dpy-7 observed? Clarification of this point would be helpful.

      No. The only occasion on which aged plates were used was the experiment described in our response to comment 1 above.

      (3) Throughout the manuscript the authors refer to "non-dividing cells" when discussing animals' ability to recover from RNA silencing. It is not clear what the authors specifically mean with the phrase "non-dividing cells", but as this is referred to in one of their major findings, it should be clarified. Do they mean the cells are somatic cells in aged animals, thus if they are "non-dividing" the siRNA pools within the cells cannot be diluted by cell division? Based on the methods, the animals of RNAi assays were L4/Young adults that were scored over 8 days after the initial pulse of dsRNA feeding. If this is the case, wouldn't these animals be growing into gravid adults after the feeding, and thus have dividing cells as they grew?

      We thank the reviewer for highlighting the need to explain this point further. Our experiments test the silencing of the unc-22 gene, which is expressed and functions in body-wall muscle cells. Most of the body-wall muscles in C. elegans are formed by the L1 stage (reviewed in Krause and Liu, 2012), and they do not divide between the L4 and adult stages. Therefore, during the experiment, in which we delivered a pulse of dsRNA and examined responses over days, none of these cells divide. We have added a statement in the main text to explicitly say that the recovery from silencing by dsRNA that we observed cannot be explained by dilution during cell divisions.

      (4) What are the typical expression levels/turnover of unc-22 and bli-1? Based on the results from the altered cis-regulatory regions of bli-1 and unc-22 in Figure 5, it seems like the transcription/turnover rates of each of these genes could also be used as a proof of principle for testing the model proposed in Figure 4. The strength of the model would be further increased if the RNAi sensitivity of unc-22 reflects differences in its transcription/turnover rates compared to bli-1.

      We can get a sense of the relative abundances of unc-22 and bli-1 across development from the RNA-seq experiments that have been performed by others in the field (see below). However, these data cannot be used to infer either the production or the turnover rates. Future experiments that measure production (the combined rate of transcriptional run-on, splicing, export from the nucleus, etc.) will be required to define the production rates. Similarly, assays that detect the rate of transcript degradation without the confounding effect of continued production will be needed to establish turnover rates. Future efforts to obtain values for these in vivo rates for multiple genes will help further test the model.
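
      The point that abundance alone cannot separate production from turnover can be illustrated with a minimal sketch. It assumes a simple first-order model, dm/dt = production_rate - turnover_rate * m, which is an illustrative assumption and not the model used in the manuscript. All rate values are made up.

```python
def steady_state_abundance(production_rate, turnover_rate):
    """Steady state of dm/dt = production_rate - turnover_rate * m."""
    if turnover_rate <= 0:
        raise ValueError("turnover_rate must be positive")
    return production_rate / turnover_rate

# Two made-up genes with very different kinetics but identical abundance.
slow_gene = steady_state_abundance(production_rate=2.0, turnover_rate=0.5)
fast_gene = steady_state_abundance(production_rate=8.0, turnover_rate=2.0)
```

      Both hypothetical genes settle at the same steady-state level, so a single abundance measurement cannot distinguish them; separate measurements of production or degradation would be needed.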

      Author response image 1.

      Expression data for unc-22:

      Author response image 2.

      Expression data for bli-1:

      Reviewer #2 (Public Review):

      Summary:

      This manuscript by Knudsen-Palmer et al. describes and models the contribution of MUT-16 and RDE-10 in the silencing through RNAi by the Argonaute protein NRDE-3 or others. The authors show that MUT-16 and RDE-10 constitute an intersecting network that can be redundant or not depending on the gene being targeted by RNAi. In addition, the authors provide evidence that increasing dsRNA processing can compensate for NRDE-3 mutants. Overall, the authors provide convincing evidence to understand the factors involved in RNAi in C. elegans by using a genetic approach.

      Major Strengths:

      The authors' work presents a compelling case for understanding the intricacies of RNA interference (RNAi) within the model organism Caenorhabditis elegans through a meticulous genetic approach. By harnessing genetic manipulation, they delve into the role of MUT-16 and RDE-10 in RNAi, offering a nuanced understanding of the molecular mechanisms at play in two independent case study targets (unc-22 and bli-1).

      We thank the reviewer for their kind words and for appreciating our genetic analysis.

      Major Weaknesses:

      (1) It is unclear how the molecular mechanisms of amplification differ between the MUT-16 and RDE-10 branches of the regulatory pathway, since they are clearly distinct proteins structurally. It would be interesting to do some small-RNA-seq of products generated from unc-22 and bli-1, under wild-type conditions and in some of the mutants studied (e.g., mut-16, rde-10 and mut-16 + rde-10). That would provide some insights into whether the products of the 2 amplifications are the same in all conditions, just changing in abundance, or whether they are distinct in sequence patterns.

      As we highlight in the paper, MUT-16 and RDE-10 are indeed very different proteins. One possible hypothesis suggested by this difference is that different kinds of small RNAs are made when the underlying mechanism relies on MUT-16 versus on RDE-10. However, postulating such a difference is not necessary for explaining the data. Furthermore, since the amounts of 2º siRNAs do not have to be correlated with the strength of silencing (Figure 4E), this work cautions against over-reliance on small RNA sequencing for inferring gene silencing. Nevertheless, it is indeed an attractive possibility that the amounts of small RNA, their distributions along the mRNA sequence, and/or the sequence biases of the accumulating small RNAs could be different when relying on MUT-16- or RDE-10-dependent mechanisms. Future work that directly examines the small RNAs that accumulate in different mutant strains after initiating RNAi can shed light on these possibilities.

      (2) In the same line, Figure 5 aims to provide insights into the sequence determinants that influence the RNAi of bli-1. It is unclear whether the changes in transcript stability dictated by the 3'UTR are the sole factor governing the preference for the MUT-16 and RDE-10 branches of the regulatory pathway. In line with the mutant jam297, it might be interesting to test whether factors like codon optimality, splicing, ... of the ORF region upstream from bli-1-dsRNA can affect its sensitivity to the MUT-16 and RDE-10 branches of the regulatory pathway.

      In Figure 5, we eliminated the possibility that any gene that is transcribed using the bli-1 promoter would require NRDE-3, and showed using jam297 that modifications to the 3’ cis regulatory regions of a target can alter the dependence on NRDE-3 for knockdown. We agree that future experiments that control individual aspects of bli-1, potentially one feature at a time, can reveal the separate contributions of each characteristic of the gene to the observed dependence on NRDE-3 of the wild-type bli-1 gene. However, given the many ways that the same level of transcript knockdown can be achieved in our modeling (Figure 4 and its supplemental figures) we expect that multiple characteristics could contribute to NRDE-3 dependence. 

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) On page 5, the authors state that "MUT-16 and RDE-10 are redundantly or additively required for silencing unc-22"; however, based on their data in Figure 1D, it seems nearly 100% silencing of unc-22 is achieved in single mut-16 or rde-10 mutants. If this is the case, wouldn't it suggest that redundancy of MUT-16 and RDE-10, and not an "additive effect" of MUT-16 and RDE-10 function? Although, as the mutator complex nucleates around MUT-16, the data in Figure 1D suggests it is possible that the presence of MUT-16 or RDE-10 is sufficient for the recruitment of one or more factors that triggers the silencing of unc-22, and thus only one of these factors is necessary.

      Because we are seeing 100% silencing in wild-type, mut-16(-), or rde-10(-) animals in Figure 1D, this assay (where the silencing response is strong) does not allow us to discriminate between differing levels of silencing. The “weak” RNAi assay in Figure 2E provides the opportunity to observe differences in the contributions made by MUT-16 or RDE-10, supporting the idea that the 2º siRNAs and relative contributions to silencing can indeed be additive, explaining the complete loss of silencing only in the double mutant. While MUT-16 has been shown to be required for the recruitment of other Mutators in the germline, Mutator foci are not detectable in the soma. Given that unc-22 and bli-1 are somatic targets, we are hesitant to assume a mechanism for the production of small RNAs that requires a similar MUT-16-dependent nucleation in somatic cells. MUT-16 is clearly required for full silencing, but whether it functions similarly in the soma and the germline remains an open question. Indeed, the mechanism(s) for producing small RNAs in somatic cells could be different from that used for production of small RNAs in the germline because of known differences in the use of RNA-dependent RNA polymerases (e.g. Ravikumar et al., Nucleic Acids Res. 2019). Future studies that determine the subcellular localization(s) and potential biochemical function(s) of RDE-10 and MUT-16 in somatic cells are needed to further delineate mechanisms.

      (2) On page 10, "rather than one that looks a frequency" - the "a" should be "at".

      We thank the reviewer and have fixed this typo. 

      (3) Figure 4 is very crowded, further dividing 4A (right) and 4B into subpanels would help the readability of the figure.

      We thank the reviewer for identifying these figures as being particularly crowded. These panels are presented as single units because the left and right portions of each panel are intimately connected. In Fig. 4A, the outline of mechanism deduced on the left is based on experiments at various scales shown on the right. We have now clarified this in the figure legend. In Fig. 4B, the equations on the right define and use the constants depicted on the left and the definitions below apply to both parts. We have now adjusted both figure parts to make these connections clearer. 

      (4) References to the subpanels of Figure 4 in the text on page 12 are off from the figure and figure legend.

      For example:

      "Overall, τkd and tkd were uncorrelated..." refers to 4C when it should refer to 4D. "However, the maximal amount of 2ºsiRNAs..." refers to 4D when it should refer to 4E. "Additionally, an increase in transcription..." refers to 4E when it should refer to 4F.

      "When a fixed amount of dsRNA was exposed..." refers to 4F when it should refer to 4G.

      We thank the reviewer for catching these errors and we have corrected these figure references.

      Reviewer #2 (Recommendations For The Authors):

      I would encourage the authors to follow up on some of the more mechanistic comments made above, that would strengthen and complement the genetic part of the work presented.

      We agree that additional work is needed to elucidate differences in molecular mechanisms for amplifying small RNAs in a MUT-16-dependent vs. RDE-10-dependent manner. We hope to address these extensions of our work in future manuscripts that focus on the biochemistry of these proteins and the populations of small RNAs generated using them.

      I appreciate the efforts to computationally model the dynamics of the system, but I am not sure that it helps that the mathematical modelling treats both branches of the pathway as functionally equals, since they could have some mechanistic specialisation that is not yet elucidated by the current work.

      Our assumption that both branches are equivalent is the most parsimonious. If we allowed for differences, even more values for the parameters of the model will agree with experimental data. The strength of the model is that despite such conservative assumptions, it agrees with experimental data. Biochemical elaborations that make the MUT-16 and RDE-10 branches qualitatively different could exist in vivo as suggested by the reviewer. Even with such qualitative differences in detail, the overall impact on gene silencing is a quantitative and additive one as demonstrated by our experiments. Future experimental work focused on biochemistry could elucidate how a Maelstrom domain-containing protein (RDE-10) and an intrinsically disordered protein (MUT-16) act differently to ultimately promote small RNA production.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors tried to identify the relationships among the gut microbiota, lipid metabolites, and the host in type 2 diabetes (T2DM) by using macaques that spontaneously develop T2DM, considered one of the best models of the human disease.

      Strengths:

      The authors comprehensively compared the gut microbiota and plasma fatty acids between macaques with spontaneous T2DM and control macaques and verified the results with macaques on a high-fat diet-fed mice model.

      Weaknesses:

      Comment 1: The observed multi-omics of the macaques can be done on humans, which weakens the impact of the conclusion of the manuscript.

      We fully acknowledge the critical role of human studies in T2DM research. In our study, the spontaneous T2DM macaque model provided a unique window to address inherent challenges in human studies, including medication interference and environmental heterogeneity. Human studies have struggled to standardize confounding factors such as diet, exercise, and antibiotic use. Moreover, most human T2DM patients receive long-term glucose-lowering medications (e.g., metformin), which directly alter gut microbiota composition and function, masking disease-associated microbial signatures (Sun et al., 2018; Petakh et al., 2023). In contrast, the spontaneous T2DM macaques, which were kept under strictly controlled conditions and received no glucose-lowering drugs or antibiotics, revealed microbiota dysbiosis driven purely by disease progression. Our work bridged the gap between rodent studies and human clinical trials, providing an important clinical reference for guiding targeted interventions, particularly microbiota modulation. We sincerely appreciate the valuable comments. We have added the following background to the Introduction: “In fact, T2DM macaques avoid medication interference and environmental heterogeneity under controlled experimental conditions, and share key pathological features with humans, such as amyloidosis of pancreatic islets, which is absent in mouse models (25, 26), suggesting that T2DM macaques are the optimal animal model for simulating human T2DM and its complications (27).” (Lines 98-103).

      References:

      Sun L., Xie C., Wang G., Wu Y., Wu Q., Wang X., Liu J., Deng Y., Xia J., et al. (2018) Gut microbiota and intestinal FXR mediate the clinical benefits of metformin. Nat. Med. 24:1919-1929. https://doi.org/10.1038/s41591-018-0222-4

      Petakh P., Kamyshna I., Kamyshnyi A. (2023) Effects of metformin on the gut microbiota: A systematic review. Mol. Metab. 77:101805. https://doi.org/10.1016/j.molmet.2023.101805

      Comment 2: In addition, the age and sex of the control macaque group did not necessarily match those of the T2DM group, leaving the possibility for compromising the analysis.

      Thank you for pointing this out. The availability of spontaneous T2DM macaques is very limited. Wang et al. (2018) identified only nine diabetic macaques among 2,000 screened, and our prior study (Jiang et al., 2022) found merely seven diabetic cases in 1,408 macaques. In this work, we obtained eight spontaneous T2DM macaques with FPG ≥ 7 mmol/L and eight healthy control macaques with FPG ≤ 6.1 mmol/L (three consecutive detections at one-month intervals) from a population of 1,698 captive macaques. To prevent confounding factors from affecting the investigated macaques, all macaques were individually housed with standardized diets and environmental controls. Although age and sex were only partially matched, controls originated from the same population to minimize confounding. The T2DM and control groups were matched for age period (5 adult and 3 elder) and had comparable mean ages (mean age of T2DM individuals = 12.88, mean age of control individuals = 11.25) (Table S1). In terms of sex matching, we compared blood metabolome data of 12 healthy adult female and 12 healthy adult male macaques from another study (Liu et al., 2023) and obtained only a small number of differential metabolites that were not associated with tryptophan (Author response table 1). We acknowledge this limitation and will prioritize matched controls in future studies.

      Author response table 1.

      List of all differential metabolites.

      References:

      Wang J., Xu S., Gao J., Zhang L., Zhang Z., Yang W., Li Y., Liao S., Zhou H., Liu P., et al. (2018) SILAC-based quantitative proteomic analysis of the livers of spontaneous obese and diabetic rhesus monkeys. Am. J. Physiol. Endocrinol. Metab. 315:E29-E306. https://doi.org/10.1152/ajpendo.00016.2018

      Jiang C., Pan X., Luo J., Liu X., Zhang L., Liu Y., Lei G., Hu G., Li J. (2022) Alterations in microbiota and metabolites related to spontaneous diabetes and pre-diabetes in rhesus macaques. Genes 13:1513. https://doi.org/10.3390/genes13091513

      Liu X., Liu X.Y., Wang X.Q., Shang K., Li J.W., Lan Y., Wang J., Li J., et al. (2023) Multi-omics analysis reveals changes in tryptophan and cholesterol metabolism before and after sexual maturation in captive macaques. BMC Genomics 24:308. https://doi.org/10.1186/s12864-023-09404-3

      Comment 3: Regarding the metabolomic analysis, the authors did not include fecal samples which are important, considering the authors' claim about the importance of gut microbiota in the pathogenesis of T2DM.

      We thank the reviewer for this suggestion. This study employed untargeted metabolomics on macaque fecal samples to identify metabolites associated with spontaneously developing T2DM. To validate the metabolites identified through the untargeted metabolomic analysis, we conducted targeted medium- and long-chain fatty acid (MLCFA) metabolomics on macaque serum, and we further quantified the content of palmitic acid (PA) in mouse feces, ileum, and serum. Although targeted MLCFA metabolomics was not performed on macaque fecal samples, we performed untargeted metabolomics on macaque feces and confirmed the contribution of PA in mice that underwent fecal microbiota transplantation (FMT) from T2DM macaques. We have added the following future directions to the Discussion: “Previous studies have shown that insulin-resistant patients exhibit increased fecal monosaccharides associated with microbial carbohydrate metabolism (70). Furthermore, commensal species of Lachnospiraceae actively overproduce long-chain fatty acids during metabolic dysfunction through altered bacterial lipid metabolism. The microbe-derived fatty acids impair intestinal epithelial integrity to exacerbate metabolic dysregulation (71). Given that microbial metabolic activity causally modulates host metabolic homeostasis, the content change of PA was potentially associated with a dynamic equilibrium between host absorption and microbial metabolism. Further integrative studies on the fecal fatty acid metabolome, microbial PA metabolism, and functional pathways will be crucial for delineating causal links between dysbiosis and lipid metabolic dysfunction in T2DM.” (Lines 426-437).

      Comment 4: In the mouse experiments, the control group should be given a FMT from control macaques rather than just untreated SPF mice since the fecal microbiota composition is likely very different between macaques and mice.

      Thanks for your helpful suggestion. We recognized the importance of an FMT control group and supplemented mouse experiments (using the C57BL/6J strain) with FMT from control macaques (HFT group). Another group of mice without FMT was set as control. Due to the lengthy experimental period, observations were concluded at 30 days post-FMT. We compared changes in the gut microbiota before and after antibiotic treatment in mice (-14D and 0D), and tracked body weight and fasting plasma glucose (FPG) levels from day -14 to day 30. At 30 days after FMT, fecal samples from all groups were collected for 16S rRNA sequencing. Additionally, samples of T2DM microbiota transplant (TP), and control transplant (HTP) were sequenced. Finally, we integrated the 16S sequencing data from the FTPA group (palmitic acid (PA) diet and FMT from T2DM macaques) and FT group (normal diet and FMT from T2DM macaques) at day 30 for combined analysis. The results showed that the antibiotic treatment used in this study effectively depleted the gut microbiota. Following FMT, gut microbial diversity stabilized within 30 days, with similar microbial community proportions between HFT and control groups. Core functional groups of the healthy microbiota (Bacteroidota and Bacillota) stably colonized mice despite host species divergence, confirming that T2DM phenotypes originate specifically from macaque microbiota. Importantly, increased abundance of Lachnospiraceae (including genera Ruminococcus (current name: Mediterraneibacter), Coprococcus, and Clostridium) and the key species Ruminococcus gnavus (current name: Mediterraneibacter gnavus) were also observed in FT group versus HFT group on day 30, validating our original findings. We have added findings in the results, “To eliminate interference from host species divergence in gut microbiota composition, we supplemented mouse experiments using FMT from control macaques (HFT group) (Figure S4A). By day 30, the HFT group exhibited significantly lower body weight than the untreated control group (p < 0.05) (Figure S4B). Throughout the experimental period, FPG levels in both HFT and control groups remained within the normal range (< 6 mmol/L) without significant differences, indicating that transplantation of control macaque microbiota did not induce glycemic alterations (Figure S4C).” (Lines 276-283), and “Integrating 16S rRNA sequencing data from the HFT, FT, and FTPA groups showed that the antibiotic treatment effectively depleted the gut microbiota, resulting in microbial diversity decreased sharply, with the dominant phyla shifting from Bacteroidota and Bacillota to Pseudomonadota (Figure S4D-G). The HFT group restored microbial diversity within 30 days, achieving community proportions comparable to untreated controls. Core functional phyla (Bacteroidota and Bacillota) stably colonized in HFT group (Figure S4D-I). Critically, FT and FTPA groups exhibited increased Lachnospiraceae (including genera Ruminococcus (current name: Mediterraneibacter), Coprococcus, and Clostridium) compared with the HFT group on day 30. In addition, LEfSe comparison identified significant R. gnavus (current name: M. gnavus) enrichment in the FT group (LDA > 3, p < 0.01) (Figure S4J-M).” (Lines 324-334, 825-837). Specifically:

      (1) Experimental design: transplant preparation and FMT from control macaques

      After single-cage feeding and FPG detection, fecal samples from three control macaques were collected and pooled to prepare the transplant. Then, 4 mL of diluent (Berland et al., 2021) was added per gram of feces. Sodium L-ascorbate (5% (w/v)) and L-cysteine hydrochloride monohydrate (0.1% (w/v)) were added to all suspensions (the sterile diluent given to the control group received the same amounts of these reagents). The mixture was homogenized and filtered sequentially through 200, 400, and 800 μm sterile mesh screens. The filtrate was centrifuged (600 × g, 5 min), and supernatants were aliquoted (400 μL/tube) for storage at -80 °C. For use, the transplant was quickly thawed in a 37 °C water bath.
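
      The quantities in this preparation step scale linearly with fecal mass, which can be illustrated with a short calculation. The sketch below is only an illustration and not part of the published protocol: the function name and dictionary keys are hypothetical, and it assumes the % (w/v) values are expressed relative to the diluent volume (grams per 100 mL), which the text does not state explicitly.

```python
def fmt_reagents(feces_g, diluent_ml_per_g=4.0,
                 ascorbate_pct_wv=5.0, cysteine_pct_wv=0.1):
    """Scale diluent volume and reagent masses to the mass of feces.

    Assumption (not stated in the source): % (w/v) is taken relative to
    the diluent volume, i.e. grams of reagent per 100 mL of diluent.
    """
    if feces_g <= 0:
        raise ValueError("feces_g must be positive")
    diluent_ml = feces_g * diluent_ml_per_g           # 4 mL per gram of feces
    return {
        "diluent_ml": diluent_ml,
        "ascorbate_g": diluent_ml * ascorbate_pct_wv / 100.0,
        "cysteine_g": diluent_ml * cysteine_pct_wv / 100.0,
    }


if __name__ == "__main__":
    # Example: 10 g of pooled feces
    for name, value in fmt_reagents(10.0).items():
        print(f"{name}: {value:.3f}")
```

      Under these assumptions, 10 g of feces would call for 40 mL of diluent, 2 g of ascorbate, and 0.04 g of cysteine hydrochloride.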

      Specific-pathogen-free male C57BL/6J mice aged 6 weeks were randomized into control and HFT (receiving FMT from control macaques) groups. Mice received antibiotic water (ampicillin, neomycin sulfate, and metronidazole, 1 g/L each) from days -14 to 0. All mice were maintained under standard conditions (12 h light/dark, 22-25 °C, 40-60% humidity) with sterile diet and twice-daily water changes. Body weight and fasting plasma glucose (FPG) were monitored, and fecal samples were collected throughout the study, with fecal 16S rRNA sequencing performed (Figure S4). The study was approved by the Ethics Committee of College of Life Sciences, Sichuan University, and conducted in accordance with the local legislation and institutional requirements.

      (2) Results

      Body weight monitoring revealed no significant difference between HFT and control groups before (-14D) and after (0D) antibiotic treatment. By day 30, the HFT group exhibited significantly lower body weight than the untreated control group (p < 0.05) (Figure S4B). Throughout the experimental period, FPG levels in both HFT and control groups remained within the normal range (< 6 mmol/L) without significant differences, indicating that transplantation of control macaque microbiota did not induce glycemic alterations (Figure S4C).

      Shannon and Simpson indices showed a significant reduction in gut microbiota diversity after antibiotic treatment (0D) (p < 0.01) (Figure S4D,E). The intestinal microbiota of normal mice (-14D) was predominantly composed of Bacteroidota and Bacillota. After two weeks of antibiotic treatment (0D), microbial diversity decreased sharply compared to the -14D group, with the dominant phyla shifting from Bacteroidota and Bacillota to Pseudomonadota (Author response image 1A; Figure S4L). In healthy gut homeostasis, obligate anaerobes such as Bacillota and Bacteroidota maintain intestinal equilibrium. Antibiotic disruption induced dysbiosis in mice, causing substantial restructuring of fecal microbial composition. During dysbiosis, colon epithelial cells shift to anaerobic glycolysis for energy production, increasing epithelial oxygenation and driving expansion of facultative anaerobic Pseudomonadota (de Nies et al., 2023; Szajewska et al., 2024).
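
      For readers less familiar with the two alpha-diversity measures named above, the following is a minimal sketch of how they can be computed from per-taxon read counts. It is not the authors' pipeline: the function names are hypothetical, and it assumes the natural-log Shannon index and the Gini-Simpson form (1 minus the sum of squared proportions), since the response does not say which variants were used.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_index(counts):
    """Gini-Simpson diversity 1 - sum(p_i^2); higher means more diverse."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return 1.0 - sum((c / total) ** 2 for c in counts)


if __name__ == "__main__":
    even = [25, 25, 25, 25]      # hypothetical evenly mixed community
    depleted = [97, 1, 1, 1]     # hypothetical community dominated by one taxon
    print(shannon_index(even), simpson_index(even))
    print(shannon_index(depleted), simpson_index(depleted))
```

      Both indices fall when a single taxon dominates, which is the pattern expected after antibiotic depletion.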

      NMDS analysis of integrated 16S rRNA sequencing data of FTPA30D (PA diet and FMT from T2DM macaques) and FT30D (normal diet and FMT from T2DM macaques) revealed high intra-group repeatability among pre-antibiotic (-14D), post-antibiotic (0D), HFT30D, T2DM microbiota transplant (TP), and control transplant (HTP) groups. The 0D group showed maximal separation from other clusters, while the -14D, control30D, and HFT30D groups clustered closely together, with HFT30D nearest to control30D (Figure S4F). On day 30, all groups showed restoration of microbiota community structure, and the composition of gut microbiota in HFT30D was largely consistent with that of the control30D group at all taxonomic levels (Author response image 1A-C). At the phylum level, the HFT30D group showed significantly reduced relative abundance of Pseudomonadota and increased abundance of Bacteroidota, Bacillota_A, Bacillota_I, and gut barrier-enhancing Verrucomicrobiota (Author response image 1A). These findings demonstrated that FMT from control macaques effectively restored the gut microbiota of antibiotic-treated mice toward a normative state.

      Author response image 1.

      Composition of gut microbiota in mice. (A) Phylum level; (B) Family level; (C) Genus level.

      At the phylum level, the FT30D and FTPA30D groups exhibited lower proportions of Bacteroidota/Bacillota compared to the HFT30D group (Author response image 1A). Family-level analysis revealed markedly increased abundance of Lactobacillaceae and Lachnospiraceae in FTPA30D and FT30D groups relative to HFT30D, consistent with the changes in the microbiota of spontaneous T2DM macaques (Author response image 1B). Notably, while both HTP and TP groups contained Lachnospiraceae, only FT30D and FTPA30D mice demonstrated a significant increase of this family, reaching an abundance close to that in the TP group. Although Muribaculaceae and Bacteroidaceae showed partial recovery in these groups, their relative abundances remained substantially lower than in control30D and HFT30D groups, suggesting that microbiota transplantation from T2DM macaques may reduce specific beneficial taxa while promoting expansion of conditionally pathogenic or metabolically altered bacteria, such as Lachnospiraceae.

      Further analysis of Lachnospiraceae dynamics revealed that at the genus level, most Lachnospiraceae members exhibited higher abundance in the TP group compared to the HTP group. FT30D and FTPA30D groups showed increased abundance of Ruminococcus (current name: Mediterraneibacter), Coprococcus, and Clostridium relative to HFT30D group, consistent with prior analyses (Figure S4). LEfSe comparison between FT30D and HFT30D identified significantly enriched Ruminococcus gnavus (current name: Mediterraneibacter gnavus) in FT30D recipients (LDA > 3, p < 0.01), corroborating earlier findings (Figure S4L). As a mucin-degrading microbe, R. gnavus (current name: M. gnavus) promotes insulin resistance through modulation of tryptamine/phenethylamine levels (Zhai et al., 2023) and exhibits pro-inflammatory properties (Henke et al., 2019; Paone and Cani, 2020). The absence of R. gnavus (current name: M. gnavus) enrichment in FTPA30D was potentially related to differential long-term impacts of T2DM microbiota transplantation across the 30- versus 120-day experimental timelines.

      Author response image 2.

      Identification of differential microbiota in mice. (A) Linear discriminant analysis Effect Size (LEfSe) analysis between pre-antibiotic (-14D) and post-antibiotic (0D) groups; (B) HFT and FTPA groups; (C) HFT and FT groups.

      References:

      Berland M., Cadiou J., Levenez F., Galleron N., Quinquis B., Thirion F., Gauthier F., Le Chatelier E., Plaza Oñate F., Schwintner C., et al. (2021) High engraftment capacity of frozen ready-to-use human fecal microbiota transplants assessed in germ-free mice. Sci. Rep. 11. https://doi.org/10.1038/s41598-021-83638-7

      Szajewska H., Scott K.P., de Meij T., Forslund-Startceva S.K., Knight R., Koren O., Little P., Johnston B.C., Łukasik J., Suez J., Tancredi D.J., Sanders M.E. (2024) Antibiotic-perturbed microbiota and the role of probiotics. Nat. Rev. Gastroenterol. Hepatol. 1-18. https://doi.org/10.1038/s41575-024-01023-x

      de Nies L., Kobras C.M., Stracy M. (2023) Antibiotic-induced collateral damage to the microbiota and associated infections. Nat. Rev. Microbiol. 21:789-804. https://doi.org/10.1038/s41579-023-00936-9

      Zhai L., Xiao H., Lin C., Wong H.L.X., Lam Y.Y., Gong M., Wu G., Ning Z., Huang C., Zhang Y., et al. (2023) Gut microbiota-derived tryptamine and phenethylamine impair insulin sensitivity in metabolic syndrome and irritable bowel syndrome. Nat. Commun. 14. https://doi.org/10.1038/s41467-023-40552-y

      Henke M.T., Kenny D.J., Cassilly C.D., Vlamakis H., Xavier R.J., Clardy J. (2019) Ruminococcus gnavus, a member of the human gut microbiome associated with Crohn's disease, produces an inflammatory polysaccharide. Proc. Natl. Acad. Sci. 116:12672-12677. https://doi.org/10.1073/pnas.1904099116

      Paone P., Cani P.D. (2020) Mucus barrier, mucins and gut microbiota: the expected slimy partners? Gut 69:2232-2243. https://doi.org/10.1136/gutjnl-2020-322260

      Comment 5: Additionally, the palmitic acid-containing diets fed to mice to induce a diabetes-like condition do not mimic spontaneous T2DM in macaques.

      Thanks for your helpful suggestion. We agree that the palmitic acid (PA)-containing diet alone could not fully mimic spontaneous T2DM in macaques. In our study, the PA diet was employed in mouse experiments to investigate whether gut microbiota modulates serum PA levels and mediates T2DM progression. Our critical finding was that the microbiota was essential for enhanced PA absorption, whereas simply increasing dietary levels of PA did not effectively enhance intestinal uptake. The combination of fecal microbiota transplantation (FMT) and a PA diet successfully induced prediabetic states in mice, an approach that could be further applied to inducing T2DM in macaques. We have added the following future directions to the Discussion: “Our study highlights the essential roles of gut microbiota in T2DM development, which may account for the inability of prior studies to induce T2DM in macaques through high-fat diet intervention alone (28, 29). Furthermore, applying this approach to induce T2DM in macaques will enable deeper investigation into gut-microbiota-driven mechanisms underlying disease pathogenesis.” (Lines 393-398).

      Reviewer #1 (Recommendations for the authors):

      General comments

      Comment 1: The authors used macaques in this study. The author claims that macaques may be the best animal model to investigate the relationships among gut microbiota, lipid metabolites, and the host in type 2 diabetes (T2DM). However, there have already been some studies investigating these relationships in humans (for example, doi: 10.1016/j.cmet.2022.12.013, and doi: 10.1038/s41586-023-06466-x). The authors should cite and discuss these papers.

      We thank the reviewer for this suggestion. We have cited the two papers in the Discussion: “Previous studies have shown that insulin-resistant patients exhibit increased fecal monosaccharides associated with microbial carbohydrate metabolism (70). Furthermore, commensal species of Lachnospiraceae actively overproduce long-chain fatty acids during metabolic dysfunction through altered bacterial lipid metabolism. The microbe-derived fatty acids impair intestinal epithelial integrity to exacerbate metabolic dysregulation (71).” (Lines 426-432).

      Specific comments

      Major:

      Comment 2: (1) First of all, sex and age of the T2DM and control groups are different (Suppl Table 1). Since the size of the captive population is 1,698, the authors should be able to select the factors including the sex and age of the control group to match those of the T2DM group and they should do so.

      In this work, we obtained eight spontaneous T2DM macaques with FPG ≥ 7 mmol/L and eight healthy control macaques with FPG ≤ 6.1 mmol/L (three consecutive detections at one-month intervals) from a population of 1,698 captive macaques. To prevent confounding factors from affecting the investigated macaques, all macaques were individually housed with standardized diets and environmental controls. Although age and sex were only partially matched, controls originated from the same population to minimize confounding. The T2DM and control groups were matched for age period (5 adult and 3 elder) and had comparable mean ages (mean age of T2DM individuals = 12.88, mean age of control individuals = 11.25) (Table S1). In terms of sex matching, we compared blood metabolome data of 12 healthy adult female and 12 healthy adult male macaques from another study (Liu et al., 2023) and obtained only a very small number of differential metabolites that were not associated with tryptophan (Author response table 1). We acknowledge this limitation and will prioritize matched controls in future studies.

      References:

      Liu X., Liu X.Y., Wang X.Q., Shang K., Li J.W., Lan Y., Wang J., Li J., et al. (2023) Multi-omics analysis reveals changes in tryptophan and cholesterol metabolism before and after sexual maturation in captive macaques. BMC Genomics 24:308. https://doi.org/10.1186/s12864-023-09404-3

      Comment 3: (2) Are the normal ranges known for the parameters of macaques shown in Table 1? If so, the authors should include those values in Table 1. If not, the authors should show the values of average and SD or SE of all 1,698 individuals as the reference.

      We thank the reviewer for this suggestion. In this study, the normal ranges of fasting plasma glucose (FPG), fasting plasma insulin (FPI), homeostasis model assessment of insulin resistance (HOMA-IR), and glycosylated hemoglobin A1c (HbA1c) were referenced against human standards. According to the American Diabetes Association (ADA) criteria for glucose metabolism status and the diagnosis of diabetes, individuals with FPG ≥ 7 mmol/L were diagnosed as T2DM subjects, and individuals with FPG ≤ 6.1 mmol/L were controls. More sensitive assays show a normal fasting plasma insulin level to be under 12 μU/mL (Matsuda and DeFronzo, 1999). HOMA-IR ≥ 2.67 indicated the possibility of insulin resistance, a threshold used in clinical diagnosis (Lorenzo et al., 2012). HbA1c percentages higher than 6.5% were used as an auxiliary diagnostic index for diabetic macaques (Cowie et al., 2010). The normal ranges of triglycerides (TG), total cholesterol (TC), high-density lipoprotein cholesterol (HDL), and low-density lipoprotein cholesterol (LDL) were referenced against the blood lipid index of rhesus macaques (Yu et al., 2019). We have added the normal ranges of parameters to Table 1, “FPG: fasting plasma glucose (normal range: ≤ 6.1 mmol/L); FPI: fasting plasma insulin (normal range: ≤ 12 μU/mL); HOMA-IR: homeostasis model assessment of insulin resistance (normal range: ≤ 2.67); BMI: body mass index; HbA1c: glycosylated hemoglobin A1c (normal range: < 6.5%); TG: triglycerides (normal range: 0.95±0.47 mmol/L); TC: total cholesterol (normal range: 3.06±0.98 mmol/L); HDL: high-density lipoprotein cholesterol (normal range: 1.62±0.46 mmol/L); LDL: low-density lipoprotein cholesterol (normal range: 2.47±0.98 mmol/L). (30, 31, 32, 33).”.
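
      As a worked illustration of how HOMA-IR relates to the two fasting measurements, the sketch below uses the standard formula HOMA-IR = FPG (mmol/L) × FPI (μU/mL) / 22.5. The response does not state which formula was applied, so its use here is an assumption, and the function names and input values are hypothetical. The 2.67 cut-off is the one quoted above.

```python
def homa_ir(fpg_mmol_l, fpi_uU_ml):
    """Standard HOMA-IR: fasting glucose (mmol/L) x fasting insulin (uU/mL) / 22.5.

    Assumption: the conventional formula; the source does not state it.
    """
    if fpg_mmol_l <= 0 or fpi_uU_ml <= 0:
        raise ValueError("glucose and insulin must be positive")
    return fpg_mmol_l * fpi_uU_ml / 22.5


def suggests_insulin_resistance(fpg_mmol_l, fpi_uU_ml, cutoff=2.67):
    """True when HOMA-IR reaches the cut-off quoted in the response (>= 2.67)."""
    return homa_ir(fpg_mmol_l, fpi_uU_ml) >= cutoff


if __name__ == "__main__":
    # Hypothetical values, for illustration only
    print(homa_ir(5.0, 9.0), suggests_insulin_resistance(5.0, 9.0))
    print(homa_ir(7.5, 15.0), suggests_insulin_resistance(7.5, 15.0))
```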

      References:

      Matsuda M., DeFronzo R.A. (1999) Insulin sensitivity indices obtained from oral glucose tolerance testing: comparison with the euglycemic insulin clamp. Diabetes Care 22:1462-1470. https://doi.org/10.2337/diacare.22.9.1462

      Lorenzo C., Hazuda H.P., Haffner S.M. (2012) Insulin resistance and excess risk of diabetes in Mexican-Americans: the San Antonio Heart Study. J. Clin. Endocr. Metab. 97:793-799. https://doi.org/10.1210/jc.2011-2272

      Cowie C.C., Rust K.F., Byrd-Holt D.D., Gregg E.W., Ford E.S., Geiss L.S., Bainbridge K.E., Fradkin J.E. (2010) Prevalence of diabetes and high risk for diabetes using A1C criteria in the US population in 1988–2006. Diabetes Care 33:562-568. https://doi.org/10.2337/dc09-1524

      Yu W., Hao X., Yang F., Ma J., Zhao Y., Li Y., Wang J., Xu H., Chen L., Liu Q., et al. (2019) Hematological and biochemical parameters for Chinese rhesus macaque. PLoS One 14:e0222338. https://doi.org/10.1371/journal.pone.0222338
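
      As an illustration of how the human-derived cutoffs quoted above could be applied to one animal's measurements, a minimal Python sketch follows. The HOMA-IR formula used here (FPG in mmol/L multiplied by FPI in μU/mL, divided by 22.5) is the commonly used definition and is an assumption, since the response does not state how HOMA-IR was computed; the function names and the input values in the usage note are hypothetical and are not data from the study.

```python
# Cutoffs quoted in the response (human-derived reference standards).
FPG_CONTROL_MAX = 6.1   # mmol/L, upper bound for controls
FPG_T2DM_MIN = 7.0      # mmol/L, lower bound for T2DM
FPI_NORMAL_MAX = 12.0   # uU/mL
HOMA_IR_CUTOFF = 2.67   # >= cutoff suggests possible insulin resistance
HBA1C_CUTOFF = 6.5      # percent; the quoted normal range is < 6.5%


def homa_ir(fpg_mmol_l, fpi_uu_ml):
    """Commonly used HOMA-IR definition (assumed; not stated in the response)."""
    return fpg_mmol_l * fpi_uu_ml / 22.5


def flag_parameters(fpg, fpi, hba1c):
    """Compare one set of measurements against the quoted cutoffs."""
    ir = homa_ir(fpg, fpi)
    return {
        "HOMA-IR": round(ir, 2),
        "FPG above control range": fpg > FPG_CONTROL_MAX,
        "FPG in T2DM range": fpg >= FPG_T2DM_MIN,
        "FPI above normal": fpi > FPI_NORMAL_MAX,
        "possible insulin resistance": ir >= HOMA_IR_CUTOFF,
        "HbA1c outside normal range": hba1c >= HBA1C_CUTOFF,
    }
```

      For example, calling flag_parameters(7.5, 15.0, 7.0) with these hypothetical values gives a HOMA-IR of 5.0 and sets every flag, whereas flag_parameters(5.0, 9.0, 5.5) gives a HOMA-IR of 2.0 and sets none.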

      Comment 4: (3) The authors measured the fasting plasma glucose (FPG) levels, but it is common to measure whole blood glucose since glucose is consumed during the processing of obtaining plasma which could compromise the results. Please explain why plasma glucose levels were measured.

      The criteria for screening spontaneous T2DM macaques followed the American Diabetes Association (ADA) criteria for glucose metabolism status and the diagnosis of diabetes. Individuals with FPG ≥ 7 mmol/L were diagnosed as T2DM subjects, and individuals with FPG ≤ 6.1 mmol/L were controls. For the identified subjects, FPG was tested three times at one-month intervals to reduce possible error. These individuals were housed in single cages, and blood samples were collected after an overnight fast of at least 12 h. After all three test results met the standards, venous blood was collected for FPG testing to ensure the reliability of the data to the greatest extent. We have added the FPG values from the three tests to Table S1.
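
      The repeated-measurement screening rule described above can be sketched as follows. This is a minimal sketch that assumes an animal is assigned to a group only when all three monthly FPG values fall on the same side of the quoted thresholds; the response says the three results had to "meet the standards" but does not spell out how mixed results were handled, so the "unclassified" outcome and the function name are hypothetical.

```python
def classify_by_repeated_fpg(fpg_values, t2dm_min=7.0, control_max=6.1):
    """Classify one animal from three monthly fasting plasma glucose values (mmol/L).

    Assumption: all three values must meet the same threshold; anything else
    is left unclassified rather than forced into a group.
    """
    if len(fpg_values) != 3:
        raise ValueError("expected three monthly FPG measurements")
    if all(v >= t2dm_min for v in fpg_values):
        return "T2DM"
    if all(v <= control_max for v in fpg_values):
        return "control"
    return "unclassified"
```

      Requiring agreement across all three tests is a conservative reading of the rule: a single borderline value keeps an animal out of both groups.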

      Comment 5: (4) Since the BMI of the T2DM and control groups did not significantly differ (p>0.05, Table 1), the food intake of the two groups may not significantly differ as well. The authors should examine the food intake data. The food intake is also important in considering the relevance of feeding the PA diet in mice experiments. Were the intake of T2DM macaques including PA more than the control group?

      All macaques in this study were individually housed under standardized environments with timed and measured feeding to minimize confounders. Given the non-significant BMI difference between the T2DM and control groups, food intake was probably not significantly different. Our findings highlight the essential roles of gut microbiota in T2DM development, which is probably also why previous studies that used only a high-fat diet failed to induce T2DM in macaques (Ji et al., 2012; Tang, 2020). We agree that PA intake in T2DM macaques warrants focused investigation. Future investigations will incorporate detailed dietary monitoring, including palmitic acid (PA) intake and nutrient composition, to examine potential relationships between specific dietary components, metabolic parameters, and diabetes progression.

      References

      Ji F., Jin L., Zeng X., Zhang X., Zhang Y., Sun Y., Gao L., He H., Rao J., Liu X., et al. (2012) Comparison of gene expression between naturally occurring and diet-induced T2DM in cynomolgus monkeys. Dongwuxue Yanjiu 33:79–84. https://doi.org/10.3724/SP.J.1141.2012.01079

      Tang M.T. (2020) Study on the Role of Glucose and Lipid in the Establishment of Type 2 Diabetic Cynomolgus Monkey Model. M.S. Thesis, Dept. Veterinary Med., South China Agricultural Univ.

      Comment 6: (5) It may be that the fecal microbiome of the T2DM macaques is involved in the pathogenesis of T2DM; however, it is more important how the gut microbiota compositions were obtained/established by those T2DM macaques. There was no description of when the fecal samples were collected during the course of T2DM. If it was after T2DM symptoms appeared, the authors should perform gut metagenome and also gut metabolome analyses to see the change in those parameters to try to understand how gut microbiome changes are induced leading to T2DM pathogenesis.

      The spontaneous T2DM macaques had not been treated with glucose-lowering drugs or antibiotics, so the microbiota dysbiosis we observed was driven purely by disease progression. After macaques met the diagnostic thresholds in three FPG assessments (at one-month intervals), we collected fresh fecal samples and stored them aseptically at -80 °C until analysis. The scarcity of spontaneous T2DM macaques precludes invasive sampling and restricts tissue collection to naturally deceased diabetic individuals, which prevented us from explicitly defining the disease stage of the T2DM individuals. We recognize the scientific value of gut metagenomic and metabolomic analyses to track microbiome evolution during diabetes progression. This study explored the interaction of gut microbiota and metabolites in T2DM macaques, and future studies can investigate their dynamic changes over the course of T2DM.

      Comment 7: (6) Regarding the fatty acids, the authors only measured them in the plasma, but they should also measure them in feces, since the authors focus on gut microbiota; in addition, a recent report showed fecal fatty acids, especially elaidic acid, contributed to the pathogenesis of obesity and T2DM by acting on the gut epithelial cells (doi: 10.1016/j.cmet.2022.12.013). Besides, this study showed the link between a Lachnospiraceae species and fecal palmitic and elaidic acids, which the authors also focused on in this manuscript.

      We thank the reviewer for this suggestion. This study employed untargeted metabolomics on macaque fecal samples to identify metabolites associated with spontaneously developing T2DM. To validate the metabolites identified through the untargeted metabolomic analysis, we conducted targeted medium- and long-chain fatty acid (MLCFA) metabolomics on macaque serum, and we further quantitatively examined the content of palmitic acid (PA) in mice feces, ileum, and serum. Although targeted MLCFA metabolomics was not performed on macaque fecal samples, we did perform untargeted metabolomics on macaque feces and confirmed the contribution of PA in mice that underwent fecal microbiota transplantation (FMT) from T2DM macaques. We have added future expectations in the part of the discussion, “Previous studies have shown that insulin-resistant individuals exhibit increased fecal monosaccharides associated with microbial carbohydrate metabolism (70). Furthermore, commensal species of Lachnospiraceae actively overproduce long-chain fatty acids during metabolic dysfunction through altered bacterial lipid metabolism. The microbe-derived fatty acids impair intestinal epithelial integrity to exacerbate metabolic dysregulation (71). Given that microbial metabolic activity causally modulates host metabolic homeostasis, the content change of PA was potentially associated with a dynamic equilibrium between host absorption and microbial metabolism. Further integrative studies on the fecal fatty acid metabolome, microbial PA metabolism, and functional pathways will be crucial for delineating causal links between dysbiosis and lipid metabolic dysfunction in T2DM.” (Lines 426-437).

      Comment 8: (7) In FMT and PA diet experiments, SPF mice were used as the control group. However, the gut microbiota composition of the SPF mice is markedly different from that of macaques; the difference must be much bigger than the difference between T2DM and healthy control macaques; therefore, mice with FMT from healthy control macaques have to be used as the control group. As mentioned above (in point #4), is the feeding of mice with PA diet a relevant model reflecting the condition observed in macaques in this study?

      Thanks for your helpful suggestion. We recognized the importance of a FMT control group and supplemented mouse experiments (using the C57BL/6J strain) with FMT from control macaques (HFT group). Another group of mice without FMT was set as control. Due to the lengthy experimental period, observations were concluded at 30 days post-FMT. We compared changes in the gut microbiota before and after antibiotic treatment in mice (-14D and 0D), and tracked body weight and fasting plasma glucose (FPG) levels from day -14 to day 30. At 30 days after FMT, fecal samples from all groups were collected for 16S rRNA sequencing. Additionally, samples of T2DM microbiota transplant (TP), and control transplant (HTP) were sequenced. Finally, we integrated the 16S sequencing data from the FTPA group (palmitic acid (PA) diet and FMT from T2DM macaques) and FT group (normal diet and FMT from T2DM macaques) at day 30 for combined analysis. The results showed that the antibiotic treatment used in this study effectively depleted the gut microbiota. Following FMT, gut microbial diversity stabilized within 30 days, with similar microbial community proportions between HFT and control groups. Core functional groups of the healthy microbiota (Bacteroidota and Bacillota) stably colonized mice despite host species divergence, confirming that T2DM phenotypes originate specifically from macaque microbiota. Importantly, increased abundance of Lachnospiraceae (including genera Ruminococcus (current name: Mediterraneibacter), Coprococcus, and Clostridium) and the key species Ruminococcus gnavus (current name: Mediterraneibacter gnavus) were also observed in FT group versus HFT group on day 30, validating our original findings. We have added findings in the results, “To eliminate interference from host species divergence in gut microbiota composition, we supplemented mouse experiments using FMT from control macaques (HFT group) (Figure S4A). 
By day 30, the HFT group exhibited significantly lower body weight than the untreated control group (p < 0.05) (Figure S4B). Throughout the experimental period, FPG levels in both HFT and control groups remained within the normal range (< 6 mmol/L) without significant differences, indicating that transplantation of control macaque microbiota did not induce glycemic alterations (Figure S4C).” (Lines 276-283), and “Integrating 16S rRNA sequencing data from the HFT, FT, and FTPA groups showed that the antibiotic treatment effectively depleted the gut microbiota, resulting in microbial diversity decreased sharply, with the dominant phyla shifting from Bacteroidota and Bacillota to Pseudomonadota (Figure S4D-G). The HFT group restored microbial diversity within 30 days, achieving community proportions comparable to untreated controls. Core functional phyla (Bacteroidota and Bacillota) stably colonized in HFT group (Figure S4D-I). Critically, FT and FTPA groups exhibited increased Lachnospiraceae (including genera Ruminococcus (current name: Mediterraneibacter), Coprococcus, and Clostridium) compared with the HFT group on day 30. In addition, LEfSe comparison identified significant R. gnavus (current name: M. gnavus) enrichment in the FT group (LDA > 3, p < 0.01) (Figure S4J-M).” (Lines 324-334, 825-837).

      We agree that the PA-containing diet alone could not fully mimic spontaneous T2DM in macaques. In our study, the PA diet was employed in mouse experiments to investigate whether gut microbiota modulates serum PA levels and mediates T2DM progression. Our critical finding revealed that microbiota was essential for enhanced PA absorption, while simply increasing dietary levels of PA did not effectively enhance intestinal uptake. The FMT combined with PA-diet approach successfully induced prediabetic states in mice, which can be further applied to the induction of T2DM in macaques. We have added future expectations in the part of the discussion, “Our study highlights the essential roles of gut microbiota in T2DM development, which may account for the inability of prior studies to induce T2DM in macaques through high-fat diet intervention alone (28, 29). Furthermore, applying this approach to induce T2DM in macaques will enable deeper investigation into gut-microbiota-driven mechanisms underlying disease pathogenesis.” (Lines 393-398).

      Comment 9: FPG was measured here in the mouse experiments, but there was no description of whether mice were under fasting conditions, and this should be clarified. If there are no fasting durations, this should be described in the Materials and Methods section.

      As suggested, we have added description to the Materials and Methods section, “Throughout the experiment, body weight and feces were collected every month, FPG was detected every half month under fasting at least 12 h.” (Lines 619-620).

      Comment 10: From the PA contents in feces, ileum, and serum in mice (Figures 5A-D), the authors concluded that the absorption of PA was significantly enhanced in the ileum leading to the increase of PA in serum. However, it could also be possible that consumption of PA by gut microbiota occurs at the same time and the authors should discuss the possibility.

      We thank the reviewer for spotting this. We have added a discussion to the manuscript, “Previous studies have shown that insulin-resistant individuals exhibit increased fecal monosaccharides associated with microbial carbohydrate metabolism (70). Furthermore, commensal species of Lachnospiraceae actively overproduce long-chain fatty acids during metabolic dysfunction through altered bacterial lipid metabolism. The microbe-derived fatty acids impair intestinal epithelial integrity to exacerbate metabolic dysregulation (71). Given that microbial metabolic activity causally modulates host metabolic homeostasis, the content change of PA was potentially associated with a dynamic equilibrium between host absorption and microbial metabolism. Further integrative studies on the fecal fatty acid metabolome, microbial PA metabolism, and functional pathways will be crucial for delineating causal links between dysbiosis and lipid metabolic dysfunction in T2DM.” (Lines 426-437).

      Comment 11: (8) Nomenclature and classification of bacteria has been revised by the List of Prokaryotic names with Standing in Nomenclature (LPSN) (https://lpsn.dsmz.de/) and recognized as Global Core Biodata Resource in 2023. For example, Ruminococcus gnavus is now Mediterraneibacter gnavus. Therefore, the name of microbes should be corrected accordingly; one proposal is to show the revised correct name with the previous name in parenthesis, such as "Mediterraneibacter gnavus (previously Ruminococcus gnavus)".

      Thank you for pointing this out. We have corrected the names of the microbes: “Ruminococcus (current name: Mediterraneibacter)”, “Ruminococcus gnavus (current name: Mediterraneibacter gnavus)”, and “R. gnavus (current name: M. gnavus)” (Lines 146, 313, 316-317, 336, 345, 367-368, 401, 404-405, 409, 448, 764-765).

      Minor:

      Comment 12:

      (1) The sentence starting "A total of..." (lines 143-144) seems grammatically wrong; a word such as "represented" should be inserted after "differentially", or alternatively "differentially" should be "differential"?

      (2) "medium-and" (line 220) needs a space between "medium-" and "and" to make it "medium- and".

      (3) Abbreviations should be spelled out when they appear for the first time in the main text; for example, WBC, NEU, and LYM in line 237.

      (4) Should FGP (line 437) be FPG?

      (5) What is the definition of "prediabetes" in mice? Is this clearly defined elsewhere?

      We sincerely thank the reviewer for careful reading. As suggested, we have improved the statements and revised it according to the requirements:

      (1) Line 143: “A total of 21 microbes were identified as differential microbes”.

      (2) Line 221: “targeted medium- and long-chain fatty acid”.

      (3) Lines 238-239: “white blood cell (WBC)”, “neutrophil (NEU)”, and “lymphocyte (LYM)”.

      (4) Line 472: “FPG, HbA1c and FPI were detected”.

      (5) Prediabetes, or impaired glucose regulation (IGR), is diagnosed when blood glucose is higher than normal yet below the diabetic threshold, and it is even more prevalent than T2DM in the population (American Diabetes Association, 2021). Given the higher glycemic diagnostic criteria in mice, we assessed diabetic manifestations by integrating physiological and pathological evidence. Compared to control mice, those receiving FMT from T2DM macaques combined with a high-palmitic-acid diet (FTPA group) developed prediabetic characteristics by day 120. Physiological alterations included elevated fasting plasma glucose (FPG), increased fasting plasma insulin (FPI), impaired glucose tolerance, heightened insulin resistance, weight gain, and elevated serum total cholesterol (TC) and triglyceride (TG) levels. Among the pathological changes, focal hepatocyte necrosis with inflammatory cell infiltration was commonly observed in the FTPA group, alongside decreased pancreatic islet volume and inflammatory cell infiltration (lines 258-276).

      References:

      American Diabetes Association (2021) 2. Classification and diagnosis of diabetes: standards of medical care in diabetes—2021. Diabetes Care 44:S15-S33. https://doi.org/10.2337/dc21-S002

      Reviewer #2 (Public review):

      This study analyzes the interaction among the gut microbiota, lipid metabolism, and the host in type 2 diabetes (T2DM) using rhesus macaques. The authors first identified 8 macaques with T2DM from 1698 individuals. Then, they observed in T2DM macaques: dysbiosis by 16S rRNA gene amplicon analysis and shotgun sequencing, imbalanced tryptophan metabolism and fatty acid beta-oxidation in the feces by metabolome analysis, increased plasma concentration of palmitic acid by MS analysis, and an inflammatory gene signature of blood cells by transcriptomic analysis. Finally, they transplanted feces of T2DM macaques into mice and fed them with palmitic acid and showed that those mice became diabetic through increased absorption of palmitic acid in the ileum.

      Comment 1: This study clearly shows the interaction among gut microbiota, lipid metabolism, and the host in T2DM. The experiments were well designed and performed, and the data are convincing. One point I would suggest is that in the experiments of mice with FMT, control mice should be those colonized with feces of healthy macaques, but not with no FMT.

      See response to Reviewer 1, Public review comment 4.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Assessment:

      The manuscript titled 'Rab7 dependent regulation of goblet cell protein CLCA1 modulates gastrointestinal homeostasis' by Gaur et al discusses the role of Rab7 in the development of ulcerative colitis by regulating the lysosomal degradation of Clca1, a mucin protease. The manuscript presents interesting data and provides a potential molecular mechanism for the pathological alterations observed in ulcerative colitis. Gaur et al demonstrate that Rab7 levels are lowered in UC and CD. However, a similar analysis of Rab7 levels in ulcerative colitis (UC) and Crohn's disease (CD) patient samples was conducted recently (Du et al, Dev Cell, 2020) which showed that Rab7 levels are found to be elevated under these conditions. While Gaur et al have briefly mentioned Du et al's paper in passing in the discussion, they need to discuss these contradictory results in their paper and clarify these differences. Additionally, Du et al are not included in the list of references.

      Strengths:

      The manuscript used a multi-pronged approach and compares patient samples, mouse models of DSS, and protocols that allow differentiation of goblet cells. They also use a nanogel-based delivery system for siRNAs, which is ideal for the knockdown of specific genes in the gut.

      Weaknesses:

      (1) Du et al, Dev Cell 2020 (https://doi.org/10.1016/j.devcel.2020.03.002) have previously shown that Rab7 levels are elevated in a similar set of colonic samples (age group, number etc.) from UC and CD patients. Gaur et al have not discussed this paper or its findings in detail, which directly contradicts their results. Clarification regarding this should be provided.

      We thank the reviewer for raising this point.

      The results shown by Du et al, Dev Cell, 2020 depict elevated expression of Rab7 in UC and CD patients compared to controls. At first glance, these results appear contradictory to ours, but there are a few possible explanations.

      Firstly, Rab7 expression levels may fluctuate in the tissue depending on the degree of gut inflammation. This can be concluded from our observations in the DSS-dynamics mouse model and the human patient samples with mild and moderate UC. Furthermore, Du et al provide no information on the severity of the condition among the patients included in their study. Our motive, in the current work, was to emphasize this aspect. This point was mentioned in the discussion section of the manuscript. However, in view of the reviewer’s concern, we have now added a detailed comment on this in the main text of the revised version of the manuscript.

      Secondly, the control biopsies in our investigation were acquired from non-IBD patients, and not what was done by Du et al., wherein biopsies from the normal para-carcinoma region of the colorectal cancer patients were used. One cannot overlook the fact that physiological and molecular changes are apparent even in non-inflamed regions in the gut of an IBD or CRC patient. It is possible that the observed discrepancy arises due to the differences in the sample type used for comparing the Rab7 expression.

      Finally, the main sub-tissue region showing a decrease in Rab7 expression in UC samples, appeared to be the Goblet cells which was not covered by Du et al.

      Keeping these points in mind we do not think that there is a contradiction in our findings with that of Du et al., 2020. In the revised submission some of these explanations are incorporated (Lines 106-109).

      This was an oversight from our side. We have actually mentioned Du et al., 2020 in the discussion (line number 345) but somehow the reference was missing in the main list. We have ensured that the reference is included in the revised version and that their findings are included both in main text and in the discussion.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors report a role for the well-studied GTPase Rab7 in gut homeostasis. The study combines cell culture experiments with mouse models and human ulcerative colitis patient tissues to propose a model where, Rab7 by delivering a key mucous component CLCA1 to lysosomes, regulates its secretion in the goblet cells. This is important for the maintenance of mucous permeability and gut microbiota composition. In the absence of Rab7, CLCA1 protein levels are higher in tissues as well as the mucus layer, corroborating with the anticorrelation of Rab7 (reduced) and CLCA1 (increased) from ulcerative colitis patients. The authors conclude that Rab7 maintains CLCA1 level by controlling its lysosomal degradation, thereby playing a vital role in mucous composition, colon integrity, and gut homeostasis.

      Strengths:

      The biggest strength of this manuscript is the combination of cell culture, mouse model, and human tissues. The experiments are largely well done and, in most cases, the results support their conclusions. The authors go to substantial lengths to find a link, such as alteration in microbiota, or mucus proteomics.

      Weaknesses:

      (1) There are also some weaknesses that need to be addressed. The association of Rab7 with UC in both mice and humans is clear, however, claims on the underlying mechanisms are less clear. Does Rab7 regulate specifically CLCA1 delivery to lysosomes, or is it an outcome of a generic trafficking defect?

      We thank the reviewer for the insightful comment. We would like to offer the following explanation for each of these concerns:

      Our immunofluorescence imaging experiments revealed co-localization of Rab7 protein with CLCA1 and the lysosomes (Fig 7I). In addition, the absence of Rab7 affects the transport of CLCA1 to lysosomes (Fig 7J). This demonstrates that Rab7 may be involved in regulation of CLCA1 transport (presumably along with other cargo), to lysosomes selectively. However, we do recognize that the point raised by the reviewer about possible effect of a generic trafficking defect is valid.

      (2) CLCA1 is a secretory protein, how does it get routed to lysosomes, i.e., through Golgi-derived vesicles, or by endocytosis of mucous components? Mechanistic details on how CLCA1 is routed to lysosomes will add substantial value.

      As mentioned in the manuscript, the trafficking of CLCA1 protein or CLCA1-containing vesicles within the goblet cell is unknown, with no information on the proteins involved in its mobility. The switching of CLCA1 containing vesicles from the secretory route to lysosomes needs extensive investigation involving overall trafficking of the protein. Taken together, the complete answer to both these important questions will need a series of experiments and those may be interesting avenues for future research.

      (3) Why does the level of Rab7 fluctuate during DSS treatment (Fig 1B)?

      This is a very thoughtful point from the reviewer. We detected a distinct pattern of Rab7 expression fluctuation in intestinal epithelial cells over the course of DSS treatment in the DSS-dynamics mouse model. Perhaps these changes are the result of complex cellular signaling in response to the DSS treatment. Rab7, being a fundamental protein in the protein sorting pathway, is expected to undergo alteration based on the cell's requirements. Presently there are no reports describing the regulatory mechanisms that govern Rab7 levels in the gut.

      (4) Does the reduction seen in Rab7 levels (by WB) also reflect in reduced Rab7 endosome numbers?

      We observed a reduction in Rab7 expression at both the RNA and protein levels. Confirming whether this alteration leads to reduced numbers of Rab7-positive endosomes would require detailed investigation.

      (5) Are other late endosomal (and lysosomal) populations also reduced upon DSS treatment and UC? Is there a general defect in lysosomal function?

      There is no direct evidence showing a reduction in the late endosomal and lysosomal populations during gut inflammation, but a few studies link lysosomal dysfunction with risk for colitis (doi: 10.1016/j.immuni.2016.05.007).

      (6) The evidence for lysosomal delivery of CLCA1 (Fig 7 I, J) is weak. Although used sometimes in combination with antibodies, lysotracker red is not well compatible with permeabilization and immunofluorescence staining. The authors can substantiate this result further using lysosomal antibodies such as Lamp1 and Lamp2. For Fig 7J, it will be good to see a reduction in Rab7 levels upon KD in the same cell.

      We used LysoTracker Red in live cells followed by fixation, so permeabilization issues were avoided. Lamp1, as suggested by the reviewer, is definitely a better marker for lysosomes in immunofluorescence studies, but it has also been shown to mark late endosomes (doi: 10.1083/jcb.132.4.565). As Rab7 protein also marks late endosomes, using Lamp1 may leave ambiguity between CLCA1 in Rab7-positive late endosomes and CLCA1 in lysosomes. Nevertheless, we have carried out this experiment, as suggested by the reviewer, by staining the cells with LAMP1 (author response image 1). Consistent with our previous data, the colocalization of CLCA1 with LAMP1-positive vesicles decreased upon Rab7 knockdown. We also observed a decrease in the intensity of LAMP1 staining in cells with Rab7 knockdown. This observation can be attributed to the decrease in Rab7-positive vesicles or late endosomes, which also exhibit LAMP1 staining.

      Author response image 1.

      (A) Representative confocal images of HT29-MTX-E12 cells transfected with either scrambled siRNA (control) or Rab7 siRNA (Rab7 knockdown). Cells are stained for CLCA1 (green) using an anti-CLCA1 antibody and for lysosomes using LAMP1. (B) Graph shows quantitation of colocalization between CLCA1 and LAMP1 from images (n=20) using Manders’ overlap coefficient. Inset shows zoomed areas of the image with colocalization puncta (yellow) marked with arrows.
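
      For readers unfamiliar with the colocalization metric named in panel (B), the Manders' overlap coefficient for two channels is the sum of the pixel-wise products of their intensities divided by the square root of the product of their summed squared intensities. The sketch below is a minimal pure-Python illustration of that formula on flattened pixel lists; it is not the authors' analysis pipeline, and the function name, the absence of background subtraction or thresholding, and the zero-signal convention are assumptions made for illustration.

```python
import math


def manders_overlap(channel_a, channel_b):
    """Manders' overlap coefficient for two equal-length lists of pixel intensities.

    R = sum(a_i * b_i) / sqrt(sum(a_i^2) * sum(b_i^2))
    Returns 0.0 when either channel has no signal (a convention assumed here).
    """
    if len(channel_a) != len(channel_b):
        raise ValueError("channels must contain the same number of pixels")
    numerator = sum(a * b for a, b in zip(channel_a, channel_b))
    denominator = math.sqrt(
        sum(a * a for a in channel_a) * sum(b * b for b in channel_b)
    )
    return numerator / denominator if denominator else 0.0
```

      With non-negative intensities the coefficient ranges from 0 (no pixel carries signal in both channels) to 1 (the two channels are proportional), so a lower value after Rab7 knockdown would correspond to less CLCA1 signal coinciding with LAMP1 signal.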

      (7) In this connection, Fig S3D is somewhat confusing. While it is clear that the pattern of Muc2 in WT and Rab7-/- cells are different, how this corroborates with the in vivo data on alterations in mucus layer permeability -- as claimed -- is not clear.

      The data in Fig. S3D suggest the involvement of Rab7 in packaging of Muc2. The whole idea for doing this experiment was to support our observation in the Rab7KD-mice model where mucus layer was seen to be loose and more permeable in Rab7 deficient mice.

      (8) Overall, the work shows a role for a well-studied GTPase, Rab7, in gut homeostasis. This is an important finding and could provide scope and testable hypotheses for future studies aimed at understanding in detail the mechanisms involved.

      We thank the reviewer for this comment.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific questions to the authors:

      (1) Why is the dotted line in Fig. 1c at -7.5? What does this signify?

      Response: The dotted line was intended to represent the baseline; in the revised manuscript it is corrected and placed at y=0.

      (2) Du et al should be cited. Fig 6 K-Q from Du et al should be discussed and reasons for contradictory findings should be given in greater detail, rather than a single sentence in the discussion.

      Response: The reference for Du et al is now included in the list, and the possible reasons for the difference between their findings and those of the current work are discussed in the main text (Lines 106-109).

      (3) Fig1. Why are Rab7 levels low even in remission patient samples? Can DSS be withdrawn to induce remission followed by analysis of colonic samples?

      Response: A possible explanation for this observation could be that the restoration of Rab7 levels may not immediately follow the resolution of clinical symptoms in remission patients. After the remission initiation, the normalization of cellular processes, including the regulation of Rab7 expression, might exhibit a time lag. A thorough investigation of Rab7 levels and the allied pathways at different time points during the remission phase could provide deeper insights into the gradual dynamics of recovery. As suggested by the reviewer, DSS withdrawal induced recovery model can be utilized for understanding the same and could be a good approach for future investigations.

      (4) Fig. 2: Single-channel fluorescence should be shown.

      Response: The single channel fluorescence images are incorporated in Fig. S2.

      (5) Line 456 should be modified. 'Blind pathologist' does not read well!

      Response: The line has been modified to read ‘Blinded pathologist’.

      (6) Other inflammatory markers, cytokine levels should be looked at in addition to TNF alpha.

      Response: TNF-α is a crucial mediator in intestinal inflammation, actively contributing to the development of IBD. Elevated levels of TNF-α are observed in patients of IBD (Billmeier U. et al, World J Gastroenterol. 2016). In the current work, while probing for TNF-α our primary objective was to examine this significant indicator of colitis following Rab7 knockdown in mice, aiming to gain insights into heightened gut inflammation.

      (7) Quantitation of S3D should be provided.

      Response: The dispersed expression of Muc2 was observed in n=20 cells per sample and it was a qualitative observation. The aim was to identify any changes in Muc2 packaging under Rab7 knockout conditions.

      (8) Microbiota analysis should include Rab7KD+DSS mice.

      Response: We understand the importance of this point. However, in the current work our primary objective was to specifically investigate changes in microbial diversity and abundance in Rab7KD mice compared to both DSS+CScr and CScr mice. Rab7KD+DSS mice are expected to show greater dysbiosis than DSS+CScr mice.

      (9) Fig 6 H and I, G. How do Clca1 levels reduce in Rab7kd +DSS relative to Scr+DSS while they are higher in Rab7kd compared to Scr. Comment.

      Response: The decreased expression of CLCA1 in the mucus of DSS+Rab7KD mice can be attributed to the significant reduction in goblet cell numbers in these mice, as evidenced by the observed loss of these cells (Fig. S3B and Fig. S3C). CLCA1 is exclusively secreted by goblet cells, so a decline in their numbers directly affects CLCA1 levels.

      (10) How are Rab7 levels downregulated? What is the predicted mechanism?

      Response: While our current study did not explore this aspect, it is worth noting that Rab7 protein levels are regulated through various mechanisms, including post-translational modifications such as ubiquitination and SUMOylation. These modifications are known to regulate Rab7 stability, transport and recycling. Specific experiments conducted during this study (work not included in the manuscript) indicated the participation of SENP7, a deSUMOylase, in controlling the stability of Rab7 protein, particularly in the context of colitis. Additionally, goblet cell-specific mechanisms are also likely to control Rab7 in the gut.

      (11) What is the explanation for opposite changes in CLCa1 RNA (down) and protein (up).

      Response: The reduction in CLCA1 at the RNA level could be associated with the decrease in goblet cell numbers during colitis. Our investigation indicates that Rab7 predominantly influences CLCA1 at the protein level by impacting its degradation pathway. It is important to acknowledge that not all the alterations in CLCA1 observed during colitis can be solely attributed to Rab7, but our study has identified a connection between Rab7 and CLCA1.

      (12) In light of Du et al, it would be interesting to see how the number of peroxisomes changes upon alteration of Rab7 levels.

      Response: The suggestion by the reviewer is noteworthy. However, because it belongs to an altogether different domain, it deviates from the primary objectives of the current work. Here, our goal was specifically to explore the role of Rab7 in goblet cell functioning. This is an attractive theme for future investigations.

      (13) While Gaur et al suggest in their discussion that Du et al may have observed an upregulation in Rab7 levels in different cell types of the intestine, this is not apparent from the data provided. Tissue sections should be carefully analysed to provide data supporting this observation. Differences in reagents used (antibodies) should also be considered. As far as the human patient data is concerned, it does not appear that the sample stages are very different across the two manuscripts (based on age, inclusion criteria etc.).

      Response: This has been explained in detail in our public comments.

      Reviewer #2 (Recommendations For The Authors):

      (1) In general, image-based measurements could be done better (for example, object-based statistics than pixel-based overlaps) and represented differently. It is difficult to appreciate the reduction in Rab7 levels in goblet cells in Fig 2 A, C. It might be good to show the channels separately, and perhaps use an intensity gradient LUT for the Rab7 channel.

      Response: The single channel fluorescence images are incorporated in Fig. S2.

      (2) The EM images, and particularly Fig 2F are not convincing, with an oddly square-shaped vesicle. I'm not sure what value they are adding to the interpretation.

      Response: The observed square-shaped vesicle in Fig. 2F could be attributed to the dynamic nature of vesicles within a cell, which allows them to adopt various shapes depending on their state and function. The presence of Rab7 near vacuoles of goblet cells signifies its probable involvement in regulating the secretory function of these cells, which is the key aspect covered in this work.

      (3) A general method question concerns the definition of the distal colon. How is this decided, particularly when colon lengths are reduced upon DSS treatment?

      Response: The murine colon is divided into the proximal and distal colon, which differ visually in their inner folds; these are quite prominent in the proximal colon. Additionally, the portion towards the rectum (predominantly distal colon) was mainly used for the experiments. In each case, the various experimental groups were matched for the respective areas.

      (4) The use of an in vivo intestine-specific Rab7 silencing model is good. Why does Rab7 KD itself not capitulate aspects of DSS treatment, rather it seems to exacerbate it.

      Response: Our objective was to determine whether the downregulation of Rab7 during colitis was the cause or consequence of gut inflammation. Interestingly, our investigation using the murine Rab7 knockdown model revealed that the reduction of Rab7 expression in the intestine exacerbates inflammation. Subsequent analysis demonstrated that the absence of Rab7 disrupts goblet cell secretory function, consequently contributing to heightened inflammation. Our findings overall suggest that Rab7 downregulation is not merely a consequence but plays a contributory role in aggravating inflammation in the context of colitis.

      (5) The axes labels in Fig 5 are not readable. It is unclear how Rab7 KD is more similar in gut microbiota phenotypes to DSS than to CScr.

      Response: The microbial analysis revealed an abnormal composition of gut microbiota in Rab7KD mice compared to CScr. Interestingly, this composition exhibited some similarity to the inflamed gut microbiota observed in DSSScr mice. The analysis further demonstrated a shift in microbial diversity in Rab7KD mice, showcasing characteristics akin to those observed in inflamed mice. This similarity in gut microbiota phenotypes between Rab7KD and DSSScr suggests a potential link or influence of Rab7 downregulation on the microbiota, contributing to the observed similarities with DSS-induced inflammation.

      (6) The use of mucous proteomics to identify mechanisms of Rab7-mediated phenotype is a good approach. The replicates in the proteomics dataset (Fig 6F) do not seem to match. Detailing of methodology used for analysis will help to overcome these doubts.

      Response: The identified proteins in different samples of mucus proteomics were subjected to label free quantification. Subsequently, the significantly altered proteins were subjected to analysis with the False Discovery Rate (FDR) to control for potential false positives and ascertain the validity of the findings.
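
      The response does not specify which FDR procedure was applied. As a minimal sketch only, assuming the Benjamini-Hochberg step-up procedure and hypothetical p-values (neither is taken from the manuscript), FDR control over a list of per-protein p-values could look like this:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    under the Benjamini-Hochberg procedure at FDR level alpha.

    Assumption: this is one common FDR method; the source does not say
    which method the authors used.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Reject every hypothesis whose rank is at most max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for four proteins, for illustration only.
example_p = [0.001, 0.03, 0.035, 0.5]
example_calls = benjamini_hochberg(example_p, alpha=0.05)
```

      Because the procedure is step-up, a p-value that misses its own rank threshold can still be rejected when a larger p-value meets its threshold, as happens for the second value in the hypothetical list.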

      (7) It will be good to see the immunoblots showing the negative correlation between Rab7 and CLCL1 in Fig 7D.

      Response: Fig. 7C shows a western blot for CLCA1 protein expression in the same control and UC samples that were used in Fig. 1F to show Rab7 expression. Fig. 7D is the quantitative correlation plot for Fig. 1F (Rab7 expression) and Fig. 7C (CLCA1 expression).

      (8) Why is UC different from the DSS model for Rab7 gene expression but not protein levels? Endosomal counts could help address this.

      Response: We encountered challenges in accurately counting the individual puncta of Rab7 expression in immunofluorescence images due to the nature of tissue samples. Locating endosomes within a single cell proved to be challenging, and the proximity of many puncta made it difficult to delineate them individually. Despite these technical difficulties, correlating Rab7 expression with endosomal counts remains a compelling prospect and may well be an area for future investigations.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript entitled "Phosphodiesterase 1A Physically Interacts with YTHDF2 and Reinforces the Progression of Non-Small Cell Lung Cancer" explores the role of PDE1A in promoting NSCLC progression by binding to the m6A reader YTHDF2 and regulating the mRNA stability of several novel target genes, consequently activating the STAT3 pathway and leading to metastasis and drug resistance.

      Strengths:

      The study addresses a novel mechanism involving PDE1A and YTHDF2 interaction in NSCLC, contributing to our understanding of cancer progression.

      Weaknesses:

      The following issues should be addressed:

      (1) The body weight changes and/or survival times of each group in the in vivo metastasis studies should be provided.

      Thank you for your suggestion! We have already provided the body weight of each group in the in vivo metastasis studies in Figure S4D and Figure S5D (see below).

      (2) In Figure 7, the direct binding between YTHDF2 and the potential target genes should be further validated by silencing YTHDF2 to observe the half-life of the mRNA levels of target genes, in addition to silencing PDE1A.

      Thank you for your suggestion! We have found that siYTHDF2 does not significantly affect expression of SOCS2 in NSCLC cells (see author response image 1 below). We hypothesize that YTHDF2 functions as an m6A reader that recognizes the target mRNA; thus, when YTHDF2 is silenced by siRNA, some residual expression remains in the cells, allowing it to continue recognizing its targets and exerting its function. Therefore, SOCS2 mRNA expression was not significantly affected. However, PDE1A functions as a degrader of mRNA, so when it is disrupted, the effect on mRNA degradation could be strong.

      Author response image 1.

      SOCS2 mRNA expression after siYTHDF2 in NSCLC cells

      (3) In Figure 7, the potential methylation sites of "A" on the target genes such as SOCS2 should be verified by mutation analysis, followed by m6A IP or reporter assays.

      Thank you for your suggestion! The m6A IP or reporter assays may be carried out in the future to detect the potential methylation sites. We have added the following statement to the manuscript: “Meanwhile, YTHDF2 might act as an m6A RNA “reader” by interacting with PDE1A, but the mechanism might need further investigation”.

      (4) In Figure 6G, the correlation between the mRNA levels of STAT3 and YTHDF2 needs clarification. According to the authors' mechanism, the STAT3 pathway is activated, rather than upregulation of mRNA levels (or protein levels, as shown in Figure 6F). Figure 7 does not provide evidence that STAT3 is a bona fide target gene regulated by YTHDF2.

      Thank you for your suggestion! The reviewer is right: the STAT3 pathway is activated by YTHDF2, rather than STAT3 mRNA levels being upregulated, so the relationship between YTHDF2 mRNA and STAT3 mRNA is not suitable for this study. Moreover, the correlation between YTHDF2 mRNA and STAT3 mRNA is not as strong as we expected, with a Pearson correlation coefficient of 0.37. Thus, we have deleted Figure 6G in the revised version.
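
      For reference, a Pearson coefficient such as the 0.37 quoted above is the covariance of two variables divided by the product of their standard deviations. The following is an illustrative sketch only; the function name and the expression values are hypothetical and are not the data behind Figure 6G:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations; the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical, perfectly linear expression values (illustration only).
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.0, 4.0, 6.0, 8.0, 10.0]
r_example = pearson_r(gene_a, gene_b)

# Squaring r gives the fraction of variance shared under a linear model.
shared_variance = 0.37 ** 2
```

      Squaring 0.37 gives about 0.14, meaning that only around 14% of the variance in one mRNA level is linearly associated with the other.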

      (5) The final figure, which discusses sensitization to cisplatin by PDE1A suppression, does not appear to be closely related to the interaction or regulation of PDE1A/YTHDF2. If the authors claim this is an m6A-associated event, additional evidence is needed. Otherwise, this part could be removed from the manuscript.

      Thank you for your suggestion! We have already deleted Figure 8 just as the reviewer suggested.

      Reviewer #2 (Public review):

      This manuscript aims to investigate the biological impact and mechanisms of phosphodiesterase 1A (PDE1A) in promoting non-small cell lung cancer (NSCLC) progression. They first analyzed several databases and used three established NSCLC cell lines and a normal cell line to demonstrate that PDE1A is overexpressed in lung cancer and its expression negatively correlated with the outcomes of patients. Based on this data, they suggested PDE1A could be considered as a novel prognostic predictor in lung cancer treatment and progression. To study the biological function of PDE1A in NSCLC, they focused on testing the effect of inhibition of PDE1A genetically and pharmacologically on cell proliferation, migration, and invasion in vitro. They also used an experimental metastasis model via tail vein injection of H1299 cells to test if PDE1A promoted metastasis. By database analysis, they also decided to investigate if PDE1A promoted angiogenesis by co-culturing NSCLC cells with HUVECs as well as assessing the tumors from the subcutaneous xenograft model. However, in this model, whether PDE1A modulation impacted tumor metastasis was not examined. To address the mechanism of how PDE1A promotes metastasis, the authors again performed a bioinformatic and GSEA enrichment analysis and confirmed PDE1A indeed activated STAT3 signaling to promote migration. In combination with IP followed by Mass spectrometry, they found PDE1A is a partner of YTHDF2, the cooperation of PDE1A and YTHDF2 negatively regulated SOCS2 mRNA as demonstrated by RIP assay, and ultimately activated STAT3 signaling. Finally, the authors shifted the direction from metastasis to chemoresistance, specifically, they found that PDEA1 inhibitions sensitized NSCLC cells to cisplatin through MET and NRF2 signaling.

      Strength:

      Overall, the manuscript was well-written and the majority of the data supported the conclusions. The authors used a series of methods including cell lines, animal models, and database analysis to demonstrate the novel roles and mechanism of how PDE1 promotes NSCLC invasion and metastasis as well as cisplatin sensitivity. Given that PDE1A inhibitors have been perused to use in clinic, this study provided valuable findings that have the translational potential for NSCLC treatment.

      Weaknesses:

      The role of YTHDF2 in PDE1A-promoted tumor metastasis was not investigated. To make the findings more clinical and physiologically relevant, it would be interesting to test if inhibition of PDE1A impacts metastasis using lung cancer orthotopic and patient-derived xenograft models. It is also important to use a cisplatin-resistant NSCLC cell line to test if a PDE1A inhibitor has the potential to sensitize cisplatin in vitro and in vivo.

      Thank you for your suggestion! The role of YTHDF2 in PDE1A-promoted tumor metastasis may need in vivo analysis. Therefore, we discussed the point in the discussion section “In addition, it is worth testing if PDE1A inhibition affects metastasis in lung cancer orthotopic and patient-derived xenograft models. The role of YTHDF2 in PDE1A-driven tumor metastasis should be elucidated in future studies”.

      The reviewer is absolutely right: it is very important to use a cisplatin-resistant NSCLC cell line to test the potential effect of PDE1A on sensitization to cisplatin. The current data do not support the conclusion, and more data are needed to reach a final conclusion. As suggested by reviewer 1, we have deleted these data in this version.

      Furthermore, this study relied heavily on different database analyses, although providing novel and compelling data that was followed up and confirmed in the paper, it is critical to have detailed statistical description section on data acquisition throughout the manuscript.

      Thank you for your suggestion! We have added detailed statistical descriptions to the figure legends.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Scale Bar Display: Scale bars should be included in Figures 4F, 5F, and 6E to ensure clarity and accuracy in the presented microscopic images.

      Thank you for your suggestion! We have already added the scale bars on Figures 4F, 5F, and 6E.

      (2) HE Staining Images: The authors are suggested to provide more images for HE staining of lungs to offer a comprehensive visual representation and to substantiate the findings.

      Thank you for your suggestion! We have already provided more images for HE staining of lungs in Figure S4E and Figure S5E.

      Reviewer #2 (Recommendations for the authors):

      It would be helpful to clarify several points in the manuscript for better understanding.

      (1) The HELF cells were stated between the epithelial cell line (page 7, line 118) and fibroblast (page 12, line 288) which needs to be clarified. It is not clear if the cells used in this study were periodically authenticated.

      Thank you for your suggestion! We have revised the description of HELF cells; they are human lung fibroblasts.

      (2) More details could be added to the methods such as the amount of Matrigel coated for invasion assay and the components for the lysis buffer and IP buffer.

      Thank you for your suggestion! We have already added more details in the Methods section.

      (3) Providing the rationale for using 20% FBS instead of using some chemoattracts such as EGF, LPA, or HGF or a low level of FBS for migration will be helpful.

      Thank you for your suggestion! Although chemoattractants are suitable for cell migration experiments, 20% FBS is also suitable. We list examples from the literature that use this system below.

      (1) Xiaolin Peng, Zhengming Wang, Yang Liu. et al. Oxyfadichalcone C inhibits melanoma A375 cell proliferation and metastasis via suppressing PI3K/Akt and MAPK/ERK pathways, Life Sciences, 2018, 206, 35-44. https://doi.org/10.1016/j.lfs.2018.05.032

      (2) Rong, S., Dai, B., Yang, C. et al. HNRNPC modulates PKM alternative splicing via m6A methylation, upregulating PKM2 expression to promote aerobic glycolysis in papillary thyroid carcinoma and drive malignant progression. J Transl Med, 2024, 22, 914. https://doi.org/10.1186/s12967-024-05668-9

      (4) For HPA analysis In Figure 1, it would be great to assess how many lung cancer cases are NSCLC and define IDO/area for the y-axis.

      Thank you for your suggestion! Nineteen samples were analyzed, all of which are NSCLC samples, and we have revised our manuscript accordingly. We also made a mistake in the axis label: it should be IOD/area, which means integrated optical density/area. We have revised the Figures and Figure legends.

      (5) On page 23, line 480, "Therefore, this study reveals the effect and mechanism of PDEA1 in promoting HCC metastasis...", should HCC be NSCLC?

      Thank you for your suggestion! We have already revised the manuscript accordingly.

      (6) Specific scramble siRNAs should be clearly shown in their respective figures. In Figure 7F, it is not clear why DMSO did not scramble siRNA was used as the control.

      Thank you for your suggestion! It was our mistake to show DMSO in Figure 5F; DMSO is the negative control for Figure 5G, and we have revised Figures 5F and 5G accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Park et al. conducted various analyses attempting to elucidate the biological significance of SARS-CoV-2 mutations. However, the study lacks a clear objective. The specific goals of the analyses in each subsection are unclear, as is how the results from these subsections are interconnected. Compiling results from unrelated analyses into a single paper can be confusing for readers. Clarifying the objective and narrowing down the topics would make the paper's purpose clearer.

      The logic of the study is also unclear. For instance, the authors developed an evaluation score, APESS, for analyzing viral sequences. Although they state that the APESS score correlates with viral infectivity, there is no explanation in the results section about why this is the case.

      The structure of the paper should be reconsidered.

      Thank you for your feedback. We have heeded the input that the study lacks a clear objective and made sure that the overall goal of the study is reflected in the Abstract, Results, and Discussion.

      We have made the specific goals of each subsection clearer in the Results section and elaborated on how the components of our study connect to each other. We have addressed these in more detail in the ‘Recommendations for the authors’ section.

      Thank you for the feedback on APESS, our evaluation model. APESS was created based on properties of SARS-CoV-2 that we discovered in our study. When applying our evaluation model, high APESS scores indicated high infectivity. APESS is calculated from a comprehensive evaluation of SARS-CoV-2 at the nucleotide, amino acid, and protein structure levels.

      The explanations and exact calculations of APESS are detailed in the Materials and Methods section in line 571, but we should have been more detailed in the Results section as well. We have made sure to properly indicate this in the Results section in line 284.

      Overall, we have edited the manuscript to explain our research more accurately by amending terms, restructuring arguments, and clarifying how the parts of the research are interconnected.

      Reviewer #2 (Public review):

      Summary:

      The authors have developed a machine learning tool AIVE to predict the infectivity of SARS-CoV-2 variants and also a scoring metric to measure infectivity. A large number of virus sequences were used with a very detailed analysis that incorporates hydrophobic, hydrophilic, acid, and alkaline characteristics. The protein structures were also considered to measure infectivity and search for core mutations. The study especially focused on the S protein of SARS-CoV-2. The contents of this study would be of interest to many researchers related to this area and the web service would be helpful to easily analyze such data without in-depth bioinformatics expertise.

      Strengths:

      - Analysis of large-scale data.

      - Experimental validation on a partial set of searched mutations.

      - A user-friendly web-based analysis platform that is made public.

      Weaknesses:

      - Complexity of the research.

      Thank you for your kind feedback. Our study explored a wide range of topics including biochemical properties, machine learning, and viral infectivity.

      In presenting our research, we recognize that our comprehensive analysis may have slightly obscured the specific aims and overall objective of the study. We investigated properties in the viral sequences of SARS-CoV-2 and examined big data, clinical data, and expression data to elucidate their effect on viral infectivity. We then used evaluation modeling and in silico and in vitro validation.

      We have clarified the aims of our research and improved upon the flow of the manuscript by adding sentences that outline the goals of our research in the appropriate sub sections of the Results and Discussion sections.

      Reviewer #1 (Recommendations for the authors):

      The abstract should clearly state the backgrounds, objectives, strategies, and findings of this study in an orderly manner.

      Thank you for your feedback. We have restructured the Abstract to better reflect the goals and methods of our study. We start the Abstract by introducing the background of the study ‘An unprecedented amount of SARS-CoV-2 data has been accumulated compared with previous infectious diseases, enabling insights into its evolutionary process and more thorough analyses.’ in line 48. Then we more clearly stated the overall objectives of our research in line 50 as ‘This study investigates SARS-CoV-2 features as it evolves to evaluate its infectivity.’ Then, we clearly defined our specific discoveries in the virus, the purpose of our evaluation model, and how we validated our findings.

      In the Introduction, the message of each paragraph is unclear. Please clearly state the objectives of the study and what was done to achieve these objectives.

      Thank you for the feedback. We have updated the Introduction section to more clearly state the objectives of the study.

      To increase clarity, we have moved ‘Furthermore, hydrophobic properties in the amino acid sequence affect protein folding. Coronavirus hydrophobicity has significant effects on amino acid properties and protein folding.’ to line 127.

      In line 130, we rephrased the first sentence of the paragraph to ‘For these prior approaches to virus analysis and prediction, expertise with the relevant fields is required for a full understanding.’ to better establish the link between the background information and aims of the study. Then in line 134, we added ‘elucidate properties about the virus’ to clarify the aims of the study.

      In line 141, we have improved the clarity of the sentence to better present the scope and objectives of the study.

      The relationship between the sections in the Results is unclear. Clarify why each section is necessary and how they are interconnected.

      We investigated properties in the viral sequences of SARS-CoV-2 that highlighted amino acid substitutions or changes in polarity (Figure 1). In VOCs, we noted trends or absences of amino acid substitutions at specific positions (Figure 2). We examined epidemiological and clinical data to determine the infectivity, severity, and symptomaticity of lineages. Looking at expression data and binding affinity further illuminated the effect of amino acid substitutions (Figure 3). We created APESS, an evaluation model that is comprehensively calculated from the nucleotide, amino acid, and protein structure levels of the virus. Evaluation of lineages revealed that higher APESS scores were associated with higher infectivity (Figure 4). We used in silico and in vitro validation to reinforce our findings, then used machine learning to make predictions on future developments (Figure 5). We created candidate sequences for evaluation and utilized machine learning in predictions (Figure 6).

      We have added explanations to each section in Results that elucidate the objective of each section and how they connect with each other in the wider study.

      In line 157, we have added ‘We examined the amino acid sequences of SARS-CoV-2 to make discoveries about biochemical properties.’ to clearly outline the objective of the subsection.

      In line 207, we have improved the phrasing of the sentence.

      In line 278, we stressed that ‘We developed APESS, an evaluation model to analyze viral sequences based on the nucleotide, amino acid, and protein structure properties.’ to properly define the purpose and background of APESS.

      Please define abbreviations when they first appear.

      We have added the full terms for the stated abbreviations in the relevant sections of the manuscript.

      In line 107, we have added the proper abbreviation for Our World in Data (OWID).

      In lines 143, 175, and 489 we have added the full term for Variants of Concern (VOCs).

      In line 160, we have added the full term for Receptor Binding Motif (RBM).

      Reviewer #2 (Recommendations for the authors):

      (1) pg 9, line 51, full name of RBM should be declared.

      We have added the full name of Receptor Binding Motif (RBM) to the appropriate section in the Abstract.

      (2) How are the Variants of Concern (VOCs) defined?

      Thank you for the comment and we apologize for the confusion. Variants of Concern as defined by the World Health Organization are specified in the Materials and Methods section. We have also added the full name for Variants of Concern (VOCs) when they are first mentioned in the Introduction and Results sections.

      (3) pg 17, line 297. The purpose of using AI/ML to predict amino acid substitutions at specific locations is not clear. The VOCs and related mutation loci were already searched, so the AA substitution prediction step seems a little repetitive. Is it to create customized sequences? Also, if prediction (or probability) was made, some performance evaluation would be helpful.

      Thank you for this feedback. The purpose of utilizing machine learning to make predictions about amino acid substitutions is to assess the possibility of amino acid substitutions occurring at specific locations. These potential amino acid substitutions were evaluated by APESS to have high scores, linking them to high infectivity. As the feedback suggests, amino acid substitutions in VOCs are researched but our prediction sought to ascertain the likelihood of amino acid substitutions that our evaluation model associated with infectivity. In the Results section in line 330, we assessed the probability of amino acid substitutions N460K and Q493R that the study found to be significant. The datasets that we utilized for these predictions are detailed in the Materials and Methods section in line 677.

      The models we trained with machine learning predicted the probability of mutations based on samples in each group and their performance was evaluated by comparing the presence of mutations in the clades they diverged from. We have added the following sentences to line 330: “We used Accuracy, Precision, Recall, and F1 score to evaluate performance. All models showed high performance scores above 0.95 in Precision, Recall, and F1 score. For accuracy, XGBoost, scored above 0.89, exhibiting relatively high performance while LightGBM scored above 0.78.”
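
      As an illustration only (this is not the evaluation code used in the study), the four metrics named above can be computed for a binary mutation-present/absent prediction as follows. The function name and the label vectors are hypothetical:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = mutation present, 0 = mutation absent.
truth = [1, 1, 1, 0, 0, 1]
predicted = [1, 1, 0, 0, 1, 1]
scores = classification_metrics(truth, predicted)
```

      Accuracy counts true negatives as well as true positives, whereas precision, recall, and F1 do not, so accuracy can differ from the other three scores for the same model.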

      (4) pg 17, line 289. The objective of creating candidate lineages is not clear and would be helpful for the readers if its purpose is elaborated on. Since there are enough SARS-CoV-2 sequences, wouldn't it be more realistic and accurate to use those real sequences instead of creating them? Furthermore, the candidate lineages should be defined but they were missing in this section. This part made it a little difficult to follow the overall paper's logic.

      The manuscript should have been clearer on what ‘candidate lineages’ signified; we apologize for the confusion. In line 314, we included the following sentences for clarity: ‘We introduced amino acid substitutions at specific locations in the SARS-CoV-2 backbone for the wildtype and VOCs. The amino acid substitutions were lysine (K), arginine (R), asparagine (N), serine (S), tyrosine (Y), and glycine (G). We then evaluated the infectivity of these candidate lineages with our evaluation model APESS.’

      The purpose of creating candidate lineages in our study was to assess the effect of specific amino acid substitutions on the virus’ infectivity. The amino acid substitutions we evaluated were lysine (K), arginine (R), asparagine (N), serine (S), tyrosine (Y), and glycine (G). We determined that examining the introduction of specific amino acid substitutions to SARS-CoV-2 sequences would highlight the significance they had on infectivity. We have revised the paragraph in line 314 of the Results section to convey what we were doing.

      (5) This study covers very detailed contents regarding lineages, mutations, and their effect on infectivity. It would be more readable if subsections could be added per group of investigation, especially in the results and discussion section.

      In the Results section, we have emphasized the objective of each subsection and how they connect with one another for the overall goals of our study.

      In line 157, we have added ‘We examined the amino acid sequences of SARS-CoV-2 to make discoveries about biochemical properties.’ to clearly outline the objective of the subsection.

      In line 207, we have improved the phrasing of the sentence.

      In line 278, we stressed that ‘We developed APESS, an evaluation model to analyze viral sequences based on the nucleotide, amino acid, and protein structure properties.’ to properly define the purpose and background of APESS.

      We have made edits to the Discussion section to more clearly indicate subsections.

      In line 389, we have added ‘In our investigation of various viruses’ to clearly indicate the background on other viruses.

      In line 409, we added the sentence ‘We made discoveries on specific amino acid substitutions at positions.’ to indicate the subsection talking about N437R, N460K, and D467 mutations.

      In line 471, we added the sentence ‘We created AIVE to feature our findings and analyses on an online platform.’ and modified the following sentence to better explain AIVE.

      (6) pg 26, line 557. The criteria for the SCPSi scores were set to 0.9 and 0.1 by the proportion of the Omicron and Delta variants. How do other criteria affect the performance of the method?

      Thank you for the question. We used 0.9/0.1 as the initial criteria in our SCPS calculation. To determine how the criteria affect performance, we also tested 0.8/0.2 and 0.7/0.3.

      After calculating APESS with different SCPS weights (0.9/0.1, 0.8/0.2, 0.7/0.3), we used a Gaussian Mixture Model (GMM) to compare how the groups were divided based on APESS. All three groups with different SCPS weights were determined to accurately reflect data patterns when they had four components.

      When comparing parameter values, the group that used the original weights of 0.9 and 0.1 for SCPS showed the lowest values for variance and standard error across all four components. This indicates that each component was stable and clearly distinguishable from one another.

      The group where the weights were adjusted to 0.7 and 0.3 for SCPS showed significantly higher variance and a large error for the G2 component. The distribution of each component was more widespread, signifying that the stability and reliability were lower.

      The group where the weights were adjusted to 0.8 and 0.2 for SCPS was positioned between the two previous groups for finer data classification and reliability. However, the group notably lacked reliability when it came to the SE values for the G4 component.

      Thus, the original model with 0.9 and 0.1 weight is the most reliable.

      When the Gaussian Density for each group was plotted, the group with 0.9/0.1 SCPS weights showed the highest peak near 2 (G1), with a value of approximately 2. For the group with SCPS 0.8/0.2 weights, the highest peak appeared near 4.2 (G3), showing a high value around 14. For the group with SCPS 0.7/0.3 weights, the highest peak appeared near 3.7 (G3) showing a value around 5. The group with 0.9/0.1 SCPS weights exhibited a more uniform Gaussian distribution compared to the other two.
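
      As a reading aid for the density plots described above, the following is a minimal sketch of how a superposition of Gaussian densities is evaluated and where its highest peak falls. The component weights, means and standard deviations are hypothetical placeholders, not the fitted values from the author response tables, and the function names are chosen here for illustration only.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Superposition of weighted Gaussian densities.

    components: iterable of (weight, mean, sd); weights should sum to 1.
    """
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def component_peak(weight, sd):
    """Height that a single component contributes at its own mean."""
    return weight / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical four-component mixture (illustrative numbers only).
components = [
    (0.40, 2.0, 0.30),
    (0.30, 3.0, 0.40),
    (0.20, 4.2, 0.05),  # narrow component: small spread gives a tall peak
    (0.10, 5.5, 0.50),
]

# Evaluate the mixture on a grid from 0.00 to 8.00 and locate the maximum.
grid = [i / 100 for i in range(0, 801)]
densities = [mixture_pdf(x, components) for x in grid]
peak_x = grid[densities.index(max(densities))]
print(f"highest peak near x = {peak_x:.2f}")
```

      Because the height of a component at its mean is weight / (sd * sqrt(2*pi)), a tall peak in such a plot can reflect a small spread as much as a large weight, which is worth keeping in mind when comparing peak heights across the three weightings.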

      Author response image 1.

      Superposition of Gaussian Densities for SCPS weight 0.9/0.1

      Author response table 1.

      Statistical values of the Superposition of Gaussian Densities for SCPS weight 0.9/0.1

      Author response image 2.

      Superposition of Gaussian Densities for SCPS weight 0.8/0.2

      Author response table 2.

      Statistical values of the Superposition of Gaussian Densities for SCPS weight 0.8/0.2

      Author response image 3.

      Superposition of Gaussian Densities for SCPS weight 0.7/0.3

      Author response table 3.

      Statistical values of the Superposition of Gaussian Densities for SCPS weight 0.7/0.3

      (7) Overall, the approach is very detailed and realistic. Just curious if this approach would be also applicable to other viruses such as influenza.

      We appreciate the insightful comments from the reviewer, and this is a direction we hope to take our research in the future. Our study focused on SARS-CoV-2 and the properties we discovered from the virus’ spike protein interacting with the host’s ACE2 receptor. In our investigation of other coronaviruses such as MERS-CoV, SARS-CoV-1 possesses a different structure and properties than these viruses as we have illustrated in Supplementary Figure 24. We had provided explanations about our investigation of other viruses in the Discussion section. In line 389, we have added ‘In our investigation of various viruses’ to better signpost this section.

    1. Author response:

      (1) General Statements

      As you will see in our attached rebuttal to the reviewers, we have added several new experiments and revised the manuscript to fully address their concerns.

      (2) Point-by-point description of the revisions

      Reviewer #1:

      Evidence, reproducibility and clarity

      Summary:

      The manuscript by Yang et al. describes a new CME accessory protein. CCDC32 has been previously suggested to interact with AP2 and in the present work the authors confirm this interaction and show that it is a bona fide CME regulator. In agreement with its interaction with AP2, CCDC32 recruitment to CCPs mirrors the accumulation of clathrin. Knockdown of CCDC32 reduces the amount of productive CCPs, suggestive of a stabilisation role in early clathrin assemblies. Immunoprecipitation experiments mapped the interaction of CCDC32 to the α-appendage of the AP2 complex α-subunit. Finally, the authors show that the CCDC32 nonsense mutations found in patients with cardio-facial-neuro-developmental syndrome disrupt the interaction of this protein to the AP2 complex. The manuscript is well written and the conclusions regarding the role of CCDC32 in CME are supported by good quality data. As detailed below, a few improvements/clarifications are needed to reinforce some of the conclusions, especially the ones regarding CFNDS.

      We thank the referee for their positive comments. In light of a recently published paper describing CCDC32 as a co-chaperone required for AP2 assembly (Wan et al., PNAS, 2024, see reviewer 2), we have added several additional experiments to address all concerns and consequently gained further insight into CCDC32-AP2 interactions and the important dual role of CCDC32 in regulating CME. 

      Major comments:

      (1) Why could the protein only be visualized at CCPs after knockdown of the endogenous protein? This is highly unusual, especially on stable cell lines. Could this be that the tag is interfering with the expressed protein function, rendering it incapable of outcompeting the endogenous? Does this point to a regulated recruitment?

      The reviewer is correct that this would be unusual; however, it is not the case. We misspoke in the text (although the figure legend was correct): these experiments were performed without siRNA knockdown, and we can indeed detect eGFP-CCDC32 being recruited to CCPs in the presence of endogenous protein. Nonetheless, we repeated the experiment to be certain (see Author response image 1).

      Author response image 1.

      Cohort-averaged fluorescence intensity traces of CCPs (marked with mRuby-CLCa) and CCP-enriched eGFP-CCDC32(FL).

      (2) The disease mutation used in the paper does not correspond to the truncation found in patients. The authors use a 1-54 truncation, but the patients described in Harel et al. have frame shifts at the positions 19 (Thr19Tyrfs*12) and 64 (Glu64Glyfs*12), while the patient described in Abdalla et al. has the deletion of two introns, leading to a frameshift around amino acid 90. Moreover, to precisely test the function of these disease mutations, one would need to add the extra amino acids generated by the frame shift. For example, as denoted in the mutation description in Harel et al., the frameshift at position 19 changes the Threonine 19 to a Tyrosine and adds a run of 12 extra amino acids (Thr19Tyrfs*12).

      The labels of the disease mutants p.(Thr19Tyrfs*12) and p.(Glu64Glyfs*12) are based on a 194 aa polypeptide version of CCDC32 initiated at a nonconventional start site that contains a 9 aa peptide (VRGSCLRFQ) upstream of the N-terminus we show. Thus, we are indeed using the appropriate mutation site (see: https://www.uniprot.org/uniprotkb/Q9BV29/entry). The reviewer is correct that we have not included the extra 12 aa in our construct; however, as these residues are not present in the other CFNDS mutants, we think it unlikely that they contribute to the disease phenotype. Rather, as neither of the clinically observed mutations contains the 78-98 aa sequence required for AP2 binding and CME function, we are confident that this defect contributes to the disease. Thus, we are including the data on the CCDC32(1-54) mutant, as we believe these results provide a valuable physiological context to our studies.

      (3) The frameshift caused by the CFNDS mutations (especially the one studied) will likely lead to nonsense mediated RNA decay (NMD). The frameshift is well within the rules where NMD generally kicks in. Therefore, I am unsure about the functional insights of expressing a disease-related protein which is likely not present in patients.

      We thank the reviewer for bringing up this concern. However, as shown in new Figure S1, the mutant protein is expressed at levels comparable to the WT, suggesting that NMD is not occurring.

      (4) Coiled coils generally form stable dimers. The typically hydrophobic core of these structures is not suitable for transient interactions. This complicates the interpretation of the results regarding the role of this region as the place where the interaction to AP2 occurs. If the coiled coil holds a stable CCDC32 dimer, disrupting this dimer could reduce the affinity to AP2 (by reduced avidity) to the actual binding site. A construct with an orthogonal dimeriser or a pulldown of the delta78-98 protein with the GST AP2a-AD could be a good way to sort this issue.

      We were unable to model a stable dimer (or other oligomer) of this protein with high confidence using AlphaFold 3.0. Moreover, we were unable to detect endogenous CCDC32 co-immunoprecipitating with eGFP-CCDC32 (Fig. S6C). Thus, we believe that the ‘coiled-coil’ moniker, based solely on the alpha-helical content of the protein, is a misnomer. We have explained this in the main text.

      Minor comments:

      (1) The authors interchangeably use the term "flat CCPs" and "flat clathrin lattices". While these are indeed related, flat clathrin lattices have been also used to refer to "clathrin plaques". To avoid confusion, I suggest sticking to the term "flat CCPs" to refer to the CCPs which are in their early stages of maturation.

      Agreed. Thank you for the suggestion. We have renamed these structures flat clathrin assemblies, as they do not acquire the curvature needed to classify them as pits, and do not grow to the size that would classify them as plaques.

      Significance

      General assessment:

      CME drives the internalisation of hundreds of receptors and surface proteins in practically all tissues, making it an essential process for various physiological processes. This versatility comes at the cost of a large number of molecular players and regulators. To understand this complexity, unravelling all the components of this process is vital. The manuscript by Yang et al. gives an important contribution to this effort as it describes a new CME regulator, CCDC32, which acts directly at the main CME adaptor AP2. The link to disease is interesting, but the authors need to refine their experiments. The requirement for endogenous knockdown for recruitment of the tagged CCDC32 is unusual and requires further exploration.

      Advance:

      The increased frequency of abortive events presented by CCDC32 knockdown cells is very interesting, as it hints to an active mechanism that regulates the stabilisation and growth of clathrin coated pits. The exact way clathrin coated pits are stabilised is still an open question in the field.

      Audience:

      This is a basic research manuscript. However, given the essential role of CME in physiology and the growing number of CME players involved in disease, this manuscript can reach broader audiences.

      We thank the referee for recognizing the ‘interesting’ advances our studies have made and for considering these studies as ‘an important contribution’ to ‘an essential process for various physiological processes’ and able ‘to reach broader audiences’. We have addressed and reconciled the reviewer’s concerns in our revised manuscript. 

      Field of expertise of the reviewer:

      Clathrin mediated endocytosis, cell biology, microscopy, biochemistry.

      Reviewer #2:

      Evidence, reproducibility and clarity

      In this manuscript, the authors demonstrate that CCDC32 regulates clathrin-mediated endocytosis (CME). Some of the findings are consistent with a recent report by Wan et al. (2024 PNAS), such as the observation that CCDC32 depletion reduces transferrin uptake and diminishes the formation of clathrin-coated pits. The primary function of CCDC32 is to regulate AP2 assembly, and its depletion leads to AP2 degradation. However, this study did not examine AP2 expression levels. CCDC32 may bind to the appendage domain of AP2 alpha, but it also binds to the core domain of AP2 alpha.

      We thank the reviewer for drawing our attention to the Wan et al. paper, which appeared while this work was under review. However, our in vivo data are not fully consistent with the report from Wan et al. The discrepancies reveal a dual function of CCDC32 in CME that was masked by complete knockout vs. siRNA knockdown of the protein, and were also likely affected by the position of the GFP tag (C- vs. N-terminal) on this small protein. Thus:

      -  Contrary to Wan et al., we do not detect any loss of AP2 expression (see new Figure S3A-B) upon siRNA knockdown. Most likely the ~40% residual CCDC32 present after siRNA knockdown is sufficient to fulfill its catalytic chaperone function but not its structural role in regulating CME beyond the AP2 assembly step.  

      - Contrary to Wan et al., we have shown that CCDC32 indeed interacts with the intact AP2 complex (Figure S3C and 6B,C): all 4 subunits of the AP2 complex co-IP with full-length eGFP-CCDC32. Interestingly, whereas full-length CCDC32 pulls down the intact AP2 complex, the ∆78-98 mutant retains its ability to pull down the β2-µ2 hemicomplex, but its interactions with α:σ2 are severely reduced. While this result is consistent with the report of Wan et al. that CCDC32 binds to the α:σ2 hemicomplex, it also suggests that the interactions between CCDC32 and AP2 are more complex and will require further studies.

      - Contrary to Wan et al., we provide strong evidence that CCDC32 is recruited to CCPs. Interestingly, modeling with AlphaFold 3.0 identifies a highly probable interaction between alpha helices encoded by residues 66-91 on CCDC32 and residues 418-438 on α. The latter are masked by µ2-C in the closed conformation of the AP2 core, but exposed in the open conformation triggered by cargo binding, suggesting that CCDC32 might only bind to membrane-bound AP2.

      Thus, our findings are indeed novel and indicate striking multifunctional roles for CCDC32 in CME, making the protein well worth further study. 

      (1) Besides its role in AP2 assembly, CCDC32 may potentially have another function on the membrane. However, there is no direct evidence showing that CCDC32 associates with the plasma membrane.

      We disagree; our data clearly show that CCDC32 is recruited to CCPs (Fig. 1B) and that CCPs that fail to recruit CCDC32 are short-lived and likely abortive (Fig. 1C). Wan et al. did not observe any colocalization of C-terminally tagged CCDC32 to CCPs, whereas we detect recruitment of our N-terminally tagged construct, which we also show is functional (Fig. 6F). Further, we have demonstrated the importance of the C-terminal region of CCDC32 in membrane association (see new Fig. S7). Thus, we speculate that a C-terminally tagged CCDC32 might not be fully functional. Indeed, SIM images of the C-terminally tagged CCDC32 in Wan et al. show large (~100 nm) structures in the cytosol, which may reflect aggregation.

      (2) CCDC32 binds to multiple regions on AP2, including the core domain. It is important to distinguish the functional roles of these different binding sites.

      We have localized the AP2-ear binding region to residues 78-99 and shown these to be critical for the functions we have identified. As described above, we now include data that are complementary to those of Wan et al. However, our data also clearly point to additional binding modalities. We agree that it will be important to map these additional interactions and identify their functional roles, but this is beyond the scope of this paper.

      (3) AP2 expression levels should be examined in CCDC32 depleted cells. If AP2 is gone, it is not surprising that clathrin-coated pits are defective.

      Agreed. We have examined AP2 expression by western blotting (Figure S3A-B) and detect no reduction in the levels of any of the AP2 subunits in CCDC32 siRNA knockdown cells. As stated above, this could be due to residual CCDC32 present in the siRNA KD vs. the CRISPR-mediated gene KO.

      (4) If the authors aim to establish a secondary function for CCDC32, they need to thoroughly discuss the known chaperone function of CCDC32 and consider whether and how CCDC32 regulates a downstream step in CME.

      Agreed. We have described the Wan et al paper, which came out while our manuscript was in review, in our Introduction.  As described above, there are areas of agreement and of discrepancies, which are thoroughly documented and discussed throughout the revised manuscript.  

      (5) The quality of Figure 1A is very low, making it difficult to assess the localization and quantify the data.

      The low signal:noise in Fig. 1A that the reviewer is concerned about is due to a diffuse distribution of CCDC32 on the inner surface of the plasma membrane. We now describe this binding more explicitly; we believe it reflects a specific interaction mediated by the C-terminus of CCDC32. Thus, the degree of diffuse membrane binding we observe follows: eGFP-CCDC32(FL) > eGFP-CCDC32(∆78-98) > eGFP-CCDC32(1-54) ~ eGFP/background (see new Fig. S7). Importantly, the colocalization of CCDC32 at CCPs is confirmed by the dynamic imaging of CCPs (Fig. 1B).

      (6) In Figure 6, why aren't AP2 mu and sigma subunits shown?

      Agreed. Not being aware of CCDC32’s possible dual role as a chaperone, we had assumed that the AP2 complex was intact.  We have now added this data in Figure 6 B,C and Fig. S3C, as discussed above. 

      Page 5, top, this sentence is confusing: "their surface area (~17 x 10 nm<sup>2</sup>) remains significantly less than that required for the average 100 nm diameter CCV (~3.2 x 103 nm<sup>2</sup>)."

      Thank you for the criticism. We have clarified the sentence and corrected a typo, which would definitely be confusing.  The section now reads,  “While the flat CCSs we detected in CCDC32 knockdown cells were significantly larger than in control cells (Fig. 4D, mean diameter of 147 nm vs. 127 nm, respectively), they are much smaller than typical long-lived flat clathrin lattices (d≥300 nm)(Grove et al., 2014). Indeed, the surface area of the flat CCSs that accumulate in CCDC32 KD cells (mean ~1.69 x 10<sup>4</sup> nm<sup>2</sup>) remains significantly less than the surface area of an average 100 nm diameter CCV (~3.14 x 10<sup>4</sup> nm<sup>2</sup>). Thus, we refer to these structures as ‘flat clathrin assemblies’ because they are neither curved ‘pits’ nor large ‘lattices’. Rather, the flat clathrin assemblies represent early, likely defective, intermediates in CCP formation.” 
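
      The two surface areas quoted in the revised sentence can be checked with a short calculation. This sketch assumes that a flat CCS is approximated as a circular disc with the quoted mean diameter and that a CCV is a sphere; the function names are introduced here for illustration. The authors' mean area was presumably averaged over individual structures, so small differences from the disc estimate are expected.

```python
import math


def flat_disc_area(diameter_nm):
    """Area of a flat circular patch: pi * (d/2)^2, in nm^2."""
    return math.pi * (diameter_nm / 2.0) ** 2


def sphere_surface_area(diameter_nm):
    """Surface area of a sphere: 4 * pi * r^2 = pi * d^2, in nm^2."""
    return math.pi * diameter_nm ** 2


kd_area = flat_disc_area(147.0)        # flat CCSs in CCDC32 knockdown cells
ctrl_area = flat_disc_area(127.0)      # flat CCSs in control cells
ccv_area = sphere_surface_area(100.0)  # average 100 nm diameter CCV

print(f"KD flat CCS:      {kd_area:.3e} nm^2")
print(f"control flat CCS: {ctrl_area:.3e} nm^2")
print(f"100 nm CCV:       {ccv_area:.3e} nm^2")
print(f"KD area / CCV area = {kd_area / ccv_area:.2f}")
```

      Under these assumptions the knockdown structures carry roughly half the clathrin-coated area of a complete 100 nm vesicle, consistent with the statement that they remain smaller than what a CCV requires.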

      Significance

      Overall, while this work presents some interesting ideas, it remains unclear whether CCDC32 regulates AP2 beyond the assembly step.

      Our responses above argue that we have indeed established that CCDC32 regulates AP2 beyond the assembly step. We have also identified several discrepancies between our findings and those reported by Wan et al., most notably binding between CCDC32 and mature AP2 complexes and the AP2-dependent recruitment of CCDC32 to CCPs.  It is possible that these discrepancies may be due to the position of the GFP tag (ours is N-terminal, theirs is C-terminal; we show that the N-terminal tagged CCDC32 rescues the knockdown phenotype, while Wan et al., do not provide evidence for functionality of the C-terminal construct). 

      Reviewer #3: 

      Evidence, reproducibility and clarity (Required): 

      In this manuscript, Yang et al. characterize the endocytic accessory protein CCDC32, which has implications in cardio-facio-neuro-developmental syndrome (CFNDS). The authors clearly demonstrate that the protein CCDC32 has a role in the early stages of endocytosis, mainly through the interaction with the major endocytic adaptor protein AP2, and they identify regions taking part in this recognition. Through live cell fluorescence imaging and electron microscopy of endocytic pits, the authors characterize the lifetimes of endocytic sites, the formation rate of endocytic sites and pits and the invagination depth, in addition to transferrin receptor (TfnR) uptake experiments. Binding between CCDC32 and CCDC32 mutants to the AP2 alpha appendage domain is assessed by pull down experiments. Together, these experiments allow deriving a phenotype of CCDC32 knock-down and CCDC32 mutants within endocytosis, which is a very robust system, in which defects are not so easily detected. A mutation of CCDC32, known to play a role in CFNDS, is also addressed in this study and shown to have endocytic defects.

      We thank the reviewer for their positive remarks regarding the quality of our data and the strength of our conclusions.  

      In summary, the authors present a strong combination of techniques, assessing the impact of CCDC32 in clathrin mediated endocytosis and its binding to AP2, whereby the following major and minor points remain to be addressed: 

      - The authors show that CCDC32 depletion leads to the formation of brighter and static clathrin coated structures (Figure 2), but that these were only prevalent to 7.8% and masked the 'normal' dynamic CCPs. At the same time, the authors show that the absence of CCDC32 induces pits with shorter life times (Figure 1 and Figure 2), the 'majority' of the pits.

      Clarification is needed as to how the authors arrive at these conclusions and these numbers. The authors should also provide (and visualize) the corresponding statistics. The same statement is made again later on in the manuscript, where the authors explain their electron microscopy data. Was the number derived from there? 

      These points are critical to understanding CCDC32's role in endocytosis and is key to understanding the model presented in Figure 8. The numbers of how many pits accumulate in flat lattices versus normal endocytosis progression and the actual time scales could be included in this model and would make the figure much stronger. 

      Thank you for these comments.  We understand the paradox between the visual impression and the reality of our dynamic measurements. We have been visually misled by this in previous work (Chen et al., 2020), which emphasizes the importance of unbiased image analysis afforded to us through the well-documented cmeAnalysis pipeline, developed by us (Aguet et al., 2013) and now used by many others (e.g. (He et al., 2020)). 

      The % of static structures was not derived from electron microscopy data, but quantified using cmeAnalysis, which automatically provides the lifetime distribution of CCPs. We have now clarified this in the manuscript and added a histogram (Fig. S4) quantifying the fraction of CCPs in lifetime cohorts <20s, 21-60s, 61-100s, 101-150s and >150s (static).
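
      The cohort fractions in such a histogram can be sketched as follows. This is not the cmeAnalysis implementation; it is a minimal illustration that assumes lifetimes in seconds, assigns a lifetime of exactly 20 s to the shortest cohort (the boundary is not specified in the response), and uses made-up lifetimes.

```python
from bisect import bisect_left
from collections import Counter

# Upper edges (inclusive) of the first four cohorts, in seconds.
COHORT_UPPER_EDGES = [20, 60, 100, 150]
COHORT_LABELS = ["<=20s", "21-60s", "61-100s", "101-150s", ">150s (static)"]


def cohort_label(lifetime_s):
    """Return the lifetime cohort that a single track falls into."""
    return COHORT_LABELS[bisect_left(COHORT_UPPER_EDGES, lifetime_s)]


def cohort_fractions(lifetimes_s):
    """Fraction of tracks in each cohort, in cohort order."""
    if not lifetimes_s:
        raise ValueError("need at least one lifetime")
    counts = Counter(cohort_label(t) for t in lifetimes_s)
    total = len(lifetimes_s)
    return {label: counts[label] / total for label in COHORT_LABELS}


# Hypothetical lifetimes (seconds) for ten tracked structures.
example_lifetimes = [12, 18, 35, 48, 75, 90, 120, 160, 200, 15]
for label, fraction in cohort_fractions(example_lifetimes).items():
    print(f"{label:>15}: {fraction:.0%}")
```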

      - In relation to the above point, the statistics of Figure 2E-G and the analysis leading there should also be explained in more detail: For example, what are the individual points in the plot (also in Figures 6G and 7G)? The authors should also use a few phrases to explain software they use, for example DASC, in the main text. 

      Each point in these bar graphs represents a movie, where n≥12. These details have been added to the respective figure legend. We have also added a brief description of DASC analysis in the text. 

      -  There are several questions related to the knock-down experiments that need to be addressed:

      Firstly, knock-down of CCDC32 does not seem to be very strong (Figure S2B). Can the level of knock-down be quantified? 

      We have now quantified the KD efficiency. It is ~60%. This turns out to be fortuitous (see responses to reviewer 2), as a recent publication, which came out after we completed our study, has shown by CRISPR-mediated knockout that CCDC32 also has an essential chaperone function required for AP2 assembly. We do not see any reduction in AP2 levels or its complex formation under our conditions (see new Supplemental Figure S3), which suggests that the effects of CCDC32 on CCP dynamics are more sensitive to CCDC32 concentration than is its role as a chaperone. Our phenotypes would have been masked by more efficient depletion of CCDC32.

      In page 6 it is indicated that the eGFP-CCDC32(1-54) and eGFP-CCDC32(∆78-98) constructs are siRNA-resistant. However in Fig S2B, these proteins do not show any signal in the western blot, so it is not clear if they are expressed or simply not detected by the antibody. The presence of these proteins after silencing endogenous CCDC32 needs to be confirmed to support Figures 6 and Figures 7, which critically rely on the presence of the CCDC32 mutants. 

      Unfortunately, the C-terminally truncated CCDC32 proteins are not detected because they lack the antibody epitope; indeed, even the ∆78-98 deletion is poorly detected (compare the GFP blot in new S1A with the anti-CCDC32 blot in S1B). However, these constructs contain the same siRNA-resistance mutation as the full-length protein. That they are expressed and siRNA-resistant can be seen in the GFP blot in Fig. S2A (now Fig. S1A).

      In Figures 6 and 7, siRNA knock-down of CCDC32 is only indicated for sub-figures F to G. Is this really the case? If not, the authors should clarify. The siRNA knock-down in Figure 1 is also only mentioned in the text, not in the figure legend. The authors should pay attention to make their figure legends easy to understand and unambiguous. 

      No, it is not the case.  Thank you for pointing out the uncertainty. We have added these details to the Figure legends and checked all Figure legends to ensure that they clearly describe the data shown.  

      - It is not exactly clear how the curves in Figure 3C (lower panel) on the invagination depth were obtained. Can the authors clarify this a bit more? For example, what are kT and kE in Figure 3A? What is I0? And how did the authors derive the logarithmic function used to quantify the invagination depth? In the main text, the authors say that the traces were 'logarithmically transformed'. This is not a technical term. The authors should refer to the actual equation used in the figure. 

      This analysis was developed by the Kirchhausen lab (Saffarian and Kirchhausen, 2008). We have added these details and reference them in the Figure legend and in the text. We also now use the more accurate descriptor ‘log-transformed’.
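
      For readers unfamiliar with the log transform, a generic sketch follows. It assumes an evanescent TIRF field that decays exponentially with distance z from the coverslip, I_TIRF(z) = I0 * exp(-z / h), while the epifluorescence signal does not depend on z. With both channels normalized to agree at the membrane, the depth is z = h * ln(I_EPI / I_TIRF). The function name, the normalization and the penetration depth h = 100 nm are assumptions for illustration, and the exact equation used in Figure 3 may differ.

```python
import math


def invagination_depth_nm(i_epi, i_tirf, penetration_depth_nm=100.0):
    """Estimate depth from the log of the EPI/TIRF intensity ratio.

    Assumes I_TIRF = I_EPI * exp(-z / h) after normalizing both channels
    so that their ratio is 1 at the plasma membrane (z = 0).
    penetration_depth_nm (h) is a hypothetical value, not taken from the paper.
    """
    if i_epi <= 0 or i_tirf <= 0:
        raise ValueError("intensities must be positive")
    return penetration_depth_nm * math.log(i_epi / i_tirf)


# Equal normalized intensities correspond to a structure at the membrane.
print(invagination_depth_nm(1.0, 1.0))
# A TIRF signal attenuated to 1/e of the EPI signal corresponds to z = h.
print(invagination_depth_nm(1.0, math.exp(-1.0)))
```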

      - In the discussion, the claim 'The resulting dysregulation of AP2 inhibits CME, which further results in the development of CFNDS.' is maybe a bit too strong of a statement. Firstly, because the authors show themselves that CME is perturbed, but by no means inhibited. Secondly, the molecular link to CFNDS remains unclear. Even though CCDC32 mutants seem to be responsible for CFNDS and one of the mutant has been shown in this study to have a defect in endocytosis and AP2 binding, a direct link between CCDC32's function in endocytosis and CFNDS remains elusive. The authors should thus provide a more balanced discussion on this topic. 

      We have modified and softened our conclusions, which now read that the phenotypes we see likely “contribute to” rather than “cause” the disease.

      - In Figure S1, the authors annotate the presence of a coiled-coil domain, which they also use later on in the manuscript to generate mutations. Could the authors specify (and cite) where and how this coiled-coil domain has been identified? Is this predicted helix indeed a coiled-coil domain, or just a helix, as indicated by the authors in the discussion?

      See response to Reviewer 1, point 4. We have changed this wording to alpha-helix. The ‘coiled-coil’ reference is historical and unlikely to be a true reflection of CCDC32 structure. AlphaFold 3.0 predictions were unable to identify with certainty any coiled-coil structures, even if we modelled potential dimers or trimers; and we find no evidence of dimerization of CCDC32 in vivo. We have clarified this in the text.

      Minor comments

      - In general, a more detailed explanation of the microscopy techniques used and the information they report would be beneficial to provide access to the article also to non-expert readers in the field. This concerns particularly the analysis methods used, for example: 

      How were the cohort-averaged fluorescence intensity and lifetime traces obtained? 

      How do the tools cmeAnalysis and DASC work? A brief explanation would be helpful. 

      We have expanded Methods to add these details, and also described them in the main text. 

      - The axis label of Figure 2B is not quite clear. What does 'TfnR uptake % of surface bound' mean? Maybe the authors could explain this in more detail in the figure legend? Is the drop in uptake efficiency also accessible by visual inspection of the images? It would be interesting to see that. 

      This is a standard measure of CME efficiency. 'TfnR uptake % of surface bound' = Internalized TfnR/Surface bound TfnR. Again, images may be misleading as defects in CME lead to increased levels of TfnR on the cell surface, which in turn would result in more Tfn uptake even if the rate of CME is decreased.
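
      The ratio defined above, and the reason images alone can mislead, can be illustrated with a small sketch. The function name and all intensity values are hypothetical and in arbitrary units.

```python
def tfnr_uptake_percent(internalized, surface_bound):
    """Internalized TfnR as a percentage of surface-bound TfnR."""
    if surface_bound <= 0:
        raise ValueError("surface-bound signal must be positive")
    return 100.0 * internalized / surface_bound


# Hypothetical signals (arbitrary units).
control = tfnr_uptake_percent(internalized=30.0, surface_bound=100.0)
knockdown = tfnr_uptake_percent(internalized=30.0, surface_bound=150.0)

print(f"control:   {control:.0f}% of surface bound")
print(f"knockdown: {knockdown:.0f}% of surface bound")
```

      In this made-up case the internalized signal is identical in both conditions, so the images would look alike, yet the efficiency per surface receptor drops because more TfnR has accumulated at the cell surface.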

      - Figure 4: How is the occupancy of CCPs in the plasma membrane measured? What are the criteria used to divide CCSs into Flat, Dome or Sphere categories? 

      We have expanded Methods to add these details. Based on the degree of invagination, the shapes of CCSs were classified as either: flat CCSs with no obvious invagination; dome-shaped CCSs that had a hemispherical or less invaginated shape with visible edges of the clathrin lattice; and spherical CCSs that had a round shape with the edges of the clathrin lattice not visible in 2D projection images. In most cases, the shapes were obvious in 2D PREM images. In uncertain cases, the degree of CCS invagination was determined using images tilted at ±10–20 degrees. The areas of CCSs were measured using ImageJ and used to calculate the CCS occupancy on the plasma membrane.
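
      One plausible reading of the occupancy calculation is the summed CCS area divided by the imaged plasma membrane area. The sketch below follows that reading; the function name, the unit choices and the numbers are assumptions for illustration and are not taken from the Methods.

```python
NM2_PER_UM2 = 1.0e6  # 1 square micrometre = 10^6 square nanometres


def ccs_occupancy_percent(ccs_areas_nm2, membrane_area_um2):
    """Percentage of the imaged membrane area covered by CCSs."""
    if membrane_area_um2 <= 0:
        raise ValueError("membrane area must be positive")
    covered_um2 = sum(ccs_areas_nm2) / NM2_PER_UM2
    return 100.0 * covered_um2 / membrane_area_um2


# Hypothetical measurements: three CCS outlines on 10 um^2 of membrane.
areas_nm2 = [1.2e4, 1.7e4, 2.1e4]
print(f"occupancy = {ccs_occupancy_percent(areas_nm2, 10.0):.2f}%")
```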

      - Figure 5B: Can the authors explain, where exactly the GFP was engineered into AP2 alpha? This construct does not seem to be explained in the methods section. 

      We have added this information to the Methods section. The construct, which corresponds to an insertion of GFP into the flexible hinge region of AP2, at aa649, was first described by Mino et al. (2020) and shown to be fully functional.

      - Figure S1B: The authors should indicate the colour code used for the structural model.

      We have expanded our structural modeling using AlphaFold 3.0 in light of the recent publication suggesting that CCDC32 interacts with the µ2 subunit and does not bind full-length AP2. These results are described in the text. The color coding now reflects certainty values given by AlphaFold 3.0 (Fig. S6B, D).

      - The list of primers referred to in the materials and methods section does not exist. There is a Table S1, but this contains different data. The actual Table S1 is not referenced in the main text. This should be done. 

      We apologize for this error. We have now added this information in Table S2.

      Significance (Required):

      In this study, the authors analyse a so-far poorly understood endocytic accessory protein, CCDC32, and its implication for endocytosis. The experimental tool set used, which allows CCP dynamics and invagination to be quantified, is clearly a strength of the article: it makes it possible to assess the impact of an accessory protein on the endocytic uptake mechanism, which is normally very robust to mutations. Only through this detailed analysis of endocytosis progression could the authors detect clear differences in the presence and absence of CCDC32 and its mutants. If the above points are successfully addressed, the study will provide very interesting and highly relevant work allowing a better understanding of the early phases in CME, with implications for disease.

      The study is thus of potential interest to an audience interested in CME, in disease and its molecular reasons, as well as for readers interested in intrinsically disordered proteins to a certain extent, claiming thus a relatively broad audience. The presented results may initiate further studies of the so-far poorly understood and less well known accessory protein CCDC32.

      We thank the reviewer for their positive comments on the significance of our findings and the importance of our detailed phenotypic analysis made possible by quantitative live cell microscopy. We also believe that our new structural modeling of CCDC32 and our findings of complex and extensive interactions with AP2 make the reviewer's point regarding intrinsically disordered proteins even more interesting and relevant to a broad audience. We trust that our revisions address the reviewer’s concerns.

      The field of expertise of the reviewer is structural biology, biochemistry and clathrin mediated endocytosis. Expertise in cell biology is rather superficial.

      References:

      Aguet, F., Costin N. Antonescu, M. Mettlen, Sandra L. Schmid, and G. Danuser. 2013. Advances in Analysis of Low Signal-to-Noise Images Link Dynamin and AP2 to the Functions of an Endocytic Checkpoint. Developmental Cell. 26:279-291.

      Chen, Z., R.E. Mino, M. Mettlen, P. Michaely, M. Bhave, D.K. Reed, and S.L. Schmid. 2020. Wbox2: A clathrin terminal domain–derived peptide inhibitor of clathrin-mediated endocytosis. Journal of Cell Biology. 219.

      Grove, J., D.J. Metcalf, A.E. Knight, S.T. Wavre-Shapton, T. Sun, E.D. Protonotarios, L.D. Griffin, J. Lippincott-Schwartz, and M. Marsh. 2014. Flat clathrin lattices: stable features of the plasma membrane. Mol Biol Cell. 25:3581-3594.

      He, K., E. Song, S. Upadhyayula, S. Dang, R. Gaudin, W. Skillern, K. Bu, B.R. Capraro, I. Rapoport, I. Kusters, M. Ma, and T. Kirchhausen. 2020. Dynamics of Auxilin 1 and GAK in clathrin-mediated traffic. J Cell Biol. 219.

      Mino, R.E., Z. Chen, M. Mettlen, and S.L. Schmid. 2020. An internally eGFP-tagged α-adaptin is a fully functional and improved fiduciary marker for clathrin-coated pit dynamics. Traffic. 21:603-616.

      Saffarian, S., and T. Kirchhausen. 2008. Differential evanescence nanometry: live-cell fluorescence measurements with 10-nm axial resolution on the plasma membrane. Biophys J. 94:2333-2342.

    1. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity):

      Minor comments:

      In the results section (lines 498-499), the authors describe free kinetochores in many cells without associated spindle microtubules. However, some nuclei appear to have kinetochores, as presented in Figure 6. Could the authors clarify how this conclusion was derived using transmission electron microscopy (TEM) without serial sectioning, as this is not explicitly mentioned in the materials and methods?

      We observed free kinetochores in the ALLAN-KO parasites with no associated spindle microtubules (see Fig. 6Gh), while kinetochores are attached to spindle microtubules in WT-GFP cells (see Fig. 6Gc). To provide further evidence we analysed additional images and found that ALLAN-KO cells have free kinetochores in the centre of the nucleus, unattached to spindle microtubules. We provide additional images clearly showing free kinetochores in these cells (new supplementary Fig. S11).

      However, in the ALLAN mutant, this difference is not absolute: in a search of over 50 cells, one example of a cell with a “normal” nuclear spindle and attached kinetochores was observed.

      The use of serial sectioning has limitations for examining small structures like kinetochores in whole cells. The limitations of the various techniques (for example, SBF-SEM vs tomography) are highlighted in our previous study (Hair et al 2022; PMID: 38092766), and we consider that examining a population of randomly sectioned cells provides a better understanding of the overall incidence of specific features.

      Discussion Section:

      Could the authors expand on why SUN1 and ALLAN are not required during asexual replication, even though they play essential roles during male gametogenesis?

      We observed no phenotype in asexual blood stage parasites associated with the sun1 and allan gene deletions. Several other Plasmodium berghei gene knockout parasites with a phenotype in sexual stages, for example CDPK4 (PMID: 15137943), SRPK (PMID: 20951971), PPKL (PMID: 23028336) and kinesin-5 (PMID: 33154955) have no phenotype in blood stages, so perhaps this is not surprising. One explanation may be the substantial differences in the mode of cell division between these two stages. Asexual blood stages produce new progeny (merozoites) over 24 hours with closed mitosis and asynchronous karyokinesis during schizogony, while male gametogenesis is a rapid process, completed within 15 min to produce eight flagellated gametes. During male gametogenesis the nuclear envelope must expand to accommodate the increased DNA content (from 1N to 8N) before cytokinesis. Furthermore, male gametogenesis is the only stage of the life cycle to make flagella, and axonemes must be assembled in the cytoplasm to produce the flagellated motile male gametes at the end of the process. Thus, these two stages of parasite development have some very different and specific features.

      Lines 611-613 state: "These loops serve as structural hubs for spindle assembly and kinetochore attachment at the nuclear MTOC, separating nuclear and cytoplasmic compartments." Could the authors elaborate on the evidence supporting this statement?

      We observed the loops/folds in the nuclear envelope (NE) as revealed by SUN1-GFP and 3D TEM images during male gametogenesis. These folds/loops occur mainly in the vicinity of the nuclear MTOC where the spindles are assembled (as visualised by EB1 fluorescence) and attached to kinetochores (as visualised by NDC80 fluorescence). These loops/folds may form due to the contraction of the spindle pole back to the nuclear periphery, inducing distortion of the NE. Since there is no physical segregation of chromosomes during the three rounds of mitosis (DNA increasing from 1N to 8N), we suggest that these folds provide additional space for spindle and kinetochore dynamics within an intact NE to maintain separation from the cytoplasm (as shown by location of kinesin-8B).

      In lines 621-622, the authors suggest that ALLAN may have a broader role in NE remodelling across the parasite's lifecycle. Could they reflect on or remind readers of the finding that ALLAN is not essential during the asexual stage?

      ALLAN-GFP is expressed throughout the parasite life cycle but as the reviewer points out, a functional role is more pronounced during male gametogenesis. This does not mean that it has no role at other stages of the life cycle even if there is no obvious phenotype following deletion of the gene during the asexual blood stage. The fact that ALLAN is not essential during the asexual blood stage is noted in lines 628-29.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Introduction

      Line 63: The authors state: "NE is integral to mitosis, supporting spindle formation, kinetochore attachment, and chromosome segregation..". Seemingly at odds, they also say (Line 69) that 'open' mitosis is "characterized by complete NE disassembly".

      The authors could explain better the ideas presented in their quoted review from Dey and Baum, which points out that truly 'open' and 'closed' topologies may not exist and that even in 'open' mitosis, remnants of the NE may help support the mitotic spindle.

      We have modified the sentence in which we discuss current opinions about ‘open’ and ‘closed’ mitosis. It is believed that the NE is not completely disassembled during open mitosis, nor does it remain completely intact during closed mitosis. In fact, the NE plays a critical role in MTOC organisation and spindle dynamics in the different modes of mitosis. Please see the modified lines 64-71.

      Results

      Fig 7 is the final figure; but would be more useful upfront.

      We have provided a new introductory figure (Fig 1) showing a schematic of conventional/canonical LINC complexes and evidence of SUN protein functions in model eukaryotes, and comparing them with what is known in apicomplexans.

      Fig 1D. The authors generated C-terminal GFP-tagged SUN1 transfectants and used ultrastructure expansion microscopy (U-ExM) and structured illumination microscopy (SIM) to examine SUN1-GFP in male gametocytes post-activation. The immuno-labelling of SUN1-GFP in these fixed cells appears very different to the live cell images of SUN1-GFP. The labelling profile comprises distinct punctate structures (particularly in the U-ExM images), suggesting that the paraformaldehyde fixation process, followed by the addition of the primary and secondary antibodies, has caused coalescing of the SUN1-GFP signal into particular regions within the NE.

      We agree with the reviewer. Fixation with paraformaldehyde (PFA) results in a coalescence of the SUN1-GFP signal. We have also tried methanol fixation (see new Fig. S2), but a similar problem was encountered.

      Given these fixation issues, the suggestion that the SUN1-GFP signal is concentrated at the BB/ nuclear MTOC and "enriched near spindle poles" needs further support.

      These statements seem at odds with the data for live cell imaging, where the SUN1-GFP seems evenly distributed around the nuclear periphery. Can the observation be quantitated by calculating the percentage of BB/ nuclear MTOC structures with associated SUN1-GFP puncta? If not, I am not convinced these data help understand the molecular events.

      We agree with the reviewer that, whereas live cell imaging showed an even distribution of the SUN1-GFP signal, after fixation with either PFA or methanol SUN1-GFP puncta are observed in addition to the peripheral location around the stained DNA (Hoechst) (see Fig. S2; puncta are indicated by arrows). These SUN1-GFP-labelled puncta were observed at the junction of the nuclear MTOC and the basal body (Fig. 2F). Quantification of the distribution showed that these SUN1-GFP puncta are associated with the nuclear MTOC in more than 90% of cells (18 cells examined). Live cell imaging of the dual-labelled parasites SUN1 x kinesin-8B (Fig. 2H) and SUN1 x EB1 (Fig. 2I) provides further support for the association of SUN1-GFP puncta with the BB (kinesin-8B) and the nuclear MTOC (EB1).

      The authors then generated dual transfectants and examined the relative locations of different markers in live cells. These data are more informative.

      The authors state; " ..SUN1-GFP marked the NE with strong signals located near the nuclear MTOCs situated between the BB tetrads". The nuclear MTOCs are not labelled in this experiment. The SUN1-GFP signal between the kinesin-8B puncta is evident as small puncta on regions of NE distortion. I would prefer to not describe this signal as "strong". The signal is stronger in other regions of the NE.

      We have modified the sentence on line 213 to accommodate this suggestion.

      Line 219. The authors state; "..SUN1-GFP is partially colocalized with spindle poles as indicated by EB1,.. it shows no overlap with kinetochores (NDC80)." The authors should provide an analysis of the level of overlap at a pixel by pixel level to support this statement.

      We now provide the overlap at a pixel-by-pixel level for representative images, and we have quantified more cells (n>30), as documented in the new Fig. S4A. We have also modified the sentence on line 219 to reflect these additions.
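      The response does not state which overlap metric was used. As a minimal sketch, assuming two commonly used pixel-wise measures (Pearson's correlation coefficient and a Manders-style overlap fraction), the analysis could be expressed as follows. The function names and the tiny intensity lists are hypothetical and stand in for flattened, background-subtracted image channels.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length lists of pixel intensities."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("channels must be non-empty and of equal length")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / math.sqrt(var_a * var_b)


def manders_fraction(a, b, b_threshold=0.0):
    """Fraction of channel-a intensity found in pixels where channel b > threshold."""
    total = sum(a)
    if total == 0:
        raise ValueError("channel a has no signal")
    return sum(x for x, y in zip(a, b) if y > b_threshold) / total


# Hypothetical flattened pixel intensities for three channels.
sun1 = [0, 10, 20, 0]
eb1 = [0, 5, 10, 0]    # signal in the same pixels as sun1
ndc80 = [10, 0, 0, 5]  # signal in different pixels

r_eb1 = pearson(sun1, eb1)
r_ndc80 = pearson(sun1, ndc80)
m_eb1 = manders_fraction(sun1, eb1)
m_ndc80 = manders_fraction(sun1, ndc80)
```

      Pearson's coefficient is sensitive to how intensities co-vary, whereas the Manders-style fraction reports how much of one signal sits in pixels occupied by the other, so the two are often reported together.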

      The SUN1 construct is C-terminally GFP-tagged. By analogy with human SUN1, the C-terminal SUN domain is expected to be in the NE lumen. That is in a different compartment to EB1, which is located in the nuclear lumen (on the spindle). Thus, the overlap of signal is expected to be minimal.

      We agree with the reviewer that the overlap between EB1 and SUN1 signals is expected to be minimal. We have quantified the data and included them in Supplementary Fig. S4A.

      Similarly, given that EB1 and NDC80 are known to occupy overlapping locations on the spindle, it seems unlikely that SUN1 can overlap with one and not the other.

      We agree with the reviewer’s analysis that EB1 and NDC80 occupy overlapping locations on the spindle, although the length of NDC80 is less at the ends of spindles (see Author response image 1A) as shown in our previous study where we compared the locations of two spindle proteins, ARK2 and EB1, with that of NDC80 (Zeeshan et al, 2022; PMID: 37704606). In the present study we observed that Sun1-GFP partially overlaps with EB1 at the ends of the spindle, but not with NDC80. Please see Author response image 1B.

      Author response image 1.

      I note on Line 609, the authors state "Our study demonstrates that SUN1 is primarily localized to the nuclear side of the NE.." As per Fig 7D, and as discussed above, the bulk of the protein, including the SUN1 domain, is located in the space between the INM and the ONM.

      We appreciate the reviewer’s correction; we have now modified the sentence to indicate that the protein is largely localized in the space between the INM and the ONM on line 617.

      Interestingly, as the authors point out, nuclear membrane loops are evident around EB1 and NDC80 focal regions. The data suggests that the contraction of the spindle pole back to the nuclear periphery induces distortion of the NE.

      We agree with the reviewer’s suggestion that the data indicate that contraction of spindle poles back to the nuclear periphery may induce distortion of the NE.

      The authors should discuss further the overlap of findings of this study with that from a recent manuscript (https://doi.org/10.1016/j.cels.2024.10.008). That Sayers et al. study identified a complex of SUN1 and ALLC1 as essential for male fertility in P. berghei. Sayers et al. also provide evidence that this complex participates in the linkage of the MTOC to the NE and is needed for correct mitotic spindle formation during male gametogenesis.

      We thank the reviewer for this suggestion. The study by Sayers et al. (2024) was published while our manuscript was under preparation. It was interesting to see that these complementary studies have similar findings about the role of SUN1 and the novel SUN1-ALLAN complex. Our study contains a more detailed, in-depth analysis of SUN1 by both expansion microscopy and TEM, and we include additional studies on the role of ALLAN. We discuss the overlap in the findings of the two studies in lines 590-605.

      While the work is interesting, the conclusions may need to be tempered. The authors suggest that in the absence of KASH-domain proteins, the SUN1-ALLAN complex forms a non-canonical LINC complex (that is, a connection across the NE) that "achieves precise nuclear and cytoskeletal coordination".

      We have toned down the wording of this conclusion in lines 665-677.

      In other organisms, KASH interacts with the C-terminal domain on SUN1, which as mentioned above is located between the INM and ONM. By contrast, ALLAN interacts with the N-terminal domain of SUN1, which is located in the nuclear lumen. The SUN1-ALLAN interaction is clearly of interest, and ALLAN might replace some of the roles of lamins. However, the protein that functionally replaces KASH (i.e. links SUN1 to the ONM) remains unidentified.

      We agree with the reviewer, and future studies will need to focus on identifying the KASH replacement that links SUN1 to the ONM.

      It may also be premature to suggest that the SUN1-ALLAN complex is a promising target for blocking malaria transmission. How would it be targeted?

      We have deleted the sentence that raised this suggestion.

      While the above datasets are interesting and internally consistent, there are two other aspects of the manuscript that need further development before they can usefully contribute to the molecular story.

      The authors undertook a transcriptomic analysis of Δsun1 and WT gametocytes, at 8 and 30 min post-activation, revealing moderate changes (~2-fold change) in different genes. GO-based analysis suggested up-regulation of genes involved in lipid metabolism. Given the modest changes, it may not be correct to conclude that "lipid metabolism and microtubule function may be critical functions for gametogenesis that can be perturbed by sun1 deletion." These changes may simply be a consequence of the stalled male gametocyte development.

      Following the reviewer’s suggestion we have moved these data to the supplementary information (Fig. S5D-I) and toned down their discussion in the results and discussion sections.

      The authors have then undertaken a detailed lipid analysis of the Δsun1 and WT gametocytes, before and after activation. Substantial changes in lipid metabolites might not be expected in such a short period of time. And indeed, the changes appear minimal. Similarly, there are only minor changes in a few lipid sub-classes between Δsun1 and WT gametocytes. In my opinion, the data are not sufficient to support the authors' conclusion that "SUN1 plays a crucial role, linking lipid metabolism to NE remodelling and gamete formation."

      In agreement with the reviewer’s comments we have moved these data to supplementary information (Fig. S6) and substantially toned down the conclusions based on these findings.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Major comments:

      My main concern with this manuscript is that the authors conclude not only that SUN1 is important for spindle formation and basal body segregation, but also that it influences lipid metabolism and NE dynamics. I don't think the data support this conclusion, for several reasons listed below. I would suggest removing this claim from the manuscript or at least toning it down unless more supporting data are provided, in particular showing any change in NE dynamics in the SUN1-KO. Instead I would recommend focusing on the more interesting role of SUN1-ALLAN in bipartite MTOC organisation, which likely explains all observed phenotypes (including those in later stages of the parasite life cycle). In addition, some aspects of the knockout phenotype should be quantified in more depth.

      In more detail:

      - The lipidomics analysis is clearly the weakest point of the manuscript: The authors state that there are significant changes in some lipid populations between WT and sun1-KO, and between activated and non-activated cells, yet no statistical analysis is shown and the error bars are quite high compared to only minor changes in the means. For some discussed lipids, the result text does not match the graphs, e.g. PA, where the increase upon activation is more pronounced in the SUN1-KO vs WT (contrary to the text), or MAG, which is reduced in the SUN1-KO vs WT (contrary to the text). I don't see the discussed changes in arachidonic acid levels and myristic acid levels in the data either. Even if the authors find after analysis some statistically significant differences between some groups, they should carefully discuss the biological significance of these differences. As it is, I do not think the presented data warrants the conclusion that deletion of SUN1 changes lipid homeostasis, but rather shows that overall lipid homeostasis is not majorly affected by gametogenesis or SUN1 deletion. As a minor comment, if you decide to keep the lipidomics analysis in the manuscript, please state how many replicates were done.

      As detailed above we have moved the lipidomics data to supplementary information (Fig. S6) and substantially toned down the discussion of these data in the results and discussion sections.

      - I can't quite follow the logic why the authors performed transcriptomic analysis of the SUN1 and how they chose their time points. Their data up to this point indicate that SUN1 has a structural or coordinating role in the bipartite MTOC during male gametogenesis. Based on that it is rather unlikely that SUN1 KO directly leads to transcriptional changes within the 8 min of exflagellation. Isn't it more likely that transcriptional differences are purely a downstream effect of incomplete/failed gametogenesis? This is particularly true for the comparison at 30 min, which compares a mixture of exflagellated/emerged gametes and zygotes in WT to a mixture of aberrant, arrested gametes in the knockout, which will likely not give any meaningful insight. The by far most significant GO-term is then also nuclear-transcribed mRNA catabolic process, which is likely not related at all to SUN1 function (and the authors do not even comment on this in the main text). I would therefore suggest removing the 30 min data set from this manuscript. As a minor point, I would suggest highlighting some of the top de-regulated gene IDs in the volcano plots and stating their function. Also, please state how you prepared the cells for the transcriptomes and in how many replicates this was done.

      As suggested by the reviewer we have removed the 30 min post activation data from the manuscript. We have also moved the rest of the transcriptomics data to supplementary information (Fig. S5) and toned down the presentation of this aspect of the work in the results and discussion sections.

      - Live-cell imaging of SUN1-GFP does nicely visualise the NE during gametogenesis, showing a highly dynamic NE forming loops and folds, which is very exciting to see. It would be beneficial to also show a video from the live-cell imaging.

      We have now added videos to the manuscript as suggested by the reviewer. Please see the supplementary Videos S1 and S2.

      In their discussion, the authors state multiple times that NE dynamics are changed upon SUN1 KO. Yet, they do not provide data supporting this claim, i.e. that the extended loops and folds found in the nuclear envelope during gametogenesis are affected in any way by the knockout of SUN1 or ALLAN. What happens to the NE in the absence of SUN1? Are there fewer loops and folds? In the absence of a reliable NE marker this may not be entirely easy to address, but at least some SBF-SEM images of the sun1-KO gametocytes could provide insight.

      It was difficult to provide SBF-SEM images as that work is beyond the scope of this manuscript. We will consider this approach in our future work. We re-examined many of our TEM images of SUN1-KO and ALLAN-KO parasites and did find some micrographs showing aberrant nuclear membrane folding (<5%) (Please see Author response image 2). However, we also observed similar structures in some of the WT-GFP samples (<5%), so we do not think this is a strong phenotype of the SUN1 or ALLAN mutants.

      Author response image 2.

       

      - I think the exciting part of the manuscript is the cell biological role of SUN1 on male gametogenesis, which could be carved out a bit more by a more detailed phenotyping. Specifically it would be good to quantify

      (1) If DNA replication to an octoploid state still occurs in SUN1-KO and ALLAN-KO,

      DNA replication is not affected in the SUN1-KO and ALLAN-KO mutants: DNA content increases to 8N (data added in Fig. 3J and Fig. S10F).

      (2) The proportion of anucleated gametes in WT and the KO lines

      We have added these data in Fig. 3K and Fig. S10G

      (3) A quantification of the BB clustering phenotype (in which proportion of cells do the authors see this phenotype). This could be addressed by simple fixed immunofluorescence images of the respective WT/KO lines at various time points after activation (or possibly by reanalysis of the already obtained images) and would really improve the manuscript.

      We have reanalysed the BB clustering phenotype and added the quantitative data in Fig. 4E and Fig. S7.

      Especially the claim that emerged SUN1-KO gametes lack a nucleus is currently only based on single slices of a few TEM cells and would benefit from a more thorough quantification in both SUN1- and ALLAN-KOs.

      We have examined many microgametes (100+ sections). In WT parasites a small proportion of gametes can appear to lack a nucleus if it does not extend all the way to the apical and basal ends (Hair et al. 2022). However, the proportion of microgametes that appear to lack a nucleus (no nucleus seen in any section) was much higher in the SUN1 mutant. In contrast, this difference was not as clear cut in the ALLAN mutant with a small proportion of intact (with axoneme and nucleus) microgametes being observed.

      We have done additional analysis of male gametes, looking for the presence of the nucleus by live cell imaging after DNA staining with Hoechst. These data are added in Fig. 3K (for Sun1-KO) and Fig. S10G (for Allan-KO).

      - The TEM suggests that in the SUN1-KO, kinetochores are free in the nucleus. Are all kinetochores free or do some still associate to a (minor/incorrectly formed) spindle? The authors could address this by tagging NDC80 in the KO lines.

      Our observation and quantification of the data indicated that 100% of kinetochores were attached to spindle microtubules and that 0% were unattached kinetochores in the WT parasites. However, the exact opposite was found for the SUN1 mutant with 100% unattached kinetochores and 0% attached. The result was not quite as clear cut in the ALLAN mutant, with 98% unattached and 2% attached. An important observation was the lack of separation of the nuclear poles and any spindle formation. Spindle formation was never or very rarely observed in the mutants.

      - Finally, I think it is curious that in contrast to SUN1, ALLAN seems to be less important, with some KO parasites completing the life cycle. Maybe a more detailed phenotyping as above gives some more hints to where the phenotypic difference between the two proteins lies. I would assume some ALLAN-KO cells can still segregate the basal body. Can the authors speculate/discuss in more detail why these two proteins seem to have slightly different phenotypes?

      We agree with the reviewer. Overall, the ALLAN-KO has a less prominent phenotype than that of the Sun1-KO. The main difference is that in the ALLAN-KO mutant some basal body segregation can occur, leading to the production of some fertile microgametocytes, and ookinetes, and oocyst formation (Fig. 8). Approximately 5% of oocysts sporulated to release infective sporozoites that could infect mice in bite back experiments and complete the life cycle. In contrast the Sun1-KO mutant made no healthy oocysts, or infective sporozoites, and could not complete the life cycle in bite back experiments. We have analysed the phenotype in detail and provide quantitative data for gametocyte stages by EM and ExM in Figs. 4 and S8 (SUN1) and Figs. 7 and S11 (ALLAN). We have also performed detailed analysis of oocyst and sporozoite stages and included the data in Fig. 3 (SUN1) and S10 (ALLAN).

      Based on the location, and functional and interactome data, we think that SUN1 plays a central role in coordinating nucleoplasm and cytoplasmic events as a key component of the nuclear membrane lumen, whereas ALLAN is located in the nucleoplasm. Deleting the SUN1 gene may disrupt the connection between INM and ONM whereas the deletion of ALLAN may affect only the INM.

      Some additional points where the data is not entirely sound yet or could be improved:

      - Localisation of SUN1: There seems to be a discrepancy between SUN1-GFP location as observed by live cell microscopy, and by Expansion Microscopy (ExM), similar for ALLAN-GFP. By live-cell microscopy, the SUN1 localisation is much more evenly distributed around the NE, while the localisation in ExM is much more punctate, and e.g. in Figure 1E seems to be within the nucleus. Do the authors have an explanation for this? Also, in Fig. 1D there are two GFP foci at the cell periphery (bottom left of the image), which I would think are not SUN1-Foci, as they seem to be outside of the cell. Is the antibody specific? Was there a negative control done for the antibody (WT cells stained with GFP antibodies after ExM)?

      High resolution SIM and expansion microscopy showed that the SUN1-GFP molecules coalesce to form puncta, in contrast to the more uniform distribution observed by live cell imaging. This apparent difference may be due to a better resolution that could not be achieved by live cell imaging. We agree with the reviewer that the two green foci are outside of the cell. As a negative control we have used WT-ANKA cells (which contain no GFP) and the anti-GFP antibody, which gave no signal. This confirms the specificity of the antibody (please see the new Fig. S3). 

      - The authors argue that SIM gave unexpected results due to PFA fixation leading to collapse of the NE loops. However, they also fix their ExM cells and their EM cells with PFA and do not observe a collapse, at least from what I see in the two presented images and in the 3D reconstruction. Is there something else different in the sample preparation?

      There was no difference in the fixation process for samples examined by SIM and ExM, but we used an anti-GFP antibody in ExM to visualise the SUN1-GFP, while in SIM the images of GFP signal were collected directly after fixation. We used both PFA and methanol as fixatives, and both methods showed a coalescing of the SUN1-GFP signal (please see the new Fig. S2 and S3).

      Can the authors trace their NE in ExM according to the NHS-Ester signal?

      We could trace the NE in the ExM by the NHS-ester signal and observed that the SUN1-GFP signal was largely coincident with the NE (Please see the new Fig. S3B).

      - Fig 2D: It would be good to not just show images of oocysts but actually quantify their size from images. Also, have the authors determined the sporozoite numbers in SUN1-KO?

      We have measured oocyst size (data added in new Fig. 3) and added the sporozoite quantification data in Fig. 3D.

      - Line 481-483: the authors state that oocyst size is reduced in ALLAN-KO but do not show the data. Please quantify oocyst size or at least show representative images. Also the drastic decrease in sporozoite numbers (Fig. 6D, E) is not mentioned in the text. Please add reference to Fig S7D when talking about the bite back data.

      We have added the oocyst size data in Fig. S10. We mention the changes in sporozoite numbers (now shown in Fig. 7D, E), and refer to the bite back data shown in current Fig. 7E.

      - Fig S1C, 6C: Both WB images are stitched, but this is not clearly indicated e.g. by leaving a small gap between the lanes. Also please show a loading control along with the western blots. Also there seems to be a (unspecific?) band in the control, running at the same height as Allan-GFP WB. What exactly is the control?

      We have provided the original blot showing the bands of ALLAN-GFP and SUN1-GFP. As a positive control, we used an RNA associated protein (RAP-GFP) that is highly expressed in Plasmodium and regularly used in our lab for this purpose.

      - Regarding the crossing experiment: The authors conclude from this cross that SUN1 is only needed in males, yet for this conclusion they would need to also show that a cross with a female line does not rescue the phenotype. The authors should repeat the cross with a male-deficient line to really test if the phenotype is an exclusively male phenotype. In addition, line 270-272 states that no oocysts/sporozoites were detected in sun1-ko and nek4-ko parasites. However, the figure 2E shows only oocysts, not sporozoites, and shows also that sun1-ko does form oocysts, albeit dead ones.

      We have now performed the experiment of crossing the Sun1-KO parasite line with a male deficient line (Hap2-KO) and added the data in Fig. 3I. We have added images showing sporozoites in oocysts.

      - In Fig S1 the authors show that they also generated a SUN1-mCherry line, yet they do not use it in any of the presented experiments (unless I missed it). Would it be beneficial to cross the SUN1-mCherry line with the Allan1-GFP line to test colocalisation (possibly also by expansion microscopy)?

      We did generate a SUN1-mCherry line, with the intent to cross the ALLAN-GFP and SUN1-mCherry lines and observe the co-localisation of the proteins. Despite multiple attempts, this cross was unsuccessful. This may be because the two proteins lie in such close proximity that the presence of both the GFP and mCherry tags hindered a proper protein-protein interaction between them.

      - Line 498: "In a significant proportion of cells" - What was the proportion of cells, and what does significant mean in this context?

      Approximately 67% of cells showed the clumping of BBs. We have now added the numbers in Figs. 6H and S11I.

      - The authors should discuss a bit more how their work relates to the work of Sayers et al. 2024, which also identified the SUN1-ALLAN complex. The paper is cited, but only very briefly commented on.

      We have extended this discussion now in lines 590-605.

      Suggestions how to improve the writing and data presentation.

      - General presentation of microscopy images: Considering that large parts of the manuscript are based on microscopy data, their presentation could be improved. Single-channel microscopy images would benefit from being depicted in gray scale instead of color, which would make it easier to see the structures and intensities (especially for blue channels).

      Whilst we agree with the reviewer, sometimes it is difficult to see the features in the merged images. Therefore, we would like to request to be allowed to retain the colours, which can be easily followed in both individual and merged images.

      Also, it would be good to harmonize in which panels arrows are shown (e.g. Fig 1G, where some white arrows are in the SUN1-GFP panel, while others are in the merge panel, but they presumably indicate the same thing). At the same time, Fig 1H doesn't have any white arrows, even though the figure legend states so.

      We apologise for this lack of consistency, and we have now added arrows wherever they were missing to harmonise the presentation.

      Fig 3A and S4 show the same experiment but are coloured in different colours (NHS-ester in green vs grey scale).

      - Are the scale bars of all expansion microscopy images adjusted for the expansion factor?

      Yes, the scale bars are adjusted accordingly.

      - The figure legends would benefit from streamlining, as they have very different style between figures (eg Fig. 6 which has a concise figure legend vs microscopy figures where figure legends are very long and describe not only the figure but the results)

      The figure legends have been streamlined, with removal of the description of results.

      - Line 155-156: The text makes it sound like the expression only happens after activation. Is that the case? Are these images activated or non-activated gametocytes?

      They are expressed before activation, but the signal intensifies after activation. Images from before and after activation of gametocytes have been added in Fig. S1F.

      - Line 267: Reference to the original nek4-KO paper missing

      This reference is now included.

      - Line 301: The reference to Figure 2J seems to be a bit arbitrarily placed. Also, this schematic of lipid metabolism is never discussed in relation to the transcriptomic or lipidomic data.

      We have moved these data to supplementary information and modified the text.

      - Line 347-349 states that gametes emerged, but the referenced figure shows activated gametocytes before exflagellation.

      We have corrected the text to the start of exflagellation.

      - Line 588: Spelling mistake in SUN1-domain

      Corrected.

      - Line 726/731: i missing in anti-GFP

      Corrected.

      - Line 787-789: statement of scale bar and number of cells imaged is not at the right position in the figure legend.

      Moved to the right place.

      - Line 779, 783: "shades of green" should be just "green". Same goes for line 986, 989 with "shades of grey"

      Changed.

      - Line 974, 976: please correct to WT-GFP and dsun1

      Corrected.

      - Line 1041, 1044: WT-GFP instead of WTGFP.

      Corrected to WT-GFP.

      - Fig 1B, D, E, Fig S1G, H: What are the time points of imaging?

      We have added the time points to the images in these figures.

      - Fig 1D/Line 727: the scale of the scale bar on the inset is missing.

      We have added the scale bar.

      - Fig 3 E-G and 6H-J: Please indicate total number of cells/images analysed per quantification, either in the graphs themselves or in the figure legend.

      We now indicate the number of cells analysed in individual figures and also in Fig. S5C and S8C, respectively.

      - Fig 5B: What is NP

      Nuclear Pole (NP), also known as the nuclear/acentriolar MTOC (Zeeshan et al 2022; PMID: 35550346).

      - Fig S1B/D: The legend states that there is an arrow indicating the band, but there is none.

      We have added the arrow.

      - Fig S2C: Is the scale bar really the same for the zygote and the ookinete?

      We have checked this and used the same scale bar for both zygote and ookinete.

      - Fig S3C, S7C: which stages was qRT-PCR done on?

      Gametocytes activated for 8 min.

      - Fig. S3D, S7D: According to the figure legend, three independent experiments were performed. How many mice were used per experiment? It would be good to depict the individual data points instead of the bar graph. For S7D, 3 data points are depicted (one in WT, two in allan-KO), what do they mean?

      The bite back experiment was performed using 15-20 mosquitoes infected with WT-GFP and gene knockout lines to feed on one naïve mouse each, in three different experiments. We have now included the data points in the bar diagrams.

      - Fig S3: Panel letters E and G are missing

      We have updated the lettering in current Fig. S5.

      - Fig 3D: Please indicate what those boxes are. I presume that these are the insets shown in b, e and j, but it is never mentioned. J is not even larger than i. Also, f is quite cropped, it would be good to see the large-scale image it comes from to see where in the nucleus these kinetochores are placed. Were there unbound kinetochores found in WT?

      We mention the boxes in the figure legends. It is rare to find unbound kinetochores in WT parasite. We provide large scale and zoomed-in images of free kinetochores in Fig. S8.

      - Fig S4: Insets are not mentioned in the figure legend. Please add scale bar to zoom-ins

      We now describe the insets in the figure legends and have added scale bars to the zoomed-in images.

      - Fig S5A, B: Please indicate which inset belongs to which sub-panel. Where does Ac stem from?

      We have now included the full image showing the inset (new Fig. S8).

      - Fig S5C and S8C: Change "DNA" to "Nucleus".

      We have changed “DNA” to “Nucleus”. Now they are Fig. S8K and S11I.

      Reviewer #3 (Significance):

      Yet, the statement that SUN1 is also important for lipid homoeostasis and NE dynamics is currently not backed up by sufficient data. I believe that the manuscript would benefit from removing the less convincing transcriptomic and lipidomic datasets and rather focus on more deeply characterising the cell biology of the knockouts. This way, the results would be interesting not only for parasitologists, but also for more general cell biologists.

      We have moved the lipidomics and transcriptomics data to supplementary information and toned down the emphasis on these data to make the manuscript more focused on the cell biology and analysis of the genetic KO data.

    1. Author response:

      We thank the reviewers for recognizing the strengths of our work, as well as for their thoughtful and constructive feedback. In this provisional response, we focus on the main concern raised—namely, the need for stronger evidence that the effect is specific to suicide. A full revision of the manuscript will follow, in which we will address this point in greater depth and respond carefully to all additional comments in a point-by-point manner.

      More specifically, reviewer 3 points out that “The main analyses control for illness duration and medication but not for symptom severity. The supplementary analysis in Figure S11 is insufficient as it mistakes the absence of evidence (i.e., p > 0.05) for evidence of absence.”. This is indeed an important point that we address below.

      (1) Correction for symptom severity.

      To address the request for evidence of specificity to suicidality beyond general symptom severity, we performed separate linear regressions to explain gambling behaviour, the value-insensitive approach parameter (β<sub>gain</sub>), and mood sensitivity to certain rewards (β<sub>CR</sub>), with group as a predictor (1 for the S<sup>+</sup> group and 0 for the S<sup>-</sup> group) and scores for anxiety and depression as covariates. Results remained significant after controlling for anxiety and depression (ps < 0.027).
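
      The structure of such a regression can be sketched as follows. This is a minimal illustration under stated assumptions, not our analysis code: the function names (`solve_linear`, `ols_fit`), the toy data, and the use of ordinary least squares through the normal equations are all hypothetical. It only shows how a group indicator (1 for S<sup>+</sup>, 0 for S<sup>-</sup>) enters the model alongside anxiety and depression scores, so that the group coefficient is estimated with the covariates held fixed.

```python
# Hypothetical sketch: OLS with a group dummy plus symptom covariates.
# All names and data are illustrative and are not taken from the study.

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols_fit(y, group, anxiety, depression):
    """Return [intercept, b_group, b_anxiety, b_depression]."""
    rows = [[1.0, g, a, d] for g, a, d in zip(group, anxiety, depression)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)

# Toy data constructed so that
# y = 0.5 + 2.0 * group + 0.1 * anxiety - 0.2 * depression (no noise).
group = [1, 1, 1, 1, 0, 0, 0, 0]
anxiety = [12.0, 15.0, 9.0, 20.0, 8.0, 11.0, 5.0, 14.0]
depression = [18.0, 22.0, 14.0, 25.0, 10.0, 16.0, 7.0, 12.0]
y = [0.5 + 2.0 * g + 0.1 * a - 0.2 * d
     for g, a, d in zip(group, anxiety, depression)]

coef = ols_fit(y, group, anxiety, depression)
# coef[1] is the group effect adjusted for anxiety and depression.
```

      In this toy example the group coefficient is recovered even though the two groups differ in both covariates, which is the sense in which a group effect is said to survive adjustment for symptom severity.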

      Author response table 1.

      Given the high correlations among the anxiety and depression questionnaires (rs > 0.753, ps < 0.001), we performed Principal Components Analysis (PCA) on the clinical questionnaires to extract orthogonal components, which explained 86.95%, 7.09%, 3.27%, and 2.68% of the variance, respectively. We then performed linear regressions using these components as covariates to control for anxiety and depression. Our main results remained significant (ps < 0.027).
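
      How the variance explained by each component arises can be sketched as below. The sketch assumes PCA on standardized scores (that is, on the correlation matrix) and uses a plain Jacobi eigenvalue routine; the function names and the four toy score columns are hypothetical, and the choice of standardization is an assumption for illustration only. It shows that strongly correlated questionnaires concentrate most of the variance in the first component, in the same way that the four proportions reported above are dominated by the first.

```python
import math

# Hypothetical sketch: proportion of variance explained by each principal
# component of correlated questionnaire scores. Data are illustrative.

def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length score columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        z.append([(v - mean) / sd for v in col])
    k = len(z)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1)
             for j in range(k)] for i in range(k)]

def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Angle that zeroes a[p][q]: tan(2*theta) = 2*a_pq / (a_qq - a_pp)
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def explained_variance_ratio(columns):
    """Share of total variance carried by each component, largest first."""
    eig = symmetric_eigenvalues(correlation_matrix(columns))
    total = sum(eig)
    return [e / total for e in eig]

# Four toy questionnaires scored for ten patients, all strongly correlated.
q1 = [10, 14, 19, 22, 27, 31, 34, 40, 43, 48]
q2 = [12, 13, 20, 24, 25, 33, 35, 38, 45, 47]
q3 = [9, 15, 18, 23, 28, 30, 36, 39, 42, 50]
q4 = [11, 16, 17, 21, 29, 32, 33, 41, 44, 46]
ratios = explained_variance_ratio([q1, q2, q3, q4])
```

      Because the components are orthogonal, entering them as covariates avoids the collinearity that would arise from entering the raw, highly correlated questionnaire scores together.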

      Author response table 2.

      We believe that these analyses provide evidence that the main effects on gambling and on mood were specific to suicide.

      (2) Evidence of absence of effect of symptom severity

      Based on clinical interviews, we included patients with and without suicidality (S<sup>+</sup> and S<sup>-</sup> groups). However, in line with the suicide-related literature (e.g., Tsypes et al., 2024), S<sup>+</sup> and S<sup>-</sup> differed substantially in the severity of symptoms (see Table 1). Although we median-split patients by their scores for general symptoms (e.g., depression and anxiety) and verified that there were no significant differences related to symptom severity (Figure S11), “absence of evidence” cannot be taken as “evidence of absence”. We therefore additionally conducted Bayesian analyses of gambling behavior, the value-insensitive approach parameter, and mood sensitivity to certain rewards. BF<sub>01</sub> is a Bayes factor comparing the null model (M<sub>0</sub>) to the alternative model (M<sub>1</sub>), where M<sub>0</sub> assumes no group difference. BF<sub>01</sub> > 1 indicates that the evidence favors M<sub>0</sub>. As can be seen below, most results supported the null hypothesis, suggesting that general symptoms of anxiety and depression overall did not influence our main results.
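
      The direction of BF<sub>01</sub> can be illustrated with a rough sketch. The example below does not reproduce the Bayesian tests used for the tables; it substitutes the common BIC approximation, BF<sub>01</sub> ≈ exp((BIC<sub>1</sub> − BIC<sub>0</sub>)/2), for a two-group comparison, and the function name `bf01_bic` and the toy data are hypothetical. M<sub>0</sub> fits one shared mean and M<sub>1</sub> fits a separate mean per group.

```python
import math

# Hypothetical sketch: BIC approximation to the Bayes factor BF01 for a
# two-group comparison. This is an assumed stand-in, not the prior-based
# Bayesian test behind the reported values.

def bf01_bic(group_a, group_b):
    """Approximate BF01 (null over alternative) from residual sums of squares.

    Assumes the residual sum of squares of the alternative model is non-zero.
    """
    pooled = list(group_a) + list(group_b)
    n = len(pooled)
    grand_mean = sum(pooled) / n
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    # M0: one shared mean (1 mean parameter).
    rss0 = sum((v - grand_mean) ** 2 for v in pooled)
    # M1: one mean per group (2 mean parameters).
    rss1 = (sum((v - mean_a) ** 2 for v in group_a)
            + sum((v - mean_b) ** 2 for v in group_b))
    bic0 = n * math.log(rss0 / n) + 1 * math.log(n)
    bic1 = n * math.log(rss1 / n) + 2 * math.log(n)
    return math.exp((bic1 - bic0) / 2)

# Identical groups: the extra parameter of M1 buys nothing, so BF01 > 1.
bf_null = bf01_bic([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
# Well-separated groups: M1 fits far better, so BF01 < 1.
bf_diff = bf01_bic([1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0])
```

      Unlike a non-significant p value, a BF<sub>01</sub> above 1 quantifies how much more likely the data are under the model with no group difference, which is why it can speak to evidence of absence.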

      Author response table 3.

      Overall, we believe that these analyses provide compelling evidence for the specificity of the effect to suicide, above and beyond depression and anxiety.

    1. Author Response

      eLife assessment

      This paper by Aitchison and colleagues describes nanobody neutralizing and binding activity against various SARS-CoV-2 variants of concern. The findings are important in that the described nanobodies may have broad therapeutic relevance against current and future variants of concern and may be able to avoid significant resistance. The claims are incomplete: while the study is well-executed and uses a nice balance of biochemical and cellular assays, the efficacy of the proposed nanobody library against VOCs is not completely supported as IC50 values appear to increase against newer variants and are higher than previously used therapeutic bNAbs, animal data showing in vivo efficacy is lacking, and protection against future possible variants is not proven.

      This manuscript is a follow-up to our previous eLife manuscript “Highly synergistic combinations of nanobodies that target SARS-CoV-2 and are resistant to escape” https://elifesciences.org/articles/73027 where we described an “impressive collection of hundreds of new nanobodies binding SARS-CoV-2 spike by combining in vivo antibody affinity maturation and proteomics. [Editor’s evaluation]”. As a follow-up, this submission extends the findings of our previous eLife publication and thus focuses on how our repertoire functions in the context of a rapidly evolving SARS-CoV-2 virus, relying on the established methodologies and approaches of the original paper. We explore how nanobody functions have been influenced by the emergence of SARS-CoV-2 variants containing extensive mutations in spike protein, which largely reduced the usefulness of monoclonal antibody therapeutics. Our findings show that while some nanobodies lost efficacy in binding to and neutralizing these evolved spikes, a surprising number of nanobodies retained their binding and neutralization activity. This is an important finding, because these efficacious nanobodies target regions that appear rarely targetable by monoclonal antibodies. We also provide experimental validation of the importance of the interplay between binding and neutralization in synergy experiments, where even weakened binding still contributed to strongly enhancing the neutralization.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Ketaren, Mast, Fridy et al. assessed the ability of a previously generated llama nanobody library (Mast, Fridy et al. 2021) to bind and neutralize SARS-CoV-2 delta and omicron variants. The authors identified multiple nanobodies that retain neutralizing and/or binding capacity against delta, BA.1 and BA.4/5. Nanobody epitope mapping on spike proteins using structural modeling revealed possible mechanisms of immune evasion by viral variants as well as mechanisms of cross-variant neutralization by nanobodies. The authors additionally identified two nanobody pairs involving non-neutralizing nanobodies that exhibited synergy in neutralization against the delta variant. These results enabled the refinement of target epitopes of the nanobody repertoire and the discovery of several pan-variant nanobodies for further preclinical development.

      Strengths:

      Overall, this study is well executed and provides a valuable framework for assessing the impact of emerging SARS-CoV-2 variants on nanobodies using a combination of in vitro biochemical and cellular assays as well as computational approaches. There are interesting insights generated from the epitope mapping analyses, which offer possible explanations for how delta and omicron variants escape nanobody responses, as well as how some nanobodies exhibit cross-variant neutralization capacity. These analyses laid out a clear path forward for optimizing these promising next-gen therapeutics, particularly in the face of rapidly emerging SARS-CoV-2 variants. This work will be of interest to researchers in the fields of antibody/nanobody engineering, SARS-CoV-2 therapeutics, and host-virus interaction.

      Weaknesses:

      A main weakness of the study is that the efficacy statement is not thoroughly supported. While the authors comprehensively characterized the neutralizing ability of nanobodies in vitro, there is no animal data involving mice or hamsters to demonstrate the real protective efficacy in vivo. Yet, in the title and throughout the manuscript, the authors repeatedly used phrases like "retains efficacy" or "remains efficacious" to describe the nanobodies' neutralization or binding capacities.

      This claim is not well supported by the data and underestimates the impact of variants on the nanobodies, especially the omicron sublineages. For example, the authors showed that S1-RBD-15 had a ~100-fold reduction in neutralization titer against Omicron, with an IC50 at around 1 uM. This is much higher than the IC50 value of a typical anti-ancestral RBD nanobody reported in the previous study (Mast, Fridy et al. 2021). In fact, the authors themselves ascribe nanobodies with an IC50 above 1 uM as weak neutralizers. And there were many in the range of 0.1-1 uM.

      Furthermore, many nanobodies selected for affinity measurement against BA.4/5 had no detectable binding.

      Without providing in vivo protection data or including monoclonal antibodies that are known to be efficacious against variants in the in vitro assays as a benchmark, it is difficult to evaluate the efficacy just with the IC50 values.

      We respectfully disagree with the reviewer on several aspects of this critique.

      As to our use of the word efficacy - the quality of being successful in producing an intended result; effectiveness - we were specific to nanobody binding and in vitro neutralization of the variant spike proteins tested in the manuscript. Indeed, our manuscript made no claim of efficacy outside of this intended meaning. However, to prevent misinterpretation we will modify the final paragraph of our introduction to state explicitly that the nanobody repertoire retains efficacy in binding and neutralizing variants of spike. The final paragraph of the Introduction will include the following:

      “Here, we demonstrate that a subset of our previously published repertoire of nanobodies, generated against spike from the ancestral SARS-CoV-2 virus (Mast, Fridy et al. 2021), retains binding and in vitro neutralization efficacy against circulating variants of concern (VoC), including omicron BA.4/BA.5.”

      We agree that in vivo neutralization data would be an important complement to the in vitro binding and neutralization data. Experiments along these lines are ongoing, but are not considered part of a follow-up to our original paper where in vivo data were not included.

      We disagree with the Reviewer that “This claim is not well supported by the data and underestimates the impact of variants on the nanobodies, especially the omicron sublineages.” As we specifically state: “In comparison, groups I, I/II, I/IV, V, VII, VIII and the anti-S2 nanobodies contained the majority of omicron BA.1 neutralizers, though here the neutralization potency of many nanobodies was decreased compared to wild-type. This decrease in neutralization potency largely correlates with the accumulation of omicron BA.1 specific mutations throughout the RBD, which likely alters the epitope-binding site of these nanobodies, weakening their interaction with BA.1 spike (Fig. 1B). (emphasis added)”

      Naturally, we expected that some of our nanobodies would lose the ability to bind BA.4/BA.5. This enabled us to determine which areas on spike remained susceptible to our nanobodies. We show that 10/29 nanobodies tested retained binding to BA.4/5. We did not test our entire repertoire; only a selected subset was tested. We stated the following:

      “Of the nanobodies that neutralized both delta and omicron BA.1, representatives from each of the nanobody epitope groups were selected for SPR analysis, where S1 binders with mapped epitopes that neutralized one or both variants well, were prioritized.”

      Reviewer #2 (Public Review):

      Summary:

      Interest in using nanobodies for therapeutic interventions in infectious diseases is growing due to their ability to bind hidden or cryptic epitopes that are inaccessible to conventional immunoglobulins. In the present study, the authors were posed (sic) to characterize nanobodies derived from the library produced earlier with the Wuhan strain of SARS-CoV-2, map their epitopes on SARS-CoV-2 spike protein, and demonstrate that some nanobodies retain binding and even neutralization against antigenically distant Variants of Concern (VOCs) that are currently circulating.

      Strengths:

      The authors demonstrate that some nanobodies - despite being obtained against the ancestral virus strain - retain high affinity binding to antigenically distant SARS-CoV-2 strains. This is despite the majority of the repertoire losing binding. Although limited to only two nanobody combinations, the demonstration of synergy in virus neutralization between nanobodies targeting different epitopes is compelling.

      We thank the Reviewer for this positive summary of the strengths of our study. In our previous work, we applied stringent criteria for the down-selection of nanobodies based on their affinity and diversity, as elaborated on in https://elifesciences.org/articles/73027. The current dataset is a further judiciously curated subset, featuring 41 nanobodies chosen to represent and inform on the 10 structurally mapped epitope groups that we initially identified. This subset is but the tip of an iceberg. For each nanobody demonstrating high-affinity binding and neutralization, we possess multiple sequence variants, offering alternative avenues for investigation. Moreover, our repertoire has since been further elaborated by use of a yeast display library (Cross et al., 2023 JBC) providing additional nanobodies capable of targeting the same epitopes. Our findings presented here thus serve as a heuristic, enabling us to distill the much larger repertoire into manageable and informative clusters of data. We will modify our manuscript to be more explicit about these facts.

      Weaknesses:

      The authors imply that nanobodies that retain binding/neutralization of early Omicron sublineages will be active against currently circulating and future virus strains. Unfortunately, no reasoning for such a conclusion nor data supporting this prediction are provided.

      The nanobodies we propose to retain binding to current and emerging omicron sublineages at the time (Fig. 4) are those that still bind to omicron BA.1, BA.4/5. The structures of XBB and BQ.1 are not divergent enough from these aforementioned omicron sublineages in the regions we propose our nanobodies retain binding (Fig. 4) to result in loss of binding. Thus, we hypothesize that the epitopes where these nanobodies bind or are predicted to bind (outlined in black (Fig. 4)), represent regions on spike vulnerable to nanobody intervention. Importantly, we also now have further experimental data to support our predictions that these nanobodies in Fig. 4 will retain binding (see plot in Author response image 1). We will provide additional data and complements to key figures to help illustrate this in the revised manuscript.

      Author response image 1.

    1. Author response:

      eLife assessment

      This study is a detailed investigation of how chromatin structure influences replication origin function in yeast ribosomal DNA, with focus on the role of the histone deacetylase Sir2 and the chromatin remodeler Fun30. Convincing evidence shows that Sir2 does not affect origin licensing but rather affects local transcription and nucleosome positioning which correlates with increased origin firing. However, the evidence remains incomplete as the methods employed do not rigorously establish a key aspect of the mechanism, fully address some alternative models, or sufficiently relate to prior results. Overall, this is a valuable advance for the field that could be improved to establish a more robust paradigm.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper presents a mechanistic study of rDNA origin regulation in yeast by SIR2. Each of the ~180 tandemly repeated rDNA gene copies contains a potential replication origin. Early-efficient initiation of these origins is suppressed by Sir2, reducing competition with origins distributed throughout the genome for rate-limiting initiation factors. Previous studies by these authors showed that SIR2 deletion advances replication timing of rDNA origins by a complex mechanism of transcriptional de-repression of a local PolII promoter causing licensed origin proteins (MCM complexes) to re-localize (slide along the DNA) to a different (and altered) chromatin environment. In this study, they identify a chromatin remodeler, FUN30, that suppresses the sir2∆ effect, and remarkably, results in a contraction of the rDNA to about one-quarter its normal length/number of repeats, implicating replication defects of the rDNA. Through examination of replication timing, MCM occupancy and nucleosome occupancy on the chromatin in sir2, fun30, and double mutants, they propose a model where nucleosome position relative to the licensed origin (MCM complexes) intrinsically determines origin timing/efficiency. While their interpretations of the data are largely reasonable and can be interpreted to support their model, a key weakness is the connection between Mcm ChEC signal disappearance and origin firing. While the cyclical chromatin association-dissociation of MCM proteins with potential origin sequences may be generally interpreted as licensing followed by firing, dissociation may also result from passive replication and as shown here, displacement by transcription and/or chromatin remodeling.

      While it is true that both transcription and passive replication can cause the signal of MCM-ChEC to disappear, neither can cause selective disappearance of the displaced complex without affecting the non-displaced complex. Indeed, in the case of transcription, RNA polymerase transcribing C-pro would have to first dislodge the normally positioned MCM complex before even reaching the displaced complex. Furthermore, deletion of FUN30 leads to both more C-pro transcription and less disappearance of the displaced MCM complex. It is important to keep in mind that this cannot somehow reflect continuous replenishment of displaced MCMs with newly loaded MCMs, since the cells are in S phase and licensing is restricted to G1.

      Moreover, linking its disappearance from chromatin in the ChEC method with such precise resolution needs to be validated against an independent method to determine the initiation site(s). Differences in rDNA copy number and relative transcription levels also are not directly accounted for, obscuring a clearer interpretation of the results.

      Copy number reduction of the magnitude caused by deletion of SIR2 and FUN30 does not suppress the sir2∆ effect (i.e. early replication of the rDNA), but rather exacerbates it. In particular, deletion of SIR2 and FUN30 causes the rDNA to shrink to approximately 35 copies. Kwan et al., 2023 (PMID: 36842087) have shown that reduction of rDNA copy number to 35 causes a dramatic acceleration of rDNA replication in a SIR2 strain. Thus, the effect of rDNA size on replication timing reinforces our conclusion that deletion of FUN30 suppresses rDNA replication.

      However, to address this concern directly, in the revision we will include 2D gels in fob1 strains with equal numbers of repeats, which allow us to conclude that the effect of FUN30 deletion in suppressing rDNA origin firing is independent of both rDNA size and FOB1. The figure showing the critical 2D gels is shown below in the reply to reviewer 2.

      Nevertheless, this paper makes a valuable advance with the finding of Fun30 involvement, which substantially reduces rDNA repeat number in sir2∆ background. The model they develop is compelling and I am inclined to agree, but I think the evidence on this specific point is purely correlative and a better method is needed to address the initiation site question. The authors deserve credit for their efforts to elucidate our obscure understanding of the intricacies of chromatin regulation. At a minimum, I suggest their conclusions on these points of concern should be softened and caveats discussed. Statistical analysis is lacking for some claims.

      Strengths are the identification of FUN30 as suppressor, examination of specific mutants of FUN30 to distinguish likely functional involvement. Use of multiple methods to analyze replication and protein occupancies on chromatin. Development of a coherent model.

      Weaknesses are failure to address copy number as a variable; insufficient validation of ChEC method relationship to exact initiation locus; lack of statistical analysis in some cases. 

      The two potential initiation sites that one would monitor (non-displaced and displaced) are separated by less than 150 base pairs, and other techniques simply do not have the resolution necessary to distinguish such differences. Furthermore, as we suggest in the manuscript, our results are consistent with a model in which it is only the displaced MCM complex that is activated, whether in sir2 or WT. If no genotype-dependent difference in initiation sites is even expected, it would be hard to interpret even the most precise replication-based assays. However, the reviewer is correct that this is a novel technique and that confirmation with a well-established technique would be reassuring; we are therefore performing ChIP experiments to corroborate, to the extent possible, the conclusions that we reached with ChEC.

      We appreciate the reviewer pointing out that some statistical analyses were lacking, and we will correct this in a revised manuscript.

      Additional background and discussion for public review:

      This paper broadly addresses the mechanism(s) that regulate replication origin firing in different chromatin contexts. The rDNA origin is present in each of ~180 tandem repeats of the rDNA sequence, representing a high potential origin density per length of DNA (9.1kb repeat unit). However, the average origin efficiency of rDNA origins is relatively low (~20% in wild-type cells), which reduces the replication load on the overall genome by reducing competition with origins throughout the genome for limiting replication initiation factors. Deletion of histone deacetylase SIR2, which silences PolII transcription within the rDNA, results in increased early activation or the rDNA origins (and reduced rate of overall genome replication). Previous work by the authors showed that MCM complexes loaded onto the rDNA origins (origin licensing) were laterally displaced (sliding) along the rDNA, away from a well-positioned nucleosome on one side. The authors' major hypothesis throughout this work is that the new MCM location(s) are intrinsically more efficient configurations for origin firing. The authors identify a chromatin remodeling enzyme, FUN30, whose deletion appears to suppress the earlier activation of rDNA origins in sir2∆ cells. Indeed, it appears that the reduction of rDNA origin activity in sir2∆ fun30∆ cells is severe enough to results in a substantial reduction in the rDNA array repeat length (number of repeats); the reduced rDNA length presumably facilitates it's more stable replication and maintenance.

      Analysis of replication by 2D gels is marginally convincing; using 2D gels for this purpose is very challenging and tricky to quantify. The more quantitative analysis by EdU incorporation is more convincing of the suppression of the earlier replication caused by SIR2 deletion.

      To address the mechanism of suppression, they analyze MCM positioning using ChEC, which in G1 cells shows partial displacement of MCM from normal position A to positions B and C in sir2∆ cells and similar but more complete displacement away from A to positions B and C in sir2fun30 cells. During S-phase in the presence of hydroxyurea, which slows replication progression considerably (and blocks later origin firing), MCM signals redistribute, which is interpreted to represent origin firing and bidirectional movement of MCMs (only one direction is shown), some of which accumulate near the replication fork barrier, consistent with their interpretation. They observe that MCMs displaced (in G1) to sites B or C in sir2∆ cells disappear more rapidly during S-phase, whereas a similar dynamic is not observed in sir2∆fun30∆. This is the main basis for their conclusion that the B and C sites are more permissive than A. While this may be the simplest interpretation, there are limitations with this assay that undermine a rigorous conclusion (additional points below). The main problem is that we know the MCM complexes are mobile, so disappearance may reflect displacement by other means, including transcription, which is high in the sir2∆ background. Indeed, the double mutant has a greater level of transcription per repeat unit, which might explain the greater displacement from A in G1. Thus, displacement might not always represent origin firing. Because the sir2 background profoundly changes transcription, and the double mutant has a much smaller array length associated with higher transcription, how can we rule out greater accessibility at site A, for example in sir2∆, leading to more firing, which is suppressed in sir2 fun30 due to greater MCM displacement away from A?

      I think the critical missing data to solidly support their conclusions is a definitive determination of the site(s) of initiation using a more direct method, such as strand-specific sequencing of EdU or nascent strand analysis. More direct comparisons of the strains with lower copy number are also needed to rule out this facet. As discussed in detail below, copy number reduction is known to suppress at least part of the sir2∆ effect, so this looms over the interpretations. I think they are probably correct in their overall model based on the simplest interpretation of the data, but I think it remains to be rigorously established. I think they should soften their conclusions in this respect.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors follow up on their previous work showing that in the absence of the Sir2 deacetylase the MCM replicative helicase at the rDNA spacer region is repositioned to a region of low nucleosome occupancy. Here they show that the repositioned displaced MCMs have increased firing propensity relative to non-displaced MCMs. In addition, they show that activation of the repositioned MCMs and low nucleosome occupancy in the adjacent region depend on the chromatin remodeling activity of Fun30.

      Strengths:

      The paper provides new information on the role of a conserved chromatin remodeling protein in the regulation of origin firing and in addition provides evidence that not all loaded MCMs fire and that origin firing is regulated at a step downstream of MCM loading.

      Weaknesses:

      The relationship between the author's results and prior work on the role of Sir2 (and Fob1) in regulation of rDNA recombination and copy number maintenance is not explored, making it difficult to place the results in a broader context. Sir2 has previously been shown to be recruited by Fob1, which is also required for DSB formation and recombination-mediated changes in rDNA copy number. Are the changes that the authors observe specifically in fun30 sir2 cells related to this pathway? Is Fob1 required for the reduced rDNA copy number in fun30 sir2 double mutant cells? 

      Strains lacking SIR2 have unstable rDNA size, and FOB1 deletion stabilizes rDNA size in the sir2 background. Likewise, FOB1 deletion influences the kinetics of rDNA size reduction in sir2 fun30 cells. However, the main effect of Fun30 in sir2 cells that we were interested in, suppression of rDNA replication, is preserved in the fob1 background, arguing that the observed effect is independent of Fob1 (see figure below). Given that the main focus of the paper is the regulation of rDNA origin activity and that these changes were independent of Fob1, we had elected not to include these results in the original manuscript but will gladly include them in the revision.

      Besides refuting the possible role of Fob1 in the FUN30-mediated activation of rDNA origin firing in sir2 cells, the use of the fob1 background enabled us to compare the activation of rDNA origins in the sir2 and sir2 fun30 strains with equally short rDNA size. The 2-D gels demonstrate a dramatic suppression of rDNA origin activity upon deletion of FUN30 in the sir2 fob1 strains with 35 rDNA copies.

      Author response image 1.

      The deletion of FUN30 diminishes the replication bubble signal in a fob1 sir2 strain with 35 rDNA copies by more than tenfold. The single rARS signal, marked with the arrow, originates from the rightmost rDNA repeat. This specific rightmost rDNA NheI fragment is approximately 25 kb in size, distinctly larger than the 4.7 kb NheI 1N rARS-containing fragments that originate from the internal rDNA repeats.

      Reviewer #3 (Public Review):

      Summary:

      Heterochromatin is characterized by low transcription activity and late replication timing, both dependent on the NAD-dependent protein deacetylase Sir2, the founding member of the sirtuins. This manuscript addresses the mechanism by which Sir2 delays replication timing at the rDNA in budding yeast. Previous work from the same laboratory (Foss et al. PLoS Genetics 15, e1008138) showed that Sir2 represses transcription-dependent displacement of the Mcm helicase in the rDNA. In this manuscript, the authors show convincingly that the repositioned Mcms fire earlier and that this early firing partly depends on the ATPase activity of the nucleosome remodeler Fun30. Using read-depth analysis of sorted G1/S cells, fun30 was the only chromatin remodeler mutant that somewhat delayed replication timing in sir2 mutants, while nhp10, chd1, isw1, htl1, swr1, isw2, and irc5 had no effect. The conclusion was corroborated with orthogonal assays including two-dimensional gel electrophoresis and analysis of EdU incorporation at early origins. Using an insightful analysis with an Mcm-MNase fusion (Mcm-ChEC), the authors show that the repositioned Mcms in sir2 mutants fire earlier than the Mcm at the normal position in wild type. This early firing at the repositioned Mcms is partially suppressed by Fun30. In addition, the authors show Fun30 affects nucleosome occupancy at the sites of the repositioned Mcm, providing a plausible mechanism for the effect of Fun30 on Mcm firing at that position. However, the results from the MNase-seq and ChEC-seq assays are not fully congruent for the fun30 single mutant. Overall, the results support the conclusions, providing a much better mechanistic understanding of how Sir2 affects replication timing at rDNA.

      The reason that the results for the fun30 single mutant appear incongruent, with a larger signal of the +2 nucleosome in the MNase-seq plot but a negligible signal in the ChEC-seq plot, is the paucity of displaced Mcm in the fun30 single mutant. Given the relative absence of displaced MCMs, the MCM-MNase fusion protein can't "light up" the +2 nucleosome. We will clarify this in the revision.

      Strengths

      (1) The data clearly show that the repositioned Mcm helicase fires earlier than the Mcm in the wild type position.

      (2) The study identifies a specific role for Fun30 in replication timing and an effect on nucleosome occupancy around the newly positioned Mcm helicase in sir2 cells.

      Weaknesses

      (1) It is unclear which strains were used in each experiment.

      (2) The relevance of the fun30 phospho-site mutant (S20AS28A) is unclear.

      (3) For some experiments (Figs. 3, 4, 6) it is unclear whether the data are reproducible and the differences significant. Information about the number of independent experiments and quantitation is lacking. This affects the interpretation, as fun30 seems to affect the +3 nucleosome much more than let on in the description.

      We appreciate the reviewer pointing out places in which our manuscript omitted key pieces of information (items 1 and 3), and we will fix these oversights in our revision. 

      With regard to point 2, we had written: 

      “Fun30 is also known to play a role in the DNA damage response; specifically, phosphorylation of Fun30 on S20 and S28 by CDK1 targets Fun30 to sites of DNA damage, where it promotes DNA resection (Chen et al. 2016; Bantele et al. 2017). To determine whether the replication phenotype that we observed might be a consequence of Fun30's role in the DNA damage response, we tested non-phosphorylatable mutants for the ability to suppress early replication of the rDNA in sir2; these mutations had no effect on the replication phenotype (Figure 2B), arguing against a primary role for Fun30 in DNA damage repair that somehow manifests itself in replication.”

      We will expand on this to clarify our point in the revision.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Argunşah et al. describe and investigate the mechanisms underlying the differential response dynamics of barrel vs septa domains of the whisker-related primary somatosensory cortex (S1). Upon repeated stimulation, the authors report that the response ratio between multi- and single-whisker stimulation increases in layer (L) 4 neurons of the septal domain, while remaining constant in barrel L4 neurons. This difference is attributed to the short-term plasticity properties of interneurons, particularly somatostatin-expressing (SST+) neurons. This claim is supported by the increased density of SST+ neurons found in L4 of the septa compared to barrels, along with a stronger response of (L2/3) SST+ neurons to repeated multi- vs single-whisker stimulation. The role of the synaptic protein Elfn1 is then examined. Elfn1 KO mice exhibited little to no functional domain separation between barrel and septa, with no significant difference in single- versus multi-whisker response ratios across barrel and septal domains. Consistently, a decoder trained on WT data fails to generalize to Elfn1 KO responses. Finally, the authors report a relative enrichment of S2- and M1-projecting cell densities in L4 of the septal domain compared to the barrel domain.

      Strengths:

      This paper describes and aims to study a circuit underlying differential response between barrel columns and septal domains of the primary somatosensory cortex. This work supports the view that barrel and septal domains contribute differently to processing single versus multi-whisker inputs, suggesting that the barrel cortex multiplexes sensory information coming from the whiskers in different domains.

      We thank the reviewer for the very neat summary of our findings that barrel cortex multiplexes converging information in separate domains.

      Weaknesses:

      While the observed divergence in responses to repeated SWS vs MWS between the barrel and septal domains is intriguing, the presented evidence falls short of demonstrating that short-term plasticity in SST+ neurons critically underpins this difference. The absence of a mechanistic explanation for this observation limits the work's significance. The measurement of SST neurons' response is not specific to a particular domain, and the Elfn1 manipulation does not seem to be specific to either stimulus type or a particular domain.

      We appreciate the reviewer’s perspective. Although further research is needed to understand the circuit mechanisms underlying the observed phenomenon, we believe our data suggest that altering the short-term dynamics of excitatory inputs onto SST neurons reduces the divergent spiking dynamics in barrels versus septa during repetitive single- and multi-whisker stimulation. Future work could examine how SST neurons, whose somata reside in barrels and septa, respond to different whisker stimuli and the circuits in which they are embedded. At this time, however, the authors believe there is no alternative way to test how the short-term dynamics of excitatory inputs onto SST neurons, as a whole, contribute to the temporal aspects of barrel versus septa spiking.

      The study's reach is further constrained by the fact that results were obtained in anesthetized animals, which may not generalize to awake states.

      We appreciate the reviewer’s concern regarding the generalizability of our findings from anesthetized animals to awake states. Anesthesia was employed to ensure precise individual whisker stimulation (and multi-whisker stimulation in the same animal), which is challenging in awake rodents due to active whisking. While anesthesia may alter higher-order processing, core mechanisms, such as short- and long-term plasticity in the barrel cortex, are preserved under anesthesia (Martin-Cortecero et al., 2014; Mégevand et al., 2009).

      The statistical analysis appears inappropriate, with the use of repeated independent tests, dramatically boosting the false positive error rate.

      Thank you for your feedback on our analysis using independent rank-based tests for each time point in wild-type (WT) animals. To address concerns regarding multiple comparisons and temporal dependencies (for Figure 1F and 4D for now but we will add more in our revision), we performed a repeated measures ANOVA for WT animals (13 Barrel, 8 Septa, 20 time points), which revealed a significant main effect of Condition (F(1,19) = 16.33, p < 0.001) and a significant Condition-Time interaction (F(19,361) = 2.37, p = 0.001). Post-hoc tests confirmed significant differences between Barrel and Septa at multiple time points (e.g., p < 0.0025 at times 3, 4, 6, 7, 8, 10, 11, 12, 16, 19 after Bonferroni posthoc correction), supporting a differential multi-whisker vs. single-whisker ratio response in WT animals. In contrast, a repeated measures ANOVA for knock-out (KO) animals (11 Barrel, 7 Septa, 20 time points) showed no significant main effect of Condition (F(1,14) = 0.17, p = 0.684) or Condition-Time interaction (F(19,266) = 0.73, p = 0.791), indicating that the Barrel-Septa difference observed in WT animals is absent in KO animals.
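
      As a rough illustration of the arithmetic behind the reported numbers, the sketch below computes a Bonferroni-corrected per-test threshold and the degrees of freedom expected for a two-group repeated measures design. The family-wise alpha of 0.05, the function names, and the assumption of complete data with no sphericity correction are ours for illustration and are not taken from the manuscript.

```python
# Minimal sketch: Bonferroni threshold and expected degrees of freedom for a
# two-group repeated measures (mixed) ANOVA. All names here are hypothetical,
# and alpha = 0.05 is an assumed family-wise error rate.

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def mixed_anova_dfs(n_group_a: int, n_group_b: int, n_timepoints: int) -> dict:
    """Degrees of freedom assuming complete data and no sphericity correction."""
    n_subjects = n_group_a + n_group_b
    df_error_between = n_subjects - 2          # two groups
    df_time = n_timepoints - 1
    return {
        "group": (1, df_error_between),
        "group_x_time": (df_time, df_time * df_error_between),
    }


# 20 time points tested post hoc, as described for the WT comparison.
threshold = bonferroni_threshold(alpha=0.05, n_tests=20)

# WT recordings: 13 barrel and 8 septa, 20 time points.
wt_dfs = mixed_anova_dfs(n_group_a=13, n_group_b=8, n_timepoints=20)

print(f"Bonferroni per-test threshold: {threshold:.4f}")
print(f"WT group effect df: {wt_dfs['group']}")
print(f"WT group x time df: {wt_dfs['group_x_time']}")
```

      Under these assumptions, the threshold and the WT degrees of freedom match the values quoted above for the WT analysis.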

      Furthermore, the manuscript suffers from imprecision; its conclusions are occasionally vague or overstated. The authors suggest a role for SST+ neurons in the observed divergence in SWS/MWS responses between barrel and septal domains. However, this remains speculative, and some findings appear inconsistent. For instance, the increased response of SST+ neurons to MWS versus SWS is not confined to a specific domain. Why, then, would preferential recruitment of SST+ neurons lead to divergent dynamics between barrel and septal regions? The higher density of SST+ neurons in septal versus barrel L4 is not a sufficient explanation, particularly since the SWS/MWS response divergence is also observed in layers 2/3, where no difference in SST+ neuron density is found.

      Moreover, SST+ neuron-mediated inhibition is not necessarily restricted to the layer in which the cell body resides. It remains unclear through which differential microcircuits (barrel vs septum) the enhanced recruitment of SST+ neurons could account for the divergent responses to repeated SWS versus MWS stimulation.

      We fully appreciate the reviewer’s comment. We currently do not provide any evidence on the contribution of SST neurons in the barrels versus septa in layer 4 to the response divergence of spiking observed in SWS versus MWS. We only show that these neurons differentially distribute in the two domains in this layer. It is certainly known that there is molecular and circuit-based diversity of SST-positive neurons in different layers of the cortex, so it is plausible that this includes cells located in the two domains of vS1, something which has not been examined so far. Our data on their distribution are one piece of evidence that SST neurons may have a differential role in inhibiting barrel stellate cells versus septa ones. Morphological reconstructions of SST neurons in L4 of the somatosensory barrel cortex have shown that their dendrites and axons project locally and may be confined to individual domains, even though this was not specifically examined (Fig. 3 of Scala F et al., 2019). The same study also showed that L4 SST cells receive excitatory input from local stellate cells, and it is known that they are also directly excited by thalamocortical fibers (Beierlein et al., 2003; Tan et al., 2008); both of these inputs facilitate.

      As shown in our supplementary figure, the divergence is also observed in L2/3 where, as the reviewer also points out, we do not have a differential distribution of SST cells, at least based on a columnar analysis extending from L4. There are multiple scenarios that could explain this “discrepancy” that one would need to examine further in future studies. One straightforward one is that the divergence in spiking in L2/3 domains may be inherited from L4 domains, on which L4 SST cells act. Another is that even though L2/3 SST neurons are not biased in their distribution, their input-output function is, something which one would need to examine by detailed in vitro electrophysiological and perhaps optogenetic approaches in S1. Despite the distinctive differences that have been found between the L4 circuitry in S1 and V1 (Scala F et al., 2019), recent observations indicate that small but regular patches of V1 marked by the absence of muscarinic receptor 2 (M2) have high temporal acuity (Ji et al., 2015), and selectively receive input from SST interneurons (Meier et al., 2025). Regions lacking M2 have distinct input and output connectivity patterns from those that express M2 (Meier et al., 2021; Burkhalter et al., 2023). These findings, together with ours, suggest that SST cells preferentially innervate and regulate specific domains -columns- in sensory cortices.

      Regardless of the mechanism, the Elfn1 knock-out mouse line almost exclusively affects the incoming excitability onto SST neurons (see also reply to comment below), hence what can be supported by our data is that changing the incoming short-term synaptic plasticity onto these neurons brings the spiking dynamics between barrels and septa closer together.

      The Elfn1 KO mouse model seems too unspecific to suggest the role of the short-term plasticity in SST+ neurons in the differential response to repeated SWS vs MWS stimulation across domains. Why would Elfn1-dependent short-term plasticity in SST+ neurons be specific to a pathway, or a stimulation type (SWS vs MWS)? Moreover, the authors report that Elfn1 knockout alters synapses onto VIP+ as well as SST+ neurons (Stachniak et al., 2021; previous version of this paper), so why attribute the phenotype solely to SST+ circuitry? In fact, the functional distinctions between barrel and septal domains appear largely abolished in the Elfn1 KO.

      Previous work by others and us has shown that globally removing Elfn1 selectively removes a synaptic process from the brain without altering brain anatomy or structure. This allows us to study how the temporal dynamics of inhibition shape activity, as opposed to inhibition from particular cell types. We will nevertheless update the text to discuss more global implications for SST interneuron dynamics and include a reference to VIP interneurons that contain Elfn1.

      When comparing SWS to MWS, we find that MWS replaces the neighboring excitation which would normally be preferentially removed by short-term plasticity in SST interneurons, thus providing a stable control comparison across animals and genotypes. On average, VIP interneurons failed to show modulation by MWS. We were unable to measure a substantial contribution of VIP cells to this process and also note that the Elfn1 expressing multipolar neurons comprise only ~5% of VIP neurons (Connor and Peters, 1984; Stachniak et al., 2021), a fraction that may be lost when averaging from 138 VIP cells. Moreover, the effect of Elfn1 loss on VIP neurons is quite different and marginal compared to that of SST cells, suggesting that the primary impact of Elfn1 knockout is mediated through SST+ interneuron circuitry. Therefore, even if we cannot rule out that these 5% of VIP neurons contribute to barrel domain segregation, we are of the opinion that their influence would be very limited if any.

      Reviewer #2 (Public review):

      Summary:

      Argunsah and colleagues demonstrate that SST-expressing interneurons are concentrated in the mouse septa and differentially respond to repetitive multi-whisker inputs. Identifying how a specific neuronal phenotype impacts responses is an advance.

      Strengths:

      (1) Careful physiological and imaging studies.

      (2) Novel result showing the role of SST+ neurons in shaping responses.

      (3) Good use of a knockout animal to further the main hypothesis.

      (4) Clear analytical techniques.

      We thank the reviewer for their appreciation of the study.

      Weaknesses:

      No major weaknesses were identified by this reviewer. Overall, I appreciated the paper but feel it overlooked a few issues and had some recommendations on how additional clarifications could strengthen the paper. These include:

      (1) Significant work from Jerry Chen on how S1 neurons that project to M1 versus S2 respond in a variety of behavioral tasks should be included (e.g. PMID: 26098757). Similarly, work from Barry Connor's lab on intracortical versus thalamocortical inputs to SST neurons, as well as excitatory inputs onto these neurons (e.g. PMID: 12815025) should be included.

      We thank the reviewer for these valuable resources that we overlooked. We will include Chen et al. (2015), Cruikshank et al. (2007) and Gibson et al. (1999) to contextualize S1 projections and SST+ inputs, strengthening the study’s foundation as well as Beierlein et al. (2003) which nicely show both local and thalamocortical facilitation of excitatory inputs onto L4 SST neurons, in contrast to PV cells. The paper also shows the gradual recruitment of SST neurons by thalamocortical inputs to provide feed-forward inhibition onto stellate cells (regular spiking) of the barrel cortex L4 in rat.

      (2) Using Layer 2/3 as a proxy to what is happening in layer 4 (~line 234). Given that layer 2/3 cells integrate information from multiple barrels, as well as receiving direct VPm thalamocortical input, and given the time window that is being looked at can receive input from other cortical locations, it is not clear that layer 2/3 is a proxy for what is happening in layer 4.

      We agree with the reviewer that what we observe in L2/3 is not necessarily what is taking place in L4 SST-positive cells. The data on L2/3 were included to show that these cells, as a population, can show divergent responses when it comes to SWS vs MWS, which is not seen in L2/3 VIP neurons. Regardless of the mechanisms underlying it, our overall data support that SST-positive neurons can change their activation based on the type of whisker stimulus and that, when the excitatory input dynamics onto these neurons change due to the removal of Elfn1, the recruitment of barrels vs septa spiking changes in the temporal domain. Having said that, the data shown in Supplementary Figure 3 on the response properties of L2/3 neurons above the septa vs above the barrels (one would say in the respective columns) do show the same divergence as in L4. This suggests that a circuit motif may exist that is common to both layers, involving SST neurons that sit in L4, L5 or even L2/3. This implies that despite the differences in the distribution of SST neurons in septa vs barrels of L4, there is an unidentified input-output spatial connectivity motif that engages in both L2/3 and L4. Please also see our response to a similar point raised by reviewer 1.

      (3) Line 267, when discussing distinct temporal response, it is not well defined what this is referring to. Are the neurons no longer showing peaks to whisker stimulation, or are the responses lasting a longer time? It is unclear why PV+ interneurons which may not be impacted by the Elfn1 KO and receive strong thalamocortical inputs, are not constraining activity.

      We thank the reviewer for their comment and will clarify the statement.

      This convergence of response profiles was also evident in stimulus-aligned stacked images, where the emergent differences between barrels and septa under SWS were largely abolished in the KO (Figure 4B). A distinction between directly stimulated barrels and neighboring barrels persisted in the KO. In addition, the initial response continued to differ between barrel and septa and also between septa and neighbor (Figure 4B). This initial stimulus selectivity potentially represents distinct feedforward thalamocortical activity, which includes PV+ interneuron recruitment that is not directly impacted by the Elfn1 KO (Sun et al., 2006; Tan et al., 2008). PV+ cells are strongly excited by thalamocortical inputs, but these exhibit short-term depression, as does their output, contrasting with the sustained facilitation observed in SST+ neurons. These findings suggest that in WT animals, activity spillover from principal barrels is normally constrained by the progressive engagement of SST+ interneurons in septal regions, driven by Elfn1-dependent facilitation at their excitatory synapses. In the absence of Elfn1, this local inhibitory mechanism is disrupted, leading to longer responses in barrels, delayed but stronger responses in septa, and persistently stronger responses in unstimulated neighbors, resulting in a loss of distinction between the responses of barrel and septa domains that normally diverge over time (see Author response image 1 below).

      Author response image 1.

      A) Barrel responses are longer following whisker stimulation in KO. B) Septal responses are slightly delayed but stronger in KO. C) Unstimulated neighbors show longer persistent responses in KO.

      (4) Line 585 "the earliest CSD sink was identified as layer 4..." were post-hoc measurements made to determine where the different shank leads were based on the post-hoc histology?

      Post hoc histology was performed on plane-aligned brain sections, which allowed us to detect barrels and septa and so confirm the insertion domain of each recorded shank. The layer specificity of each electrode could therefore not be confirmed by histology, as we did not have coronal sections in which to measure electrode depth.

      (5) For the retrograde tracing studies, how were the M1 and S2 injections targeted (stereotaxically or physiologically)? How was it determined that the injections were in the whisker region (or not)?

      During the retrograde virus injection, the location of M1 and S2 injections was determined by stereotaxic coordinates (Yamashita et al., 2018). After acquiring the light-sheet images, we were able to post hoc examine the injection site in 3D and confirm that the injections were successful in targeting the regions intended. Although it would have been informative to do so, we did not functionally determine the whisker-related M1 and whisker-related S2 region in this experiment.

      (6) Were there any baseline differences in spontaneous activity in the septa versus barrel regions, and did this change in the KO animals?

      Thank you for this interesting question. Our previous study found that there was a reduction in baseline activity in L4 barrel cortex of KO animals at postnatal day (P)12, but no differences were found at P21 (Stachniak et al., 2023).

      Reviewer #3 (Public review):

      Summary:

      This study investigates the functional differences between barrel and septal columns in the mouse somatosensory cortex, focusing on how local inhibitory dynamics, particularly involving Elfn1-expressing SST⁺ interneurons, may mediate temporal integration of multi-whisker (MW) stimuli in septa. Using a combination of in vivo multi-unit recordings, calcium imaging, and anatomical tracing, the authors propose that septa integrate MW input in an Elfn1-dependent manner, enabling functional segregation from barrel columns.

      Strengths:

      The core hypothesis is interesting and potentially impactful. While barrels have been extensively characterized, septa remain less understood, especially in mice, and this study's focus on septal integration of MW stimuli offers valuable insights into this underexplored area. If septa indeed act as selective integrators of distributed sensory input, this would add a novel computational role to cortical microcircuits beyond what is currently attributed to barrels alone. The narrative of this paper is intellectually stimulating.

      We thank the reviewer for finding the study intellectually stimulating.

      Weaknesses:

      The methods used in the current study lack the spatial and cellular resolution needed to conclusively support the central claims. The main physiological findings are based on unsorted multi-unit activity (MUA) recorded via low-channel-count silicon probes. MUA inherently pools signals from multiple neurons across different distances and cell types, making it difficult to assign activity to specific columns (barrel vs. septa) or neuron classes (e.g., SST⁺ vs. excitatory).

      The recording radius (~50-100 µm or more) and the narrow width of septa (~50-100 µm or less) make it likely that MUA from "septal" electrodes includes spikes from adjacent barrel neurons.

      The authors do not provide spike sorting, unit isolation, or anatomical validation that would strengthen spatial attribution. Calcium imaging is restricted to SST⁺ and VIP⁺ interneurons in superficial layers (L2/3), while the main MUA recordings are from layer 4, creating a mismatch in laminar relevance.

      We thank the reviewer for pointing out the possibility of contamination in septal electrodes. Although it is reported in the methods (line 583), it may not have been sufficiently highlighted that we used an extremely high threshold (7.5 std) for spike detection in order to overcome the issue raised here, which restricts such spatial contamination. Since the spike amplitude decays rapidly with distance, at high thresholds only nearby neurons, potentially one or two, contribute to our analysis. We believe that this approach provides a very close approximation of single unit activity (SUA) in our reported data. We will include a sentence earlier in the manuscript to make this explicit and prevent further confusion.
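
      To make the thresholding idea concrete, the following is a minimal sketch of amplitude-threshold spike detection on a single channel. It is not the pipeline used in the study: the spike polarity (negative-going), the use of the overall standard deviation as the noise estimate, the refractory window, and all names are illustrative assumptions; only the 7.5 std multiplier comes from the text above.

```python
import numpy as np

# Hypothetical sketch of threshold-based spike detection. Only the 7.5 std
# multiplier is taken from the response text; everything else is assumed.

def detect_spikes(trace, n_std=7.5, refractory_samples=30):
    """Return sample indices of negative-going threshold crossings."""
    trace = np.asarray(trace, dtype=float)
    centered = trace - np.median(trace)
    threshold = n_std * np.std(centered)       # assumed noise estimate
    below = centered < -threshold              # assumed spike polarity
    # A crossing is a sample below threshold whose predecessor is not.
    crossings = np.flatnonzero(below[1:] & ~below[:-1]) + 1
    kept, last = [], -refractory_samples
    for idx in crossings:
        if idx - last >= refractory_samples:   # skip re-triggers on one spike
            kept.append(int(idx))
            last = idx
    return np.array(kept, dtype=int), threshold


# Synthetic demonstration: unit-variance noise with three large deflections.
rng = np.random.default_rng(0)
signal = rng.normal(0.0, 1.0, 30_000)
true_spikes = [5_000, 15_000, 25_000]
for i in true_spikes:
    signal[i] = -20.0                          # stand-in for a nearby unit

spike_idx, thr = detect_spikes(signal)
print(spike_idx, round(float(thr), 2))
```

      With a multiplier this high, only deflections far above the noise floor are kept, which is the property the response relies on to limit contributions from distant neurons.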

      Regarding the point that calcium imaging was performed on L2/3 SST and VIP cells instead of L4, reviewers 1 and 2 brought up the same issue and we responded as follows. As shown in our supplementary figure, the divergence is also observed in L2/3 where we do not have a differential distribution of SST cells, at least based on a columnar analysis extending from L4. There are multiple scenarios that could explain this “discrepancy” that one would need to examine further in future studies. One straightforward one is that the divergence in spiking in L2/3 domains may be inherited from L4 domains, on which L4 SST cells act. Another is that even though L2/3 SST neurons are not biased in their distribution, their input-output function is, something which one would need to examine by detailed in vitro electrophysiological and perhaps optogenetic approaches in S1. Despite the distinctive differences that have been found between the L4 circuitry in S1 and V1 (Scala F et al., 2019), recent observations indicate that small but regular patches of V1 marked by the absence of muscarinic receptor 2 (M2) have high temporal acuity (Ji et al., 2015), and selectively receive input from SST interneurons (Meier et al., 2025). Regions lacking M2 have distinct input and output connectivity patterns from those that express M2 (Meier et al., 2021; Burkhalter et al., 2023). These findings, together with ours, suggest that SST cells preferentially innervate and regulate specific domains -columns- in sensory cortices.

      Furthermore, while the role of Elfn1 in mediating short-term facilitation is supported by prior studies, no new evidence is presented in this paper to confirm that this synaptic mechanism is indeed disrupted in the knockout mice used here.

      We thank Reviewer #3 for noting the absence of new evidence confirming Elfn1’s disruption of short-term facilitation in our knockout mice. We acknowledge that our study relies on strong previously published data demonstrating that Elfn1 mediates short-term synaptic facilitation of excitatory inputs onto SST+ interneurons (Sylwestrak and Ghosh, 2012; Tomioka et al., 2014; Stachniak et al., 2019, 2023). These studies consistently show that Elfn1 knockout abolishes facilitation in SST+ synapses, leading to altered temporal dynamics, which we hypothesize underlies the observed loss of barrel-septa response divergence in our Elfn1 KO mice (Figure 4). Nevertheless, to address the point raised, we will clarify in the revised manuscript (around lines 245-247 and 271-272) that our conclusions are based on these established findings, stating: “Building on prior evidence that Elfn1 knockout disrupts short-term facilitation in SST+ interneurons (Sylwestrak and Ghosh, 2012; Tomioka et al., 2014; Stachniak et al., 2019, 2023), we attribute the abolished barrel-septa divergence in Elfn1 KO mice to altered SST+ synaptic dynamics, though direct synaptic measurements were not performed here.”

      Additionally, since Elfn1 is constitutively knocked out from development, the possibility of altered circuit formation-including changes in barrel structure and interneuron distribution, cannot be excluded and is not addressed.

      We thank Reviewer #3 for raising the valid concern that constitutive Elfn1 knockout could potentially alter circuit formation, including barrel structure and interneuron distribution. To address this, we will clarify in the revised manuscript (around line 271 and in the Discussion) that in our previous studies, which included both whole-cell patch-clamp in acute brain slices ranging from postnatal day 11 to 22 (P11 - P21) and in vivo recordings from barrel cortex at P12 and P21, we saw no gross abnormalities in barrel structure, with Layer 4 barrels maintaining their characteristic size and organization, consistent with wild-type (WT) mice (Stachniak et al., 2019, 2023). While we cannot fully exclude subtle developmental changes, prior studies indicate that Elfn1 primarily modulates synaptic function rather than cortical cytoarchitecture (Tomioka et al., 2014). Elfn1 KO mice show no gross morphological or connectivity differences, and the pattern and abundance of Elfn1-expressing cells (assessed by LacZ knock-in) appear normal (Dolan and Mitchell, 2013).

      We will add the following to the Discussion: “Although Elfn1 is constitutively knocked out, we find here and in previous studies that barrel structure is preserved (Stachniak et al., 2019, 2023). Further, the distribution of Elfn1-expressing interneurons is not different in KO mice, suggesting minimal developmental disruption (Dolan and Mitchell, 2013). Nonetheless, we acknowledge that subtle circuit changes cannot be ruled out without the use of a time-dependent conditional knockout of the gene.”

      References

      (1) Beierlein, M., Gibson, J. R. & Connors, B. W. (2003). Two dynamically distinct inhibitory networks in layer 4 of the neocortex. J. Neurophysiol. 90, 2987–3000.

      (2) Burkhalter, A., D’Souza, R. D. & Ji, W. (2023). Integration of feedforward and feedback information streams in the modular architecture of mouse visual cortex. Annu. Rev. Neurosci. 46, 259–280.

      (3) Chen, J. L., Margolis, D. J., Stankov, A., Sumanovski, L. T., Schneider, B. L. & Helmchen, F. (2015). Pathway-specific reorganization of projection neurons in somatosensory cortex during learning. Nat. Neurosci. 18, 1101–1108.

      (4) Connor, J. R. & Peters, A. (1984). Vasoactive intestinal polypeptide-immunoreactive neurons in rat visual cortex. Neuroscience 12, 1027–1044.

      (5) Cruikshank, S. J., Lewis, T. J. & Connors, B. W. (2007). Synaptic basis for intense thalamocortical activation of feedforward inhibitory cells in neocortex. Nat. Neurosci. 10, 462–468.

      (6) Dolan, J. & Mitchell, K. J. (2013). Mutation of Elfn1 in mice causes seizures and hyperactivity. PLoS One 8, e80491.

      (7) Gibson, J. R., Beierlein, M. & Connors, B. W. (1999). Two networks of electrically coupled inhibitory neurons in neocortex. Nature 402, 75–79.

      (8) Ji, W., Gămănuţ, R., Bista, P., D’Souza, R. D., Wang, Q. & Burkhalter, A. (2015). Modularity in the organization of mouse primary visual cortex. Neuron 87, 632–643.

      (9) Martin-Cortecero, J. & Nuñez, A. (2014). Tactile response adaptation to whisker stimulation in the lemniscal somatosensory pathway of rats. Brain Res. 1591, 27–37.

      (10) Mégevand, P., Troncoso, E., Quairiaux, C., Muller, D., Michel, C. M. & Kiss, J. Z. (2009). Long-term plasticity in mouse sensorimotor circuits after rhythmic whisker stimulation. J. Neurosci. 29, 5326–5335.

      (11) Meier, A. M., Wang, Q., Ji, W., Ganachaud, J. & Burkhalter, A. (2021). Modular network between postrhinal visual cortex, amygdala, and entorhinal cortex. J. Neurosci. 41, 4809–4825.

      (12) Meier, A. M., D’Souza, R. D., Ji, W., Han, E. B. & Burkhalter, A. (2025). Interdigitating modules for visual processing during locomotion and rest in mouse V1. bioRxiv 2025.02.21.639505.

      (13) Scala, F., Kobak, D., Shan, S., Bernaerts, Y., Laturnus, S., Cadwell, C. R., Hartmanis, L., Froudarakis, E., Castro, J. R., Tan, Z. H., et al. (2019). Layer 4 of mouse neocortex differs in cell types and circuit organization between sensory areas. Nat. Commun. 10, 4174.

      (14) Stachniak, T. J., Sylwestrak, E. L., Scheiffele, P., Hall, B. J. & Ghosh, A. (2019). Elfn1-induced constitutive activation of mGluR7 determines frequency-dependent recruitment of somatostatin interneurons. J. Neurosci. 39, 4461–4475.

      (15) Stachniak, T. J., Kastli, R., Hanley, O., Argunsah, A. Ö., van der Valk, E. G. T., Kanatouris, G. & Karayannis, T. (2021). Postmitotic Prox1 expression controls the final specification of cortical VIP interneuron subtypes. J. Neurosci. 41, 8150–8166.

      (16) Stachniak, T. J., Argunsah, A. Ö., Yang, J. W., Cai, L. & Karayannis, T. (2023). Presynaptic kainate receptors onto somatostatin interneurons are recruited by activity throughout development and contribute to cortical sensory adaptation. J. Neurosci. 43, 7101–7118.

      (17) Sun, Q.-Q., Huguenard, J. R. & Prince, D. A. (2006). Barrel cortex microcircuits: Thalamocortical feedforward inhibition in spiny stellate cells is mediated by a small number of fast-spiking interneurons. J. Neurosci. 26, 1219–1230.

      (18) Sylwestrak, E. L. & Ghosh, A. (2012). Elfn1 regulates target-specific release probability at CA1-interneuron synapses. Science 338, 536–540.

      (19) Tan, Z., Hu, H., Huang, Z. J. & Agmon, A. (2008). Robust but delayed thalamocortical activation of dendritic-targeting inhibitory interneurons. Proc. Natl. Acad. Sci. USA 105, 2187–2192.

      (20) Tomioka, N. H., Yasuda, H., Miyamoto, H., Hatayama, M., Morimura, N., Matsumoto, Y., Suzuki, T., Odagawa, M., Odaka, Y. S., Iwayama, Y., et al. (2014). Elfn1 recruits presynaptic mGluR7 in trans and its loss results in seizures. Nat. Commun. 5, 4501.

      (21) Yamashita, T., Vavladeli, A., Pala, A., Galan, K., Crochet, S., Petersen, S. S. & Petersen, C. C. (2018). Diverse long-range axonal projections of excitatory layer 2/3 neurons in mouse barrel cortex. Front. Neuroanat. 12, 33.

    1. Author response:

      eLife Assessment:

      This important study investigates the propensity of the intravacuolar pathogen, Leishmania, to scavenge lipids which it utilizes for its accelerated growth within macrophages. Although some of the data compellingly links increased lipid acquisition to parasite growth, data to support the underlying mechanism to describe the proposed model is incomplete. The study adds to other work that has implicated pathogen-derived processes in the selective recruitment of vesicles to the pathogen-containing vacuole, based on the content of the cargo.

      We appreciate the time and effort that the Editor and Reviewers have devoted to assessing our work (eLife: eLife-RP-RA-2024-102857). We thank them all for this assessment.

      Regarding some of the concerns raised by Reviewer 1, particularly the lack of data on NPC-1 knockdown, we would like to clarify that this information was included in our original submission (as elaborated in detail in the following section). Additionally, we acknowledge that one of the major concerns about the completeness of our work stems from Reviewer 1’s comments on the isolation and purity of the parasitophorous vacuole (PV). Reviewer 2 has also emphasized the importance of this experiment in strengthening the technical rigor of our study, and we fully agree with this recommendation. This is a very appropriate suggestion by both Reviewers, and we will include this data in the subsequent revision of this work for re-evaluation of the assessment. Also, ahead of a full revision of the paper, we would like to address the concerns raised by the reviewers and outline our revision plans.

      Public Reviews:

      Reviewer #1 (Public review):

      Although the use of antimony has been discontinued in India, the observation that there are Leishmania parasites that are resistant to antimony in circulation has been cited as evidence that these resistant parasites are now a distinct strain with properties that ensure their transmission and persistence. It is of interest to determine what are the properties that favor the retention of their drug resistance phenotype even in the absence of the selective pressure that would otherwise be conferred by the drug. The hypothesis that these authors set out to test is that these parasites have developed a new capacity to acquire and utilize lipids, especially cholesterol which affords them the capacity to grow robustly in infected hosts.

      We sincerely appreciate Reviewer 1's thoughtful and positive evaluation of our manuscript. We acknowledge that the reviewer has a few major concerns, and we would like to address them one by one in the following section of this initial response before submitting a full revision of our work.

      Major issues:

      (1) There are several experiments for which they do not provide sufficient details, but proceed to make significant conclusions.

      Experiments in section 5 are poorly described. They supposedly isolated PVs from infected cells. No details of their protocol for the isolation of PVs are provided. They reference a protocol for PV isolation that focused on the isolation of PVs after L. amazonensis infection. In the images of infection that they show, by 24 hrs, infected cells harbor a considerable number of parasites. Is it at the 24 hr time point that they recover PVs? What is the purity of PVs? The authors should provide evidence of the success of this protocol in their hands. Earlier, they mentioned that using imaging techniques, the PVs seem to have fused or interconnected somehow. Does this affect the capacity to recover PVs? If more membranes are recovered in the PV fraction, it may explain the higher cholesterol content.

      We would like to thank the reviewer for correctly pointing out the lack of detail regarding PV isolation and purity. The reviewer raises multiple questions, and we will answer them one by one:

      Firstly, “Is it at the 24 hr time point that they recover PVs?”

      In the ‘Methods’ section of the original submission (lines 606–611), there is a separate section on “Parasitophorous vacuole (PV) Isolation and cholesterol measurement”, which states: “24Hrs LD infected KCs were lysed by passing through a 22-gauge syringe needle to release cellular contents. Parasitophorous vacuoles (PV) were then isolated using a previously outlined protocol [Ref: 73].” However, we acknowledge that further details would enrich this section, and we would therefore like to include the following in the revised manuscript: “10<sup>7</sup> KCs were seeded in a 100 mm plate and allowed to adhere for 24 hours. Following infection with Leishmania donovani (LD) for 24 hours, the infected KCs were harvested by gentle scraping and lysed by five successive passages through an insulin needle to ensure membrane disruption while preserving organelle integrity. The lysate was centrifuged at 200 × g for 10 minutes at 4°C to remove intact cells and large debris. The resulting supernatant was carefully collected and subjected to a discontinuous sucrose density gradient (60%, 40%, and 20%). The gradient was centrifuged at 700 × g for 25 minutes at 4°C to facilitate organelle separation. The interface between the 40% and 60% sucrose layers, enriched with PVs, was carefully collected and subjected to a final centrifugation step at 12,000 × g for 25 minutes at 4°C. The supernatant was discarded, and the resulting pellet was enriched for purified parasitophorous vacuoles, suitable for downstream biochemical and molecular analyses.”

      Secondly, What is the purity of PVs? Earlier, they mentioned that using imaging techniques, the PVs seem to have fused or interconnected somehow. Does this affect the capacity to recover PVs? If more membranes are recovered in the PV fraction, it may explain the higher cholesterol content.

      We thank the reviewer for pointing out this critical gap in the current version of the manuscript. As the reviewer rightly notes, we need to assess the purity of the isolated PVs. In the revised manuscript, we will provide data on the purity of the isolated fraction by performing Western blots on the PV and cytoplasmic fractions, along with a biochemical quantification of the total PV membrane isolated under the different experimental conditions using the Amplex Red kit (Invitrogen™ A12216) or a similar method.

      (2) In section 6 they evaluate the mechanism of LDL uptake in macrophages. Several approaches and endocytic pathway inhibitors are employed. The authors must be aware that the role of cytochalasin D in the disruption of fluid phase endocytosis is controversial. Although they reference a study that suggests that cytochalasin D has no effect on fluid-phase endocytosis, other studies have found the opposite (doi: 10.1371/journal.pone.0058054). It wasn't readily evident what concentrations were used in their study. They should consider testing more than 1 concentration of the drug before they make their conclusions on their findings on fluid phase endocytosis.

      We thank the reviewer for this insightful comment, and we apologise for omitting the Cytochalasin D concentration. To clarify, LDL uptake by LD-R infected KCs is LDL-receptor independent, as clearly shown in Section 6, Figure 4A, Figure S4A, Figure S4B i and Figure S4B ii of the Submitted manuscript. In Figure 4F and Figure S4D of the Submitted manuscript, to which the Reviewer refers, Cytochalasin D was used at a concentration of 2.5µg/ml. At this concentration, we did not observe any effect of Cytochalasin D on LDL-receptor independent fluid phase endocytosis: intracellular LD-R amastigotes were able to take up LDL successfully and proliferate in infected Kupffer cells, unlike with Latrunculin-A (5µM) treatment, which completely inhibited intracellular proliferation of LD-R amastigotes by blocking only receptor-independent fluid phase endocytosis (Movie 2A and 2B and Figure 4E in the Submitted manuscript). In fact, the study cited by the reviewer (doi: 10.1371/journal.pone.0058054) used Cytochalasin D at a concentration of 4µg/ml, which did affect both LDL-receptor dependent and receptor-independent endocytosis in bone marrow derived macrophages. We would also like to clarify that, during our preliminary experiments for this work, we also tested a higher concentration of Cytochalasin D (5µg/ml). However, even at this higher concentration there was no significant effect of Cytochalasin D on LD-R induced LDL-receptor independent fluid phase endocytosis, as observed from the intracellular LD-R amastigote count shown in Author response image 1. Thus, we strongly believe that Cytochalasin D does not have any impact on LD-R induced fluid phase endocytosis, even at the higher concentration. We will include this in the Discussion section of the revised manuscript to clear up any confusion that readers might have, and the concentrations of all inhibitors used in the study will be stated in the Results section as well as in the revised figure legends.

      Author response image 1.

      A. Giemsa-stained images illustrating the impact of two concentrations of CYT-D (2.5 and 5 µg/ml) on LD-R-infected Kupffer cells. Black arrow showing intracellular amastigotes. Scale bar 10 µm. B. Graphical representation depicting the effect of varying concentrations of CYT-D on the intracellular growth of LD-R. ‘ns’ depicts no significant change.

      (3) In Figure 5 they present a blot that shows increased Lamp1 expression from as early as 4 hrs after infection with LD-R and by 12 hrs after infection of both LD-S and LD-R. Increased Lamp1 expression after Leishmania infection has not been reported by others. By what mechanism do they suggest is causing such a rapid increase (at 4hrs post-infection) in Lamp-1 protein? As they report, their RNA seq data did not show an increase in LAMP1 transcription (lines 432 - 434).

      We would like to express our gratitude to the reviewer for highlighting the novelty of this observation. Indeed, to the best of our knowledge, no similar findings have been reported previously in primary macrophages infected with Leishmania donovani (LD). Firstly, we would like to point out that, as stated in the Methods section (lines 562–566) of the Submitted manuscript: “Flow-sorted metacyclic LD promastigotes were used at a MOI of 1:10 (with variations of 1:5 and 1:20 in some cases) for 4 hours, which was considered the 0th point of infection. Macrophages were subsequently washed to remove any extracellular loosely attached parasites and incubated further as per experimental requirements.” This indicates that our actual study point corresponds to approximately the 8th hour post-infection. We mention this only to prevent any potential confusion.

      Regarding LAMP1 expression: although we could not find any previous reports of its expression in LD-infected primary macrophages, we would like to mention that a previous report (doi.org/10.1128/mBio.01464-20) has shown a similar punctate LAMP-1 upregulation (as observed by us in Figure 5A i of the Submitted manuscript) in response to Leishmania infection in non-phagocytic fibroblasts. It is tempting to speculate that the increased LAMP-1 expression observed in LD-R infected macrophages might be due to increased lysosomal biogenesis, required for degrading the increased amount of endocytosed LDL into bioavailable cholesterol. However, since the RNA-seq data show no change in LAMP-1 expression (Figure 6 of the Submitted manuscript), we can only speculate that this is due to post-transcriptional or post-translational modifications. Further work will be required to investigate this mechanism in detail, which is beyond the scope of this work. That is why, in the Submitted manuscript (lines 432–435), we discussed this as follows: “Although available RNAseq analysis (Figure 6) did not support this increased expression of lamp-1 in the transcript level, it did reflect a notable upregulation of vesicular fusion protein (VSP) vamp8 and stx1a in response to LD-R-infection. LD infection can regulate LAMP-1 expression, and the role of VSPs in LDL-vesicle fusion with LD-R-PV is worthy of further investigation.”

      However, we agree with the reviewer that this may not be sufficient clarification. Hence, in the revised manuscript we plan to update this part as follows: “Although available RNAseq analysis (Figure 6) did not support this increased expression of lamp-1 at the transcript level, it did reflect a notable upregulation of vesicular fusion protein (VSP) vamp8 and stx1a in response to LD-R-infection. How LD infection can regulate LAMP-1 expression, and the role of VSPs in LDL-vesicle fusion with LD-R-PV, are worthy of further investigation. It is possible, and has been reported earlier, that LD infection can regulate host protein expression through post-transcriptional and post-translational modifications (doi.org/10.1111/pim.12156, doi.org/10.3389/fmicb.2017.00314, doi: 10.3389/fimmu.2023.1287539). It is tempting to speculate that LD-R amastigotes might be promoting increased lysosomal biogenesis through any such mechanism to increase the supply of bioavailable cholesterol through the action of lysosomal acid hydrolases on LDL.”

      (4) In Figure 6, amongst several assays, they reported on studies where SPC-1 is knocked down in PECs. They failed to provide any evidence of the success of the knockdown, but nonetheless showed greater LD-R after NPC-1 was knocked down. They should provide more details of such experiments.

      Although we understand the concern raised by the reviewer, the statement in question is factually incorrect. We would like to point out that in Figure 6 F i of the Submitted manuscript, we demonstrated decreased NPC-1 staining following transfection with NPC-1-specific siRNA, whereas no such reduction was observed with scrambled RNA. Similar immunofluorescence data confirming LDL-receptor knockdown were also provided in Figure S4B i of the Submitted manuscript. However, we acknowledge that the reviewer may be referring to the lack of quantitative validation of the knockdown by Western blot. We already had these data but did not include them, in order to avoid duplication and reduce the data density of the manuscript. As suggested by the reviewer, we will include Western blots for both NPC-1 and LDL-receptor knockdown in the revised manuscript, as shown in Author response image 2. Additionally, prompted by the reviewer's comment, we noticed a lack of detail in the Methods section of the submitted manuscript concerning siRNA-mediated knockdown (KD). We therefore plan to include more detail in the revised manuscript, which will read: “For all siRNA transfections, Lipofectamine® RNAiMAX Reagent (Life Technologies, 13778100), specifically designed for knockdown assays in primary cells, was used according to the manufacturer's instructions with slight modification. PECs were seeded into 24-well plates at a density of 1x10<sup>5</sup> per well and incubated at 37°C with 5% CO2. The transfection complex, comprising (1µl Lipofectamine® RNAiMAX and 50µl Opti MEM) and (1 µl siRNA and 50µl Opti MEM) mixed together, was added directly to the incubated PECs. Gene silencing was checked by IFA and by Western blot as mentioned previously.”

      Author response image 2.

      SiRNA-mediated gene knockdown analysis. (A-i, A-ii) Representative immunofluorescence microscopy image and corresponding Western blot analysis demonstrating the knockdown efficiency of NPC1 following SiRNA-mediated gene silencing, scale bar 10µm. (B-i, B-ii) Immunofluorescence image and Western blot confirming LDLr knockdown upon SiRNA treatment. Scrambled RNA (ScRNA) was used as a negative control, while Small Interfering RNA (SiRNA) specifically targeted NPC1 and LDLr transcripts, scale bar 10µm. TR-1 and TR-2 represent independent experimental trials. β-Actin was used as an endogenous loading control for Western blot normalization.

      Minor issues

      (1) There is an implication that parasite replication occurs well before 24hrs post-infection? Studies on Leishmania parasite replication have reported on the commencement of replication after 24hrs post-infection of macrophages (PMCID: PMC9642900). Is this dramatic increase in parasite numbers that they observed due to early parasite replication?

      We thank the reviewer for this insightful comment and appreciate the opportunity to clarify our findings. As the Reviewer rightly assumes, our data suggest, and we also believe, that this increase in the number of intracellular amastigotes is a consequence of early replication of Leishmania donovani. As already mentioned in response to Point 3 raised by Reviewer 1, we would again like to highlight that the Methods section (lines 562–566) clearly states: “Flow-sorted metacyclic LD promastigotes were used at a MOI of 1:10 (with variations of 1:5 and 1:20 in some cases) for 4 hours, which was considered the 0th point of infection. Macrophages were subsequently washed to remove any extracellular loosely attached parasites and incubated further as per experimental requirements.” This effectively means that our actual study points correspond to approximately the 8th and 28th hours post-infection; we mention this only to avoid any confusion.

      Regarding the specific concern, the study cited by the reviewer on the commencement of replication after 24hrs was conducted on Leishmania major, which may differ significantly from Leishmania donovani owing to species- and strain-specific characteristics. In fact, the doubling time of Leishmania donovani (LD) has previously been reported to be approximately 11.4 hours (doi: 10.1111/j.1550-7408.1990.tb01147.x). Moreover, multiple studies have indicated an exponential increase in intracellular LD amastigote number (a more than two-fold increase) by 24hrs post-infection. By 48hrs post-infection, however, the replication rate appeared to slow down, with amastigote numbers not increasing (doubling) proportionally (doi:10.1128/AAC.01196-07, doi.org/10.1016/j.ijpara.2011.07.013). We made a similar observation for both infected PECs and KCs, as depicted in Figure 1Ci and Figure S1Ci of the Submitted manuscript and in Author response image 3. Hence, it was an informed decision on our part to focus on the 24-hour time point for the analysis of intracellular proliferation.
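
      As a rough consistency check on the doubling-time argument, the following minimal sketch computes the fold increase expected under idealized exponential growth. The function name is illustrative, and constant, unconstrained doubling at the cited 11.4-hour value is an assumption made purely for the arithmetic; it is not a model of intracellular growth fitted to our data.

```python
def fold_increase(elapsed_h, doubling_time_h):
    """Fold change after elapsed_h hours of idealized exponential growth.

    Assumes a constant doubling time and no growth limitation, which is
    a simplification used only for this back-of-the-envelope check.
    """
    return 2.0 ** (elapsed_h / doubling_time_h)


DOUBLING_TIME_H = 11.4  # doubling time cited in the response above

for hours in (24, 48):
    print(f"{hours} h: {fold_increase(hours, DOUBLING_TIME_H):.2f}-fold")
```

      With an 11.4-hour doubling time, 24 hours spans just over two doublings, so a more than two-fold increase by 24hrs is well within what such growth would allow.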

      Author response image 3.

      Graph representing number of intracellular LD-R (MHOM/IN/2009/BHU575/0) parasite burden at different time points post-infection. *** signifies p value < 0.0001, * signifies p value < 0.05.

      (2) Several of the fluorescence images in the paper are difficult to see. It would be helpful if a blown-up (higher magnification image of images in Figure 1 (especially D) for example) is presented.

      We apologise for the inconvenience. We have provided zoomed images for several figures in the Submitted manuscript, such as Figure 4, Figure 5, Figure 6 and Figure 8. However, this was not feasible for all figures (for example, Figure 1D) owing to space constraints and figure arrangement requirements. To accommodate the Reviewer’s request, we will provide a blown-up image for Figure 1D in the Revised version, as shown in Author response image 4. If the reviewer requires a similar representation for any other particular figure, we will be happy to provide it.

      Author response image 4.

      Three-dimensional morphometric representation of parasitophorous vacuoles (PVs) in Leishmania-infected Kupffer cells at 24 hours post-infection: Confocal 3D reconstruction illustrating the spatial distribution of parasitophorous vacuoles (PVs) in Kupffer cells (KCs) infected for 24 hours. ATP6V0D2, a lysosomal vacuolar ATPase subunit, is visualized in magenta, while the nucleus is depicted in cyan. The final panel highlights PV structural grooves outlined in red solid lines, with intracellular Leishmania donovani (LD) amastigotes indicated by white arrows. Higher magnification of Figure 1D further emphasizes the increased abundance of PVs in LD-R infected cells, suggesting enhanced intracellular replication and adaptation mechanisms of drug-resistant strains. Scale bar 5 µm. The yellow and magenta solid-line boxes mark the same area of the image.

      (3) The times at which they choose to evaluate their infections seem arbitrary. It is not clear why they stopped analysis of their KC infections at 24 hrs. As mentioned above, several studies have shown that this is when intracellular amastigotes start replicating. They should consider extending their analyses to 48 or 72 hrs post-infection. Also, they stop in vitro infection of Apoe-/- mice at 11 days. Why? No explanation is given for why only 1 point after infection.

      The Reviewer has raised two independent concerns, and we would like to address them individually.

      Firstly, “The times at which they choose to evaluate their infections seem arbitrary. It is not clear why they stopped analysis of their KC infections at 24 hrs. As mentioned above, several studies have shown that this is when intracellular amastigotes start replicating. They should consider extending their analyses to 48 or 72 hrs post-infection.”

      We have already provided a detailed justification for the time point selection in our response to Reviewer Minor Comment 1. As mentioned there, we observed a significant and sharp rise in the number of intracellular amastigotes between 4 and 24Hrs post-infection (Author response image 4), with the replication rate not increasing proportionally after that. This early stage of rapid LD amastigote replication therefore likely coincides with a critical period of lipid acquisition by intracellular amastigotes (Movie 2A and 2B and Figure 4E in the submitted manuscript), and thus 24hrs infected KCs were specifically selected. In this regard, we would also like to add that at 72hrs post-infection, a notable number of infected Kupffer cells began detaching from the wells, with extracellular amastigotes probably egressing from the infected KCs. This phenomenon potentially reflects the severe impact of prolonged infection on Kupffer cell viability and adhesion properties, as shown in Author response image 5 and Author Response Video 1. This further influenced our decision to conclude all infection studies in Kupffer cells by 48Hrs post-infection, which made it necessary to end the infection period at 24 Hrs in order to allow Amp-B treatment for another 24 Hrs (Figure 8 and Figure S5 in the Submitted manuscript). We acknowledge that we should have been clearer about our selection of time points and, as the Reviewer has suggested, we plan to include this information in the revised manuscript for the reader's clear understanding.

      Author response image 5.

      Representative images of Kupffer cells infected with Leishmania donovani at 72Hrs post-infection showing significant morphological changes. Infected cells exhibit a rounded morphology and progressive detachment. Scale bar 10µm.

      Secondly “Also, they stop in vitro infection of Apoe-/- mice at 11 days. Why? No explanation is given for why only 1 point after infection.”

      We apologize for not providing an explanation regarding the selection of the 11-day time point for the Apoe-/- experiments (Figure 2 of the Submitted manuscript). Our rationale for this choice is based on both previous literature and the specific objectives of our study. A previous report suggests that Leishmania donovani infection in Apoe-/- mice triggers a heightened inflammatory response at approximately six weeks post-infection compared to C57BL/6 mice, leading to more efficient parasite clearance. This is owing to the unique membrane composition of Apoe-/- mice, which rectifies Leishmania-mediated defective antigen presentation at a later stage of infection (DOI 10.1194/jlr.M026914). Additionally, previous studies (doi: 10.1128/AAC.47.5.1529-1535.2003) have indicated that Leishmania donovani infection is well established in vivo within 6 to 11 days post-infection in murine models. Given that in this experiment we specifically aimed to assess the early infection status (parasite load) in diet-induced hypercholesterolemic mice, we would argue that the selection of the 11-day time point was intentional and well aligned with our study objectives: time points within this window are optimal for capturing the initial parasite burden, which depends on initial lipid utilization, before host-driven immune clearance mechanisms can significantly alter infection dynamics. We will include this explanation in the Revised manuscript, as suggested by the Reviewer.

      Reviewer #2 (Public review):

      Summary:

      This study by Pradhan et al. offers critical insights into the mechanisms by which antimony-resistant Leishmania donovani (LD-R) parasites alter host cell lipid metabolism to facilitate their own growth and, in the process, acquire resistance to amphotericin B therapy. The authors illustrate that LD-R parasites enhance LDL uptake via fluid-phase endocytosis, resulting in the accumulation of neutral lipids in the form of lipid droplets that surround the intracellular amastigotes within the parasitophorous vacuoles (PV) that support their development and contribute to amphotericin B treatment resistance. The evidence provided by the authors supporting the main conclusions is compelling, presenting rigorous controls and multiple complementary approaches. The work represents an important advance in understanding how intracellular parasites can modify host metabolism to support their survival and escape drug treatment.

      We would like to sincerely thank the reviewer for appreciating our work and for finding the evidence compelling in addressing the emergence of drug resistance in infections with intracellular protozoan pathogens. Before we submit a full revision of the paper, we would like to provide a preliminary response addressing the concerns of the reviewer.

      Strengths:

      (1) The study utilizes clinical isolates of antimony-resistant L. donovani and provides interesting mechanistic information regarding the increased LD-R isolate virulence and emerging amphotericin B resistance.

      (2) The authors have used a comprehensive experimental approach to provide a link between antimony-resistant isolates, lipid metabolism, parasite virulence, and amphotericin B resistance. They have combined the following approaches:

      (a) In vivo infection models involving BL/6 and Apoe-/- mice.

      (b) Ex-vivo infection models using primary Kupffer cells (KC) and peritoneal exudate macrophages (PEC) as physiologically relevant host cells.

      (c) Various complementary techniques to ascertain lipid metabolism including GC-MS, Raman spectroscopy, microscopy.

      (d) Applications of genetic and pharmacological tools to show the uptake and utilization of host lipids by the infected macrophage resident L. donovani amastigotes.

      (3) The outcome of this study has clear clinical significance. Additionally, the authors have supported their work by including patient data showing a clear clinical significance and correlation between serum lipid profiles and treatment outcomes.

      (4) The present study effectively connects the basic cellular biology of host-pathogen interactions with clinical observations of drug resistance.

      (5) Major findings in the study are well-supported by the data:

      (a) Intracellular LD-R parasites induce fluid-phase endocytosis of LDL independent of LDL receptor (LDLr).

      (b) Enhanced fusion of LDL-containing vesicles with parasitophorous vacuoles (PV) containing LD-R parasites both within infected KCs and PECs cells.

      (c) Intracellular cholesterol transporter NPC1-mediated cholesterol efflux from parasitophorous vacuoles is suppressed by the LD-R parasites within infected cells.

      (d) Selective exclusion of inflammatory ox-LDL through MSR1 downregulation.

      (e) Accumulation of neutral lipid droplets contributing to amphotericin B resistance.

      Weaknesses:

      The weaknesses are minor:

      (1) The authors do not show how they ascertain that they have a purified fraction of the PV post-density gradient centrifugation.

      (2) The study could have benefited from a more detailed analysis of how lipid droplets physically interfere with amphotericin B access to parasites.

      We have addressed both of these concerns in detail, as a preliminary response, in the subsequent “Recommendations for the authors” section, before we submit a complete Revised manuscript.

      Impact and significance:

      This work makes several fundamental advances:

      (1) The authors were able to show the link between antimony resistance and enhanced parasite proliferation.

      (2) They were also able to reveal how parasites can modify host cell metabolism to support their growth while avoiding inflammation.

      (3) They were able to show a certain mechanistic basis for emerging amphotericin B resistance.

      (4) They suggest therapeutic strategies combining lipid droplet inhibitors with current drugs.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Experimental suggestions:

      a) The authors could have provided a more detailed analysis of lipid droplet composition. This is a critically missing piece in this nice study.

      We completely agree with the reviewer on this: a more detailed analysis of lipid droplet composition, the dynamics of their formation, and the mechanism of lipid transfer to amastigotes residing within the PV would be worthy of further investigation. We are already conducting investigations in this direction and have very promising initial results, which we are willing to share with the reviewer as unpublished data if requested. Since we plan to address these questions independently, we hope the reviewer will understand our hesitation to include these data in the present work, which is already immensely data dense. We sincerely believe that the existence of lipid droplet contact sites with the PV, along with the transfer of specific lipid types to amastigotes and its mechanism, requires special attention and could stand as an independent work by itself.

      b) The macrophages (PEC, KC) could have been treated with latex beads as a control, which would indicate that cholesterol and lipids are indeed utilized by the Leishmania parasitophorous vacuole (PV) and essential for its survival and proliferation.

      We thank the reviewer for this nice suggestion, which we believe will further strengthen the conclusion of this work. This has also been suggested by Reviewer 1 and we are planning to conduct this experiment and will include this data in the revised version of this manuscript.

      c) HMGCoA reductase is an important enzyme for the mevalonate pathway and cholesterol synthesis. The authors have not commented on this enzyme in either host or parasite. Additionally, western blots of these enzymes along with SREBP2 could have been performed.

      We appreciate the concern and see why the reviewer is suggesting this. Regarding HMGCoA reductase, we already have real-time qPCR data which perfectly align with our RNAseq data (Figure 6 Ai in the Submitted manuscript), showing significant downregulation specifically in LD-R infected KCs as compared to the uninfected control. We are including these data as Author response image 6. However, we did not proceed with checking the level of HMGCoA reductase at the protein level, as several previous reports have suggested that it remains under transcriptional control of SREBP2 (doi.org/10.1016/j.cmet.2011.03.005, doi: 10.1194/jlr.C066712, doi: 10.1194/jlr.RA119000201), which acts as the master regulator of the mevalonate pathway and cholesterol synthesis (doi.org/10.1161/ATVBAHA.122.317320). Nevertheless, as suggested by the Reviewer, we will perform this experiment and will update the Revised manuscript with the expression data on HMGCoA reductase, probably in the Supplementary section.

      Author response image 6.

      qPCR Analysis of HMGCR Expression Following Leishmania donovani Infection: Quantitative PCR analysis showing the relative expression of hmgcr (3-hydroxy-3-methylglutaryl-CoA reductase) in Kupffer cells after 24 hours of Leishmania donovani (LD) infection compared to uninfected control cells. Gene expression levels are normalized to β-actin as an internal control, and fold change is represented relative to the uninfected condition.
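
      The normalization described in this legend can be illustrated with a minimal sketch. It assumes the common 2^-ΔΔCt calculation, which the legend does not name explicitly; the function name and the Ct values are hypothetical and are not data from the study.

```python
def fold_change(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt approach (assumed, not stated in the legend)."""
    # Normalize the target gene (e.g. hmgcr) to the reference gene (e.g. beta-actin).
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Express the infected sample relative to the uninfected control.
    delta_delta = delta_sample - delta_control
    return 2 ** (-delta_delta)


# Hypothetical Ct values: the target gene crosses threshold one cycle later
# in infected cells, while the reference gene is unchanged.
fc = fold_change(ct_target_sample=26.0, ct_ref_sample=18.0,
                 ct_target_control=25.0, ct_ref_control=18.0)
print(fc)
```

      A value below 1 corresponds to downregulation relative to the uninfected condition, and the uninfected control is 1 by construction.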

      d) The authors should discuss the expression pattern of any enzyme of the mevalonate pathway that they have found to be dysregulated in the transcript data.

      As per the reviewer’s suggestion, we have already looked into the RNA seq data and observed that, apart from hmgcr, hmgcs (3-hydroxy-3-methylglutaryl-CoA synthase), another key enzyme in the mevalonate pathway, is significantly downregulated in host PECs in response to LD-R infection compared to LD-S infection. We will update this in the Discussion section of the Revised manuscript, which will read as “Further analysis of RNA sequencing data revealed a significant downregulation of hmgcs (3-hydroxy-3-methylglutaryl-CoA synthase) in LD-R infected PECs as compared to LD-S infection. HMGCS catalyzes the condensation of acetyl-CoA with acetoacetyl-CoA to form 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA), which serves as an intermediate in both cholesterol biosynthesis and ketogenesis. The downregulation of hmgcs further supports our observation that LD-R-infected PECs preferentially rely on endocytosed low-density lipoprotein (LDL)-derived cholesterol rather than de novo synthesized cholesterol for their metabolic needs.”

      e) The authors have followed a previously published protocol by Real F (reference 73) to enrich for parasitophorous vacuole (PV). However, they do not show how they ascertain that they have a purified fraction of the PV post-density gradient centrifugation. The authors should at least show Western blot data for LAMP1 for different fractions of density gradient from which they enriched the PV.

      As we previously stated in our response to Reviewer 1, the Revised manuscript will include a detailed analysis of purity for different fractions during PV isolation. We sincerely appreciate the reviewer for highlighting this important concern and for suggesting an approach to conduct the experiment. We believe this experiment is crucial and will further reinforce the conclusions of our study.

      (2) Presentation improvements:

      a) Add a clear timeline for infection experiments.

      Sure. We will include a schematic timeline in revised Figures 2 and 7.

      b) Provide more details on patient sample collection and analysis.

      We plan to include more details on the sample collection in the Method section of the Revised manuscript as follows, “Blood samples were collected from a total of 22 individuals spanning a diverse age range (8 to 70 years) by RMRI, Bihar, India. Among these, nine samples were obtained from healthy individuals residing in endemic regions to serve as controls. Serum was isolated from each blood sample through centrifugation, and the lipid profile was subsequently analysed using a specialized diagnostic kit (Coral Clinical System) following the manufacturer's protocol.”

      c) Consider reorganizing figures to better separate mechanistic and clinical findings.

      We would like to thank the reviewer for this suggestion. However, we feel that the arrangement of the Figures as presented in the Original Submission is really helping a smooth flow of the story and hence, we would not want to disturb that. However, having said that, if the reviewer has specific suggestion regarding rearrangement of any particular figure, we will be happy to consider that.

      (3) Technical clarifications needed:

      a) Specify exact concentrations used for inhibitors.

      We apologise for this oversight. We will clearly state the concentrations of all the inhibitors used in this study in the Results section and in the Revised Figure legends. The revised section will read as, “Finally, we infected the KCs with GFP expressing LD-R for 4Hrs, washed and allowed the infection to proceed in presence of fluorescent red-LDL and Latrunculin-A (5µM), a compound which specifically inhibits fluid phase endocytosis by inducing actin depolymerization [41]. Real-time fluorescence tracking demonstrated that Latrunculin-A treatment not only prevented the uptake of fluorescent red-LDL but also severely impacted intracellular proliferation of LD-R amastigotes (Movie 2A and 2B and Figure 4E). In contrast, treatment with Cytochalasin-D (2.5µg/ml), which alters cellular F-actin organization but does not affect fluid phase endocytosis”

      b) Include more details on image analysis methods.

      Please note that in specific sections like in Line numbers 574-579, 653-658, 1047-1049 of the Submitted manuscript, we have put special attention in describing the Image analysis process. However, we agree that in some particular cases more details will be appreciated by the reader. Hence we will be including an additional section of Image Analysis in the Methods section of the revised manuscript. This section will read as, “Image processing and analysis were conducted using Fiji (ImageJ). For optimal visualization, Giemsa-stained macrophages (MΦs) were represented in grayscale to enhance contrast and structural clarity. To improve the distinction of different fluorescent signals, pseudo-colors were assigned to fluorescence images, ensuring better differentiation between various cellular components. For colocalization analysis (Figures 3, 5, 6, and S2), we utilized the RGB profile plot plugin in ImageJ, which allows for the precise assessment of signal overlap by generating fluorescence intensity profiles across selected regions of interest. This approach provided quantitative insights into the spatial relationship between labeled molecules within infected cells. Additionally, for analyzing the distribution of cofilin in Figure 4, the ImageJ surface plot plugin was employed. This tool enabled three-dimensional visualization of fluorescence intensity variations, facilitating a more detailed examination of cofilin localization and its potential reorganization in response to infection.”

      c) Clarify statistical analysis procedures.

      We have already provided a dedicated Statistical Analysis section in the Methods, and have also indicated the groups being compared in the Figures and Figure Legends of the Submitted manuscript. Furthermore, we plan to add clarification regarding the statistical analyses performed in the Revised manuscript. For example, in the Revised manuscript this section will read as, “All statistical analyses were performed using GraphPad Prism 8 on raw datasets to ensure robust and reproducible results. For datasets involving comparisons across multiple conditions, one-way or two-way analysis of variance (ANOVA) was conducted, followed by Tukey’s post hoc test to assess pairwise differences while controlling for multiple comparisons. A 95% confidence interval (CI) was applied to determine the statistical reliability of the observed differences. For non-parametric comparisons across multiple groups, Wilcoxon rank-sum tests were employed, maintaining a 95% confidence interval, which is particularly useful for analysing skewed data distributions. In cases where only two groups were compared, Student’s t-test was used to determine statistical significance, ensuring an accurate assessment of mean differences. All quantitative data are represented as mean ± standard error of the mean (SEM) to illustrate variability within experimental replicates. Statistical significance was determined at P ≤ 0.05. Notation for significance levels: *P ≤ 0.05; **P ≤ 0.001; ***P ≤ 0.0001.”

      (4) Minor corrections:

      a) Methods section could benefit from more details on Raman spectroscopy analysis.

      We agree with this suggestion of the Reviewer. To provide more clarity, we will incorporate additional details in the Raman section of the Methods in the Revised manuscript. The updated section will read as follows: “For confocal Raman spectroscopy, spectral data were acquired from individual cells at 1000× magnification using a 100 × 100 μm scanning area, following previously established specifications. After spectral acquisition, distinct Raman shifts corresponding to specific biomolecular signatures were extracted for further analysis. These included: Cholesterol (535–545 cm⁻¹), Nuclear components (780–790 cm⁻¹), Lipid structures (1262–1272 cm⁻¹), Fatty acids (1436–1446 cm⁻¹). Following spectral extraction, pseudo-color mapping was applied to highlight the spatial distribution of each biomolecular component within the cell. These processed spectral images are presented in Figure 3D1, where the first four panels illustrate the individual biomolecular distributions. A merged composite image was then generated to visualize the co-localization of these biomolecules within the cellular microenvironment, with the final panel specifically representing the spatial distribution of key biomolecules.”

      b) In the methods section line 609, page 14, the authors cite Real F protocol as reference 73 for PV enrichment. However, in the very next section on GC-MS analysis (lines 615-616, page 15), they state they have used reference 74 for PV enrichment. Can they explain why a discrepancy in PV isolation references this? Reference 74 does not mention anything related to PV isolation.

      We sincerely apologise for this confusion, which probably arose from our writing of this section. We would like to confirm that our PV isolation protocol is based on the published protocol of Real F (reference 73). The next section of the submitted manuscript describes the GC-MS analysis, which was performed based on the protocol in reference 74. In the Revised manuscript, we will avoid this confusion by placing the references in the proper places. The revised section will read as,

      “GC-MS analysis of LD-S and LD-R-PV

      Following a 24Hrs infection period, KCs were harvested, washed with phosphate-buffered saline (PBS), and pelleted. Subsequent to this, PV isolation was carried out using the previously described method [73]. The resulting parasitophorous vacuole (PV) pellet was processed for sterol isolation for GC-MS analysis following a previously established protocol [74], with slight modification. Briefly, the PV pellet was resuspended in 20 ml of dichloromethane:methanol (2:1, vol/vol) and incubated at 4°C for 24 hours. After centrifugation (11,000 g, 1 hour, 4°C), the supernatant was checked through thin layer chromatography (TLC) and subsequently evaporated under vacuum. The residue and pellet were saponified with 30% potassium hydroxide (KOH) in methanol at 80°C for 2 hours. Sterols were extracted with n-hexane, evaporated, and dissolved in dichloromethane. A portion of the clear yellow sterol solution was treated with N,O-bis(trimethylsilyl)trifluoroacetamide (BSTFA) and heated at 80°C for 1 hour to form trimethylsilyl (TMS) ethers. Gas chromatography/mass spectrometry (GC/MS) analysis was performed using a Varian model 3400 chromatograph equipped with DB5 columns (methyl-phenylsiloxane ratio, 95/5; dimensions, 30 m by 0.25 mm). Helium was used as the gas carrier (1 ml/min). The column temperature was maintained at 270°C, with the injector and detector set at 300°C. A linear gradient from 150 to 180°C at 10°C/min was used for methyl esters, with MS conditions set at 280°C, 70 eV, and 2.2 kV.”

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      In the manuscript by Su et al., the authors present a massively parallel reporter assay (MPRA) measuring the stability of in vitro transcribed mRNAs carrying wild-type or mutant 5' or 3' UTRs transfected into two different human cell lines. The goal presented at the beginning of the manuscript was to screen for effects of disease-associated point mutations on the stability of the reporter RNAs carrying partial human 5' or 3' UTRs. However, the majority of the manuscript is dedicated to identifying sequence components underlying the differential stability of reporter constructs. This shows that TA dinucleotides are the most predictive feature of RNA stability in both cell lines and both UTRs.

      The effect of AU rich elements (AREs) on RNA stability is well established in multiple systems, and the present study confirms this general trend but points out variability in the consequence of seemingly similar motifs on RNA stability. For example, the authors report that a long stretch of Us has extreme opposite effects on RNA stability depending on whether it is preceded by an A (strongly destabilizing) or followed by an A (strongly stabilizing). While the authors' interpretation of a context-dependence of the effect is certainly well-founded, it seems counterintuitive that the preceding or following A would be the (only) determining factor. This points to a generally reductionist approach taken by the authors in the analysis of the data and in their attempt to dissect the contribution of "AU rich sequences" to RNA stability, with a general tendency to reduce the size and complexity of the features (e.g. to dinucleotides). While this certainly increases the statistical power of the analysis due to the number of occurrences of these motifs, it limits the interpretability of the results. How do TA dinucleotides per se contribute to destabilizing the RNA, both in 5' and 3' UTRs, but (according to limited data presented) not in coding sequences? What is the mechanism? RBPs binding to TA dinucleotide containing sequences are suggested to "mask" the destabilizing effect, thereby leading to a more stable RNA. Gain of TA dinucleotides is reported to have a destabilizing effect, but again no hypothesis is provided as to the underlying molecular mechanism. In addition to reducing the motif length to dinucleotides, the notion of "context dependence" is used in a very narrow sense; especially when focusing on simple and short motifs, a more extensive analysis of the interdependence of these features (beyond the existing analysis of the relationship between TA-diNTs and GC content) could potentially reveal more of the context dependence underlying the seemingly opposite behavior of very similar motifs.

      The contribution of coding region sequence to RNA stability has been extensively discussed (For example: doi.org/10.1016/j.molcel.2022.03.032; doi.org/10.1186/s13059-020-02251-5; doi.org/10.15252/embr.201948220; doi.org/10.1371/journal.pone.0228730; doi.org/10.7554/eLife.45396). While TA content at the third codon position (wobble position) has been implicated as a pro-degradation signal, codon optimality has emerged as the most prominent determinant for RNA stability. This indicates that the role of coding regions in RNA stability differs from that of UTRs due to the involvement of translation elongation. We did not intend to suggest that TA-dinucleotides in UTRs and coding regions have the same effect.

      We hypothesize that TA-dinucleotides may recruit endonucleases of the RNase A family, whose catalytic pockets exhibit a strong bias for TA dinucleotides (doi.org/10.1016/j.febslet.2010.04.018). Structures or bound proteins that block this recognition might stabilize RNAs. To gain further insight into the motif interactions, we plan to investigate the interactions between TA and the other 15 dinucleotides through more detailed analyses.

      The present MPRA measures the effect of UTR sequences in one specific reporter context and using one experimental approach (following the decay of in vitro transcribed and transfected RNAs). While this approach certainly has its merits compared to other approaches, it also comes with some caveats: RNA is delivered naked, without bound RBPs and with no nuclear history, e.g. of splicing (no EJCs), editing and modifications. One way to assess the generalizability of the results as well as the context dependence of the effects is to perform the same analysis on existing datasets of RNA stability measurements obtained through other methods (e.g. transcription inhibition). Are TA dinucleotides universally the most predictive feature of RNA half-lives?

      Our system studies the stability control of RNA synthesized in vitro and delivered into human cells. While we did not intend to generalize our conclusions to endogenous RNAs, our approach contributes to the understanding of in vitro synthesized RNA used for cellular expression, such as in vaccines. It is known that endogenous RNAs undergo very different regulation. The most prominent factors controlling endogenous RNA stability are the density of splice junctions and the length of UTRs (doi.org/10.1186/s13059-022-02811-x; doi.org/10.1186/s12915-021-00949-x). To decipher the sequence regulation, these factors are controlled in our experiments. Therefore we do not expect the dinucleotide features found by our approach to be generalized as the most predictive feature of RNA half-life in vivo.

      The authors conclude their study with a meta-analysis of genes with increased TA dinucleotides in 5' and 3'UTRs, showing that specific functional groups are overrepresented among these genes. In addition, they provide evidence for an effect of disease-associated UTR mutations on endogenous RNA stability. While these elements link back to the original motivation of the study (screening for effects of point mutations in 5' and 3' UTRs), they provide only a limited amount of additional insights.

      We utilized the Taiwan Biobank to investigate whether mutations significantly affecting RNA stability also impact human biochemical measurements. Our findings indicate that these mutations indeed have a significant effect on various biochemical indices. This highlights the importance of our study, as it bridges basic science with potential applications in precision medicine. By linking specific UTR mutations with measurable changes in biochemical indices, our research underscores the potential for these findings to inform targeted medical interventions in the future.

      In summary, this manuscript presents an interesting addition to the long-standing attempts at dissecting the sequence basis of RNA stability in human cells. The analysis is in general very comprehensive and sound; however, at times the goal of the authors to find novelty and specificity in the data overshadows some analyses. One example is the case where the authors try to show that TA-dinucleotides and GC content are decoupled and not merely two sides of the same coin. They claim that the effect of TA dinucleotides is different between high- and low-GC content contexts but do not control for the fact that low GC-content regions naturally will contain more TA dinucleotides and therefore the effect sizes and the resulting correlation between TA-diNT rate and stability will be stronger (Fig. 5A). A more thorough analysis and greater caution in some of the claims could further improve the credibility of the conclusions.

      Low GC content implies a higher TA content but does not directly equate to a high TA-diNT rate. For instance, the sequence ATTGAACCTT has a lower GC content (0.3) compared to TATAGGCCGC (0.6), yet it also has a lower TA-diNT rate (0 vs. 0.22). To address this concern more rigorously, we performed a stratified analysis based on TA-diNT rate. As shown in our Fig. S7C, even after stratifying by TA-diNT rate (upper panel high TA-diNT rate / lower panel low TA-diNT rate), we still observe that the destabilizing effect of TA is stronger in the low GC content group.
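
      The two example sequences can be checked with a short sketch. It assumes that the TA-diNT rate is the number of TA dinucleotides divided by the number of dinucleotide positions (sequence length minus one), a definition chosen here because it reproduces the quoted 0.22; the function names are illustrative only.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def ta_dint_rate(seq: str) -> float:
    """TA dinucleotides per dinucleotide position (assumed definition)."""
    seq = seq.upper()
    positions = len(seq) - 1
    ta_count = sum(seq[i:i + 2] == "TA" for i in range(positions))
    return ta_count / positions


for s in ("ATTGAACCTT", "TATAGGCCGC"):
    print(s, round(gc_content(s), 2), round(ta_dint_rate(s), 2))
```

      Under this definition the first sequence has the lower GC content and also the lower TA-diNT rate, which is the point of the example: the two quantities can vary independently.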

      Reviewer #2 (Public Review):

      Summary of goals:

      Untranslated regions are key cis-regulatory elements that control mRNA stability, translation, and translocation. Through interactions with small RNAs and RNA binding proteins, UTRs form complex transcriptional circuitry that allows cells to fine-tune gene expression. Functional annotation of UTR variants has been very limited, and improvements could offer insights into disease relevant regulatory mechanisms. The goals were to advance our understanding of the determinants of UTR regulatory elements and characterize the effects of a set of "disease-relevant" UTR variants.

      Strengths:

      The use of a massively parallel reporter assay allowed for analysis of a substantial set (6,555 pairs) of 5' and 3' UTR fragments compiled from known disease associated variants. Two cell types were used.

      The findings confirm previous work about the importance of AREs, which helps show validity and adds some detailed comparisons of specific AU-rich motif effects in these two cell types.

      Using a Lasso regression, TA-dinucleotide content is identified as a strong regulator of RNA stability in a context dependent manner based on GC content and presence of RNA binding protein binding motifs. The findings have potential importance, drawing attention to a UTR feature that is not well characterized.

      The use of complementary datasets, including from half-life analyses of RNAs and from random sequence library MPRAs, is a useful addition and supports several important findings. The finding that TA dinucleotides have explanatory power separate from (and in some cases interacting with) GC content is valuable.

      The functional enrichment analysis suggests some new ideas about how UTRs may contribute to regulation of certain classes of genes.

      Weaknesses:

      It is difficult to understand how the calculations for half-life were performed. The sequencing approach measures the relative frequency of each sequence at each time point (less stable sequences become relatively less frequent after time 0, whereas more stable sequences become relatively more frequent after time 0). Since there is no discussion of whether the abundance of the transfected RNA population is referenced to some external standard (e.g., housekeeping RNAs), it is not clear how absolute (rather than relative) half-lives were determined.

      We estimated the decay constant λ and the half-life (t1/2) by the following equations:

      $$C_i(t) = C_i(t=0)\,e^{-\lambda t}, \qquad t_{1/2} = \frac{\ln 2}{\lambda}$$

      where Ci(t) and Ci(t=0) are read count values of the ith replicate at time points t and 0 (see also Methods). The absolute abundance was not required for the half-life calculation.

      Fig. S1A and B are used to assess reproducibility. They show that read counts at a given time point correlate well across replicate experiments. However, this is not a good way to assess how reproducible or accurate the measurements of t1/2 are. (The major source of variability in read counts in these plots - especially at early time points - is likely the starting abundance of each RNA sequence, not stability.) This creates concerns about how well the method is measuring t1/2. Also creating concern is the observation that many RNAs are associated with half-lives that are much longer than the time points analyzed in the study. For example, if Figure S1 and Table S1 are read correctly, the median t1/2 for the 5' UTR library in HEK cells appears to be >700 minutes. Given that RNA was collected at 30, 75, and 120 minutes, accurate measurements of RNAs with such long half-lives would seem to be very difficult.

      We estimated the half-life based on the following equations:

      $$C_i(t) = C_i(t=0)\,e^{-\lambda t}, \qquad t_{1/2} = \frac{\ln 2}{\lambda}$$

      where Ci(t) and Ci(t=0) are read count values of the ith replicate at time points t and 0 (see also Methods). The calculation of the half-life involves first determining the decay constant λ, which represents a constant rate of decay. Since λ is a constant, it is possible to accurately calculate it without needing data over the entire decay range. Our experimental design considers this by selecting appropriate time points to ensure a reliable estimation of λ, and thus, the half-life. To determine the most suitable time points, we conducted preliminary experiments using RT-PCR. These experiments indicated that 30, 75, and 120 minutes provided an effective range for capturing the decay dynamics of the transcripts.
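
      The arithmetic of this calculation can be sketched as follows, assuming first-order decay and a least-squares fit of ln(C(t)/C(0)) against time through the origin. The function names, the synthetic noise-free counts and the 700-minute half-life are illustrative assumptions, not values or code from the study, and the sketch says nothing about the uncertainty of the estimate with real, noisy counts.

```python
import math


def decay_constant(times, counts):
    """Estimate lambda from ln(C(t)/C(0)) = -lambda * t (fit through the origin).

    Assumes times[0] == 0 and that counts are comparable across time points.
    """
    c0 = counts[0]
    ts = times[1:]
    log_ratios = [math.log(c / c0) for c in counts[1:]]
    slope = sum(t * y for t, y in zip(ts, log_ratios)) / sum(t * t for t in ts)
    return -slope


def half_life(lam):
    """Half-life of a first-order decay with decay constant lam."""
    return math.log(2) / lam


# Synthetic counts for a transcript with a 700-minute half-life,
# sampled only at 0, 30, 75 and 120 minutes.
times = [0, 30, 75, 120]
true_lambda = math.log(2) / 700
counts = [1000 * math.exp(-true_lambda * t) for t in times]

lam = decay_constant(times, counts)
print(round(half_life(lam)))
```

      With noise-free input the fit recovers the half-life even though sampling stops well before one half-life has elapsed.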

      There is no direct comparison of t1/2 between the two cell types studied for the full set of sequences studied. This would be helpful in understanding whether the regulatory effects of UTRs are generally similar across cell lines (as has been shown in some previous studies) or whether there are fundamental differences. The distribution of t1/2's is clearly quite different in the two cell lines, but it is important to know if this reflects generally slow RNA turnover in HEK cells or whether there are a large number of sequence-specific effects on stability between cell lines. A related issue is that it is not clear whether the relatively small number of significant variant effects detected in HEK cells versus SH-SY5Y cells is attributable to real biological differences between cell types or to technical issues (many fewer read counts and much longer half lives in HEK cells).

      For both cell lines, we selected oligonucleotides with R² > 0.5 and mean squared error (MSE) < 1 for analysis when estimating the decay constant (λ), and hence the half-life, by linear regression. This selection criterion was implemented to minimize the effect of experimental noise. Additionally, we will further analyze the MSE distribution to determine whether the two cell lines exhibit significantly different levels of experimental noise. We will also provide a direct comparison of half-lives between the two cell lines to assess the similarity in stability regulation.
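
      The regression and filtering step described above can be sketched as follows. This is a minimal illustration, assuming a log-linear least-squares fit of read counts against time; the function names (`fit_half_life`, `passes_filter`), the inclusion of a t=0 point, and the synthetic counts are assumptions for illustration and are not taken from the manuscript.

```python
import numpy as np

def fit_half_life(times_min, counts):
    """Fit ln(count) = intercept - lambda * t by ordinary least squares."""
    t = np.asarray(times_min, dtype=float)
    y = np.log(np.asarray(counts, dtype=float))
    slope, intercept = np.polyfit(t, y, 1)
    pred = slope * t + intercept
    ss_res = np.sum((y - pred) ** 2)          # residual sum of squares (log space)
    ss_tot = np.sum((y - y.mean()) ** 2)      # total sum of squares (log space)
    lam = -slope                              # decay constant
    half_life = np.log(2) / lam if lam > 0 else np.inf
    return {
        "lambda": lam,
        "half_life": half_life,
        "r2": 1.0 - ss_res / ss_tot,
        "mse": ss_res / len(t),
    }

def passes_filter(fit, r2_min=0.5, mse_max=1.0):
    """Apply the R^2 > 0.5 and MSE < 1 thresholds quoted in the response."""
    return fit["r2"] > r2_min and fit["mse"] < mse_max

# Synthetic example (hypothetical data): a transcript with a 60-minute half-life.
times = [0, 30, 75, 120]
counts = 1000 * np.exp(-np.log(2) / 60 * np.array(times))
fit = fit_half_life(times, counts)
keep = passes_filter(fit)
```

      With noisy counts, the same call returns a lower R² and a larger MSE, which is what the thresholds are meant to screen out.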

      The general assertion is made in many places that TA dinucleotides are the most prominent destabilizing element in UTRs (e.g., in the title, the abstract, Fig. 4 legend, and on p. 12). This appears to be true for only one of the two cell lines tested based on Fig. 3.

      TA-dinucleotides and other TA-rich sequences exhibit similar effects on RNA stability, as illustrated in Fig. S5A-C. In both cell lines, TA-dinucleotide and WWWWWW sequences were representatives of the same stability-affecting cluster. While the impact of TA-dinucleotides can be generalized, we will rephrase some statements to avoid any potential misunderstanding.

      Appraisal and impact:

      The work adds to existing studies that previously identified sequence features, including AREs and other RNA binding protein motifs, that regulate stability and puts a new emphasis on the role of "TA" (better "UA") dinucleotides. It is not clear how potential problems with the RNA stability measurements discussed above might influence the overall conclusions, which may limit the impact unless these can be addressed.

      It is difficult to understand whether the importance of TA dinucleotides is best explained by their occurrence in a related set of longer RBP binding motifs (see Fig 5J, these motifs may be encompassed by the "WWWWWW cluster") or whether some other explanation applies. Further discussion of this would be helpful. Does the LASSO method tend to collapse a more diverse set of longer motifs that are each relatively rare compared to the dinucleotide? It remains unclear whether TA dinucleotides are associated with less stability independent of the presence of the known larger WWWWWW motif. As noted above, the importance of TA dinucleotides in the HEK experiments appears to be less than is implied in the text.

      To ensure the representativeness of the features entered into the LASSO model, we pre-selected those with an occurrence greater than 10% among all UTRs. There is no evidence to support a preference for dinucleotides by LASSO. To address whether the destabilizing effect of TA dinucleotides is part of the broader WWWWWW motif, we will divide TA dinucleotides into two groups: those within the WWWWWW motif and those outside of it. We will then examine whether TA dinucleotides in these two groups exhibit the same destabilizing effect.
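
      The proposed split of TA dinucleotides can be sketched as below. This is an illustrative sketch only, assuming that W follows the IUPAC code (A or T/U) and that "within the WWWWWW motif" means lying inside a run of at least six consecutive W bases; the function name `split_ta_counts` and the example sequence are hypothetical.

```python
import re

def split_ta_counts(seq, min_run=6):
    """Count TA dinucleotides inside vs. outside runs of >= min_run A/T bases."""
    seq = seq.upper().replace("U", "T")
    in_run = [False] * len(seq)
    # Mark every position covered by a maximal A/T run of sufficient length.
    for match in re.finditer(r"[AT]{%d,}" % min_run, seq):
        for i in range(match.start(), match.end()):
            in_run[i] = True
    inside = outside = 0
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "TA":
            if in_run[i] and in_run[i + 1]:
                inside += 1
            else:
                outside += 1
    return inside, outside

# Hypothetical sequence: one six-base W run (TATATA) and one isolated TA.
inside, outside = split_ta_counts("GGTATATAGGCTAGG")
```

      Per-UTR counts of this kind could then be entered as two separate features to check whether both groups are associated with the same destabilizing effect.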

      The inclusion of more than a single cell type is an acknowledgement of the importance of evaluating cell type-specific effects. The work suggests a number of cell type-specific differences, but due to technical issues (especially with the HEK data, as outlined above) and the use of only two cell lines, it is difficult to understand cell type effects from the work.

      The inclusion of both 3' and 5' UTR sequences distinguishes this work from most prior studies in the field. Contrasting the effects of these regions on stability is of interest, although the role of these UTRs (especially the 5' UTR) in translational regulation is not assessed here.

      We examined the role of UTR and UTR variants in translation regulation using polysome profiling. By both univariate analysis and an elastic regression model, we identified motifs of short repeated sequences, including SRSF2 binding sites, as mutation hotspots that lead to aberrant translation. Furthermore, these polysome-shifting mutations had a considerable impact on RNA secondary structures, particularly in upstream AUG-containing 5’ UTRs. Integrating these features, our model achieved high accuracy (AUROC > 0.8) in predicting polysome-shifting mutations in the test dataset. Additionally, metagene analysis indicated that pathogenic variants were enriched at the upstream open reading frame (uORF) translation start site, suggesting changes in uORF usage underlie the translation deficiencies caused by these mutations. Illustrating this, we demonstrated that a pathogenic mutation in the IRF6 5’ UTR suppresses translation of the primary open reading frame by creating a uORF. Remarkably, site-directed ADAR editing of the mutant mRNA rescued this translation deficiency. Because the regulation of translation and stability does not converge, we illustrate these two mechanisms in two separate manuscripts (this one and doi.org/10.1101/2024.04.11.589132).

      Reviewer #3 (Public Review):

      Summary:

      In their manuscript titled "Multiplexed Assays of Human Disease‐relevant Mutations Reveal UTR Dinucleotide Composition as a Major Determinant of RNA Stability" the authors aim to investigate the effect of sequence variations in 3'UTR and 5'UTRs on the stability of mRNAs in two different human cell lines.

      To do so, the authors use a massively parallel reporter assay (MPRA). They transfect cells with a set of mRNA reporters that contain sequence variants in their 3' or 5' UTRs, which were previously reported in human diseases. They follow their clearance from cells over time relative to the matching non-variant sequence. To analyze their results, they define a set of factors (RBP and miRNA binding sites, sequence features, secondary structure etc.) and test their association with differences in mRNA stability. For features with a significant association, they use clustering to select a subset of factors for LASSO regression and identify factors that affect mRNA stability.

      They conclude that the TA dinucleotide content of UTRs is the strongest destabilizing sequence feature. Within that context, elevated GC content and protein binding can protect susceptible mRNAs from degradation. They also show that TA dinucleotide content of UTRs affects native mRNA stability, and that it is associated with specific functional groups. Finally, they link disease associated sequence variants with differences in mRNA stability of reporters.

      Strengths:

      (1) This work introduces a different MPRA approach to analyze the effect of genetic variants. While previous works in tissue culture use DNA transfections that require normalization for transcription efficiency, here the mRNA is directly introduced into cells at fixed amounts, allowing a more direct view of the mRNA regulation.

      (2) The authors also introduce a unique analysis approach, which takes into account multiple factors that might affect mRNA stability. This approach allows them to identify general sequence features that affect mRNA stability beyond specific genetic variants, and to reach important insights on mRNA stability regulation. Indeed, while the conclusions on the genetic variants identified in this work are interesting, the main strength of the work involves the general effect of sequence features rather than specific variants.

      (3) The authors provide adequate support for their claims, and validate their analysis using both their reporter data and native genes. For the main feature identified, TA di-nucleotides, they perform follow-up experiments with modified reporters that further strengthen their claims, and also validate the effect on native cellular transcripts (beyond reporters), demonstrating its validity also within native scenarios.

      (4) The work provides a broad analysis of mRNA stability, across two mRNA regulatory segments (3'UTR and 5'UTR) and is performed in two separate cell-types. Comparison between two different cell-types is adequate, and the results demonstrate, as expected, the dependence of mRNA stability on the cellular context. Analysis of 3'UTR and 5'UTR regulatory effects also shows interesting differences and similarities between these two regulatory regions.

      Weaknesses:

      (1) The authors fail to acknowledge several possible confounding factors of their MPRA approach in the discussion.

      First, while transfection of mRNA directly into cells avoids the need to normalize for differences in transcription, the introduction of naked mRNA molecules is different from native cellular mRNAs and could introduce biases due to differences in mRNA modifications, protein associations etc. that may occur co-transcriptionally.

      Second, along those lines, the authors also use in-vitro polyadenylation. The length of the polyA tail of the transfected transcripts could potentially be very different from that of native mRNAs and also affect stability.

      The transcripts used in our study were polyadenylated in vitro with tails of approximately 100 nucleotides (Fig. S1C), similar to the polyA tail lengths typically observed in vivo (dx.doi.org/10.1016/j.molcel.2014.02.007). Additionally, these transcripts were capped to emulate essential mRNA characteristics and to minimize immune responses in recipient cells. This design allows us to study RNA decay for in vitro-synthesized RNA delivered into human cells, akin to RNA vaccines, but it does not necessarily extend to endogenous RNAs. As mentioned, endogenous RNAs undergo nuclear processing and are decorated by numerous trans factors, resulting in distinct regulatory mechanisms. We will provide a more in-depth discussion on these differences and their implications in the revised manuscript.

      (2) The analysis approach used in this work for identifying regulatory features in UTRs was not previously used. As such, the lack of in-depth details of the methodology, and possibly also of a more general validation of the approach, is a drawback in convincing the reader of the validity of this approach and its results.

      In particular, a main point that is not addressed is how the authors decided on the set of "factors" used in their analysis, as choosing different sets of factors might affect the results of the analysis.

      In our study, we employed the calculation of the Variance Inflation Factor (VIF) as a basis for selecting variables. This well-established method is widely used to detect variables with high collinearity, thus ensuring the robustness and reliability of our analysis. By identifying and excluding highly collinear variables, we aimed to minimize multicollinearity and improve the accuracy of our regression models. For more detailed information on the use of VIF in regression analysis, please refer to Akinwande, M., Dikko, H., and Samson, A. (2015). Variance Inflation Factor: As a Condition for the Inclusion of Suppressor Variable(s) in Regression Analysis. Open Journal of Statistics, 5, 754-767. doi: 10.4236/ojs.2015.57075. We will include the method details in the revised manuscript.
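
      For readers unfamiliar with the VIF, the calculation can be sketched as follows. This is a generic illustration using NumPy, not the authors' pipeline; the function name `variance_inflation_factors` and the synthetic feature matrix are assumptions, and any exclusion threshold (a VIF above 10 is a common rule of thumb) is not stated in the response.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2), regressing column j on all other columns."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # include an intercept
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs

# Synthetic example: feature c is almost a copy of feature a; b is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)
vifs = variance_inflation_factors(np.column_stack([a, b, c]))
```

      In this toy case the collinear pair (a, c) receives large VIFs while the independent feature b stays close to 1, which is the pattern used to flag redundant variables.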

      For example, the choice to use 7-mer sequences within the factors set is not explained, particularly when almost all motifs that are eventually identified (Figure 3B-E) are shorter.

      The known RBP motifs are primarily 6-mer. To explore the possibility of discovering novel motifs that could significantly impact our model, we started with 7-mer sequences. However, our analysis revealed that including these additional variables did not improve the explanatory power of the model; instead, it reduced it. Consequently, our final model focuses on motifs shorter than 7-mer. We will explain the motif selections in the revised manuscript.

      In addition, the authors do not perform validations to demonstrate the validity of their approach on simulated data or well-established control datasets. Such analysis would be helpful to further convince the reader of the usefulness and robustness of the analysis.

      We acknowledge the importance of validating our approach on simulated data or well-established control datasets to demonstrate its robustness and reliability. However, to the best of our knowledge, there are currently no well-established control datasets available that perfectly correspond to our specific study context. Despite this, we will continue to search for any relevant datasets that could be utilized for this purpose in future work. This effort will help to further reinforce the confidence in our methodology and its findings.

      (3) The analysis and regression models built in this work are not thoroughly investigated relative to native genes within cells. The effect of sequence "factors" on native cellular transcripts' stability is not investigated beyond TA di-nucleotides, and it is unclear to what degree other predicted factors also affect native transcripts.

      Our system studies the stability control of RNA synthesized in vitro and delivered into human cells. While we validated the UTR TA-dinucleotide effect in vivo, we did not intend to conclude that this is the most influential regulation for endogenous RNAs. It is known that endogenous RNAs undergo very different regulation. The most prominent factors controlling endogenous RNA stability are the density of splice junctions and the length of UTRs (doi.org/10.1186/s13059-022-02811-x; doi.org/10.1186/s12915-021-00949-x). To decipher the sequence regulation, we controlled for these factors in our experiments. Therefore, we acknowledge that several endogenous features, which were excluded by our approach, may serve as predictive features of RNA half-life in vivo.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      The Roco proteins are a family of GTPases characterized by the conserved presence of an ROC-COR tandem domain. How GTP binding alters the structure and activity of Roco proteins remains unclear. In this study, Galicia C et al. took advantage of conformation-specific nanobodies to trap CtRoco, a bacterial Roco, in an active monomeric state and determined its high-resolution structure by cryo-EM. This study, in combination with the previous inactive dimeric CtRoco, revealed the molecular basis of CtRoco activation through GTP-binding and dimer-to-monomer transition.

      Strengths:

      The reviewer is impressed by the authors' deep understanding of the CtRoco protein. Capturing Roco proteins in a GTP-bound state is a major breakthrough in the mechanistic understanding of the activation mechanism of Roco proteins and shows similarity with the activation mechanism of LRRK2, a key molecule in Parkinson's disease. Furthermore, the methodology the authors used in this manuscript - using conformation-specific nanobodies to trap the active conformation, which is otherwise flexible and resistant to single-particle average - is highly valuable and inspiring.

      Weakness:

      Though written with good clarity, the paper will benefit from some clarifications.

      1) The angular distribution of particles for the 3D reconstructions should be provided (Figure 1 - Sup. 1 & Sup. 2).

      The supplementary figures will be adapted to include particle distribution plots.

      2) The B-factors for protein and ligand of the model, Map sharpening factor, and molprobity score should be provided (Table 1).

      The map used to interpret the model was post-processed by density modification, therefore no sharpening factor was obtained. This information will be included in Table 1, together with B-factors and molprobity scores.

      3) A supplemental Figure to Figure 2B, illustrating how a0-helix interacts with COR-A&LRR before and after GTP binding in atomic details, will be helpful for the readers to understand the critical role of a0-helix during CtRoco activation.

      A supplemental figure will be prepared to illustrate this in the revised document.

      4) For the following statement, "On the other hand, only relatively small changes are observed in the orientation of the Roc a3 helix. This helix, which was previously suggested to be an important element in the activation of LRRK2 (Kalogeropulou et al., 2022), is located at the interface of the Roc and CORB domains and harbors the residues H554 and Y558, orthologous to the LRRK2 PD mutation sites N1337 and R1441, respectively."

      It is not surprising the a3-helix of the ROC domain only has small changes when the ROC domain is aligned (Figure 2E). However, in the study by Zhu et al (DOI: 10.1126/science.adi9926), it was shown that a3-helix has a "see-saw" motion when the COR-B domain is aligned. Is this motion conserved in CtRoco from inactive to active state?

      We indeed describe the conformational changes from the perspective of the Roc domain. When using the COR-B domain for structural alignment, a rotational movement of Roc (including a “seesaw”-like movement of the α3-helix around His554) with respect to COR-B is correspondingly observed. We will include this in the revised document.

      5) A supplemental figure showing the positions of and distances between NbRoco1 K91 and Roc K443, K583, and K611 would help the following statement. "Also multiple crosslinks between the Nbs and CtRoco, as well as between both nanobodies were found. ... NbRoco1-K69 also forms crosslinks with two lysines within the Roc domain (K583 and K611), and NbRoco1-K91 is crosslinked to K583".

      A provisional figure displaying these crosslinks is already provided below, and we will also consider including this in the revised manuscript. However, in interpreting these crosslinks it should be taken into consideration that the additive length of the DSSO spacer and the lysine side chains leads to a theoretical upper limit of ∼26 Å for the distance between the α carbon atoms of cross-linked lysines (and even a cut-off distance of 35 Å when taking into account protein dynamics).
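
      The distance criterion above can be applied to a structural model as in the sketch below. This is a minimal illustration; the function name `classify_crosslink`, the category labels, and the example coordinates are hypothetical, and only the 26 Å and 35 Å thresholds come from the text.

```python
import math

def classify_crosslink(ca_a, ca_b, strict=26.0, relaxed=35.0):
    """Classify a crosslink by the distance (in Å) between two Cα coordinates."""
    distance = math.dist(ca_a, ca_b)
    if distance <= strict:
        label = "within theoretical limit"
    elif distance <= relaxed:
        label = "plausible with dynamics"
    else:
        label = "violated"
    return distance, label

# Hypothetical coordinates, 30 Å apart along one axis.
distance, label = classify_crosslink((0.0, 0.0, 0.0), (0.0, 0.0, 30.0))
```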

      Author response image 1.

      6) It would be informative to show the position of CtRoco-L487 in the NF and GTP-bound state and comment on why this mutation favors GTP hydrolysis.

      We will create an additional figure showing the position of L487, and discuss possible mechanisms for the observed effect of a mutation on GTPase activity.

      Reviewer #2 (Public Review):

      Summary

      The manuscript by Galicia et al describes the structure of the bacterial GTPyS-bound CtRoco protein in the presence of nanobodies. The major relevance of this study is in the fact that the CtRoco protein is a homolog of the human LRRK2 protein with mutations that are associated with Parkinson's disease. The structure and activation mechanisms of these proteins are very complex and not well understood. Especially lacking is a structure of the protein in the GTP-bound state. Previously the authors have shown that two conformational nanobodies can be used to bring/stabilize the protein in a monomer-GTPyS-bound state. In this manuscript, the authors use these nanobodies to obtain the GTPyS-bound structure and importantly discuss their results in the context of the mammalian LRRK2 activation mechanism and mutations leading to Parkinson's disease. The work is well performed and clearly described. In general, the conclusions on the structure are reasonable and well-discussed in the context of the LRRK2 activation mechanism.

      Strengths:

      The strong points are the innovative use of nanobodies to stabilize the otherwise flexible protein and the new GTPyS-bound structure that helps enormously in understanding the activation cycle of these proteins.

      Weakness:

      The strong point of the use of nanobodies is also a potential weak point; these nanobodies may have induced some conformational changes in a part of the protein that will not be present in a GTPyS-bound protein in the absence of nanobodies.

      Two major points need further attention.

      1) Several parts of the protein are very flexible during the monomer-dimer activity cycle. This flexibility is crucial for protein function, but obviously hampers structure resolution. Forced experiments to reduce flexibility may allow better structure resolution, but at the same time may impede the activation cycle. Therefore, careful experiments and interpretation are very critical for this type of work. This especially relates to the influence of the nanobodies on the structure that may not occur during the "normal" monomer-dimer activation cycle in the absence of the nanobodies (see also point 2). So what is the evidence that the nanobody-bound GTPyS-bound state is biochemically a reliable representative of the "normal" GTP-bound state in the absence of nanobodies, and that the obtained structure can therefore be confidently used to interpret the activation mechanism as done in the manuscript?

      See below for an answer to remarks 1 and 2.

      2) The obtained structure with two nanobodies reveals that the nanobodies NbRoco1 and NbRoco2 bind to parts of the protein by which a dimer is impossible, respectively to a0-helix of the linker between Roc-COR and LRR, and to the cavity of the LRR that in the dimer binds to the dimerizing domain CORB. It is likely the open monomer GTP-bound structure is recognized by the nanobodies in the camelid, suggesting that overall the open monomer structure is a true GTP-bound state. However, it is also likely that the binding energy of the nanobody is used to stabilize the monomer structure. It is not automatically obvious that in the details the obtained nanobody-Roco-GTPyS structure will be identical to the "normal" Roco-GTPyS structure. What is the influence of nanobody-binding on the conformation of the domains where they bind; the binding energy may be used to stabilize a conformation that is not present in the absence of the nanobody. For instance, NbRoco1 binds to the a0 helix of the linker; what is here the "normal" active state of the Roco protein, and is e.g. the angle between RocCOR and LRR also rotated by 135 degrees? Furthermore, nanobody NbRoco2 in the LRR domain is expected to stabilize the LRR domain; it may allow a position of the LRR domain relative to the rest of the protein that is not present without nanobody in the LRR domain. I am convinced that the observed open structure is a correct representation of the active state, but many important details have to be supported by e.g. their CX-MS experiments, and in the end probably need confirmation by more structures of other active Roco proteins or confirmation by a more dynamic sampling of the active states by e.g. molecular dynamics or NMR.

      Recently, nanobodies have increasingly been used successfully to obtain structural insights in protein conformational states (reviewed in Uchański et al, Curr. Opin. Struc. Biol. 2020). As Reviewer #2 points out, the concern is sometimes raised that antibodies could distort a protein into non-native conformations. Here, it is important to note that the nanobodies were raised by immunizing a llama with the fully native CtRoco protein bound to a non-hydrolysable GTP analogue, after which the nanobodies were selected by phage display using the same fully native and functional form of the protein. As clearly explained in Manglik et al. Annu Rev Pharmacol Toxicol. 2017, the probability of an in vivo matured nanobody inducing a non-native conformation of the antigen is low, although it is possible that it selects a high-energy, low-population conformation of a dynamic protein. Immature B cells require engagement of displayed antibodies with antigen to proliferate and differentiate during clonal selection. Antibodies that induce non-native conformations of the antigen pay a substantial energetic penalty in this process, and B cell clones displaying such antibodies will have a significantly lower probability of proliferation and differentiation into mature antibody-secreting B lymphocytes. Hence, many recent experiments and observations give credence to the notion that nanobodies bind antigens primarily by conformational selection and not induced fit (e.g. Smirnova et al. PNAS 2015).

      Extrapolated to the case of CtRoco, which is clearly very flexible in its GTP-bound form, this means that the nanobodies are able to trap and stabilize one conformational state that is representative of the “active state” ensemble of the protein. In this respect, it is clear from our experiments (XL-MS, affinity and effect on GTPase activity) that the effects of NbRoco1 and NbRoco2 are additive (or even cooperative), meaning that both nanobodies recognize different features of the same CtRoco “active state”. Correspondingly, the monomeric, elongated “open” conformation is also observed in the structure of CtRoco bound to NbRoco1 only (Figure 1 - supplement 2), although this structure still displays more flexibility. The monomerization and conformational changes that we observe and describe in the current paper at high resolution are also in very good agreement with earlier observations for CtRoco in the GTP-bound form in absence of any nanobodies, including negative stain EM (Deyaert et al. Nature Commun, 2017), hydrogen-deuterium exchange experiments (Deyaert et al. Biochem. J. 2019) and native MS (Leemans et al. Biochem J. 2020).

      In the revised document we will include some additional text to address and clarify these aspects.

    1. Author response:

      eLife Assessment

      This study provides valuable insights into the behavioral, computational, and neural mechanisms of regime shift detection, by identifying distinct roles for the frontoparietal network and ventromedial prefrontal cortex in sensitivity to signal diagnosticity and transition probabilities, respectively. The findings are supported by solid evidence, including an innovative task design, robust behavioral modeling, and well-executed model-based fMRI analyses, though claims of neural selectivity would benefit from more rigorous statistical comparisons. Overall, this work advances our understanding of how humans adapt belief updating in dynamic environments and offers a framework for exploring biases in decision-making under uncertainty.

      Thank you for reviewing our manuscript. We appreciate the editors’ assessment and the reviewers’ constructive comments. Below we address the reviewers’ comments. In particular, we addressed Reviewer 1’s comments on (1) neural selectivity by performing statistical comparisons and (2) parameter estimation by providing more details on how the system-neglect model was parameterized. We addressed Reviewer 2’s comments on (1) our neuroimaging results regarding frontoparietal network and (2) model comparisons.  

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      (1) The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experiences. The authors discuss these differences comprehensively.

      Thank you for recognizing our contribution to the regime-change detection literature and our effort in discussing our findings in relation to the experience-based paradigms.

      (2) The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well.

      Thank you for recognizing the contribution of our Bayesian framework and system-neglect model.

      (3) The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      Thank you for recognizing our execution of model-based fMRI analyses and effort in using those analyses to link with behavioral biases.

      Weaknesses:

      My major concern is about the correlational analysis in the section "Under- and overreactions are associated with selectivity and sensitivity of neural responses to system parameters", shown in Figures 5c and d (and similarly in Figure 6). The authors argue that a frontoparietal network selectively represents sensitivity to signal diagnosticity, while the vmPFC selectively represents transition probabilities. This claim is based on separate correlational analyses for red and blue across different brain areas. The authors interpret the finding of a significant correlation in one case (blue) and an insignificant correlation (red) as evidence of a difference in correlations (between blue and red) but don't test this directly. This has been referred to as the "interaction fallacy" (Nieuwenhuis et al., 2011; Makin & Orban de Xivry 2019). Not directly testing the difference in correlations (but only the differences to zero for each case) can lead to wrong conclusions. For example, in Figure 5c, the correlation for red is r = 0.32 (not significantly different from zero) and the correlation for blue is r = 0.48 (different from zero). However, the difference between the two is only 0.16, and it is likely that this difference itself is not significant. From a statistical perspective, this corresponds to an interaction effect that has to be tested directly. It is my understanding that analyses in Figure 6 follow the same approach.

      Relevant literature on this point is:

      Nieuwenhuis, S, Forstmann, B & Wagenmakers, EJ (2011). Erroneous analyses of interactions in neuroscience: a problem of significance. Nat Neurosci 14, 1105-1107. https://doi.org/10.1038/nn.2886

      Makin TR, Orban de Xivry, JJ (2019). Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife 8:e48175. https://doi.org/10.7554/eLife.48175

      There is also a blog post on simulation-based comparisons, which the authors could check out: https://garstats.wordpress.com/2017/03/01/comp2dcorr/

      I recommend that the authors carefully consider what approach works best for their purposes. It is sometimes recommended to directly compare correlations based on Monte-Carlo simulations (cf Makin & Orban). It might also be appropriate to run a regression with the dependent variable brain activity (Y) and predictors brain area (X) and the model-based term of interest (Z). In this case, they could include an interaction term in the model:

      Y = \beta_0 + \beta_1 \cdot X + \beta_2 \cdot Z + \beta_3 \cdot X \cdot Z

      The interaction term reflects if the relationship between the model term Z and brain activity Y is conditional on the brain area of interest X.
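
      The suggested regression can be sketched as below. This is a minimal ordinary-least-squares illustration on synthetic data, assuming independent observations; the function name `fit_interaction` and the simulated effect sizes are hypothetical, and with repeated measures per subject a mixed-effects model would be more appropriate.

```python
import numpy as np

def fit_interaction(y, x, z):
    """OLS fit of Y = b0 + b1*X + b2*Z + b3*X*Z; returns (beta, se, t)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    design = np.column_stack([np.ones_like(y), x, z, x * z])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = len(y) - design.shape[1]
    sigma2 = resid @ resid / dof                       # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)    # coefficient covariance
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# Synthetic data: X codes brain area (0/1), Z is the model-based term.
rng = np.random.default_rng(1)
n = 200
x = rng.integers(0, 2, n)
z = rng.normal(size=n)
y = 0.5 + 0.2 * x + 0.1 * z + 0.8 * x * z + 0.3 * rng.normal(size=n)
beta, se, t = fit_interaction(y, x, z)
```

      The last entry of `t` is the test statistic for the interaction coefficient, which is the quantity that speaks to whether the Z–Y relationship differs between areas.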

      Thank you for this great suggestion. We tested the difference in correlation both parametrically and nonparametrically. The two approaches gave consistent results. In our parametric test, we used the Fisher z transformation to transform the difference in correlation coefficients to the z statistic (Fisher, 1921). That is, for two correlation coefficients, r<sub>blue</sub> (the correlation between behavioral slope and neural slope estimated at change-consistent signals; sample size n<sub>blue</sub>) and r<sub>red</sub> (the correlation between behavioral slope and neural slope estimated at change-inconsistent signals; sample size n<sub>red</sub>), the z statistic for the difference in correlation is computed from the Fisher-transformed coefficients and the two sample sizes.
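
      The standard form of this statistic for two independent correlations, written with the symbols defined above, is given below. It is offered as an assumption about the expression used, since the response does not spell it out.

```latex
z = \frac{\operatorname{arctanh}(r_{\mathrm{blue}}) - \operatorname{arctanh}(r_{\mathrm{red}})}
         {\sqrt{\dfrac{1}{n_{\mathrm{blue}} - 3} + \dfrac{1}{n_{\mathrm{red}} - 3}}}
```

      Note that this form treats the two correlations as independent; when both are computed on the same subjects, tests for dependent correlations may be more appropriate.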

      We found that, among the five ROIs in the frontoparietal network, the difference in correlation was significant in two, namely the left IFG and left IPS (one-tailed z test; left IFG: z=1.8355, p=0.0332; left IPS: z=2.3782, p=0.0087). For the remaining three ROIs, the difference in correlation was not significant (dmPFC: z=0.7594, p=0.2238; right IFG: z=0.9068, p=0.1822; right IPS: z=1.3764, p=0.0843). We chose a one-tailed test because we already know that the correlation under the blue signals was significantly greater than 0. Hence the alternative hypothesis is that r<sub>blue</sub> − r<sub>red</sub> > 0.
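
      The one-tailed p-values quoted above follow from the standard normal distribution and can be checked with the short sketch below. The function names are hypothetical, and `fisher_z_difference` assumes the independent-samples form of the Fisher z test.

```python
import math

def fisher_z_difference(r_a, n_a, r_b, n_b):
    """z statistic for the difference between two independent correlations."""
    standard_error = math.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    return (math.atanh(r_a) - math.atanh(r_b)) / standard_error

def one_tailed_p(z):
    """Upper-tail probability P(Z > z) under the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# z values reported in the response for the left IFG and left IPS.
p_left_ifg = one_tailed_p(1.8355)
p_left_ips = one_tailed_p(2.3782)
```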

      In our nonparametric test, we performed nonparametric bootstrapping to test for the difference in correlation. That is, we resampled the dataset with replacement (subject-wise) and used the resampled dataset to compute the difference in correlation. We repeated this procedure 100,000 times to obtain the distribution of the correlation difference. We then tested for significance and estimated the p-value based on this distribution. Consistent with our parametric tests, here we also found that the difference in correlation was significant in left IFG and left IPS (left IFG: r<sub>blue</sub> − r<sub>red</sub>=0.46, p=0.0496; left IPS: r<sub>blue</sub> − r<sub>red</sub>=0.5306, p=0.0041), but was not significant in dmPFC, right IFG, and right IPS (dmPFC: r<sub>blue</sub> − r<sub>red</sub>=0.1634, p=0.1919; right IFG: r<sub>blue</sub> − r<sub>red</sub>=0.2123, p=0.1681; right IPS: r<sub>blue</sub> − r<sub>red</sub>=0.3434, p=0.0631).
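
      The subject-wise bootstrap described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the default number of resamples, and the rule for turning the bootstrap distribution into a one-tailed p-value (the fraction of resampled differences that are not positive) are all assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate resample with no variance; treated as r = 0
    return sxy / math.sqrt(sxx * syy)


def bootstrap_corr_diff(behavior, neural_blue, neural_red, n_boot=10000, seed=0):
    """Bootstrap r_blue - r_red by resampling subjects with replacement.

    Returns the observed difference and a one-tailed p-value, defined here
    (as an assumption) as the share of resampled differences <= 0.
    """
    rng = random.Random(seed)
    n = len(behavior)
    observed = pearson(behavior, neural_blue) - pearson(behavior, neural_red)
    n_not_positive = 0
    for _ in range(n_boot):
        # Draw subject indices with replacement, keeping each subject's
        # behavioral and neural values together.
        idx = [rng.randrange(n) for _ in range(n)]
        b = [behavior[i] for i in idx]
        blue = [neural_blue[i] for i in idx]
        red = [neural_red[i] for i in idx]
        if pearson(b, blue) - pearson(b, red) <= 0:
            n_not_positive += 1
    return observed, n_not_positive / n_boot
```

      Resampling whole subjects, rather than the two correlations separately, preserves the dependence between r<sub>blue</sub> and r<sub>red</sub>, which are computed from the same individuals.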

      We will update these results in the revised manuscript. In summary, we found that the left IFG and left IPS in the frontoparietal network differentially responded to signals consistent with change (blue signals) compared with signals inconsistent with change (red signals). First, the neural sensitivity to signal diagnosticity measured when signals consistent with change appeared (blue signals) significantly correlated with individual subjects’ behavioral sensitivity to signal diagnosticity (r<sub>blue</sub>). By contrast, neural sensitivity to signal diagnosticity measured when signals inconsistent with change appeared did not significantly correlate with behavioral sensitivity (r<sub>red</sub>). Second, the difference between the correlation obtained at signals consistent with change and the correlation obtained at signals inconsistent with change, r<sub>blue</sub> − r<sub>red</sub>, was statistically significant.

      Another potential concern is that some important details about the parameter estimation for the system-neglect model are missing. In the respective section in the methods, the authors mention a nonlinear regression using Matlab's "fitnlm" function, but it remains unclear how the model was parameterized exactly. In particular, what are the properties of this nonlinear function, and what are the assumptions about the subject's motor noise? I could imagine that by using the built-in function, the assumption was that residuals are Gaussian and homoscedastic, but it is possible that the assumption of homoscedasticity is violated, and residuals are systematically larger around p=0.5 compared to p=0 and p=1. Relatedly, in the parameter recovery analyses, the authors assume different levels of motor noise. Are these values representative of empirical values?

      We thank the reviewer for this excellent point. The reviewer touched on model parameterization, assumption of noise, and parameter recovery analysis, which we address below.

      On how our model was parameterized

      We parameterized the model according to the system-neglect model in Eq. (2) and estimated the alpha parameter separately for each level of transition probability and the beta parameter separately for each level of signal diagnosticity. As a result, we had a total of 6 parameters (3 alpha and 3 beta parameters) in the model. The system-neglect model is then called by fitnlm so that these parameters can be estimated. The term ‘nonlinear’ regression in fitnlm refers to the fact that you can specify any model (in our case the system-neglect model) and estimate its parameters when calling this function. In our use of fitnlm, we assume that the noise is Gaussian and homoscedastic (the default option).

      On the assumptions about subject’s motor noise

      We wish to emphasize that we did not call the noise ‘motor’ because it can be estimation noise as well. Regardless, in the context of fitnlm, we assume that the noise is Gaussian and homoscedastic.

      On the possibility that homoscedasticity is violated

      In the revision, we plan to examine this possibility (residuals larger around p=0.5 compared with p=0 and p=1).

      On whether the noise levels in parameter recovery analysis are representative of empirical values

      To address the reviewer’s question, we conducted a new analysis using maximum likelihood estimation to estimate the noise level of each individual subject. We proceeded in the following steps. First, for each subject separately, we used the parameter estimates of the system-neglect model to compute the period-wise probability estimates of regime shift. As a reminder, we referred to a ‘period’ as the time when a new signal appeared during a trial (for a given transition probability and signal diagnosticity). Each trial consisted of 10 successive periods. Second, we computed the period-wise likelihood, the probability of observing the subject’s actual probability estimate given the probability estimate predicted by the system-neglect model and the noise level. Here we define noise as the standard deviation of a Gaussian distribution centered at the model-predicted probability estimate. We then summed over all periods the negative logarithm of likelihood and used MATLAB’s minimization algorithm (the ‘fmincon’ function) to obtain the noise estimate that minimized the sum of negative log likelihood (thus the noise estimate that maximized the sum of log likelihood). Across subjects, we found that the mean noise estimate was 0.1480 and ranged from 0.0816 to 0.3239. The noise estimate of each subject can be seen in the figure below.

      Author response image 1.
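
      The noise-estimation steps described above can be sketched as follows. This is an illustration under stated assumptions: the authors used MATLAB’s fmincon, whereas the sketch swaps in a plain golden-section search, and the function names and search bounds are hypothetical. The model-predicted probability estimates are taken as given.

```python
import math


def neg_log_likelihood(sigma, observed, predicted):
    """Sum over periods of -ln N(observed | predicted, sigma^2).

    The Gaussian is not truncated to [0, 1]; that is an assumption here.
    """
    n = len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(sigma * math.sqrt(2 * math.pi)) + sse / (2 * sigma ** 2)


def estimate_noise(observed, predicted, lo=1e-4, hi=1.0, tol=1e-8):
    """Golden-section search for the sigma in [lo, hi] that minimizes the
    summed negative log likelihood (a stand-in for fmincon)."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if neg_log_likelihood(c, observed, predicted) < neg_log_likelihood(d, observed, predicted):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2
```

      For a Gaussian with a fixed mean, the maximum-likelihood noise estimate also has a closed form, the square root of the mean squared residual, which gives a convenient check on any numerical optimizer.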

      Compared with our original parameter recovery analysis where the maximum noise level was set at 0.1, our data indicated that some subjects’ noise was larger than this value. Therefore, we expanded our parameter recovery analysis to include noise levels beyond 0.1, up to 0.3. We found good parameter recovery across different levels of noise, with the Pearson correlation coefficient between the input parameter values used to simulate data and the estimated parameter values greater than 0.95 (Supplementary Fig. S3). The results will be updated in Supplementary Fig. S3.

      Author response image 2.

      Parameter recovery. We simulated probability estimates according to the system-neglect model. We used each subject’s parameter estimates as our choice of parameter values used in the simulation. Using simulated data, we estimated the parameters (𝛼 and 𝛽) in the system-neglect model. To examine parameter recovery, we plotted the parameter values we used to simulate the data against the parameter estimates we obtained based on simulated data and computed their Pearson correlation. Further, we added different levels of Gaussian white noise with standard deviation 𝜎 = 0.01, 0.05, 0.1, 0.2, 0.3 to the simulated data to examine parameter recovery and show the results respectively in Fig. A, B, C, D, and E. For each noise level, we show the parameter estimates in the left two graphs. In the right two graphs, we plot the parameter estimates based on simulated data against the parameter values used to simulate the data. A. Noise 𝜎 = 0.01. B. Noise 𝜎 = 0.05. C. Noise 𝜎 = 0.1. D. Noise 𝜎 = 0.2. E. Noise 𝜎 = 0.3.

      We will update the parameter recovery section (p. 44) and Supplementary Figure S3 to incorporate these new results:

      “We implemented 5 levels of noise with σ={0.01,0.05,0.1,0.2,0.3} and examined the impact of noise on parameter recovery for each level of noise. These noise levels covered the range of empirical noise levels we estimated from the subjects. To estimate each subject’s noise level, we carried out maximum likelihood estimation in the following steps. First, for each subject separately, we used the parameter estimates of the system-neglect model to compute the period-wise probability estimates of regime shift. As a reminder, we referred to a ‘period’ as the time when a new signal appeared during a trial (for a given transition probability and signal diagnosticity). Each trial consisted of 10 successive periods. Second, we computed the period-wise likelihood, the probability of observing the subject’s actual probability estimate given the probability estimate predicted by the system-neglect model and the noise level. Here we define noise as the standard deviation of a Gaussian distribution centered at the model-predicted probability estimate. We then summed over all periods the negative natural logarithm of likelihood and used MATLAB’s minimization algorithm (the ‘fmincon’ function) to obtain the noise estimate that minimized the sum of negative log likelihood (thus the noise estimate that maximized the sum of log likelihood). Across subjects, we found that the mean noise estimate was 0.1480 and ranged from 0.0816 to 0.3239 (Supplementary Figure S3).”

      The main study is based on N=30 subjects, as are the two control studies. Since this work is about individual differences (in particular w.r.t. to neural representations of noise and transition probabilities in the frontoparietal network and the vmPFC), I'm wondering how robust the results are. Is it likely that the results would replicate with a larger number of subjects? Can the two control studies be leveraged to address this concern to some extent?

      It would be challenging to use the control studies to address the robustness concern. The control studies were designed to address the motor confounds. They were less suitable, however, for addressing the individual difference issue raised by the reviewer. We discuss why this is the case below.

      The two control studies did not allow us to examine individual differences – in particular with respect to neural selectivity of noise and transition probability – and therefore we think they are unlikely to help address this concern. Having said that, it is possible to look at neural selectivity of noise (signal diagnosticity) in the first control experiment, where subjects estimated the probability of the blue regime in a task where there was no regime change (transition probability was 0). However, the fact that there were no regime shifts in the first control experiment changed the nature of the task. Instead of always starting at the red regime as in the main experiment, in the first control experiment we randomly picked the regime to draw the signals from. It also changed the meaning and the dynamics of the signals (red and blue) that would appear. In the main experiment the blue signal is a signal consistent with change, but in the control experiment this is no longer the case. In the main experiment, the frequency of blue signals is contingent upon both noise and transition probability, and blue signals are less frequent than red signals because of the small transition probabilities. But in the first control experiment, blue signals are not less frequent because the regime was blue in half of the trials. Due to these differences, we do not see how analyzing the control experiments could help in establishing robustness because we do not have a good prediction as to whether and how the neural selectivity would be impacted by these differences.

      We can address the issue of robustness through looking at the effect size. In particular, with respect to individual differences in neural sensitivity of transition probability and signal diagnosticity, since the significant correlation coefficients between neural and behavioral sensitivity were between 0.4 and 0.58 for signal diagnosticity in frontoparietal network (Fig. 5C), and -0.38 and -0.37 for transition probability in vmPFC (Fig. 5D), the effect size of these correlation coefficients was considered medium to large (Cohen, 1992). Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.

      It seems that the authors have not counterbalanced the colors and that subjects always reported the probability of the blue regime. If so, I'm wondering why this was not counterbalanced.

      We are aware of the reviewer’s concern. The first reason we did not counterbalance (either the colors or which regime, blue or red, subjects reported) was to avoid confusing the subjects in an already complicated task. Balancing these two variables also comes at the cost of sample size, which was the second reason we did not do it. Although we could have done the counterbalancing at the between-subject level so as not to increase task complexity, this could have introduced another confound, namely individual differences in how people respond to these variables. This is the third reason we were hesitant to counterbalance.

      Reviewer #2 (Public review):

      Summary:

      This paper focuses on understanding the behavioral and neural basis of regime shift detection, a common yet hard problem that people encounter in an uncertain world. Using a regime-shift task, the authors examined cognitive factors influencing belief updates by manipulating signal diagnosticity and environmental volatility. Behaviorally, they have found that people demonstrate both over and under-reaction to changes given different combinations of task parameters, which can be explained by a unified system-neglect account. Neurally, the authors have found that the vmPFC-striatum network represents current belief as well as belief revision unique to the regime detection task. Meanwhile, the frontoparietal network represents cognitive factors influencing regime detection i.e., the strength of the evidence in support of the regime shift and the intertemporal belief probability. The authors further link behavioral signatures of system neglect with neural signals and have found dissociable patterns, with the frontoparietal network representing sensitivity to signal diagnosticity when the observation is consistent with regime shift and vmPFC representing environmental volatility, respectively. Together, these results shed light on the neural basis of regime shift detection especially the neural correlates of bias in belief update that can be observed behaviorally.

      Strengths:

      (1) The regime-shift detection task offers a solid ground to examine regime-shift detection without the potential confounding impact of learning and reward. Relatedly, the system-neglect modeling framework provides a unified account for both over or under-reacting to environmental changes, allowing researchers to extract a single parameter reflecting people's sensitivity to changes in decision variables and making it desirable for neuroimaging analysis to locate corresponding neural signals.

      Thank you for recognizing our task design and our system-neglect computational framework in understanding change detection.

      (2) The analysis for locating brain regions related to belief revision is solid. Within the current task, the authors look for brain regions whose activation covary with both current belief and belief change. Furthermore, the authors have ruled out the possibility of representing mere current belief or motor signal by comparing the current study results with two other studies. This set of analyses is very convincing.

      Thank you for recognizing our control studies in ruling out potential motor confounds in our neural findings on belief revision.

      (3) The section on using neuroimaging findings (i.e., the frontoparietal network is sensitive to evidence that signals regime shift) to reveal nuances in behavioral data (i.e., belief revision is more sensitive to evidence consistent with change) is very intriguing. I like how the authors structure the flow of the results, offering this as an extra piece of behavioral findings instead of ad-hoc implanting that into the computational modeling.

      Thank you for appreciating how we showed that neural insights can lead to new behavioral findings.

      Weaknesses:

      (1) The authors have presented two sets of neuroimaging results, and it is unclear to me how to reason between these two sets of results, especially for the frontoparietal network. On one hand, the frontoparietal network represents belief revision but not variables influencing belief revision (i.e., signal diagnosticity and environmental volatility). On the other hand, when it comes to understanding individual differences in regime detection, the frontoparietal network is associated with sensitivity to change and consistent evidence strength. I understand that belief revision correlates with sensitivity to signals, but it can probably benefit from formally discussing and connecting these two sets of results in discussion. Relatedly, the whole section on behavioral vs. neural slope results was not sufficiently discussed and connected to the existing literature in the discussion section. For example, the authors could provide more context to reason through the finding that striatum (but not vmPFC) is not sensitive to volatility.

      We thank the reviewer for the valuable suggestions.

      With regard to the first comment, we wish to clarify that we did not find frontoparietal network to represent belief revision. It was the vmPFC and ventral striatum that we found to represent belief revision (Fig. 3). For the frontoparietal network, we identified its involvement in our task through finding that its activity correlated with strength of change evidence (Fig. 4) and individual subjects’ sensitivity to signal diagnosticity (Fig. 5). Conceptually, these two findings reflect how individuals interpret the signals (signals consistent or inconsistent with change) in light of signal diagnosticity. This is because (1) strength of change evidence is defined as signals (+1 for signal consistent with change, and -1 for signal inconsistent with change) multiplied by signal diagnosticity and (2) sensitivity to signal diagnosticity reflects how individuals subjectively evaluate signal diagnosticity. At the theoretical level, these two findings can be interpreted through our computational framework in that both the strength of change evidence and sensitivity to signal diagnosticity contribute to estimating the likelihood of change (Eqs. 1 and 2). We added a paragraph in Discussion to talk about this.

      We will add on p. 35:

      “For the frontoparietal network, we identified its involvement in our task through finding that its activity correlated with strength of change evidence (Fig. 4) and individual subjects’ sensitivity to signal diagnosticity (Fig. 5). Conceptually, these two findings reflect how individuals interpret the signals (signals consistent or inconsistent with change) in light of signal diagnosticity. This is because (1) strength of change evidence is defined as signals (+1 for signal consistent with change, and -1 for signal inconsistent with change) multiplied by signal diagnosticity and (2) sensitivity to signal diagnosticity reflects how individuals subjectively evaluate signal diagnosticity. At the theoretical level, these two findings can be interpreted through our computational framework in that both the strength of change evidence and sensitivity to signal diagnosticity contribute to estimating the likelihood of change (Eqs. 1 and 2).”

      With regard to the second comment, we added discussion on the behavioral and neural slope comparison. We pointed out previous papers conducting similar analysis (Vilares et al., 2012; Ting et al., 2015; Yang & Wu, 2020), their findings and how they relate to our results. Vilares et al. found that sensitivity to prior information (uncertainty in prior distribution) in the orbitofrontal cortex (OFC) and putamen correlated with behavioral measure of sensitivity to prior. In the current study, transition probability acts as prior in the system-neglect framework (Eq. 2) and we found that ventromedial prefrontal cortex represents subjects’ sensitivity to transition probability. Together, these results suggest that OFC and vmPFC are involved in the subjective evaluation of prior information in both static (Vilares et al., 2012) and dynamic environments (current study). In addition, we added to the literature by showing that distinct from vmPFC in representing sensitivity to transition probability or prior, the frontoparietal network represents how sensitive individual decision makers are to the diagnosticity of signals in revealing the true state (regime) of the environment.

      We will add on p. 36:

      “In the current study, our psychometric-neurometric analysis focused on comparing behavioral sensitivity with neural sensitivity to the system parameters (transition probability and signal diagnosticity). We measured sensitivity by estimating the slope of behavioral data (behavioral slope) and neural data (neural slope) in response to the system parameters. Previous studies had adopted a similar approach (Vilares et al., 2012; Ting et al., 2015; Yang & Wu, 2020). For example, Vilares et al. (2012) found that sensitivity to prior information (uncertainty in prior distribution) in the orbitofrontal cortex (OFC) and putamen correlated with behavioral measure of sensitivity to the prior. In the current study, transition probability acts as prior in the system-neglect framework (Eq. 2) and we found that ventromedial prefrontal cortex represents subjects’ sensitivity to transition probability. Together, these results suggest that OFC and vmPFC are involved in the subjective evaluation of prior information in both static (Vilares et al., 2012) and dynamic environments (current study). In addition, we added to the literature by showing that distinct from vmPFC in representing sensitivity to transition probability or prior, the frontoparietal network represents how sensitive individual decision makers are to the diagnosticity of signals in revealing the true state (regime) of the environment.”

      (2) More details are needed for behavioral modeling under the system-neglect framework, particularly results on model comparison. I understand that this model has been validated in previous publications, but it is unclear to me whether it provides a superior model fit in the current dataset compared to other models (e.g., a model without \alpha or \beta). Relatedly, I wonder whether the final result section can be incorporated into modeling as well - i.e., the authors could test a variant of the model with two \betas depending on whether the observation is consistent with a regime shift and conduct model comparison.

      Thank you for the great suggestion.

      To address the reviewer’s question on model comparison, we tested 4 variants of the system-neglect model and incorporated them into the final result section. The original system-neglect model and its four variants are:

      – Original system-neglect model: 6 total parameters, 3 beta parameters (one for each level of signal diagnosticity) and 3 alpha parameters (one for each level of transition probability).  

      – M1: System-neglect model with signal-dependent beta parameters (alpha parameters, and beta parameters separately estimated at change-consistent and change-inconsistent signals): 9 total parameters, 3 beta parameters for change-consistent signals, 3 beta parameters for change-inconsistent signals, and 3 alpha parameters.

      – M2: System-neglect model with signal-dependent alpha parameters (alpha parameters separately estimated at change-consistent and change-inconsistent signals, and beta parameters): 9 total parameters, 3 alpha parameters for change-consistent signals, 3 alpha parameters for change-inconsistent signals, and 3 beta parameters.

      – M3: System-neglect model without alpha parameters (only the beta parameters): 3 total parameters, all are beta parameters (one for each level of signal diagnosticity).

      – M4: System-neglect model without beta parameters (only the alpha parameters): 3 total parameters, all are alpha parameters (one for each level of transition probability).

      We compared these four models with the original system-neglect model. In the figure below, we plot ∆AIC, where ∆AIC is the Akaike Information Criterion (AIC) of one of the new models minus the AIC of the original model. ∆AIC<0 indicates that the new model is better than the original model. By contrast, ∆AIC>0 suggests that the new model is worse than the original model.

      Author response image 3.
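
      The comparison above rests on the standard definition AIC = 2k − 2 ln L, where k is the number of free parameters and L the maximized likelihood. A minimal sketch follows; the function names and the log-likelihood values in the usage note are hypothetical and are not results from the study, and whether the noise variance is counted as an extra parameter is left aside.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def delta_aic(ll_new, k_new, ll_original, k_original):
    """AIC of a candidate model minus AIC of the original model.

    Negative values favor the candidate; positive values favor the original.
    """
    return aic(ll_new, k_new) - aic(ll_original, k_original)
```

      For example, with made-up log likelihoods, a 9-parameter variant such as M1 at ln L = −100 compared against the 6-parameter original at ln L = −110 gives ∆AIC = −14: the three extra parameters cost 6 AIC units, which the 20-unit improvement in fit outweighs.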

      When we separately estimated the beta parameters for change-consistent signals and change-inconsistent signals (M1), we found that its AIC was significantly smaller than that of the original model (p<0.01). The same was found for the model where we separately estimated the alpha parameters for change-consistent and change-inconsistent signals (M2). When we took out either the alpha (M3) or the beta parameters (M4), we found that these models were worse than the original model (p<0.01). In summary, we found that models where we separately estimated the alpha/beta parameters for change-consistent and change-inconsistent signals were better than the original model. This is consistent with the insight the neural data provided.

      To show these results, we will add a new figure (Figure 7) in the revised manuscript.

    1. Author response:

      We are grateful for the thorough and constructive feedback provided on our manuscript.

      Regarding the main concern about power law behavior and scale invariance, we would like to clarify that our study does not aim to establish criticality. Instead, we focus on describing and understanding a specific scale-invariant property: the collapsed eigenspectra in neural activity under random sampling. Indeed, we tested Morrell et al.’s latent-variable model (eLife 12, RP89337, 2024, [1]), where a slowly varying latent factor drives population activity. Although it produces a seemingly power-law-like spectrum, random sampling does not replicate the strict spectral collapse observed in our data (second row in Author response image 1). This highlights that simply adding latent factors does not fully recapitulate the scale invariance we measure, suggesting richer or more intricate processes may be involved in real neural recordings.

      Author response image 1.

      Morrell et al.’s latent variable model [1, 2]. A-D: Functionally sampled (FSap) eigenspectra of the Morrell et al. model. E-H: Randomly sampled (RSap) eigenspectra of the same model. Briefly, in Morrell et al.’s latent variable model [1, 2], neural activity is driven by N<sub>f</sub> latent fields and a place field. The latent fields are modeled as Ornstein-Uhlenbeck processes with a time constant τ. The parameters ϵ and η control the mean and variance of individual neurons’ firing rates, respectively. The following are the parameter values used. A,E: Using the same parameters as in [1]: N<sub>f</sub> = 10, ϵ = −2.67, η = 6, τ = 0.1. Half of the cells are also coupled to the place field. B,C,D,F,G,H: Using parameters from [2]: N<sub>f</sub> = 5, ϵ = −3, η = 4. There is no place field. The time constant τ = 0.1, 1, 10 for B,F, C,G, and D,H, respectively.

      We decided to make 5 key revisions.

      • As mentioned, we have evaluated the latent variable model proposed by Morrell et al. and found that it fails to reproduce the scale-invariant eigenspectra observed in our data; these results will be presented in the Discussion section and supported by a new Supplementary Figure.

      • We will include a discussion on the findings of Manley et al. (2024, [2]) regarding the issue of saturating dimensionality in the Discussion section, highlighting the methodological differences and their implications.

      • We will add a new mathematical derivation in the Methods section, elucidating the bounded dimensionality using the spectral properties of our model.

      • We will elaborate in the Discussion section to further emphasize the robustness of our findings by demonstrating their consistency across diverse datasets and experimental techniques.

      • We will incorporate in the Discussion section a brief discussion on the implications for neural coding. In particular, Fisher information can become unbounded when the slope of the power-law rank plot is less than one, as highlighted in the recent work by Moosavi et al. (bioRxiv 2024.08.23.608710, Aug, 2024 [3]).

      We believe these revisions will address the concerns raised by you and collectively strengthen our manuscript to provide a more comprehensive and robust understanding of the geometry and dimensionality of brain-wide activity.

      References

      (1) M. C. Morrell, A. J. Sederberg, I. Nemenman, Latent dynamical variables produce signatures of spatiotemporal criticality in large biological systems. Physical Review Letters 126, 118302 (2021).

      (2) M. C. Morrell, I. Nemenman, A. Sederberg, Neural criticality from effective latent variables. eLife 12, RP89337 (2024).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors have performed an antigenic assay for human seasonal N1 neuraminidase using antigens and mouse sera from 2009-2020 (with one avian N1 antigen). This shows two distinct antigen groups. There is poorer reactivity with sera from 2009-2012 against antigens from 2015-2019, and poorer reactivity with sera from 2015-2020 against antigens from 2009-2013. There is a long branch separating these two groups. However, 321 and 432 are the only two positions that are consistently different between the two groups. Therefore these are the most likely cause of these antigenic differences.

      Strengths:

      (1) A sensible rationale was given for the choice of sera, in terms of the genetic diversity.

      (2) There were two independent batches of one of the antigens used for generating sera, which demonstrated the level of heterogeneity in the experimental process.

      (3) Replicate of the Wisconsin/588/2019 antigen (as H1 and H6) is another useful measure of heterogeneity.

      (4) The presentation of the data, e.g. Figure 2, clearly shows two main antigenic groups.

      (5) The most modern sera are more recent than other related papers, which demonstrates that there has been no major antigenic change.

      Weaknesses:

      (1) Issues with experimental methods

      As I am not an experimentalist, I cannot comment fully on the experimental methods. However, I note that BALB/c mice sera were used, whereas outbred ferret sera are typically used in influenza antigenic characterisation, so the antigenic difference observed may not be relevant in humans. Similarly, the mice were immunised with an artificial NA immunogen where the typical approach would be to infect the ferret with live virus intra-nasally.

      Indeed, ferrets are the gold standard model for the study of influenza. The main reason for this is the susceptibility of ferrets to infection with primary human influenza virus isolates and their ability to transmit human influenza A and B viruses. Although mouse models often require the use of mouse-adapted influenza virus strains, the mouse is still the most widely used model for studying new developments in influenza vaccines.

      In our previous publication we performed a parallel analysis of sera of ferrets that were primed by infection and boosted by recombinant protein, as well as mice that, as in this study that focuses on N1 NA, were prime-boosted with purified recombinant NA proteins in the presence of an adjuvant. Our data indicate that the NAI responses in immune sera from infected ferrets after infection and after boost enable similar antigenic classification and correlate strongly with those induced in mice that had been prime-boosted with adjuvanted recombinant NA (Catani et al., eLife 2024). To a large extent, the immunogenicity of an antigen relies on epitope accessibility, which may dictate a universal rule of immunogenicity and antigenicity (Altman et al., 2015).

      (2) Five mice sera were generated per immunogen and then pooled, but data was not presented that demonstrated these sera were sufficiently homogenous that this approach is valid.

      Although individual sera were not tested here, based on previous studies from our group we are confident that a prime-boost schedule with 1 µg of adjuvanted soluble tetrameric NA induces a highly homogeneous response in mice (Catani et al., 2022).

      (3) There were no homologous antigens for most of the sera. This makes the responses difficult to interpret as the homologous titre is often used to assess the overall reactivity of a serum. The sequence of the antigens used is not described, which again makes it difficult to interpret the results.

      The absence of homologous antigens may indeed make interpretation more difficult. However, we have observed that homologous sera do not always coincide with the highest reactivity, although highest reactivity is always found within an antigenic cluster. A sequence comparison would be appropriate to improve interpretability of the data. Therefore, a sequence alignment and a pairwise comparison will be provided in the revised manuscript as supplement. 

      (4) To be able to untangle the effects of the individual substitutions at 321, 386, and 432, it would have been useful to have included the naturally occurring variants at these positions, or to have generated mutants at these positions. Gao et al clearly show an antigenic difference with ferret sera correlated separately with N386K and I321V/K432E.

      The prevalence of single amino acid substitutions in N1 NA of clinical H1N1 virus strains isolated between 2009 and 2024 is minimal, which may indicate reduced fitness (see Author response image 1) in strains with these substitutions in NA. Nevertheless, we agree that the rescue of single mutants would provide important evidence to untangle those individual impacts on antigenicity. We plan to generate mutants with substitution at these positions in NA of A/Wisconsin/588/2019 H1N1 and determine the NAI against our panel of sera.

      Author response image 1.

      Prevalence of the indicated N1 NA substitutions in all clinical human H1N1 isolates with unique sequences deposited in the GISAID data bank since 2009.

      (5) The challenge experiments in Gao et al showed that NI titre was not a good correlate of protection, so that limits the interpretation of these results.

      On the contrary, the challenge experiments confirmed that drift occurred in NA from H1N1 viruses isolated between 2009 (CA/09) and 2015 (MI/15). The dilution of transferred sera to equal inhibitory titers indicates that the homologous ferret sera (shown in Figure 5e-f of Gao et al., 2019) are still effective in protecting against infection, while heterologous sera are not. This result emphasises that the nature of the homologous NAI response is well-suited for protection against a homologous challenge, although mechanistic data were not provided.

      Issues with the computational methods

      (6) The NAI titres were normalised using the ELISA results, and the motivation for this is not explained. It would be nice to see the raw values.

      Mice were immunized with different batches of recombinant protein. Each of those batches may have distinct intrinsic immunogenicity, as observed in Figure 1d. For that reason, NAI values were normalized using homologous ELISA titers induced by each respective NA antigen. A table with the raw values will be included in the revised manuscript.

      (7) It is not clear what value the random forest analysis adds here, given that positions 321 and 432 are the only two that consistently differ between the two groups.

      The substitutions at positions 321 and 432 are indeed the only two consistently differing amino acids among the tested N1s. Although their correlation with antigenic clustering may be obvious after analysis, a random forest analysis can reveal less obvious substitutions that contribute to antigenic diversity. In the future, we intend to expand this methodology to strains that are not currently included in the panel. A random forest model is a relatively simple and well-performing method for dealing with a new dataset.

      (8) As with the previous N2 paper, the metric for antigenic distance (the root mean square of the difference between the titres for two sera) is not one that would be consistent when different sera are included. More usual metrics of distance are Archetti-Horsfall, fold down from homologous, or fold down from maximum.

      The antigenic distances calculated prior to our random forest analysis do use fold-difference as the metric, computed as log2(max(EC50) / EC50). After obtaining the fold-difference values, a pairwise dissimilarity matrix was calculated to obtain the average antigenic distance between pairs of sera. A more detailed description of the methodology, including the R code, will be included in the Methods section.
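The calculation described in this exchange can be sketched as follows. This is a minimal illustration and not the authors' R code: the titre values are made up, the maximum is assumed to be taken over each serum's own titres, and the pairwise distance is assumed to be the root mean square difference mentioned by the reviewer. All names (`titres`, `fold_down_from_max`, `rms_distance`) are hypothetical.

```python
from math import log2, sqrt

# Made-up EC50-like titres: one list per serum, one entry per antigen.
titres = {
    "serum_A": [1280.0, 640.0, 80.0],
    "serum_B": [640.0, 1280.0, 160.0],
    "serum_C": [40.0, 80.0, 1280.0],
}

def fold_down_from_max(values):
    """log2(max(EC50) / EC50) for each antigen, relative to the serum's own maximum (assumption)."""
    top = max(values)
    return [log2(top / v) for v in values]

def rms_distance(a, b):
    """Root mean square difference between two fold-difference profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Fold-difference profile per serum, then the pairwise dissimilarity matrix.
folds = {name: fold_down_from_max(v) for name, v in titres.items()}
names = sorted(folds)
distance = {(p, q): rms_distance(folds[p], folds[q]) for p in names for q in names}

for p in names:
    print(p, [round(distance[(p, q)], 2) for q in names])
```

Because each profile is expressed relative to its own maximum, adding or removing an antigen can change the maximum and therefore every fold value for that serum, which is the consistency concern raised in point (8).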

      (9) Antigenic cartography of these data is fraught. I wonder whether 2 dimensions are required for what seems like a 1-dimensional antigenic difference - certainly, the antigens, excluding the H5N1, are in a line. The map may be skewed by the high reactivity Brisbane/18 antigen. It is not clear if the column bases (normalisation factors for calculating antigenic distance) have been adjusted to account for the lack of homologous antigens. It is typical to present antigenic maps with a 1:1 x:y ratio.

      Antigenic cartography will be repeated excluding H5N1 and/or Brisbane/18 antigen. Data will be provided in the final rebuttal letter.

      Issues with interpretation

      (10) Figure 2 shows the NAI titres split into two groups for the antigens, however, A/Brisbane is an outlier in the second antigenic group with high reactivity.

      Indeed, A/Brisbane/02/2018 has overall higher IC50 values. However, it still falls into the same cluster that we called AG2. Highlighting A/Brisbane/02/2018 may lead to the misinterpretation of a non-existent antigenic group. 

      (11) Following Gao et al, I think you can claim that it is more likely that the antigenic change is due to K432E than I321V, based on a comparison of the amino acid change.

      Indeed, we would expect that substitution of the basic lysine with an acidic glutamate is more likely to impact antigenicity than the apolar isoleucine-to-valine substitution. Testing of mutant reassortants with single mutations may provide the definitive answer to that question.

      Appraisal:

      Taking into account the limitations of the experimental techniques (which I appreciate are due to resource constraints), this paper meets its aim of measuring the antigenic relationships between 2009-2020 seasonal N1s, showing that there were two main groups. The authors discovered that the difference between the two antigenic groups was likely attributable to positions 321 and 432, as these were the only two positions that were consistently different between the two groups. They came to this finding by using a random forest model, but other simpler methods could have been used.

      Impact:

      This paper contributes to the growing literature on the potential benefit of NA in the influenza vaccine.

      Reviewer #2 (Public review):

      Summary:

      In this study, Catani et al. have immunized mice with 17 recombinant N1 neuraminidases (NAs) from human isolates circulating between 2009-2020 to investigate antigenic diversity. NA inhibition (NAI) titers revealed two groups that were antigenically and phylogenetically distinct. Machine learning was used to estimate the antigenic distances between the N1 NAs and mutations at residues K432E and I321V were identified as key determinants of N1 NA antigenicity.

      Strengths:

      Observation of mutations associated with N1 antigenic drift.

      Weaknesses:

      Validation that K432E and I321V are responsible for antigenic drift was not determined in a background strain with native K432 and I321 or the restitution of antibody binding by reversion to K432 and I321 in strains that evaded sera.

      Reassortant A/Wisconsin/588/2019 with E432K, V321I and also K386N single mutations will be rescued and tested against the panel of sera.

    1. Author Response

      We would like to thank the Editors and Reviewers for their comprehensive review of the manuscript. We appreciate your feedback, and we will carefully consider all your comments in the revision of the manuscript. Below are our provisional responses to your comments.

      eLife assessment

      This manuscript reveals important insights into the role of ipsilateral descending pathways in locomotion, especially following unilateral spinal cord injury. The study provides solid evidence that this method improves the injured side's ability to support weight, and as such the findings may lead to new treatments for stroke, spinal cord injuries, or unilateral cerebral injuries. However, the methods and results need to be better detailed, and some of the statistical analysis enhanced.

      Thank you for your assessment. We will incorporate various textual enhancements in the final version of the manuscript to address the weaknesses you have pointed out. The specific improvements are outlined below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript provides potentially important new information about ipsilateral cortical impact on locomotion. A number of issues need to be addressed.

      Strengths:

      The primary appeal and contribution of this manuscript are that it provides a range of different measures of ipsilateral cortical impact on locomotion in the setting of impaired contralateral control. While the pathways and mechanisms underlying these various measures are not fully defined and their functional impacts remain uncertain, they comprise a rich body of results that can inform and guide future efforts to understand cortical control of locomotion and to develop more effective rehabilitation protocols.

      Weaknesses:

      1. The authors state that they used a cortical stimulation location that produced the largest ankle flexion response (lines 102-104). Did other stimulation locations always produce similar, but smaller responses (aside from the two rats that showed ipsilateral neuromodulation)? Was there any site-specific difference in response to stimulation location?

      We derived motor maps in each rat, akin to the representation depicted in Fig 6. In each rat, alternative cortical sites did, indeed, produce distal or proximal contralateral leg flexion responses. Distal responses were more likely to be evoked in the rostral portion of the array, similarly to proximal responses early after injury. This distribution in responses across different cortical sites is reported in this study (Fig. 6) and is consistent with our prior work. The Results section will be revised to provide additional clarification and context for the data presented in Figure 6.

      1. Figure 2: There does not appear to be a strong relationship between the percentage of spared tissue and the ladder score. For example, the animal with the mild injury (based on its ladder score) in the lower left corner of Figure 2A has less than 50% spared tissue, which is less spared tissue than in any animal other than the two severe injuries with the most tissue loss. Is it possible that the ladder test does not capture the deficits produced by this spinal cord injury? Have the authors looked for a region of the spinal cord that correlates better with the deficits that the ladder test produces? The extent of damage to the region at the base of the dorsal column containing the corticospinal tract would be an appropriate target area to quantify and compare with functional measures.

      In Fig. S6 of our 2021 publication "Bonizzato and Martinez, Science Translational Medicine", we investigated the predictive value of tissue sparing in specific sub-regions of the spinal cord for ladder performance. Specifically, we examined the correlation between the accuracy of left leg ladder performance in the acute state and the preservation of the corticospinal tract (CST). Our results indicated that dorsal CST sparing serves as a mild predictor for ladder deficits, confirming the results obtained in this study.

      1. Lines 219-221: The authors state that "phase-coherent stimulation reinstated the function of this muscle, leading to increased burst duration (90 ± 18% of the deficit, p=0.004, t-test, Fig. 4B) and total activation (56 ± 13% of the deficit, p=0.014, t-test, Fig. 3B)." This way of expressing the data is unclear. For example, the previous sentence states that after SCI, burst duration decreased by 72%. Does this mean that the burst duration after stimulation was 90% higher than the -72% level seen with SCI alone, i.e., 90% + -72% = +18%? Or does it mean that the stimulation recovered 90% of the portion of the burst duration that had been lost after SCI, i.e., -72% * (100%-90%)= -7%? The data in Figure 4 suggests the latter. It would be clearer to express both these SCI alone and SCI plus stimulation results in the text as a percent of the pre-SCI results, as done in Figure 4.

      Your assessment is correct; we intended to report that the stimulation recovered 90% of the portion of the burst duration that had been lost after SCI. This point will be addressed in the revision of the manuscript.
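The interpretation confirmed here can be written out as a short worked calculation. This is only a sketch of the arithmetic using the two figures quoted in the comment (a 72% loss and 90% recovery of that loss); the function name and the convention of expressing everything as a percentage of the pre-SCI value are assumptions made for illustration.

```python
def percent_of_baseline(deficit_pct, recovered_fraction_of_deficit):
    """Return (post-injury level, level with stimulation), both as % of the pre-SCI value."""
    post_injury = 100.0 - deficit_pct
    # Stimulation restores a fraction of what was lost, not a fraction of baseline.
    with_stimulation = post_injury + recovered_fraction_of_deficit * deficit_pct
    return post_injury, with_stimulation

post_injury, with_stimulation = percent_of_baseline(72.0, 0.90)
print(post_injury)                           # burst duration after SCI, % of pre-SCI
print(round(with_stimulation, 1))            # burst duration with stimulation, % of pre-SCI
print(round(with_stimulation - 100.0, 1))    # remaining change relative to pre-SCI
```

Under this reading, burst duration falls to 28% of the pre-SCI value after injury and returns to roughly 93% with stimulation, which matches the reviewer's second calculation of about -7%.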

      1. Lines 227-229: The authors claim that the phase-dependent stimulation effects in SCI rats are immediate, but they don't say how long it takes for these effects to be expressed. Are these effects evident in the response to the first stimulus train, or does it take seconds or minutes for the effects to be expressed? After the initial expression of these effects, are there any gradual changes in the responses over time, e.g., habituation or potentiation?

      The effects are immediately expressed at the very first occurrence of stimulation. We never tested a rat completely naïve to stimuli, as each treadmill session involves prior cortical mapping to identify a suitable active site for involvement in locomotor experiments. Yet, as demonstrated in Supplementary Video 1 accompanying our 2021 publication on contralateral effects of cortical stimulation, "Bonizzato and Martinez, Science Translational Medicine," the impact of phase-dependent cortical stimulation on movement modulation is instantaneous and ceases promptly upon discontinuation of the stimulation. We did not quantify potential gradual changes in responsiveness over time, but we cannot exclude that for long stimulation sessions (e.g., 30 min or more), stimulus amplitude may need to be slightly increased over time to compensate for habituation.

      1. Awake motor maps (lines 250-277): The analysis of the motor maps appears to be based on measurements of the percentage of channels in which a response can be detected. This analytic approach seems incomplete in that it only assesses the spatial aspect of the cortical drive to the musculature. One channel could have a just-above-threshold response, while another could have a large response; in either case, the two channels would be treated as the same positive result. An additional analysis that takes response intensity into account would add further insight into the data, and might even correlate with the measures of functional recovery. Also, a single stimulation intensity was used; the results may have been different at different stimulus intensities.

      We confirm that maps of cortical stimulation responsiveness may vary at different stimulus amplitudes. To establish an objective metric of excitability, we identified 100µA as a reliable stimulation amplitude across rats and used this value to build the ipsilateral motor representation results in Figure 6. This choice allows direct comparison with Figure 6 of our 2021 article, related to contralateral motor representation. The comparison reveals a lack of correlation with functional recovery metrics in the ipsilateral case, in contrast to the successful correlation achieved in the contralateral case.

      Regarding the incorporation of stimulation amplitudes into the analysis, as detailed in the Methods section (lines 770-771), we systematically tested various stimulation amplitudes to determine the minimal threshold required for eliciting a muscle twitch, identified as the threshold value. This process was conducted for each electrode site. Upon reviewing these data, we considered the possibility of presenting an additional assessment of ipsilateral cortical motor representation based on stimulation thresholds. However, the representation depicted in the figure did not differ significantly from the data presented in Figure 6A. Furthermore, this representation introduced an additional weakness, as it was unclear how to represent the absence of a response in the threshold scale. We chose to arbitrarily designate it as zero on the inverse logarithmic scale, where, for reference, 100 µA is positioned at 0.2 and 50 µA at 0.5.

      In conclusion, we believe that the conclusions drawn from this analysis align substantially with those in the text. The addition of the threshold analysis, in our assessment, would not contribute significantly to improving the manuscript.
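The exact formula behind the inverse logarithmic scale is not stated in the response. As a hedged illustration only, one mapping that reproduces the two quoted reference points (100 µA at 0.2 and 50 µA at about 0.5) is a base-10 logarithm subtracted from a constant offset of 2.2; both the offset and the function name below are assumptions, not the authors' method.

```python
from math import log10

# Assumed offset, chosen so that 100 µA maps to 0.2 and 50 µA to roughly 0.5.
OFFSET = 2.2

def threshold_score(threshold_uA):
    """Map a twitch threshold in µA to an inverse-log score.

    None stands for "no response at any tested amplitude" and is set to zero,
    following the arbitrary convention described in the response.
    """
    if threshold_uA is None:
        return 0.0
    return OFFSET - log10(threshold_uA)

for amplitude in (25, 50, 100, None):
    print(amplitude, round(threshold_score(amplitude), 2))
```

Under this assumed mapping, lower thresholds (more excitable sites) receive higher scores, and the zero assigned to unresponsive sites coincides with the score of a hypothetical threshold near 158 µA, which illustrates why the placement of "no response" on such a scale is somewhat arbitrary.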

      Author response image 1.

      Threshold analysis

      Author response image 2.

      Original occurrence probability analysis, for comparison.

      1. Lines 858-860: The authors state that "All tests were one-sided because all hypotheses were strictly defined in the direction of motor improvement." By using the one-sided test, the authors are using a lower standard for assessing statistical significance than the overwhelming majority of studies in this field use. More importantly, ipsilateral stimulation of particular kinds or particular sites might conceivably impair function, and that is ignored if the analysis is confined to detecting improvement. Thus, a two-sided analysis or comparable method should be used. This appropriate change would not greatly modify the authors' current conclusions about improvements.

      Our original hypothesis, drawn from previous studies involving cortical stimulation in rats and cats, as well as other neurostimulation research for movement restoration, posited a favorable impact of neurostimulation on movement. Consistent with this hypothesis, we designed our experiments with a focus on enhancing movement, emphasizing a strict direction of improvement.

      It's important to note that a one-sided test is the appropriate match for a one-sided hypothesis, and it is not a lower standard in statistics. Each experiment we conducted was constructed around a strictly one-sided hypothesis: the inclusion of an extensor-inducing stimulus would enhance extension, and the inclusion of a flexion-inducing stimulus would enhance flexion. This rationale guided our choice of the appropriate statistical test.

      We acknowledge your concern regarding the potential for ipsilateral stimulation to have negative effects on locomotion, which might not be captured when designing experiments based on one-sided hypotheses. This concern is valid, and we will explicitly mention it in the statistics section. Nonetheless, even if such observations were made, they could serve as the basis for triggering an ad-hoc follow-up study.
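The practical difference between the two positions can be illustrated numerically. The sketch below uses made-up measurements and an exact permutation test on the difference in means, swapped in for the t-tests used in the manuscript so that the example needs only the Python standard library; the function and variable names are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_values(control, treated):
    """Exact permutation test on mean(treated) - mean(control).

    Returns (one_sided_p, two_sided_p). The one-sided p-value counts only
    relabellings with a difference at least as large in the hypothesized
    (positive) direction; the two-sided one counts both tails.
    """
    pooled = list(control) + list(treated)
    n, n_treated = len(pooled), len(treated)
    total = sum(pooled)
    observed = mean(treated) - mean(control)
    greater = extreme = count = 0
    for idx in combinations(range(n), n_treated):
        t_sum = sum(pooled[i] for i in idx)
        diff = t_sum / n_treated - (total - t_sum) / (n - n_treated)
        count += 1
        if diff >= observed - 1e-12:
            greater += 1
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return greater / count, extreme / count

# Made-up values standing in for a gait metric without and with stimulation.
control = [10.0, 12.0, 11.0, 9.0]
treated = [14.0, 15.0, 13.0, 16.0]
one_sided_p, two_sided_p = permutation_p_values(control, treated)
print(one_sided_p, two_sided_p)
```

With these made-up values the one-sided p-value is 1/70 and the two-sided p-value is 2/70. When the effect lies in the hypothesized direction, the one-sided value is half the two-sided one for a symmetric null distribution, so a result can pass a fixed threshold one-sided yet fail it two-sided; an effect in the opposite direction would yield a one-sided p-value near 1 and go undetected as an impairment.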

      Reviewer #2 (Public Review):

      Summary:

      The authors' long-term goals are to understand the utility of precisely phased cortex stimulation regimes on recovery of function after spinal cord injury (SCI). In prior work, the authors explored the effects of contralesion cortex stimulation. Here, they explore ipsilesion cortex stimulation in which the corticospinal fibers that cross at the pyramidal decussation are spared. The authors explore the effects of such stimulation in intact rats and rats with a hemisection lesion at the thoracic level ipsilateral to the stimulated cortex. The appropriately phased microstimulation enhances contralateral flexion and ipsilateral extension, presumably through lumbar spinal cord crossed-extension interneuron systems. This microstimulation improves weight bearing in the ipsilesion hindlimb soon after injury, before any normal recovery of function would be seen. The contralateral homologous cortex can be lesioned in intact rats without impacting the microstimulation effect on flexion and extension during gait. In two rats ipsilateral flexion responses are noted, but these are not clearly demonstrated to be independent of the contralateral homologous cortex remaining intact.

      Strengths:

      This paper adds to prior data on cortical microstimulation by the laboratory in interesting ways. First, the strong effects of the spared crossed fibers from the ipsi-lesional cortex in parts of the ipsi-lesion leg's step cycle and weight support function are solidly demonstrated. This raises the interesting possibility that stimulating the contra-lesion cortex as reported previously may execute some of its effects through callosal coordination with the ipsi-lesion cortex tested here. This is not fully discussed by the authors but may represent a significant aspect of these data. The authors demonstrate solidly that ablation of the contra-lesional cortex does not impede the effects reported here. I believe this has not been shown for the contra-lesional cortex microstimulation effects reported earlier, but I may be wrong. Effects and neuroprosthetic control of these effects are explored well in the ipsi-lesion cortex tests here.

      In the revised version of the manuscript, we will incorporate various text improvements to address the points you have highlighted below. Additionally, we will integrate the suggested discussion topic on callosal coordination related to contralateral cortical stimulation.

      Weaknesses:

      Some data is based on very few rats. For example (N=2) for ipsilateral flexion effects of microstimulation. N=3 for homologous cortex ablation, and only ipsi extension is tested it seems. There is no explicit demonstration that the ipsilateral flexion effects in only 2 rats reported can survive the contra-lateral cortex ablation.

      We agree with this assessment. The ipsilateral flexion representation is reported here as a rare but consistent phenomenon, which we believe we have robustly described with the Figure 7 experiments. We will underline in the text that the ablation experiment was not conclusive regarding the unilateral-cortical nature of the ipsilateral flexion effects.

      Some improvements in clarity and precision of descriptions are needed, as well as fuller definitions of terms and algorithms.

      Likely Impacts: This data adds in significant ways to prior work by the authors, and an understanding of how phased stimulation in cortical neuroprosthetics may aid in recovery of function after SCI, especially if a few ambiguities in writing and interpretation are fully resolved.

      The manuscript text will be revised in its final version, and we seek to eliminate any ambiguity in writing, data interpretation and algorithms.

      Reviewer #3 (Public Review):

      Summary:

      This article aims to investigate the impact of neuroprosthesis (intracortical microstimulation) implanted unilaterally on the lesion side in the context of locomotor recovery following unilateral thoracic spinal cord injury.

      Strength:

      The study reveals that stimulating the left motor cortex, on the same side as the lesion, not only activates the expected right (contralateral) muscle activity but also influences unexpected muscle activity on the left (ipsilateral) side. These muscle activities resulted in a substantial enhancement in lift during the swing phase of the contralateral limb and improved trunk-limb support for the ipsilateral limb. They used different experimental and stimulation conditions to show the ipsilateral limb control evoked by the stimulation. This outcome holds significance, shedding light on the engagement of the "contralateral projecting" corticospinal tract in activating not only the contralateral but also the ipsilateral spinal network.

      The experimental design and findings align with the investigation of the stimulation effect of contralateral projecting corticospinal tracts. They carefully examined the recovery of ipsilateral limb control with motor maps. They also tested the effective sites of cortical stimulation. The study successfully demonstrates the impact of electrical stimulation on the contralateral projecting neurons on ipsilateral limb control during locomotion, as well as identifying important stimulation spots for such an effect. These results contribute to our understanding of how these neurons influence bilateral spinal circuitry. The study's findings contribute valuable insights to the broader neuroscience and rehabilitation communities.

      Thank you for your assessment of this manuscript. The final version of the manuscript will incorporate your suggestions for improving term clarity and will also enhance the discussion on the mechanism of spinal network engagement, as outlined below.

      Weakness:

      The term "ipsilateral" lacks a clear definition in the title, abstract, introduction, and discussion, potentially causing confusion for the reader.

      In the next revision of the manuscript, we will provide a clear definition of the term "ipsilateral."

      The unexpected ipsilateral (left) muscle activity is most likely due to the left corticospinal neurons recruiting not only the right spinal network but also the left spinal network. This is probably due to the joint efforts of the neuroprosthesis and activation of spinal motor networks which work bilaterally at the spinal level. However, in my opinion, readers can easily link the ipsilateral cortical network to the ipsilateral-projecting corticospinal tract, which is less likely to play a role in ipsilateral limb control in this study since this tract is disrupted by the thoracic spinal injury.

      We agree with your assessment. The discussion section paragraph presenting putative mechanisms of cortico-spinal transmission in the effects presented in the results will be enhanced to reflect these suggestions.

    1. Author response:

      Reviewer #1:

      (1) After Figure 1, a single saturated (palmitic acid; PA) and a single unsaturated (linoleic acid; LA) fatty acid are used for the remaining studies, bringing into question whether effects are in fact the result of a difference in saturation vs. other potential differences.

      PA, SA, OA and LA are the most common FA species in humans (Figure 1A in manuscript). Among them, PA is the predominant saturated FA and LA the main unsaturated FA. Of note, although both SA and OA were included in our studies, their effects were comparable to those of PA and LA, respectively. Due to space constraints, the data for SA and OA are not presented in the figures.

      (2) While primary macrophages are used in several mechanistic studies, tumor-associated macrophages (TAMs) are not used. Rather, correlative evidence is provided to connect mechanistic studies in macrophage cell lines and primary macrophages to TAMs.

      The role of FABP4 in TAMs has been demonstrated in our previous studies using in vivo animal models1. Therefore, we did not include TAM-specific data in the current study.

      (3) CEBPA and FABP4 clearly regulate LA-induced changes in gene expression. However, whether these two key proteins act in parallel or as a pathway is not resolved by presented data.

      Multiple lines of evidence in our studies suggest that FABP4 and CEBPA act as a pathway in LA-induced changes: 1) FABP4-negative macrophages exhibit reduced expression of CEBPA in single cell sequencing data; 2) FABP4 KO macrophages exhibited reduced CEBPA expression; 3) LA-induced CEBPA expression in macrophages was compromised when FABP4 was absent.

      (4) It is very interesting that FABP4 regulates both lipid droplet formation and lipolysis, yet is unclear if the regulation of lipolysis is direct or if the accumulation of lipid droplets - likely plus some other signal(s) - induces upregulation of lipolysis genes.

      Yes, it is likely that tumor cells induce lipolysis signals. Multiple studies have shown that various tumor types stimulate lipolysis to support their growth and progression2-4.  In this process, lipid-loaded macrophages have emerged as a promising therapeutic target in cancer5, 6. Consistent with findings that lipolysis is essential for tumor-promoting M2 alternative macrophage activation7, our data using FABP4 WT and KO macrophages demonstrate that FABP4 plays a critical role in LA-induced lipid accumulation and lipolysis for tumor metastasis. 

      (5) In several places increased expression of genes coding for enzymes with known functions in lipid biology is conflated with an increase in the lipid biology process the enzymes mediate. Additional evidence would be needed to show these processes are in fact increased in a manner dependent on increased enzyme expression.

      We fully agree with the reviewer that increased gene expression does not necessarily equate to increased activity. The key finding of this study is that FABP4 plays a pivotal role in linoleic acid (LA)-mediated lipid accumulation and lipolysis in macrophages that promote tumor metastasis. Numerous lipid metabolism-related genes, including FABP4, CEBPA, GPATs, DGATs, and HSL, are involved in this process. While it was not feasible to verify the activity of all these genes, we confirmed the functional roles of key genes like FABP4 and CEBPA through various functional assays, such as gene silencing, knockout cell lines, lipid droplet formation, and tumor migration assays. Supported by established lipid metabolism pathways, our data provide compelling evidence that FABP4 functions as a crucial lipid messenger, facilitating unsaturated fatty acid-driven lipid accumulation and lipolysis in tumor-associated macrophages (TAMs), thus promoting breast cancer metastasis.   

      Reviewer #2:

      Overall, there is solid evidence for the importance of FABP4 expression in TAMs on metastatic breast cancer as well as lipid accumulation by LA in the ER of macrophages. A stronger rationale for the exclusive contribution of unsaturated fatty acids to the utilization of TAMs in breast cancer and a more detailed description and statistical analysis of data will strengthen the findings and resulting claims.

      We greatly appreciate the positive comments from Reviewer #2. In our study, we evaluated the effects of both saturated and unsaturated fatty acids (FAs) on lipid metabolism in macrophages. Our results showed that unsaturated FAs exhibited a preference for lipid accumulation in macrophages compared to saturated FAs. Further analysis revealed that unsaturated LA, but not saturated PA, induced FABP4 nuclear translocation and CEBPA activation, driving the TAG synthesis pathway. For in vitro experiments, statistical analyses were performed in GraphPad Prism 9 using a two-tailed, unpaired Student’s t-test or two-way ANOVA followed by Bonferroni’s multiple comparison test. For experiments analyzing associations of FABP4, TAMs and other factors in breast cancer patients, the Kruskal-Wallis test was applied to compare differences across levels of the categorical predictor variable. Additionally, multiple linear regression models were used to examine the association between the predictor variables and outcomes, with log transformation and Box-Cox transformation applied to meet the normality assumptions of the model. It is worth noting that in some experiments, significant differences were observed only in groups treated with unsaturated fatty acids. Non-significant results from groups treated with saturated fatty acids were not included in the figures.

      Reviewer #3:

      (1) While the authors speculate that UFA-activated FABP4 translocates to the nucleus to activate PPARgamma, which is known to induce C/EBPalpha expression, they do not directly test involvement of PPARgamma in this axis.

      Yes, LA induced FABP4 nuclear translocation and activation of PPARgamma in macrophages (see Figure below). Since these findings have been reported in multiple other studies 8, 9, we did not include the data in the current manuscript.

      Author response image 1.

      LA induced PPARg expression in macrophages. Bone-marrow derived macrophages were treated with 400μM saturated FA (SFA), unsaturated FA (UFA) or BSA control for 6 hours. PPARg expression was measured by qPCR (***p<0.001).

      (2) While there is clear in vitro evidence that co-cultured murine macrophages genetically deficient in FABP4 (or their conditioned media) do not enhance breast cancer cell motility and invasion, these macrophages are not bonafide TAM - which may have different biology. Use of actual TAM in these experiments would be more compelling. Perhaps more importantly, there is no in vivo data in tumor bearing mice that macrophage-deficiency of FABP4 affects tumor growth or metastasis.

      In our previous studies, we have shown that macrophage-deficiency of FABP4 reduced tumor growth and metastasis in vivo in mouse models1.

      (3) Related to this, the authors find FABP4 in the media and propose that macrophage secreted FABP4 is mediating the tumor migration - but don't do antibody neutralizing experiments to directly demonstrate this.

      Yes, we have recently published a paper on the development of an anti-FABP4 antibody for the treatment of breast cancer in mouse models10.

      (4) No data is presented that the mechanisms/biology that are elegantly demonstrated in the murine macrophages also occurs in human macrophages - which would be foundational to translating these findings into human breast cancer.

      Thanks for the excellent suggestions. Since this manuscript primarily focuses on mechanistic studies using mouse models, we plan to apply these findings in our future human studies. 

      (5) While the data from the human breast cancer specimens is very intriguing, it is difficult to ascertain how accurate IHC is in determining that the CD163+ cells (TAM) are in fact the same cells expressing FABP4 - which is central premise of these studies. Demonstration that IHC has the resolution to do this would be important. Additionally, the in vitro characterization of FABP4 expression in human macrophages would also add strength to these findings.

      The expression of FABP4 in CD163+ TAM observed through IHC is consistent with our previous findings, where we confirmed FABP4 expression in CD163+ TAMs using confocal microscopy. Emerging evidence further supports the pro-tumor role of FABP4 expression in human macrophages across various types of obesity-associated cancers11-13. 

      References

      (1) Hao J, Yan F, Zhang Y, Triplett A, Zhang Y, Schultz DA, Sun Y, Zeng J, Silverstein KAT, Zheng Q, Bernlohr DA, Cleary MP, Egilmez NK, Sauter E, Liu S, Suttles J, Li B. Expression of Adipocyte/Macrophage Fatty Acid-Binding Protein in Tumor-Associated Macrophages Promotes Breast Cancer Progression. Cancer Res. 2018;78(9):2343-55. Epub 2018/02/14. doi: 10.1158/0008-5472.CAN-17-2465. PubMed PMID: 29437708; PMCID: PMC5932212.

      (2) Nieman KM, Kenny HA, Penicka CV, Ladanyi A, Buell-Gutbrod R, Zillhardt MR, Romero IL, Carey MS, Mills GB, Hotamisligil GS, Yamada SD, Peter ME, Gwin K, Lengyel E. Adipocytes promote ovarian cancer metastasis and provide energy for rapid tumor growth. Nat Med. 2011;17(11):1498-503. Epub 20111030. doi: 10.1038/nm.2492. PubMed PMID: 22037646; PMCID: PMC4157349.

      (3) Wang YY, Attane C, Milhas D, Dirat B, Dauvillier S, Guerard A, Gilhodes J, Lazar I, Alet N, Laurent V, Le Gonidec S, Biard D, Herve C, Bost F, Ren GS, Bono F, Escourrou G, Prentki M, Nieto L, Valet P, Muller C. Mammary adipocytes stimulate breast cancer invasion through metabolic remodeling of tumor cells. JCI Insight. 2017;2(4):e87489. Epub 20170223. doi: 10.1172/jci.insight.87489. PubMed PMID: 28239646; PMCID: PMC5313068.

      (4) Balaban S, Shearer RF, Lee LS, van Geldermalsen M, Schreuder M, Shtein HC, Cairns R, Thomas KC, Fazakerley DJ, Grewal T, Holst J, Saunders DN, Hoy AJ. Adipocyte lipolysis links obesity to breast cancer growth: adipocyte-derived fatty acids drive breast cancer cell proliferation and migration. Cancer Metab. 2017;5:1. Epub 20170113. doi: 10.1186/s40170-016-0163-7. PubMed PMID: 28101337; PMCID: PMC5237166.

      (5) Masetti M, Carriero R, Portale F, Marelli G, Morina N, Pandini M, Iovino M, Partini B, Erreni M, Ponzetta A, Magrini E, Colombo P, Elefante G, Colombo FS, den Haan JMM, Peano C, Cibella J, Termanini A, Kunderfranco P, Brummelman J, Chung MWH, Lazzeri M, Hurle R, Casale P, Lugli E, DePinho RA, Mukhopadhyay S, Gordon S, Di Mitri D. Lipid-loaded tumor-associated macrophages sustain tumor growth and invasiveness in prostate cancer. J Exp Med. 2022;219(2). Epub 20211217. doi: 10.1084/jem.20210564. PubMed PMID: 34919143; PMCID: PMC8932635.

      (6) Marelli G, Morina N, Portale F, Pandini M, Iovino M, Di Conza G, Ho PC, Di Mitri D. Lipid-loaded macrophages as new therapeutic target in cancer. J Immunother Cancer. 2022;10(7). doi: 10.1136/jitc-2022-004584. PubMed PMID: 35798535; PMCID: PMC9263925.

      (7) Huang SC, Everts B, Ivanova Y, O'Sullivan D, Nascimento M, Smith AM, Beatty W, Love-Gregory L, Lam WY, O'Neill CM, Yan C, Du H, Abumrad NA, Urban JF, Jr., Artyomov MN, Pearce EL, Pearce EJ. Cell-intrinsic lysosomal lipolysis is essential for alternative activation of macrophages. Nat Immunol. 2014;15(9):846-55. Epub 2014/08/05. doi: 10.1038/ni.2956. PubMed PMID: 25086775; PMCID: PMC4139419.

      (8) Gillilan RE, Ayers SD, Noy N. Structural basis for activation of fatty acid-binding protein 4. J Mol Biol. 2007;372(5):1246-60. Epub 2007/09/01. doi: 10.1016/j.jmb.2007.07.040. PubMed PMID: 17761196; PMCID: PMC2032018.

      (9) Bassaganya-Riera J, Reynolds K, Martino-Catt S, Cui Y, Hennighausen L, Gonzalez F, Rohrer J, Benninghoff AU, Hontecillas R. Activation of PPAR gamma and delta by conjugated linoleic acid mediates protection from experimental inflammatory bowel disease. Gastroenterology. 2004;127(3):777-91. doi: 10.1053/j.gastro.2004.06.049. PubMed PMID: 15362034.

      (10) Hao J, Jin R, Yi Y, Jiang X, Yu J, Xu Z, Schnicker NJ, Chimenti MS, Sugg SL, Li B. Development of a humanized anti-FABP4 monoclonal antibody for potential treatment of breast cancer. Breast Cancer Res. 2024;26(1):119. Epub 20240725. doi: 10.1186/s13058-024-01873-y. PubMed PMID: 39054536; PMCID: PMC11270797.

      (11) Liu S, Wu D, Fan Z, Yang J, Li Y, Meng Y, Gao C, Zhan H. FABP4 in obesity-associated carcinogenesis: Novel insights into mechanisms and therapeutic implications. Front Mol Biosci. 2022;9:973955. Epub 20220819. doi: 10.3389/fmolb.2022.973955. PubMed PMID: 36060264; PMCID: PMC9438896.

      (12) Miao L, Zhuo Z, Tang J, Huang X, Liu J, Wang HY, Xia H, He J. FABP4 deactivates NF-kappaB-IL1alpha pathway by ubiquitinating ATPB in tumor-associated macrophages and promotes neuroblastoma progression. Clin Transl Med. 2021;11(4):e395. doi: 10.1002/ctm2.395. PubMed PMID: 33931964; PMCID: PMC8087928.

      (13) Yang J, Liu S, Li Y, Fan Z, Meng Y, Zhou B, Zhang G, Zhan H. FABP4 in macrophages facilitates obesity-associated pancreatic cancer progression via the NLRP3/IL-1beta axis. Cancer Lett. 2023;575:216403. Epub 20230921. doi: 10.1016/j.canlet.2023.216403. PubMed PMID: 37741433.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study sought to reveal the potential roles of m6A RNA methylation in gene dosage regulatory mechanisms, particularly in the context of aneuploid genomes in Drosophila. Specifically, this work looked at the relationships between the expression of m6A regulatory factors, RNA methylation status, classical and inverse dosage effects, and dosage compensation. Using RNA sequencing and m6A mapping experiments, an in-depth analysis was performed to reveal changes in m6A status and expression changes across multiple aneuploid Drosophila models. The authors propose that m6A methylation regulates MOF and, in turn, deposition of H4K16Ac, critical regulators of gene dosage in the context of genomic imbalance.

      Strengths:

      This study seeks to address an interesting question with respect to gene dosage regulation and the possible roles of m6A in that process. Previous work has linked m6A to X-inactivation in humans through the Xist lncRNA, and to the regulation of the Sxl in flies. This study seeks to broaden that understanding beyond these specific contexts to more broadly understand how m6A impacts imbalanced genomes in other contexts.

      Weaknesses:

      The methods being used particularly for analysis of m6A at both the bulk and transcript-specific level are not sufficiently specific or quantitative to be able to confidently draw the conclusions the authors seek to make. MeRIP m6A mapping experiments can be very valuable, but differential methylation is difficult to assess when changes are small (as they often are, in this study but also m6A studies more broadly). For instance, based on the data presented and the methods described, it is not clear that the statement that "expression levels at m6A sites in aneuploidies are significantly higher than that in wildtype" is supported. MeRIP experiments are not quantitative, and since there are far fewer peaks in aneuploidies, it stands to reason that more antibody binding sites may be available to enrich those fewer peaks to a larger extent. But based on the data as presented (figure 2D) this conclusion was drawn from RPKM in IP samples, which may not fully account for changing transcript abundances in absolute (expression level changes) and relative (proportion of transcripts in input RNA sample) terms.

      Methylated RNA immunoprecipitation followed by sequencing (MeRIP-seq) is a commonly used strategy for genome-wide mapping of the m6A modification. This method uses an anti-m6A antibody to immunoprecipitate RNA fragments, which results in selective enrichment of methylated RNA. The RNA fragments are then subjected to deep sequencing, and the regions enriched in the immunoprecipitate relative to input samples are identified as m6A peaks using a peak-calling algorithm. We identified m6A peaks in different samples with the exomePeak2 program and determined common m6A peaks for each genotype based on the intersection of biological replicates. Figure 2D shows the RPM values of m6A peaks in MeRIP samples for each genotype, indicating that read levels in the m6A peak regions were significantly higher in the aneuploid IP samples than in wildtype. When the enrichment of IP samples relative to Input samples (RPM.IP/RPM.Input) was taken into account, the values for all three aneuploidies were still significantly higher than those of wildtype (Mann-Whitney U test, p-values < 0.001). This analysis does not concern changes in transcript abundance; rather, from the MeRIP perspective, it shows that relatively more m6A-modified reads map to the m6A peaks in aneuploidies than in wildtype. In addition, we have added the IP/Input results to the main text and revised the description in the manuscript to make it more precise and to reduce possible misunderstandings.
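
      The RPM and IP/Input quantities mentioned above can be illustrated with a minimal sketch. This is not the exomePeak2 computation and not the authors' pipeline; the function names and the read counts are hypothetical and serve only to show how a reads-per-million value and an IP/Input enrichment ratio could be derived for a single peak.

```python
def rpm(reads_in_peak: int, total_mapped_reads: int) -> float:
    """Reads per million mapped reads for one peak region."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    # Multiply before dividing to keep the arithmetic exact for round numbers.
    return reads_in_peak * 1_000_000 / total_mapped_reads


def ip_over_input(ip_reads: int, ip_total: int,
                  input_reads: int, input_total: int) -> float:
    """Enrichment of a peak in the IP library relative to the Input library."""
    input_rpm = rpm(input_reads, input_total)
    if input_rpm == 0:
        raise ValueError("no Input coverage in this peak; ratio is undefined")
    return rpm(ip_reads, ip_total) / input_rpm


# Hypothetical counts for a single peak (illustration only).
example_ip_rpm = rpm(200, 20_000_000)
example_input_rpm = rpm(50, 10_000_000)
example_enrichment = ip_over_input(200, 20_000_000, 50, 10_000_000)
```

      Normalizing by library size before taking the ratio means that a difference in sequencing depth between IP and Input libraries does not by itself change the enrichment value.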

      The bulk-level m6A measurements as performed here also cannot effectively support these conclusions, as they are measured in total RNA. The focus of the work is mRNA m6A regulators, but m6A levels measured from total RNA samples will not reflect mRNA m6A levels as there are other abundant RNAs that contain m6A (including rRNA). As a result, conclusions about mRNA m6A levels from these measurements are not supported.

      According to some published articles, m6A levels of purified mRNA or total RNA can be detected by different methods (such as mass spectrometry, 2D thin-layer chromatography, etc.) in Drosophila cells or tissues [1-3].

      Here, we used the EpiQuik m6A RNA Methylation Quantification Kit (Colorimetric) (Epigentek, NY, USA, Cat # P-9005), which is suitable for detecting m6A methylation status directly using total RNA isolated from any species such as mammals, plants, fungi, bacteria, and viruses. This kit has previously been used by researchers to detect the m6A/A ratio in total RNA [4, 5] or purified mRNA [6] from different species.

      To compare m6A levels between total RNA and mRNA, we enriched mRNA from total RNA using the Dynabeads™ mRNA Purification Kit (Invitrogen, Cat # 61006). The m6A levels of the enriched mRNA did not differ significantly from those of total RNA (Author response image 1). For this reason, most of the m6A levels reported in the manuscript were measured in total RNA.

      Author response image 1.

      The m6A levels of total RNA and mRNA

      As suggested, we will try to extract and purify mRNA from different genotypes to verify our conclusions based on the m6A levels of total RNA if necessary. In addition, m6A modification in RNA types other than mRNA (e.g., lncRNA, rRNA) is not necessarily meaningless. We will also add a discussion of this issue to the manuscript.

      (1) Lence T, et al. (2016) m6A modulates neuronal functions and sex determination in Drosophila. Nature 540(7632):242-247.

      (2) Haussmann IU, et al. (2016) m(6)A potentiates Sxl alternative pre-mRNA splicing for robust Drosophila sex determination. Nature 540(7632):301-304.

      (3) Kan L, et al. (2017) The m(6)A pathway facilitates sex determination in Drosophila. Nat Commun 8:15737.

      (4) Zhu C, et al. (2023) RNA Methylome Reveals the m(6)A-mediated Regulation of Flavor Metabolites in Tea Leaves under Solar-withering. Genomics Proteomics Bioinformatics 21(4):769-787.

      (5) Song H, et al. (2021) METTL3-mediated m(6)A RNA methylation promotes the anti-tumour immunity of natural killer cells. Nat Commun 12(1):5522.

      (6) Yin H, et al. (2021) RNA m6A methylation orchestrates cancer growth and metastasis via macrophage reprogramming. Nat Commun 12(1):1394.

      Reviewer #2 (Public Review):

      Summary:

      The authors have tested the effects of partial- or whole-chromosome aneuploidy on the m6A RNA modification in Drosophila. The data reveal that overall m6A levels trend up but that the number of sites found by meRIP-seq trend down, which seems to suggest that aneuploidy causes a subset of sites to become hyper-methylated. Subsequent bioinformatic analysis of other published datasets establish correlations between the activity of the H4K16 acetyltransferase dosage compensation complex (DCC) and the expression of m6A components and m6A abundance, suggesting that DCC and m6A can act in a feedback loop on each other. Overall, this paper uses bioinformatic trends to generate a candidate model of feedback between DCC and m6A. It would be improved by functional studies that validate the effect in vivo.

      Strengths:

      • Thorough bioinformatic analysis of their data.

      • Incorporation of other published datasets that enhance scope and rigor.

      • Finds trends that suggest that a chromosome counting mechanism can control m6A, as fits with pub data that the Sxl mRNA is m6A modified in XX females and not XY males.

      • Suggests this counting mechanism may be due to the effect of chromatin-dependent effects on the expression of m6A components.

      Weaknesses:

      • The linkage between H4K16 machinery and m6A is indirect and based on bioinformatic trends with little follow-up to test the mechanistic bases of these trends.

      We found a ChIP-seq dataset (GSE109901) of H4K16ac in female and male Drosophila larvae in a public database, and analyzed whether H4K16ac is directly associated with m6A regulator genes. ChIP-seq is a standard method for studying transcription factor binding and histone modifications, using efficient and specific antibodies for immunoprecipitation. The results showed H4K16ac peaks at the 5' region of the gene encoding the m6A reader Ythdc1 in both males and females. In addition, most of the genomic sites where the other m6A regulator genes are located are acetylated at H4K16 in both sexes, except that Ime4 shows sexual dimorphism and contains an H4K16ac peak only in females. These results indicate that the m6A regulator genes themselves are acetylated at H4K16, so there is a direct relationship between H4K16ac and m6A regulators. We have added these results to the text.

      Beyond the above conclusions from the sequencing data, we also plan to carry out experiments to test the linkage between H4K16 and m6A. These include measuring m6A levels when MOF is overexpressed and H4K16Ac levels are increased, measuring H4K16 levels when YT521B is knocked down or overexpressed, and measuring the relative expression levels of important regulatory genes under these conditions.

      • The paper lacks sufficient in vivo validation of the effects of DCC alleles on m6A and vice versa. For example, is the Ythdc1 genomic locus a direct target of the DCC component Msl-2? (see Figure 7).

      To study whether the Ythdc1 genomic locus is a direct target of a DCC component, we first analyzed a published MSL2 ChIP-seq dataset from Drosophila (GSE58768). Since MSL2 is only expressed in males under normal conditions, this dataset is from male Drosophila. According to the results, the majority (99.1%) of MSL2 peaks are located on the X chromosome, while MSL2 peaks on other chromosomes are few. This is consistent with the fact that MSL2 is enriched on the X chromosome in male Drosophila [1, 2]. The Ythdc1 gene is located on chromosome 3L, and there is no MSL2 peak near it. Similarly, the other m6A regulator genes are not X-linked, and no MSL2 peak is found at them. We then analyzed the MOF ChIP-seq data (GSE58768) of male Drosophila. We found that 61.6% of MOF peaks were located on the X chromosome, which was also expected [3, 4]. Although there are more MOF peaks than MSL2 peaks on autosomes, MOF peaks are absent from the autosomal m6A regulator genes. Therefore, at present, there is no evidence that the gene loci of m6A regulators are direct targets of the DCC components MSL2 and MOF, which may be because most MSL2 and MOF is tethered to the X chromosome by the MSL complex under physiological conditions. Whether there are other direct or indirect interactions between Ythdc1 and MSL2 is an issue worthy of further study.
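
      The two checks described above (the share of peaks per chromosome, and whether any peak overlaps a given gene locus) can be sketched as follows. This is only an illustration under stated assumptions: the peak list and the gene coordinates are hypothetical, intervals are treated as half-open, and the helper names are not taken from any ChIP-seq tool.

```python
from collections import Counter

# A peak is (chromosome, start, end) with a half-open interval [start, end).
Peak = tuple[str, int, int]


def fraction_per_chromosome(peaks: list[Peak]) -> dict[str, float]:
    """Share of all peaks that falls on each chromosome."""
    counts = Counter(chrom for chrom, _start, _end in peaks)
    total = sum(counts.values())
    return {chrom: n / total for chrom, n in counts.items()}


def peaks_overlapping(peaks: list[Peak], chrom: str,
                      start: int, end: int) -> list[Peak]:
    """Peaks on `chrom` that share at least one base with [start, end)."""
    return [p for p in peaks if p[0] == chrom and p[1] < end and p[2] > start]


# Hypothetical peak set and gene locus (illustration only).
example_peaks: list[Peak] = [
    ("X", 100, 200),
    ("X", 500, 650),
    ("X", 900, 1000),
    ("3L", 50, 80),
]
example_fractions = fraction_per_chromosome(example_peaks)
example_gene_hits = peaks_overlapping(example_peaks, "3L", 1000, 2000)
```

      An empty result from the overlap check corresponds to the situation described in the response, where no peak is found at an autosomal gene locus.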

      (1) Bashaw GJ & Baker BS (1995) The msl-2 dosage compensation gene of Drosophila encodes a putative DNA-binding protein whose expression is sex specifically regulated by Sex-lethal. Development 121(10):3245-3258.

      (2) Kelley RL, et al. (1995) Expression of msl-2 causes assembly of dosage compensation regulators on the X chromosomes and female lethality in Drosophila. Cell 81(6):867-877.

      (3) Kind J, et al. (2008) Genome-wide analysis reveals MOF as a key regulator of dosage compensation and gene expression in Drosophila. Cell 133(5):813-828.

      (4) Conrad T, et al. (2012) The MOF chromobarrel domain controls genome-wide H4K16 acetylation and spreading of the MSL complex. Dev Cell 22(3):610-624.

      Quite a bit of technical detail is omitted from the main text, making it difficult for the reader to interpret outcomes.

      (1) Please add the tissues to the labels in Figure 1D.

      Figure 1D shows the subcellular localization of FISH probe signals in Drosophila embryos. Arrowheads indicate the foci of probe signals. The corresponding tissue types are (1) blastoderm nuclei; (2) yolk plasm and pole cells; (3) brain and midgut; (4) salivary gland and midgut; (5) blastoderm nuclei and yolk cortex; (6) blastoderm nuclei and pole cells; (7) blastoderm nuclei and yolk cortex; (8) germ band. We have added these to the manuscript.

      (2) In the main text, please provide detail on the source tissues used for meRIP; was it whole larvae? adult heads? Most published datasets are from S2 cells or adult heads and comparing m6A across tissues and developmental stages could introduce quite a bit of variability, even in wt samples. This issue seems to be what the authors discuss in lines 197-199.

      In this article, the material used for MeRIP-seq was whole third-instar larvae. Because trisomy 2L and metafemale Drosophila die before developing into adults, it was not possible to use adult heads for MeRIP-seq of the aneuploids. For the other experiments described here, m6A abundance was measured using whole larvae or adult heads; the material used for RT-qPCR analysis was whole larvae, larval brains, or adult heads; and Drosophila embryos at different developmental stages were used for fluorescence in situ hybridization (FISH) experiments. We provide a detailed description of the experimental material for each assay in the manuscript.

      (3) In the main text, please identify the technique used to measure "total m6A/A" in Fig 2A. I assume it is mass spec.

      We used the EpiQuik m6A RNA Methylation Quantification Kit (Colorimetric) (Epigentek, NY, USA, Cat # P-9005) to measure the m6A/A ratio in RNA samples. This commercially available kit quantifies m6A RNA methylation with a colorimetric assay, and is suitable for detecting m6A methylation status directly in total RNA isolated from any species, such as mammals, plants, fungi, bacteria, and viruses.

      (4) Line 190-191: the text describes annotating m6A sites by "nearest gene" which is confusing. The sites are mapped in RNAs, so the authors must unambiguously know the identity of the gene/transcript, right?

      When m6A peaks are annotated using the R package ChIPseeker, the output includes two items: "genomic annotation" and "nearest gene annotation". "Genomic annotation" indicates which genomic feature the peak is annotated to, such as 5’UTR, 3’UTR, exon, etc. "Nearest gene annotation" indicates which specific gene/transcript the peak is matched to. We have modified the description in the main text to make it easier to understand.

    1. Author response:

      eLife Assessment

      This valuable study presents a theoretical model of how punctuated mutations influence multistep adaptation, supported by empirical evidence from some TCGA cancer cohorts. This solid model is noteworthy for cancer researchers as it points to the case for possible punctuated evolution rather than gradual genomic change. However, the parametrization and systematic evaluation of the theoretical framework in the context of tumor evolution remain incomplete, and alternative explanations for the empirical observations are still plausible.

      We thank the editor and the reviewers for their thorough engagement with our work. The reviewers’ comments have drawn our attention to several important points that we have addressed in the updated version. We believe that these modifications have substantially improved our paper.

      There were two major themes in the reviewers’ suggestions for improvement. The first was that we should demonstrate more concretely how the results in the theoretical/stylized modelling parts of our paper quantitatively relate to dynamics in cancer.

      To this end, we have now included a comprehensive quantification of the effect sizes of our results across large and biologically-relevant parameter ranges. Specifically, following reviewer 1’s suggestion to give more prominence to the branching process, we have added two figures (Fig S3-S4) quantifying the likelihood of multi-step adaptation in a branching process for a large range of mutation rates and birth-death ratios. Formulating our results in terms of birth-death ratios also allowed us to provide better intuition regarding how our results manifest in models with constant population size vs models of growing populations. In particular, the added figure (Fig S3) highlights that the effect size of temporal clustering on the probability of successful 2-step adaptation is very sensitive to the probability that the lineage of the first mutant would go extinct if it did not acquire a second mutation. As a result, the phenomenon we describe is biologically likely to be most effective in those phases during tumor evolution in which tumor growth is constrained. This important pattern had not been described sufficiently clearly in the initial version of our manuscript, and we thank both reviewers for their suggestions to make these improvements.

      The second major theme in the reviewers’ suggestions was focused on how we relate our theoretical findings to readouts in genomic data, with both reviewers pointing to potential alternative explanations for the empirical patterns we describe.

      We have now extended our empirical analyses following some of the reviewers’ suggestions. Specifically, we have included analyses investigating how the contribution of reactive oxygen species (ROS)-related mutation signatures correlates with our proxies for multi-step adaptation; and we have included robustness checks in which we use Spearman instead of Pearson correlations. Moreover, we have included more discussion on potential confounds and the assumptions going into our empirical analyses as well as the challenges in empirically identifying the phenomena we describe.

      Below, we respond in detail to the individual comments made by each reviewer.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Grasper et al. present a combined analysis of the role of temporal mutagenesis in cancer, which includes both theoretical investigation and empirical analysis of point mutations in TCGA cancer patient cohorts. They find that temporally elevated mutation rates contribute to cancer fitness by allowing fast adaptation when the fitness drops (due to previous deleterious mutations). This may be relevant in the case of tumor suppressor genes (TSG), which follow the 2-hit hypothesis (i.e., biallelic 2 mutations are necessary to deactivate TS), and in cases where temporal mutagenesis occurs (e.g., high APOBEC, ROS). They provide evidence that this scenario is likely to occur in patients with some cancer types. This is an interesting and potentially important result that merits the attention of the target audience. Nonetheless, I have some questions (detailed below) regarding the design of the study, the tools and parametrization of the theoretical analysis, and the empirical analysis, which I think, if addressed, would make the paper more solid and the conclusion more substantiated.

      Strengths:

      Combined theoretical investigation with empirical analysis of cancer patients.

      Weaknesses:

      Parametrization and systematic investigation of theoretical tools and their relevance to tumor evolution.

      We sincerely thank Reviewer 1 for their comments. As communicated in more detail in the point-by-point replies to the “Recommendations for the authors”, we have revised the paper to address these comments in various ways. To summarize, Reviewer 1 asked for (1) more comprehensive analyses of the parameter space, especially in ranges of small fitness effects and low mutation rates; (2) additional clarifications on details of mechanisms described in the manuscript; and (3) suggested further robustness checks to our empirical analyses. We have addressed these points as follows: we have added detailed analyses of dynamics and effect sizes for branching processes (see Sections SI2 and SI3 in the Supplementary Information, as well as Figures S3 and S4). As suggested, these additions provide characterizations of effect sizes in biologically relevant parameter ranges (low mutation rates and smaller fitness effect sizes), and extend our descriptions to processes with dynamically changing population sizes. Moreover, we have added further clarifications at suggested points in the manuscript, e.g. to elaborate on the non-monotonicities in Fig 3. Lastly, we have undertaken robustness checks using Spearman rather than Pearson correlation coefficients to quantify relations between TSG deactivation and APOBEC signature contribution, and have performed analyses investigating dynamics of reactive oxygen species-associated mutagenesis instead of APOBEC.
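
      The robustness check mentioned above, replacing Pearson with Spearman correlation, can be illustrated with a small self-contained sketch. It is not the authors' analysis code; the data are made up, and the helper names are hypothetical. Spearman correlation is computed here as the Pearson correlation of average ranks, which is its standard definition in the presence of ties.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up data: monotonic but non-linear, with one extreme value.
example_x = [1.0, 2.0, 3.0, 4.0, 5.0]
example_y = [1.0, 4.0, 9.0, 16.0, 100.0]
example_pearson = pearson(example_x, example_y)
example_spearman = spearman(example_x, example_y)
```

      Because ranks ignore the size of the gaps between values, the rank-based coefficient is less affected by a single extreme sample, which is the usual motivation for this kind of robustness check.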

      Reviewer #2 (Public review):

      This work presents theoretical results concerning the effect of punctuated mutation on multistep adaptation and empirical evidence for that effect in cancer. The empirical results seem to agree with the theoretical predictions. However, it is not clear how strong the effect should be on theoretical grounds, and there are other plausible explanations for the empirical observations.

      Thank you very much for these comments. We have now substantially expanded our investigations of the parameter space as outlined in the response to the “eLife Assessment” above and in the detailed comments below (A(1)-A(3)) to convey more quantitative intuition for the magnitude of the effects we describe for different phases of tumor evolution. We agree that there could be potential additional confounders to our empirical investigations besides the challenges regarding quantification that we already described in our initial version of the manuscript. We have thus included further discussion of these in our manuscript (see replies to B(1)-B(3)), and we have expanded our empirical analyses as outlined in the response to the “eLife Assessment”.

      For various reasons, the effect of punctuated mutation may be weaker than suggested by the theoretical and empirical analyses:

      (A1) The effect of punctuated mutation is much stronger when the first mutation of a two-step adaptation is deleterious (Figure 2). For double inactivation of a TSG, the first mutation--inactivation of one copy--would be expected to be neutral or slightly advantageous. The simulations depicted in Figure 4, which are supposed to demonstrate the expected effect for TSGs, assume that the first mutation is quite deleterious. This assumption seems inappropriate for TSGs, and perhaps the other synergistic pairs considered, and exaggerates the expected effects.

      Thank you for highlighting this discrepancy between Figure 2 and Figure 4. For computational efficiency and for illustration purposes, we had opted for high mutation rates and large fitness effects in Figure 2; however, our results are valid even in the setting of lower mutation rates and fitness effects. To improve the connection to Figure 4, and to address other related comments regarding parameter dependencies, we have now added more detailed quantification of the effects we describe (Figures SF3 and SF4) to the revised manuscript. These additions show that the effects illustrated in Figure 2 retain large effect sizes when going to much lower mutation rates and much smaller fitness effects. Indeed, while under high mutation rates we only see the large relative effects if the first mutation is highly deleterious, these large effects become more universal when going to low mutation rates.

      In general, it is correct that the selective disadvantage (or advantage) conveyed by the first mutation affects the likelihood of successful 2-step adaptations. It is also correct that the magnitude of the ‘relative effect’ of temporal clustering on valley-crossing is highest if the lineage with only the first of the two mutations is vanishingly unlikely to produce a second mutant before going extinct. If the first mutation is strongly deleterious, the lineage of such a first mutant is likely to quickly go extinct – and therefore also more likely to do so before producing a second mutant.

      However, this likelihood of producing the second mutant is also low if the mutation rate is low. As our added figure (Figure SF3) illustrates, at low mutation rates appropriate for cancer cells, the relative effect is insensitive to the magnitude of the fitness disadvantage for large parts of the parameter space. Especially in populations of constant size (approximated by a birth/death ratio of 1), the relative effects for first mutations that reduce the birth rate by 0.5 or by 0.05 are indistinguishable (Figure SF3f).

      Moreover, the absolute effect (f<sub>k</sub> - f<sub>1</sub>), as we discuss in the paper (Figures SF2 and SF3), is largest not in regions of the parameter space in which the first mutant is infinitesimally unlikely to produce a second mutant (where f<sub>k</sub> and f<sub>1</sub> would both be infinitesimally small), but rather in parameter regions in which this first mutant has a non-negligible chance to produce a second mutant. The absolute effect (f<sub>k</sub> - f<sub>1</sub>) therefore peaks around fitness-neutral first mutations. While the next comment (below) says that our empirical investigations more closely resemble comparisons of relative effects and not absolute effects, we would expect that the observations in our data come preferentially from multi-step adaptations with large absolute effect, since the absolute effect is maximal when both f<sub>k</sub> and f<sub>1</sub> are relatively high.

      In summary, we believe Figure 2, while using exaggerated parameters for defensible reasons, is not a misleading illustration of the general phenomenon or of its applicability in biological settings, as effect sizes remain large when moving to biologically realistic parameter ranges. To clarify this issue, we have largely rewritten the relevant paragraphs in the results section and have added two additional figures (Figures SF3 and SF4) as well as a section in the SI with detailed discussion (SI2).

      (A2) More generally, parameter values affect the magnitude of the effect. The authors note, for example, that the relative effect decreases with mutation rate. They suggest that the absolute effect, which increases, is more important, but the relative effect seems more relevant and is what is assessed empirically.

      Thank you for this comment. As noted in the replies to the above comments, we have now included extensive investigations of how sensitive effect sizes are to different parameter choices. We also apologize for insufficiently clearly communicating how the quantities in Figure 4 relate to the findings of our theoretical models.

      The challenge in relating our results to single-timepoint sequencing data is that we only observe the mutations that a tumor has acquired, but we do not directly observe the mutation rate histories that brought about these mutations. As an alternative readout, we therefore consider (through rough proxies: TSGs and APOBEC signatures) the amount of 2-step adaptations per acquired/retained mutation. While we unfortunately cannot control for the average mutation rate in a sample, we motivate using this “TSG-deactivation score” by the hypothesis that for any given mutation rate, we expect a positive relationship between the amount of temporal clustering and the amount of 2-step adaptations per acquired/retained mutation. This hypothesis follows directly from our theoretical model where it formally translates to the statement that for a fixed μ, f<sub>k</sub> is increasing in k.

      However, while both quantities from our theoretical model, f<sub>k</sub>/f<sub>1</sub> and f<sub>k</sub> - f<sub>1</sub>, relate to this hypothesis (both are increasing in k), neither of them maps directly onto the formulation of our empirical hypothesis.

      We have now rewritten the relevant passages of the manuscript to more clearly convey our motivation for constructing our TSG deactivation score in this form (P. 4-6).

      (A3) Routes to inactivation of both copies of a TSG that are not accelerated by punctuation will dilute any effects of punctuation. An example is a single somatic mutation followed by loss of heterozygosity. Such mechanisms are not included in the theoretical analysis nor assessed empirically. If, for example, 90% of double inactivations were the result of such mechanisms with a constant mutation rate, a factor of two effect of punctuated mutagenesis would increase the overall rate by only 10%. Consideration of the rate of apparent inactivation of just one TSG copy and of deletion of both copies would shed some light on the importance of this consideration.

      This is a very good point, thank you. In our empirical analyses, the main motivation was to investigate whether we would observe patterns that are qualitatively consistent with our theoretical predictions, i.e. whether we would find positive associations between valley-crossing and temporal clustering. Our aim in the empirical analyses was not to provide a quantitative estimate of how strongly temporally clustered mutation processes affect mutation accumulation in human cancers. We hence restricted attention to only one mutation process which is well characterized to be temporally clustered (APOBEC mutagenesis) and to only one category of (epi)genomic changes (SNPs, in which APOBEC signatures are well characterized). Of course, such an analysis ignores that other mutation processes (e.g. LOH, copy number changes, methylation in promoter regions, etc.) may interact with the mechanisms that we consider in deactivating tumor suppressor genes.

      We have now updated the text to include further discussion of this limitation and further elaboration to convey that our empirical analyses are not intended as a complete quantification of the effect of temporal clustering on mutagenesis in vivo (P. 10-11).
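
      The dilution argument in comment (A3) can be made concrete with a minimal sketch. It assumes that the overall rate of double inactivation is a simple weighted sum of routes that punctuation does not accelerate and routes that it does; the function name and this two-route split are illustrative assumptions, not part of our theoretical model.

```python
def overall_rate_increase(unaffected_fraction, fold_effect):
    """Relative increase in the overall double-inactivation rate.

    unaffected_fraction: share of double inactivations arising through
        routes that punctuation does not accelerate (e.g. mutation + LOH).
    fold_effect: factor by which punctuation speeds up the remaining routes.
    Both parameters are hypothetical inputs chosen for illustration.
    """
    affected_fraction = 1.0 - unaffected_fraction
    # Baseline overall rate is normalized to 1.
    new_rate = unaffected_fraction + affected_fraction * fold_effect
    return new_rate - 1.0


# The reviewer's example: 90% unaffected routes, two-fold effect on the rest.
increase = overall_rate_increase(0.9, 2.0)
print(f"overall increase: {increase:.0%}")
```

      With the reviewer's numbers, the two-fold effect on the remaining 10% of routes raises the overall rate by about 10%, which is why unmodeled routes such as LOH would dilute any signal in the empirical data.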

      Several factors besides the effects of punctuated mutation might explain or contribute to the empirical observations:

      (B1) High APOBEC3 activity can select for inactivation of TSGs (references in Butler and Banday 2023, PMID 36978147). This selective force is another plausible explanation for the empirical observations.

      Thank you for making this point. We agree that increased APOBEC3 activity, or any other similar perturbation, can change the fitness effect that any further changes/perturbations to the cell would bring about. Our empirical analyses therefore rely on the assumption that there are no major confounding structural differences in selection pressures between tumors with different levels of APOBEC signature contributions. We have expanded our discussion section to elaborate on this potential limitation (P. 10-11).

      While the hypothesis that APOBEC3 activity selects for inactivation of TSGs has been suggested, there remain other explanations. Either way, the ways in which selective pressures have been suggested to change would not interfere relevantly with the effects we describe. The paper cited in the comment argues that “high APOBEC3 activity may generate a selective pressure favoring” TSG mutations as “APOBEC creates a high [mutation] burden, so cells with impaired DNA damage response (DDR) due to tumor suppressor mutations are more likely to avert apoptosis and continue proliferating”. To motivate this reasoning, in the same passage, the authors cite a high prevalence of TP53 mutations across several cancer types with “high burden of APOBEC3-induced mutations”, but also note that “this trend could arise from higher APOBEC3 expression in p53-mutated tumors since p53 may suppress APOBEC3B transcription via p21 and DREAM proteins”.

      Translated to our theoretical framework, this reasoning builds on the idea that APOBEC3 activity increases the selective advantage of mutants with inactivation of both copies of a TSG. In contrast, the mechanism we describe acts by altering the chances of mutants with only one TSG allele inactivated to inactivate the second allele before going extinct. If homozygous inactivation of TSGs generally conveys relatively strong fitness advantages, lineages with homozygous inactivation would already be unlikely to go extinct. Further increasing the fitness advantage of such lineages would thus manifest mostly in a quicker spread of these lineages, rather than in changes in the chance that these lineages survive. In turn, such a change would have limited effect on the “rate” at which such 2-step adaptations occur, but would mostly affect the speed at which they fixate. It would be interesting to investigate these effects empirically by quantifying the speed of proliferation and chance of going extinct for lineages that newly acquired inactivating mutations in TSGs.

      Beyond this explicit mention of selection pressures, the cited paper also discusses high occurrences of mutations in TSGs in relation to APOBEC. These enrichments, however, are not uniquely explained by an APOBEC-driven change in selection pressures. Indeed, our analyses would also predict such enrichments.

      (B2) Without punctuation, the rate of multistep adaptation is expected to rise more than linearly with mutation rate. Thus, if APOBEC signatures are correlated with a high mutation rate due to the action of APOBEC, this alone could explain the correlation with TSG inactivation.

      Thank you for making this point. Indeed, an identifying assumption that we make is that average mutation rates are balanced between samples with a higher vs lower APOBEC signature contribution. We cannot cleanly test this assumption, as we only observe aggregate mutation counts but not mutation rates. However, the fact that we observe an enrichment for APOBEC-associated mutations among the set of TSG-inactivating mutations (see Figure 4F) would be consistent with APOBEC-mutations driving the correlations in Fig 4D, rather than just average mutation rates. We have now added a paragraph to our manuscript to discuss these points (P. 10-11).

      (B3) The nature of mutations caused by APOBEC might explain the results. Notably, one of the two APOBEC mutation signatures, SBS13, is particularly likely to produce nonsense mutations. The authors count both nonsense and missense mutations, but nonsense mutations are more likely to inactivate the gene, and hence to be selected.

      Thank you for making this point. We have included it in our discussion of potential confounders/limitations in the revised manuscript (P. 10-11).

    1. Author response:

      Reviewer 1:

      Summary:

      This paper describes molecular dynamics simulations (MDS) of the dynamics of two T-cell receptors (TCRs) bound to the same major histocompatibility complex molecule loaded with the same peptide (pMHC). The two TCRs (A6 and B7) bind to the pMHC with similar affinity and kinetics, but employ different residue contacts. The main purpose of the study is to quantify via MDS the differences in the inter- and intra-molecular motions of these complexes, with a specific focus on what the authors describe as catch-bond behavior between the TCRs and pMHC, which could explain how T-cells can discriminate between different peptides in the presence of weak separating force.

      Strengths:

      The authors present extensive simulation data that indicates that, in both complexes, the number of high-occupancy interdomain contacts initially increases with applied load, which is generally consistent with the authors’ conclusion that both complexes exhibit catch-bond behavior, although to different extents. In this way, the paper somewhat expands our understanding of peptide discrimination by T-cells.

      The reviewer makes thoughtful assessments of our manuscript. While our manuscript is meant to be a “short” contribution, our significant new finding is that even for TCRs targeting the same pMHC, having similar structures, and leading to similar functional outcomes in conventional assays, their response to applied load can be different. This supports our recent experimental work where TCRs targeting the same pMHC differed in their catch bond characteristics, and importantly, in their response to limiting copy numbers of pMHCs on the antigen-presenting cell (Akitsu et al., Sci. Adv., 2024; cited in our manuscript). Our present manuscript provides the physical basis for how two similar TCRs respond to applied load differently. In the revised manuscript, we will make this point clearer.

      Weaknesses:

      While generally well supported by data, the conclusions would nevertheless benefit from a more concise presentation of information in the figures, as well as from suggesting experimentally testable predictions.

      Following the reviewers’ suggestions, we will update figures and use Figure Supplements to make the main figures more concise and to simplify the overall presentation.

      Regarding testable predictions, one prediction would be that B7 TCR will exhibit weaker catch bond behavior than A6. This is an important prediction because the two TCRs targeting the same pMHC have similar structures and are functionally similar in conventional assays. This prediction can be tested by single-molecule optical tweezers experiments. We also predict the A6 TCR may perform better when the number of pMHC molecules presented is limited, analogous to our recent experiments on different TCRs, Akitsu et al., Sci. Adv. (2024).

      Another testable prediction, concerning the conservation of the basic allostery mechanism, is to test the Cβ FG-loop deletion mutant. The FG-loop is located at the hinge region of the β chain, yet its deletion severely impairs catch bond formation. These predictions will be mentioned and discussed in the updated manuscript.

      Reviewer 2:

      In this work, Chang-Gonzalez and coworkers follow up on an earlier study on the force-dependence of peptide recognition by a T-cell receptor using all-atom molecular dynamics simulations. In this study, they compare the results of pulling on a TCR-pMHC complex between two different TCRs with the same peptide. A goal of the paper is to determine whether the newly studied B7 TCR has the same load-dependent behavior mechanism shown in the earlier study for A6 TCR. The primary result is that while the unloaded interaction strength is similar, A6 exhibits more force stabilization.

      This is a detailed study, and establishing the difference between these two systems with and without applied force may establish them as a good reference setup for others who want to study mechanobiological processes if the data were made available, and could give additional molecular details for T-Cell-specialists. As written, the paper contains an overwhelming amount of details and it is difficult (for me) to ascertain which parts to focus on and which results point to the overall take-away messages they wish to convey.

      As mentioned above and as the reviewer correctly pointed out, the condensed appearance of this manuscript arose largely because we intended it to be a Research Advances article as a short follow-up study of our previous paper on A6 TCR published in eLife. Most of the analysis scripts for the A6 TCR study are already available on GitHub. We will additionally deposit sample structures and simulation scripts for the B7 TCR. Trajectories will be provided upon request given their large size.

      Regarding the focus issue, it is in part due to the complex nature of the problem, which required simulations under different conditions and multi-faceted analyses. Concisely presenting the complex analyses also has been a challenge in our previous papers on TCR simulations (Hwang et al., PNAS 2020; Chang-Gonzalez et al., eLife, 2024 – both are cited in our manuscript). With updated figures and texts, we expect that the presentation will be a lot clearer. But even in the present form, the reviewer points out the main take-away message well: “The primary result is that while the unloaded interaction strength is similar, A6 exhibits more force stabilization.”

      Detailed comments:

      (1) In Table 1 - are the values of the extension column the deviation from the average length at zero force (that is what I would term extension) or is it the distance between anchor points (which is what I would assume based on the large values)? If the latter, I suggest changing the heading, and then also reporting the average extension with an asterisk indicating no extensional restraints were applied for B7-0, or just listing 0 load in the load column. Standard deviation in this value can also be reported. If it is an extension as I would define it, then I think B7-0 should indicate extension = 0+/- something.

      The distance between anchor points could also be labeled in Figure 1A.

      “Extension” is the distance between anchor points (blue spheres at the ends of the added strands in Fig. 1A). While its meaning should be clear in the section “Laddered extensions” in MD simulation protocol, at first glance it may lead to confusion. In a strict sense, use of “extension” for the distance is a misnomer, but we have used it in our previous two papers (Hwang et al., PNAS 2020; Chang-Gonzalez et al., eLife, 2024), so we prefer to keep it for consistency. Instead, in the caption of Table 1, we will explain its meaning, and also explicitly label it in Fig. 1A, as the reviewer suggested.

      Please also note that the no-load case B7<sup>0</sup> does not have a particular extension that yields zero load on average. It would in fact be very difficult to find such an extension (distance between two anchor points). To simulate the system without load, we separately built a TCR-pMHC complex without added linkers, and held the distal part of pMHC with weak harmonic restraints (explained in sections “Structure preparation” and “Systems without load”). In this way, no external force is applied to TCR as it moves relative to pMHC. We will clarify this when introducing B7<sup>0</sup> in the Results section.

      (2) As in the previous paper, the authors apply “constant force” by scanning to find a particular bond distance at which a desired force is selected, rather than simply applying a constant force. I find this approach less desirable unless there is experimental evidence suggesting the pMHC and TCR were forced to be a particular distance apart when forces are applied. It is relatively trivial to apply constant forces, so in general, I would suggest this would have been a reasonable comparison. Line 243-245 speculates that there is a difference in catch bonding behavior that could be inferred because lower force occurs at larger extensions, but I do not believe this hypothesis can be fully justified and could be due to other differences in the complex.

      There is indeed experimental evidence that the TCR-pMHC complex operates under constant separation. The spacing between a T-cell and an antigen-presenting cell is maintained by adhesion molecules such as the CD2-CD58 pair, as explained in our paper on the A6 TCR (Chang-Gonzalez et al., eLife, 2024; please see the bottom paragraph on page 4 of the paper). In in vitro single-molecule experiments, pulling to a fixed separation and holding is also commonly done. Detailed comparison between constant extension vs. constant force simulations is definitely a subject of our future study. We will clarify these points when explaining the constant extension (or separation).

      Regarding line 243–245, we agree with the reviewer that without further tests, lower forces at larger extensions per se cannot be an indicator that B7 forms a weaker catch bond. But with additional insight, it does have an indirect relevance. In addition to fewer TCR-pMHC contacts (Fig. 1C of our manuscript), the intra-TCR contacts are also reduced compared to those of A6 (Fig. 1D vs. Chang-Gonzalez et al., eLife, 2024, Fig. 8A,B, first column; reproduced in the figure in our response to reviewer 3 below). This shows that the B7 TCR forms a looser complex with pMHC compared to A6. With its higher compliance, the B7 TCR-pMHC complex needs to be under a greater extension than A6 to apply comparable levels of force, and it would be more difficult to achieve load-induced stabilization of the TCR-pMHC interface, hence a weaker catch bond. We will add this point when explaining the weaker catch bond behavior of B7.

      (3) On a related note, the authors do not refer to or consider other works using MD to study force-stabilized interactions (e.g. for catch bonding systems), e.g. these cases where constant force is applied and enhanced sampling techniques are used to assess the impact of that applied force: https://www.cell.com/biophysj/fulltext/S0006-3495(23)00341-7, https://www.biorxiv.org/content/10.1101/2024.10.10.617580v1. I was also surprised not to see this paper on catch bonding in pMHC-TCR referred to, which also includes some MD simulations: https://www.nature.com/articles/s41467-023-38267-1

      We thank the reviewer for bringing the three papers to our attention, which are:

      (1) Languin-Cattoën, Sterpone, and Stirnemann, Biophys. J. 122:2744 (2023): About bacterial adhesion protein FimH.

      (2) Peña Ccoa, et al., bioRxiv (2024): About actin binding protein vinculin.

      (3) Choi et al., Nat. Comm. 14:2616 (2023): About a mathematical model of the TCR catch bond.

      Catch bond mechanisms of FimH and vinculin are different from that of TCR in that FimH and vinculin have relatively well-defined weak- and strong-binding states where there are corresponding crystal structures. Availability of the end-state structures enables using simulation approaches such as enhanced sampling of individual states and studying the transition between the two states. In contrast, TCR does not have any structurally well-defined weak- or strong-binding states, which requires a different approach. As demonstrated in our current manuscript as well as in our previous two papers (Hwang et al., PNAS 2020; Chang-Gonzalez et al., eLife, 2024), our microsecond-long simulations of the complex under realistic pN-level loads and a combination of analysis methods are effective for elucidating the catch bond mechanism of TCR. In the revised manuscript, we will cite the two papers, to compare the TCR catch bond mechanism with those of FimH and vinculin, which will offer a broader perspective.

      The third paper (Choi, 2023) proposes a mathematical model to analyze extensive sets of data, and also performs new experiments and additional simulations. Of note, their model assumptions are based mainly on the steered MD (SMD) simulation in their previous paper (Wu, et al., Mol. Cell. 73:1015, 2019). In their model, formation of a catch bond (called catch-slip bond in Choi’s paper) requires partial unfolding of MHC and tilting of the TCR-pMHC interface. While further studies are needed to find whether those changes are indeed required, even so, the question remains regarding how the complex in the fully folded state can bear load and enter such a state in the first place. Our current and previous simulation studies suggest a mechanism by which ligand- and load-dependent responses occur as the first obligatory step of catch bond formation, after which partial unfolding and/or extensive conformational transitions may occur, as described in our recent paper (Akitsu et al., Sci. Adv., 2024). In the revised manuscript, we will cite Wu’s paper and briefly explain the above.

      (4) The authors should make at least the input files for their system available in a public place (github, zenodo) so that the systems are a more useful reference system as mentioned above. The authors do not have a data availability statement, which I believe is required.

      As mentioned above, we will make sample input files and coordinates available on GitHub. A data availability statement will be added.

      Reviewer 3:

      Summary:

      The paper by Chang-Gonzalez et al. is a molecular dynamics (MD) simulation study of the dynamic recognition (load-induced catch bond) by the T cell receptor (TCR) of the complex of peptide antigen (p) and the major histocompatibility complex (pMHC) protein. The methods and simulation protocols are essentially identical to those employed in a previous study by the same group (Chang-Gonzalez et al., eLife 2024). In the current manuscript, the authors compare the binding of the same pMHC to two different TCRs, B7 and A6 which was investigated in the previous paper. While the binding is more stable for both TCRs under load (of about 10-15 pN) than in the absence of load, the main difference is that, with the current MD sampling, B7 shows a smaller amount of stable contacts with the pMHC than A6.

      Strengths:

      The topic is interesting because of the (potential) relevance of mechanosensing in biological processes including cellular immunology.

      Weaknesses:

      The study is incomplete because the claims are based on a single 1000-ns simulation at each value of the load and thus some of the results might be marred by insufficient sampling, i.e., statistical error. After the first 600 ns, the higher load of B7<sup>high</sup> than B7<sup>low</sup> is due mainly to the simulation segment from about 900 ns to 1000 ns (Figure 1D). Thus, the difference in the average value of the load is within their standard deviation (9 +/- 4 pN for B7<sup>low</sup> and 14.5 +/- 7.2 for B7<sup>high</sup>, Table 1). Even more strikingly, Figure 3E shows a lack of convergence in the time series of the distance between the V-module and pMHC, particularly for B7<sup>0</sup> (left panel, yellow) and B7<sup>low</sup> (right panel, orange). More and longer simulations are required to obtain a statistically relevant sampling of the relative position and orientation of the V-module and pMHC.

      The reviewer uses data points during the last 100 ns to raise an issue with sampling. But since we are using realistic pN range forces, force fluctuates more slowly. In fact, in our simulation of B7<sup>high</sup>, while the force peaks near 35 pN at 500 ns (Fig. 1D of our manuscript; reproduced as panels C and D below), the contact heat map shows no noticeable changes around 500 ns (Fig. 2C of our manuscript). Thus, a wider time window must be considered rather than focusing on instantaneous force.

      We believe the reviewer’s concern about sampling arose also due to a lack of clear explanation. Author response image 1 below contains panels from our earlier eLife paper on the A6 TCR. Panels A and B are from Fig. 8 of the A6 paper, and panels C and D are from Fig. 1D of our present manuscript. The high-load simulations in both cases (outlined circles) fluctuate widely in force so that one might argue that sampling was insufficient. However, unless one is interested in finding the precise value of force for a given extension, sampling in our simulations was reasonable enough to distinguish between high- and low-force behaviors. To support this, we show panel E below, which is from Appendix 3–Fig. 1 of our A6 paper. Added to this panel are the average forces and standard deviations of B7<sup>low</sup> and B7<sup>high</sup> from Table 1 of our manuscript (red squares). Please note that all of the data were measured after 500 ns. Except for Y8A<sup>low</sup> and dFG<sup>low</sup> of A6 (explained below), all of the data points lie on nearly a straight line.

      Author response image 1.

      Thermodynamically, the force and position of the restraint (blue spheres in Fig. 1A of our manuscript) form a pair of generalized force and the corresponding spatial variable in equilibrium at temperature 300 K, which is akin to the pressure P and volume V of an ideal gas. If V is fixed, P fluctuates. Denoting the average and std of pressure as ⟨P⟩ and ∆P, respectively, Burgess showed that ∆P/⟨P⟩ is a constant (Eq. 5 of Burgess, Phys. Lett. A, 44:37; 1973). In the case of the TCRαβ-pMHC system, although individual atoms are not ideal gases, since their motion leads to the fluctuation in force on the restraints, the situation is analogous to the case where pressure arises from individual ideal gas molecules hitting the confining wall as the restraint. Thus, the near-linear behavior in panel E above is a consequence of the system being many-bodied and at constant temperature. The linearity is also an indirect indicator that sampling of force was reasonable. The fact that A6 and B7 data show a common linear profile further demonstrates the consistency in our force measurement. That said, the B7 data points (red in panel E) are elevated slightly above nearby A6 data points. This is consistent with B7 forming an overall weaker complex, both at the TCR-pMHC interface (panels A vs. C) and within intra-TCR interfaces (panels B vs. D), which can be seen by the wider ranges of color bars in panels A and B for A6 compared to panels C and D for B7.
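
      As a rough numerical illustration of the argument above, the sketch below computes the relative fluctuation (standard deviation divided by mean) of the force for the two loaded B7 systems, using only the Table 1 values quoted in the reviewer's comment (9 +/- 4 pN and 14.5 +/- 7.2 pN). The dictionary and function names are illustrative. Two data points cannot establish linearity on their own, so this only checks that the two ratios are of similar size.

```python
# Mean and standard deviation of force (pN) for the two loaded B7 systems,
# as quoted from Table 1 in the reviewer's comment.
forces = {
    "B7low": (9.0, 4.0),
    "B7high": (14.5, 7.2),
}


def relative_fluctuation(mean, std):
    """Return std/mean, the dimensionless analogue of ΔP/<P>."""
    return std / mean


ratios = {name: relative_fluctuation(m, s) for name, (m, s) in forces.items()}
for name, ratio in ratios.items():
    print(f"{name}: std/mean = {ratio:.2f}")
```

      The two ratios come out at roughly 0.44 and 0.50, i.e. of comparable size, which is what a near-linear relation between standard deviation and mean force would give.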

      About the two outliers of A6, Y8A<sup>low</sup> is for an antagonist peptide and dFG<sup>low</sup> is the Cβ FG-loop deletion mutant. Interestingly, both cases had reduced numbers of contacts with pMHC, which likely caused a wider conformational motion, hence greater fluctuation in force.

      A similar argument applies to Fig. 3E of our manuscript. If precise values of the V-module to pMHC distance were needed, longer or duplicate simulations would be necessary. However, Fig. 3E as it currently stands clearly shows that B7<sup>high</sup> maintains a more stable interface compared to B7<sup>low</sup>, which is consistent with all other measures we used, such as Fig. 3B (Hamming distance), Fig. 3C (buried surface area), and Fig. 4A–E (Vα-Vβ motion and CDR3 distance). They are also consistent with our simulations of A6.

      Thus, rather than relying on peculiarities of individual trajectories, we analyze data in multiple ways and draw conclusions based on features that are consistent across different simulations. Please also note that reviewer 1 mentioned that our conclusions are “generally well supported by data.”

      We will update our manuscript to concisely explain the above and also will add Panel E above as a supplement of Fig. 1.

      It is not clear why “a 10 A distance restraint between alphaT218 and betaA259 was applied” (section MD simulation protocol, page 9).

      αT218 and βA259 are the residues attached to a leucine-zipper handle in in vitro optical trap experiments (Das, et al., PNAS 2015). In T cells, those residues also connect to transmembrane helices. Author response image 2 is a model of N15 TCR used in experiments in Das’ paper, constructed based on PDB 1NFD. Blue spheres represent Cα atoms corresponding to αT218 and βA259 of B7 TCR. Their distance is 6.7 Å. The 10-Å distance restraint in simulation was applied to mimic the presence of the leucine zipper that prevents excessive separation of the added strands. The distance restraint is a flat-bottom harmonic potential which is activated only when the distance between the two atoms exceeds 10 Å, which we did not clarify in our original manuscript. The same restraint was used in our previous studies on JM22 and A6 TCRs.
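
      For readers unfamiliar with this type of restraint, a minimal sketch of a one-sided flat-bottom harmonic potential follows. Only the 10 Å threshold comes from the text above; the force constant, its units, the 1/2 prefactor convention, and the function names are assumptions made for illustration and do not reproduce our simulation input.

```python
def flat_bottom_energy(distance, threshold=10.0, force_constant=1.0):
    """Restraint energy for a one-sided flat-bottom harmonic potential.

    distance, threshold: in Å. force_constant: hypothetical value,
    e.g. in kcal/(mol·Å^2). The energy is zero up to the threshold and
    grows quadratically beyond it.
    """
    excess = max(0.0, distance - threshold)
    return 0.5 * force_constant * excess ** 2


def flat_bottom_force(distance, threshold=10.0, force_constant=1.0):
    """Restoring force along the distance coordinate (negative = pulls inward)."""
    excess = max(0.0, distance - threshold)
    return -force_constant * excess


# At the 6.7 Å separation seen in the model structure the restraint is inactive;
# it only acts once the two atoms move more than 10 Å apart.
print(flat_bottom_energy(6.7), flat_bottom_energy(12.0))
```

      Because the potential is flat below the threshold, the restraint does not perturb the added strands at their usual separation and only limits excessive separation, which is the role of the leucine zipper it is meant to mimic.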

      We will add the figure as a supplement of Fig. 1, cite Das’ paper, and also update description of the distance restraint in the MD simulation protocol section.

      Author response image 2.

    1. Author response:

      Public Reviews:

      We thank the reviewers for their overall positive assessments and constructive feedback.

      Reviewer #1 (Public Review):

      Summary:

      The study explored the biomechanics of kangaroo hopping across both speed and animal size to try and explain the unique and remarkable energetics of kangaroo locomotion.

      Strengths:

      The study brings kangaroo locomotion biomechanics into the 21st century. It is a remarkably difficult project to accomplish. There is excellent attention to detail, supported by clear writing and figures.

      Weaknesses:

      The authors oversell their findings, but the mystery still persists.

      The manuscript lacks a big-picture summary with pointers to how one might resolve the big question.

      General Comments

      This is a very impressive tour de force by an all-star collaborative team of researchers. The study represents a tremendous leap forward (pun intended) in terms of our understanding of kangaroo locomotion. Some might wonder why such an unusual species is of much interest. But, in my opinion, the classic study by Dawson and Taylor in 1973 of kangaroos launched the modern era of running biomechanics/energetics and applies to varying degrees to all animals that use bouncing gaits (running, trotting, galloping and of course hopping). The puzzling metabolic energetics findings of Dawson & Taylor (little if any increase in metabolic power despite increasing forward speed) remain a giant unsolved problem in comparative locomotor biomechanics and energetics. It is our "dark matter problem".

      Thank you for the kind words.

      This study is certainly a hop towards solving the problem. But, the title of the paper overpromises and the authors present little attempt to provide an overview of the remaining big issues.

      We will modify the title to reflect this comment.  

      The study clearly shows that the ankle and to a lesser extent the mtp joint are where the action is. They clearly show in great detail by how much and by what means the ankle joint tendons experience increased stress at faster forward speeds.

      Since these were zoo animals, direct measures were not feasible, but the conclusion that the tendons are storing and returning more elastic energy per hop at faster speeds is solid.

      The conclusion that net muscle work per hop changes little from slow to fast forward speeds is also solid.

      Doing less muscle work can only be good if one is trying to minimize metabolic energy consumption. However, to achieve greater tendon stresses, there must be greater muscle forces. Unless one is willing to reject the premise of the cost of generating force hypothesis, that is an important issue to confront.

      Further, the present data support the Kram & Dawson finding of decreased contact times at faster forward speeds. Kram & Taylor and subsequent applications of (and challenges to) their approach supports the idea that shorter contact times (tc) require recruiting more expensive muscle fibers and hence greater metabolic costs. Therefore, I think that it is incumbent on the present authors to clarify that this study has still not tied up the metabolic energetics across speed problems and placed a bow atop the package.

      Fortunately, I am confident that the impressive collective brain power that comprises this author list can craft a paragraph or two that summarizes these ideas and points out how the group is now uniquely and enviably poised to explore the problem more using a dynamic SIMM model that incorporates muscle energetics (perhaps à la Umberger et al.). Or perhaps they have other ideas about how they can really solve the problem.

      You have raised important points, thank you for this feedback. We will add a paragraph discussing the limitations of our study and ensure the revised manuscript makes it clear which mysteries remain. We intend to address muscle forces, contact time, and energetics in future work when we have implemented all hindlimb muscles within the musculoskeletal model.  

      I have a few issues with the other half of this study (i.e. animal size effects). I would enjoy reading a new paragraph by these authors in the Discussion that considers the evolutionary origins and implications of such small safety factors. Surely, it would need to be speculative, but that's OK.

      We will integrate this into the discussion.

      Reviewer #2 (Public Review):

      Summary

      This is a fascinating topic that has intrigued scientists for decades. I applaud the authors for trying to tackle this enigma. In this manuscript, the authors primarily measured hopping biomechanics data from kangaroos and performed inverse dynamics.

      While these biomechanical analyses were thorough and impressively incorporated collected anatomical data and an Opensim model, I'm afraid that they did not satisfactorily address how kangaroos can hop faster and not consume more metabolic energy, unique from other animals.

      Noticeably, the authors did not collect metabolic data nor did they model metabolic rates using their modelling framework. Instead, they performed a somewhat traditional inverse dynamics analysis from multiple animals hopping at a self-selected speed.

      We aimed to provide a joint-level explanation, but we will address the limitations of not modelling the energy consumers themselves (the skeletal muscles) in the revised manuscript. We plan to expand upon muscle level energetics in the future with a more detailed MSK model.

      Within these analyses, the authors largely focused on ankle EMA, discussing its potential importance (because it affects tendon stress, which affects tendon strain energy, which affects muscle mechanics) on the metabolic cost of hopping. However, EMA was roughly estimated (CoP was fixed to the foot, not measured)…

      As noted in our methods, EMA was not calculated from a fixed centre of pressure (CoP). We did fix the medial-lateral position, owing to the fact that both feet contacted the force plate together, but the anteroposterior movement of the CoP was recorded by the force plate and thus allowed to move. We report the movement (or lack of movement) in our results. The anterior-posterior axis is the most relevant to lengthening or shortening the distance of the ‘out-lever’ R, and thereby EMA.

      It is necessary to assume fixed medial-lateral position because a single force trace and CoP is recorded when two feet land on the force plate. The medial-lateral forces on each foot cancel out so there is no overall medial-lateral movement if the forces are symmetrical (e.g. if the kangaroo is hopping in a straight path and one foot is not in front of the other). We only used symmetrical trials so that the anterior-posterior movement of the CoP would be reliable.

      and did not detectably associate with hopping speed (see results).

      Yet, the authors interpret their EMA findings as though it systematically related with speed to explain their theory on how metabolic cost is unique in kangaroos vs. other animals.

      Indeed, the relationship between R and speed (and therefore EMA and speed) was not significant. However, the significant change in ankle height with speed, combined with no systematic change in COP at midstance, demonstrates that R would get longer at faster speeds. If we consider the nonsignificant relationship between R and speed to indicate that there is no change in R, then these two results conflict. We could not find a flaw in our methods, so instead concluded that the nonsignificant relationship between R and speed may be due to a small change in R being undetectable in our data. Taking both results into account, we think it is more likely that there is a non-detectable change in R, rather than no change in R with speed, but we presented both results for transparency.

      These speed vs. biomechanics relationships were limited by comparisons across different animals hopping at different speeds and could have been strengthened using repeated measures design.

      There is significant variation in speed within individuals, not just between individuals. The preferred speed of kangaroos is 2-4.5 m/s, but most individuals show a wide range within this. Eight of our 16 kangaroos had a maximum speed that was 1-2 m/s faster than their slowest trial. Repeated measures of these eight individuals comprise 78 of the 100 trials.

      It would be ideal to collect data across the full range of speeds for all individuals, but it is not feasible in this type of experimental setting. Interference such as chasing is dangerous to kangaroos as they are prone to strong adverse reactions to stress.

      There are also multiple inconsistencies between the authors' theory on how mechanics affect energetics and the cited literature, which leaves me somewhat confused and wanting more clarification and information on how mechanics and energetics relate.

      We will ensure that this is clearer in the revised manuscript.

      My apologies for the less-than-favorable review, I think that this is a neat biomechanics study - but am unsure if it adds much to the literature on the topic of kangaroo hopping energetics in its current form.

      Reviewer #3 (Public Review):

      Summary:

      The goal of this study is to understand how, unlike other mammals, kangaroos are able to increase hopping speed without a concomitant increase in metabolic cost. They use a biomechanical analysis of kangaroo hopping data across a range of speeds to investigate how posture, effective mechanical advantage, and tendon stress vary with speed and mass. The main finding is that a change in posture leads to increasing effective mechanical advantage with speed, which ultimately increases tendon elastic energy storage and returns via greater tendon strain. Thus kangaroos may be able to conserve energy with increasing speed by flexing more, which increases tendon strain.

      Strengths:

      The approach and effort invested into collecting this valuable dataset of kangaroo locomotion is impressive. The dataset alone is a valuable contribution.

      Thank you!

      Weaknesses:

      Despite these strengths, I have concerns regarding the strength of the results and the overall clarity of the paper and methods used (which likely influences how convincingly the main results come across).

      (1) The paper seems to hinge on the finding that EMA decreases with increasing speed and that this contributes significantly to greater tendon strain estimated with increasing speed. It is very difficult to be convinced by this result for a number of reasons:

      • It appears that kangaroos hopped at their preferred speed. Thus the variability observed is across individuals not within. Is this large enough of a range (either within or across subjects) to make conclusions about the effect of speed, without results being susceptible to differences between subjects?

      Apologies, this was not clear in the manuscript. Kangaroos hopping at their preferred speed means that, to comply with ethics and enclosure limitations, we did not chase or startle them into high speeds. Thus we did not record a wide range of speeds within the bounds of what kangaroos are capable of (up to 12 m/s), but for the range we did measure (~2-4.5 m/s), there is variation in hopping speed within each individual kangaroo. Out of 16 individuals, eight individuals had a difference of 1-2 m/s between their slowest and fastest trials, and these kangaroos accounted for 78 out of 100 trials. Of the remainder, six individuals had three or fewer trials each, and two individuals had highly repeatable speeds (3 out of 4, and 6 out of 7 trials were within 0.5 m/s). We will ensure this is clear in the revised manuscript.

      In the literature cited, what was the range of speeds measured, and was it within or between subjects?

      For other literature, to our knowledge the highest speed measured is ~9.5 m/s (see supplementary Fig. 1b) and there were multiple measures for several individuals (see methods Kram & Dawson 1998).

      • Assuming that there is a compelling relationship between EMA and velocity, how reasonable is it to extrapolate to the conclusion that this increases tendon strain and ultimately saves metabolic cost?

      They correlate EMA with tendon strain, but this would still not suggest a causal relationship (incidentally the p-value for the correlation is not reported).

      We will add supporting literature on the relationship between metabolic cost and tendon stress (or strain), to elaborate on why the correlation between EMA and stress is important.

      Tendon strain could be increasing with ground reaction force, independent of EMA.

      Even if there is a correlation between strain and EMA, is it not a mathematical necessity in their model that all else being equal, tendon stress will increase as EMA decreases? I may be missing something, but nonetheless, it would be helpful for the authors to clarify the strength of the evidence supporting their conclusions.

      Yes, GRF also contributes to the increase in tendon stress in the mechanism we propose. We have illustrated this in Fig 6, however we will make this clearer in the revised discussion.
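
      The joint dependence of tendon stress on GRF and EMA can be sketched with a quasi-static moment balance about the ankle. This is a simplified illustration rather than the inverse-dynamics pipeline used in the study: it assumes EMA = r/R (muscle-tendon moment arm r over the GRF 'out-lever' R), ignores segment inertia and weight, and uses hypothetical function names and units.

```python
def effective_mechanical_advantage(r, R):
    """EMA as the ratio of the muscle-tendon moment arm r to the GRF moment arm R."""
    return r / R


def tendon_stress(grf, r, R, tendon_area):
    """Tendon stress from a quasi-static moment balance (assumed model).

    grf         : ground reaction force magnitude (N)
    r, R        : moment arms in the same length unit (e.g. m)
    tendon_area : tendon cross-sectional area (mm^2), giving stress in MPa
    """
    ema = effective_mechanical_advantage(r, R)
    tendon_force = grf / ema  # from F_tendon * r = GRF * R
    return tendon_force / tendon_area
```

      In this simplified model, stress rises if GRF increases at constant EMA, and it also rises if R lengthens (EMA decreases) at constant GRF, so the two contributions are separable in principle.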

      • The statistical approach is not well-described. It is not clear what the form of the statistical model used was and whether the analysis treated each trial individually or grouped trials by the kangaroo. There is also no mention of how many trials per kangaroo, or the range of speeds (or masses) tested.

      The methods include the statistical model with the variables that we used, as well as the kangaroo masses (13.7 to 26.6 kg, mean: 20.9 ± 3.4 kg). We will move the range of speeds from the supplementary material to the results or figure captions. We will add information on the number of trials per kangaroo to the methods.

      We did not group the data for the analysis, e.g. by using an average speed per individual for all their trials, or by comparing fast to slow groups. Grouping was used only for display purposes in our figures, which we will make clearer in the methods.

      Related to this, there is no mention of how different speeds were obtained. It seems that kangaroos hopped at a self-selected pace, thus it appears that not much variation was observed. I appreciate the difficulty of conducting these experiments in a controlled manner, but this doesn't exempt the authors from providing the details of their approach.

      • Some figures (Figure 2 for example) present means for one of three speeds, yet the speeds are not reported (except in the legend) nor how these bins were determined, nor how many trials or kangaroos fit in each bin. A similar comment applies to the mass categories. It would be more convincing if the authors plotted the main metrics vs. speed to illustrate the significant trends they are reporting.

      Thank you for this comment. The bins are used only for display purposes and not within the analysis. In the revised manuscript, we will ensure this is clear.

      (2) The significance of the effects of mass is not clear. The introduction and abstract suggest that the paper is focused on the effect of speed, yet the effects of mass are reported throughout as well, without a clear understanding of the significance. This weakness is further exaggerated by the fact that the details of the subject masses are not reported.

      Indeed, the primary aim of our study was to explore the influence of speed, given the uncoupling of energy from hopping speed in kangaroos. We included mass to ensure that the effects of speed were not driven by body mass (i.e., that larger kangaroos hopped faster).

      (3) The paper needs to be significantly re-written to better incorporate the methods into the results section. Since the results come before the methods, some of the methods must necessarily be described such that the study can be understood at some level without turning to the dedicated methods section. As written, it is very difficult to understand the basis of the approach, analysis, and metrics without turning to the methods.

      We agree, and in the revised manuscript will incorporate some of the methodological details within the results.

      Author response image 1.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The study analyzes the gastric fluid DNA content identified as a potential biomarker for human gastric cancer. However, the study lacks overall logicality, and several key issues require improvement and clarification. In the opinion of this reviewer, some major revisions are needed:

      (1) This manuscript lacks a comparison of gastric cancer patients' stages with PN and N+PD patients, especially T0-T2 patients.

      We are grateful for this astute remark. A comparison of gfDNA concentration among the diagnostic groups indicates a trend of increasing values as the diagnosis progresses toward malignancy. The observed values for the diagnostic groups are as follows:

      Author response table 1.

      The chart below presents the statistical analyses of the same diagnostic/tumor-stage groups (One-Way ANOVA followed by Tukey’s multiple comparison tests). It shows that gastric fluid gfDNA concentrations gradually increase with malignant progression. We observed that the initial tumor stages (T0 to T2) exhibit intermediate gfDNA levels, which are significantly lower than in advanced disease (p = 0.0036) but not statistically different from those in non-neoplastic disease (p = 0.74).

      Author response image 1.

      (2) The comparison between gastric cancer stages seems only to reveal the difference between T3 patients and early-stage gastric cancer patients, which raises doubts about the authenticity of the previous differences between gastric cancer patients and normal patients, whether it is only due to the higher number of T3 patients.

      We appreciate the attention to detail regarding the numbers analyzed in the manuscript. Importantly, the results are meaningful because the number of subjects in each group is comparable (T0-T2, N = 65; T3, N = 91; T4, N = 63). The mean gastric fluid gfDNA values (ng/µL) increase with disease stage (T0-T2: 15.12; T3-T4: 30.75), and both are higher than the mean gfDNA values observed in non-neoplastic disease (10.81 ng/µL for N+PD and 10.10 ng/µL for PN). These subject numbers in each diagnostic group accurately reflect real-world data from a tertiary cancer center.

      (3) The prognosis evaluation is too simplistic, only considering staging factors, without taking into account other factors such as tumor pathology and the time from onset to tumor detection.

      Histopathological analyses were performed throughout the study not only for the initial diagnosis of tissue biopsies, but also for the classification of Lauren’s subtypes, tumor staging, and the assessment of the presence and extent of immune cell infiltrates. Regarding the time of disease onset, this variable is inherently unknown, by definition, at the time of a diagnostic EGD. While the prognosis definition is indeed straightforward, we believe that a simple, cost-effective, and practical approach is advantageous for patients across diverse clinical settings and is more likely to be effectively integrated into routine EGD practice.

      (4) The comparison between gfDNA and conventional pathological examination methods should be mentioned, reflecting advantages such as accuracy and patient comfort.

      We wish to reinforce that EGD, along with conventional histopathology, remains the gold standard for gastric cancer evaluation. EGD under sedation is routinely performed for diagnosis, and the collection of gastric fluids for gfDNA evaluation does not affect patient comfort. Thus, while gfDNA analysis was evidently not intended as a diagnostic EGD and biopsy replacement, it may provide added prognostic value to this exam.

      (5) There are many questions in the figures and tables. Please match the Title, Figure legends, Footnote, Alphabetic order, etc.

      We are grateful for these comments and apologize for the clerical oversight. All figures, tables, titles and figure legends have now been double-checked.

      (6) The overall logicality of the manuscript is not rigorous enough, with few discussion factors, and cannot represent the conclusions drawn.

      We assume that the unusual wording remark regarding “overall logicality” pertains to the rationale and/or reasoning of this investigational study. Our working hypothesis was that during neoplastic disease progression, tumor cells continuously proliferate and, depending on various factors, attract immune cell infiltrates. Consequently, both tumor cells and immune cells (as well as tumor-derived DNA) are released into the fluids surrounding the tumor at its various locations, including blood, urine, saliva, gastric fluids, and others. Thus, increases in DNA levels within some of these fluids have been documented and are clinically meaningful. The concurrent observation of elevated gastric fluid gfDNA levels and immune cell infiltration supports the hypothesis that increased gfDNA—which may originate not only from tumor cells but also from immune cells—could be associated with better prognosis, as suggested by this study of a large real-world patient cohort.

      In summary, we thank Reviewer #1 for his time and effort in a constructive critique of our work.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated whether the total DNA concentration in gastric fluid (gfDNA), collected via routine esophagogastroduodenoscopy (EGD), could serve as a diagnostic and prognostic biomarker for gastric cancer. In a large patient cohort (initial n=1,056; analyzed n=941), they found that gfDNA levels were significantly higher in gastric cancer patients compared to non-cancer, gastritis, and precancerous lesion groups. Unexpectedly, higher gfDNA concentrations were also significantly associated with better survival prognosis and positively correlated with immune cell infiltration. The authors proposed that gfDNA may reflect both tumor burden and immune activity, potentially serving as a cost-effective and convenient liquid biopsy tool to assist in gastric cancer diagnosis, staging, and follow-up.

      Strengths:

      This study is supported by a robust sample size (n=941) with clear patient classification, enabling reliable statistical analysis. It employs a simple, low-threshold method for measuring total gfDNA, making it suitable for large-scale clinical use. Clinical confounders, including age, sex, BMI, gastric fluid pH, and PPI use, were systematically controlled. The findings demonstrate both diagnostic and prognostic value of gfDNA, as its concentration can help distinguish gastric cancer patients and correlates with tumor progression and survival. Additionally, preliminary mechanistic data reveal a significant association between elevated gfDNA levels and increased immune cell infiltration in tumors (p=0.001).

      Reviewer #2 has conceptually grasped the overall rationale of the study quite well, and we are grateful for their assessment and comprehensive summary of our findings.

      Weaknesses:

      (1) The study has several notable weaknesses. The association between high gfDNA levels and better survival contradicts conventional expectations and raises concerns about the biological interpretation of the findings.

      We agree that this would be the case if the gfDNA was derived solely from tumor cells. However, the findings presented here suggest that a fraction of this DNA may indeed be derived from infiltrating immune cells. The precise origin of this increased gfDNA remains to be determined in follow-up studies, which are planned and will apply DNA- and RNA-sequencing methodologies and deconvolution analyses.

      (2) The diagnostic performance of gfDNA alone was only moderate, and the study did not explore potential improvements through combination with established biomarkers. Methodological limitations include a lack of control for pre-analytical variables, the absence of longitudinal data, and imbalanced group sizes, which may affect the robustness and generalizability of the results.

      Reviewer #2 is correct that this investigational study was not designed to assess the diagnostic potential of gfDNA. Instead, its primary contribution is to provide useful prognostic information. In this regard, we have not yet explored combining gfDNA with other clinically well-established diagnostic biomarkers. We do acknowledge this current limitation as a logical follow-up that must be investigated in the near future.

      Moreover, we collected a substantial number of pre-analytical variables within the limitations of a study involving over 1,000 subjects. Longitudinal samples and data were not analyzed here, as our aim was to evaluate prognostic value at diagnosis. Although the groups are imbalanced, this accurately reflects the real-world population of a large endoscopy center within a dedicated cancer facility. Subjects were invited to participate and enter the study before sedation for the diagnostic EGD procedure; thus, samples were collected prospectively from all consenting individuals.

      Finally, to maintain a large, unbiased cohort, we did not attempt to balance the groups, allowing analysis of samples and data from all patients with compatible diagnoses (please see Results: Patient groups and diagnoses).

      (3) Additionally, key methodological details were insufficiently reported, and the ROC analysis lacked comprehensive performance metrics, limiting the study's clinical applicability.

      We are grateful for this useful suggestion. In the current version, each ROC curve (Supplementary Figures 1A and 1B) now includes the top 10 gfDNA thresholds, along with their corresponding sensitivity and specificity values (please see Suppl. Table 1). The thresholds are ordered from best to worst based on the classic Youden’s J statistic, as follows:

      Youden Index = specificity + sensitivity – 1 [Youden WJ. Index for rating diagnostic tests. Cancer 3:32-35, 1950. PMID: 15405679]. We have made an effort to provide all the key methodological details requested, but we would be glad to add further information upon specific request.
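
      As a minimal sketch of how thresholds can be ranked by this statistic, the following computes sensitivity, specificity, and Youden's J for each candidate cutoff. The function name, the rule that a value at or above the threshold counts as positive, and the choice of observed values as candidate thresholds are assumptions for illustration; this is not the analysis code used for the supplementary table.

```python
def youden_ranked_thresholds(values, labels, top_n=10):
    """Rank candidate thresholds by Youden's J = sensitivity + specificity - 1.

    values : measured concentrations (e.g. gfDNA in ng/µL)
    labels : 1 for cases, 0 for controls (both classes must be present)
    Returns up to top_n tuples (J, threshold, sensitivity, specificity), best first.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    results = []
    for threshold in sorted(set(values)):
        # Assumed decision rule: call a sample positive when value >= threshold.
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= threshold)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < threshold)
        sensitivity = tp / positives
        specificity = tn / negatives
        results.append((sensitivity + specificity - 1, threshold, sensitivity, specificity))
    results.sort(key=lambda row: row[0], reverse=True)
    return results[:top_n]
```

      Calling `youden_ranked_thresholds(values, labels)` on paired concentration and diagnosis lists returns the ten best cutoffs with their sensitivity and specificity, mirroring the layout described for the supplementary table.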

    1. Author response:

      Reviewer 1:

      Summary:

      Identifying drugs that target specific disease phenotypes remains a persistent challenge. Many current methods are only applicable to well-characterized small molecules, such as those with known structures. In contrast, methods based on transcriptional responses offer broader applicability because they do not require prior information about small molecules. Additionally, they can be rapidly applied to new small molecules. One of the most promising strategies involves the use of “drug response signatures”-specific sets of genes whose differential expression can serve as markers for the response to a small molecule. By comparing drug response signatures with expression profiles characteristic of a disease, it is possible to identify drugs that modulate the disease profile, indicating a potential therapeutic connection.

      This study aims to prioritize potential drug candidates and to forecast novel drug combinations that may be effective in treating triple-negative breast cancer (TNBC). Large consortia, such as the LINCS-L1000 project, offer transcriptional signatures across various time points after exposing numerous cell lines to hundreds of compounds at different concentrations. While this data is highly valuable, its direct applicability to pathophysiological contexts is constrained by the challenges in extracting consistent drug response profiles from these extensive datasets. The authors use their method to create drug response profiles for three different TNBC cell lines from LINCS.

      To create a more precise, cancer-specific disease profile, the authors highlight the use of single-cell RNA sequencing (scRNA-seq) data. They focus on TNBC epithelial cells collected from 26 diseased individuals compared to epithelial cells collected from 10 healthy volunteers. The authors are further leveraging drug response data to develop inhibitor combinations.

      Strengths:

      The authors of this study contribute to an ongoing effort to develop automated, robust approaches that leverage gene expression similarities across various cell lines and different treatment regimens, aiming to predict drug response signatures more accurately. The authors are trying to address the gap that remains in computational methods for inferring drug responses at the cell subpopulation level.

      Weaknesses:

      One weakness is that the authors do not compare their method to previous studies. The authors develop a drug response profile by summarizing the time points, concentrations, and cell lines. The computational challenge of creating a single gene list that represents the transcriptional response to a drug across different cell lines and treatment protocols has been previously addressed. The Prototype Ranked List (PRL) procedure, developed by Iorio and co-authors (PNAS, 2010, doi:10.1073/pnas.1000138107), uses a hierarchical majority-voting scheme to rank genes. This method generates a list of genes that are consistently overexpressed or downregulated across individual conditions, which then hold top positions in the PRL. The PRL methodology was used by Aissa and co-authors (Nature Comm 2021, doi:10.1038/s41467-021-21884-z) to analyze drug effects on selective cell populations using scRNA-seq datasets. They combined PRL with Gene Set Enrichment Analysis (GSEA), a method that compares a ranked list of genes like PRL against a specific set of genes of interest. GSEA calculates a Normalized Enrichment Score (NES), which indicates how well the genes of interest are represented among the top genes in the PRL. Compared to the method described in the current manuscript, the PRL method allows for the identification of both upregulated and downregulated transcriptional signatures relevant to the drug’s effects. It also gives equal weight to each cell line’s contribution to the drug’s overall response signature.

      The authors performed experimental validation of the top two identified drugs; however, the effect was modest. In addition, the effect on TNBC cell lines was cell-line specific as the identified drugs were effective against BT20, whose transcriptional signatures from LINCS were used for drug identification, but not against the other two cell lines analyzed. An incorrect choice of genes for the signature may result in capturing similarities tied to experimental conditions (e.g., the same cell line) rather than the drug’s actual effects. This reflects the challenges faced by drug response signature methods in both selecting the appropriate subset of genes that make up the signature and managing the multiple expression profiles generated by treating different cell lines with the same drug.

      We appreciate the reviewer’s thoughtful feedback and their suggestion to refer to the Prototype Ranked List (PRL) manuscript. Unfortunately, since the PRL methodology isn’t implemented in an open-source package, direct comparison with our approach is challenging. Nonetheless, we investigated whether using ranks would yield similar results for the most likely active drug pairs identified by retriever. To do this, we calculated and compared the rankings of the average effect sizes provided by retriever. Although the Spearman correlation coefficient was high (ρ = 0.98), we observed that key genes are disadvantaged when using ranks compared to effect sizes. This difference is particularly evident in the gene set enrichment analysis, where using average ranks identified only one pathway as statistically significantly enriched. The code to replicate these analyses is available at https://github.com/dosorio/L1000-TNBC/blob/main/Code/.
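
      To illustrate why a high rank correlation can coexist with a loss of information about key genes, the sketch below aggregates the same profiles by mean effect size and by mean rank and compares the two. It is a toy illustration with hypothetical function names and an assumed input layout (one list of per-gene effect sizes per condition); it is not the retriever or PRL implementation, and the analysis code is in the repository linked above.

```python
def average_ranks(xs):
    """Ranks starting at 1 (smallest value gets rank 1); ties share the mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length lists with non-zero variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Spearman correlation as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def mean_effect_and_mean_rank(profiles):
    """Aggregate per-condition profiles by mean effect size and by mean rank."""
    n_genes = len(profiles[0])
    n_profiles = len(profiles)
    mean_effect = [sum(p[g] for p in profiles) / n_profiles for g in range(n_genes)]
    ranked = [average_ranks(p) for p in profiles]
    mean_rank = [sum(r[g] for r in ranked) / n_profiles for g in range(n_genes)]
    return mean_effect, mean_rank
```

      In a toy input where one gene has an effect size many times larger than the rest, the two summaries can agree in ordering while the mean rank records only that the gene is first, not by how much, which is the kind of compression that can weaken a downstream enrichment analysis.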

      Author response image 1.

      Given the similarity in purpose between retriever and the PRL approach, we have added the following statement to the introduction: “Previously, this goal was approached using a majority-voting scheme to rank genes across various cell types, concentrations, and time points. This approach generates a prototype ranked list (PRL) that represents the consistent ranks of genes across several cell lines in response to a specific drug.”

      Regarding the experimental validation, we believe there is a misunderstanding about the evidence we provided. We would like to clarify that we used three different TNBC cell lines: CAL120, BT20, and DU4475. It’s important to note that CAL120 and DU4475 were not included in the signature generation process. Despite this, we observed effects that exceeded the expected additive effects, particularly in the CAL120 cell line (Figure 5, Panel F).

      Reviewer 2:

      Summary:

      In their study, Osorio and colleagues present ‘retriever,’ an innovative computational tool designed to extract disease-specific transcriptional drug response profiles from the LINCS-L1000 project. This tool has been effectively applied to TNBC, leveraging single-cell RNA sequencing data to predict drug combinations that may effectively target the disease. The public review highlights the significant integration of extensive pharmacological data with high-resolution transcriptomic information, which enhances the potential for personalized therapeutic applications.

      Strengths:

      A key finding of the study is the prediction and validation of the drug combination QL-XII-47 and GSK-690693 for the treatment of TNBC. The methodology employed is robust, with a clear pathway from data analysis to experimental confirmation.

      Weaknesses:

      However, several issues need to be addressed. The predictive accuracy of ’retriever’ is contingent upon the quality and comprehensiveness of the LINCS-L1000 and single-cell datasets utilized, which is an important caveat as these datasets may not fully capture the heterogeneity of patient responses to treatment. While the in vitro validation of the drug combinations is promising, further in vivo studies and clinical trials are necessary to establish their efficacy and safety. The applicability of these findings to other cancer types also warrants additional investigation. Expanding the application of ’retriever’ to a broader range of cancer types and integrating it with clinical data will be crucial for realizing its potential in personalized medicine. Furthermore, as the study primarily focuses on kinase inhibitors, it remains to be seen how well these findings translate to other drug classes.

      We thank the reviewer for their thoughtful and constructive feedback. We appreciate your insights and agree that several important considerations need to be addressed.

      We recognize that the predictive accuracy of retriever depends on the LINCS-L1000 and single-cell datasets. These resources may not fully represent the complete range of transcriptional responses to disease and treatment across different patients. As you mentioned, this is an important limitation. However, we believe that by extrapolating the evaluation of the most likely active compound to each individual patient, we can help address this issue. This approach will provide valuable insights into which patients in the study are most likely to respond positively to treatment.

      On the in-vitro validation of drug combinations, we agree that while promising, these results are not sufficient on their own to establish clinical efficacy. Additional in-vivo studies will be essential in assessing the therapeutic potential and safety of these combinations, and clinical trials will be an important next step to validate the translational impact of our findings.

      Lastly, we appreciate the reviewer’s comment about the focus of our study on kinase inhibitors. This result was unexpected, as we tested the full set of compounds from the LINCS-L1000 project. We agree that exploring other top candidates, including different drug classes, will be important for assessing how broadly the retriever approach can be applied.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Pradhan et al investigated the potential gustatory mechanisms that allow flies to detect cholesterol. They found that flies are indifferent to low cholesterol and avoid high cholesterol. They further showed that the ionotropic receptors Ir7g, Ir51b, and Ir56d are important for the cholesterol sensitivity in bitter neurons. The figures are clear and the behavior result is interesting. However, I have several major comments, especially on the discrepancy of the expression of these Irs with other lab published results, and the confusing finding that the same receptors (Ir7g, Ir51b) have been implicated in the detection of various seemingly unrelated compounds.

      Strengths:

      The results are very well presented, the figures are clear and well-made, text is easy to follow.

      Weaknesses:

      (1) Regarding the expression of Ir56d. The reported Ir56d expression pattern contradicts multiple previous studies (Brown et al., 2021 eLife, Figure 6a-c; Sanchez-Alcaniz et al., 2017 Nature Communications, Figure 4e-h; Koh et al., 2014 Neuron, Figure 3b). These studies, using three different driver lines, consistently showed Ir56d expression in sweet-sensing neurons and taste peg neurons. Importantly, Sanchez-Alcaniz et al. demonstrated that Ir56d is not expressed in Gr66a-expressing (bitter) neurons. This discrepancy is critical since Ir56d is identified as the key subunit for cholesterol detection in bitter neurons, and misexpression of Ir7g and Ir51b together is insufficient to confer cholesterol sensitivity (Fig.4b,d). Which Ir56d-GAL4 (and Gr66a-I-GFP) line was used in this study? Is there additional evidence (scRNA sequencing, in-situ hybridization, or immunostaining) supporting Ir56d expression in bitter neurons?

      We agree that the expression pattern of Ir56d diverges from two prior reports. The studies by Brown et al. and Koh et al. employed the same Ir56d-GAL4 driver line, which exhibited expression in sweet-sensing gustatory receptor neurons (GRNs) and taste peg neurons, but not bitter GRNs (the Sanchez-Alcaniz et al. paper did not use an Ir56d-GAL4).

      In our study, we used an Ir56d-GAL4 driver line (KDRC:2307) and the Gr66a-I-GFP reporter line (Weiss et al., 2011 Neuron). This is a crucial distinction, as differences in the regulatory regions used to generate different driver lines are well known to underlie differences in expression patterns. Our double-labeling experiments revealed co-expression of Ir56d with Gr66a-positive bitter GRNs specifically within the S6 and S7 sensilla, types previously shown to exhibit strong electrophysiological responses to cholesterol (Figure 2—figure supplement 1F).

      We believe this observation is biologically significant and consistent with our functional data. Specifically, targeted expression of Ir56d in bitter neurons using the Gr33a-GAL4 was sufficient to rescue cholesterol avoidance behavior in Ir56d<sup>1</sup> mutants (Figure 3G). These results demonstrate that Ir56d plays a functional role in bitter GRNs for cholesterol detection. The convergence of genetic, behavioral, and electrophysiological data presented in our study provides compelling support for this previously unappreciated expression pattern and function of Ir56d.

      (2) Ir51b has previously been implicated in detecting nitrogenous waste (Dhakal 2021), lactic acid (Pradhan 2024), and amino acids (Aryal 2022), all by the same lab. Additionally, both Ir7g and Ir51b have been implicated in detecting cantharidin, an insect-secreted compound that flies may or may not encounter in the wild, by the same lab. Is Ir51b proposed to be a specific receptor for these chemically distinct compounds or a general multimodal receptor for aversive stimuli? Unlike other multimodal bitter receptors, the expression level of Ir51b is rather low and it's unclear which subset of GRNs express this receptor. The chemical diversity among nitrogenous waste, amino acids, lactic acid, cantharidin, and cholesterol raises questions about the specificity of these receptors and warrants further investigation and, at a minimum, discussion in this paper. Given the wide and seemingly unrelated sensitivity of Ir51b and Ir7g to these compounds, I'm leaning towards the hypothesis that at least some of these responses are non-specific and ecologically irrelevant without further supporting evidence from the authors.

      While it is true that IR51b and IR7g are responsive to a range of compounds, these compounds share chemical features such as nitrogen-containing groups, hydrophobicity, or amphipathic structures, suggesting that recognition of these chemicals may be mediated by the same or overlapping domains within the receptor complexes. These features could facilitate binding to a structurally diverse yet chemically related group of aversive ligands.

      In the case of cholesterol, while its sterol ring system is distinct from the other compounds, it shares hydrophobic and amphipathic properties that may enable interaction with these receptors via similar structural motifs. Importantly, our data demonstrate that Ir51b and Ir7g are necessary but not sufficient on their own to confer cholesterol sensitivity, indicating that additional co-factors or receptor subunits are required for full functionality (Figure 4B, D). Furthermore, our dose-response analysis (Figure 3F) shows that Ir7g is particularly important at higher cholesterol concentrations, supporting the idea of graded sensitivity rather than indiscriminate activation. This suggests that these receptors may have evolved to recognize cholesterol and its analogs (e.g., phytosterols such as stigmasterol, yet to be tested), which are naturally found in the fly’s diet (e.g., yeast and plant-derived matter), as ecologically relevant cues signaling microbial contamination, lipid imbalance, or dietary overconsumption.

      We acknowledge the reviewer’s concern regarding the relatively low expression levels of Ir51b and Ir7g. However, we note that low transcript abundance does not necessarily equate to diminished physiological relevance. Finally, we agree that the chemical diversity of ligands associated with Ir51b and Ir7g warrants deeper investigation, particularly through structure-function studies aimed at identifying ligand-binding domains and receptor-ligand interactions at atomic resolution.

      (3) The Benton lab Ir7g-GAL4 reporter shows no expression in adults. Additionally, two independent labellar RNA sequencing studies (Dweck, 2021 eLife; Bontonou et al., 2024 Nature Communications) failed to detect Ir7g expression in the labellum. This contradicts the authors' previous RT-PCR results (Pradhan 2024 Fig. S4, Journal of Hazardous Materials) showing Ir7g expression in the labellum. Additionally the Benton and Carlson lab Ir51b-GAL4 reporters show no expression in adults as well. Please address these inconsistencies.

      With respect to Ir7g, we acknowledge that the Ir7g-GAL4 reporter line from the Benton lab does not exhibit detectable expression in adult labella. Furthermore, two independent transcriptomic studies, Dweck et al., 2021 (eLife) and Bontonou et al., 2024 (Nature Communications), also did not detect Ir7g transcripts in bulk RNA-seq datasets derived from adult labella. However, our previously published RT-PCR data (Pradhan et al., 2024, Journal of Hazardous Materials, Fig. S4) revealed Ir7g expression in labellar tissue, albeit at low levels. Our RT-PCR included an internal control (tubulin) amplified in the same reaction tube, and the Ir7g mutant served as a negative control. Therefore, we stand behind the finding that Ir7g is expressed in the labellum.

      We would like to point out that RT-PCR is more sensitive and better suited to detecting low-abundance transcripts than bulk RNA-seq, which may fail to capture transcripts due to limitations in depth of coverage. Moreover, immunohistochemistry can have limitations in detecting very low expression levels. Costa et al. 2013 (Translational Lung Cancer Research) state that “RNA-Seq technique will not likely replace current RT-PCR methods, but will be complementary depending on the needs and the resources as the results of the RNA-Seq will identify those genes that need to then be examined using RT-PCR methods”.

      Similarly, regarding Ir51b, while the GAL4 reporter lines from the Benton and Carlson labs do not show robust adult expression, our RT-PCR and functional data strongly support a role for Ir51b in labellar bitter GRNs. Specifically, Ir51b<sup>1</sup> mutants display electrophysiological deficits in response to cholesterol (Figure 2A–B), and these defects are rescued by expressing Ir51b in Gr33a-positive bitter neurons (Figure 3G), providing functional validation of the RT-PCR expression.

      (4) The premise that high cholesterol intake is harmful to flies, which makes sensory mechanisms for cholesterol avoidance necessary, is interesting but underdeveloped. Animal sensory systems typically evolve to detect ecologically relevant stimuli with dynamic ranges matching environmental conditions. Given that Drosophila primarily consume fruits and plant matter (which contain minimal cholesterol) rather than animal-derived foods (which contain higher cholesterol), the ecological relevance of cholesterol detection requires more thorough discussion. Furthermore, at high concentrations, chemicals often activate multiple receptors beyond those specifically evolved for their detection. If the cholesterol concentrations used in this study substantially exceed those encountered in the fly's natural diet, the observed responses may represent an epiphenomenon rather than an ecologically and ethologically relevant sensory mechanism. What is the cholesterol content in flies' diet and how does that compare to the concentrations used in this paper?

      Drosophila melanogaster cannot synthesize sterols de novo, and must acquire them from its diet. In natural environments, flies acquire sterols from fermenting fruit, decaying plant matter, and yeast, which contain trace amounts of phytosterols (e.g., stigmasterol, β-sitosterol) and ergosterol. While the exact sterol concentrations in these sources remain uncharacterized, our behavioral assays used concentrations (0.001–0.01% by weight) that align with the low levels expected in such nutrient-limited ecological niches.

      In our study, the cholesterol concentrations tested ranged from 0.001% to 0.1%, thereby spanning both the physiologically relevant and slightly elevated range. Importantly, avoidance behaviors and receptor activation were most prominent at 0.1% cholesterol. While it is true that high chemical concentrations may elicit off-target effects via broad receptor activation, our genetic and electrophysiological data indicate that the observed responses are mediated by specific ionotropic receptors (Ir51b, Ir7g, Ir56d) and not merely generalized chemical stress.

      Ecologically, elevated sterol levels may also signal conditions unsuitable for egg-laying or larval development. For example, high levels of cholesterol or other sterols may occur in substrates colonized by pathogenic microbes, decaying animal tissue, or in cases of abnormal microbial fermentation, which could represent a nutritional or microbial hazard. Cholesterol avoidance may therefore help flies avoid consuming decaying animal tissue. In this context, sensory detection of excessive cholesterol might serve a protective function.

      Reviewer #2 (Public review):

      Summary:

      In Cholesterol Taste Avoidance in Drosophila melanogaster, Pradhan et al. used behavioral and electrophysiological assays to demonstrate that flies can: (1) detect cholesterol through a subset of bitter-sensing gustatory receptor neurons (GRNs) and (2) avoid consuming food with high cholesterol levels. Mechanistically, they identified five members of the IR family as necessary for cholesterol detection in GRNs and for the corresponding avoidance behavior. Ectopic expression experiments further suggested that Ir7g + Ir56d or Ir51b + Ir56d may function as tuning receptors for cholesterol detection, together with the Ir25a and Ir76b co-receptors.

      Strengths:

      The experimental design of this study was logical and straightforward. Leveraging their expertise in the Drosophila taste system, the research team identified the molecular and cellular basis of a previously unrecognized taste category, expanding our understanding of gustation. A key strength of the study was its combination of electrophysiological recordings with behavioral genetic experiments.

      Weaknesses:

      My primary concern with this study is the lack of a systematic survey of the IRs of interest in the labellum GRNs. Consequently, there is no direct evidence linking the expression of putative cholesterol IRs to the B GRNs in the S6 and S7 sensilla.

      Specifically, the authors need to demonstrate that the IR expression pattern explains cholesterol sensitivity in the B GRNs of S6 and S7 sensilla, but not in other sensilla. Instead of providing direct IR expression data for all candidate IRs (as shown for Ir56d in Figure 2-figure supplement 1F), the authors rely on citations from several studies (Lee, Poudel et al. 2018; Dhakal, Sang et al. 2021; Pradhan, Shrestha et al. 2024) to support their claim that Ir7g, Ir25a, Ir51b, and Ir76b are expressed in B GRNs (Lines 192-194). However, none of these studies provide GAL4 expression or in situ hybridization data to substantiate this claim.

      Without a comprehensive IR expression profile for GRNs across all taste sensilla, it is difficult to interpret the ectopic expression results observed in the B GRN of the I9 sensillum or the A GRN of the L-sensillum (Figure 4). It remains equally plausible that other tuning IRs, beyond the co-receptors Ir25a and Ir76b, could interact with the ectopically expressed IRs to confer cholesterol sensitivity, rather than the proposed Ir7g + Ir56d or Ir51b + Ir56d combinations.

      We provide electrophysiological data demonstrating that the S6 and S7 sensilla respond to cholesterol (Figure 1D). This finding is consistent with the hypothesis that these sensilla harbor the complete receptor complexes necessary for cholesterol detection. In our electrophysiological recordings, only those bitter GRNs that co-express Ir56d along with either Ir7g or Ir51b generate action potentials in response to cholesterol. Other S-type sensilla lacking one or more of these subunits remain unresponsive, reinforcing the idea that these components are necessary for receptor function and sensory coding of cholesterol. Moreover, in the cholesterol-insensitive I9 sensillum (based on our mapping results using electrophysiology), co-expression of either Ir7g + Ir56d or Ir51b + Ir56d conferred de novo cholesterol sensitivity (Figure 4B). Importantly, no cholesterol response was observed when any of these IRs was expressed alone or when Ir7g + Ir51b were co-expressed without Ir56d. These findings strongly argue against the possibility that endogenous tuning IRs in I9 sensilla (e.g., Ir25a, Ir76b) are sufficient to generate cholesterol responsiveness.

      Furthermore, based on the literature, Ir25a and Ir76b are endogenously expressed in I- and L-type sensilla. Thus, their presence alone is insufficient for cholesterol responsiveness. These data support the model that cholesterol sensitivity depends on a specific, multi-subunit receptor complex (e.g., Ir7g + Ir25a + Ir56d + Ir76b or Ir51b + Ir25a + Ir56d + Ir76b).

      In conclusion, while we acknowledge that our data do not provide a full anatomical map of IR expression across all sensilla, our results strongly support the idea that cholesterol sensitivity in S6 and S7 sensilla arises from specific combinations of IRs expressed in the B GRNs.

      Reviewer #3 (Public review):

      Summary:

      Whether and how animals can taste cholesterol is not well understood. The study provides evidence that 1) cholesterol activates a subset of bitter-sensing gustatory receptor neurons (GRNs) in the fly labellum, but not other types of GRNs, 2) flies show aversion to high concentrations of cholesterol, and this is mediated by bitter GRNs, and 3) cholesterol avoidance depends on a specific set of ionotropic receptor (IR) subunits acting in bitter GRNs. The claims of the study are supported by electrophysiological recordings, genetic manipulations, and behavioral readouts.

      Strengths:

      Cholesterol taste has not been well studied, and the paper provides new insight into this question. The authors took a comprehensive and rigorous approach in several different parts of the paper, including screening the responses of all 31 labellar sensilla, screening a large panel of receptor mutants, and performing misexpression experiments with nearly every combination of the 5 IRs identified. The effects of the genetic manipulations are very clear and the results of electrophysiological and behavioral studies match nicely, for the most part. The appropriate controls are performed for all genetic manipulations.

      Weaknesses:

      The weaknesses of the study, described below, are relatively minor and do not detract from the main conclusions of the paper.

      (1) The paper does not state what concentrations of cholesterol are present in Drosophila's natural food sources. Are the authors testing concentrations that are ethologically relevant?

      Drosophila melanogaster primarily feeds on fermenting fruits and associated microbial communities, especially yeast, which serve as major sources of dietary sterols. These natural food sources are known to contain phytosterols such as stigmasterol and β-sitosterol. One study quantified phytosterols (e.g., stigmasterol, sitosterol) in fruits, reporting concentrations of 1.6–32.6 mg/100 g edible portion (~0.0016–0.0326% wet weight) (Han et al 2008). The range we tested falls within this range. Additionally, ergosterol, the principal sterol in yeast and a structural analog of cholesterol, is present at levels of about 0.005% to 0.02% in yeast-rich environments.
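
      The unit conversion behind the percentages quoted above is simple but easy to get wrong by a factor of ten, so a short worked sketch may help. The function name is illustrative; the two input values are the ones cited from Han et al.

```python
def mg_per_100g_to_percent(mg_per_100g):
    """Convert mg per 100 g of food into percent by weight (w/w).

    Grams of analyte per 100 g of food is, by definition, percent w/w,
    so the only step needed is mg -> g.
    """
    return mg_per_100g / 1000.0


for value in (1.6, 32.6):
    print(f"{value} mg/100 g = {mg_per_100g_to_percent(value):.4f}% w/w")
```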

      To ensure physiological relevance, we designed our behavioral assays to include a broad concentration range of cholesterol, from 10<sup>-5</sup>% to 10<sup>-1</sup>%. This spans both physiological levels (0.001–0.01%), which are comparable to those found in the natural diet, and supra-physiological levels (e.g., 0.1%), which exceed natural exposure but help define the threshold for aversive behavior.

      Our results demonstrate that flies begin to avoid cholesterol at concentrations ≥10<sup>-3</sup>% (Figure 3A), which falls within the upper physiological range and may reflect the threshold beyond which cholesterol or related sterols become deleterious. At these higher concentrations, excess sterols may disrupt membrane fluidity, interfere with hormone signaling, or promote microbial overgrowth, all of which could compromise fly health.

      (2) The paper does not state or show whether the expression of IR7g, IR51b, and IR56d is confined to bitter GRNs. Bitter-specific expression of at least some of these receptors would be necessary to explain why bitter GRNs but not sugar GRNs (or other GRN types) normally show cholesterol responses.

      We show that Ir56d-GAL4 is co-expressed with Gr66a-GFP in S6/S7 sensilla, indicating that it is expressed in bitter GRNs (Figure 2—figure supplement 1F). In the case of Ir7g and Ir51b, there are no reporters or antibodies to address expression. However, previously they have been shown to be expressed in bitter GRNs using RT-PCR (Dhakal et al. 2021, Communications Biology; Pradhan et al. 2024, Journal of Hazardous Materials). In addition, we provide functional evidence that bitter GRNs are required for the cholesterol response since silencing bitter GRNs abolishes cholesterol-induced action potentials (Figure 1E–F). Moreover, we showed that we could rescue the Ir7g<sup>1</sup>, Ir51b<sup>1</sup> and Ir56d<sup>1</sup> mutant phenotypes only when we expressed the cognate transgenes in bitter GRNs using the Gr33a-GAL4 (Figure 3G). Thus, while Ir7g/Ir51b are not exclusive to bitter GRNs, their functional role in cholesterol detection is bitter-GRN-specific.

      (3) The authors only investigated the responses of GRNs in the labellum, but GRN responses in the leg may also contribute to the avoidance of cholesterol feeding. Alternatively, leg GRNs might contribute to cholesterol attraction that is unmasked when bitter GRNs are silenced. In support of this possibility, Ahn et al. (2017) showed that Ir56d functions in sugar GRNs of the leg to promote appetitive responses to fatty acids.

      This is an interesting idea. Indeed, when bitter GRNs are hyperpolarized, the flies exhibit a strong attraction to cholesterol. Nevertheless, the cellular basis for cholesterol attraction and whether it is mediated by GRNs in the legs will require a future investigation.

      (4) The authors might consider using proboscis extension as an additional readout of taste attraction or aversion, which would help them more directly link the labellar GRN responses to a behavioral readout. Using food ingestion as a readout can conflate the contribution of taste with post-ingestive effects, and the regulation of food ingestion also may involve contributions from GRNs on multiple organs, whereas organ-specific contributions can be dissociated using proboscis extension. For example, does presenting cholesterol on the proboscis lead to aversive responses in the proboscis extension assay (e.g., suppression of responses to sugar)? Does this aversion switch to attraction when bitter GRNs are silenced, as with the feeding assay?

      We thank the reviewer for the suggestion regarding the use of the proboscis extension reflex (PER) assay to strengthen the link between labellar GRN activity and behavioral responses to cholesterol.

      Author response image 1.

      Our PER assay results shown above indicate that cholesterol presentation on the labellum or forelegs leads to an aversive response, as evidenced by a significant reduction in proboscis extension when compared to control stimuli (Author response image 1A. 2% sucrose or 2% sucrose with 10<sup>-1</sup>% cholesterol was applied to labellum or forelegs and the percent PER was recorded. n=6. Data were compared using single-factor ANOVA coupled with Scheffe’s post-hoc test. Statistical significance was compared with the control. Means ± SEMs. **p<0.01). This finding supports the idea that cholesterol is detected by labellar and leg GRNs and elicits behavioral avoidance. In contrast, sucrose stimulation robustly induces proboscis extension, as expected for an appetitive stimulus. We confirmed the defects due to each Ir mutant by presenting the stimuli to the labellum (Author response image 1B). Together, these PER results provide a more direct behavioral correlate of labellar and leg GRN activation and reinforce our conclusion that cholesterol is sensed as an aversive tastant through the labellar bitter GRNs.
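
      For readers unfamiliar with the statistics named in the legend above, the F statistic of a single-factor ANOVA can be sketched as follows. The PER percentages below are invented purely for illustration (six replicates per group, mirroring n=6), and the Scheffé post-hoc step is not implemented here because it additionally requires a critical value from the F distribution.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical percent-PER values, NOT data from the study.
sucrose_labellum = [90, 85, 95, 80, 90, 85]
sucrose_chol_labellum = [40, 35, 45, 30, 50, 40]
sucrose_chol_foreleg = [50, 45, 55, 40, 60, 50]

f_stat, df_b, df_w = one_way_anova_f(
    [sucrose_labellum, sucrose_chol_labellum, sucrose_chol_foreleg]
)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```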

      (5) The authors claim that the cholesterol receptor is composed of IR25a, IR76b, IR56d, and either IR7g or IR51b. While the authors have shown that IR25a and IR76b are each required for cholesterol sensing, they did not show that both are required components of the same receptor complex. If the authors are relying on previous studies to make this assumption, they should state this more clearly. Otherwise, I think further misexpression experiments may be needed where only IR25a or IR76b, but not both, are expressed in GRNs.

      In our study, we relied on prior work demonstrating that Ir25a and Ir76b function as broadly required co-receptors in most IR-dependent chemosensory pathways (Ganguly et al., 2017; Lee et al., 2018). These studies showed that Ir25a and Ir76b are co-expressed in many GRNs across multiple taste modalities. Functional IR complexes often fail to form or signal properly in the absence of these co-receptors. Thus, it is widely accepted in the field that Ir25a and Ir76b function together as a core heteromeric scaffold for diverse IR complexes, akin to co-receptors in other ionotropic glutamate receptor families. We state that while Ir25a and Ir76b are presumed co-receptors in the cholesterol receptor complex based on their conserved roles, their direct physical interaction with Ir7g, Ir51b, and Ir56d remains to be demonstrated.

      In support of this model, we note that in our ectopic expression experiments using I9 sensilla, which endogenously express Ir25a and Ir76b, introduction of either Ir7g + Ir56d or Ir51b + Ir56d was sufficient to confer cholesterol sensitivity (Figure 4B). We obtained a similar result in L6 sensilla (Figure 4D), which also endogenously express Ir25a and Ir76b. These findings imply that both co-receptors are already present in these sensilla and are likely part of the functional complex. However, we agree that we have not directly tested the requirement for both co-receptors in a minimal reconstitution context, such as expressing only Ir25a or Ir76b alongside tuning IRs in an otherwise null background. Such an experiment would indeed provide more direct evidence of their joint requirement in the receptor complex. Future studies, including heterologous expression experiments, will be necessary to define the cholesterol-receptor complexes.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors introduce a computational model that simulates the dendrites of developing neurons in a 2D plane, subject to constraints inspired by known biological mechanisms such as diffusing trophic factors, trafficked resources, and an activity-dependent pruning rule. The resulting arbors are analyzed in terms of their structure, dynamics, and responses to certain manipulations. The authors conclude that 1) their model recapitulates a stereotyped timecourse of neuronal development: outgrowth, overshoot, and pruning 2) Neurons achieve near-optimal wiring lengths, and Such models can be useful to test proposed biological mechanisms- for example, to ask whether a given set of growth rules can explain a given observed phenomenon - as developmental neuroscientists are working to understand the factors that give rise to the intricate structures and functions of the many cell types of our nervous system.

      Overall, my reaction to this work is that this is just one instantiation of many models that the author could have built, given their stated goals. Would other models behave similarly? This question is not well explored, and as a result, claims about interpreting these models and using them to make experimental predictions should be taken warily. I give more detailed and specific comments below.

      We thank the reviewer for the summary of the work. We find that the criticism “that this is one instantiation of many models [we] could have built” can apply to any model. George Box's maxim, “all models are wrong, but some models are useful”, was the motto that drove our modeling approach. In principle, there are infinitely many possible models. We chose one of the most minimalistic models that implements known biological mechanisms, including activity-independent and -dependent phases of dendritic growth, and constrained its parameters based on experimental data. We compare the proposed model to other alternatives in the Discussion section, especially to the models of Hermann Cuntz, which propose very different strategies for growth.

      However, the reviewer is right that within the type of model we chose, we could have more extensively explored the sensitivity to parameters. In the revised manuscript we will investigate the sensitivity of model output to variations of specific parameters, as explained below.

      Point 1.1. Line 109. After reading the rest of the manuscript, I worry about the conclusion voiced here, which implies that the model will extrapolate well to manipulations of all the model components. How were the values of model parameters selected? The text implies that these were selected to be biologically plausible, but many seem far off. The density of potential synapses, for example, seems very low in the simulations compared to the density of axons/boutons in the cortex; what constitutes a potential synapse? The perfect correlations between synapses in the activity groups are flawed, even for synapses belonging to the same presynaptic cell. The density of postsynaptic cells is also orders of magnitude off, etc. Ideally, every claim made about the model's output should be supported by a parameter sensitivity study. The authors performed few explorations of parameter sensitivity and many of the choices made seem ad hoc.

      It is indeed important to clarify how the model parameters were selected. Here we provide a short justification for some of these parameters, which will be included in the revised manuscript.

      1) Potential synapse density: We modelled 1,500 potential synapses in a cortical sheet of 185 × 185 μm. We used 1 pixel per μm to capture approximately 1 μm thick dendrites. Therefore, we started with an initial density of 0.044 potential synapses per μm^2. Author Response Image 1 (which will be included in the revised submission) shows that ~1,000 potential synapses remain at the end of our simulation time. The density of potential synapses is therefore sufficient, since most potential synapses do not end up connected. The rapid slowing of growth in our model is not due to a depletion of potential synaptic partners, as the number of potential synapses remains high. Nonetheless, we will explore this in the revised manuscript.

      2) Stabilized synapse density: Since ~1,000 of the potential synapses in the modeled cortical sheet remain available, ~500 become connected to the dendrites of the 9 somas in the modeled cortical sheet. This means that the density of stable connected synapses is approximately 0.015 synapses per μm^2. This is also the number that is shown in Figure 3b, which is about 60 synapses stabilized per cell. This density is much easier to compare to experimental data, and below we provide some numbers from literature we already cited in the manuscript as well as a recent preprint.

      In the developing cortex:

      • Leighton, Cheyne and Lohmann 2023 https://doi.org/10.1101/2023.03.02.530772 find up to 0.4 synapses per μm in pyramidal neurons in vivo in the developing mouse visual cortex at P8 to P13. This is almost identical to our value of 0.4 synapses per μm.

      • Ultanir et al., 2007 https://doi.org/10.1073/pnas.0704031104 find 0.7 to 1.7 spines per μm in pyramidal neurons in vivo in L2/3 of the developing mouse cortex, at P10 to P20.

      • Glynn et al., 2011 https://doi.org/10.1038/nn.2764 find 0.1 to 0.7 spines per μm^2 in pyramidal neurons in vivo and in vitro in L2/3 of the developing mouse cortex, at P8 to P60.

      In the developing hippocampus:

      Although these values vary somewhat across experiments, in most cases they are in agreement with our chosen values, especially when taking into account that we are modeling development (rather than adulthood).

      3) Soma/neuron density: Indeed, we did not state this number explicitly anywhere in the paper. But from the figures we can infer 9 somas growing dendrites on an area of ~34,000 μm^2. Thus, neuron density would be roughly 300 neurons per mm^2 (9 somas / 0.034 mm^2 ≈ 265). This number seems a bit low after a short search through the literature. For example, Keller et al., 2018 https://www.frontiersin.org/articles/10.3389/fnana.2018.00083/full reports about 90,000 neurons per mm^3, albeit in adulthood.
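
The densities quoted in points 1) to 3) can be re-derived with a few lines of arithmetic. The sketch below uses only the numbers stated in this response (a 185 x 185 μm sheet, 1,500 initial and ~1,000 remaining potential synapses, 9 somas); the constant and function names are illustrative and do not come from the manuscript's code.

```python
# Back-of-the-envelope densities from the numbers quoted in the response.
SHEET_SIDE_UM = 185.0          # side of the modelled cortical sheet (um)
N_POTENTIAL_INITIAL = 1500     # potential synapses at the start
N_POTENTIAL_REMAINING = 1000   # approximate count left at the end
N_SOMAS = 9                    # somas growing dendrites in the sheet


def densities():
    """Return the derived densities as a dictionary."""
    area_um2 = SHEET_SIDE_UM ** 2
    n_connected = N_POTENTIAL_INITIAL - N_POTENTIAL_REMAINING
    return {
        "area_um2": area_um2,
        "potential_per_um2": N_POTENTIAL_INITIAL / area_um2,
        "stable_per_um2": n_connected / area_um2,
        "stable_per_soma": n_connected / N_SOMAS,
        # 1 mm^2 = 1e6 um^2
        "somas_per_mm2": N_SOMAS / (area_um2 * 1e-6),
    }


if __name__ == "__main__":
    for name, value in densities().items():
        print(f"{name}: {value:.4g}")
```

With these inputs the soma density comes out somewhat below the rounded figure of 300 per mm^2, and the number of stabilized synapses per soma is slightly under the "about 60" quoted above, which is consistent with the approximate counts used.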

      We are also performing a sensitivity analysis where some of these parameters are varied and will include this in the revised manuscript. In particular:

      (1) We will vary the nature of the input correlations. In the current model, the synapses in each correlated group receive spike trains with a perfect correlation and there are no correlations across the groups. We will reduce the correlations within group and add non-zero correlations across the groups.

      (2) We will vary the density of the neuronal somas. We expect that higher densities of somas will either yield smaller dendritic areas because the different neurons compete more or result in a state where nearby neurons have to complement each other regarding their activity preferences.

      (3) We will introduce dynamics in the potential synapses to model the dynamics of axons. We plan to explore several scenarios. We could introduce a gradual increase in the density of potential synapses and implement a cap on the number of synapses that can be alive at the same time, and vary that cap. We could also introduce a lifetime of each synapse (following for example a lognormal distribution). A potential synapse can disappear if it does not form a stable synapse in its lifetime, in which case it could move to a different location.
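
As a rough illustration of scenario (3), the following sketch grows a pool of potential synapses gradually up to a cap, gives each one a lognormally distributed lifetime, and relocates it when that lifetime expires. It is a toy under stated assumptions, not the authors' implementation: every parameter name (`cap`, `arrival_per_step`, `mu`, `sigma`) is hypothetical, and for brevity it ignores whether a potential synapse has formed a stable connection before expiring.

```python
import random


def simulate_turnover(n_steps, cap, arrival_per_step, mu, sigma,
                      side_um=185.0, seed=0):
    """Toy turnover of potential synapses; returns the pool size per step."""
    rng = random.Random(seed)
    alive = []   # each entry is [x, y, remaining_lifetime]
    counts = []
    for _ in range(n_steps):
        # Age every potential synapse by one time step.
        for syn in alive:
            syn[2] -= 1.0
        # An expired potential synapse moves to a new random location
        # and draws a fresh lognormal lifetime.
        for syn in alive:
            if syn[2] <= 0.0:
                syn[0] = rng.uniform(0.0, side_um)
                syn[1] = rng.uniform(0.0, side_um)
                syn[2] = rng.lognormvariate(mu, sigma)
        # Gradual increase in density, never exceeding the cap.
        n_new = min(arrival_per_step, cap - len(alive))
        for _ in range(n_new):
            alive.append([rng.uniform(0.0, side_um),
                          rng.uniform(0.0, side_um),
                          rng.lognormvariate(mu, sigma)])
        counts.append(len(alive))
    return counts


if __name__ == "__main__":
    print(simulate_turnover(20, cap=100, arrival_per_step=10,
                            mu=1.0, sigma=0.5))
```

Varying `cap` in such a loop corresponds to the planned sweep over the maximum number of synapses alive at the same time.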

      Point 1.2. Many potentially important phenomena seem to be excluded. I realize that no model can be complete, but the choice of which phenomena to include or exclude from this model could bias studies that make use of it and is worth serious discussion. The development of axons is concurrent with dendrite outgrowth, is highly dynamic, and perhaps better understood mechanistically. In this model, the inputs are essentially static. Growing dendrites acquire and lose growth cones that are associated with rapid extension, but these do not seem to be modeled. Postsynaptic firing does not appear to be modeled, which may be critical to activity-dependent plasticity. For example, changes in firing are a potential explanation for the global changes in dendritic pruning that occur following the outgrowth phase.

      As the reviewer concludes, no model can be complete. In agreement with this, we would like to quote a paragraph from a very nice paper by Larry Abbott (“Theoretical Neuroscience Rising”, Neuron 2008 https://www.sciencedirect.com/science/article/pii/S0896627308008921), which, although published more than 10 years ago, still applies today:

      “Identifying the minimum set of features needed to account for a particular phenomenon and describing these accurately enough to do the job is a key component of model building. Anything more than this minimum set makes the model harder to understand and more difficult to evaluate. The term ‘‘realistic’’ model is a sociological rather than a scientific term. The truly realistic model is as impossible and useless a concept as Borges’ ‘‘map of the empire that was of the same scale as the empire and that coincided with it point for point’’ (Borges, 1975). […] The art of modeling lies in deciding what this subset should be and how it should be described.”

      We have clearly stated in the Introduction (e.g. lines 37-75) which phenomena we include in the model and why. The Discussion also compares our model to others (lines 315-373), pointing out that most models focus on either the activity-independent or the activity-dependent phase. We include both, combining literature on molecular gradients and growth factors with activity-dependent connectivity refinements instructed by spontaneous activity. We could not think of a more tractable, more minimalist model that would include both activity-independent and activity-dependent aspects. Therefore, we feel that the current manuscript provides sufficient motivation as well as a discussion of the limitations of the current model.

      Regarding the concurrent development of axons, we agree that this is very interesting and currently not addressed in the model. As noted at the bottom of our reply to point 1.1, bullet (3), we are now revising the manuscript to include a simplified form of axonal dynamics by allowing changes in the lifetime and location of potential synapses, which come from axons of presynaptic partners.

      Regarding postsynaptic firing, this is indeed highly relevant and an important point to consider. In one of our recent publications (Kirchner and Gjorgjieva, 2021 https://www.nature.com/articles/s41467-021-23557-3), we studied only an activity-dependent model for the organization of synaptic inputs on non-growing dendrites which have a fixed length. There, we considered the effect of postsynaptic firing and demonstrated that it plays an important role in establishing a global organization of synapses on the entire dendritic tree of the neuron, and not just on local dendritic branches. For example, we showed that it could lead to the emergence of retinotopic maps, which have been found experimentally (Iacaruso et al., 2017 https://www.nature.com/articles/nature23019). Since we use the same activity-dependent plasticity model in this paper, we expect that somatic firing will have the same effect on establishing synaptic distributions on the entire dendritic tree. We will make a note of this in the Discussion in the revised paper.

      Point 1.3. Line 167. There are many ways to include activity -independent and -dependent components into a model and not every such model shows stability. A key feature seems to be that larger arbors result in reduced growth and/or increased retraction, but this could be achieved in many ways (whether activity dependent or not). It's not clear that this result is due to the combination of activity-dependent and independent components in the model, or conceptually why that should be the case.

      We never argued for model uniqueness. There are always going to be many different models (at different spatial and temporal scales, at different levels of abstraction). We can never study all of them and like any modeling study in systems neuroscience we have chosen one model approach and investigated this approach. We do compare the current model to others in the Discussion. If the reviewers have a specific implementation that we should compare our model to as an alternative, we could try, but not if this means doing a completely separate project.

      Point 1.4. Line 183. The explanation of overshoot in terms of the different timescales of synaptic additions versus activity-dependent retractions was not something I had previously encountered and is an interesting proposal. Have these timescales been measured experimentally? To what extent is this a result of fine-tuning of simulation parameters?

      We found that varying the amount of BDNF controls the timescale of the activity-dependent plasticity (see our Figure 5c). Hence, changing the balance between synaptic additions vs. retractions is already explored in Figure 5e and f. Here we show that the overshoot and retraction do not have to be fine-tuned but may be abolished if there is too much activity-dependent plasticity.

      Regarding the relative timescales of synaptic additions vs. retractions: since the first is mainly due to activity-independent factors, and the second due to activity-dependent plasticity, the question is really about the timescales of the latter two. As we write in the Introduction (lines 60-62), manipulating activity-dependent synaptic transmission has been found to not affect morphology but rather the density and specificity of synaptic connections (Ultanir et al. 2007 https://doi.org/10.1073/pnas.0704031104), supporting the sequential model we have (although we do not impose the sequence, as both activity-independent and activity-dependent mechanisms are always “on”; but note that activity-dependent plasticity can only operate on synapses that have already formed).

      Point 1.5. Line 203. This result seems at odds with results that show only a very weak bias in the tuning distribution of inputs to strongly tuned cortical neurons (e.g. work by Arthur Konnerth's group). This discrepancy should be discussed.

      First, we note that the correlated activity experienced by our modeled synapses (and resulting synaptic organization) does not necessarily correspond to visual orientation, or any stimulus feature, for that matter.

      Nonetheless, this is a very interesting question and there is some variability in what the experimental data show. Many studies have shown that synapses on dendrites are organized into functional synaptic clusters: across brain regions, developmental ages and diverse species from rodent to primate (Kleindienst et al. 2011; Takahashi et al. 2012; Winnubst et al. 2015; Gökçe et al., 2016; Wilson et al. 2016; Iacaruso et al., 2017; Scholl et al., 2017; Niculescu et al. 2018; Kerlin et al. 2019; Ju et al. 2020). Interestingly, some in vivo studies have reported lack of fine-scale synaptic organization (Varga et al. 2011; X. Chen et al. 2011; T.-W. Chen et al. 2013; Jia et al. 2010; Jia et al. 2014), while others reported clustering for different stimulus features in different species. For example, dendritic branches in the ferret visual cortex exhibit local clustering of orientation selectivity but do not exhibit global organization of inputs according to spatial location and receptive field properties (Wilson et al. 2016; Scholl et al., 2017). In contrast, synaptic inputs in mouse visual cortex do not cluster locally by orientation, but only by receptive field overlap, and exhibit a global retinotopic organization along the proximal-distal axis (Iacaruso et al., 2017). We proposed a theoretical framework to reconcile these data: combining activity-dependent plasticity similar to the BDNF-proBDNF model that we used in the current work, and a receptive field model for the different species (Kirchner and Gjorgjieva, 2021 https://www.nature.com/articles/s41467-021-23557-3). We can mention this aspect in the revised manuscript.

      Point 1.6. Line 268. How does the large variability in the size of the simulated arbors relate to the relatively consistent size of arbors of cortical cells of a given cell type? This variability suggests to me that these simulations could be sensitive to small changes in parameters (e.g. to the density or layout of presynapses).

      As noted at the bottom of our reply to point 1.1, bullet (3) we are now revising the manuscript to include changes in the lifetime and location of potential synapses.

      Point 1.7. The modeling of dendrites as two-dimensional will likely limit the usefulness of this model. Many phenomena- such as diffusion, random walks, topological properties, etc - fundamentally differ between two and three dimensions.

      The reviewer is right that there are differences between two and three dimensions. But a simpler model is not a useless model, even if it is not completely realistic. We have ongoing work that extends the current model to 3D, but this is beyond the scope of the current paper. In systems neuroscience, very interesting results have been obtained under such simplified geometric assumptions about networks. For instance, the one-dimensional ring model has been used to uncover fundamental insights about computations even though it is highly simplified and abstracted.

      Point 1.8. The description of wiring lengths as 'approximately optimal' in this text is problematic. The plotted data show that the wiring lengths are several deviations away from optimal, and the random model is not a valid instantiation of the 2D non-overlapping constraints the authors imposed. A more appropriate null should be considered.

      Our use of the term “optimal” was not in line with previous literature. We wrongly referred to the minimal wiring length as the optimal wiring length, but neurons can optimize their wiring not only by minimizing their dendritic length (e.g. work of Hermann Cuntz). In the revised manuscript, we will replace the term “optimal wiring” with “minimal wiring”. Then we will compare the wiring length in the model with the theoretically minimal wiring length, the random wiring length and the actual data.
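
A minimal sketch of the first part of such a comparison, assuming that the "theoretically minimal wiring length" is approximated by the Euclidean minimum spanning tree over the soma and its connected synapses. This proxy, the function name and the example coordinates are illustrative assumptions and are not taken from the manuscript; a true minimum (a Steiner tree) can be shorter than the spanning tree.

```python
import math


def mst_length(points):
    """Total edge length of the Euclidean minimum spanning tree (Prim)."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each point to the tree
    best[0] = 0.0           # start the tree at the first point (e.g. the soma)
    total = 0.0
    for _ in range(n):
        # Pick the closest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        # Update connection costs of the remaining points.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


if __name__ == "__main__":
    soma_and_synapses = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    print(mst_length(soma_and_synapses))
```

The simulated dendritic length for the same set of synapses can then be divided by this value to express how far a morphology is from the minimal-length proxy.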

      Point 1.9. It's not clear to me what the authors are trying to convey by repeatedly labeling this model as 'mechanistic'. The mechanisms implemented in the model are inspired by biological phenomena, but the implementations have little resemblance to the underlying biophysical mechanisms. Overall my impression is that this is a phenomenological model intended to show under what conditions particular patterns are possible. Line 363, describing another model as computational but not mechanistic, was especially unclear to me in this context.

      What we mean by mechanistic is that we implement equations that model specific mechanisms, i.e. we have a set of equations that implement the activity-independent attraction to potential synapses (with parameters such as the density of synapses, their spatial influence, etc) and the activity-dependent refinement of synapses (with parameters such as the ratio of BDNF and proBDNF to induce potentiation vs depression, the activity-dependent conversion of one factor to the other, etc). This is a bottom-up approach where we combine multiple elements together to get to neuronal growth and synaptic organization. This approach is in stark contrast to the so-called top-down or normative approaches, where the method would involve defining an objective function (e.g. minimal dendritic length) which depends on a set of parameters and then applying gradient descent or another mathematical optimization technique to find the parameters that optimize the objective function. This latter approach we would not call mechanistic because it involves an abstract objective function (who could say what a neuron or a circuit should be trying to optimize) and a mathematical technique for how to optimize the function (we don’t know if neurons can compute gradients of abstract objective functions).

      Hence our model is mechanistic, but it does operate at a particular level of abstraction/simplification. We don’t model individual ion channels, or the biophysics of synaptic plasticity (opening and closing of NMDA channels, accumulation of proteins at synapses, protein synthesis). We do, however, provide a biophysical implementation of the plasticity mechanism through the BDNF/proBDNF model, which is more than most models of plasticity achieve, because they typically model a phenomenological STDP or Hebbian rule that just uses activity patterns to potentiate or depress synaptic weights, disregarding how it could be implemented.

      Reviewer #2 (Public Review):

      This work combines a model of two-dimensional dendritic growth with attraction and stabilisation by synaptic activity. The authors find that constraining growth models with competition for synaptic inputs produces artificial dendrites that match some key features of real neurons both over development and in terms of final structure. In particular, incorporating distance-dependent competition between synapses of the same dendrite naturally produces distinct phases of dendritic growth (overshoot, pruning, and stabilisation) that are observed biologically and leads to local synaptic organisation with functional relevance. The approach is elegant and well-explained, but makes some significant modelling assumptions that might impact the biological relevance of the results.

      Strengths:

      The main strength of the work is the general concept of combining morphological models of growth with synaptic plasticity and stabilisation. This is an interesting way to bridge two distinct areas of neuroscience in a manner that leads to findings that could be significant for both. The modelling of both dendritic growth and distance-dependent synaptic competition is carefully done, constrained by reasonable biological mechanisms, and well-described in the text. The paper also links its findings, for example in terms of phases of dendritic growth or final morphological structure, to known data well.

      Weaknesses:

      The major weaknesses of the paper are the simplifying modelling assumptions that are likely to have an impact on the results. These assumptions are not discussed in enough detail in the current version of the paper.

      1) Axonal dynamics.

      A major, and lightly acknowledged, assumption of this paper is that potential synapses, which must come from axons, are fixed in space. This is not realistic for many neural systems, as multiple undifferentiated neurites typically grow from the soma before an axon is specified (Polleux & Snider, 2010). Further, axons are also dynamic structures in early development and, at least in some systems, undergo activity-dependent morphological changes too (O'Leary, 1987; Hall 2000). This paper does not consider the implications of joint pre- and post-synaptic growth and stabilisation.

      We thank the reviewer for the summary of the strengths and weaknesses of the work. While we feel that including a full model of axonal dynamics is beyond the scope of the current manuscript, some aspects of axonal dynamics can be included. In a revised model, we will introduce a gradual increase in the density of potential synapses and implement a cap on the number of synapses that can be alive at the same time, and vary that cap. We plan to also introduce a lifetime of each synapse (following for example a lognormal distribution). A potential synapse can disappear if it does not form a stable synapse in its lifetime, in which case it could move to a different location. See also our reply to reviewer comment 1.1, bullet (3).

      2) Activity correlations

      On a related note, the synapses in the manuscript display correlated activity, but there is no relationship between the distance between synapses and their correlation. In reality, nearby synapses are far more likely to share the same axon and so display correlated activity. If the input activity is spatially correlated and synaptic plasticity displays distance-dependent competition in the dendrites, there is likely to be a non-trivial interaction between these two features with a major impact on the organisation of synaptic contacts onto each neuron.

      We are exploring the amount of correlation (between and within correlated groups) to include in the revised manuscript (see also our reply to reviewer comment 1.1, bullet (1)).

      However, previous experimental work (Kleindienst et al., 2011 https://doi.org/10.1016/j.neuron.2011.10.015) has provided anatomical and functional analyses suggesting that the functional synaptic clustering on dendritic branches is unlikely to be the result of individual axons making more than one synapse (see pg. 1019).
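
The manipulation of within-group and across-group correlations mentioned in this reply could be prototyped along the following lines. This is a hedged sketch, not the authors' input model: in each time bin a synapse copies a source shared by all synapses with probability `a`, a source shared by its activity group with probability `w`, and otherwise fires independently. The names `a`, `w` and `rate` are hypothetical; `w = 1, a = 0` reproduces the current setting of perfect within-group correlation and no correlation across groups.

```python
import random


def correlated_trains(n_groups, per_group, n_bins, rate, w, a, seed=0):
    """Binary spike trains indexed as trains[group][synapse][time_bin]."""
    if a < 0.0 or w < 0.0 or a + w > 1.0:
        raise ValueError("need a >= 0, w >= 0 and a + w <= 1")
    rng = random.Random(seed)
    trains = [[[0] * n_bins for _ in range(per_group)]
              for _ in range(n_groups)]
    for t in range(n_bins):
        global_spike = rng.random() < rate       # shared by all synapses
        for g in range(n_groups):
            group_spike = rng.random() < rate    # shared within one group
            for s in range(per_group):
                u = rng.random()
                if u < a:
                    spike = global_spike
                elif u < a + w:
                    spike = group_spike
                else:
                    spike = rng.random() < rate  # private to this synapse
                trains[g][s][t] = int(spike)
    return trains


if __name__ == "__main__":
    trains = correlated_trains(n_groups=2, per_group=3, n_bins=20,
                               rate=0.2, w=0.6, a=0.2)
    for group in trains:
        for train in group:
            print("".join(map(str, train)))
```

Lowering `w` reduces the correlation within a group, and raising `a` adds correlation across groups, so a sensitivity sweep can move between the current regime and more mixed inputs.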

      3) BDNF dynamics

      The models are quite sensitive to the ratio of BDNF to proBDNF (eg Figure 5c). This ratio is also activity-dependent as synaptic activation converts proBDNF into BDNF. The models assume a fixed ratio that is not affected by synaptic activity. There should at least be more justification for this assumption, as there is likely to be a positive feedback relationship between levels of BDNF and synaptic activation.

      The reviewer is correct. We used the BDNF-proBDNF model for synaptic plasticity based on our previous work: Kirchner and Gjorgjieva, 2021 https://www.nature.com/articles/s41467-021-23557-3.

      There, we explored only the emergence of functionally clustered synapses on static dendrites which do not grow. In the Methods section (Parameters and data fitting) we justify the choice of the ratio of BDNF to proBDNF from published experimental work. We also performed sensitivity analysis (Supplementary Fig. 1) and perturbation simulations (Supplementary Fig. 3), which showed that the ratio is crucial in regulating the overall amount of potentiation and depression of synaptic efficacy, and therefore has a strong impact on the emergence and maintenance of synaptic organization. Since we already performed all this analysis, we do not expect there will be any differences in the current model which includes dendritic growth, as the activity-dependent mechanism has such a different timescale.

      A further weakness is in the discussion of how the final morphologies conform to principles of optimal wiring, which is quite imprecise. 'Optimal wiring' in the sense of dendrites and axons (Cajal, 1895; Chklovskii, 2004; Cuntz et al, 2007, Budd et al, 2010) is not usually synonymous with 'shortest wiring' as implied here. Instead, there is assumed to be a balance between minimising total dendritic length and minimising the tree distance (ie Figure 4c here) between synapses and the site of input integration, typically the soma. The level of this balance gives the deviation from the theoretical minimum length as direct paths to synapses typically require longer dendrites. In the model this is generated by the guidance of dendritic growth directly towards the synaptic targets. The interpretation of the deviation in this results section discussing optimal wiring, with hampered diffusion of signalling molecules, does not seem to be correct.

      We agree with this comment. We had wrongly used the term “optimal wiring”, as neurons can optimize their wiring not only by minimizing their dendritic length but also through other factors, as noted by the reviewer. In the revised manuscript we will replace the term “optimal wiring” with “minimal wiring” and discuss these differences from previous work.

      Reviewer #3 (Public Review):

      The authors propose a mechanistic model of how the interplay between activity-independent growth and an activity-dependent synaptic strengthening/weakening model influences the dendrite shape, complexity and distribution of synapses. The authors focus on a model for stellate cells, which have multiple dendrites emerging from a soma. The activity-independent component is provided by a random pool of presynaptic sites that represent potential synapses and that release a diffusible signal that promotes dendritic growth. Then a spontaneous activity pattern with some correlation structure is imposed at those presynaptic sites. The strength of these synapses follows a learning rule previously proposed by the lab: synapses strengthen when there is correlated firing across multiple sites, and synapses weaken if there is uncorrelated firing, with the relative strength of these processes controlled by available levels of BDNF/proBDNF. Once a synapse is weakened below a threshold, the dendrite branch at that site retracts and loses its sensitivity to the growth signal.

      The authors run the simulation and map out how dendrites and synapses evolve and stabilize. They show that dendritic trees grow rapidly and then stabilize by balancing growth and retraction (Figure 2). They also show that there is an initial bout of synaptogenesis followed by loss of synapses, reflecting the longer amount of time it takes to weaken a synapse (Figure 3). They analyze how this evolution of dendrites and synapses depends on the correlated firing of synapses (i.e. defined as being in the same "activity group"). They show that in the stabilized phase, synapses that remain connected to a given dendritic branch are likely to be from the same activity group (Figure 4). The authors systematically alter the learning rule by changing the available concentration of BDNF, which alters the relative amount of synaptic strengthening, which in turn affects stabilization, density of synapses and, interestingly, how selective for an activity group one dendrite is (Figure 5). In addition, the authors look at how altering the activity-independent factors influences outgrowth (Figure 6). Finally, one of the interesting outcomes is that the resulting dendritic trees represent "optimal wiring" solutions in the sense that dendrites use the shortest distance given the distribution of synapses. They compare this distribution to published data to see how the model compares to what has been observed experimentally.

      There are many strengths to this study. The consequence of adding the activity-dependent contribution to models of synapto- and dendritogenesis is novel. There is some exploration of parameter space with the motivation of keeping the parameters as well as the generated outcomes close to anatomical data of real dendrites. The paper is also scholarly in its comparison of this approach to previous generative models. This work represents an important advance in our understanding of how learning rules can contribute to dendrite morphogenesis.

      We thank the reviewer for the positive evaluation of the work and the suggestions below.

    1. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity):

      The authors have provided a mechanism by which the presence of truncated P53 can inactivate the function of full-length P53 protein. The authors propose that this happens through sequestration of full-length P53 by truncated P53.

      In the study, performed experiments are well described.

      My area of expertise is molecular biology/gene expression, and I have tried to provide suggestions within my area of expertise. The study has been done mainly with an overexpression system, and I have included a few comments that I think can be helpful for understanding the effect of truncated P53 on the endogenous wild-type full-length protein. In this reviewer's view, performing experiments along these lines will add value to the observations.

      Major comments:

      (1) It is not clear what happens to endogenous wild-type full-length P53 in the context of mutant/truncated isoforms. Using a P53 antibody that can detect endogenous wild-type P53, can the authors check whether endogenous full-length P53 protein is also aggregated? It is hard to tell whether aggregation of full-length P53 happens only in an overexpression scenario, where much more of both proteins is expressed. Under normal physiological conditions P53 expression is usually low and tightly controlled, and its expression is induced in altered cellular conditions such as DNA damage. So, it is important to understand the physiological relevance of such aggregation, which would be possible if the authors could investigate the effect on endogenous full-length P53 following overexpression of mutant isoforms.

      Thank you very much for your insightful comments.

      (1) To address “what happens to endogenous wild-type full-length P53 in the context of mutant/truncated isoforms," we employed a human A549 cell line expressing endogenous wild-type p53 under DNA damage conditions such as etoposide treatment(1). We chose the A549 cell line since, similar to H1299, it is a lung cancer cell line (www.atcc.org). For comparison, we also transfected the cells with 2 μg of V5-tagged plasmids encoding FLp53 and its isoforms Δ133p53 and Δ160p53. As shown in Author response image 1A, lanes 1 and 2, endogenous p53 expression remained undetectable in A549 cells despite etoposide treatment, which limits our ability to assess the effects of the isoforms on the endogenous wild-type FLp53. We could, however, detect the V5-tagged FLp53 expressed from the plasmid using anti-V5 (rabbit) as well as anti-DO-1 (mouse) antibody (Author response image 1). The latter detects both endogenous wild-type p53 and the V5-tagged FLp53 since the antibody epitope is within the N-terminus (aa 20-25). This result supports the reviewer’s comment regarding the low level of expression of endogenous p53, which is insufficient for detection in our experiments.

      In summary, in line with the reviewer’s comment that ‘under normal physiological conditions p53 expression is usually low,’ we could not detect p53 with an anti-DO-1 antibody. Thus, we proceeded with V5/FLAG-tagged p53 for detection of the effects of the isoforms on p53 stability and function. We also found that protein expression in H1299 cells was more easily detectable than in A549 cells (compare Author response image 1A and B). Thus, we decided to continue with the H1299 cells (p53-null), which serve as a more suitable model system for this study.

      (2) We agree with the reviewer that ‘It is hard to differentiate if aggregation of full-length p53 happens only in overexpression scenario’. However, it is conceivable that such aggregation of FLp53 happens under conditions in which p53 and its isoforms are over-expressed in the cell. Although the exact physiological context is not known and is beyond the scope of the current work, our results indicate that at higher expression, p53 isoforms drive aggregation of FLp53. Given the challenges of detecting endogenous FLp53, we had to rely on the results obtained with plasmid-mediated expression of p53 and its isoforms in p53-null cells.

      Author response image 1.

      Comparative analysis of protein expression in A549 and H1299 cells. (A) A549 cells (p53 wild-type) were treated with etoposide to induce endogenous wild-type p53 expression. To assess the effects of FLp53 and its isoforms Δ133p53 and Δ160p53 on endogenous wild-type p53 aggregation, A549 cells were transfected with 2 μg of V5-tagged p53 expression plasmids, with or without etoposide (20 μM for 8 h) treatment. Western blot analysis was done with anti-V5 (rabbit) to detect V5-tagged proteins and anti-DO-1 (mouse); the latter detects both endogenous wild-type p53 and V5-tagged FLp53. The merged image corresponds to the overlay between the V5 and DO-1 antibody signals. (B) H1299 cells (p53-null) were transfected with 2 μg of V5-tagged p53 expression plasmids or the empty vector control pcDNA3.1. Western blot analysis was done with the anti-V5 (mouse) antibody.

      (2) Can the presence of mutant P53 isoforms cause functional impairment of wild-type full-length endogenous P53? That could be tested as well using a ChIP assay similar to the one the authors have performed, but instead of an antibody against the tagged protein, the authors could check endogenous P53 enrichment at a gene promoter such as P21 following overexpression of mutant isoforms. Introducing a condition such as DNA damage in such an experiment might help, since endogenous P53 is then induced and more prone to bind to P53 targets such as P21.

      Thank you very much for your valuable comments and suggestions. To investigate the potential functional impairment of endogenous wild-type p53 by p53 isoforms, we initially utilized A549 cells (p53 wild-type), aiming to monitor endogenous wild-type p53 expression following DNA damage. However, as mentioned and demonstrated in Author response image 1, endogenous p53 expression was too low to be detected under these conditions, making the ChIP assay for analyzing endogenous p53 activity unfeasible. Thus, we decided to utilize plasmid-based expression of FLp53 and focus on the potential functional impairment induced by the isoforms.

      (3) On similar lines, authors described:

      "To test this hypothesis, we escalated the ratio of FLp53 to isoforms to 1:10. As expected, the activity of all four promoters decreased significantly at this ratio (Figure 4A-D). Notably, Δ160p53 showed a more potent inhibitory effect than Δ133p53 at the 1:5 ratio on all promoters except for the p21 promoter, where their impacts were similar (Figure 4E-H). However, at the 1:10 ratio, Δ133p53 and Δ160p53 had similar effects on all transactivation except for the MDM2 promoter (Figure 4E-H)."

      Again, in such assay authors used ratio 1:5 to 1:10 full length vs mutant. How authors justify this result in context (which is more relevant context) where one allele is Wild type (functional P53) and another allele is mutated (truncated, can induce aggregation). In this case one would expect 1:1 ratio of full-length vs mutant protein, unless other regulation is going which induces expression of mutant isoforms more than wild type full length protein. Probably discussing on these lines might provide more physiological relevance to the observed data.

      Thank you for raising this point regarding the physiological relevance of the ratios used in our study.

      (1) In the revised manuscript (lines 193-195), we added in this direction that “The elevated Δ133p53 protein modulates p53 target genes such as miR‑34a and p21, facilitating cancer development(2, 3). To mimic conditions where isoforms are upregulated relative to FLp53, we increased the ratios to 1:5 and 1:10.” This approach aims to simulate scenarios where isoforms accumulate at higher levels than FLp53, which may be relevant in specific contexts, as also elaborated above.

      (2) Regarding the issue of protein expression, the assumption that one allele is wild-type and the other encodes an isoform is not valid in most contexts. First, human cells have two copies of the TP53 gene (one from each parent). Second, the TP53 gene has two distinct promoters: the proximal promoter (P1) primarily regulates FLp53 and ∆40p53, whereas the second promoter (P2) regulates ∆133p53 and ∆160p53(4, 5). Additionally, ∆133TP53 is a p53 target gene(6, 7) and the expression of Δ133p53 and FLp53 is dynamic in response to various stimuli. Third, the expression of p53 isoforms is regulated at multiple levels, including transcriptional, post-transcriptional, translational, and post-translational processing(8). Moreover, different degradation mechanisms modify the protein levels of p53 isoforms and FLp53(8). These regulatory mechanisms respond to various stimuli, and therefore the 1:1 ratio of FLp53 to ∆133p53 or ∆160p53 may be valid only under certain physiological conditions. In line with this, varied expression levels of FLp53 and its isoforms, including ∆133p53 and ∆160p53, have been reported in several studies(3, 4, 9, 10).

      (3) In our study, using the pcDNA 3.1 vector under the human cytomegalovirus (CMV) promoter, we observed moderately higher expression levels of ∆133p53 and ∆160p53 relative to FLp53 (Author response image 1B). This overexpression scenario provides a model for studying conditions where isoform accumulation might surpass physiological levels, impacting FLp53 function. By employing elevated ratios of these isoforms to FLp53, we aim to investigate the potential effects of isoform accumulation on FLp53.

      (4) Finally does this altered function of full length P53 (preferably endogenous one) in presence of truncated P53 has any phenotypic consequence on the cells (if authors choose a cell type which is having wild type functional P53). Doing assay such as apoptosis/cell cycle could help us to get this visualization.

      Thank you for your insightful comments. In the experiment with A549 cells (p53 wild-type), endogenous p53 levels were too low to be detected, even after DNA damage induction. As mentioned above, this prevented us from evaluating the function of endogenous p53 in the presence of the isoforms. In the revised manuscript, we used H1299 cells with overexpressed proteins for apoptosis studies using the Caspase-Glo® 3/7 assay (Figure 7). This is shown in the Results section (lines 254-269): “The Δ133p53 and Δ160p53 proteins block pro-apoptotic function of FLp53.

      One of the physiological read-outs of FLp53 is its ability to induce apoptotic cell death(11). To investigate the effects of p53 isoforms Δ133p53 and Δ160p53 on FLp53-induced apoptosis, we measured caspase-3 and -7 activities in H1299 cells expressing different p53 isoforms (Figure 7). Caspase activation is a key biochemical event in apoptosis, with the activation of effector caspases (caspase-3 and -7) ultimately leading to apoptosis(12). The caspase-3 and -7 activities induced by FLp53 expression were approximately 2.5 times higher than those of the control vector (Figure 7). Co-expression of FLp53 and the isoforms Δ133p53 or Δ160p53 at a ratio of 1:5 significantly diminished the apoptotic activity of FLp53 (Figure 7). This result aligns well with our reporter gene assay, which demonstrated that elevated expression of Δ133p53 and Δ160p53 impaired the expression of apoptosis-inducing genes BAX and PUMA (Figure 4G and H). Moreover, a reduction in the apoptotic activity of FLp53 was observed irrespective of whether Δ133p53 or Δ160p53 protein was expressed with or without a FLAG tag (Figure 7). This result, therefore, also suggests that the FLAG tag does not affect the apoptotic activity or other physiological functions of FLp53 and its isoforms. Overall, the overexpression of p53 isoforms Δ133p53 and Δ160p53 significantly attenuates FLp53-induced apoptosis, independent of the protein tagging with the FLAG antibody epitope.”

      Referees cross-commenting

      I think the comments from the other reviewers are very much reasonable and logical.

      Especially all 3 reviewers have indicated, a better way to visualize the aggregation of full-length wild type P53 by truncated P53 (such as looking at endogenous P53# by reviewer 1, having fluorescent tag #by reviewer 2 and reviewer 3 raised concern on the FLAG tag) would add more value to the observation.

      Thank you for these comments. The endogenous p53 protein was undetectable in A549 cells induced by etoposide (Figure R1A). Therefore, we conducted experiments using FLAG/V5-tagged FLp53.  To avoid any potential side effects of the FLAG tag on p53 aggregation, we introduced untagged p53 isoforms in the H1299 cells and performed subcellular fractionation. Our revised results, consistent with previous FLAG-tagged p53 isoforms findings, demonstrate that co-expression of untagged isoforms with FLAG-tagged FLp53 significantly induced the aggregation of FLAG-FLp53, while no aggregation was observed when FLAG-tagged FLp53 was expressed alone (Supplementary Figure 6). These results clearly indicate that the FLAG tag itself does not contribute to protein aggregation. 

      Additionally, we utilized the A11 antibody to detect protein aggregation, providing additional validation (Figure 8 from Jean-Christophe Bourdon et al. Genes Dev. 2005;19:2122-2137). Given that the fluorescent proteins (~30 kDa) are substantially bigger than the tags used here (~1 kDa) and may influence oligomerization (especially GFP), stability, localization, and function of p53 and its isoforms, we avoided conducting these vital experiments with such artificial large fusions. 

      Reviewer #1 (Significance):

      The work is significant, since it points out more mechanistic insight how wild type full length P53 could be inactivated in the presence of truncated isoforms, this might offer new opportunity to recover P53 function as treatment strategies against cancer.

      Thank you for your insightful comments. We appreciate your recognition of the significance of our work in providing mechanistic insights into how wild-type FLp53 can be inactivated by truncated isoforms. We agree that these findings have potential for exploring new strategies to restore p53 function as a therapeutic approach against cancer. 

      Reviewer #2 (Evidence, reproducibility and clarity):

      The manuscript by Zhao and colleagues presents a novel and compelling study on the p53 isoforms, Δ133p53 and Δ160p53, which are associated with aggressive cancer types. The main objective of the study was to understand how these isoforms exert a dominant negative effect on full-length p53 (FLp53). The authors discovered that the Δ133p53 and Δ160p53 proteins exhibit impaired binding to p53-regulated promoters. The data suggest that the predominant mechanism driving the dominant-negative effect is the coaggregation of FLp53 with Δ133p53 and Δ160p53.

      This study is innovative, well-executed, and supported by thorough data analysis. However, the authors should address the following points:

      (1) Introduction on Aggregation and Co-aggregation: Given that the focus of the study is on the aggregation and co-aggregation of the isoforms, the introduction should include a dedicated paragraph discussing this issue. There are several original research articles and reviews that could be cited to provide context.

      Thank you very much for the valuable comments. We have added the following paragraph in the revised manuscript (lines 74-82): “Protein aggregation has become a central focus of modern biology research and has documented implications in various diseases, including cancer(13, 14, 15). Protein aggregates can be of different types ranging from amorphous aggregates to highly structured amyloid or fibrillar aggregates, each with different physiological implications. In the case of p53, whether protein aggregation, and in particular, co-aggregation with large N-terminal deletion isoforms, plays a mechanistic role in its inactivation is yet underexplored. Interestingly, the Δ133p53β isoform has been shown to aggregate in several human cancer cell lines(16). Additionally, the Δ40p53α isoform exhibits a high aggregation tendency in endometrial cancer cells(17). Although no direct evidence exists for Δ160p53 yet, these findings imply that p53 isoform aggregation may play a major role in their mechanisms of actions.”

      (2) Antibody Use for Aggregation: To strengthen the evidence for aggregation, the authors should consider using antibodies that specifically bind to aggregates.

      Thank you for your insightful suggestion. We examined protein aggregation using the A11 antibody, which specifically recognizes amyloid-like protein aggregates. We analyzed insoluble nuclear pellet samples prepared under the same conditions as described in Figure 6B. To confirm the presence of p53 proteins, we used the anti-p53 M19 antibody (Santa Cruz, Cat No. sc-1312) to detect bands corresponding to FLp53 and its isoforms Δ133p53 and Δ160p53. Monomeric FLp53 was not detected (Figure 8, lower panel, Jean-Christophe Bourdon et al. Genes Dev. 2005;19:2122-2137), which may be attributed to the lower binding affinity of the anti-p53 M19 antibody for it. These samples were also immunoprecipitated using the A11 antibody (Thermo Fisher Scientific, Cat No. AHB0052) to detect aggregated proteins. Interestingly, FLp53 and its isoforms, Δ133p53 and Δ160p53, were clearly detected with the A11 antibody when co-expressed at a 1:5 ratio, suggesting that they underwent co-aggregation. However, no FLp53 aggregates were observed when it was expressed alone (Author response image 2). These results support the conclusion in our manuscript that Δ133p53 and Δ160p53 drive FLp53 aggregation.

      Author response image 2.

      Induction of FLp53 Aggregation by p53 Isoforms Δ133p53 and Δ160p53. H1299 cells were transfected with FLAG-tagged FLp53 and V5-tagged Δ133p53 or Δ160p53 at a 1:5 ratio. The cells were subjected to subcellular fractionation, and the resulting insoluble nuclear pellet was resuspended in RIPA buffer. The samples were heated at 95°C until the pellet was completely dissolved, and then analyzed by Western blotting. Immunoprecipitation was performed using the A11 antibody, which specifically recognizes amyloid protein aggregates, and the anti-p53 M19 antibody, which detects FLp53 as well as its isoforms Δ133p53 and Δ160p53.

      (3) Fluorescence Microscopy: Live-cell fluorescence microscopy could be employed to enhance visualization by labeling FLp53 and the isoforms with different fluorescent markers (e.g., EGFP and mCherry tags).

      We appreciate the suggestion to use live-cell fluorescence microscopy with EGFP and mCherry tags for the visualization of FLp53 and its isoforms. While we understand the advantages of live-cell imaging with EGFP / mCherry tags, we refrained from making such fusions because GFP and related protein tags are very large (~30 kDa) relative to the p53 isoform variants (~30 kDa). Other studies have shown that EGFP and mCherry fusions can alter protein oligomerization, solubility and aggregation(18, 19). Moreover, most fluorescent proteins are prone to dimerization (e.g. EGFP) or form obligate tetramers (DsRed)(20, 21, 22), potentially interfering with the oligomerization and aggregation properties of p53 isoforms, particularly Δ133p53 and Δ160p53.

      Instead, we utilized FLAG- or V5-tag-based immunofluorescence microscopy, a well-established and widely accepted method for visualizing p53 proteins. This method provided precise localization and reliable quantitative data, which we believe meet the needs of the current study. We believe our chosen method is both appropriate and sufficient for addressing the research question.

      Reviewer #2 (Significance):

      The manuscript by Zhao and colleagues presents a novel and compelling study on the p53 isoforms, Δ133p53 and Δ160p53, which are associated with aggressive cancer types. The main objective of the study was to understand how these isoforms exert a dominant negative effect on full-length p53 (FLp53). The authors discovered that the Δ133p53 and Δ160p53 proteins exhibit impaired binding to p53-regulated promoters. The data suggest that the predominant mechanism driving the dominant-negative effect is the coaggregation of FLp53 with Δ133p53 and Δ160p53.

      We sincerely thank the reviewer for the thoughtful and positive comments on our manuscript and for highlighting the significance of our findings on the p53 isoforms, Δ133p53 and Δ160p53. 

      Reviewer #3 (Evidence, reproducibility and clarity):

      In this manuscript entitled "Δ133p53 and Δ160p53 isoforms of the tumor suppressor protein p53 exert dominant-negative effect primarily by coaggregation", the authors suggest that the Δ133p53 and Δ160p53 isoforms have high aggregation propensity and that by co-aggregating with canonical p53 (FLp53), they sequestrate it away from DNA thus exerting a dominant-negative effect over it.

      First, the authors should make it clear throughout the manuscript, including the title, that they are investigating Δ133p53α and Δ160p53α since there are 3 Δ133p53 isoforms (α, β, γ), and 3 Δ160p53 isoforms (α, β, γ).

      Thank you for your suggestion. We understand the importance of clearly specifying the isoforms under study. Following your suggestion, we have added α in the title, abstract, and introduction and added the following statement in the Introduction (lines 57-59): “For convenience and simplicity, we have written Δ133p53 and Δ160p53 to represent the α isoforms (Δ133p53α and Δ160p53α) throughout this manuscript.” 

      One concern is that the authors only consider and explore Δ133p53α and Δ160p53α isoforms as exclusively oncogenic and FLp53 dominant-negative while not discussing evidences of different activities. Indeed, other manuscripts have also shown that Δ133p53α is non-oncogenic and non-mutagenic, do not antagonize every single FLp53 functions and are sometimes associated with good prognosis. To cite a few examples:

      (1) Hofstetter G. et al. D133p53 is an independent prognostic marker in p53 mutant advanced serous ovarian cancer. Br. J. Cancer 2011, 105, 1593-1599.

      (2) Bischof, K. et al. Influence of p53 Isoform Expression on Survival in High-Grade Serous Ovarian Cancers. Sci. Rep. 2019, 9, 5244.

      (3) Knezović F. et al. The role of p53 isoforms' expression and p53 mutation status in renal cell cancer prognosis. Urol. Oncol. 2019, 37, 578.e1-578.e10.

      (4) Gong, L. et al. p53 isoform D113p53/D133p53 promotes DNA double-strand break repair to protect cell from death and senescence in response to DNA damage. Cell Res. 2015, 25, 351-369.

      (5) Gong, L. et al. p53 isoform D133p53 promotes efficiency of induced pluripotent stem cells and ensures genomic integrity during reprogramming. Sci. Rep. 2016, 6, 37281.

      (6) Horikawa, I. et al. D133p53 represses p53-inducible senescence genes and enhances the generation of human induced pluripotent stem cells. Cell Death Differ. 2017, 24, 1017-1028.

      (7) Gong, L. p53 coordinates with D133p53 isoform to promote cell survival under low-level oxidative stress. J. Mol. Cell Biol. 2016, 8, 88-90.

      Thank you very much for your comment and for highlighting these important studies. 

      We agree that Δ133p53 isoforms exhibit complex biological functions, with both oncogenic and non-oncogenic potential. However, our primary aim here was to reveal the molecular mechanism of the dominant-negative effects exerted by the Δ133p53α and Δ160p53α isoforms on FLp53, for which these isoforms are suitable model systems. Exploring the oncogenic potential of the isoforms is beyond the scope of the current study, and we have not claimed anywhere to report it. We have carefully revised the manuscript and replaced the respective terms, e.g. ‘pro-oncogenic activity’ with ‘dominant-negative effect’, in relevant places (e.g. line 90). We have now also added a paragraph with suitable references that introduces the oncogenic and non-oncogenic roles of the p53 isoforms.

      After reviewing the papers you cited, we are not sure that they directly address the oncogenic/non-oncogenic role of the Δ133p53α isoform in different cancer cases. Although our study is not about the oncogenic potential of the isoforms, we have summarized the key findings below:

      (1) Hofstetter et al., 2011: Demonstrated that Δ133p53α expression improved recurrence-free and overall survival in p53-mutant advanced serous ovarian cancer, suggesting a potential protective role in this context.

      (2) Bischof et al., 2019: Found that Δ133p53 mRNA can improve overall survival in high-grade serous ovarian cancers. However, out of 31 patients, only 5 belong to the TP53 wild-type group, while the others carry TP53 mutations.

      (3) Knezović et al., 2019: Reported downregulation of Δ133p53 in renal cell carcinoma tissues with wild-type p53 compared to normal adjacent tissue, indicating a potential non-oncogenic role, but not conclusively demonstrating it.

      (4) Gong et al., 2015: Showed that Δ133p53 antagonizes p53-mediated apoptosis and promotes DNA double-strand break repair by upregulating RAD51, LIG4, and RAD52 independently of FLp53.

      (5) Gong et al., 2016: Demonstrated that overexpression of Δ133p53 promotes the efficiency of cell reprogramming through its anti-apoptotic function and by promoting DNA DSB repair. The authors hypothesize that this mechanism involves increased RAD51 foci formation and decreased γH2AX foci formation and chromosome aberrations in induced pluripotent stem (iPS) cells, independent of FLp53.

      (6) Horikawa et al., 2017: Indicated that induced pluripotent stem cells derived from fibroblasts that overexpress Δ133p53 formed noncancerous tumors in mice compared to induced pluripotent stem cells derived from fibroblasts with complete p53 inhibition. Thus, Δ133p53 overexpression is "non- or less oncogenic and mutagenic" compared to complete p53 inhibition, but it still compromises certain p53-mediated tumor-suppressing pathways. “Overexpressed Δ133p53 prevented FL-p53 from binding to the regulatory regions of p21WAF1 and miR-34a promoters, providing a mechanistic basis for its dominant-negative inhibition of a subset of p53 target genes.”

      (7) Gong, 2016: Suggested that Δ133p53 promotes cell survival under low-level oxidative stress, but its role under different stress conditions remains uncertain.

      We have revised the Introduction to provide a more balanced discussion of Δ133p53’s dual role (lines 62-73):

      “The Δ133p53 isoform exhibits complex biological functions, with both oncogenic and non-oncogenic potentials. Recent studies demonstrate the non-oncogenic yet context-dependent role of the Δ133p53 isoform in cancer development. Δ133p53 expression has been reported to correlate with improved survival in patients with TP53 mutations(23, 24), where it promotes cell survival in a non-oncogenic manner(25, 26), especially under low oxidative stress(27). Alternatively, other recent evidence emphasizes the notable oncogenic functions of Δ133p53, as it can inhibit p53-dependent apoptosis by directly interacting with FLp53(4, 6). The oncogenic function of the newly identified Δ160p53 isoform is less known, although it is associated with p53 mutation-driven tumorigenesis(28) and with melanoma cell aggressiveness(10). Whether or not the Δ160p53 isoform also impedes FLp53 function in a similar way as Δ133p53 is an open question. However, these p53 isoforms can certainly compromise p53-mediated tumor suppression by interfering with FLp53 binding to target genes such as p21 and miR-34a(2, 29) through a dominant-negative effect, although the exact mechanism is not known.”

      On the figures presented in this manuscript, I have three major concerns:

      (1) Most results in the manuscript rely on the overexpression of the FLAG-tagged or V5-tagged isoforms. The validation of these constructs entirely depends on Supplementary figure 3 which the authors claim "rules out the possibility that the FLAG epitope might contribute to this aggregation". However, I am not entirely convinced by that conclusion. Indeed, the ratio between the "regular" isoform and the aggregates is much higher in the FLAG-tagged constructs than in the V5-tagged constructs. We can visualize the aggregates easily in the FLAG-tagged experiment, but the imaging clearly had to be overexposed (given the white coloring demonstrating saturation of the main bands) to visualize them in the V5-tagged experiments. Therefore, I am not convinced that an effect of the FLAG-tag can be ruled out and more convincing data should be added.

      Thank you for raising this important concern. We have carefully considered your comments and have made several revisions to clarify and strengthen our conclusions.

      First, to address the potential influence of the FLAG and V5 tags on p53 isoform aggregation, we have revised Figure 2 and removed the previous Supplementary Figure 3, where non-specific antibody bindings and higher molecular weight aggregates were not clearly interpretable. In the revised Figure 2, we have removed these potential aggregates, improving the clarity and accuracy of the data.

      To further rule out any tag-related artifacts, we conducted a co-immunoprecipitation assay with FLAG-tagged FLp53 and untagged Δ133p53 and Δ160p53 isoforms. The results (now shown in the new Supplementary Figure 3) agree fully with our previous results obtained with FLAG-tagged and V5-tagged Δ133p53 and Δ160p53 isoforms and show interaction between the partners. This indicates that the FLAG and V5 tags do not interfere with the interaction between FLp53 and the isoforms. We have continued to use FLAG-tagged FLp53 because the endogenous p53 was undetectable and FLAG-tagged FLp53 did not aggregate when expressed alone.

      In the revised paper, we added the following sentences (Lines 146-152): “To rule out the possibility that the observed interactions between FLp53 and its isoforms Δ133p53 and Δ160p53 were artifacts caused by the FLAG and V5 antibody epitope tags, we co-expressed FLAG-tagged FLp53 with untagged Δ133p53 and Δ160p53. Immunoprecipitation assays demonstrated that FLAG-tagged FLp53 could indeed interact with the untagged Δ133p53 and Δ160p53 isoforms (Supplementary Figure 3, lanes 3 and 4), confirming formation of hetero-oligomers between FLp53 and its isoforms. These findings demonstrate that Δ133p53 and Δ160p53 can oligomerize with FLp53 and with each other.”

      Additionally, we performed subcellular fractionation experiments to compare the aggregation and localization of FLAG-tagged FLp53 when co-expressed either with V5-tagged or untagged Δ133p53/Δ160p53. In these experiments, the untagged isoforms also induced FLp53 aggregation, mirroring our previous results with the tagged isoforms (Supplementary Figure 5). We’ve added this result in the revised manuscript (lines 236-245): “To exclude the possibility that FLAG or V5 tags contribute to protein aggregation, we also conducted subcellular fractionation of H1299 cells expressing FLAG-tagged FLp53 along with untagged Δ133p53 or Δ160p53 at a 1:5 ratio. The results showed (Supplementary Figure 6) a similar distribution of FLp53 across cytoplasmic, nuclear, and insoluble nuclear fractions as in the case of tagged Δ133p53 or Δ160p53 (Figure 6A to D). Notably, the aggregation of untagged Δ133p53 or Δ160p53 markedly promoted the aggregation of FLAG-tagged FLp53 (Supplementary Figure 6B and D), demonstrating that the antibody epitope tags themselves do not contribute to protein aggregation.” 

      We have also discussed this in the Discussion section (lines 349-356): “In our study, we primarily utilized an overexpression strategy involving FLAG/V5-tagged proteins to investigate the effects of p53 isoforms Δ133p53 and Δ160p53 on the function of FLp53. To address concerns regarding potential overexpression artifacts, we performed the co-immunoprecipitation (Supplementary Figure 6) and caspase-3 and -7 activity (Figure 7) experiments with untagged Δ133p53 and Δ160p53. In both experimental systems, the untagged proteins behaved very similarly to the FLAG/V5 antibody epitope-containing proteins (Figures 6 and 7 and Supplementary Figure 6). Hence, the C-terminal tagging of FLp53 or its isoforms does not alter the biochemical and physiological functions of these proteins.”

      In summary, the revised data set and newly added experiments provide strong evidence that neither the FLAG nor the V5 tag contributes to the observed p53 isoform aggregation.

      (2) The authors demonstrate that to visualize the dominant-negative effect, Δ133p53α and Δ160p53α must be "present in a higher proportion than FLp53 in the tetramer" and they need at least a transfection ratio 1:5 since the 1:1 ratio shows no effect. However, in almost every single cell type, FLp53 is far more expressed than the isoforms which make it very unlikely to reach such stoichiometry in physiological conditions and make me wonder if this mechanism naturally occurs at endogenous level. This limitation should be at least discussed.

      Thank you for your insightful comment. Evidence suggests that the expression levels of isoforms such as Δ133p53 can be significantly elevated relative to FLp53 under certain physiological conditions(3, 4, 9). For example, in some breast tumors, Δ133p53 mRNA is expressed at much higher levels than FLp53, suggesting a distinct expression profile of p53 isoforms compared to normal breast tissue(4). Similarly, in non-small cell lung cancer and the A549 lung cancer cell line, the expression level of the Δ133p53 transcript is significantly elevated compared to non-cancerous cells(3). Moreover, in specific cholangiocarcinoma cell lines, the Δ133p53/TAp53 expression ratio has been reported to increase to as high as 3:1(9). These observations indicate that the dominant-negative effect of the Δ133p53 isoform on FLp53 can occur under certain pathological conditions in which the relative amounts of FLp53 and the isoforms vary widely. Since data on the Δ160p53 isoform are scarce, we infer that the long N-terminal truncated isoforms may share a similar mechanism.

      (3) Figure 5C: I am concerned by the subcellular location of the Δ133p53α and Δ160p53α as they are commonly considered nuclear and not cytoplasmic as shown here, particularly since they retain the 3 nuclear localization sequences like the FLp53 (Bourdon JC et al. 2005; Mondal A et al. 2018; Horikawa I et al, 2017; Joruiz S. et al, 2024). However, Δ133p53α can form cytoplasmic speckles (Horikawa I et al, 2017) when it colocalizes with autophagy markers for its degradation.

      The authors should discuss this issue. Could this discrepancy be due to the high overexpression level of these isoforms? A co-staining with autophagy markers (p62, LC3B) would rule out (or confirm) activation of autophagy due to the overwhelming expression of the isoform.

      Thank you for your thoughtful comments. We have thoroughly reviewed all the papers you recommended (Bourdon JC et al., 2005; Mondal A et al., 2018; Horikawa I et al., 2017; Joruiz S. et al., 2024)(4, 29, 30, 31). Among these, only the study by Bourdon JC et al. (2005) provided data regarding the localization of Δ133p53(4). Interestingly, their findings align with our observations: Figure 8 of Bourdon et al. (Genes Dev. 2005;19:2122-2137) indicates that the protein does not exhibit predominantly nuclear localization. The discrepancy may be caused by a potentially confusing statement in that paper(4).

      The localization of p53 is governed by multiple factors, including its nuclear import and export(32). The isoforms Δ133p53 and Δ160p53 contain three nuclear localization sequences (NLS)(4). However, in our experiments these isoforms may have been trapped in the cytoplasm because aggregation masked the NLS, which would prevent nuclear import.

      Further, we acknowledge that Δ133p53 co-aggregates with the autophagy substrate p62/SQSTM1 and the autophagosome component LC3B in the cytoplasm as part of autophagic degradation during replicative senescence(33). We agree that high overexpression of these aggregation-prone proteins may induce endoplasmic reticulum (ER) stress and activate autophagy(34). This could explain the cytoplasmic localization in our experiments. However, it is also critical to consider that we observed aggregates in both the cytoplasm and the nucleus (Figures 6B and E and Supplementary Figure 6B). While cytoplasmic localization may involve autophagy-related mechanisms, the nuclear aggregates likely arise from intrinsic isoform properties, such as altered protein folding, independent of autophagy. These dual localizations reflect the complex behavior of Δ133p53 and Δ160p53 isoforms under our experimental conditions.

      In the revised manuscript, we discussed this in Discussion (lines 328-335): “Moreover, the observed cytoplasmic isoform aggregates may reflect autophagy-related degradation, as suggested by the co-localization of Δ133p53 with autophagy substrate p62/SQSTM1 and autophagosome component LC3B(33). High overexpression of these aggregation-prone proteins could induce endoplasmic reticulum stress and activate autophagy(34). Interestingly, we also observed nuclear aggregation of these isoforms (Figure 6B and E and Supplementary Figure 6B), suggesting that distinct mechanisms, such as intrinsic properties of the isoforms, may govern their localization and behavior within the nucleus. This dual localization underscores the complexity of Δ133p53 and Δ160p53 behavior in cellular systems.”

      Minor concerns:

      -  Figure 1A: the initiation of the "Δ140p53" is shown instead of "Δ40p53"

      Thank you! Figure 1A has been corrected in the revised paper.

      -  Figure 2A: I would like to see the images cropped a bit higher, so the cut does not happen just above the aggregate bands

      Thank you for this suggestion. We have changed the image, and the new Figure 2 is shown in the revised paper.

      -  Figure 3C: what ratio of FLp53/Delta isoform was used?

      We have added the ratio in the figure legend of Figure 3C (lines 845-846): “Relative DNA-binding of the FLp53-FLAG protein to the p53-target gene promoters in the presence of the V5-tagged protein Δ133p53 or Δ160p53 at a 1:1 ratio.”

      -  Figure 3C suggests that the "dominant-negative" effect is mostly senescence-specific as it does not affect apoptosis target genes, which is consistent with Horikawa et al, 2017 and Gong et al, 2016 cited above. Furthermore, since these two references and the others from Gong et al. show that Δ133p53α increases DNA repair genes, it would be interesting to look at RAD51, RAD52 or Lig4, and maybe also induce stress.

      Thank you for your thoughtful comments and suggestions. In Figure 3C, the presence of Δ133p53 or Δ160p53 only significantly reduced the binding of FLp53 to the p21 promoter. However, isoforms Δ133p53 and Δ160p53 demonstrated a significant loss of DNA-binding activity at all four promoters: p21, MDM2, and apoptosis target genes BAX and PUMA (Figure 3B). This result suggests that Δ133p53 and Δ160p53 have the potential to influence FLp53 function due to their ability to form hetero-oligomers with FLp53 or their intrinsic tendency to aggregate. To further investigate this, we increased the isoform to FLp53 ratio in Figure 4, which demonstrates that the isoforms Δ133p53 and Δ160p53 exert dominant-negative effects on the function of FLp53.

      These results demonstrate that the isoforms can compromise p53-mediated pathways, consistent with Horikawa et al. (2017), which showed that Δ133p53α overexpression is "non- or less oncogenic and mutagenic" compared to complete p53 inhibition, but still affects specific tumor-suppressing pathways. Furthermore, as noted by Gong et al. (2016), Δ133p53’s anti-apoptotic function under certain conditions is independent of FLp53 and unrelated to its dominant-negative effects.

      We appreciate your suggestion to investigate DNA repair genes such as RAD51, RAD52, or Lig4, especially under stress conditions. While these targets are intriguing and relevant, we believe that our current investigation of p53 targets in this manuscript sufficiently supports our conclusions regarding the dominant-negative effect. Further exploration of additional p53 target genes, including those involved in DNA repair, will be an important focus of our future studies.

      - Figure 5A and B: directly comparing the level of FLp53 expressed in cytoplasm or nucleus to the level of Δ133p53α and Δ160p53α expressed in cytoplasm or nucleus does not mean much since these are overexpressed proteins and therefore depend on the level of expression. The authors should rather compare the ratio of cytoplasmic/nuclear FLp53 to the ratio of cytoplasmic/nuclear Δ133p53α and Δ160p53α.

      Thank you very much for this valuable suggestion. In the revised paper, Figure 5B has been recreated. Changes have been made in lines 214-215: “The cytoplasm-to-nucleus ratio of Δ133p53 and Δ160p53 was approximately 1.5-fold higher than that of FLp53 (Figure 5B).”

      Referees cross-commenting

      I agree that the system needs to be improved to be more physiological.

      Just to precise, the D133 and D160 isoforms are not truncated mutants, they are naturally occurring isoforms expressed in almost every normal human cell type from an internal promoter within the TP53 gene.

      Using overexpression always raises concerns, but in this case, I am even more careful because the isoforms are almost always less expressed than the FLp53, and here they have to push it 5 to 10 times more expressed than the FLp53 to see the effect which make me fear an artifact effect due to the overwhelming overexpression (which even seems to change the normal localization of the protein).

      To visualize the endogenous proteins, they will have to change cell line as the H1299 they used are p53 null.

      Thank you for these comments. We’ve addressed the motivation of overexpression in the above responses. We needed to use the plasmid constructs in the p53-null cells to detect the proteins but the expression level was certainly not ‘overwhelmingly high’. 

      First, we tried the A549 cells (p53 wild-type) under DNA damage conditions, but the endogenous p53 protein was undetectable. Second, several studies reported increased Δ133p53 levels compared to wild-type p53 and that this has implications in tumor development(2, 3, 4, 9). Third, the apoptosis activity of H1299 cells overexpressing p53 proteins was analyzed in the revised manuscript (Figure 7). The apoptotic activity induced by FLp53 expression was approximately 2.5 times higher than that of the control vector under identical plasmid DNA transfection conditions (Figure 7). These results rule out the possibility that the plasmid-based expression of p53 and its isoforms introduced artifacts in the results. We’ve discussed this in the Results section (lines 254-269).

      Reviewer #3 (Significance):

      Overall, the paper is interesting particularly considering the range of techniques used which is the main strength.

      The main limitation to me is the lack of contradictory discussion as all argumentation presents Δ133p53α and Δ160p53α exclusively as oncogenic and strictly FLp53 dominant-negative when, particularly for Δ133p53α, a quite extensive literature suggests a not so clear-cut activity.

      The aggregation mechanism is reported for the first time for Δ133p53α and Δ160p53α, although it was already published for Δ40p53α, Δ133p53β or in mutant p53.

      This manuscript would be a good basic research addition to the p53 field to provide insight in the mechanism for some activities of some p53 isoforms.

      My field of expertise is the p53 isoforms which I have been working on for 11 years in cancer and neuro-degenerative diseases

      Thank you very much for your positive and critical comments. We’ve included a fair discussion on the oncogenic and non-oncogenic function of Δ133p53 in the Introduction following your suggestion (lines 62-73). 

      References

      (1) Pitolli C, Wang Y, Candi E, Shi Y, Melino G, Amelio I. p53-Mediated Tumor Suppression: DNA-Damage Response and Alternative Mechanisms. Cancers 11,  (2019).

      (2) Fujita K, et al. p53 isoforms Delta133p53 and p53beta are endogenous regulators of replicative cellular senescence. Nature cell biology 11, 1135-1142 (2009).

      (3) Fragou A, et al. Increased Δ133p53 mRNA in lung carcinoma corresponds with reduction of p21 expression. Molecular medicine reports 15, 1455-1460 (2017).

      (4) Bourdon JC, et al. p53 isoforms can regulate p53 transcriptional activity. Genes & development 19, 2122-2137 (2005).

      (5) Ghosh A, Stewart D, Matlashewski G. Regulation of human p53 activity and cell localization by alternative splicing. Molecular and cellular biology 24, 7987-7997 (2004).

      (6) Aoubala M, et al. p53 directly transactivates Δ133p53α, regulating cell fate outcome in response to DNA damage. Cell death and differentiation 18, 248-258 (2011).

      (7) Marcel V, et al. p53 regulates the transcription of its Delta133p53 isoform through specific response elements contained within the TP53 P2 internal promoter. Oncogene 29, 2691-2700 (2010).

      (8) Zhao L, Sanyal S. p53 Isoforms as Cancer Biomarkers and Therapeutic Targets. Cancers 14,  (2022).

      (9) Nutthasirikul N, Limpaiboon T, Leelayuwat C, Patrakitkomjorn S, Jearanaikoon P. Ratio disruption of the ∆133p53 and TAp53 isoform equilibrium correlates with poor clinical outcome in intrahepatic cholangiocarcinoma. International journal of oncology 42, 1181-1188 (2013).

      (10) Tadijan A, et al. Altered Expression of Shorter p53 Family Isoforms Can Impact Melanoma Aggressiveness. Cancers 13,  (2021).

      (11) Aubrey BJ, Kelly GL, Janic A, Herold MJ, Strasser A. How does p53 induce apoptosis and how does this relate to p53-mediated tumour suppression? Cell death and differentiation 25, 104-113 (2018).

      (12) Ghorbani N, Yaghubi R, Davoodi J, Pahlavan S. How does caspases regulation play role in cell decisions? apoptosis and beyond. Molecular and cellular biochemistry 479, 1599-1613 (2024).

      (13) Petronilho EC, et al. Oncogenic p53 triggers amyloid aggregation of p63 and p73 liquid droplets. Communications chemistry 7, 207 (2024).

      (14) Forget KJ, Tremblay G, Roucou X. p53 Aggregates penetrate cells and induce the coaggregation of intracellular p53. PloS one 8, e69242 (2013).

      (15) Farmer KM, Ghag G, Puangmalai N, Montalbano M, Bhatt N, Kayed R. P53 aggregation, interactions with tau, and impaired DNA damage response in Alzheimer's disease. Acta neuropathologica communications 8, 132 (2020).

      (16) Arsic N, et al. Δ133p53β isoform pro-invasive activity is regulated through an aggregation-dependent mechanism in cancer cells. Nature communications 12, 5463 (2021).

      (17) Melo Dos Santos N, et al. Loss of the p53 transactivation domain results in high amyloid aggregation of the Δ40p53 isoform in endometrial carcinoma cells. The Journal of biological chemistry 294, 9430-9439 (2019).

      (18) Mestrom L, et al. Artificial Fusion of mCherry Enhances Trehalose Transferase Solubility and Stability. Applied and environmental microbiology 85,  (2019).

      (19) Kaba SA, Nene V, Musoke AJ, Vlak JM, van Oers MM. Fusion to green fluorescent protein improves expression levels of Theileria parva sporozoite surface antigen p67 in insect cells. Parasitology 125, 497-505 (2002).

      (20) Snapp EL, et al. Formation of stacked ER cisternae by low affinity protein interactions. The Journal of cell biology 163, 257-269 (2003).

      (21) Jain RK, Joyce PB, Molinete M, Halban PA, Gorr SU. Oligomerization of green fluorescent protein in the secretory pathway of endocrine cells. The Biochemical journal 360, 645-649 (2001).

      (22) Campbell RE, et al. A monomeric red fluorescent protein. Proceedings of the National Academy of Sciences of the United States of America 99, 7877-7882 (2002).

      (23) Hofstetter G, et al. Δ133p53 is an independent prognostic marker in p53 mutant advanced serous ovarian cancer. British journal of cancer 105, 1593-1599 (2011).

      (24) Bischof K, et al. Influence of p53 Isoform Expression on Survival in High-Grade Serous Ovarian Cancers. Scientific reports 9, 5244 (2019).

      (25) Gong L, et al. p53 isoform Δ113p53/Δ133p53 promotes DNA double-strand break repair to protect cell from death and senescence in response to DNA damage. Cell research 25, 351-369 (2015).

      (26) Gong L, et al. p53 isoform Δ133p53 promotes efficiency of induced pluripotent stem cells and ensures genomic integrity during reprogramming. Scientific reports 6, 37281 (2016).

      (27) Gong L, Pan X, Yuan ZM, Peng J, Chen J. p53 coordinates with Δ133p53 isoform to promote cell survival under low-level oxidative stress. Journal of molecular cell biology 8, 88-90 (2016).

      (28) Candeias MM, Hagiwara M, Matsuda M. Cancer-specific mutations in p53 induce the translation of Δ160p53 promoting tumorigenesis. EMBO reports 17, 1542-1551 (2016).

      (29) Horikawa I, et al. Δ133p53 represses p53-inducible senescence genes and enhances the generation of human induced pluripotent stem cells. Cell death and differentiation 24, 1017-1028 (2017).

      (30) Mondal AM, et al. Δ133p53α, a natural p53 isoform, contributes to conditional reprogramming and long-term proliferation of primary epithelial cells. Cell death & disease 9, 750 (2018).

      (31) Joruiz SM, Von Muhlinen N, Horikawa I, Gilbert MR, Harris CC. Distinct functions of wild-type and R273H mutant Δ133p53α differentially regulate glioblastoma aggressiveness and therapy-induced senescence. Cell death & disease 15, 454 (2024).

      (32) O'Brate A, Giannakakou P. The importance of p53 location: nuclear or cytoplasmic zip code? Drug resistance updates : reviews and commentaries in antimicrobial and anticancer chemotherapy 6, 313-322 (2003).

      (33) Horikawa I, et al. Autophagic degradation of the inhibitory p53 isoform Δ133p53α as a regulatory mechanism for p53-mediated senescence. Nature communications 5, 4706 (2014).

      (34) Lee H, et al. IRE1 plays an essential role in ER stress-mediated aggregation of mutant huntingtin via the inhibition of autophagy flux. Human molecular genetics 21, 101-114 (2012).

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses:

      1) The authors should better review what we know of fungal Drosophila microbiota species as well as the ecology of rotting fruit. Are the microbiota species described in this article specific to their location/setting? It would have been interesting to know if similar species can be retrieved in other locations using other decaying fruits. The term 'core' in the title suggests that these species are generally found associated with Drosophila but this is not demonstrated. The paper is written in a way that implies the microbiota members they have found are universal. What is the evidence for this? Have the fungal species described in this paper been found in other studies? Even if this is not the case, the paper is interesting, but there should be a discussion of how generalizable the findings are.

      The reviewer inquires as to whether the microbial species described in this article are ubiquitously associated with Drosophila or not. Indeed, most of the microbes described in this manuscript are generally recognized as species associated with Drosophila spp. For example, species such as Hanseniaspora uvarum, Pichia kluyveri, and Starmerella bacillaris have been detected in or isolated from Drosophila spp. collected in European countries as well as the United States and Oceania (Chandler et al., 2012; Solomon et al., 2019). As for the bacteria, species belonging to the genera Pantoea, Lactobacillus, Leuconostoc, and Acetobacter have also previously been detected in wild Drosophila spp. (Chandler et al., 2011). These elucidations will be incorporated into our revised manuscript.

      Nevertheless, the term “core” in the manuscript title may lead to misunderstanding, as the generality does not ensure the ubiquitous presence of these microbial species in every individual fly. Considering this point, we will replace the term with an expression more appropriate to our context.

      2) Can the authors clearly demonstrate that the microbiota species that develop in the banana trap are derived from flies? Are these species found in flies in the wild? Did the authors check that the flies belong to the D. melanogaster species and not to the sister group D. simulans?

      Can the authors clearly demonstrate that the microbiota species that develop in the banana trap are derived from flies? Are these species found in flies in the wild?

      The reviewer asked whether the microbial species identified in the fermented banana samples were derived from flies. To address this question, additional experiments under more controlled conditions, such as the inoculation of specific species of wild flies onto fresh bananas, would be needed. Nevertheless, the microbes may potentially originate from wild flies, as supported by the literature cited in our response to the Weakness 1).

      Alternative sources for microbial provenance also merit consideration. For example, microbial entities may be inherently present in unfermented bananas through the infiltration of peel injuries (lines 1141-1142 of the original manuscript). In addition, they could be introduced by insects other than flies, given that both rove beetles (Staphylinidae) and sap beetles (Nitidulidae) were observed in some of the traps. These possibilities will be incorporated into the 'MATERIALS AND METHODS' and 'DISCUSSION' sections of our revised manuscript.

      Did the authors check that the flies belong to the D. melanogaster species and not to the sister group D. simulans?

      Our sampling strategy was designed to target not only D. melanogaster but also other domestic Drosophila species, such as D. simulans, that inhabit human residential areas. After adult flies were caught in each trap, we identified the species as shown in Table S1, thereby showing the presence of either or both D. melanogaster and D. simulans. We will provide these descriptions in MATERIALS AND METHODS and DISCUSSION.

      3) Did the microarrays highlight a change in immune genes (ex. antibacterial peptide genes)? Whatever the answer, this would be worth mentioning. The authors described their microarray data in terms of fed/starved in relation to the Finke article. They should clarify if they observed significant differences between species (differences between species within bacteria or fungi, and more generally differences between bacteria versus fungi).

      Did the microarrays highlight a change in immune genes (ex. antibacterial peptide genes)? Whatever the answer, this would be worth mentioning.

      Regarding the antimicrobial peptide genes, statistical comparisons of our RNA-seq data across different conditions were impracticable because most of them showed low expression levels (refer to Author response table 1, which exhibits the RNA-seq data of the yeast-fed larvae; similar expression profiles were observed in the bacteria-fed larvae). While a subset of genes exhibited significantly elevated expression in the non-supportive conditions relative to the supportive ones, this can be due to intra-sample variability rather than due to distinct nutritional environments. Therefore, it would be difficult to discuss a change in immune genes in the paper. Additionally, the previous study that conducted larval microarray analysis (Zinke et al., 2002) did not explicitly focus on immune genes.

      Author response table 1.

      Antimicrobial peptide genes are not up-regulated by any of the microbes. Antimicrobial peptides gene expression profiles of whole bodies of first-instar larvae fed on yeasts. TPM values of all samples and comparison results of gene expression levels in the larvae fed on supportive and non-supportive yeasts are shown. Antibacterial peptide genes mentioned in Hanson and Lemaitre, 2020 are listed. NA or na, not available.

      They should clarify if they observed significant differences between species (differences between species within bacteria or fungi, and more generally differences between bacteria versus fungi).

      We did not observe significant differences between species within bacteria or fungi, or between bacteria and fungi. For example, the gene expression profiles of larvae fed on the various supporting microbes showed striking similarities to each other, as evidenced by the heat map showing the expression of all genes detected in larvae fed either yeast or bacteria (Author response image 1). Similarities were also observed among larvae fed on distinct non-supporting microbes.

      Author response image 1.

      Gene expression profiles of larvae fed on the various supporting microbes show striking similarities to each other. Heat map showing the gene expression of the first-instar larvae that fed on yeasts or bacteria. Freshly hatched germ-free larvae were placed on banana agar inoculated with each microbe and collected after 15 h feeding to examine gene expression of the whole body. Note that data presented in Figures 3A and 4C in the original manuscript, which are obtained independently, are combined to generate this heat map. The labels under the heat map indicate the microbial species fed to the larvae, with three samples analyzed for each condition. The lactic acid bacteria (“LAB”) include Lactiplantibacillus plantarum and Leuconostoc mesenteroides, while the acetic acid bacterium (“AAB”) represents Acetobacter orientalis. “LAB + AAB” signifies mixtures of the AAB and either one of the LAB species. The asterisk in the label highlights a sample in a “LAB” condition (Leuconostoc mesenteroides), which clustered separately from the other “LAB” samples. Brown abbreviations of scientific names are for the yeast-fed conditions. H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; M. asi, Martiniozyma asiatica; S. cra, Saccharomycopsis crataegensis; P. klu, Pichia kluyveri; S. bac, Starmerella bacillaris; S. cer, S. cerevisiae BY4741 strain.

      Only a handful of genes showed different expression patterns between larvae fed on yeast and those fed on bacteria, without any enrichment for specialized gene functions. Thus, it is challenging to discuss the potential differential impacts, if any, of yeast and bacteria on larval growth.

      4) The whole paper - and this is one of its merits - points to a role of the Drosophila larval microbiota in processing the fly food. Are these bacterial and fungal species found in the gut of larvae/adults? Are these species capable of establishing a niche in the cardia of adults as shown recently in the Ludington lab (Dodge et al.,)? Previous studies have suggested that microbiota members stimulate the Imd pathway leading to an increase in digestive proteases (Erkosar/Leulier). Are the microbiota species studied here affecting gut signaling pathways beyond providing branched amino acids?

      The whole paper - and this is one of its merits - points to a role of the Drosophila larval microbiota in processing the fly food. Are these bacterial and fungal species found in the gut of larvae/adults? Are these species capable of establishing a niche in the cardia of adults as shown recently in the Ludington lab (Dodge et al.,)?

      Although we did not investigate the microbiota in the gut of either larvae or adults, we did compare the microbiota within surface-sterilized larvae or adults with those in food samples. We found that adult flies and early-stage food sources, as well as larvae and late-stage food sources, harbor similar microbial species (Figure 1F). Additionally, previous examinations of the gut microbiota in wild adult flies have identified microbial species or taxa congruent with those we isolated from our foods (Chandler et al., 2011; Chandler et al., 2012). We have elaborated on this in our response to Weakness 1).

      While we did not investigate whether these species are capable of establishing a niche in the cardia of adults, we will cite the study by Dodge et al., 2023 in our revised manuscript and discuss the possibility that predominant microbes in adult flies may show a propensity for colonization.

      Previous studies have suggested that microbiota members stimulate the Imd pathway leading to an increase in digestive proteases (Erkosar/Leulier). Are the microbiota species studied here affecting gut signaling pathways beyond providing branched amino acids?

      The reviewer inquires whether the supportive microbes in our study stimulate gut Imd signaling pathways and induce the expression of digestive protease genes, as demonstrated in a previous study (Erkosar et al., 2015). According to our RNA-seq data, it seems unlikely that the supportive microbes stimulate the signaling pathway. Figures contained in Author response image 2 provide the statistical comparisons of expression levels for seven protease genes between the supportive and the non-supportive conditions. These genes did not exhibit a consistent upregulation in the presence of the supportive microbes (H. uva or K. hum in Author response image 2A; Le. mes + A. ori in Author response image 2B). Rather, they exhibited a tendency to be upregulated under the non-supportive microbes (St. bac or Pi. klu in Author response image 2A; La. pla in Author response image 2B).

      Author response image 2.

      Most of the peptidase genes reported by Erkosar et al., 2015 are more highly expressed under the non-supportive conditions than the supportive conditions. Comparison of the expression levels of seven peptidase genes derived from the RNA-seq analysis of yeast-fed (A) or bacteria-fed (B) first-instar larvae. A previous report demonstrated that the expression of these genes is upregulated upon association with a strain of Lactiplantibacillus plantarum, and that the PGRP-LE/Imd/Relish signaling pathway, at least partially, mediates the induction (Erkosar et al., 2015). H. uva, Hanseniaspora uvarum; K. hum, Kazachstania humilis; P. klu, Pichia kluyveri; S. bac, Starmerella bacillaris; La. pla, Lactiplantibacillus plantarum; Le. mes, Leuconostoc mesenteroides; A. ori, Acetobacter orientalis; ns, not significant.

      Reviewer #2 (Public Review):

      Weaknesses:

      The experimental setting that, the authors think, reflects host-microbe interactions in nature is one of the key points. However, it is not explicitly mentioned whether isolated microbes are indeed colonized in wild larvae of Drosophila melanogaster who eat bananas. Another matter is that this work is rather descriptive and a few mechanical insights are presented. The evidence that the nutritional role of BCAAs is incomplete, and molecular level explanation is missing in "interspecies interactions" between lactic acid bacteria (or yeast) and acetic acid bacteria that assure their inhabitation. Apart from these matters, the future directions or significance of this work could be discussed more in the manuscript.

      The experimental setting that, the authors think, reflects host-microbe interactions in nature is one of the key points. However, it is not explicitly mentioned whether isolated microbes are indeed colonized in wild larvae of Drosophila melanogaster who eat bananas.

      The reviewer asks whether the isolated microbes colonize the larval gut. Previous studies on microbial colonization associated with Drosophila have predominantly focused on adults (Pais et al. PLOS Biology, 2018), rather than larval stages. Developing larvae continually consume substrates that have already undergone microbial fermentation and are rich in live microbes until the end of the feeding larval stage. Therefore, we consider it difficult to discuss microbial colonization in the larval gut. We will add this point to the DISCUSSION of the revised manuscript.

      Another matter is that this work is rather descriptive and a few mechanical insights are presented. The evidence that the nutritional role of BCAAs is incomplete, and molecular level explanation is missing in "interspecies interactions" between lactic acid bacteria (or yeast) and acetic acid bacteria that assure their inhabitation.

      While we recognize the importance of comprehensive mechanistic analysis, this study includes all the data that were experimentally feasible to obtain. Elucidation of more detailed molecular mechanisms lies beyond the scope of this study and will be the subject of future research.

      Regarding the nutritional role of BCAAs, the incorporation of BCAAs enabled larvae fed with the non-supportive yeast to grow to the second instar. This observation suggests that consumption of BCAAs upregulates diverse genes involved in cellular growth processes in larvae. We have discussed the hypothetical interaction between lactic acid bacteria (LAB) and acetic acid bacteria (AAB) in the manuscript (lines 402-405): LAB may facilitate lactate provision to AAB, consequently enhancing the biosynthesis of essential nutrients such as amino acids. To test this hypothesis, future experiments will include the supplementation of AAB culture plates with lactic acid and the co-inoculation of LAB mutant strains defective in lactate production with AAB, to assess both larval growth and continuous larval association with AAB. With respect to AAB-yeast interactions, metabolites released from yeast cells might benefit AAB growth, and this possibility will be investigated through the supplementation of AAB culture plates with candidate metabolites identified in the cell suspension supernatants of the late-stage yeasts.

      Apart from these matters, the future directions or significance of this work could be discussed more in the manuscript.

      We appreciate the reviewer's recommendations and will include additional descriptions regarding these aspects in the DISCUSSION section.

      Reviewer #3 (Public Review):

      Weaknesses:

      Despite describing important findings, I believe that a more thorough explanation of the experimental setup and the steps expected to occur in the exposed diet over time, starting with natural "inoculation" could help the reader, in particular the non-specialist, grasp the rationale and main findings of the manuscript. When exactly was the decision to collect early-stage samples made? Was it when embryos were detected in some of the samples? What are the implications of bacterial presence in the no-fly traps? These samples also harbored complex microbial communities, as revealed by sequencing. Were these samples colonized by microbes deposited with air currents? Were they the result of flies that touched the material but did not lay eggs? Could the traps have been visited by other insects? Another interesting observation that could be better discussed is the fact that adult flies showed a microbiome that more closely resembles that of the early-stage diet, whereas larvae have a more late-stage-like microbiome. It is easy to understand why the microbiome of the larvae would resemble that of the late-stage foods, but what about the adult microbiome? Authors should discuss or at least acknowledge the fact that there must be a microbiome shift once adults leave their food source. Lastly, the authors should provide more details about the metabolomics experiments. For instance, how were peaks assigned to leucine/isoleucine (as well as other compounds)? Were both retention times and MS2 spectra always used? Were standard curves produced? Were internal, deuterated controls used?

      When exactly was the decision to collect early-stage samples made? Was it when embryos were detected in some of the samples?

      We collected traps and early-stage samples 2.5 days after setting up the traps. This time frame was determined by pilot experiments. A shorter collection time resulted in a greater likelihood of obtaining no-fly traps, whereas a longer collection time caused larval overcrowding, as well as adults’ deaths from drowning in the liquid seeping out of fruits. These procedural details will be delineated in the MATERIALS AND METHODS section of the revised manuscript.

      What are the implications of bacterial presence in the no-fly traps? These samples also harbored complex microbial communities, as revealed by sequencing. Were these samples colonized by microbes deposited with air currents? Were they the result of flies that touched the material but did not lay eggs? Could the traps have been visited by other insects?

      We assume that the origins of the microbes detected in the no-fly trap foods vary depending on the species. For instance, Colletotrichum musae, the fungus that causes banana anthracnose, may have been present in fresh bananas before trap placement. The filamentous fungi could have originated from airborne spores, but they could also have been introduced by insects that feed on these fungi. We will include these possibilities in the DISCUSSION section of the revised manuscript.

      Another interesting observation that could be better discussed is the fact that adult flies showed a microbiome that more closely resembles that of the early-stage diet, whereas larvae have a more late-stage-like microbiome. It is easy to understand why the microbiome of the larvae would resemble that of the late-stage foods, but what about the adult microbiome? Authors should discuss or at least acknowledge the fact that there must be a microbiome shift once adults leave their food source.

      We are grateful for the reviewer's insightful suggestions regarding shifts in the adult microbiome. We plan to include in the DISCUSSION section of the revised manuscript the possibility that the microbial composition may change substantially during pupal stages and that microbes obtained after eclosion could potentially form the adult gut microbiota.

      Lastly, the authors should provide more details about the metabolomics experiments. For instance, how were peaks assigned to leucine/isoleucine (as well as other compounds)? Were both retention times and MS2 spectra always used? Were standard curves produced? Were internal, deuterated controls used?

      We appreciate the reviewer's advice. Detailed methods of the metabolomic experiments will be included in our revised manuscript.

    1. Author Response

      We would like to thank the editors and reviewers for their thoughtful comments on our manuscript. Before we can provide a point-by-point response and submit a revised version of the manuscript we would like to provisionally address and alleviate some of their main concerns.

      A concern was expressed in the ‘eLife assessment’ and by two of the reviewers that a potential confound between the coding of sensory information and behavior outcome by IC neurons might have been introduced by combining data across different sound levels, which could challenge the conclusions of the study. In addressing this we have carried out the analysis (i.e. averaging the neural activity separately for different sound levels) suggested for distinguishing between the two alternative explanations offered by reviewer #1: That the difference in neural activity between hit and miss trials reflects a) behavior or b) sound level (more precisely: differences in response magnitude arising from a higher proportion of high-sound-level trials in the hit trial group than in the miss trial group). If the data favored b), we would expect no difference in activity between hit and miss trials when plotted separately for different sound levels. The figure in Author response image 1 indicates that this is not the case. Hit and miss trial activity are clearly distinct even when plotted separately for different sound levels, confirming that this difference in activity reflects the animals’ behavior rather than sensory information.

      Author response image 1.

      A related concern was expressed with regard to the decoding analysis. Namely, that differences in the distributions of sound levels in the different trial types could confound the decoding into hit and miss trials and that, consequently, the results of the decoding analysis merely reflect differences in the processing of sound level. Our analysis actually aimed to take this into account but, unfortunately, we failed to include sufficient details in the methods section of the submitted manuscript. Rather than including all the trials in a given session, only trials of intermediate difficulty were used for the decoding analysis. More specifically, we only included trials across five sound levels, comprising the lowest sound level that exceeded a d prime of 1.5 plus the two sound levels below and above that level. That ensured that differences in sound level distributions would be small, while still giving us a sufficient number of trials to perform the decoding analysis. In this context, it is worth bearing in mind that a) the decoding analysis was done on a frame-by-frame basis, meaning that the decoding score achieved early in the trial has no impact on the decoding score at later time points in the trial, b) sound-driven activity can be observed predominantly immediately after stimulus onset and is largely over about 1 s into the trial (see cluster 3, for instance, or average miss trial activity in the plots above), c) decoding performance of the behavioral outcome starts to plateau 500-1000 ms into the trial and remains high until it very gradually begins to decline after about 2 s into the trial. In other words, decoding performance remains high far longer than the stimulus would be expected to have an impact on the neurons’ activity. Therefore, we would expect any residual bias due to differences in the sound level distribution that our approach did not control for to be restricted to the very beginning of the trial and not to meaningfully impact the conclusions derived from the decoding analysis.
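
      The trial-selection rule described above (the lowest sound level exceeding a d prime of 1.5, plus the two levels below and the two above) can be illustrated with a minimal sketch. The function name, the sound levels, and the d prime values below are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def select_decoding_levels(levels, dprimes, threshold=1.5, span=2):
    """Pick the sound levels of intermediate difficulty.

    levels  : sound levels in ascending order (hypothetical dB SPL values)
    dprimes : behavioural d prime measured at each level (hypothetical)
    Returns the lowest level whose d prime exceeds `threshold`,
    together with up to `span` levels below and above it.
    """
    # Index of the lowest sound level that exceeds the d prime threshold
    anchor = next((i for i, d in enumerate(dprimes) if d > threshold), None)
    if anchor is None:
        return []  # no level reached the threshold in this session
    lo = max(0, anchor - span)
    hi = min(len(levels), anchor + span + 1)
    return levels[lo:hi]


# Illustrative session: d prime first exceeds 1.5 at 50 dB SPL
levels = [30, 35, 40, 45, 50, 55, 60, 65]
dprimes = [0.1, 0.4, 0.9, 1.3, 1.8, 2.4, 2.9, 3.1]
selected = select_decoding_levels(levels, dprimes)
print(selected)
```

      Restricting the decoder to such a window keeps the sound-level distributions of hit and miss trials close to each other while retaining enough trials, which is the trade-off the paragraph above describes.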

      Another concern expressed in the reviews is that, in relation to the cluster-wise analysis of neural activity, no direct comparison (beyond the pie charts of Figure 5C) was provided between data from lesioned and non-lesioned groups, leaving unclear how similar task-relevant activity is between these groups. In Author response image 2 we plot, analogous to Figure 5B, the average hit and miss trial activity for the 10 clusters separately for lesioned and non-lesioned mice, illustrating more clearly the high degree of similarity between the two groups.

      Author response image 2.

    1. Author response:

      Reviewer #1 (Public review):

      (1) Some details are not described for experimental procedures. For example, what were the pharmacological drugs dissolved in, and what vehicle control was used in experiments? How long were pharmacological drugs added to cells?

      We apologise for the oversight. These details have now been added to the methods section of the manuscript as well as to the relevant figure legends.

      Briefly, latrunculin was used at a final concentration of 250 nM and Y27632 at a final concentration of 50 μM. Both drugs were dissolved in DMSO. Vehicle controls were performed with DMSO at the higher of the final DMSO concentrations used for the two drugs.

      The details of the drug treatments and their duration were added to the methods and to figures 6, S10, and S12.

      (2) Details are missing from the Methods section and Figure captions about the number of biological and technical replicates performed for experiments. Figure 1C states the data are from 12 beads on 7 cells. Are those same 12 beads used in Figure 2C? If so, that information is missing from the Figure 2C caption. Similarly, this information should be provided in every figure caption so the reader can assess the rigor of the experiments. Furthermore, how heterogenous would the bead displacements be across different cells? The low number of beads and cells assessed makes this information difficult to determine.

      We apologise for the oversight. We have now added this data to the relevant figure panels.

      To gain a further understanding of the heterogeneity of bead displacements across cells, we have replotted the relevant graphs using different colours to indicate different cells. This reveals that different cells appear to behave similarly and that the behaviour appears controlled by distance to the indentation or the pipette tip rather than cell identity.

      We agree with the reviewer that the number of cells examined is low. This is due to the challenging nature of the experiments, which means that many attempts are necessary to obtain a successful measurement.

      The experiments in Fig 1C are a verification of a behaviour documented in a previous publication [1]. Here, we just confirm the same behaviour and therefore we decided that only a small number of cells was needed.

      The experiments in Fig 2C (that allow for a direct estimation of the cytoplasm’s hydraulic permeability) require formation of a tight seal between the glass micropipette and the cell, something known as a gigaseal in electrophysiology. The success rate of this first step is 10-30% of attempts for an experienced experimenter. The second step is forming a whole cell configuration, in which a hydraulic link is formed between the cell and the micropipette. This step has a success rate of ~ 50%. Whole cell links are very sensitive to any disturbance. After reaching the whole cell configuration, we applied relatively high pressures that occasionally resulted in loss of link between the cell and the micropipette. In summary, for the 12 successful measurements, hundreds of unsuccessful attempts were carried out.
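
      As a rough consistency check on the success rates quoted above, the expected number of attempts can be estimated with a short sketch. It assumes that each attempt is independent and that the two steps succeed with the stated probabilities; it ignores the additional losses of the hydraulic link at high pressure, so it gives a lower bound on the effort rather than the actual count.

```python
def expected_attempts(n_success, p_seal, p_whole_cell):
    """Expected number of attempts needed for n_success measurements.

    Assumes independent attempts; each succeeds with probability
    p_seal * p_whole_cell (seal formation, then whole-cell configuration).
    """
    return n_success / (p_seal * p_whole_cell)


# 12 successful measurements, seal success 10-30%, whole-cell success ~50%
best_case = expected_attempts(12, 0.30, 0.50)
worst_case = expected_attempts(12, 0.10, 0.50)
print(f"expected attempts: {best_case:.0f} to {worst_case:.0f}")
```

      Under these assumptions roughly 80 to 240 attempts are expected before any link is lost during pressure application, which is in line with the statement that hundreds of attempts were needed.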

      (3) The full equation for displacement vs. time for a poroelastic material is not provided. Scaling laws are shown, but the full equation derived from the stress response of an elastic solid and viscous fluid is not shown or described.

      We thank the reviewer for this comment. Based on our experiments, we found that the cytoplasm behaves as a poroelastic material. However, to understand the displacements of the cell surface in response to localised indentation, we show that we also need to take the tension of the sub membranous cortex into account. In summary, the interplay between cell surface tension generated by the cortex and the poroelastic cytoplasm controls the cell behaviour. To our knowledge, no simple analytical solutions to this type of problem exist.

      In Fig 1, we show that the response of the cell to local indentation is biphasic with a short time-scale displacement followed by a longer time-scale one. In Figs 2 and 3, we directly characterise the kinetics of cell surface displacement in response to microinjection of fluid. These kinetics are consistent with the long time-scale displacement but not the short time-scale one. Scaling considerations led us to propose that tension in the cortex may play a role in mediating the short time-scale displacement. To verify this hypothesis, we have now added new data showing that the length-scale of an indentation created by an AFM probe depends on tension in the cortex (Fig S5).

      In a previous publication [2], we derived the temporal dynamics of cell surface displacement for a homogenous poroelastic material in response to a change in osmolarity. In the current manuscript, the composite nature of the cell (membrane, cortex, cytoplasm) needs to be taken into account as well as a realistic cell shape. Therefore, we did not attempt to provide an analytical solution for the displacement of the cell surface versus time in the current work. Instead, we turned to finite element modelling to show that our observations are qualitatively consistent with a cell that comprises a tensed sub membranous actin cortex and a poroelastic cytoplasm (Fig 4). We have now added text to make this clearer for the reader.

      Reviewer #2 (Public review):

      Comments & Questions:

      The authors state, "Next, we sought to quantitatively understand how the global cellular response to local indentation might arise from cellular poroelasticity." However, the evidence presented in the following paragraph appears more qualitative than strictly quantitative. For instance, the length scale estimate of ~7 μm is only qualitatively consistent with the observed ~10 μm, and the timescale 𝜏𝑧 ≈ 500 ms is similarly described as "qualitatively consistent" with experimental observations. Strengthening this point would benefit from more direct evidence linking the short timescale to cell surface tension. Have you tried perturbing surface tension and examining its impact on this short-timescale relaxation by modulating acto-myosin contractility with Y-27632, depolymerizing actin with Latrunculin, or applying hypo/hyperosmotic shocks?

      Upon rereading our manuscript, we agree with the reviewer that some of our statements are too strong. We have now moderated these and clarified the goal of that section of the text.

      The reviewer asks if we have examined the effect of various perturbations on the short time-scale displacements. In our experimental conditions, we cannot precisely measure the time-scale of the fast relaxation because its duration is comparable to the frame rate of our image acquisition. However, we examined the amplitude of the displacement of the first phase in response to sucrose treatment and we have carried out new experiments in which we treat cells with 250nM Latrunculin to partially depolymerise cellular F-actin. Neither of these treatments had an impact on the amplitude of vertical displacements (Author response image 1).

      The absence of change in response to Latrunculin may be because the treatment decreases both the elasticity of the cytoplasm E and the cortical tension γ. As the length-scale l of the deformation of the surface scales as l ~ γ/E, the two effects of latrunculin treatment may therefore compensate one another and result in only small changes in l. We have now added this data to supplementary information and comment on this in the text.

      Author response image 1:

      Amplitude of the short time-scale displacements of beads in response to AFM indentation at δx=0µm for control cells, sucrose treated cells, and cells treated with Latrunculin B. n indicates the number of cells examined and N the number of beads.

      The reviewer’s comment also made us want to determine how cortical tension affects the length-scale of the cell surface deformation created by localised micro indentation. To isolate the role of the cortex from that of cell shape, we decided to examine rounded mitotic cells. In our experiments, we indented a mitotic cell expressing a membrane targeted GFP with a sharp AFM tip (Author response image 2).

      In our experiments, we adjusted force to generate a 2μm depth indentation and we imaged the cell profile with confocal microscopy before and during indentation. Segmentation of this data allowed us to determine the cell surface displacement resulting from indentation and measure a length scale of deformation. In control conditions, the length scale created by deformation is on the order of 1.2μm. When we inhibited myosin contractility with blebbistatin, the length-scale of deformation decreased significantly to 0.8 μm, as expected if we decrease the surface tension γ without affecting the cytoplasmic elasticity. We have now added this data to our manuscript.

      Author response image 2.

      (a) Overlay of the zx profiles of a mitotic cell before (green) and during indentation (red). The cell membrane is labelled with CellMask DeepRed. The arrowhead indicates the position of the AFM tip. Scale bar 10µm. (b) Position of the membrane along the top half of the cell before (green) and during (red) indentation. The membrane position is derived from segmentation of the data in (a). Deformation is highly localised and membrane profiles overlap at the edges. The tip position is marked by an *. (c) The difference in membrane height between pre-indentation and indentation profiles plotted in (b) with the tip located at x=0. (d) Schematic of the cell surface profile during indentation and the corresponding length scale of the deformation induced by indentation. (e) Measured length scale for an indentation ~2µm for DMSO control l=1.2±0.2µm (n=8 cells) and with blebbistatin treatment (100µM) l=0.8±0.4µm (n=9 cells) (p=0.016).

      The authors demonstrate that the second relaxation timescale increases (Figure 1, Panel D) following a hyperosmotic shock, consistent with cytoplasmic matrix shrinkage, increased friction, and consequently a longer relaxation timescale. While this result aligns with expectations, is a seven-fold increase in the relaxation timescale realistic based on quantitative estimates given the extent of volume loss?

      We thank the reviewer for this interesting question. Upon re-examining our data, we realised that the numerical values in the text related to the average rather than the median of our measurements. The median of the poroelastic time constant increases from ~0.4s in control conditions to 1.4s in sucrose, representing approximately a 3.5-fold increase.

      Previous work showed that HeLa cell volume decreases by ~40% in response to hyperosmotic shock [3]. The fluid volume fraction in cells is ~65-75%. If we assume that the water is contained in N pores of volume , we can express the cell volume as with V<sub>s</sub> the volume of the solid fraction. We can rewrite with ϕ = 0.42 -0.6. As V<sub>s</sub> does not change in response to osmotic shock, we can rewrite the volume change to obtain the change in pore size .

      The poroelastic diffusion constant scales as and the poroelastic timescale scales as . Therefore, the measured change in volume leads to a predicted increase in poroelastic diffusion time of 1.7-1.9-fold, smaller than observed in our experiments. This suggests that some intuition can be gained in a straightforward manner assuming that the cytoplasm is a homogenous porous material.
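
      The predicted 1.7-1.9-fold increase can be reproduced with a short numerical sketch. The scaling relations are not legible in the text above, so the sketch rests on stated assumptions: the fluid sits in N pores whose volume scales as the cube of the pore size, the solid volume and the number of pores are conserved during shrinkage, and the poroelastic timescale scales as the inverse square of the pore size with elasticity, viscosity, and cell dimensions held fixed. The function name is hypothetical.

```python
def timescale_fold_change(fluid_fraction, volume_loss):
    """Predicted fold increase in poroelastic timescale after shrinkage.

    fluid_fraction : initial fluid volume fraction of the cell (e.g. 0.65)
    volume_loss    : fractional loss of total cell volume (e.g. 0.40)
    Assumptions (not taken verbatim from the text): pore volume ~ xi**3,
    solid volume conserved, timescale ~ xi**-2.
    """
    # All of the lost volume is fluid, so the pore volume shrinks by this ratio
    pore_volume_ratio = (fluid_fraction - volume_loss) / fluid_fraction
    pore_size_ratio = pore_volume_ratio ** (1.0 / 3.0)
    # A smaller pore size slows fluid redistribution: timescale ~ 1 / xi**2
    return pore_size_ratio ** -2


for f in (0.65, 0.75):
    fold = timescale_fold_change(f, 0.40)
    print(f"fluid fraction {f:.2f}: timescale increases {fold:.2f}-fold")
```

      With a 40% volume loss and fluid fractions of 65-75%, this gives approximately 1.9-fold and 1.7-fold, matching the range quoted above and falling short of the ~3.5-fold change in the measured medians.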

      However, the reality is more complex and the hydraulic pore size is distinct from the entanglement length of the cytoskeleton mesh, as we discussed in a previous publication [4]. When the fluid fraction becomes sufficiently small, macromolecular crowding will impact diffusion further and non-linearities will arise. We have now added some of these considerations to the discussion.

      If the authors' hypothesis is correct, an essential physiological parameter for the cytoplasm could be the permeability k and how it is modulated by perturbations, such as volume loss or gain. Have you explored whether the data supports the expected square dependency of permeability on hydraulic pore size, as predicted by simple homogeneity assumptions?

      We thank the reviewer for this comment. As discussed above, we have explored such considerations in a previous publication (see discussion in [4]). Briefly, we find that the entanglement length of the F-actin cytoskeleton does play a role in controlling the hydraulic pore size but is distinct from it. Membrane bounded organelles could also contribute to setting the pore size. In our previous publication, we derived a scaling relationship that indicates that four different length-scales contribute to setting cellular rheology: the average filament bundle length, the size distribution of particles in the cytosol, the entanglement length of the cytoskeleton, and the hydraulic pore size. Many of these length-scales can be dynamically controlled by the cell, which gives rise to complex rheology. We have now added these considerations to our discussion.

      Additionally, do you think that the observed decrease in k in mitotic cells compared to interphase cells is significant? I would have expected the opposite naively as mitotic cells tend to swell by 10-20 percent due to the mitotic overshoot at mitotic entry (see Son Journal of Cell Biology 2015 or Zlotek Journal of Cell Biology 2015).

      We thank the reviewer for this interesting question. Based on the same scaling arguments as above, we would expect that a 10-20% increase in cell volume would give rise to a 10-20% increase in the diffusion constant. However, we also note that metaphase leads to a dramatic reorganisation of the cell interior and in particular of membrane-bounded organelles. In summary, we do not know why such a decrease could take place. We now highlight this as an interesting question for further research.

      Based on your results, can you estimate the pore size of the poroelastic cytoplasmic matrix? Is this estimate realistic? I wonder whether this pore size might define a threshold above which the diffusion of freely diffusing species is significantly reduced. Is your estimate consistent with nanobead diffusion experiments reported in the literature? Do you have any insights into the polymer structures that define this pore size? For example, have you investigated whether depolymerizing actin or other cytoskeletal components significantly alters the relaxation timescale?

      We thank the reviewer for this comment. We cannot directly estimate the hydraulic pore size from the measurements performed in the manuscript. Indeed, while we understand the general scaling laws, the pre-factors of such relationships are unknown.

      We carried out experiments aiming at estimating the hydraulic pore size in previous publications [3,4] and others have shown spatial heterogeneity of the cytoplasmic pore size [5]. In our previous experiments, we examined the diffusion of PEGylated quantum dots (14nm in hydrodynamic radius). In isosmotic conditions, these diffused freely through the cell but when the cell volume was decreased by a hyperosmotic shock, they no longer moved [3,4]. This gave an estimate of the pore radius of ~15nm.

      Previous work has suggested that F-actin plays a role in dictating this pore size but microtubules and intermediate filaments do not [4].

      There are no quantifications in Figure 6, nor is there a direct comparison with the model. Based on your model, would you expect the velocity of bleb growth to vary depending on the distance of the bleb from the pipette due to the local depressurization? Specifically, do blebs closer to the pipette grow more slowly?

      We apologise for the oversight. The quantifications are presented in Fig S10 and Fig S12. We have now modified the figure legends accordingly.

      Blebs are very heterogenous in size and growth velocity within a cell and across cells in the population in normal conditions [6]. Other work has shown that bleb size is controlled by a competition between pressure driving growth and actin polymerisation arresting it [7]. Therefore, we did not attempt to determine the impact of depressurisation on bleb growth velocity or size.

      In experiments in which we suddenly increased pressure in blebbing cells, we did notice a change in the rate of growth of blebs that occurred after we increased pressure (Author response image 3). However, the experiments are technically challenging and we decided not to perform more.

      Author response image 3:

      A. A hydraulic link is established between a blebbing cell and a pipette. At time t>0, a step increase in pressure is applied. B. Kymograph of bleb growth in a control cell (top) and in a cell subjected to a pressure increase at t=0s (bottom). Top: In control blebs, the rate of growth is slow and approximately constant over time. The black arrow shows the start of blebbing. Bottom: The black arrow shows the start of blebbing. The dashed line shows the timing of pressure application and the red arrow shows the increase in growth rate of the bleb when the pressure increase reaches the bleb. This occurs with a delay δt.

      I find it interesting that during depressurization of the interphase cells, there is no observed volume change, whereas in pressurization of metaphase cells, there is a volume increase. I assume this might be a matter of timescale, as the microinjection experiments occur on short timescales, not allowing sufficient time for water to escape the cell. Do you observe the radius of the metaphase cells decreasing later on? This relaxation could potentially be used to characterize the permeability of the cell surface.

      We thank the reviewer for this comment.

      First, we would like to clarify that both metaphase and interphase cells increase their volume in response to microinjection. The effect is easier to quantify in metaphase cells because we assume spherical symmetry and just monitor the evolution of the radius (Fig 3). However, the displacement of the beads in interphase cells (Fig 2) clearly shows that the cell volume increases in response to microinjection. For both interphase and metaphase cells, when the injection is prolonged, the membrane eventually detaches from the cortex and large blebs form until cell lysis. In contrast to the reviewer’s intuition, we never observe a relaxation in cell volume, probably because we inject fluid faster than the cell can compensate volume change through regulatory mechanisms involving ion channels.

      When we depressurise metaphase cells, we do not observe any change in volume (Fig S10). This contrasts with the increase that we observe upon pressurisation. The main difference between these two experiments is the pressure differential. During depressurisation experiments, this is the hydraulic pressure within the cell ~500Pa (Fig 6A); whereas during pressurisation experiments, this is the pressure in the micropipette, ranging from 1.4-10 kPa (Fig 3). We note in particular that, when we used the lowest pressures in our experiments, the increase in volume was very slow (see Fig 3C). Therefore, we agree with the reviewer that it is likely the magnitude of the pressure differential that explains these differences.

      I am curious about the saturation of the time lag at 30 microns from the pipette in Figure 4, Panel E for the model's prediction. A saturation which is not clearly observed in the experimental data. Could you comment on the origin of this saturation and the observed discrepancy with the experiments (Figure E panel 2)? Naively, I would have expected the time lag to scale quadratically with the distance from the pipette, as predicted by a poroelastic model and the diffusion of displacement. It seems weird to me that the beads start to move together at some distance from the pipette or else I would expect that they just stop moving. What model parameters influence this saturation? Does membrane permeability contribute to this saturation?

      We thank the reviewer for pointing this out. In our opinion, the saturation occurring at 30 microns arises from the geometry of the model. At the largest distance away from the micropipette, the cortex becomes dominant in the mechanical response of the cell because it represents an increasing proportion of the cellular material.

      To test this hypothesis, we will rerun our finite element models with a range of cell sizes. This will be added to the manuscript at a later date.

      Reviewer #3 (Public review):

      Weaknesses: I have two broad critical comments:

      (1) I sense that the authors are correct that the best explanation of their results is the passive poroelastic model. Yet, to be thorough, they have to try to explain the experiments with other models and show why their explanation is parsimonious. For example, one potential explanation could be some mechanosensitive mechanism that does not involve cytoplasmic flow; another could be viscoelastic cytoskeletal mesh, again not involving poroelasticity. I can imagine more possibilities. Basically, be more thorough in the critical evaluation of your results. Besides, discuss the potential effect of significant heterogeneity of the cell.

      We thank the reviewer for these comments and we agree with their general premise.

      Some observations could qualitatively be explained in other ways. For example, if we considered the cell as a viscoelastic material, we could define a time constant τ ~ η/E, with η the viscosity and E the elasticity of the material. The increase in relaxation time with sucrose treatment could then be explained by an increase in viscosity. However, work by others has previously shown that, in the exact same conditions as our experiment, viscoelasticity cannot account for the observations [1]. In its discussion, this study proposed poroelasticity as an alternative mechanism but did not investigate that possibility. This was consistent with our work that showed that the cytoplasm behaves as a poroelastic material and not as a viscoelastic material [4]. Therefore, we decided not to consider viscoelasticity as a possibility. We now explain this reasoning better and have added a sentence about a potential role for mechanotransductory processes in the discussion.

      (2) The study is rich in biophysics but a bit light on chemical/genetic perturbations. It could be good to use low levels of chemical inhibitors for, for example, Arp2/3, PI3K, myosin etc, and see the effect and try to interpret it. Another interesting question - how adhesive strength affects the results. A different interesting avenue - one can perturb aquaporins. Etc. At least one perturbation experiment would be good.

      We agree with the reviewer. In our previous studies, we already examined what biological structures affect the poroelastic properties of cells [2,4]. Therefore, the most interesting aspect to examine in our current work would be perturbations to the phenomenon described in Fig 6G and, in particular, to investigate what volume regulation mechanisms enable sustained intracellular pressure gradients. However, these experiments are particularly challenging and have very low throughput. Therefore, we feel that they are out of the scope of the present report and we mention them as promising future directions.

      References:

      (1) Rosenbluth, M. J., Crow, A., Shaevitz, J. W. & Fletcher, D. A. Slow stress propagation in adherent cells. Biophys J 95, 6052-6059 (2008). https://doi.org/10.1529/biophysj.108.139139

      (2) Esteki, M. H. et al. Poroelastic osmoregulation of living cell volume. iScience 24, 103482 (2021). https://doi.org/10.1016/j.isci.2021.103482

      (3) Charras, G. T., Mitchison, T. J. & Mahadevan, L. Animal cell hydraulics. J Cell Sci 122, 3233-3241 (2009). https://doi.org/10.1242/jcs.049262

      (4) Moeendarbary, E. et al. The cytoplasm of living cells behaves as a poroelastic material. Nat Mater 12, 253-261 (2013). https://doi.org/10.1038/nmat3517

      (5) Luby-Phelps, K., Castle, P. E., Taylor, D. L. & Lanni, F. Hindered diffusion of inert tracer particles in the cytoplasm of mouse 3T3 cells. Proc Natl Acad Sci U S A 84, 4910-4913 (1987). https://doi.org/10.1073/pnas.84.14.4910

      (6) Charras, G. T., Coughlin, M., Mitchison, T. J. & Mahadevan, L. Life and times of a cellular bleb. Biophys J 94, 1836-1853 (2008). https://doi.org/10.1529/biophysj.107.113605

      (7) Tinevez, J. Y. et al. Role of cortical tension in bleb growth. Proc Natl Acad Sci U S A 106, 18581-18586 (2009). https://doi.org/10.1073/pnas.0903353106

    1. Author Response

      eLife assessment

      This potentially valuable study uses classic neuroanatomical techniques and synchrotron X-ray tomography to investigate the mapping of the trunk within the brainstem nuclei of the elephant brain. Given its unique specializations, understanding the somatosensory projections from the elephant trunk would be of general interest to evolutionary neurobiologists, comparative neuroscientists, and animal behavior scientists. However, the anatomical analysis is inadequate to support the authors' conclusion that they have identified the elephant trigeminal sensory nuclei rather than a different brain region, specifically the inferior olive.

      Comment: We are happy that our paper is considered to be potentially valuable. Also, the editors highlight the potential interest of our work for evolutionary neurobiologists, comparative neuroscientists, and animal behavior scientists. The editors are more negative when it comes to our evidence on the identification of the trigeminal nucleus vs the inferior olive. We have five comments on this assessment. (i) We think this assessment is heavily biased by the comments of referee 2. We will show that the referee’s comments are more about us than about our paper. Hence, the referee failed to do their job (refereeing our paper) and should not have succeeded in leveling our paper. (ii) We have no ad hoc knock-out experiments to distinguish the trigeminal nucleus vs the inferior olive. Such experiments (extracellular recording & electrolytic lesions, viral tracing) would be done in a week in mice, but they cannot and should not be done in elephants. (iii) We have extraordinary evidence. Nobody has ever described a similarly astonishing match of body (trunk folds) and myeloarchitecture in the trigeminal system before. (iv) We will show that our assignment of the trigeminal nucleus vs the inferior olive is more plausible than the current hypothesis about the assignment of the trigeminal nucleus vs the inferior olive as defended by referee 2. We think this is why it is important to publish our paper. (v) We think eLife is the perfect place for our publication because the deviating views of referee 2 are published alongside it.

      Change: We performed additional peripherin-antibody staining to differentiate the inferior olive and trigeminal nucleus. Peripherin is a cytoskeletal protein that is found in peripheral nerves and climbing fibers. Specifically, climbing fibers of various species (mouse, rabbit, pig, cow, and human; Errante et al., 1998) are stained intensely with peripherin-antibodies. What is tricky for our purposes is that there is also some peripherin-antibody reactivity in the trigeminal nuclei (Errante et al., 1998). Such peripherin-antibody reactivity is weaker, however, and lacks the distinct axonal bundle signature that stems from the strong climbing fiber peripherin-reactivity as seen in the inferior olive (Errante et al., 1998). As can be seen in Author response image 1, we observe peripherin-reactivity in axonal bundles (i.e. in putative climbing fibers), in what we think is the inferior olive. We also observe weak peripherin-reactivity, in what we think is the trigeminal nucleus, but not the distinct and strong labeling of axonal bundles. These observations are in line with our ideas but are difficult to reconcile with the views of the referee. Specifically, the lack of peripherin-reactive axon bundles suggests that there are no climbing fibres in what the referee thinks is the inferior olive.

      Errante, L., Tang, D., Gardon, M., Sekerkova, G., Mugnaini, E., & Shaw, G. (1998). The intermediate filament protein peripherin is a marker for cerebellar climbing fibres. Journal of neurocytology, 27, 69-84.

      Author response image 1.

      The putative inferior olive but not the putative trigeminal nucleus contains peripherin-positive axon bundles (presumptive climbing fibers). (A) Overview picture of a brainstem section stained with anti-peripherin-antibodies (white color). Anti-peripherin-antibodies stain climbing fibers in a wide variety of mammals. The section comes from the posterior brainstem of African elephant cow Bibi; in this posterior region, both putative inferior olive and trigeminal nucleus are visible. Note the bright staining of the dorsolateral nucleus, the putative inferior olive according to Reveyaz et al., and the trigeminal nucleus according to Maseko et al., 2013. (B) High magnification view of the dorsolateral nucleus (corresponding to the upper red rectangle in A). Anti-peripherin-positive axon bundles (putative climbing fibers) are seen in support of the inferior olive hypothesis of Reveyaz et al. (C) High magnification view of the ventromedial nucleus (corresponding to the lower red rectangle in A). The ventromedial nucleus is weakly positive for peripherin but contains no anti-peripherin-positive axon bundles (i.e. no putative climbing fibers) in support of the trigeminal nucleus hypothesis of Reveyaz et al. Note that myelin stripes – weakly visible as dark omissions – are clearly anti-peripherin-negative.

      Reviewer #1:

      Summary:

      This fundamental study provides compelling neuroanatomical evidence underscoring the sensory function of the trunk in African and Asian elephants. Whereas myelinated tracts are classically appreciated as mediating neuronal connections, the authors speculate that myelinated bundles provide functional separation of trunk folds and display elaboration related to the "finger" projections. The authors avail themselves of many classical neuroanatomical techniques (including cytochrome oxidase stains, Golgi stains, and myelin stains) along with modern synchrotron X-ray tomography. This work will be of interest to evolutionary neurobiologists, comparative neuroscientists, and the general public, with its fascinating exploration of the brainstem of an icon sensory specialist.

      Comment: We are incredibly grateful for this positive assessment.

      Changes: None.

      Strengths:

      • The authors made excellent use of the precious sample materials from 9 captive elephants.

      • The authors adopt a battery of neuroanatomical techniques to comprehensively characterize the structure of the trigeminal subnuclei and properly re-examine the "inferior olive".

      • Based on their exceptional histological preparation, the authors reveal broadly segregated patterns of metabolic activity, similar to the classical "barrel" organization related to rodent whiskers.

      Comment: The referee provides a concise summary of our findings.

      Changes: None.

      Weaknesses:

      • As the authors acknowledge, somewhat limited functional description can be provided using histological analysis (compared to more invasive techniques).

      • The correlation between myelinated stripes and trunk fold patterns is intriguing, and Figure 4 presents this idea beautifully. I wonder - is the number of stripes consistent with the number of trunk folds? Does this hold for both species?

      Comment: We agree with the referee’s assessment. We note that cytochrome-oxidase staining is an at least partially functional stain, as it reveals constitutive metabolic activity. A significant problem of the work in elephants is that our recording possibilities are limited, which in turn limits functional analysis. As indicated in Figure 4 for the African elephant Indra, there was an excellent match of trunk folds and myelin stripes. Asian elephants have more, and less conspicuous trunk folds than African elephants. As illustrated in Figure 6, Asian elephants have more, and less conspicuous myelin stripes. Thus, species differences in myelin stripes correlate with species differences in trunk folds.

      Changes: We clarify the relation of myelin stripe and trunk fold patterns in our discussion of Figure 6.  

      Reviewer #2 (Public Review):

      The authors describe what they assert to be a very unusual trigeminal nuclear complex in the brainstem of elephants, and based on this, follow with many speculations about how the trigeminal nuclear complex, as identified by them, might be organized in terms of the sensory capacity of the elephant trunk.

      Comment: We agree with the referee’s assessment that the putative trigeminal nucleus described in our paper is highly unusual in size, position, vascularization, and myeloarchitecture. This is why we wrote this paper. We think these unusual features reflect the unique facial specializations of elephants, i.e. their highly derived trunk. Because we have no access to recordings from the elephant brainstem, we cannot back up all our functional interpretations with electrophysiological evidence; it is therefore fair to call them speculative.

      Changes: None.

      The identification of the trigeminal nuclear complex/inferior olivary nuclear complex in the elephant brainstem is the central pillar of this manuscript from which everything else follows, and if this is incorrect, then the entire manuscript fails, and all the associated speculations become completely unsupported.

      Comment: We agree.

      Changes: None.

      The authors note that what they identify as the trigeminal nuclear complex has been identified as the inferior olivary nuclear complex by other authors, citing Shoshani et al. (2006; 10.1016/j.brainresbull.2006.03.016) and Maseko et al (2013; 10.1159/000352004), but fail to cite either Verhaart and Kramer (1958; PMID 13841799) or Verhaart (1962; 10.1515/9783112519882-001). These four studies are in agreement, but the current study differs.

      Comment & Change: We were not aware of the papers of Verhaart and included them in the revised ms.

      Let's assume for the moment that the four previous studies are all incorrect and the current study is correct. This would mean that the entire architecture and organization of the elephant brainstem is significantly rearranged in comparison to ALL other mammals, including humans, previously studied (e.g. Kappers et al. 1965, The Comparative Anatomy of the Nervous System of Vertebrates, Including Man, Volume 1 pp. 668-695) and the closely related manatee (10.1002/ar.20573). This rearrangement necessitates that the trigeminal nuclei would have had to "migrate" and shorten rostrocaudally, specifically and only, from the lateral aspect of the brainstem where these nuclei extend from the pons through to the cervical spinal cord (e.g. the Paxinos and Watson rat brain atlases), to the spatially restricted ventromedial region of specifically and only the rostral medulla oblongata. According to the current paper, the inferior olivary complex of the elephant is very small and located lateral to their trigeminal nuclear complex, and the region from where the trigeminal nuclei are located by others appears to be just "lateral nuclei" with no suggestion of what might be there instead.

      Comment: We have three comments here:

      1) The referee correctly notes that we argue the elephant brainstem underwent fairly major rearrangements. In particular, we argue that the elephant inferior olive was displaced laterally, by a very large cell mass, which we argue is an unusually large trigeminal nucleus. To our knowledge, such a large compact cell mass is not seen in the ventral brain stem of any other mammal.

      2) The referee makes it sound as if it is our private idea that the elephant brainstem underwent major rearrangements and that the rest of the evidence points to a conventional ‘rodent-like’ architecture. This is far from the truth, however. Already from the outside appearance (see our Figure 1B and Figure 6A) it is clear that the elephant brainstem has huge ventral bumps not seen in any other mammal. An extraordinary architecture also holds at the organizational level of nuclei. Specifically, the facial nucleus – the most carefully investigated nucleus in the elephant brainstem – has an appearance distinct from that of the facial nuclei of all other mammals (Maseko et al., 2013; Kaufmann et al., 2022). If both the overall shape and the constituting nuclei of the brainstem are very different from other mammals, it is very unlikely if not impossible that the elephant brainstem follows in all regards a conventional ‘rodent-like’ architecture.

      3) The inferior olive is an impressive nucleus in the partitioning scheme we propose (Author response image 1). In fact – together with the putative trigeminal nucleus we describe – it’s the most distinctive nucleus in the elephant brainstem. We have not done volumetric measurements and cell counts here, but think this is an important direction for future work. What has informed our work is that the inferior olive nucleus we describe has the serrated organization seen in the inferior olive of all mammals. We will discuss these matters in depth below.

      Changes: None.

      Such an extraordinary rearrangement of brainstem nuclei would require a major transformation in the manner in which the mutations, patterning, and expression of genes and associated molecules during development occur. Such a major change is likely to lead to lethal phenotypes, making such a transformation extremely unlikely. Variations in mammalian brainstem anatomy are most commonly associated with quantitative changes rather than qualitative changes (10.1016/B978-0-12-804042-3.00045-2).

      Comment: We have two comments here:

      1) The referee claims that it is impossible that the elephant brainstem differs from a conventional brainstem architecture because this would lead to lethal phenotypes etc. Following our previous response, this argument does not hold. It is beyond question that the elephant brainstem looks very different from the brainstem of other mammals. Yet, it is also evident that elephants live. The debate we need to have is not if the elephant brainstem differs from other mammals, but how it differs from other mammals.

      2) In principle we agree with the referee’s thinking that the model of the elephant brainstem that is most likely correct is the one that requires the fewest rearrangements relative to other mammals. We therefore prepared a comparison of the model the referee is proposing (Maseko et al., 2013; see Author response table 1 below) with our proposition. We scored these models on their similarity to other mammals. We find that the referee’s ideas (Maseko et al., 2013) require more rearrangements relative to other mammals than our suggestion.

      Changes: Inclusion of Author response table 1, which we discuss in depth below.

      The impetus for the identification of the unusual brainstem trigeminal nuclei in the current study rests upon a previous study from the same laboratory (10.1016/j.cub.2021.12.051) that estimated that the number of axons contained in the infraorbital branch of the trigeminal nerve that innervate the sensory surfaces of the trunk is approximately 400 000. Is this number unusual? In a much smaller mammal with a highly specialized trigeminal system, the platypus, the number of axons innervating the sensory surface of the platypus bill skin comes to 1 344 000 (10.1159/000113185). Yet, there is no complex rearrangement of the brainstem trigeminal nuclei in the brain of the developing or adult platypus (Ashwell, 2013, Neurobiology of Monotremes), despite the brainstem trigeminal nuclei being very large in the platypus (10.1159/000067195). Even in other large-brained mammals, such as large whales that do not have a trunk, the number of axons in the trigeminal nerve ranges between 400,000 and 500,000 (10.1007/978-3-319-47829-6_988-1). The lack of comparative support for the argument forwarded in the previous and current study from this laboratory, and that the comparative data indicates that the brainstem nuclei do not change in the manner suggested in the elephant, argues against the identification of the trigeminal nuclei as outlined in the current study. Moreover, the comparative studies undermine the prior claim of the authors, informing the current study, that "the elephant trigeminal ganglion ... point to a high degree of tactile specialization in elephants" (10.1016/j.cub.2021.12.051). While clearly, the elephant has tactile sensitivity in the trunk, it is questionable as to whether what has been observed in elephants is indeed "truly extraordinary".

      Comment: These comments made us think that the referee is not talking about the paper we submitted, but that the referee is talking about us and our work in general. Specifically, the referee refers to the platypus and other animals dismissing our earlier work, which argued for a high degree of tactile specialization in elephants. We think the referee’s intuitions are wrong and our earlier work is valid.

      Changes: We prepared Author response image 2 (below), which puts the platypus brain, a monkey brain, and the elephant trigeminal ganglion (which contains a large part of the trunk-innervating cells) in perspective.

      Author response image 2.

      The elephant trigeminal ganglion is comparatively large. Platypus brain, monkey brain, and elephant ganglion. The elephant has two trigeminal ganglia, which contain the first-order somatosensory neurons. They serve mainly for tactile processing and are large compared to a platypus brain (from the comparative brain collection) and are similar in size to a monkey brain. The idea that elephants might be highly specialized for trunk touch is also supported by the analysis of the sensory nerves of these animals (Purkart et al., 2022). Specifically, we find that the infraorbital nerve (which innervates the trunk) is much thicker than the optic nerve (which mediates vision) and the vestibulocochlear nerve (which mediates hearing). Thus, not everything is large about elephants; instead, the data argue that these animals are heavily specialized for trunk touch.

      But let's look more specifically at the justification outlined in the current study to support their identification of the unusually located trigeminal sensory nuclei of the brainstem.

      (1) Intense cytochrome oxidase reactivity.

      (2) Large size of the putative trunk module.

      (3) Elongation of the putative trunk module.

      (4) The arrangement of these putative modules corresponds to elephant head anatomy.

      (5) Myelin stripes within the putative trunk module that apparently match trunk folds.

      (6) Location apparently matches other mammals.

      (7) Repetitive modular organization apparently similar to other mammals.

      (8) The inferior olive described by other authors lacks the lamellated appearance of this structure in other mammals.

      Comment: We agree those are key issues.

      Changes: None.

      Let's examine these justifications more closely.

      (1) Cytochrome oxidase histochemistry is typically used as an indicative marker of neuronal energy metabolism. The authors indicate, based on the "truly extraordinary" somatosensory capacities of the elephant trunk, that any nuclei processing this tactile information should be highly metabolically active, and thus should react intensely when stained for cytochrome oxidase. We are told in the methods section that the protocols used are described by Purkart et al (2022) and Kaufmann et al (2022). In neither of these cited papers is there any description, nor mention, of the cytochrome oxidase histochemistry methodology, thus we have no idea of how this histochemical staining was done. To obtain the best results for cytochrome oxidase histochemistry, the tissue is either processed very rapidly after buffer perfusion to remove blood or in recently perfusion-fixed tissue (e.g., 10.1016/0165-0270(93)90122-8). Given: (1) the presumably long post-mortem interval between death and fixation - "it often takes days to dissect elephants"; (2) subsequent fixation of the brains in 4% paraformaldehyde for "several weeks"; (3) The intense cytochrome oxidase reactivity in the inferior olivary complex of the laboratory rat (Gonzalez-Lima, 1998, Cytochrome oxidase in neuronal metabolism and Alzheimer's diseases); and (4) The lack of any comparative images from other stained portions of the elephant brainstem; it is difficult to support the justification as forwarded by the authors. The histochemical staining observed is likely background reactivity from the use of diaminobenzidine in the staining protocol. Thus, this first justification is unsupported.

      Comment: The referee correctly notes the description of our cytochrome-oxidase reactivity staining was lacking. This is a serious mistake of ours for which we apologize very much. The referee then makes it sound as if we messed up our cytochrome-oxidase staining, which is not the case. All successful (n = 3; please see our technical comments in the recommendation section) cytochrome-oxidase stainings were done with elephants with short post-mortem times (≤ 2 days) to brain removal/cooling and only brief immersion fixation (≤ 1 day). Cytochrome-oxidase reactivity in elephant brains appears to be more sensitive to quenching by fixation than is the case for rodent brains. We think it is a good idea to include a cytochrome-oxidase staining overview picture because we understood from the referee’s comments that we need to compare our partitioning scheme of the brainstem with that of other authors. To this end, we add a cytochrome-oxidase staining overview picture (Author response image 3) along with an alternative interpretation from Maseko et al., 2013.

      Changes: 1) We added details on our cytochrome-oxidase reactivity staining protocol and on cytochrome-oxidase reactivity in the elephant brain in general in the recommendation section.

      2) We provide a detailed discussion of the technicalities of cytochrome-oxidase staining below in the recommendation section, where the referee raised further criticisms.

      3) We include a cytochrome-oxidase staining overview picture (Author response image 3) along with an alternative interpretation from Maseko et al., 2013.

      Author response image 3.

      Cytochrome-oxidase staining overview along with the Maseko et al. (2013) scheme. Left, coronal cytochrome-oxidase staining overview from African elephant cow Indra; the section is taken a few millimeters posterior to the facial nucleus. Brown is putatively neural cytochrome-reactivity, and white is the background. Black is myelin diffraction and (seen at higher resolution, when you zoom in) erythrocyte cytochrome-reactivity in blood vessels (see our Figure 1E-G); such blood vessel cytochrome-reactivity is seen, because we could not perfuse the animal. There appears to be a minimal outside-in-fixation artifact (i.e. a more whitish/non-brownish appearance of the section toward the borders of the brain). This artifact is not seen in sections from Indra that we processed earlier or in other elephant brains processed at shorter post-mortem/fixation delays (see our Figure 1C). Right, coronal partitioning scheme of Maseko et al. (2013) for the elephant brainstem at an approximately similar anterior-posterior level.

      The same structures can be recognized left and right. The section is taken at an anterior-posterior level, where we encounter the trigeminal nuclei in pretty much all mammals. Note that the neural cytochrome reactivity is very high, in what we refer to as the trigeminal-nuclei-trunk-module and what Maseko et al. refer to as inferior olive. Myelin stripes can be recognized here as white omissions.

      At the same time, the cytochrome-oxidase-reactivity is very low in what Maseko et al. refer to as trigeminal nuclei. The indistinct appearance and low cytochrome-oxidase-reactivity of the trigeminal nuclei in the scheme of Maseko et al. (2013) is unexpected because trigeminal nuclei stain intensely for cytochrome-oxidase-reactivity in most mammals and because the trigeminal nuclei represent the elephant’s most important body part, the trunk. Staining patterns of the trigeminal nuclei as identified by Maseko et al. (2013) are very different at more posterior levels; we will discuss this matter below.

      Justifications (2), (3), and (4) are sequelae from justification (1). In this sense, they do not count as justifications, but rather unsupported extensions.

      Comment: These are key points of our paper that the referee does not discuss.

      Changes: None.

      (4) and (5) These are interesting justifications, as the paper has clear internal contradictions, and (5) is a sequelae of (4). The reader is led to the concept that the myelin tracts divide the nuclei into sub-modules that match the folding of the skin on the elephant trunk. One would then readily presume that these myelin tracts are in the incoming sensory axons from the trigeminal nerve. However, the authors note that this is not the case: "Our observations on trunk module myelin stripes are at odds with this view of myelin. Specifically, myelin stripes show no tapering (which we would expect if axons divert off into the tissue). More than that, there is no correlation between myelin stripe thickness (which presumably correlates with axon numbers) and trigeminal module neuron numbers. Thus, there are numerous myelinated axons, where we observe few or no trigeminal neurons. These observations are incompatible with the idea that myelin stripes form an axonal 'supply' system or that their prime function is to connect neurons. What do myelin stripe axons do, if they do not connect neurons? We suggest that myelin stripes serve to separate rather than connect neurons." So, we are left with the observation that the myelin stripes do not pass afferent trigeminal sensory information from the "truly extraordinary" trunk skin somatic sensory system, and rather function as units that separate neurons - but to what end? It appears that the myelin stripes are more likely to be efferent axonal bundles leaving the nuclei (to form the olivocerebellar tract). This justification is unsupported.

      Comment: The referee cites some of our observations on myelin stripes, which we find unusual. We stand by the observations and comments. The referee does not discuss the most crucial finding we report on myelin stripes, namely that they correspond remarkably well to trunk folds.

      Changes: None.

      (6) The authors indicate that the location of these nuclei matches that of the trigeminal nuclei in other mammals. This is not supported in any way. In ALL other mammals in which the trigeminal nuclei of the brainstem have been reported they are found in the lateral aspect of the brainstem, bordered laterally by the spinal trigeminal tract. This is most readily seen and accessible in the Paxinos and Watson rat brain atlases. The authors indicate that the trigeminal nuclei are medial to the facial nerve nucleus, but in every other species, the trigeminal sensory nuclei are found lateral to the facial nerve nucleus. This is most salient when examining a close relative, the manatee (10.1002/ar.20573), where the location of the inferior olive and the trigeminal nuclei matches that described by Maseko et al (2013) for the African elephant. This justification is not supported.

      Comment: The referee notes that we incorrectly state that the position of the trigeminal nuclei matches that of other mammals. We think this criticism is justified.

      Changes: We prepared a comparison of the Maseko et al. (2013) scheme of the elephant brainstem with our scheme of the elephant brainstem (see Author response table 1). Here we acknowledge the referee’s argument and we also changed the manuscript accordingly.

      (7) The dual to quadruple repetition of rostrocaudal modules within the putative trigeminal nucleus as identified by the authors relies on the fact that in the neurotypical mammal, there are several trigeminal sensory nuclei arranged in a column running from the pons to the cervical spinal cord, these include (nomenclature from Paxinos and Watson in roughly rostral to caudal order) the Pr5VL, Pr5DM, Sp5O, Sp5I, and Sp5C. However, these nuclei are all located far from the midline and lateral to the facial nerve nucleus, unlike what the authors describe in the elephants. These rostrocaudal modules are expanded upon in Figure 2, and it is apparent from what is shown that the authors are attributing other brainstem nuclei to the putative trigeminal nuclei to confirm their conclusion. For example, what they identify as the inferior olive in Figure 2D is likely the lateral reticular nucleus as identified by Maseko et al (2013). This justification is not supported.

      Comment: The referee again compares our findings to the scheme of Maseko et al. (2013) and rejects our conclusions on those grounds. We think such a comparison of our scheme is needed, indeed.

      Changes: We prepared a comparison of the Maseko et al. (2013) scheme of the elephant brainstem with our scheme of the elephant brainstem (see Author response table 1).

      (8) In primates and related species, there is a distinct banded appearance of the inferior olive, but what has been termed the inferior olive in the elephant by other authors does not have this appearance, rather, and specifically, the largest nuclear mass in the region (termed the principal nucleus of the inferior olive by Maseko et al, 2013, but Pr5, the principal trigeminal nucleus in the current paper) overshadows the partial banded appearance of the remaining nuclei in the region (but also drawn by the authors of the current paper). Thus, what is at debate here is whether the principal nucleus of the inferior olive can take on a nuclear shape rather than evince a banded appearance. The authors of this paper use this variance as justification that this cluster of nuclei could not possibly be the inferior olive. Such a "semi-nuclear/banded" arrangement of the inferior olive is seen in, for example, giraffe (10.1016/j.jchemneu.2007.05.003), domestic dog, polar bear, and most specifically the manatee (a close relative of the elephant) (brainmuseum.org; 10.1002/ar.20573). This justification is not supported.

      Comment: We carefully looked at the brain sections referred to by the referee in the brainmuseum.org collection. We found, contrary to the referee’s claims, that dogs, polar bears, and manatees have a perfectly serrated appearance of the inferior olive (a cellular arrangement in curved bands). Accordingly, we think the referee is not reporting the comparative evidence fairly and we wonder why this is the case.

      Changes: None.

      Thus, all the justifications forwarded by the authors are unsupported. Based on methodological concerns, prior comparative mammalian neuroanatomy, and prior studies in the elephant and closely related species, the authors fail to support their notion that what was previously termed the inferior olive in the elephant is actually the trigeminal sensory nuclei. Given this failure, the justifications provided above that are sequelae also fail. In this sense, the entire manuscript and all the sequelae are not supported.

      Comment: We disagree. To summarize:

      (1) Our description of the cytochrome oxidase staining lacked methodological detail, which we have now added; the cytochrome oxidase reactivity data are great and support our conclusions.

      (2)–(5) The referee does not really discuss our evidence on these points.

      (6) We were wrong and have now fixed this mistake.

      (7) The referee asks for a comparison to the Maseko et al. (2013) scheme (agreed, see Author response image 4 and Author response table 1).

      (8) The referee bends the comparative evidence against us.

      Changes: None.

      A comparison of the elephant brainstem partitioning schemes put forward by Maseko et al 2013 and by Reveyaz et al.

      To start with, we would like to express our admiration for the work of Maseko et al. (2013). These authors did pioneering work on obtaining high-quality histology samples from elephants. Moreover, they made a heroic neuroanatomical effort, in which they assigned 147 brain structures to putative anatomical entities. Most of their data appear to refer to staining in a single elephant and one coronal sectioning plane. The data quality and the illustration of results are excellent.

      We studied mainly two large nuclei in six (now 7) elephants in three (coronal, parasagittal, and horizontal) sectioning planes. The two nuclei in question are the two most distinct nuclei in the elephant brainstem, namely an anterior ventromedial nucleus (the trigeminal trunk module in our terminology; the inferior olive in the terminology of Maseko et al., 2013) and a more posterior lateral nucleus (the inferior olive in our terminology; the posterior part of the trigeminal nuclei in the terminology of Maseko et al., 2013).

      Author response image 4 gives an overview of the two partitioning schemes for inferior olive/trigeminal nuclei along with the rodent organization (see below).

      Author response image 4.

      Overview of the brainstem organization in rodents & elephants according to Maseko et al. (2013) and Reveyaz et al. (this paper).

      The strength of the Maseko et al. (2013) scheme is the excellent match of the position of elephant nuclei to the position of nuclei in the rodent (Author response image 4). We think this positional match reflects the fact that Maseko et al. (2013) mapped a rodent partitioning scheme on the elephant brainstem. To us, this is a perfectly reasonable mapping approach. As the referee correctly points out, the positional similarity of both elephant inferior olive and trigeminal nuclei to the rodent strongly argues in favor of the Maseko et al. (2013) scheme, because brainstem nuclei are positionally very conservative.

      Other features of the Maseko et al. (2013) scheme are less favorable. The scheme marries two cyto-architectonically very distinct divisions (an anterior indistinct part) and a super-distinct serrated posterior part to be the trigeminal nuclei. We think merging entirely distinct subdivisions into one nucleus is a byproduct of mapping a rodent partitioning scheme on the elephant brainstem. Neither of the two subdivisions resemble the trigeminal nuclei of other mammals. The cytochrome oxidase staining patterns differ markedly across the anterior indistinct part (see our Author response image 4) and the posterior part of the trigeminal nuclei and do not match with the intense cytochrome oxidase reactivity of other mammalian trigeminal nuclei (Referee Figure 3). Our anti-peripherin staining indicates that there probably no climbing fibers, in what Maseko et al. think. is inferior olive; this is a potentially fatal problem for the hypothesis. The posterior part of Maseko et al. (2013) trigeminal nuclei has a distinct serrated appearance that is characteristic of the inferior olive in other mammals. Moreover, the inferior olive of Maseko et al. (2013) lacks the serrated appearance of the inferior olive seen in pretty much all mammals; this is a serious problem.

      The partitioning scheme of Reveyaz et al. comes with poor positional similarity but avoids the other problems of the Maseko et al. (2013) scheme. Our explanation for the positionally deviating location of the trigeminal nuclei is that the elephant grew one of the largest, if not the largest, trigeminal systems of all mammals. As a result, the trigeminal nuclei grew through the floor of the brainstem. We understand this is a post hoc just-so explanation, but at least it is an explanation.

      The scheme of Reveyaz et al. was derived in an entirely different way from the Maseko model. Specifically, we were convinced that the elephant trigeminal nuclei ought to be very special because of the gigantic trigeminal ganglia (Purkart et al., 2022). Cytochrome-oxidase staining revealed a large distinct nucleus with an elongated shape. Initially, we were freaked out by the position of the nucleus and the fact that it was referred to as the inferior olive by other authors. When we found an inferior-olive-like nucleus at a nearby (although admittedly unusual) location, we were less worried. We then optimized the visualization of myelin stripes (brightfield imaging etc.) and were able to collect an entire elephant trunk along with the brain (African elephant cow Indra). When we made the one-to-one match of Indra’s trunk folds and myelin stripes (Figure 4) we were certain that we had identified the trunk module of the trigeminal nuclei. We already noted at the outset of our rebuttal that we now consider such certainty a fallacy of overconfidence. In light of the comments of Referee 2, we feel that a further discussion of our ideas is warranted. A strength of the Reveyaz model is that nuclei look like single anatomical entities. The trigeminal nuclei look like trigeminal nuclei of other mammals, the trunk module has a striking resemblance to the trunk and the inferior olive looks like the inferior olive of other mammals.

      We evaluated the fit of the two models in the form of a table (Author response table 1; below). Unsurprisingly, Author response table 1 aligns with our views of elephant brainstem partitioning.

      Author response table 1.

      Qualitative evaluation of elephant brainstem partitioning schemes

      ++ = very attractive; + = attractive; - = unattractive; -- = very unattractive. We scored features that are clear and shared by all mammals – as far as we know them – as very attractive. We scored features that are clear and are not shared by all mammals – as far as we know them – as very unattractive. Attractive features are either less clear or less well-shared features. Unattractive features are either less clear or less clearly not shared features.

      Author response table 1 suggests two conclusions to us. (i) The Reveyaz et al. model has mainly favorable properties. The Maseko et al. (2013) model has mainly unfavorable properties. Hence, the Reveyaz et al. model is more likely to be true. (ii) The outcome is not black and white, i.e., both models have favorable and unfavorable properties. Accordingly, we overstated our case in our initial submission and toned down our claims in the revised manuscript.

      What the authors have not done is to trace the pathway of the large trigeminal nerve in the elephant brainstem, as was done by Maseko et al (2013), which clearly shows the internal pathways of this nerve, from the branch that leads to the fifth mesencephalic nucleus adjacent to the periventricular grey matter, through to the spinal trigeminal tract that extends from the pons to the spinal cord in a manner very similar to all other mammals. Nor have they shown how the supposed trigeminal information reaches the putative trigeminal nuclei in the ventromedial rostral medulla oblongata. These are but two examples of many specific lines of evidence that would be required to support their conclusions. Clearly, tract tracing methods, such as cholera toxin tracing of peripheral nerves cannot be done in elephants, thus the neuroanatomy must be done properly and with attention to detail to support the major changes indicated by the authors.

      Comment: The referee claims that Maseko et al. (2013) showed by ‘tract tracing’ that the structures they refer to as trigeminal nuclei receive trigeminal input. This statement is at least slightly misleading. There is nothing that amounts to proper ‘tract tracing’ in the Maseko et al. (2013) paper, i.e. tracing of tracts with post-mortem tracers. We tried proper post-mortem tracing but failed (no tracer transport), probably as a result of the limitations of our elephant material. What Maseko et al. (2013) actually did is look a bit for putative trigeminal fibers and where they might go. We also used this approach. In our hands, such ‘pseudo tract tracing’ works best in unstained material under bright field illumination, because myelin is very well visualized. In such material, we find: (i) massive fiber tracts descending dorsoventrally roughly from where both Maseko et al. (2013) and we think the trigeminal tract runs. (ii) These fiber tracts run dorsoventrally and approach what we think are the trigeminal nuclei from the lateral side.

      Changes: Ad hoc tract tracing; see above.

      So what are these "bumps" in the elephant brainstem?

      Four previous authors indicate that these bumps are the inferior olivary nuclear complex. Can this be supported?

      The inferior olivary nuclear complex acts "as a relay station between the spinal cord (n.b. trigeminal input does reach the spinal cord via the spinal trigeminal tract) and the cerebellum, integrating motor and sensory information to provide feedback and training to cerebellar neurons" (https://www.ncbi.nlm.nih.gov/books/NBK542242/). The inferior olivary nuclear complex is located dorsal and medial to the pyramidal tracts (which were not labeled in the current study by the authors but are clearly present in Fig. 1C and 2A) in the ventromedial aspect of the rostral medulla oblongata. This is precisely where previous authors have identified the inferior olivary nuclear complex and what the current authors assign to their putative trigeminal nuclei. The neurons of the inferior olivary nuclei project, via the olivocerebellar tract to the cerebellum to terminate in the climbing fibres of the cerebellar cortex.

      Comment: We agree with the referee that in the Maseko et al. (2013) scheme the inferior olive is exactly where we expect it from pretty much all other mammals. Hence, this is a strong argument in favor of the Maseko et al. (2013) scheme and a strong argument against the partitioning scheme suggested by us.

      Changes: Please see our discussion above.

      Elephants have the largest (relative and absolute) cerebellum of all mammals (10.1002/ar.22425), this cerebellum contains 257 × 10<sup>9</sup> neurons (10.3389/fnana.2014.00046; three times more than the entire human brain, 10.3389/neuro.09.031.2009). Each of these neurons appears to be more structurally complex than the homologous neurons in other mammals (10.1159/000345565; 10.1007/s00429-010-0288-3). In the African elephant, the neurons of the inferior olivary nuclear complex are described by Maseko et al (2013) as being both calbindin and calretinin immunoreactive. Climbing fibres in the cerebellar cortex of the African elephant are clearly calretinin immunopositive and also are likely to contain calbindin (10.1159/000345565). Given this, would it be surprising that the inferior olivary nuclear complex of the elephant is enlarged enough to create a very distinct bump in exactly the same place where these nuclei are identified in other mammals?

      Comment: We agree with the referee that it is possible and even expected from other mammals that there is an enlargement of the inferior olive in elephants. Hence, a priori one might expect the ventral brainstem bumps to be the inferior olive; this is perfectly reasonable and is what was done by previous authors. The referee also refers to calbindin and calretinin antibody reactivity. Such antibody reactivity is indeed in line with the referee’s ideas and we considered these findings in our Referee Table 1. The problem is, however, that neither calbindin nor calretinin antibody reactivity is highly specific and indeed both nuclei in discussion (trigeminal nuclei and inferior olive) show such reactivity. Unlike the peripherin-antibody staining advanced by us, calbindin and calretinin antibody reactivity cannot distinguish the two hypotheses debated.

      Changes: Please see our discussion above.

      What about the myelin stripes? These are most likely to be the origin of the olivocerebellar tract and probably only have a coincidental relationship with the trunk. Thus, given what we know, the inferior olivary nuclear complex as described in other studies, and the putative trigeminal nuclear complex as described in the current study, is the elephant inferior olivary nuclear complex. It is not what the authors believe it to be, and they do not provide any evidence that discounts the previous studies. The authors are quite simply put, wrong. All the speculations that flow from this major neuroanatomical error are therefore science fiction rather than useful additions to the scientific literature.

      Comment: It is unlikely that the myelin stripes are the origin of the olivocerebellar tract as suggested by the referee. Specifically, the lack of peripherin-reactivity indicates that these fibers are not climbing fibers (Referee Figure 1). In general, we feel the referee does not want to discuss the myelin stripes and obviously thinks we made up the strange correspondence of myelin stripes and trunk folds.

      Changes: Please see our discussion above.

      What do the authors actually have?

      The authors have interesting data, based on their Golgi staining and analysis, of the inferior olivary nuclear complex in the elephant.

      Comment: The referee reiterates their views.

      Changes: None.

      Reviewer #3 (Public Review):

      Summary:

      The study claims to investigate trunk representations in elephant trigeminal nuclei located in the brainstem. The researchers identified large protrusions visible from the ventral surface of the brainstem, which they examined using a range of histological methods. However, this ventral location is usually where the inferior olivary complex is found, which challenges the authors' assertions about the nucleus under analysis. They find that this brainstem nucleus of elephants contains repeating modules, with a focus on the anterior and largest unit which they define as the putative nucleus principalis trunk module of the trigeminal. The nucleus exhibits low neuron density, with glia outnumbering neurons significantly. The study also utilizes synchrotron X-ray phase contrast tomography to suggest that myelin-stripe-axons traverse this module. The analysis maps myelin-rich stripes in several specimens and concludes that based on their number and patterning they likely correspond with trunk folds; however, this conclusion is not well supported if the nucleus has been misidentified.

      Comment: The referee gives a concise summary of our findings. The referee acknowledges the depth of our analysis and also notes our cellular results. The referee – in line with the comments of Referee 2 – also points out that a misidentification of the nucleus under study is potentially fatal for our analysis. We thank the referee for this fair assessment.

      Changes: We feel that we need to alert the reader more broadly to the misidentification concern. We think the critical comments of Referee 2, which will be published along with our manuscript, will go a long way in doing so. We think the eLife publishing format is fantastic in this regard. We will also include pointers to these concerns in the revised manuscript.

      Strengths:

      The strength of this research lies in its comprehensive use of various anatomical methods, including Nissl staining, myelin staining, Golgi staining, cytochrome oxidase labeling, and synchrotron X-ray phase contrast tomography. The inclusion of quantitative data on cell numbers and sizes, dendritic orientation and morphology, and blood vessel density across the nucleus adds a quantitative dimension. Furthermore, the research is commendable for its high-quality and abundant images and figures, effectively illustrating the anatomy under investigation.

      Comment: Again, a very fair and balanced set of comments. We are thankful for these comments.

      Changes: None.

      Weaknesses:

      While the research provides potentially valuable insights if revised to focus on the structure that appears to be the inferior olivary nucleus, there are certain additional weaknesses that warrant further consideration. First, the suggestion that myelin stripes solely serve to separate sensory or motor modules rather than functioning as an "axonal supply system" lacks substantial support due to the absence of information about the neuronal origins and the termination targets of the axons. Postmortem fixed brain tissue limits the ability to trace full axon projections. While the study acknowledges these limitations, it is important to exercise caution in drawing conclusions about the precise role of myelin stripes without a more comprehensive understanding of their neural connections.

      Comment: The referee points out a significant weakness of our study, namely our limited understanding of the origin and targets of the axons constituting the myelin stripes. We are very much aware of this problem and this is also why we directed high-powered methodology like synchrotron X-ray tomograms to elucidate the structure of myelin stripes. Such analysis led to advances, i.e., we now think that what look like stripes are bundles, and we understand that the constituent axons tend to traverse the module. Such advances are insufficient, however, to provide a clear picture of myelin stripe connectivity.

      Changes: We think solving the problems raised by the referee will require long-term methodological advances and hence we will not be able to solve these problems in the current revision. Our long-term plans for confronting these issues are the following: (i) Improving our understanding of long-range connectivity by post-mortem tracing and MR-based techniques such as Diffusion-Tensor-Imaging. (ii) Improving our understanding of mid- and short-range connectivity by applying even larger synchrotron X-ray tomograms and possibly serial EM.

      Second, the quantification presented in the study lacks comparison to other species or other relevant variables within the elephant specimens (i.e., whole brain or brainstem volume). The absence of comparative data for different species limits the ability to fully evaluate the significance of the findings. Comparative analyses could provide a broader context for understanding whether the observed features are unique to elephants or more common across species. This limitation in comparative data hinders a more comprehensive assessment of the implications of the research within the broader field of neuroanatomy. Furthermore, the quantitative comparisons between African and Asian elephant specimens should include some measure of overall brain size as a covariate in the analyses. Addressing these weaknesses would enable a richer interpretation of the study's findings.

      Comment: The referee suggests another series of topics, which include the analysis of brain parts volumes or overall brain size. We agree these are important issues, but we also think such questions are beyond the scope of our study.

      Changes: We hope to publish comparative data on elephant brain size and shape later this year.  

    1. Author Response

      eLife assessment

      This study presents a valuable method to visualize the location of the cell types discovered through single-cell RNA sequencing. The evidence supporting the claims is solid, but the inclusion of a larger number of samples would strengthen the study. It would also be helpful to have the methods explained in more detail. The work will be of interest to those seeking to identify new cell types from scRNA-seq and snRNA-seq data.

      Response: We are surprised by the editor’s assessment of our paper as a “valuable” method. This is the first Drosophila adult spatial transcriptomics paper. Hence, we would at least consider this to be an “important” method. Spatial transcriptomics has thus far only been done in embryos, which have been easy to process for FISH for many decades. Integration with single-cell data is also new. We are further surprised that this assessment does not mention the identification of subcellular mRNA patterns in adult muscles as an “important” biological finding of this paper. We are not aware that any localized mRNAs in Drosophila muscles were known prior to our study. This shows the advantage of spatial transcriptomics over single-cell techniques.

      The work indeed does not represent a full spatial atlas of the adult fly; it is, however, a proof-of-principle study covering both the head and body that we consider at least “important”.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Janssens et al. addressed the challenge of mapping the location of transcriptionally unique cell types identified by single nuclei sequencing (snRNA-seq) data available through the Fly Cell Atlas. They identified 100 transcripts for head samples and 50 transcripts for fly body samples allowing the identification of every unique cell type discovered through the Fly Cell Atlas. To map all of these cell types, the authors divided the fly body into head and body samples and used the Molecular Cartography (Resolve Biosciences) method to visualize these transcripts. This approach allowed them to build spatial tissue atlases of the fly head and body, to identify the location of previously unknown cell types and the subcellular localization of different transcripts. By combining snRNA-seq data from the Fly Cell Atlas with their spatially resolved transcriptomics (SRT) data, they demonstrated an automated cell type annotation strategy to identify unknown clusters and infer their location in the fly body. This manuscript constitutes a proof-of-principle study to map the location of the cells identified by ever-growing single-cell transcriptomic datasets generated by others.

      Strengths:

      The authors used the Molecular Cartography (Resolve Biosciences) method to visualize 100 transcripts for head samples and 50 transcripts for fly body samples in high resolution. This method achieves high resolution by multiplexing a large number of transcript visualization steps and allows the authors to map the location of unique cell types identified by the Fly Cell Atlas.

      Response: We thank the reviewer for their comment, but are surprised that this assessment does not mention the identification of subcellular mRNA patterns in adult muscles as an important biological finding of this paper. This might be due to the visualization problem that this reviewer was facing with a greyscale version of the PDF as mentioned in the comments below. We do not know what caused the technical problem for this reviewer (the PDF figures are in color on the eLife website and on bioRxiv). We are surprised that the eLife discussion session did not resolve this issue.

      Weaknesses:

      Combining single-nuclei sequencing (snRNA-seq) data with spatially resolved transcriptomics (SRT) data is challenging, and the methods used by the authors in this study cannot reliably distinguish between cells, especially in brain regions where the processes of different neurons are clustered, such as in neuropils. This means that a grid that the authors mark as a unique cell may actually be composed of processes from multiple cells.

      Response: The size of the fly is one of the most challenging aspects of performing spatial transcriptomics. The small size of the samples led to detachment from the slides, which we solved by coating the slides with gelatin. While the resolution of Molecular Cartography is high (<200nm), in the brain challenges remain as noted by the reviewer. Drosophila neuronal nuclei are notoriously small and cannot be easily resolved with current techniques. We agree that for a full atlas either expansion microscopy, 3D techniques or even higher resolution will be required.

      Reviewer #2 (Public Review):

      Summary:

      The landmark publication of the "Fly Atlas" in 2022 provided a single cell/nuclear transcriptomic dataset from 15 individually dissected tissues, the entire head, and the body of male and female flies. These data led to the annotation of more than 250 cell types. While certainly a powerful and data-rich approach, a significant step forward relies on mapping these data back to the organism in time and space. The goal of this manuscript is to map 150 transcripts defined by the Fly Atlas by FISH and in doing so, provide, for the first time, a spatial transcriptomic dataset of the adult fly. Using this approach (Molecular Cartography with Resolve Biosciences), the authors, furthermore, distinguish different RNA localizations within a cell type. In addition, they seek to use this approach to define previously unannotated clusters found in the Fly Atlas. As a resource for the community at large interested in the computational aspects of their pipeline, the authors compare the strengths and weaknesses of their approach to others currently being performed in the field.

      Strengths:

      1. The authors use Resolve Biosciences and a novel bioinformatics approach to generate a FISH-based spatial transcriptomics map. To achieve this map, they selected 150 genes (50 body; 100 head) that were highly expressed in the single nuclear RNA sequencing dataset and were used in the 2022 paper to annotate specific cell types; moreover, the authors chose several highly expressed genes characteristic of unannotated cell types. Together, the approach and generated data are important next steps in translating the transcriptomic data to spatial data in the organism.

      Response: We thank the reviewer for this comment but would like to add that the statement that we selected “150 genes (50 body; 100 head) that were highly expressed in the single nuclear RNA sequencing dataset” is not correct. We have chosen genes with widely differing expression levels (log-scale range of 3.95 in body, 5.76 in head). Many of the chosen genes are also transcription factors. In fact, the method introduced here is more sensitive than the single-cell atlas: the tinman-positive cells were readily located (even non-heart cells were found to express tinman), whereas in the single-cell FCA data tinman expression is often not detected in the cardiomyocytes. (Tinman is detected in 273 cells in the entire FCA, with a mean expression of 1.44 UMI in positive cells, and in 71 out of 273 cardial cells (26%).)

      Author response image 1.

      Density plots for body (left) and head (right) showing levels of gene expression detected in scRNA-seq (body: Fly Cell Atlas, Li et al. 2022, head: Pech et al. (2023)). Blue: all genes, red: genes used in the spatial study.

      1. Working with Resolve, the authors developed a relatively high throughput approach to analyze the location of transcripts in Drosophila adults. This approach confirmed the identification of particular cell types suggested by the FlyAtlas as well as revealed interesting subcellular locations of the transcripts within the cell/tissue type. In addition, the authors used co-expression of different RNAs to unbiasedly identify "new cell types". This pipeline and data provide a roadmap for additional analyses of other time points, female flies, specific mutants, etc.

      2. The authors show that their approach reveals interesting patterns of mRNA distribution (e.g alpha- and beta-Trypsin in apical and basal regions of gut enterocytes or striped patterns of different sarcomeric proteins in body muscle). These observations are novel and reveal unexpected patterns. Likewise, the authors use their more extensive head database to identify the location of cells in the brain. They report the resolution of 23 clusters suggested by the single-cell sequencing data, given their unsupervised clustering approach. This identification supports the use of spatial cell transcriptomics to characterize cell types (or cell states).

      3. Lastly, the authors compare three different approaches --- their own described in this manuscript, Tangram, and SpaGE - which allow integration of single cell/nuclear RNA-seq data with spatial localization FISH. This was a very helpful section as the authors compared the advantages and disadvantages (including practical issues, like computational time).

      Weaknesses:

      1. Experimental setup. It is not clear how many and, for some of the data, the sex of the flies that were analyzed. It appears that for the body data, only one male was analyzed. For the heads, methods say male and female heads, but nothing is annotated in the figures. As such, it remains unclear how robust these data are, given such a limited sample from one sex. As such, the claims of a spatial atlas of the entire fly body and its head ("a rosetta stone") are overstated. Also, the authors should clearly state in the main text and figure legends the sex, the age, how many flies, and how many replicates contributed to the data presented (not just the methods). What also adds to the confusion is the use of "n" in para 2 of the results. " ... we performed coronal sections at different depths in the head (n=13)..." 13 sections in total from 1 head or sections from 13 heads? Based on the body and what is shown in the figure, one assumes 13 sections from one head. Please clarify.

      Response: While we agree that sex differences present indeed an interesting opportunity to study with spatial transcriptomics, our goal was not to define male/female differences but rather to establish the technology to go into this detail if wanted in the future. In the revised version, we will provide a more detailed description of the sections, including their sex/genotype/age. We would like to point out that we verified the specificity of our FISH method on all the body sections (Figure 2A, TpnC4 & Act88F) and not only on one. Furthermore, we also would like to state that the idea of “a rosetta stone” was mentioned as a future prospect. We will rewrite the discussion to make this more clear.

      1. Probes selected: Information from the methods section should be put into the main text so that it is clear what and why the gene lists were selected. The current main text is confusing. If the authors want others to use their approach, then some testing or, at the very least, some discussion of lower expressed genes should be added. How useful will this approach be if only highly expressed genes can be resolved? In addition, while it is understood that the company has a proprietary design algorithm for the probes, the authors should comment on whether the probes for individual genes detect all isoforms or subsets (exons and introns?), given the high level of splicing in tissues such as muscle.

      Response: As stated above, while there is a slight bias towards more highly expressed genes (as expected for marker genes), we have also used very lowly expressed genes like tinman (body) or sens (head). This shows that our method is more sensitive than single-cell data, as ALL cardiomyocytes can be identified by tinman expression, whereas only some are positive in the FCA data. In fact, the method cannot resolve genes that are too highly expressed, because optical crowding of the signal leads to worse quantification. For this reason, ninaE was removed from the analysis (as mentioned in the section “Spatial transcriptomics allows the localization of cell types in the head and brain” and in Methods).

      As mentioned in the Methods, the probes are designed at the gene level, targeting all isoforms but favoring principal isoforms (weighted by APPRIS level). The high level of splicing is indeed interesting and we expect that in the future spatial transcriptomics can help to generate more insight into this.

      1. Imaging: it isn't clear from the text whether the repeated rounds of imaging impacted data collection. In many of what appear to be "stitched" images, there are gradients of signal (e.g., figure 2F); please comment. Also, since this is a new technique, could a before and after comparison of the original images and the segmented images be shown in the supplemental data so that the reader can better appreciate how the authors assessed/chose/thresholded their data? More discussion of the accuracy of spot detection would be helpful.

      Response: Any high-resolution imaging (pixel size = 138 nm) of a large field of view (>1 mm) uses a stitching method to combine several individual images to reconstruct a large field of view. This does not generate signal gradients, apart from lower signal at the extreme edges of each of the individual images. The spot detection algorithm was written and used by Resolve Biosciences and benchmarked for human (HeLa) and mouse (NIH-3T3) cell lines in Groiss et al. 2021 (Highly resolved spatial transcriptomics for detection of rare events in cells, bioRxiv). There, the specificity of the decoded probes was found to lie between 99.45 and 99.9%, matching the results we found for TpnC4 and Act88F (99.4 and 99.8%). We will add their analysis to our discussion.
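
      As a rough illustration of why stitching is unavoidable at this resolution, the sketch below estimates how many pixels a 1 mm field spans at a 138 nm pixel size, and how many overlapping tiles per axis would be needed to cover it. The tile width (2048 px) and overlap (204 px) are assumed values chosen for illustration only; they are not taken from the Molecular Cartography instrument.

```python
import math


def pixels_across(field_of_view_um: float, pixel_size_nm: float) -> int:
    """Number of pixels needed to span a field of view along one axis."""
    return math.ceil(field_of_view_um * 1000.0 / pixel_size_nm)


def tiles_per_axis(total_px: int, tile_px: int, overlap_px: int) -> int:
    """Number of overlapping tiles needed to cover total_px along one axis."""
    if overlap_px >= tile_px:
        raise ValueError("overlap must be smaller than the tile width")
    if total_px <= tile_px:
        return 1
    step = tile_px - overlap_px  # new pixels contributed by each extra tile
    return 1 + math.ceil((total_px - tile_px) / step)


if __name__ == "__main__":
    width_px = pixels_across(1000.0, 138.0)  # 1 mm field, 138 nm pixels
    # Tile width and overlap below are hypothetical, for illustration only.
    print(width_px, tiles_per_axis(width_px, tile_px=2048, overlap_px=204))
```

      Under these assumptions a 1 mm field is several thousand pixels wide and needs multiple tiles per axis, which is consistent with the lower signal being confined to tile edges rather than forming a gradient across the stitched image.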

      1. The authors comment on how many RNAs they detected (first paragraph of results). How do these numbers compare to the total mRNA present as detected by single-cell or single-nuclear sequencing?

      Response: The total number of mRNAs detected per spatial transcriptomics experiment is much higher for the body samples compared to single-cell experiments (FCA data). In the head it is slightly lower, but here it is important to note that not all cell types are present in each slice in the head (while they are all present in the head scRNA experiments). A comparison on the cell-type level would be more meaningful, and we will investigate this for the revision.

      Author response image 2.

      Barplots showing total number of mRNA molecules detected in Molecular Cartography (Resolve, spatial spots) and in snRNA-seq data from the Fly Cell Atlas (10x Genomics, UMIs). Individual black dots show individual experiments, counts are only shown for the chosen gene panel for each sample. Bar shows the mean, with error bars representing the standard error.
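
      For readers who want to reproduce the summary statistics in such a barplot, a minimal sketch of the mean and standard error of the mean is given below. The function name and the input list are hypothetical; the actual per-experiment counts are not reproduced here.

```python
import math
from statistics import mean, stdev


def mean_and_sem(counts):
    """Return (mean, standard error of the mean) for per-experiment counts.

    The standard error is the sample standard deviation divided by sqrt(n).
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two experiments to estimate the error")
    return mean(counts), stdev(counts) / math.sqrt(n)
```

      Each black dot in the plot would correspond to one entry of `counts`, the bar height to the first returned value and the error bar to the second.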

      1. Using this higher throughput method of spatial transcriptomics, the authors discern different cell types and different localization patterns within a tissue/cell type.

      a. The authors should comment on the resolution provided by this approach, in terms of the detection of populations of mRNAs detected by low throughput methods, for example, in glia, motor neuron axons, and trachea that populate muscle tissue. Are these found in the images? Please show.

      Response: We did not add any markers for trachea in our gene panel, but we do detect sparse spots of repo (glia) and elav/VGlut in the muscle tissues (Gad1/VAChT are hardly detected in the muscle tissue). This is consistent with the glutamatergic nature of motor neurons in Drosophila as described previously (Schuster CM (2006) Glutamatergic synapses of Drosophila neuromuscular junctions: a high-resolution model for the analysis of experience-dependent potentiation. Cell Tissue Res 326: 287–299.)

      Author response image 3.

      Molecular Cartography zoomed in on indirect flight muscle. Segmented nuclei are shown in white (based on DAPI); scale bars represent 100 μm.

      b. The authors show interesting localization patterns in muscle tissue for different sarcomere protein-coding mRNAs, including enrichment of sls in muscle nuclei located near the muscle-tendon attachment sites. As this high throughput approach is newly being applied to the adult fly, it would increase confidence in these data, if the authors would confirm these data using a low throughput FISH technique. For example, do the authors detect such alternating "stripes" ( Act 88F, TpnC4, and Mhc) or enriched localization (sls) using FISH that doesn't rely on the repeated colorization, imaging, decolorization of the probes?

      Response: We thank the reviewer for their interest in the localization patterns in muscle tissue. We could confirm localized mRNA in all the sections analyzed, in flight muscles as well as in leg muscles. We furthermore show that Act 88F and TpnC4 are not detected outside of flight muscle cells (99.4% and 99.8% of the single-molecule signal, respectively, is found in flight muscles only). Hence, we already show the specificity test in a much more quantitative way than traditional FISH, which often includes amplification.
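      A specificity percentage of this kind can be sketched as the fraction of a gene's detected spots that fall inside one annotated region. The following is a minimal illustration only: the function name, the region labels and the spot counts are hypothetical placeholders and are not taken from the dataset or the authors' pipeline.

```python
from collections import Counter

def fraction_in_region(spot_regions, region):
    """Return the fraction of detected spots whose region label equals `region`."""
    if not spot_regions:
        raise ValueError("no spots detected for this gene")
    counts = Counter(spot_regions)
    return counts[region] / len(spot_regions)

# Hypothetical region labels for the spots of one gene (placeholder data,
# not values from the study).
example_spots = ["flight_muscle"] * 995 + ["other"] * 5
specificity = fraction_in_region(example_spots, "flight_muscle")
print(f"{specificity:.1%} of spots fall in flight muscle")
```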

      1. The authors developed an unbiased method to identify "new cell types" which relies on co-expression of different transcripts. Are these new cell types or a cell state? While expression is a helpful first step, without any functional data, the significance of what the authors found is diminished. The authors need to soften their statements.

      Response: The term “new cell types” only appears in the title. We agree that with the current spatial map we cannot be sure to have found “new cell types”; instead, we have shown where unannotated clusters from scRNA-seq map spatially, based on gene expression. Therefore, we will tone down the title in the revised version and thank the reviewer for this valuable suggestion.

      Appraisal:

      The authors' goal is to map single cell/nuclear RNAseq data described in the 2022 Fly Atlas paper spatially within an organism to achieve a spatial transcriptomic map of the adult fly; no doubt, this is a critical next step in our use of 'omics approaches. While this manuscript does the hard work of trying to take this next step, including developing and testing a new pipeline for high throughput FISH and its analysis, it falls short, in its present form, in achieving this goal. The authors discuss creating a robust spatial map, based on one male fly. Moreover, they do not reveal principles of mRNA localization, as stated in the abstract; they show us patterns, but nothing about the logic or function of these patterns. This same criticism can be said of the identification of "new cell types", just based on RNA colocalization. In both cases (mRNA subcellular localization or cell type identification), further data in the form of validation with traditional low throughput FISH and genetic manipulations to assess the relation to cell function are required for the authors to make such claims.

      Response: We have indeed used one male fly for the adult male body data. This is mainly due to the cost of the sample processing. We used 12 individuals for the head samples (from 1 individual we acquired 2 sections, a total of 13 sections). We show that the body samples show a high correlation with each other, while the head samples cover multiple depths of the head. Still, even in the head, we find that sections at similar depths show a high similarity to each other in terms of gene-gene co-expression and expression patterns. Although obtaining more sections would be valuable, we don’t believe it to be necessary for the current goals. Additional replicates beyond the ones we already provide would require significant amounts of extra time and budget, and would likely produce results similar to those we already show. We are therefore reluctant to repeat the effort.

      The usage of the term “new cell types” is indeed ambiguous and we will tone this down in the revised version. Instead, we meant that unannotated clusters could be mapped to their location. In the text, we further specify that we have so far only inferred the location of the nuclei and that, for neurons, their function/processes are still unknown. As such, our data provide a starting point to identify new cell types, since their marker genes and nuclear location are inferred. The next step to identify “new cell types” would indeed be to acquire genetic access to the cell types and characterize them in more detail. This is currently beyond our goals, and therefore we will tone down the title in the revised version and thank the reviewer for this valuable suggestion.

      Discussion of likely impact:

      If revised, these data, and importantly the approach, would impact those working on Drosophila adults as well as those working in other model systems where single cell/nuclear sequencing is being translated to the spatial localization within the organism. The subcellular localization data - for example, the size of transcripts and how that relates to localization or the patterns of sarcomeric protein localization in muscle - are intriguing, and would likely impact our thinking on RNA localization, transport, etc if confirmed. Lastly, the authors compare their computational approaches to those available in the field; this is valuable as this is a rapidly evolving field and such considerations are critical for those wishing to use this type of approach.

      Response: We believe that our manuscript as it stands now is already an “important” paper that will strongly impact the Drosophila community (and beyond the spatial transcriptomics community). As it stands, it provides the groundwork for a full Drosophila adult spatial atlas, similar to how early scRNA-seq datasets provided a framework for the Fly Cell Atlas. In the manuscript we provide both experimental information on how to successfully perform spatial transcriptomics (treating slides for optimal attachment) and the data serves as a benchmark for future experiments to improve upon (similar to how early Drop-seq datasets were compared to later 10x datasets in single-cell transcriptomics). In addition, it also provides proof of principle methods on how to integrate the FCA data with these spatial data and it identifies localized mRNA species in large adult muscle cells, showing the complementarity of spatial techniques with single-cell RNA-seq. To conclude, this is the first spatial adult Drosophila transcriptomics paper, locating 150 mRNA species with easy data access in our user portal (https://spatialfly.aertslab.org/).

    1. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): 

      Summary: 

      Laura Morano and colleagues have performed a screen to identify compounds that interfere with the formation of TopBP1 condensates. TopBP1 plays a crucial role in the DNA damage response, and specifically the activation of ATR. They found that the GSK-3b inhibitor AZD2858 reduced the formation of TopBP1 condensates and activation of ATR and its downstream target CHK1 in colorectal cancer cell lines treated with the clinically relevant irinotecan active metabolite SN-38. This inhibition of TopBP1 condensates by AZD2858 was independent from its effect on GSK-3b enzymatic activity. Mechanistically, they show that AZD2858 thus can interfere with intra-S-phase checkpoint signaling, resulting in enhanced cytostatic and cytotoxic effects of SN-38 (or SN-38+Fluorouracil aka FOLFIRI) in vitro in colorectal carcinoma cell lines. 

      Major comments: 

      Overall the work is rigorous and the main conclusions are convincing. However, they only show the effects of their combination treatments on colorectal cancer cell lines. I'm worried that blocking the formation of TopBP1 condensates will also be detrimental in non-transformed cells. Furthermore it is somewhat disappointing that it remains unclear how AZD2858 blocks self-assembly of TopBP1 condensates, although I understand that unraveling this would be complex and somewhat out-of-reach for now. 

      We appreciate your feedback and fully recognize the importance of understanding how AZD2858 blocks the assembly of TopBP1 condensates. While we understand your disappointment, addressing this question remains a key focus for us. Keeping in mind that unravelling such a mechanism in vitro or in vivo is rather challenging, we have consulted an expert who has made efforts to predict the potential docking sites of AZD2858 on TopBP1, which may provide valuable insights for future experimental investigations. The analyses used an AlphaFold model (no crystal or cryo-EM structure is available) to look for suitable pockets or cavities in which AZD2858 could bind. Although they require cautious interpretation, they suggest that AZD2858 may target the BRCT1 and BRCT8 domains of TopBP1 (as shown below, two pockets, no. 1 and no. 7, with sufficient volume and surrounded by β-sheet structures, as with other GSK3 inhibitors).

      However, these are preliminary results that require further exploration and experimental validation to confirm their significance and mechanistic implications.

      Author response image 1.

      Here are some specific points for improvement: 

      (1) The authors conclude that "These data supports [sic] the feasibility of targeting condensates formed in response to DNA damage to improve chemotherapy-based cancer treatments". To support this conclusion the authors need to show that proliferating non-transformed cells (e.g. primary cell cultures or organoids) can tolerate the combination of AZD2858 + SN-38 (or FOLFIRI) better than colorectal cancer cells. 

      We would like to thank the reviewer for this vital suggestion to test whether this combination is effective on tumor cells while not being very toxic to healthy cells. We therefore used a healthy colon cell line (CCD841) and tested the efficacy of each treatment alone (FOLFIRI and AZD2858) as well as the combination FOLFIRI+AZD2858. We compared the results obtained in the CCD841 cell line with those obtained in the HCT116 colorectal cancer cell line. The results presented below show not only that each treatment alone is much less effective on CCD841 cells, but also that the combination is not synergistic in these cells.

      Author response image 2.

      Page 19 "This suggests that the combination... arrests the cell cycle before mitosis in a DNAPKsc-dependent manner." I find the remark that this arrest would be DNA-PKcs-dependent too speculative. I suppose that the authors base this claim on reference 55 but if they want to support this claim they need to prove this by adding DNA-PKcs inhibitors to their treated cells. 

      Thank you for your thoughtful comment. We agree with the reviewer that claiming the G2/M arrest is DNA-PKcs-dependent without direct experimental evidence is speculative. While we initially based this hypothesis on reference 55, we acknowledge that further experiments, such as the use of DNA-PKcs inhibitors, would be necessary to robustly support this claim.

      Given that this observation was intended as a potential explanation for the G2/M arrest observed at 6 and 12 hours of treatment with AZD2858 + SN-38 (compared to SN-38 alone), and considering that exploring this pathway is not the primary focus of our study, we have decided to remove this hypothesis from both the figure and the text to avoid any ambiguity.

      We appreciate the reviewer’s input and will consider investigating this pathway in future studies.

      (2) When discussing Figure S5B the authors claim that SN-38 + AZD2858 progressively increases the fractions of BrdU positive cells, but this is not supported by statistical analysis.

      The fractions are still very small, so I would like to see statistics on these data. Alternatively, the authors could take out this conclusion. 

      Thank you for your valuable comment. In response, we have conducted a statistical analysis (Mann-Whitney test) on the data, and the results have been added to Figure S5C for the 6-hour time point and Figure S5D for the 12-hour time point, based on three independent biological replicates. We hope this provides the necessary clarification.

      Minor comments: 

      - Page 5 Materials and methods - Cell culture. Last sentence "Add in what medium you cultured them" looks like an internal review remark and should probably be removed? 

      We apologize for this oversight. The medium has now been specified, and the sentence has been removed.

      - The numbers in all the synergy matrices (in white font) are extremely small and virtually unreadable, and visually distracting. I recommend taking these out altogether. 

      We believe that the reduction in figure quality may be due to the PDF compression, which affected the resolution of the figures. We are happy to provide high-resolution versions of the figures separately for clarity. If the issue persists even with the higher resolution, we will consider removing the numbers, as suggested.

      - The legends of the synergy matrices (for example Fig 1D, 4E, 5, 6) are often extremely small, making it difficult to understand them intuitively. Please enlarge them and label them more clearly, and use larger fonts. In the legend of Figure 5D,E a green matrix indicating % live cells is mentioned but I don't see it. Do they mean the grey matrix? 

      We have enlarged the figure legends and will provide high-resolution versions of the figures to ensure all details are clearly readable. Regarding Figure 5D,E: we acknowledge that the color may appear differently (more green or gray) depending on the display or printer settings. To avoid any confusion, we have corrected the legend to specify that the color in question is khaki, rather than green. Moreover, following the suggestions of Reviewer #2, these panels have been moved to Figures S6B and S6C, respectively.

      - Figure S2. Perhaps I misunderstand the PML body experiment but the authors seem to use PML body formation to support their idea that AZD2858 blocks TopBP1 condensate formation and not just any condensate formation. However, if this is the case they would need a proper positive control, i.e. an additional experimental condition in which they do see PML bodies. 

      Arsenic is a well-known positive control for experiments involving PML bodies because it induces specific responses in PML proteins and modifies the structure and function of PML nuclear bodies (NBs) (Jaffray et al., 2023, JCB; Zhu et al., 1997, PNAS). We therefore used arsenic as a positive control and observed a significant increase in PML NBs compared with the other conditions (Kruskal-Wallis test), as indicated below. We have added these results to Figure S2B and to the text.

      Author response image 3.

      PML condensates were tested after 2 h of incubation. AZD2858: 100 nM; SN-38: 300 nM; arsenic: 6 µM. ****: p < 0.0001 (Kruskal-Wallis test).

      - The quantification of the flow cytometry data needs to be clarified. I find it strange that in the figures (for example Figure 3A and 3C) representative examples are shown of apparently 3 replicates, and that the percentages shown in these examples are then given in the text as the overall numbers; for example on page 18 "...BrdU incorporation increased from 16.11% (SN38 alone) to 41.83% (combination)...". This type of description is done in multiple places in the Results section and is confusing. It would be clearer if the authors show proper quantifications (mean +/- sem) of the percentages of (the relevant) gated populations. Besides, I don't think it makes a lot of sense to mention in the text the percentages with 2 decimals behind the comma. This suggests a level of precision that does not seem justified in flow cytometry data. Finally, all flow cytometry plots look visually very busy and all the text is crammed in with really small fonts. Cleaning them up and enlarging the fonts of the remaining text/numbers would really improve the readability of the figures. 

      Thank you for your helpful comments. We understand your concern regarding the flow cytometry quantification. Indeed, the percentages presented in the figures are derived from representative replicates, and we acknowledge that this presentation could be confusing. To address this, we have included a table summarizing the data from all replicates to improve readability [Table S2 and S3 in the new version]. Second, we specified in the text that the data are representative biological replicates when needed. Third, we have performed statistical analyses on the three replicates when necessary, as shown in Supplementary Figure S5C-F in the new version. The text has been revised to reflect the correct statistical interpretation.

      Regarding the use of two decimal places, we are unable to remove them due to limitations in the software (Kaluza) used for flow cytometry analysis. However, we agree that this level of precision may not be warranted, and we have revised the text where appropriate to reduce confusion.
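      As an illustration of the mean +/- sem summary requested by the reviewer, the sketch below computes the mean and standard error of gated-population percentages across replicates and reports them as whole percentages. The condition names and all numbers are hypothetical placeholders, not the values in Tables S2 and S3, and this is not the authors' analysis code.

```python
import math
import statistics

def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    if len(values) < 2:
        raise ValueError("need at least two replicates to estimate the SEM")
    mean = statistics.mean(values)
    # SEM = sample standard deviation / sqrt(number of replicates)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem

# Hypothetical gate percentages from three replicates (placeholder values).
replicates = {
    "treatment A": [10.0, 12.0, 14.0],
    "treatment A + B": [30.0, 33.0, 36.0],
}
summary = {name: mean_sem(vals) for name, vals in replicates.items()}
for name, (mean, sem) in summary.items():
    # Whole percentages avoid implying more precision than the assay supports.
    print(f"{name}: {mean:.0f} ± {sem:.0f} % (n = {len(replicates[name])})")
```

      Rounding at the reporting step, rather than in the acquisition software, keeps the exported values untouched while the text shows only the precision that the replicates justify.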

      - In Figure 5G the authors show that FOLFIRI + AZD2858 are synergistic in two SN-38-resistant cell lines. They conclude that this combination may overcome drug resistance. But I tried to figure out the FOLFIRI concentrations used in these cell lines, and they still seem far higher than those used in the SN-38-sensitive HCT116 cell line, so I would like to see a bit more nuance in their interpretation. I think overcoming drug resistance is an overstatement, and perhaps alleviating would be a better term.

      Thank you for highlighting this important point; we have adjusted the text accordingly.

      - The legend in Table S2 refers to Figure 5A-B; this should be Figure 4A-B. 

      Thank you, this has been corrected, and Table S2 is now Table S4.

      Reviewer #1 (Significance (Required)): 

      The finding that AZD2858 blocks TopBP1 condensate formation via a pleiotropic effect of this compound is interesting and convincing. To my best knowledge it's a novel finding which is interesting to the potential target audience mentioned below. Their findings that inhibition of TopBP1 condensation and ATR signaling via AZD2858 may synergize with FOLFIRI therapy in colorectal cancer cells are still very preliminary, because the effects on non-cancerous cells are not tested. 

      Researchers involved in early cancer drug discovery and cell biologists studying DNA damage responses in cancer cells seem to me the typical audience that would be interested in and influenced by this paper. 

      I'm a cell biologist studying cell cycle fate decisions, and adaptation of cancer cells & stem cells to (drug-induced) stress. My expertise aligns well with the work presented throughout this paper. 

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): 

      The authors have extended their previous research to develop TOPBP1 as a potential drug target for colorectal cancer by inhibiting its condensation. Utilizing an optogenetic approach, they identified the small molecule AZD2858, which inhibits TOPBP1 condensation and works synergistically with first-line chemotherapy to suppress colorectal cancer cell growth. The authors investigated the mechanism and discovered that disrupting TOPBP1 assembly inhibits the ATR/Chk1 signaling pathway, leading to increased DNA damage and apoptosis, even in drug-resistant colorectal cancer cell lines. Addressing the following concerns would enhance clarity and further in vivo work may improve significance: 

      (1) How does the optogenetic method for inducing condensates compare to the DNA damage induction mechanism? 

      Optogenetics provides a versatile and precise approach for controlling the condensation of scaffold proteins in both space and time. This method enables us to study the role of biomolecular condensates with minute-scale resolution, separating their formation from potentially confounding upstream events, such as DNA damage, and providing valuable insights into their specific function. Importantly, based on our previous publications on TopBP1 or SLX4 optogenetic condensates, we have substantial evidence indicating that light-induced condensates closely mimic those formed in response to DNA damage:

      - Functional similarity: Optogenetic condensates recapitulate endogenous condensates formed upon exposure of the cells to DNA-damaging agents, and include most known partner proteins involved in the DNA damage response. This was shown for light-induced TopBP1 and SLX4 condensates (1-3).

      - Dynamic reversibility: Optogenetic condensates and DNA damage induced condensates are both dynamic and reversible. They dissolve within 15 minutes of light deactivation or after removal of the damaging agent (1,3).

      - Chromatin association: Both optogenetic and DNA damage-induced condensates are bound to chromatin or localized at sites of DNA damage (3).

      - Regulation: Both types of condensates are regulated similarly, with their formation triggered by the same signaling pathways. ATR basal activity drives the nucleation of opto-TopBP1 condensates and endogenous TopBP1 structures upon light exposure (1). Likewise, sumoylation modifications regulate the formation of opto-SLX4 condensates and endogenous SLX4 condensates (3).

      - Structural similarity: Using super-resolution imaging by stimulated emission depletion (STED) microscopy, we observed that endogenous SLX4 nanocondensates formed globular clusters that were indistinguishable from recombinant light-induced SLX4 condensates (1,3).

      (1) Frattini C, Promonet A, Alghoul E, Vidal-Eychenie S, Lamarque M, Blanchard MP, et al. TopBP1 assembles nuclear condensates to switch on ATR signaling. Molecular Cell. 18 March 2021;81(6):1231-1245.e8. 

      (2) Alghoul E, Basbous J, Constantinou A. An optogenetic proximity labeling approach to probe the composition of inducible biomolecular condensates in cultured cells. STAR Protocols. 2021;2(3):100677. 

      (3) Alghoul E, Basbous J, Constantinou A. Compartmentalization of the DNA damage response: Mechanisms and functions. DNA Repair. August 2023;128:103524.

      (2) Why wasn't the initial screen conducted on the HCT116-SN50 resistant cell line? 

      Thank you for raising this important question, which we also considered at the outset of the project. After careful consideration, we decided to use the HCT116 WT cells in order to obtain initial data from an unmodified cell line. It is worth mentioning that HCT116-SN50 cells exhibit slower proliferation compared to WT cells, and they also express an efflux pump capable of pumping out SN38. We were concerned that these factors might interfere with the optogenetic assay, which is why we chose to perform the screen using the WT HCT116 cells.

      (3) The labels in Fig. 1D are difficult to recognize. 

      This issue was also raised by Reviewer #1. We suspect that the PDF conversion may have reduced the resolution of the figures, so we will provide them separately in high resolution. In addition, we have increased the size of some labels to improve their clarity.

      The selected cell image in Fig. 2A for SN-38 seems over-representative; unselected cells appear similar to other groups. Why does AZD2858 itself induce TopBP1 condensates in the plot, yet this is not evident in the images? 

      Thank you for your comment; we have updated the figure with a more representative image. We indeed observe that AZD2858 alone induces a slight increase in TopBP1 condensates. However, this increase did not lead to the activation of the ATR/Chk1 signaling pathway, as shown by the Western blot data presented in Fig. 2B. In addition, AZD2858 specifically prevents the formation of TopBP1 condensates induced by SN38 treatment, and the level of TopBP1 condensates does not return to the basal levels observed in untreated cells, but rather to those observed with AZD2858 treatment. During the 2-hour AZD2858 treatment, the progression of replication forks was unaffected (Fig. 3A and 3B). However, when AZD2858 was added alone to the Xenopus egg extracts, there was increased recruitment of TopBP1 to the chromatin (Fig. 2E). This result suggests that AZD2858 alone can induce the assembly of TopBP1 on chromatin to initiate DNA replication (a well-established role of TopBP1), but the number and concentration of TopBP1 molecules did not reach levels sufficient to activate the ATR/Chk1 pathway.

      (4) In Fig. 3A, despite the drastic change in the FACS plot shape, the quantifications appear quite similar. 

      Thank you for this insightful observation. The gates for the S phase were intentionally set wider to avoid biasing the results and inadvertently excluding the population that incorporates BrdU weakly (but still incorporates it) in the SN-38 only condition. As a result, the percentage of cells within this gate remains similar, even though the overall shape of the FACS plot changes, reflecting a shift in the distribution of BrdU incorporation. This point has now been clarified in the legend of Figure 3A.
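      The gating argument can be illustrated with a toy example: two populations can place the same fraction of events inside a wide gate while their signal distributions within that gate differ strongly. The intensities and gate boundaries below are invented for illustration (arbitrary units) and do not correspond to the flow cytometry data in Figure 3A.

```python
import statistics

def fraction_in_gate(intensities, low, high):
    """Fraction of events whose signal falls inside the gate [low, high]."""
    return sum(low <= x <= high for x in intensities) / len(intensities)

# Hypothetical BrdU intensities (arbitrary units); placeholder values only.
weak_incorporation = [120, 150, 180, 200, 220, 250, 30, 40]
strong_incorporation = [600, 700, 800, 850, 900, 950, 30, 40]

wide_gate = (100, 1000)  # assumed wide S-phase gate, chosen for illustration

results = {}
for label, events in [("weak", weak_incorporation), ("strong", strong_incorporation)]:
    gated = [x for x in events if wide_gate[0] <= x <= wide_gate[1]]
    # Same gate fraction can hide very different medians inside the gate.
    results[label] = (fraction_in_gate(events, *wide_gate), statistics.median(gated))
    print(label, results[label])
```

      In this toy case the gated percentage alone would not distinguish the two conditions, which is why a shift in the distribution is better captured by an additional summary such as the median signal within the gate.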

      This effect can also be attributed to the relatively short treatment time (2 hours), which captures early changes in DNA synthesis. The effect becomes more pronounced at later time points, as shown in Figure 3C. For example, after 6 hours of treatment, the percentage of BrdU-positive cells increases from 15% with SN-38 alone to 41% with the AZD2858 combination, demonstrating a clearer impact on DNA synthesis. A graph summarizing the statistical analysis has been added to Figure S5C for the 6-hour time point and Figure S5D for the 12-hour time point, based on data from three independent biological replicates.

      (5) The results section is imbalanced; Figs. 5 and 6 could be combined into one figure. 

      We have combined Figures 5 and 6 into a single figure to optimize the presentation of results. To avoid overloading the new figure, some of the data have been moved to supplementary figures, ensuring the main figure remains clear and focused.

      (6) An in vivo study is anticipated to assess the drug's efficacy. 

      Although AZD2858 was developed a few years ago, there is a limited amount of in vivo data available, which led us to consider potential issues related to the drug's biodistribution or its pharmacokinetics (PK). Despite these concerns, we proceeded with preliminary in vivo studies, testing various diluents and injection routes for AZD2858. However, we observed that the compound was not effective in vivo. Given the strong synergistic effects observed in vitro, we concluded that AZD2858 was likely not being distributed properly in the mice. As a result, we have decided to conduct a more detailed investigation into the pharmacokinetics (PK), pharmacodynamics (PD), and absorption, distribution, metabolism, and excretion (ADME) of AZD2858 to better understand its in vivo behavior and efficacy. Therefore, the in vivo evaluation of AZD2858 will be addressed in a separate study specifically focused on this aspect.

      Reviewer #2 (Significance (Required)): 

      Addressing the stated concerns would enhance clarity and further in vivo work may improve significance. 

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): 

      Summary 

      In 2021 (PMID: 33503405) and 2024 (PMID: 38578830) Constantinou and colleagues published two elegant papers in which they demonstrated that the Topbp1 checkpoint adaptor protein could assemble into mesoscale phase-separated condensates that were essential to amplify activation of the PIKK, ATR, and its downstream effector kinase, Chk1, during DNA damage signalling. A key tool that made these studies possible was the use of a chimeric Topbp1 protein bearing a cryptochrome domain, Cry2, which triggered condensation of the chimeric Topbp1 protein, and thus activation of ATR and Chk1, in response to irradiation with blue light without the myriad complications associated with actually exposing cells to DNA damage. 

      In this current report Morano and co-workers utilise the same optogenetic Topbp1 system to investigate a different question, namely whether Topbp1 phase-condensation can be inhibited pharmacologically to manipulate downstream ATR-Chk1 signalling. This is of interest, as the therapeutic potential of the ATR-Chk1 pathway is an area of active investigation, albeit generally using more conventional kinase inhibitor approaches. 

      The starting point is a high throughput screen of 4730 existing or candidate small molecule anticancer drugs for compounds capable of inhibiting the condensation of the Topbp1-Cry2mCherry reporter molecule in vivo. A surprisingly large number of putative hits (>300) were recorded, from which 131 of the most potent were selected for secondary screening using activation of Chk1 in response to DNA damage induced by SN-38, a topoisomerase inhibitor, as a surrogate marker for Topbp1 condensation. From this the 10 most potent compounds were tested for interactions with a clinically used combination of SN-38 and 5-FU (FOLFIRI) in terms of cytotoxicity in HCT116 cells. The compound that synergised most potently with FOLFIRI, the GSK3-beta inhibitor drug AZD2858, was selected for all subsequent experiments. 

      AZD2858 is shown to suppress the formation of Topbp1 (endogenous) condensates in cells exposed to SN-38, and to inhibit activation of Chk1 without interfering with activation of ATM or other endpoints of damage signalling such as formation of gamma-H2AX or activation of Chk2 (generally considered to be downstream of ATM). AZD2858 therefore seems to selectively inhibit the Topbp1-ATR-Chk1 pathway without interfering with parallel branches of the DNA damage signalling system, consistent with Topbp1 condensation being the primary target. Importantly, neither siRNA depletion of GSK3-beta nor other GSK3-beta inhibitors were able to recapitulate this effect, suggesting it was a specific non-canonical effect of AZD2858 and not a consequence of GSK3-beta inhibition per se. 

      To understand the basis for synergism between AZD2858 and SN-38 in terms of cell killing, the effect of AZD2858 on the replication checkpoint was assessed. This is a response, mediated via ATR-Chk1, that modulates replication origin firing and fork progression in S-phase cells under conditions of DNA damage or when replication is impeded. SN-38 treatment of HCT116 cells markedly suppresses DNA replication; however, this was partially reversed by co-treatment with AZD2858, consistent with the failure to activate ATR-Chk1 conferring a defect in replication checkpoint function. 

      Figures 4 and 5 demonstrate that AZD2858 can markedly enhance the cytotoxic and cytostatic effects of SN-38 and FOLFIRI through a combination of increased apoptosis and growth arrest according to dosage and treatment conditions. Figure 6 extends this analysis to cells cultured as spheroids, sometimes considered to better represent tumor responses compared to single cell cultures. 

      Major comments 

      Most of the data presented is of good technical quality and supports the conclusions drawn. There are however a small number of instances where this is not true; ie where the data are of insufficient technical quality, or where the description or interpretation of the results is at variance with the data which is presented. Some examples: 

      (1) Fig. 2E - the claim that "we observed an increase in RPA, TopBP1 and Pol-epsilon levels when CPT and AZD2858 were added together" does not seem to be justified by the data provided. It is also unclear what the purpose/significance of this experiment is. 

      Thank you for pointing out the contradiction in Figure 2E. Upon review, we identified an error in the labeling of conditions (CPT and AZD2858 were inadvertently swapped). The corrected figure now clearly shows that, at the 60-minute time point after starting replication, the combination of CPT and AZD2858 results in a greater accumulation of TopBP1, Pol ε, and RPA on chromatin compared to CPT alone. We have revised the sentence to: "Our data demonstrate that combining CPT and AZD2858 earlier enhances the accumulation of replication-related factors (RPA, TopBP1, and Pol ε) on chromatin compared to CPT treatment alone, particularly visible at the 60-minute time point after starting replication."

      The significance of this experiment lies in its connection to the earlier observation that AZD2858 restores BrdU incorporation when combined with SN-38, as shown in flow cytometry data (Figure 3A). At a molecular level, this was further supported by DNA fiber assays, which revealed that replication tracks (CldU tracts) were longer in the combination treatment compared to SN-38 alone (Figure 3B).

      To strengthen and validate these findings, we chose to employ the Xenopus egg extract system for several reasons. This model provides a highly controlled environment where DNA replication occurs without confounding effects from transcription or translation. Moreover, replication is limited to a single round, offering a unique opportunity to specifically interrogate replication mechanisms. These attributes make the Xenopus model an ideal system to confirm that AZD2858 facilitates replication recovery in the presence of replication stress induced by agents like CPT. With longer treatment, this leads to accumulation of DNA damage and apoptosis (Figure 3D-E and Figure 4A-D).

      (2) Figs. 3 A and C certainly show that the SN-38-mediated suppression of DNA synthesis is modified and partially alleviated by co-treatment with AZD2858. The statement however that "prolonged co-incubation with AZD2858 for 6 and 12 hours effectively abolished the SN-38 induced S-phase checkpoint" is clearly misleading. If this were true, then the BrdU incorporation profiles of the respective samples would be similar or identical to control, which clearly they are not. Clearly AZD2858 is affecting the imposition of the S-phase checkpoint in some way, but not "abolishing" it. 

      We appreciate the reviewer’s detailed observations regarding Figures 3A and 3C and the phrasing in our manuscript. We agree that the term "abolished" is not precise in describing the effects of AZD2858 on the SN-38-induced S-phase checkpoint.

      To clarify: our data indicate that co-treatment with AZD2858 modifies and partially alleviates the SN-38-induced suppression of DNA synthesis, as demonstrated by increased BrdU incorporation relative to SN-38 treatment alone. However, as the reviewer correctly points out, the BrdU incorporation profiles of the co-treated samples do not fully return to the levels of untreated control cells. This suggests that while AZD2858 significantly mitigates the S-phase checkpoint, it does not completely abolish it.

      We have revised the statement in the manuscript to better reflect these findings, as follows: "Prolonged co-incubation with AZD2858 for 6 and 12 hours significantly alleviated the SN-38-induced S-phase checkpoint, as evidenced by the partially increased BrdU incorporation. However, the population of co-treated cells is heterogeneous: some cells exhibit BrdU incorporation levels similar to those of untreated control cells, while others incorporate BrdU at levels comparable to cells treated with SN-38 alone. This indicates that AZD2858 does not fully restore DNA synthesis to control levels across the entire cell population."

      This revised phrasing aligns with the data presented and acknowledges the partial recovery of DNA synthesis observed. Thank you for bringing this to our attention and helping us improve the accuracy of our conclusions.

      (3) Fig. 3 E. The western blots of pDNA-PKcs (S2056) and total DNA-PKcs are really not interpretable. It is possible to sympathise that these reagents are probably extremely difficult to work with and obtain clear results, however uninterpretable results are not acceptable. 

      We agree that the data presented in Fig. 3E are difficult to interpret. As noted by Reviewer 1, we recognize the challenge of obtaining clear and reliable results with these specific reagents. Based on this feedback, and to ensure the robustness of our conclusions, we have decided to exclude these specific blots from the revised manuscript.

      We believe that this adjustment will enhance the clarity and reliability of the manuscript while focusing on the other, more interpretable data presented. Thank you for pointing this out, and we appreciate your understanding.

      (4) Fig. 3D. This is a puzzling image. Described as a PFGE assay, it presumably depicts an agarose gel, with intact genomic DNA at the top and a discrete band below representing fragmented genomic DNA. This is a little surprising, as fragmented genomic DNA does not usually appear as a specific band but as a heterogenous population or "smear". Nevertheless, even if one accepts this premise, it is unclear what is meant by "DSBs remained elevated after the combined treatment" when the intensity of this band is equivalent for both SN-38 and SN-38 + AZD2858 treatments. 

      We thank the reviewer for their insightful comments regarding the PFGE results in Figure 3D. We agree that the appearance of a discrete band, rather than a heterogeneous smear, is atypical for fragmented genomic DNA in this assay. However, when the signal intensity is enhanced (as shown below), the expected smear becomes more apparent.

      Author response image 4.

      Regarding the interpretation of the band intensities, we agree that the signals for SN-38 and SN-38 + AZD2858 appear similar under these specific conditions. At the relatively high concentration of SN-38 used in this experiment (300 nM), it is indeed challenging to observe a more pronounced effect on DNA breaks. This is why we wrote that "DSBs remained elevated after the combined treatment": the band intensity for cells treated with SN-38 as a single agent or in combination with AZD2858 is comparable. However, we note a slightly more intense γH2AX signal over time when AZD2858 is combined with SN-38 compared to SN-38 alone (Figure 3E). Furthermore, under lower, sub-optimal doses of SN-38 and over an extended incubation (48 h), we observe a clearer increase in fragmented DNA bands, as demonstrated in Figure 4D.

      Minor comments 

      (1) Fig. 1. A surprisingly large number of compounds scored positive in the primary screen for inhibition of Topbp1 condensation (>300). Of the 131 of these selected for secondary screening using Chk1 activation (S345 phosphorylation) as a readout approximately 2/3 were negative, implying that a majority of the tested compounds inhibited Topbp1 condensation but not Chk1 activation. What could explain that?

      Thank you for this thoughtful comment. The discrepancy between the large number of compounds scoring positive for TopBP1 condensation inhibition and the smaller number inhibiting Chk1 activation (S345 phosphorylation) could be attributed to several factors:

      • Different cell lines and induction methods: The initial screen was conducted in HEK293 TrexFlpin cells overexpressing opto-TopBP1, while the secondary screen used HCT116 cells. In addition, the methods used to induce the respective pathways were distinct: in the primary screen, we employed blue light induction of opto-TopBP1 condensates, whereas in the secondary screen, we used SN-38 treatment to induce DNA replication stress and activate the Chk1 pathway. These differences could account for the varying responses observed in the two screens.

      • The compounds that inhibited TopBP1 condensation might not fully block Chk1 activation. While they disrupt TopBP1 condensation, they may still allow for partial activation of Chk1 or Chk1 activation through alternative mechanisms. For instance, Chk1 activation could be mediated by other signaling pathways or molecules, such as ETAA1, a known Chk1 activator (1). Thus, TopBP1 condensation inhibition does not necessarily translate to complete inhibition of Chk1 activation, especially if ETAA1 is employed by cells as a rescue activator.

      • Some compounds may affect chromosome dynamics, potentially generating mechanical forces or torsional stress that could activate the ATR/Chk1 pathway independently of TopBP1 (2).

      These factors suggest that while the compounds effectively disrupt TopBP1 condensation, they may not always fully inhibit the downstream Chk1 activation, pointing to the complexity of the DNA damage response pathways. 

      (1) Bass, T. E. et al. ETAA1 acts at stalled replication forks to maintain genome integrity. Nat Cell Biol 18, 1185–1195 (2016).

      (2) Kumar, A. et al. ATR Mediates a Checkpoint at the Nuclear Envelope in Response to Mechanical Stress. Cell 158, 633–646 (2014).

      (2) Fig. 2D. The protein-protein interaction assay shown demonstrates that AZD2858 ablates the light-induced auto-interaction between exogenous opto-Topbp1 molecules and ATR plus or minus SN-38, but clearly endogenous Topbp1 molecules do not participate. Why is this? 

      The biotin proximity labeling assay was conducted without exposing cells to light, using a TurboID module fused to TopBP1-mCherry-CRY2. Stable cell lines were then generated in HEK293 TrexFlpIn cells, where endogenous TopBP1 is still expressed. Upon adding doxycycline, the recombinant TurboID-TopBP1-mCherry-Cry2 (opto-TopBP1) is induced at levels comparable to endogenous TopBP1 (Fig 2D).

      Since the opto-TopBP1 construct exhibits behavior similar to that of endogenous TopBP1 (1), we used it to investigate whether TopBP1 self-assembly and its interaction with ATR are influenced by AZD2858 alone or in combination with SN-38. Our results show that treatment with SN-38 increases the proximity between opto-TopBP1 and the endogenous TopBP1 (not fused to TurboID). However, AZD2858, either alone or in combination with SN-38, disrupts the self-assembly of recombinant TopBP1 with itself as well as its interaction with endogenous TopBP1.

      (1) Frattini C, Promonet A, Alghoul E, Vidal-Eychenie S, Lamarque M, Blanchard MP, et al. TopBP1 assembles nuclear condensates to switch on ATR signaling. Molecular Cell. 18 March 2021;81(6):1231-1245.e8.

      Reviewer #3 (Significance (Required)): 

      Significance 

      Liquid phase separation of protein complexes is increasingly recognised as a fundamental mechanism in signal transduction and other cellular processes. One recent and important example was that of Topbp1, whose condensation in response to DNA damage is required for efficient activation of the ATR-Chk1 pathway. The current study asks a related but distinct question; can protein condensation be targeted by drugs to manipulate signalling pathways which in the main rely on protein kinase cascades? 

      Here, the authors identify an inhibitor of GSK3-beta as a novel inhibitor of DNA damage-induced Topbp1 condensation and thus of ATR-Chk1 signalling. 

      This work will be of interest to researchers in the fields of DNA damage signalling, biophysics of protein condensation, and cancer chemotherapy.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper by Brickwedde et al., the authors observe an increase in posterior alpha when anticipating auditory as opposed to visual targets. The authors also observe an enhancement in both visual and auditory steady-state sensory evoked potentials in anticipation of auditory targets, in correlation with enhanced occipital alpha. The authors conclude that alpha does not reflect inhibition of early sensory processing, but rather orchestrates signal transmission to later stages of the sensory processing stream. However, there are several major concerns that need to be addressed in order to draw this conclusion.

      First, I am not convinced that the frequency tagging method and the associated analyses are adequate for dissociating visual vs auditory steady-state sensory evoked potentials.

      Second, if the authors want to propose a general revision for the function of alpha, it would be important to show that alpha effects in the visual cortex for visual perception are analogous to alpha effects in the auditory cortex for auditory perception.

      Third, the authors propose an alternative function for alpha - that alpha orchestrates signal transmission to later stages of the sensory processing stream. However, the supporting evidence for this alternative function is lacking. I will elaborate on these major concerns below.

      (1) Potential bleed-over across frequencies in the spectral domain is a major concern for all of the results in this paper. The fact that alpha power, 36Hz and 40Hz frequency-tagged amplitude and 4Hz intermodulation frequency power are generally correlated with one another amplifies this concern. The authors are attaching specific meaning to each of these frequencies, but perhaps there is simply a broadband increase in neural activity when anticipating an auditory target compared to a visual target?

      We appreciate the reviewer’s insightful comment regarding the potential bleed-over across frequencies in the spectral domain. We fully acknowledge that the trade-off between temporal and frequency resolution is a challenge, particularly given the proximity of the frequencies we are examining.

      To address this concern, we performed additional analyses to investigate whether there is indeed a broadband increase in neural activity when anticipating an auditory target as compared to a visual target, as opposed to distinct frequency-specific effects. Our results show that the bleed-over between frequencies is minimal and does not significantly affect our findings. Specifically, we repeated the analyses using the same filter and processing steps for the 44 Hz frequency. At this frequency, we did not observe any significant differences between conditions.

      These findings suggest that the effects we report are indeed specific to the 40 Hz frequency band and not due to a general broadband increase in neural activity. We hope this addresses the reviewer’s concern and strengthens the validity of our frequency-specific results.

      Author response image 1.

      Illustration of bleed-over effects over a span of 4 Hz. A, 40 Hz frequency-tagging data over the significant cluster differing between when expecting an auditory versus a visual target (identical to Fig. 9 in the manuscript). B, 44 Hz signal over the same cluster chosen for A. The analysis was identical to the analysis performed in A, apart from the frequency for the band-pass filter.
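
      The logic of this control (checking a neighbouring frequency 4 Hz away from the tag) can be illustrated with a minimal sketch on synthetic data. Note that this is not the band-pass filter and Hilbert transform pipeline described above: it swaps in a simpler single-frequency Fourier projection, and the sampling rate, window length and amplitudes are assumptions chosen purely for illustration.

```python
import math

def amplitude_at(signal, fs, freq):
    """Amplitude of the sinusoidal component at `freq` (Hz), estimated by
    projecting the signal onto a cosine and a sine at that frequency."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / fs) for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n

fs = 1000        # sampling rate in Hz (assumed)
duration = 1.0   # window length in seconds (assumed); gives 1 Hz resolution
t = [i / fs for i in range(int(fs * duration))]

# Synthetic "tagged" signal: 36 Hz with amplitude 1.0 plus 40 Hz with amplitude 0.5
signal = [
    1.0 * math.sin(2 * math.pi * 36 * ti) + 0.5 * math.sin(2 * math.pi * 40 * ti)
    for ti in t
]

amp36 = amplitude_at(signal, fs, 36)  # recovers the 36 Hz tag
amp40 = amplitude_at(signal, fs, 40)  # recovers the 40 Hz tag
amp44 = amplitude_at(signal, fs, 44)  # control frequency: no tag present
print(f"36 Hz: {amp36:.3f}, 40 Hz: {amp40:.3f}, 44 Hz: {amp44:.3f}")
```

      In this idealised case the two tags are recovered separately and the 44 Hz control picks up essentially nothing. With shorter analysis windows, or with filters that have broader passbands, some leakage between neighbouring frequencies would be expected, which is why an empirical control at 44 Hz is informative.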

      We do not, however, argue against the possibility of a broadband increase when anticipating an auditory compared to a visual target. But even a broadband increase would directly contradict the alpha inhibition hypothesis, which posits that an increase in alpha completely disengages the whole cortex. We will clarify this point in the revised manuscript.

      (2) Moreover, 36Hz visual and 40Hz auditory signals are expected to be filtered in the neocortex. Applying standard filters and Hilbert transform to estimate sensory evoked potentials appears to rely on huge assumptions that are not fully substantiated in this paper. In Figure 4, 36Hz "visual" and 40Hz "auditory" signals seem largely indistinguishable from one another, suggesting that the analysis failed to fully demix these signals.

      We appreciate the reviewer’s insightful concern regarding the filtering and demixing of the 36 Hz visual and 40 Hz auditory signals, and we share the same reservations about the reliance on standard filters and the Hilbert transform method.

      To address this, we would like to draw attention to Author response image 1, which demonstrates that a 4 Hz difference is sufficient to effectively demix the signals using our chosen filtering and Hilbert transform approach. We believe that the reason the 36 Hz visual and 40 Hz auditory signals show similar topographies lies not in incomplete demixing but rather in the possibility that this condition difference reflects sensory integration, rather than signal contamination.

      This interpretation is further supported by our findings with the intermodulation frequency at 4 Hz, which also suggests cross-modal integration. Furthermore, source localization analysis revealed that the strongest condition differences were observed in the precuneus, an area frequently associated with sensory integration processes. We will expand on this in the discussion section to better clarify this point.

      (3) The asymmetric results in the visual and auditory modalities preclude a modality-general conclusion about the function of alpha. However, much of the language seems to generalize across sensory modalities (e.g., use of the term 'sensory' rather than 'visual').

      We thank the reviewer for pointing this out and agree that in some cases we have not made a good enough distinction between visual and sensory. We will make sure that, when using ‘sensory’, we either describe overall theories that are not visual-exclusive or refer to the possibility of a broad sensory increase. However, when directly discussing our results and their interpretation, we will now use ‘visual’ in the revised manuscript.

      (4) In this vein, some of the conclusions would be far more convincing if there was at least a trend towards symmetry in source-localized analyses of MEG signals. For example, how does alpha power in the primary auditory cortex (A1) compare when anticipating auditory vs visual target? What do the frequency-tagged visual and auditory responses look like when just looking at the primary visual cortex (V1) or A1?

      We thank the reviewer for this important suggestion and have added a virtual channel analysis. We were, however, not interested in alpha power in primary auditory cortex, as we were specifically interested in posterior alpha, which is usually increased when expecting an auditory compared to a visual target (and used to be interpreted as a blanket inhibition of the visual cortex). We will improve the clarity of this point in the manuscript.

      We have, however, followed the reviewer’s suggestion of a virtual channel analysis, which shows that the condition differences are not observable in primary visual cortex for the 36 Hz visual signal or in primary auditory cortex for the 40 Hz auditory signal. Our data clearly show that there is an alpha condition difference in V1, while there is no condition difference for 36 Hz in V1 or for 40 Hz in Heschl’s gyrus (see Author response image 2).

      Author response image 2.

      Virtual channels for V1 and Heschl’s gyrus. A, alpha power for the virtual channel created in V1 (Calcarine_L and Calcarine_R from AAL atlas; Tzourio-Mazoyer et al., 2002, NeuroImage). A cluster permutation analysis over time (between -2 and 0 s) revealed a significant condition difference between ~ -2 and -1.7 s (p = 0.0449). B, 36 Hz frequency-tagging signal for the virtual channel created in V1 (equivalent to the procedure in A). The same cluster permutation as performed in A revealed no significant condition differences. C, 40 Hz frequency-tagging signal for the virtual channel created in Heschl’s gyrus (Heschl_L and Heschl_R from AAL atlas; Tzourio-Mazoyer et al., 2002, NeuroImage). The same cluster permutation as performed in A revealed no significant condition differences.
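
      For readers unfamiliar with cluster permutation tests over time, the general idea can be sketched as follows. This is a generic, simplified illustration on synthetic data and not the implementation used for the analyses above: the paired sign-flip scheme, the cluster-forming threshold of 2.2, the number of permutations and all function names are assumptions.

```python
import random
import statistics

def t_values(diffs):
    """Paired t-value per time point; diffs[subject][time] = condition A - condition B."""
    n = len(diffs)
    out = []
    for samples in zip(*diffs):
        sd = statistics.stdev(samples)
        out.append(statistics.mean(samples) / (sd / n ** 0.5) if sd > 0 else 0.0)
    return out

def max_cluster_mass(tvals, threshold):
    """Largest summed |t| over runs of adjacent supra-threshold points of the same sign."""
    best = mass = 0.0
    prev_sign = 0
    for t in tvals:
        sign = (t > threshold) - (t < -threshold)
        if sign != 0 and sign == prev_sign:
            mass += abs(t)      # extend the current cluster
        elif sign != 0:
            mass = abs(t)       # start a new cluster
        else:
            mass = 0.0          # below threshold: no cluster
        prev_sign = sign
        best = max(best, mass)
    return best

def cluster_permutation_p(diffs, threshold=2.2, n_perm=200, seed=0):
    """P-value of the largest observed cluster against a sign-flip null distribution."""
    rng = random.Random(seed)
    observed = max_cluster_mass(t_values(diffs), threshold)
    exceed = 0
    for _ in range(n_perm):
        # Randomly swap the condition labels within each subject
        flipped = []
        for row in diffs:
            s = rng.choice((-1, 1))
            flipped.append([s * v for v in row])
        if max_cluster_mass(t_values(flipped), threshold) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Synthetic example: 12 subjects, 20 time points, an effect at time points 5 to 9
data_rng = random.Random(1)
diffs = [
    [(1.0 if 5 <= t < 10 else 0.0) + data_rng.gauss(0, 0.3) for t in range(20)]
    for _ in range(12)
]
p = cluster_permutation_p(diffs)
print(f"cluster-level p = {p:.4f}")
```

      Because the test statistic is the mass of the largest cluster rather than the value at any single time point, the procedure controls for multiple comparisons across time without assuming independence between neighbouring samples.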

      (5) Blinking would have a huge impact on the subject's ability to ignore the visual distractor. The best thing to do would be to exclude from analysis all trials where the subjects blinked during the cue-to-target interval. The authors mention that in the MEG experiment, "To remove blinks, trials with very large eye-movements (> 10 degrees of visual angle) were removed from the data (See supplement Fig. 5)." This sentence needs to be clarified since eye-movements cannot be measured during blinking. In addition, it seems possible to remove putative blink trials from EEG experiments as well, since blinks can be detected in the EEG signals.

      We thank the reviewer for pointing out that our description was confusing. From the MEG data, we removed eyeblinks using ICA. Only for the supplementary Fig. 5 analysis did we use the eye-tracking data, to confirm that participants were in fact fixating the centre of the screen. For this analysis, we removed trials with blinks (which can be seen in the eye-tracker as huge amplitude movements or as large eye-movements in degrees of visual angle; see Author response image 3 below, which shows a blink in the MEG data and the corresponding eye-tracker data in degrees of visual angle). We will clarify this in the methods section.
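
      The trial rejection described here amounts to a simple threshold on gaze position. A minimal sketch is given below, assuming gaze samples are already expressed in degrees of visual angle relative to fixation; the 10 degree criterion is the one quoted by the reviewer from the manuscript, while the use of Euclidean distance, the function name and the example data are hypothetical.

```python
def keep_trial(gaze_deg, max_deviation_deg=10.0):
    """Return True if every gaze sample stays within max_deviation_deg of fixation.

    gaze_deg: list of (x, y) gaze positions in degrees of visual angle,
              relative to the fixation point at (0, 0).
    """
    return all((x * x + y * y) ** 0.5 <= max_deviation_deg for x, y in gaze_deg)

# Hypothetical example trials
trials = {
    "steady": [(0.1, -0.2), (0.3, 0.1), (-0.2, 0.0)],
    "blink": [(0.1, 0.0), (25.0, -40.0), (0.2, 0.1)],  # blink appears as a huge excursion
}
kept = [name for name, gaze in trials.items() if keep_trial(gaze)]
print(kept)
```

      A blink registers in the eye-tracker as an apparent jump far beyond any plausible fixation error, so the same criterion removes both blinks and genuine large eye movements.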

      As for the concern that participants might have closed their eyes to ignore visual distractors, in both experiments we observe a highly significant distractor cost in accuracy for visual distractors, which we hope will convince the reviewer that our visual distractors were working as intended.

      Author response image 3.

      Illustration of eye-tracker data for a trial without and a trial with a blink. All data points recorded during this trial are plotted. A, ICA component 1, which reflects blinks, and its corresponding data trace in a trial. No blink is visible. B, eye-tracker data transformed into degrees of visual angle for the trial depicted in A. C, ICA component 1, which reflects blinks, and its corresponding data trace in a trial. A clear blink is visible. D, eye-tracker data transformed into degrees of visual angle for the trial depicted in C.

      (6) It would be interesting to examine the neutral cue trials in this task. For example, comparing auditory vs visual vs neutral cue conditions would be indicative of whether alpha was actively recruited or actively suppressed. In addition, comparing spectral activity during cue-to-target period on neutral-cue auditory correct vs incorrect trials should mimic the comparison of auditory-cue vs visual-cue trials. Likewise, neutral-cue visual correct vs incorrect trials should mimic the attention-related differences in visual-cue vs auditory-cue trials.

      We thank the reviewer for this suggestion. We have analysed the neutral cue trials in the EEG dataset (see suppl. Fig. 1) and will expand this figure to show all conditions. There were no significant differences to auditory or visual cues, but descriptively alpha power was higher for neutral cues compared to visual cues and lower for neutral cues compared to auditory cues. While this may suggest that for visual trials alpha is actively suppressed and for auditory trials actively recruited, we do not feel comfortable making this claim, as the neutral condition may not reflect a completely neutral state. The neutral task can still be difficult, especially because of the uncertainty of the target modality.

      As for the analysis of incorrect versus correct trials, we love the idea, but unfortunately the accuracy rate was quite high so that the number of incorrect trials would not be sufficient to perform a reliable analysis.

      (7) In the abstract, the authors state that "This implies that alpha modulation does not solely regulate 'gain control' in early sensory areas but rather orchestrates signal transmission to later stages of the processing stream." However, I don't see any supporting evidence for the latter claim, that alpha orchestrates signal transmission to later stages of the processing stream. If the authors are claiming an alternative function to alpha, this claim should be strongly substantiated.

      We thank the reviewer for pointing out that we have not sufficiently explained our case. The first point refers to gain control akin to the alpha inhibition hypothesis, which claims that increases in alpha disengage a whole cortical area. Since we have confirmed through source analysis that the alpha increase in our data originates from primary visual cortex, this should lead to decreased visual processing. The increase in 36 Hz visual processing therefore directly contradicts the alpha inhibition hypothesis. We propose an alternative explanation for the functionality of alpha activity in this task. Through pulsed inhibition, information packages of relevant visual information could be transmitted down the processing stream, thereby enhancing relevant visual signal transmission. We believe that the fact that the enhanced visual 36 Hz signal correlated with visual alpha power on a trial-by-trial basis, and did not originate from primary visual cortex but from areas known for sensory integration, supports our claim.

      We will make this point clearer in our revised manuscript.

      Reviewer #2 (Public review):

      Brickwedde et al. investigate the role of alpha oscillations in allocating intermodal attention. A first EEG study is followed up with a MEG study that largely replicates the pattern of results (with small to be expected differences). They conclude that a brief increase in the amplitude of auditory and visual stimulus-driven continuous (steady-state) brain responses prior to the presentation of an auditory - but not visual - target speaks to the modulating role of alpha that leads them to revise a prevalent model of gating-by-inhibition.

      Overall, this is an interesting study on a timely question, conducted with methods and analysis that are state-of-the-art. I am particularly impressed by the author's decision to replicate the earlier EEG experiment in MEG following the reviewer's comments on the original submission. Evidently, great care was taken to accommodate the reviewer's suggestions.

      We thank the reviewer for the positive feedback and expression of interest in the topic of our manuscript.

      Nevertheless, I am struggling with the report for two main reasons: It is difficult to follow the rationale of the study, due to structural issues with the narrative and missing information or justifications for design and analysis decisions, and I am not convinced that the evidence is strong, or even relevant enough for revising the mentioned alpha inhibition theory. Both points are detailed further below.

      We thank the reviewer for raising this important point. We will revise our introduction and results in line with the reviewer’s suggestions, hoping that our rationale will then be easier to follow and that our evidence will be more convincing.

      Strength/relevance of evidence for model revision: The main argument rests on 1) a rather sustained alpha effect following the modality cue, 2) a rather transient effect on steady-state responses just before the expected presentation of a stimulus, and 3) a correlation between those two. Wouldn't the authors expect a sustained effect on sensory processing, as measured by steady-state amplitude irrespective of which of the scenarios described in Figure 1A (original vs revised alpha inhibition theory) applies? Also, doesn't this speak to the role of expectation effects due to consistent stimulus timing? An alternative explanation for the results may look like this: Modality-general increased steady-state responses prior to the expected audio stimulus onset are due to increased attention/vigilance. This effect may be exclusive (or more pronounced) in the attend-audio condition due to higher precision in temporal processing in the auditory sense or, vice versa, too smeared in time due to the inferior temporal resolution of visual processing for the attend-vision condition to be picked up consistently. As expectation effects will build up over the course of the experiment, i.e., while the participant is learning about the consistent stimulus timing, the correlation with alpha power may then be explained by a similar but potentially unrelated increase in alpha power over time.

      We thank the reviewer for raising these insightful questions and suggestions.

      It is true that our argument rests on a rather sustained alpha effect and a rather transient effect on steady-state responses and a correlation between the two. However, this connection would not be expected under the alpha inhibition hypothesis, which states that alpha activity would inhibit a whole cortical area (when irrelevant to the task), exerting “gain control”. This notion directly contradicts our results of the “irrelevant” visual information a) being transmitted at all and b) increasing.

      However, it has been shown on many occasions that alpha activity exerts pulsed inhibition, so we proposed an alternative theory of an involvement in signal transmission. In this case, the cyclic inhibition would serve as an ordering system, which only allows high-priority information to pass, resulting in a higher signal-to-noise ratio. We do not make a claim about how fast or when these signals are transmitted in relation to alpha power. For instance, it could be that alpha power increases as a preparatory state even before the signal is actually transmitted. Zhigalov (2020, Hum. Brain Mapp.) has shown that in V1, frequency-tagging responses were up- and down-regulated with attention, independent of alpha activity.

      But we do believe that the fact that visual alpha power correlates on a trial-by-trial level with visual 36 Hz frequency-tagging increases (a relationship which has not been found in V1, see Zhigalov 2020, Hum. Brain Mapp.) suggests a strong connection. Furthermore, the fact that the alpha modulation originates from early visual areas and occurs prior to any frequency-tagging changes, while the increase in frequency-tagging can be observed in areas which are later in the processing stream (such as the precuneus), is strongly indicative of an involvement of alpha power in the transmission of this signal. We cannot fully exclude alternative accounts and mechanisms which affect both alpha power and frequency-tagging responses.

      We do believe that the alternative account described by the reviewer does not contradict our theory, as we do believe that the alpha power modulation may reflect an expectation effect (and the idea that it could be related to the resolution of auditory versus visual processing is very interesting!). It is also possible that this expectation is, as the reviewer suggests, related to attention/vigilance and might result in a modality-general signal increase. And indeed, we can observe an increase in the frequency-tagging response in sensory integration areas. Accordingly, we believe that the alternative explanation provided by the reviewer contradicts the alpha inhibition hypothesis, but not necessarily our alternative theory.

      We will revise the discussion, which we hope will make our case stronger and easier to follow. Additionally, we will mention the possibility of alternative explanations as well as the possibility that alpha networks fulfil different roles in different locations and task environments.

      Structural issues with the narrative and missing information: Here, I am mostly concerned with how this makes the research difficult to access for the reader. I list the major points below:

      In the introduction the authors pit the original idea about alpha's role in gating against some recent contradictory results. If it's the aim of the study to provide evidence for either/or, predictions for the results from each perspective are missing. Also, it remains unclear how this relates to the distinction between original vs revised alpha inhibition theory (Fig. 1A). Relatedly if this revision is an outcome rather than a postulation for this study, it shouldn't be featured in the first figure.

      We agree with the reviewer that we have not sufficiently clarified our goal as well as how different functionalities of alpha oscillations would lead to different outcomes. We will revise the introduction and restructure the results and hope that it will be easier to follow.

      The analysis of the intermodulation frequency makes a surprise entrance at the end of the Results section without an introduction as to its relevance for the study. This is provided only in the discussion, but with reference to multisensory integration, whereas the main focus of the study is focussed attention on one sense. (Relatedly, the reference to "theta oscillations" in this sections seems unclear without a reference to the overlapping frequency range, and potentially more explanation.) Overall, if there's no immediate relevance to this analysis, I would suggest removing it.

      We thank the reviewer for pointing this out and will add information about this frequency to the introduction. We believe that the intermodulation frequency analysis is important, as it potentially supports the notion that condition differences in the visual frequency-tagging response are related to downstream processing rather than overall visual information processing in V1. We would therefore prefer to leave this analysis in the manuscript.
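
      As a brief reminder of why a 4 Hz component is informative here: if, purely for illustration, a downstream stage is assumed to respond to the product of the 36 Hz and 40 Hz inputs (one simple form of nonlinear interaction, not a claim about the actual neural mechanism), the product-to-sum identity yields components at the difference and sum frequencies, which are absent when the two signals are merely added together.

```latex
\cos(2\pi f_1 t)\,\cos(2\pi f_2 t)
  = \tfrac{1}{2}\cos\bigl(2\pi (f_2 - f_1)\,t\bigr)
  + \tfrac{1}{2}\cos\bigl(2\pi (f_1 + f_2)\,t\bigr),
\qquad
f_1 = 36\ \text{Hz},\; f_2 = 40\ \text{Hz}
\;\Rightarrow\;
f_2 - f_1 = 4\ \text{Hz},\quad f_1 + f_2 = 76\ \text{Hz}.
```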

      Reviewer #3 (Public review):

      Brickwedde et al. attempt to clarify the role of alpha in sensory gain modulation by exploring the relationship between attention-related changes in alpha and attention-related changes in sensory-evoked responses, which surprisingly few studies have examined given the prevalence of the alpha inhibition hypothesis. The authors use robust methods and provide novel evidence that alpha likely exhibits inhibitory control over later processing, as opposed to early sensory processing, by providing source-localization data in a cross-modal attention task.

      This paper seems very strong, particularly given that the follow-up MEG study both (a) clarifies the task design and separates the effect of distractor stimuli into other experimental blocks, and (b) provides source-localization data to more concretely address whether alpha inhibition is occurring at or after the level of sensory processing, and (c) replicates most of the EEG study's key findings.

      We are very grateful to the reviewer for their positive feedback and evaluation of our work.

      There are some points that would be helpful to address to bolster the paper. First, the introduction would benefit from a somewhat deeper review of the literature, not just reviewing when the effects of alpha seem to occur, but also addressing how the effect can change depending on task and stimulus design (see review by Morrow, Elias & Samaha, 2023).

      We thank the reviewer for this suggestion and agree. We will add a paragraph to the introduction which refers to missing correlation studies and the impact of task design.

      Additionally, the discussion could benefit from more cautionary language around the revision of the alpha inhibition account. For example, it would be helpful to address some of the possible discrepancies between alpha and SSEP measures in terms of temporal specificity, SNR, etc. (see Peylo, Hilla, & Sauseng, 2021). The authors do a good job speculating as to why they found differing results from previous cross-modal attention studies, but I'm also curious whether the authors think that alpha inhibition/modulation of sensory signals would have been different had the distractors been within the same modality or whether the cues indicated target location, rather than just modality, as has been the case in so much prior work?

      We thank the reviewer for suggesting these interesting discussion points and will include a paragraph in our discussion which goes deeper into these topics.

      Overall, the analyses and discussion are quite comprehensive, and I believe this paper to be an excellent contribution to the alpha-inhibition literature.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript, "A versatile high-throughput assay based on 3D ring-shaped cardiac tissues generated from human induced pluripotent stem cell-derived cardiomyocytes" developed a unique culture platform with PEG hydrogel that facilitates the in-situ measurement of contractile dynamics of the engineered cardiac rings. The authors optimized the tissue seeding conditions, demonstrated tissue morphology with expressions of cardiac and fibroblast markers, mathematically modeled the equation to derive contractile forces and other parameters based on imaging analysis, and ended by testing several compounds with known cardiac responses.

      To strengthen the paper, the following comments should be considered:

      1) This paper provided an intriguing platform that creates miniature cardiac rings with merely thousands of CMs per tissue in a 96-well plate format. The shape of the ring and the squeezing motion can recapitulate the contraction of the cardiac chamber to a certain degree. However, Thavandiran et al (PNAS 2013) created a larger version of the cardiac ring and found the electrical propagation revealed spontaneous infinite loop-like cycles of activation propagation traversing the ring. This model was used to mimic a reentrant wave during arrhythmia. Therefore, it presents great concerns if a large number of cardiac tissues experience arrhythmia by geometry-induced re-entry current and cannot be used as a healthy tissue model. It would be interesting to see the impulse propagation/calcium transient on these miniature cardiac rings and evaluate the % of arrhythmia occurrence.

      The size is a key factor impacting the electrical propagation within the generated tissues. Our ring-shaped cardiac tissues have a diameter of 360 µm, which is considerably smaller than other tissues proposed so far, including in Thavandiran et al (PNAS 2013), where circular tissues had a reported size > 1 mm. As shown in Figure 4E (and highlighted below in Author response image 1), tissues under basal conditions display regular beating rates without spontaneous arrhythmias. Videos also show that the tissue contraction is homogeneous around the pillar, suggesting that the smaller size favors the electrical propagation and limits the occurrence of spontaneous reentrant waves. Optical mapping measurements will be performed in the future to assess the occurrence of reentrant waves.

      **Author response image 1.**

      Poincaré plot showing the plots between successive RR intervals (Data from Figure 4E in basal conditions). Linear regression with 95% confidence interval indicates identity.

      2) The platform can produce 21 cardiac rings per well in 96-well plates. The throughput has been the highest among competing platforms. The resulting tissues have good sarcomere striation due to the strain from the pillars. Now the emerging questions are culture longevity and reproducibility among tissues. According to Figure 1E, there was uneven ring formation around the pillar, which leads to the tissue thinning and breaking off. There is only 50% survival after 20 days of culture in the optimized seeding group. Is there any way to improve it? The tissues had two compartments, cardiac and fibroblast-rich regions, where fibroblasts are responsible for maintaining the attachment to the glass slides. Do the cardiac rings detach from the glass slides and roll up? The SD of the force measurement is a quarter of the value, which is not ideal with such a high replicate number. As the platform utilizes imaging analysis to derive contractile dynamics, calibration should be done based on the angle and the distance of the camera lens to the individual tissues to reduce the error. On the other hand, how reproducible are the pillars? It is highly recommended to mechanically evaluate the consistency of the hydrogel-based pillars across different wells and within the wells to understand the variance.

      Figure 2B reports the early results obtained as the system was tested and developed. Since then, we have tested different iPSC lines and can confirm that the overall yield is higher (up to 20 tissues at D14 for some cell lines), although it depends on the cell line.

      The tissues do not detach from the glass slides. It is very rare to see tissues roll up on the central pillar. As shown in Figure 1B, the pillars have a specific shape that prevents tissues from rolling up as they develop and contract.

      3) Does the platform allow the observation of non-synchronized beating when testing with compounds? This can be extremely important as the intended applications of this platform are drug testing and cardiac disease modeling. The author should elaborate on the method in the manuscript and explain the obtained results in detail.

      The arrhythmogenic effect of a drug can be derived from the regularity of the beat-to-beat time. Indeed, we show that dofetilide increases the variability in the beat-to-beat time by plotting, for each beat, the beat-to-beat time with the next beat as a function of the beat-to-beat time with the previous beat.
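The plotting rule described in this response (next beat-to-beat time against previous beat-to-beat time) can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names `poincare_pairs` and `sd1`, the example beat times, and the use of SD1 (a standard Poincaré-plot descriptor of short-term variability that the response does not mention) are all assumptions made for illustration.

```python
import math
import statistics


def poincare_pairs(beat_times):
    """Return (previous interval, next interval) pairs, one per interior beat.

    beat_times: increasing beat timestamps in seconds (hypothetical input).
    """
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return list(zip(intervals[:-1], intervals[1:]))


def sd1(pairs):
    """Spread of the points perpendicular to the identity line.

    SD1 is an assumed summary statistic here; it is zero when every
    interval equals the one before it.
    """
    diffs = [(nxt - prev) / math.sqrt(2) for prev, nxt in pairs]
    return statistics.pstdev(diffs)


# Hypothetical beat times: a perfectly regular train and an irregular one.
regular = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
irregular = [0.0, 1.0, 1.6, 3.0, 3.5, 5.0]

regular_pairs = poincare_pairs(regular)
irregular_pairs = poincare_pairs(irregular)
```

For a regular rhythm all points sit on the identity line, so the perpendicular spread is zero; irregular beating scatters the points away from it.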

      4) The results of drug testing are interesting. Isoproterenol is typically causing positive chronotropic and positive inotropic responses, where inotropic responses are difficult to obtain due to low tissue maturity. It is inconsistent with other reported results that cardiac rings do not exhibit increased beating frequency, but slightly increased forces only. Zhao et al were using electrical pacing at a defined rate during force measurement, whereas the ring constructs are not.

      We agree. The difference in the response to isoproterenol with previous papers may be explained by different incubation timing with the drug. In our case, the tissues were incubated for 5 minutes at 37 °C before being recorded.

      Overall, the manuscript is well written and the designed platform presented the unique advantages of high throughput cardiac tissue culture. Besides the contractile dynamics and IHC images, the paper lacks other cardiac functional evaluations, such as calcium handling, impulse propagation, and/or electrophysiology. The culture reproducibility (high SD) and longevity (<20 days) still remain unsolved.

      Since the submission, we have managed to keep some tissues and analyze them for up to 32 days. At that time point the tissues are still beating. Nevertheless, a specific study of tissue longevity has not been carried out, as the tissues were usually fixed after 14 days for staining and structural analysis.

      Reviewer #2 (Public Review):

      The authors should be commended for developing a high throughput platform for the formation and study of human cardiac tissues, and for discussing its potential, advantages and limitations. The study is addressing some of the key needs in the use of engineered cardiac tissues for pharmacological studies: ease of use, reproducible preparation of tissues, and high throughput.

      There are also some areas where the manuscript should be improved. The design of the platform and the experimental design should be described in more detail.

      It would be of interest to comprehensively document the progression of tissue formation. To this end, it would be helpful to show the changes in tissue structure through a series of images that would correspond to the progression of contractile properties shown in Figure 3.

      Our results indicate that the fibroblasts/cardiomyocytes segregation likely happens as soon as the tissue is formed, as the fibroblasts are critical for tissue generation. The change with time in the shape of the contractile ring is reported in Figure 1E, with a series of images which correspond to the timepoints of Figure 3.

      The very interesting tissue morphology (separation into the two regions) that was observed in this study is inviting more discussion.

      Finally, the reader would benefit from more specific comparisons of the contractile function of cardiac tissues measured in this study with data reported for other cardiac tissue models.

    1. Author Response

      We thank the reviewers for truly valuable advice and comments. We have made multiple corrections and revisions to the original pre-print accordingly. Here we address 2 major points.

      1) Regarding the genetic association of the common COL11A1 variant rs3753841 (p.(Pro1335Leu)), we do not propose that it is the sole risk variant contributing to the association signal we detected and have clarified this in the manuscript. We concluded that it was worthy of functional testing for reasons described here. Although there were several common variants in the discovery GWAS within and around COL11A1, none were significantly associated with AIS and none were in linkage disequilibrium (R2>0.6) with the top SNP rs3753841. We next reviewed rare (MAF<=0.01) coding variants within the COL11A1 LD region of the associated SNP (rs3753841) in 625 available exomes representing 46% of the 1,358 cases from the discovery cohort. The LD block was defined using Haploview based on the 1KG_CEU population. Within the ~41 KB LD region (chr1:103365089-103406616, GRCh37) we found three rare missense mutations in 6 unrelated individuals (Author response table 1). Two of them (NM_080629.2: c.G4093A:p.A1365T; NM_080629.2:c.G3394A:p.G1132S), from two individuals, are predicted to be deleterious based on CADD and GERP scores and are plausible AIS risk candidates. At this rate we could expect to find only 4-5 individuals with linked rare coding variants in the total cohort of 1,358, which collectively are unlikely to explain the overall association signal we detected. Of course, there also could be deep intronic variants contributing to the association that we would not detect by our methods. However, given this scenario, the relatively high predicted deleteriousness of rs3753841 (CADD= 25.7; GERP=5.75), and its occurrence in a Gly-X-Y triplet repeat, we hypothesized that this variant itself could be a risk allele worthy of further investigation.
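One way to read the estimate of 4-5 individuals, assuming it comes from scaling the two carriers of predicted-deleterious variants among the 625 sequenced exomes up to the full discovery cohort (this derivation is an interpretation, not stated in the response):

```latex
\[
\frac{2}{625} \times 1358 \approx 4.3
\qquad\text{or, equivalently,}\qquad
\frac{2}{0.46} \approx 4.3
\]
```

Under that assumption the expected count falls between 4 and 5, which matches the figure quoted in the response.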

      Author response table 1.

      We also appreciate the reviewer’s suggestion to perform a rare variant burden analysis of COL11A1. We conducted a pilot gene-based analysis in 4534 European ancestry exomes, including 797 of our own AIS cases and 3737 controls, and tested the burden of rare variants in COL11A1. The SKATO P value was not significant (COL11A1_P=0.18), but this could be due to lack of power and/or background from rare benign variants that could be screened out using the functional testing we have developed.

      2) Regarding functional testing, by knockdown/knockout cell culture experiments, we showed for the first time that Col11a1 negatively regulates Mmp3 expression in cartilage chondrocytes, an AIS-relevant tissue. We then tested the effect of overexpressing the human wt or variant COL11A1 by lentiviral transduction in SV40-transformed chondrocyte cultures. We deleted endogenous mouse Col11a1 by Cre recombination to remove the background of its strong suppressive effects on Mmp3 expression. We acknowledge that Col11a1 missense mutations could confer gain of function or dominant negative effects that would not be revealed in this assay. However, as indicated in our original manuscript, we have noted that spinal deformity is described in the cho/cho mouse, a Col11a1 loss of function mutant. We also note the recent publication by Rebello et al. showing that missense mutations in Col11a2 associated with congenital scoliosis fail to rescue a vertebral malformation phenotype in a zebrafish col11a2 KO line. Although the connection between AIS and vertebral malformations is not altogether clear, we surmise that loss of the components of collagen type XI disrupts spinal development. In vivo experiments in vertebrate model systems are needed to fully establish the consequences and genetic mechanisms by which COL11A1 variants contribute to an AIS phenotype.

    1. Reviewer #3 (Public review):

      Summary:

      The authors examine the role of the medial frontal cortex of mice in exploiting statistical structure in tasks. They claim that mice are "proactive": they predict upcoming changes, rather than responding in a "model-free" way to environmental changes. Further, they speculate that the estimation of future change (i.e., prediction of upcoming events, based on learning temporal regularities) might be "a main ... function of dorsal medial frontal cortex (dmFC)." Unfortunately, the current manuscript contains flaws such that the evidence supporting these claims is inadequate.

      Strengths:

      Understanding the neural mechanisms by which we learn about statistical structure in the world is an important goal. The authors developed an interesting task and used model-based techniques to try to understand the mechanisms by which perturbation of dmFC influenced behavior. They demonstrate that lesions and optogenetic silencing of dmFC influence behavior, showing that this region has a causal influence on the task.

      Weaknesses:

      I was concerned that the main behavioral effects shown in Figure 1F were a statistical artifact. By requiring the Geometric block length to be preceded by a performance-based block, the authors introduce a dependence that can generate the phenomena they describe as anticipation.

      To demonstrate this, I simulated their task with an agent that does not have any anticipation of the change point (Reviewer image 1). The agent repeats the previous action with probability `p(repeat)` (similar to the choice kernel in the author's models). If the agent doesn't repeat then the next choice depends on the previous outcome. If the previous choice was rewarded, it stays with `P(WS)` and chooses randomly with `1-P(WS)`. If the previous choice was unrewarded, it switches with `P(LS)` and chooses randomly with `1-P(LS)`.

      Reviewer image 1.

      An agent with `P(WS)=P(LS)=P(repeat)=0.85` shows the same phenomena as the mice: a difference in performance before the block switch and "earlier" crossing of the midpoint after the switch. https://imgdrop.io/image/aHn6y. The phenomena go away in the simulations when a fixed block length of 20 trials is followed by a Geometric block length.
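The choice rule of the non-anticipating agent described above can be sketched as follows. This is an illustrative reconstruction from the reviewer's verbal description, not the reviewer's simulation code: the function name `next_choice`, the 0/1 coding of the two options, and the toy reward rule in the usage loop (reward whenever the choice matches a fixed correct side) are assumptions, and the block-length schedule of the actual task is not modelled.

```python
import random


def next_choice(prev_choice, prev_rewarded, p_repeat, p_ws, p_ls, rng):
    """Pick the next action (0 or 1) without any knowledge of block structure."""
    # Choice perseveration: repeat the previous action with probability p_repeat.
    if rng.random() < p_repeat:
        return prev_choice
    if prev_rewarded:
        # Win-stay with probability p_ws, otherwise fall through to a random pick.
        if rng.random() < p_ws:
            return prev_choice
    else:
        # Lose-shift with probability p_ls, otherwise fall through to a random pick.
        if rng.random() < p_ls:
            return 1 - prev_choice
    return rng.randrange(2)


# Usage: a short run with the parameter values quoted in the review (0.85 each).
rng = random.Random(0)
correct_side = 1  # hypothetical fixed correct option for this toy run
choice, rewarded = 0, False
choices = []
for _ in range(50):
    choice = next_choice(choice, rewarded, 0.85, 0.85, 0.85, rng)
    rewarded = choice == correct_side
    choices.append(choice)
```

Because the rule depends only on the previous choice and outcome, any apparent anticipation in its aggregate behaviour would have to come from how blocks are selected and aligned rather than from the agent itself, which is the reviewer's point.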

      The authors did not completely rely on the phenomena of Figure 1F for their conclusions. They did a model comparison to provide evidence that animals are anticipating the switch. Unfortunately, the authors did not use state-of-the-art methods in this section of the paper. In particular, they failed to show that under a range of generative parameters for each model class, the model selection process chooses the correct model class (i.e. a confusion matrix). A more minor point, they used BIC instead of a more robust cross-validated metric for model selection. Finally, instead of comparing their "best" anticipating model to their 2nd best model (without anticipation), they compared their best to their 4th best (Supp Fig 3.5). This seems misleading.

      Given all of the above issues, it is hard to critically evaluate the model-based analysis of the effects of lesions/optogenetics.

    1. Author response:

      We thank the reviewers for their thoughtful criticisms.  This provisional response addresses what we consider the central critiques, with a full, point-by-point reply to follow with the revised manuscript.  Central critiques concern 1) providing further clarity about the apportionment cost of time, 2) generality & scope, and 3) clarifying the meaning of key equations.

      (1) Apportionment cost

      Reviewers commonly identified a need to provide a concise and intuitive definition of apportionment cost, and to explicitly solve and provide for its mathematical expression. 

      We will add the following definition of apportionment cost to the manuscript: “Apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration.”  While this difference is the apportionment cost of time, the amount that would be expected over a time equal to the considered pursuit under a policy of not taking the considered pursuit is the opportunity cost of time.  Together, they sum to Time’s Cost.  The above definition of apportionment cost adds to other stated relationships of apportionment cost found throughout the paper (Lines 434,435,447,450). 

      As suggested, we will also add equations of apportionment cost, as below.

      (2) Generality & Scope

      Generality. We will add further examples in support of the generality of these equations for assessing and thinking about the value of initiating a pursuit.  Specifically, this will include 1) illustrating forgo decision making in a world composed of multiple pursuits, as in prey selection, 2) demonstrating and examining worlds in which a sequence of pursuits compose a considered pursuit’s ‘outside’, and 3) clarifying how our framework does contend with variance and uncertainty in reward magnitude and occurrence.

      Scope. In this manuscript, we consider the worth of initiating one or another pursuit having completed a prior one, and not the issue of continuing within a pursuit having already engaged in it.  The worth of continuing a pursuit, as in patch-foraging/give-up tasks, constitutes a third fundamental time decision-making topology which is outside the scope of the current work.  It engages a large and important literature, encompassing evidence accumulation, and requires a paper in its own right that can use the concepts and framework developed here.  We will further consider applying this framework to extant datasets.

      (3) Correction of typographical errors and further explanation of equations.   

      We would like to redress the two typographical errors identified by the reviewers that appeared in the equations on line 277 and on line 306, and provide further explanation to equations that gave pause to the reviewers.

      Typographical errors: 

      The first typographical error in the main text regards equation 2 and will be corrected so that equation 2 appears correctly as…

      Line 277:  

      The second typo regards the definition of the considered pursuit’s reward rate, and will be corrected to appear as…

      Line 306:   

      Regarding equations:

      Cross-reference to equations in the main text refer to equations as they appear in the main text.  Where needed, the appendix in which they are derived is also given.   Equation numbering within the appendices refer to equations as they appear in the appendices.  In the revision, we will refer to all equations that appear in the appendices as Ap.#.#. so as to avoid confusion between referencing equations as they appear in the main text and equations as they appear in the appendices.  

      We would also like to clarify that equation 8, as we derive it, is not new: it is similarly derived and expressed in prior foundational work by McNamara (1982), to which it is now properly attributed.

      Equation 1 and Appendix 1

      Equation 1 is formulated to calculate the average reward received and average time spent per unit time spent in the default pursuit. So, fi is the encounter rate of pursuit i for one unit of time spent in the default pursuit (lines 259-262). Added to the summation in the numerator, we have the average reward obtained in the default pursuit per unit time, and added to the summation in the denominator we have the time spent in the default pursuit per unit time (1).

      Equation 2 and Appendix 2

      Eq. 2.4 in Appendix 2 calculates the average time spent outside of the considered pursuit, per encounter with the considered pursuit. Breaking down eq. 2.4, the first term in the numerator,

      gives the expected time spent in other pursuits, per unit time spent in the default pursuit, where fi is the encounter rate of pursuit i per unit time spent in the default pursuit, and ti is the time required by pursuit i. The second term in the numerator (1, added outside the summation) simply represents the unit of time spent in the default pursuit, over which the encounter rate of each pursuit is calculated. Together, these represent the total time spent outside the considered pursuit, per unit time spent in the default pursuit. The denominator,

      is the frequency with which the considered pursuit is encountered per unit time spent in the default pursuit, so

      is the average time spent within the default pursuit, per encounter with the considered pursuit. By multiplying the average time spent outside of the considered pursuit per unit time spent in the default pursuit by the average time spent within the default pursuit per encounter with the considered pursuit, we get eq. 2.4, the average time spent outside of the considered pursuit, per encounter with the considered pursuit, which is equal to tout.

                             (eq. 2.4)
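The verbal breakdown of eq. 2.4 can be collected into a single expression. The following is a sketch assembled from that description rather than the authors' typeset equation; the subscript "in" for the considered pursuit and the symbol $t_i$ for the time required by pursuit $i$ are notational assumptions.

```latex
\[
t_{\mathrm{out}}
  = \underbrace{\Bigl(\sum_{i \neq \mathrm{in}} f_i\, t_i + 1\Bigr)}_{\text{time outside, per unit default time}}
    \times
    \underbrace{\frac{1}{f_{\mathrm{in}}}}_{\text{default time per encounter}}
  = \frac{\sum_{i \neq \mathrm{in}} f_i\, t_i + 1}{f_{\mathrm{in}}}
\]
```

Here the sum runs over the pursuits other than the considered one, the added 1 is the unit of time spent in the default pursuit, and $f_{\mathrm{in}}$ is the rate at which the considered pursuit is encountered per unit time in the default pursuit.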

    1. Author response:

      To Reviewer #1:

      Thank you for your thorough review and comments on our work, which you described as follows: “the role of neuritin in T cell biology studied here is new and interesting.” We have summarized your comments into two categories: biology and investigation approach, and experimental rigor and data presentation.

      Biology and Investigation approach comments:

      (1) Questions regarding the T cell anergy model:

      Major point “(4) Figure 1E-H. The authors assume that this immunization protocol induces anergic cells, but they provide no experimental evidence for this. It would be useful to show that T cells are indeed anergic in this model, especially those that are OVA-specific. The lack of IL-2 production by Cltr cells could be explained by the presence of fewer OVA-specific cells, rather than by an anergic status.”

      T cell anergy is a well-established concept first described by Schwartz’s group. It refers to the hyporesponsive T cell functional state in antigen-experienced CD4 T cells (Chappert and Schwartz, 2010; Fathman and Lineberry, 2007; Jenkins and Schwartz, 1987; Quill and Schwartz, 1987). Anergic T cells are characterized by their inability to expand and to produce IL2 upon subsequent antigen re-challenge. In this paper, we have borrowed the existing in vivo T cell anergy induction model used by Mueller’s group (Vanasek et al., 2006). Specifically, Thy1.1+ Ctrl or Nrn1-/- TCR transgenic OTII cells were co-transferred with the congenically marked Thy1.2+ WT polyclonal Treg cells into TCR-/- mice. After anergy induction, the congenically marked TCR transgenic T cells were recovered by sorting based on the Thy1.1+ congenic marker and subsequently restimulated ex vivo with OVA323-339 peptide. We evaluated the T cell anergic state based on OTII cell expansion in vivo and IL2 production upon OVA323-339 restimulation ex vivo.

      “The authors assume that this immunization protocol induces anergic cells, but they provide no experimental evidence for this.”

      Because the anergy model by Mueller's group is well established (Vanasek et al., 2006), we did not feel that additional effort was required to validate this model as the reviewer suggested. Moreover, the limited IL2 production among the control cells upon restimulation confirms the validity of this model.

      “The lack of IL-2 production by Cltr cells could be explained by the presence of fewer OVA-specific cells, rather than by an anergic status”.

      Cells from Ctrl and Nrn1-/- mice on a homogeneous TCR transgenic (OTII) background were used in these experiments. The possibility that substantial variability of TCR expression or different expression levels of the transgenic TCR could have impacted IL2 production rather than anergy induction is unlikely.

      Overall, we used this in vivo anergy model to evaluate the Nrn1-/- T cell functional state in comparison to Ctrl cells under the anergy induction condition following the evaluation of Nrn1 expression, particularly in anergic T cells.  Through studies using this anergy model, we observed a significant change in Treg induction among OTII cells. We decided to pursue the role of Nrn1 in Treg cell development and function rather than the biology of T cell anergy as evidenced by subsequent experiments.

      Minor points “(6) On which markers are anergic cells sorted for RNAseq analysis?”

      Cells were sorted out based on their congenic marker marking Ctrl or Nrn1-/- OTII cells transferred into the host mice.  We did not specifically isolate anergic cells for sequencing.

      (2) Question regarding the validity of iTreg differentiation model.

      Major point: “(5) Figure 2A-C and Figure 3. The use of iTregs to try to understand what is happening in vivo is problematic. iTregs are cells that have probably no equivalent in vivo, and so may have no physiological relevance. In any case, they are different from pTreg cells generated in vivo. Working with pTreg may be challenging, that is why I would suggest generating data with purified nTreg. Moreover, it was shown in the article of Gonzalez-Figueroa 2021 that Nrn1-/- nTreg retained a normal suppressive function, which would not be what is concluded by the authors of this manuscript. Moreover, we do not even know what the % of Foxp3 cells is in the iTreg used (after differentiation and 20h of re-stimulation) and whether this % is the same between Ctlr and Nrn1 KO cells.”.

      We thank Reviewer #1 for their feedback. While it is true that iTregs made in vitro and in vivo generated pTregs display several distinctions (e.g., differences in Foxp3 expression stability), we strongly disagree with this statement by Reviewer #1: “The use of iTregs to try to understand what is happening in vivo is problematic. iTregs are cells that have probably no equivalent in vivo, and so may have no physiological relevance.” The induced Treg cell (iTreg) model was established over 20 years ago (Chen et al., 2003; Zheng et al., 2002), and the model is widely adopted with over 2000 citations. Further, it has been instrumental in understanding different aspects of regulatory T cell biology (Hurrell et al., 2022; John et al., 2022; Schmitt and Williams, 2013; Sugiura et al., 2022).

      Because we have observed reduced pTreg generation in vivo, we chose to use the in vitro iTreg model system to understand the mechanistic changes involved in Treg cell differentiation and function, specifically, neuritin’s role in this process. We have made no claim that iTreg cell biology is identical to that of pTreg cells generated in vivo or of nTreg cells. However, the iTreg culture system has proved to be a good in vitro system for deciphering molecular events involved in complex processes. As such, it remains a commonly used approach by many research groups in the Treg cell field (Hurrell et al., 2022; John et al., 2022; Sugiura et al., 2022). Moreover, applying the iTreg in vitro culture system has been instrumental in helping us identify the cell electrical state change in Nrn1-/- CD4 cells and revealed the biological link between Nrn1 and the ionotropic AMPA receptor (AMPAR), which we will discuss in the subsequent discussion. It is technically challenging to use nTreg cells for T cell electrical state studies due to their heterogeneous nature from development in an in vivo environment and the effect of manipulation during the nTreg cell isolation process, which can both affect the T cell electrical state.

      “Moreover, it was shown in the article of Gonzalez-Figueroa 2021 that Nrn1-/- nTreg retained a normal suppressive function, which would not be what is concluded by the authors of this manuscript.” 

      We have also carried out nTreg studies in vitro in addition to iTreg cells. Similar to Gonzalez-Figueroa et al.'s findings, we did not observe differences in suppression function between Nrn1-/- and WT nTreg using the in vitro suppression assay. However, Nrn1-/- nTreg cells revealed reduced suppression function in vivo (Fig. 2D-L). In fact, Gonzalez-Figueroa et al. observed reduced plasma cell formation after OVA immunization in Treg-specific Nrn1-/- mice, implicating reduced suppression from Nrn1-/- follicular regulatory T (Tfr) cells. Thus, our observation of the reduced suppression function of Nrn1-/- nTreg toward effector T cell expansion, as presented in Fig. 2D-L, does not contradict the results from Gonzalez-Figueroa et al. Rather, the conclusions of these two studies agree that Nrn1 can play important roles in immune suppression observable in vivo that are not captured readily by the in vitro suppression assay.

      “Moreover, we do not even know what the % of Foxp3 cells is in the iTreg used (after differentiation and 20h of re-stimulation) and whether this % is the same between Ctlr and Nrn1 KO cells.”

      We have stated in the manuscript on page 7 line 208 that “Similar proportions of Foxp3+ cells were observed in Nrn1-/- and Ctrl cells under the iTreg culture condition, suggesting that Nrn1 deficiency does not significantly impact Foxp3+ cell differentiation”. In the revised manuscript, we will include the data on the proportion of Foxp3+ cells before iTreg restimulation.

      (3) Confirmation of transcriptomic data regarding amino acids or electrolytes transport change

      Minor point “(3) Would not it be possible to perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane? This would be a more interesting demonstration than transcriptomic data.”

      We appreciate Reviewer #1’s suggestion to “perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane”. We have indeed already performed such experiments, which corroborate the transcriptomics data on differential amino acid and nutrient transporter expression. Specifically, we loaded either iTreg or Th0 cells with membrane potential (MP) dye and measured the change in MP level after adding the complete set of amino acids (complete AA). Upon entry, the charge carried by AAs may transiently affect cell membrane potential. Different AA transporter expression patterns may produce different MP change patterns upon AA entry, as shown in Author response image 1. We observed reduced MP change in Nrn1-/- iTreg cells compared with the Ctrl, whereas in Th0 cells, Nrn1-/- cells showed enhanced MP change compared with the Ctrl. We can certainly include these data in the revised manuscript.

      Author response image 1.

      Membrane potential change induced by amino acid entry. a. Nrn1-/- or WT iTreg cells were loaded with MP dye, and MP change was measured upon the addition of a complete set of AAs. b. Nrn1-/- or WT Th0 cells were loaded with MP dye, and MP change was measured upon the addition of a complete set of AAs.

      (4) EAE experiment data assessment

      Minor point “(5) Figure 5F. How are cells re-stimulated? If polyclonal stimulation is used, the experiment is not interesting because the analysis is done with lymph node cells. This analysis should either be performed with cells from the CNS or with MOG restimulation with lymph node cells.”

      In the EAE study, the Nrn1-/- mice exhibit similar disease onset but a protracted, non-resolving disease phenotype compared to the WT control mice. Several factors may contribute to this phenotype: 1. Enhanced T effector cell infiltration/persistence in the central nervous system (CNS); 2. Reduced Treg cell-mediated suppression of T effector cells in the CNS; 3. Protracted, non-resolving inflammation at the immunization site, which has the potential to continue sending T effector cells into the CNS, contributing to persistent inflammation. Based on this reasoning, we examined the infiltrating T effector cell number and Treg cell proportion in the CNS. We also restimulated cells from draining lymph nodes close to the inflammation site, looking for evidence of persistent inflammation. When mice were harvested around day 16 after immunization, the inflammation at the local draining lymph node should be at the contraction stage. We stimulated cells with PMA and ionomycin with the intention of observing all potential T effector cells involved in the draining lymph node rather than only MOG antigen-specific cells. We disagree with Reviewer #1’s assumption that “This analysis should either be performed with cells from the CNS or with MOG restimulation with lymph node cells.” We think the experimental approach we have taken has been appropriately tailored to the biological questions we intended to answer.

      Experimental rigor and data presentation.

      (1) Data labeling and additional supporting data

      Major points (2) The authors use Nrn1+/+ and Nrn1+/- cells indiscriminately as control cells on the basis of similar biology between Nrn1+/+ and Nrn1+/- cells at homeostasis. However, it is quite possible that the Nrn1+/- cells have a phenotype in situations of in vitro activation or in vivo inflammation (cancer, EAE). It would be important to discriminate Nrn1+/- and Nrn1+/+ cells in the data or to show that both cell types have the same phenotype in these conditions too.

      (3) Figure 1A-D. Since the authors are using the Nrp1 KO mice, it would be important to confirm the specificity of the anti-Nrn1 mAb by FACS. Once verified, it would be important to add FACS results with this mAb in Figures 1A-C to have single-cell and quantitative data as well.

      Minor points  

      (1) Line 119, 120 of the text. It is said that one of the most up-regulated genes in anergic cells is Nrn1 but the data is not shown.

      (2) For all figures showing %, the titles of the Y axes are written in an odd way. For example, it is written "Foxp3% CD4". It would be more conventional and clearer to write "% Foxp3+ / CD4+" or "% Foxp3+ among CD4+".

      (4) For certain staining (Figure 3E, H) it would be important to show the raw data, in addition to MFI or % values.

      We can adapt the labeling and provide additional data, including Nrn1 staining on Treg cells and flow graphs for pmTOR and pS6 staining (Fig. 3H), as requested by Reviewer #1.

      (2) Experimental rigor:

      General comments:

      “However, it is disappointing that reading this manuscript leaves an impression of incomplete work done too quickly.”

      We were discouraged to receive the comment, “this manuscript leaves an impression of incomplete work done too quickly.” Our study of this novel molecule began without any existing biological tools such as antibodies, knockout mice, etc. Over the past several years, we have established our own antibodies for Nrn1 detection, obtained and characterized Nrn1 knockout mice, and utilized multiple approaches to identify the molecular mechanism of Nrn1 function. Through the use of the in vitro iTreg system described in this manuscript, we identified the association of Nrn1 deficiency with cell electrical state change, potentially connected to AMPAR function. We have further corroborated our findings by generating Nrn1 and AMPAR T cell-specific double knockout mice and confirmed that T cell-specific AMPAR deletion could abrogate the phenotype caused by the Nrn1 deficiency (see Author response image 2). We did not include the double knockout data in the current manuscript because AMPAR function has not yet been studied thoroughly in T cell biology, and we feel this topic warrants examination in its own right. However, the unpublished data support the finding that Nrn1 modulates the T cell electrical state and, consequently, metabolism, ultimately influencing tolerance and immunity. In its current form, the manuscript represents the first characterization of the novel molecule Nrn1 in anergic cells, Tregs, and effector T cells. While this work has led to several exciting additional questions, we disagree that the novel characterization we have presented is incomplete. We feel that our present data set, which squarely highlights Nrn1’s role as an important immune regulator while shedding unprecedented light on the molecular events involved, will be of considerable interest to a broad field of researchers.

      “Multiple models have been used, but none has been studied thoroughly enough to provide really conclusive and unambiguous data. For example, 5 different models were used to study T cells in vivo. It would have been preferable to use fewer, but to go further in the study of mechanisms.”

      We have indeed used multiple in vivo models to reveal Nrn1's function in Treg differentiation, Treg suppression function, T effector cell differentiation and function, and the overall impact on autoimmune disease. Because the impact of ion channel function is often context-dependent, we examined the biological outcome of Nrn1 deficiency in several in vivo contexts. We would appreciate it if Reviewer #1 would provide a specific example, given the Nrn1 phenotype, of how to investigate the electrical change more deeply in the in vivo models.

      “Major points (1) A real weakness of this work is the fact that in most of the results shown, there are few biological replicates with differences that are often small between Ctrl and Nrn1 -/-. The systematic use of student's t-test may lead to thinking that the differences are significant, which is often misleading given the small number of samples, which makes it impossible to know whether the distributions are Gaussian and whether a parametric test can be used. RNAseq bulk data are based on biological duplicates, which is open to criticism.”

      We respectfully disagree with Reviewer #1 on the question of the statistical power and significance of our work. We have used 5-8 mice/group for each in vivo model and 3-4 technical replicates for the in vitro studies, with a minimum of 2-3 replicate experiments. These group sizes and replication numbers are in line with those seen in high-impact publications. While some differences between Ctrl and Nrn1-/- appear small, they have significant biological consequences, as evidenced by the various Nrn1-/- in vivo phenotypes. Furthermore, we believe we have subjected our data to the appropriate statistical tests to ensure rigorous analysis and representation of our findings.

      To Reviewer #2.

      We thank Reviewer #2 for the careful review of the manuscript. We especially appreciate the comments that “The characterizations of T cell Nrn1 expression both in vitro and in vivo are comprehensive and convincing. The in vivo functional studies of anergy development, Treg suppression, and EAE development are also well done to strengthen the notion that Nrn1 is an important regulator of CD4 responsiveness.”

      “The major weakness of this study stems from a lack of a clear molecular mechanism involving Nrn1.”

      We fully understand this comment from Reviewer #2. The main mechanism we identified contributing to the functional defect of Nrn1-/- T cells involves novel effects on the electric and metabolic state of the cells. Although we referenced neuronal studies that indicate Nrn1 is the auxiliary protein for the ionotropic AMPA-type glutamate receptor (AMPAR) and may affect AMPAR function, we did not provide any evidence in this manuscript as the topic requires further in-depth study.   

      For the benefit of this discussion, we include our preliminary Nrn1 and AMPAR double knockout data (Author response image 2), which indicate that abrogating AMPAR expression can compensate for the defect caused by Nrn1 deficiency in vitro and in vivo. These preliminary data support the notion that Nrn1 modulates AMPAR function, which causes changes in T cell electric and metabolic state, influencing T cell differentiation and function.

      Author response image 2.

      Deletion of AMPAR expression in T cells compensates for the defect caused by Nrn1 deficiency. Nrn1-/- mice were crossed with T cell-specific AMPAR knockout (AMPARfl/flCD4Cre+) mice. The following mice were generated and used in the experiment: T cell-specific AMPAR knockout and Nrn1 knockout mice (AKONKO), Nrn1 knockout mice (AWTNKO), and Ctrl mice (AWTNWT). a. Deletion of AMPAR compensates for the iTreg cell defect observed in Nrn1-/- CD4 cells. iTreg live cell proportion, cell number, and Ki67 expression among Foxp3+ cells 3 days after aCD3 restimulation. b. Deletion of AMPAR in T cells abrogates the enhanced autoimmune response in Nrn1-/- mice in the EAE disease model. Mouse relative weight change and disease score progression after EAE disease induction.

      Ion channels can influence cell metabolism through multiple means (Vaeth and Feske, 2018; Wang et al., 2020). First, ion channels are involved in maintaining cell resting membrane potential. This electrical potential difference across the cell membrane is essential for various cellular processes, including metabolism (Abdul Kadir et al., 2018; Blackiston et al., 2009; Nagy et al., 2018; Yu et al., 2022). Second, ion channels facilitate the movement of ions across cell membranes. These ions are essential for various metabolic processes. For example, ions like calcium (Ca2+), potassium (K+), and sodium (Na+) play crucial roles in signaling pathways that regulate metabolism (Kahlfuss et al., 2020). Third, ion channel activity can influence cellular energy balance due to ATP consumption associated with ion transport to maintain ion balances (Erecińska and Dagani, 1990; Gerkau et al., 2019). This, in turn, can impact processes like ATP production, which is central to cellular metabolism. Thus, ion channel expression and function determine the cell’s bioelectric state and contribute to cell metabolism (Levin, 2021).

      Because the AMPAR function has not been thoroughly studied using a genetic approach in T cells, we do not intend to include the double knockout data in this manuscript before fully characterizing the T cell-specific AMPAR knockout mice.  

      “Although the biochemical and informatics studies are well-performed, it is my opinion that these results are inconclusive in part due to the absence of key "naive" control groups. This limits my ability to understand the significance of these data.

      Specifically, studies of the electrical and metabolic state of Nrn1-/- inducible Treg cells (iTregs) would benefit from similar data collected from wild-type and Nrn1-/- naive CD4 T cells.”

      We appreciate the reviewer’s comments. This comment reflects two concerns in data interpretation:

      (1) Are Nrn1-/- naïve T cells fundamentally different from WT cells? Does this fundamental difference contribute to the observed electrical and metabolic phenotype in iTreg or Th0 cells? This is a very good question, and we will perform the experiments as the reviewer suggested. While Nrn1 is expressed at a basal (low) level in naïve T cells, deletion of Nrn1 may cause changes in naïve T cell phenotype.

      (2) Is the Nrn1-/- phenotype caused by Nrn1 functional deficiency or due to the secondary effect of Nrn1 deletion, such as non-physiological cell membrane structure changes?

      We have done the following experiment to address this concern. We cultured WT T cells in the presence of Nrn1 antibody and compared the outcome with Nrn1-/- iTreg cells (Author response image 3). WT iTreg cells under antibody blockade exhibited changes similar to those in Nrn1-/- iTreg cells, confirming the physiological relevance of the Nrn1-/- phenotype.

      Author response image 3.

      Nrn1 antibody blockade in WT iTreg cell culture caused phenotypic changes similar to those in Nrn1-/- iTreg cells. Nrn1-/- and WT CD4 cells were differentiated under iTreg conditions in the presence of anti-Nrn1 (aNrn1) antibody or isotype control for 3 days. Cells were restimulated with anti-CD3 in the presence of aNrn1 or isotype. a. MP measured 18 hr after anti-CD3 restimulation. b. Live CD4 cell number and proportion of Ki67 expression among live cells three days after restimulation. c. The proportion of Foxp3+ cells among live cells three days after restimulation.

      References:

      Abdul Kadir, L., M. Stacey, and R. Barrett-Jolley. 2018. Emerging Roles of the Membrane Potential: Action Beyond the Action Potential. Front Physiol 9:1661.

      Blackiston, D.J., K.A. McLaughlin, and M. Levin. 2009. Bioelectric controls of cell proliferation: ion channels, membrane voltage and the cell cycle. Cell Cycle 8:3527-3536.

      Chappert, P., and R.H. Schwartz. 2010. Induction of T cell anergy: integration of environmental cues and infectious tolerance. Current opinion in immunology 22:552-559.

      Chen, W., W. Jin, N. Hardegen, K.J. Lei, L. Li, N. Marinos, G. McGrady, and S.M. Wahl. 2003. Conversion of peripheral CD4+CD25- naive T cells to CD4+CD25+ regulatory T cells by TGF-beta induction of transcription factor Foxp3. The Journal of experimental medicine 198:1875-1886.

      Erecińska, M., and F. Dagani. 1990. Relationships between the neuronal sodium/potassium pump and energy metabolism. Effects of K+, Na+, and adenosine triphosphate in isolated brain synaptosomes. J Gen Physiol 95:591-616.

      Fathman, C.G., and N.B. Lineberry. 2007. Molecular mechanisms of CD4+ T-cell anergy. Nat Rev Immunol 7:599-609.

      Gerkau, N.J., R. Lerchundi, J.S.E. Nelson, M. Lantermann, J. Meyer, J. Hirrlinger, and C.R. Rose. 2019. Relation between activity-induced intracellular sodium transients and ATP dynamics in mouse hippocampal neurons. The Journal of physiology 597:5687-5705.

      Hurrell, B.P., D.G. Helou, E. Howard, J.D. Painter, P. Shafiei-Jahani, A.H. Sharpe, and O. Akbari. 2022. PD-L2 controls peripherally induced regulatory T cells by maintaining metabolic activity and Foxp3 stability. Nature communications 13:5118.

      Jenkins, M.K., and R.H. Schwartz. 1987. Antigen presentation by chemically modified splenocytes induces antigen-specific T cell unresponsiveness in vitro and in vivo. The Journal of experimental medicine 165:302-319.

      John, P., M.C. Pulanco, P.M. Galbo, Jr., Y. Wei, K.C. Ohaegbulam, D. Zheng, and X. Zang. 2022. The immune checkpoint B7x expands tumor-infiltrating Tregs and promotes resistance to anti-CTLA-4 therapy. Nature communications 13:2506.

      Kahlfuss, S., U. Kaufmann, A.R. Concepcion, L. Noyer, D. Raphael, M. Vaeth, J. Yang, P. Pancholi, M. Maus, J. Muller, L. Kozhaya, A. Khodadadi-Jamayran, Z. Sun, P. Shaw, D. Unutmaz, P.B. Stathopulos, C. Feist, S.B. Cameron, S.E. Turvey, and S. Feske. 2020. STIM1-mediated calcium influx controls antifungal immunity and the metabolic function of nonpathogenic Th17 cells. EMBO molecular medicine 12:e11592.

      Levin, M. 2021. Bioelectric signaling: Reprogrammable circuits underlying embryogenesis, regeneration, and cancer. Cell 184:1971-1989.

      Nagy, E., G. Mocsar, V. Sebestyen, J. Volko, F. Papp, K. Toth, S. Damjanovich, G. Panyi, T.A. Waldmann, A. Bodnar, and G. Vamosi. 2018. Membrane Potential Distinctly Modulates Mobility and Signaling of IL-2 and IL-15 Receptors in T Cells. Biophys J 114:2473-2482.

      Quill, H., and R.H. Schwartz. 1987. Stimulation of normal inducer T cell clones with antigen presented by purified Ia molecules in planar lipid membranes: specific induction of a long-lived state of proliferative nonresponsiveness. Journal of immunology (Baltimore, Md. : 1950) 138:3704-3712.

      Schmitt, E.G., and C.B. Williams. 2013. Generation and function of induced regulatory T cells. Frontiers in immunology 4:152.

      Sugiura, A., G. Andrejeva, K. Voss, D.R. Heintzman, X. Xu, M.Z. Madden, X. Ye, K.L. Beier, N.U. Chowdhury, M.M. Wolf, A.C. Young, D.L. Greenwood, A.E. Sewell, S.K. Shahi, S.N. Freedman, A.M. Cameron, P. Foerch, T. Bourne, J.C. Garcia-Canaveras, J. Karijolich, D.C. Newcomb, A.K. Mangalam, J.D. Rabinowitz, and J.C. Rathmell. 2022. MTHFD2 is a metabolic checkpoint controlling effector and regulatory T cell fate and function. Immunity 55:65-81.e69.

      Vaeth, M., and S. Feske. 2018. Ion channelopathies of the immune system. Current opinion in immunology 52:39-50.

      Vanasek, T.L., S.L. Nandiwada, M.K. Jenkins, and D.L. Mueller. 2006. CD25+Foxp3+ regulatory T cells facilitate CD4+ T cell clonal anergy induction during the recovery from lymphopenia. Journal of immunology (Baltimore, Md. : 1950) 176:5880-5889.

      Wang, Y., A. Tao, M. Vaeth, and S. Feske. 2020. Calcium regulation of T cell metabolism. Current opinion in physiology 17:207-223.

      Yu, W., Z. Wang, X. Yu, Y. Zhao, Z. Xie, K. Zhang, Z. Chi, S. Chen, T. Xu, D. Jiang, X. Guo, M. Li, J. Zhang, H. Fang, D. Yang, Y. Guo, X. Yang, X. Zhang, Y. Wu, W. Yang, and D. Wang. 2022. Kir2.1-mediated membrane potential promotes nutrient acquisition and inflammation through regulation of nutrient transporters. Nature communications 13:3544.

      Zheng, S.G., J.D. Gray, K. Ohtsuka, S. Yamagiwa, and D.A. Horwitz. 2002. Generation ex vivo of TGF-beta-producing regulatory T cells from CD4+CD25- precursors. Journal of immunology (Baltimore, Md. : 1950) 169:4183-4189.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Loh and colleagues investigate valence encoding in the mesolimbic dopamine system. Using an elegant approach, they show that sucrose, which normally evokes strong dopamine neuron activity and release in the nucleus accumbens, is made aversive via conditioned taste aversion, the same sucrose stimulus later evokes much less dopamine neuron activity and release. Thus, dopamine activity can dynamically track the changing valence of an unconditioned stimulus. These results are important for helping clarify valence and value related questions that are the matter of ongoing debate regarding dopamine functions in the field.

      Strengths:

      This is an elegant way to ask this question, the within subject's design and the continuity of the stimulus is a strong way to remove a lot of the common confounds that make it difficult to interpret valence-related questions. I think these are valuable studies that help tie up questions in the field while also setting up a number of interesting future directions. There are number of control experiments and tweaks to the design that help eliminate a number of competing hypotheses regarding the results. The data are clearly presented and contextualized.

      Weaknesses for consideration:

      The focus on one relatively understudied region of the rat striatum for dopamine recordings could potentially limit generalization of the findings. While this can be determined in future studies, the implications should be further discussed in the current manuscript.

      We agree that the manuscript would benefit from providing a stronger rationale for our recording sites and acknowledging the potential for regional differences in dopamine signaling. We have made the following additions to the manuscript:

      Added to the Discussion: “Recordings were targeted to the lateral VTA and the corresponding approximate terminal site in the NAc lateral shell (Lammel et al., 2008). Subregional differences in dopamine activity likely contribute to mixed findings on dopamine and affect. For example, dopamine in the NAc lateral shell differentially encodes cues predictive of rewarding sucrose and aversive footshock, which is distinct from NAc medial shell dopamine responses (de Jong et al., 2019). Our findings are similar to prior work from our group targeting recordings to the NAc dorsomedial shell (Hsu et al., 2020; McCutcheon et al., 2012; Roitman et al., 2008): there, intraoral sucrose increased NAc dopamine release while the response in the same rats to quinine was significantly lower.”

      Reviewer #2 (Public review):

      Summary:

      Koh et al. report an interesting manuscript studying dopamine binding in the lateral accumbens shell of rats across the course of conditioned taste aversion. The question being asked here is how does the dopamine system respond to aversion? The authors take advantage of unique properties of taste aversion learning (notably, within-subjects remapping of valence to the same physical stimulus) to address this.

      They combine a well controlled behavioural design (including key, unpaired controls) with fibre photometry of dopamine binding via GrabDA and of dopamine neuron activity by gCaMP, careful analyses of behaviour (e.g., head movements; home cage ingestion), the authors show that, 1) conditioned taste aversion of sucrose suppresses the activity of VTA dopamine neurons and lateral shell dopamine binding to subsequent presentations of the sucrose tastant; 2) this pattern of activity was similar to the innately aversive tastant quinine; 3) dopamine responses were negatively correlated with behavioural (inferred taste reactivity) reactivity; and 4) dopamine responses tracked the contingency of between sucrose and illness because these responses recovered across extinction of the conditioned taste aversion.

      Strengths:

      There are important strengths here. The use of a well-controlled design, the measurement of both dopamine binding and VTA dopamine neuron activity, the inclusion of an extinction manipulation; and the thorough reporting of the data. I was not especially surprised by these results, but these data are a potentially important piece of the dopamine puzzle (e.g., as the authors note, salience-based argument struggles to explain these data).

      Weaknesses for consideration:

      (1) The focus here is on the lateral shell. This is a poorly investigated region in the context of the questions being asked here. Indeed, I suspect many readers might expect a focus on the medial shell. So, I think this focus is important. But, I think it does warrant greater attention in both the introduction and discussion. We do know from past work that there can be extensive compartmentalisation of dopamine responses to appetitive and aversive events and many of the inconsistent findings in the literature can be reconciled by careful examination of where dopamine is assessed. I do think readers would benefit from acknowledgement this - for example it is entirely reasonable to suppose that the findings here may be specific to the lateral shell.

      As with our response to Reviewer 1, we agree that we should provide further rationale for focusing our recordings on the lateral shell and acknowledge potential differences in dopamine dynamics across NAc subregions. In addition to the changes in the Discussion detailed in our response to Reviewer 1, we have made the following additions to the Introduction:

      Added to the Introduction: “NAc lateral shell dopamine differentially encodes cues predictive of rewarding (i.e., sipper spout with sucrose) and aversive stimuli (i.e., footshock), which is distinct from other subregions (de Jong et al., 2019). It is important to note that other regions of the NAc may serve as hedonic hotspots (e.g., dorsomedial shell) or may more closely align with the signaling of salience (e.g., ventromedial shell; Yuan et al., 2021).”

      (2) Relatedly, I think readers would benefit from an explicit rationale for studying the lateral shell as well as consideration of this in the discussion. We know that there are anatomical (PMID: 17574681), functional (PMID: 10357457), and cellular (PMID: 7906426) differences between the lateral shell and the rest of the ventral striatum. Critically, we know that profiles of dopamine binding during ingestive behaviours there can be highly dissimilar to the rest of ventral striatum (PMID: 32669355). I do think these points are worth considering.

      There are several reasons why dopamine dynamics were recorded in the NAc lateral shell:

      (1) Dopamine neurons in more medial aspects of the VTA preferentially target the NAc medial shell and core whereas dopamine neurons in the lateral VTA – our target for VTA DA recordings – project to the lateral shell of the NAc (Lammel et al., 2008). Thus, our goal was to sample NAc release dynamics in areas that receive projections from our cell body recording sites.

      (2) Cues predictive of reward availability (i.e., sipper spout with sucrose) and aversive stimuli (i.e., footshock) are differentially encoded by NAc lateral shell dopamine, which is distinct from NAc ventromedial shell dopamine responses (de Jong et al., 2019). These findings suggest a role for NAc lateral shell dopamine in the encoding of a stimulus’s valence, which made the subregion an area of interest for further examination.

      (3) With respect to the medial NAc shell specifically, extensive literature had already shown it to be a ‘hedonic hotspot’ (Morales and Berridge, 2020; Yuan et al., 2021) whereas the ventral portion is more mixed with respect to valence (Yuan et al., 2021). We had previously shown that intraoral infusions of primary taste stimuli of opposing valence (i.e., sucrose and quinine) evoke differential responses in dopamine release within the NAc dorsomedial shell (Roitman et al., 2008). We more recently replicated differential dopamine responses from dopamine cell bodies in the lateral VTA (Hsu et al., 2020) and thus endeavored to examine the possibility that dopamine responses in the lateral VTA to the same stimulus change as its valence changes. Given these considerations, measuring dopamine release in the lateral shell was a logical choice. The field would greatly benefit from continued future work surveying the entirety of the VTA DA projection terminus.

      We have included these points of justification in the Introduction and Discussion sections.

      (3) I found the data to be very thoughtfully analysed. But in places I was somewhat unsure:

      (a) Please indicate clearly in the text when photometry data show averages across trials versus when they show averages across animals.

      We have now explicitly indicated in the figure legends of Figures 1, 3, 5, 7, and 8:

      (1) In heat maps, each row represents the averaged (across rats) response on that trial.

      (2) Traces below heat maps represent the response to infusion averaged first across trials for each rat and then across all rats.

      (3) Insets represent the average z-score across the infusion period averaged first across all trials for each rat and then across all rats.
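      The three averaging schemes listed above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the analysis code used for the manuscript: the function names, the dictionary layout (`data[rat][trial]` holding a list of z-scored samples for one infusion), and the toy values are all hypothetical.

```python
from statistics import mean

def mean_trace(traces):
    """Element-wise mean of equal-length traces (lists of z-scores)."""
    return [mean(samples) for samples in zip(*traces)]

def heatmap_rows(data):
    """One row per trial: the response on that trial averaged across rats."""
    n_trials = min(len(trials) for trials in data.values())
    return [mean_trace([trials[t] for trials in data.values()])
            for t in range(n_trials)]

def group_trace(data):
    """Average across trials within each rat first, then across rats."""
    per_rat = [mean_trace(trials) for trials in data.values()]
    return mean_trace(per_rat)

def group_inset(data):
    """Mean z-score over the infusion period: per trial, per rat, then across rats."""
    per_rat = [mean(mean(trial) for trial in trials) for trials in data.values()]
    return mean(per_rat)

# Hypothetical toy data: 2 rats, 2 trials each, 3 samples per trial.
toy = {
    "rat1": [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]],
    "rat2": [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0]],
}
```

      Averaging within each rat before averaging across rats gives every animal equal weight, so the group summaries are not dominated by a rat that happens to contribute more trials.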

      (b) I did struggle with the correlation analyses, for two reasons.

      (i) First, the key finding here is that the dopamine response to intraoral sucrose is suppressed by taste aversion. So, this will significantly restrict the range of dopamine transients, making interpretation of the correlations difficult.

      The overall hypothesis is that the dopamine response would correlate with the valence of a taste stimulus – even and especially when the stimulus remained constant but its valence changed. We inferred valence from the behavioral reactivity to the stimulus – reasoning that an appetitive taste will evoke minimal movement of the nose and paws (presumably because the animals are primarily engaging in small mouth movements associated with ingestion as shown by the seminal work of Grill and Norgren (1978) and the many studies published by the K.C. Berridge group) whereas an aversive taste will evoke significantly more movement as the rats engage in rejection responses (e.g. forelimb flails, chin rubs, etc.). When we conducted our regression analyses we endeavored to be as transparent as possible and labeled each symbol based on group (Unpaired vs Paired) and day (Conditioning vs Test). Both behavioral reactivity and dopamine responses change – but only for the Paired rats across days. In this sense, we believe the interpretation is clear. However, the Reviewer raises an important criticism that there would essentially be a floor effect with dopamine responses. We believe this is mitigated by data acquired across extinction and especially in Figure 9B. Here, the observations that dopamine responses fall to near zero but return to pre-conditioning levels in the Paired group with strong correlation between dopamine and behavioral reactivity throughout would hopefully partially allay the Reviewer’s concerns. See Part ii below for further support.

      (ii) Second, the authors report correlations by combining data across groups/conditions. I understand why the authors have done this, but it does risk obscuring differences between the groups. So, my question is: what happens to this trend when the correlations are computed separately for each group? I suspect other readers will share the same question. I think reporting these separate correlations would be very helpful for the field - regardless of the outcome.

      To address this concern, we performed separate regression analyses for Paired and Unpaired rats and provide the table below to detail results where data were combined across groups or separated. As expected, all analyses in Paired rats indicated a significant inverse relationship between dopamine and behavioral reactivity. After all, it is only in this group that behavioral reactivity to the taste stimulus changes as a function of conditioning. Perhaps even more striking, in most experiments we still observed a significant inverse relationship between dopamine and behavioral reactivity when the regression analysis was restricted to Unpaired rats. We have outlined the separated correlations below (asterisks denote slopes significantly different from 0; * p<0.05; ** p<0.01; *** p<0.005; **** p<0.001):

      Author response table 1.
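
      The combined-versus-separated comparison described above can be sketched as follows. This is a minimal illustration only, not the analysis code used for the table: the function names (fit_line, fit_by_group) and the numbers in the records list are placeholders invented for the example, and significance testing of the slopes is omitted.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares slope, intercept and Pearson r for paired samples."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / (sxx * syy) ** 0.5

def fit_by_group(records):
    """records: list of (group, behavioral_reactivity, dopamine_z) tuples.

    Returns one (slope, intercept, r) fit per group plus a 'Combined' fit.
    """
    grouped = {}
    for group, reactivity, dopamine in records:
        xs, ys = grouped.setdefault(group, ([], []))
        xs.append(reactivity)
        ys.append(dopamine)
    fits = {group: fit_line(xs, ys) for group, (xs, ys) in grouped.items()}
    fits["Combined"] = fit_line([r[1] for r in records], [r[2] for r in records])
    return fits

# Placeholder values for illustration only; these are NOT data from the study.
records = [
    ("Paired", 1.0, 2.0), ("Paired", 2.0, 1.5), ("Paired", 4.0, 0.4), ("Paired", 5.0, 0.1),
    ("Unpaired", 1.0, 2.2), ("Unpaired", 1.5, 2.0), ("Unpaired", 2.0, 1.7), ("Unpaired", 2.5, 1.6),
]
fits = fit_by_group(records)
```

      Reporting the per-group slope alongside the combined slope makes it visible whether a pooled trend is carried by one group alone.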

      (4) Figure 1A is not as helpful as it might be. I do think readers would expect a more precise reporting of GCaMP expression in TH+ and TH- neurons. I also note that many of the nuances in terms of compartmentalisation of dopamine signalling discussed above apply to ventral tegmental area dopamine neurons (e.g. medial v lateral) and this is worth acknowledging when interpreting t

      Others have reported (Choi et al., 2020) and quantified (Hsu et al., 2020) GCaMP6f expression in TH+ neurons. While we didn’t report these quantifications, our observations were very much in line with previous quantifications from our laboratory (Hsu et al. 2020).

      We agree that we should elaborate on VTA subregional differences and have addressed this point above (See responses to Reviewer 1 Weakness #1 and Reviewer 2 Weakness #2).

      Reviewer #3 (Public review):

      Summary:

      This study helps to clarify the mixed literature on dopamine responses to aversive stimuli. While it is well accepted that dopamine in the ventral striatum increases in response to various rewarding and appetitive stimuli, aversive stimuli have been shown to evoke phasic increases or decreases depending on the exact aversive stimuli, behavioral paradigm, and/or dopamine recording method and location examined. Here the authors use a well-designed set of experiments to show differential responses to an appetitive primary reward (sucrose) that later becomes a conditioned aversive stimulus (sucrose previously paired with lithium chloride in a conditioned taste aversion paradigm). The results are interesting and add valuable data to the question of how the mesolimbic dopamine system encodes aversive stimuli; however, the conclusions are strongly stated given that the current data do not necessarily align with prior conflicting data in terms of recording location, and it is not clear exactly how to interpret the generally biphasic dopamine response to the CTA-sucrose, which also evolves over exposures within a single session.

      Strengths:

      • The authors nicely demonstrate that their two aversive stimuli examined, quinine and sucrose following CTA, evoked aversive facial expressions and paw movements that differed from those following rewarding sucrose to support that the stimuli experienced by the rats differ in valence.

      • Examined dopamine responses to the exact same sensory stimuli conditioned to have opposing valences, avoiding standard confounds of appetitive and aversive stimuli being sensed by different sensory modalities (i.e., sweet taste vs. electric shock)

      • The authors examined multiple measurements of dopamine activity - cell body calcium (GCaMP6f) in midbrain and release in NAc (Grab-DA2h), which is useful as the prior mixed literature on aversive dopamine responses comes from a variety of recording methods.

      • Correlations between sucrose preference and dopamine signals demonstrate behavioral relevance of the differential dopamine signals.

      • The delayed testing experiment in Figure 7 nicely controls for the effect of time to demonstrate that the "rewarding" dopamine response to sucrose only recovers after multiple extinction sucrose exposures to extinguish the CTA.

      Weaknesses for consideration:

      (1) Regional differences in dopamine signaling to aversive stimuli are mentioned in the introduction and discussion. For instance, the idea that dopamine encodes salience is strongly argued against in the discussion, but the paper cited as arguing for that (Kutlu et al. 2021) is recording from the medial core in mice. Given other papers cited in the text about the regional differences in dopamine signaling in the NAc and from different populations of dopamine neurons in midbrain, it's important to mention this distinction with respect to salience signaling. Relatedly, the text says that the lateral NAc shell was targeted for accumbens recordings, but the histology figure looks like the majority of fibers were in the anterior lateral core of NAc. For the current paper to be a convincing last word on the issue, it would be extremely helpful to have similar recordings done in other parts of the NAc to do a more thorough comparison against other studies.

      As the Reviewer notes, NAc dopamine recordings were aimed at the lateral NAc shell. It is possible that some dopamine neurons lying within the anterior lateral core were recorded. Fiber photometry and the size of the fiber optics cannot definitively identify the precise location and number of dopamine neurons from which we recorded. Still, recording sites did not systematically differ between groups. Further, the within-subjects design helps to mitigate any potential biases for one subregion over another. The results presented in the manuscript strongly support a valence code. It is difficult to be the ‘last word’ on this topic and we suspect debate will continue. We used taste stimuli for appetitive and aversive stimuli – whereas many in the field will continue to use other noxious stimuli (e.g. foot shock) that likely recruit different circuits en route to the VTA. And there may very well be a different regional profile for dopamine signaling with different noxious stimuli. Moreover, we used intraoral infusion to avoid confounds of stimulus avoidance and competing motivations (e.g. food or fluid deprivation). We believe that this is one of the most important and unique features of our report. Recent work supports a role for phasic increases in dopamine in avoidance of noxious stimuli (Jung et al., 2024) and it will be critical for the field to reflect on the differences between avoidance and aversion. Moreover, in ongoing studies we aspire to fully survey dopamine signaling in conditioned taste aversion across the medial-lateral and dorsal-ventral axes of the VTA and NAc.

      (2) Dopamine release in the NAc never dips below baseline for the conditioned sucrose. Is it possible to really consider this as a signal for valence per se, as opposed to it being a weaker response relative to the original sucrose response?

      Indeed, NAc dopamine release does not dip below baseline in response to either intraoral quinine or aversive sucrose; rather, dopamine binding does not change from pre-infusion baseline levels. It should be noted that VTA dopamine cell body activity does indeed dip below baseline in response to aversive sucrose. Moreover, using fast-scan cyclic voltammetry, we showed that dopamine release dips below baseline in the NAc dorsomedial shell in response to intraoral quinine (Roitman et al., 2008). The differences across recording sites may reflect regional differences but they may also reflect differences in recording approaches. GrabDA2h, used here, has relatively slow kinetics that may obscure dips below baseline (see response to Weakness #8 below).

      (3) Related to this, the main measure of the dopamine signal here, "mean z-score," obscures the temporal dynamics of the aversive dopamine response across a trial. This measure is used to claim that sucrose after CTA is "suppressing" dopamine neuron activity and release, which is true relative to the positive valence sucrose response. However, both GRAB-DA and cell-body GCaMP measurements show clear increases after onset of sucrose infusion before dipping back to baseline or slightly below in the average of all example experiments displayed. One could point to these data to argue either that aversive stimuli cause phasic increases in dopamine (due to the initial increase) or decreases (due to the delayed dip below baseline) depending on the measurement window. Some discussion of the dynamics of the response and how it relates to the prior literature would be useful.

      We have used mean z-score to do much of our quantitative analyses but the Reviewer raises the intriguing possibility that we are masking an initial increase in dopamine release and VTA DA activity evoked by aversive taste by doing so. We included the heat maps in the manuscript to be as transparent as possible about the time course of dopamine responses – both within a trial and across trials. The Reviewer’s point prompted us to reflect further on the heat maps and recognize that trials early in the session often showed a brief increase in dopamine for aversive sucrose but this response dissipated (NAc dopamine release) or flipped (VTA DA cell body activity) over trials. We now quantitatively characterize this feature by looking at the timecourse of dopamine responses in each third of the trials (1-10, 11-20, 21-30; see Author response images 1, 2 and 3). As we infer the valence of the stimulus from nose and paw movements (behavioral reactivity), it is especially striking that we observed a similar timecourse for changes in behavior. Collectively, the data may reflect an updating process that is relatively slow and requires experience of the stimulus in a new (aversive) state – that is, a model-free process. While our experiments were not designed to test the updating of dopamine responses and discern their participation in model-based versus model-free learning processes – another debate in the dopamine field (Cone et al., 2016; Deserno et al., 2021) – the data are consistent with a model-free process. This is further supported in the experiment involving multiple conditioning sessions, where dopamine ‘dips’ are observed in trials 1-10 on Conditioning Day 3 and Extinction Day 1 when the new value of sucrose has been established. Finally, the relatively slow updating of the value of sucrose is reflected in older literature using a continuous intraoral infusion. Using this approach, rats began rejecting the saccharin infusion only after ~2 min rather than immediately (Schafe et al., 1998; Schafe and Bernstein, 1996; Wilkins and Bernstein, 2006).

      Author response image 1.

      Author response image 2.

      Author response image 3.
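
      The block-wise summary described above (mean z-score in trials 1-10, 11-20 and 21-30) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the analysis pipeline itself: each trial is assumed to be z-scored against its own pre-infusion baseline samples, and the names zscore_trial, block_mean_z and n_baseline are hypothetical.

```python
from statistics import mean, pstdev

def zscore_trial(trace, n_baseline):
    """Z-score one trial against its own pre-infusion baseline samples.

    Assumes the baseline samples are not all identical (non-zero SD).
    """
    baseline = trace[:n_baseline]
    mu, sd = mean(baseline), pstdev(baseline)
    return [(value - mu) / sd for value in trace]

def block_mean_z(trials, n_baseline, block_size=10):
    """Mean z-score over the infusion window, averaged within consecutive trial blocks."""
    per_trial = [mean(zscore_trial(trace, n_baseline)[n_baseline:]) for trace in trials]
    return [mean(per_trial[i:i + block_size]) for i in range(0, len(per_trial), block_size)]

# Toy traces (4 baseline samples followed by 2 infusion samples); not real recordings.
early = [[0.0, 1.0, 0.0, 1.0, 1.5, 1.5]] * 10    # brief increase above baseline
middle = [[0.0, 1.0, 0.0, 1.0, 0.5, 0.5]] * 10   # no change from baseline
late = [[0.0, 1.0, 0.0, 1.0, -0.5, -0.5]] * 10   # dip below baseline
blocks = block_mean_z(early + middle + late, n_baseline=4)
```

      Splitting a 30-trial session this way exposes a response that changes sign across the session, which a single session-wide mean z-score would average toward zero.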

      (4) Would this delayed below-baseline dip be visible with a shorter infusion time?

      While our experiments did not explore this parameter, it would be interesting to parametrically vary infusion duration times and examine differences in dopamine responses. However, we believe the most parsimonious explanation is that the ‘dip’ in VTA cell body activity develops as a function of the slow updating of the value of sucrose reflective of a model-free process. We recognize that this is mere speculation.

      (5) Does the max of the increase or the dip of the decrease better correlate with the behavioral measures of aversion (orofacial, paw movements) or sucrose preference than "mean z-score" measure used here?

      It seems plausible that finding the most extreme value from baseline could better correlate to behavioral measures. Time courses to max increase and max decrease are different. Moreover, with appetitive sucrose, there are often multiple transients that occur throughout a single intraoral infusion. Coupled with a noisy time course for individual components of behavioral reactivity, we determined that averaging data across the whole infusion period (i.e. mean z-score) was the most objective way we could analyze the dopamine and behavioral responses to taste stimuli.

      (6) The authors argue strongly in the discussion against the idea that dopamine is encoding "salience." Could this initial peak (also seen in the first few trials of quinine delivery, fig 1c color plot) be a "salience" response?

      Our response above to the potential for ‘mixed’ dopamine responses to aversive sucrose led to additional analyses that support a slow updating of both behavior and dopamine to the new, aversive value of sucrose. Quinine is innately aversive and thus the Reviewer rightly points out that even here we observe an increase in dopamine release evoked by quinine on the first few trials (as observed in the heat map). We’d like to note, though, that the order of stimulus exposure was counterbalanced across rats. In those rats first receiving a sucrose session, quinine initially caused a modest increase in dopamine release during the first 10 trials (which is more pronounced in the first 2 trials). In the subsequent 2 blocks of 10 trials, no such increase was observed. Interestingly, in rats for which quinine was their first stimulus, we did not see an increase in dopamine release on the first few trials (see Author response image 4). We speculate that the initial sucrose session required the value of intraoral infusions to be updated when quinine was delivered to these rats and that, once more, the updating process may be slow and akin to a model-free process. This analysis, at present, is underpowered but will direct future attention in follow-up work.

      Author response image 4.

      (7) Related to this, the color plots showing individual trials show a reduction in the increases to positive valence sucrose across conditioning day trials and a flip from infusion-onset increase to delayed increases across test day trials. This evolution across days makes it appear that the last few conditioning day trials would be impossible to discriminate from the first few test day trials in the CTA-paired. Presumably, from strength of CTA as a paradigm, the sucrose is already aversive to the animals at the first trial of test day. Why do the authors think the response evolves across this session?

      As the Reviewer noted, Points 3-7 are related. We have speculated that the evolving dopamine response in Paired rats across test day trials reflects a model-free process. Importantly, as in the manuscript, our additional analyses once again show a tight relationship between behavioral reactivity and the dopamine response across the test session trials. It is important to note, though, that these experiments were not designed to test if responses reflect model-free or model-based processes.

      (8) Given that most of the work is using a conditioned aversive stimulus, the comparison to a primary aversive tastant quinine is useful. However, the authors saw basically no dopamine response to a primary aversive tastant quinine (measured only with GRAB-DA) and saw less noticeable decreases following CTA for NAc recordings with GRAB-DA2h than with cell body GCaMP. Given that they are using the high-affinity version of the GRAB sensor, this calls into question whether this is a true difference in release vs. soma activity or issue of high affinity release sensor making decreases in dopamine levels more difficult to observe.

      We share the same speculation as the Reviewer. Using fast-scan cyclic voltammetry, albeit measuring dopamine concentration in the dorsomedial shell, we observed a clear decrease from baseline with intraoral infusions of quinine (Roitman et al., 2008). For the fiber photometry used here, we, like the Reviewer, note that GRAB_DA2h is a high-affinity (i.e., EC50: 7 nM) dopamine sensor with relatively long off-kinetics (i.e., t1/2 decay time: 7300 ms) (Labouesse et al., 2020). It may therefore be much more difficult to observe decreases (below baseline) using this sensor. The publication of new dopamine sensors - with lower affinity, faster kinetics, and greater dynamic range (Zhuo et al., 2024) – introduces opportunities for comparison and the greater potential for capturing decreases below baseline. Due to the poorer kinetics associated with GRAB_DA2h, we would not assert that direct comparisons between the GCaMP- and GRAB-based signals observed here represent true differences between somatic and terminal activity.
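
      The effect of slow off-kinetics on a brief decrease can be illustrated with a toy calculation. The sketch below treats the sensor as a simple first-order filter whose time constant is derived from the off half-time; this is a deliberate simplification (real sensors have distinct on- and off-rates and a saturating binding curve), and the 1 s pause, the 0.1 s comparison half-time and the function name sensor_response are assumptions made for the example. Only the 7.3 s half-time comes from the text above.

```python
import math

def sensor_response(dopamine, dt, t_half_off):
    """First-order low-pass of a dopamine trace; tau is derived from the off half-time."""
    tau = t_half_off / math.log(2)
    alpha = 1.0 - math.exp(-dt / tau)
    signal, state = [], dopamine[0]
    for level in dopamine:
        state += alpha * (level - state)  # relax toward the current dopamine level
        signal.append(state)
    return signal

dt = 0.01  # seconds per sample
# Hypothetical input: baseline of 1.0 with a complete 1 s pause (samples 200-299).
dopamine = [0.0 if 200 <= i < 300 else 1.0 for i in range(1000)]

slow = sensor_response(dopamine, dt, t_half_off=7.3)  # half-time cited for GRAB_DA2h
fast = sensor_response(dopamine, dt, t_half_off=0.1)  # hypothetical faster sensor

dip_slow = 1.0 - min(slow)  # fraction of the true dip that is reported
dip_fast = 1.0 - min(fast)
```

      Under these assumptions the slow sensor reports only about 9% of a complete 1 s pause, whereas the hypothetical fast sensor reports nearly all of it, which is the sense in which long off-kinetics can hide dips below baseline.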

      References

      Choi JY, Jang HJ, Ornelas S, Fleming WT, Fürth D, Au J, Bandi A, Engel EA, Witten IB. 2020. A Comparison of Dopaminergic and Cholinergic Populations Reveals Unique Contributions of VTA Dopamine Neurons to Short-Term Memory. Cell Rep 33. doi:10.1016/j.celrep.2020.108492

      Cone JJ, Fortin SM, McHenry JA, Stuber GD, McCutcheon JE, Roitman MF. 2016. Physiological state gates acquisition and expression of mesolimbic reward prediction signals. Proc Natl Acad Sci U S A 113. doi:10.1073/pnas.1519643113

      de Jong JW, Afjei SA, Pollak Dorocic I, Peck JR, Liu C, Kim CK, Tian L, Deisseroth K, Lammel S. 2019. A Neural Circuit Mechanism for Encoding Aversive Stimuli in the Mesolimbic Dopamine System. Neuron 101. doi:10.1016/j.neuron.2018.11.005

      Deserno L, Moran R, Michely J, Lee Y, Dayan P, Dolan RJ. 2021. Dopamine enhances model-free credit assignment through boosting of retrospective model-based inference. Elife 10. doi:10.7554/eLife.67778

      Hsu TM, Bazzino P, Hurh SJ, Konanur VR, Roitman JD, Roitman MF. 2020. Thirst recruits phasic dopamine signaling through subfornical organ neurons. Proc Natl Acad Sci U S A 117:30744–30754. doi:10.1073/pnas.2009233117

      Jung K, Krüssel S, Yoo S, An M, Burke B, Schappaugh N, Choi Y, Gu Z, Blackshaw S, Costa RM, Kwon HB. 2024. Dopamine-mediated formation of a memory module in the nucleus accumbens for goal-directed navigation. Nat Neurosci. doi:10.1038/s41593-024-01770-9

      Labouesse MA, Cola RB, Patriarchi T. 2020. GPCR-based dopamine sensors—A detailed guide to inform sensor choice for in vivo imaging. Int J Mol Sci. doi:10.3390/ijms21218048

      Lammel S, Hetzel A, Häckel O, Jones I, Liss B, Roeper J. 2008. Unique Properties of Mesoprefrontal Neurons within a Dual Mesocorticolimbic Dopamine System. Neuron 57. doi:10.1016/j.neuron.2008.01.022

      McCutcheon JE, Ebner SR, Loriaux AL, Roitman MF, Tobler PN. 2012. Encoding of aversion by dopamine and the nucleus accumbens. Front Neurosci 6. doi:10.3389/fnins.2012.00137

      Morales I, Berridge KC. 2020. ‘Liking’ and ‘wanting’ in eating and food reward: Brain mechanisms and clinical implications. Physiol Behav. doi:10.1016/j.physbeh.2020.113152

      Roitman MF, Wheeler RA, Wightman RM, Carelli RM. 2008. Real-time chemical responses in the nucleus accumbens differentiate rewarding and aversive stimuli. Nat Neurosci 11:1376–1377. doi:10.1038/nn.2219

      Schafe GE, Bernstein IL. 1996. Forebrain contribution to the induction of a brainstem correlate of conditioned taste aversion: I. The amygdala. Brain Res 741. doi:10.1016/S0006-8993(96)00906-7

      Schafe GE, Thiele TE, Bernstein IL. 1998. Conditioning method dramatically alters the role of amygdala in taste aversion learning. Learning and Memory 5. doi:10.1101/lm.5.6.481

      Wilkins EE, Bernstein IL. 2006. Conditioning method determines patterns of c-fos expression following novel taste-illness pairing. Behavioural Brain Research 169. doi:10.1016/j.bbr.2005.12.006

      Yuan L, Dou YN, Sun YG. 2021. Topography of reward and aversion encoding in the mesolimbic dopaminergic system. Journal of Neuroscience 39. doi:10.1523/JNEUROSCI.0271-19.2019

      Zhuo Y, Luo B, Yi X, Dong H, Miao X, Wan J, Williams JT, Campbell MG, Cai R, Qian T, Li F, Weber SJ, Wang L, Li B, Wei Y, Li G, Wang H, Zheng Y, Zhao Y, Wolf ME, Zhu Y, Watabe-Uchida M, Li Y. 2024. Improved green and red GRAB sensors for monitoring dopaminergic activity in vivo. Nat Methods 21. doi:10.1038/s41592-023-02100-w

    1. Author Response:

      We greatly appreciate the invaluable and constructive comments from the Editors and Reviewers, and we thank them for their time and patience. We are pleased that our manuscript has been assessed as valuable and solid.

      One of the most critical concerns was the possible involvement of Ca2+ channel inactivation in the strong paired pulse depression (PPD). We have already measured the total (free plus buffered) calcium increments induced by each of the first four APs in a 40 Hz train at axonal boutons of prelimbic layer 2/3 pyramidal cells. We found that the first four Ca2+ increments did not differ from each other, arguing against a contribution of Ca2+ channel inactivation to PPD. Please see our reply to the 2nd issue in the Weakness section of Reviewer #3.

      The second critical issue was on the definition of ‘vesicular probability’. Previously, vesicular probability (pv) has been used with reference to the releasable vesicle pool which includes not only tightly docked vesicles but also reluctant vesicles. On the other hand, the meaning of pv in the present study was release probability of tightly docked vesicles. We clarified this point in our replies to the 1st issues in the Weakness sections of Reviewer #2 and Reviewer #3.

      Our point-by-point replies to the other Reviewers’ comments are given below.

      Reviewer #2 (Public review):

      Summary:

      Shin et al. aim, in a very extensive piece of work, to identify a mechanism that contributes to dynamic regulation of synaptic output in the rat cortex at the second time scale. This mechanism is related to a new, powerful model that is well suited to test whether the pool of SVs ready for fusion is dynamically scaled to adjust supply and demand. The methods applied are state-of-the-art and address quantitative aspects with high signal to noise. In addition, the authors examine excitatory output onto both glutamatergic and GABAergic neurons, which provides important information on how general the observed signals are in neural networks. The results are compellingly clear and show that pool regulation may be predominantly responsible. Their results suggest that a regulation of release probability, the alternative contender for regulation, is unlikely to be involved in the observed short term plasticity behavior (but see below). Besides providing a clear analysis of the underlying physiology, they test two molecular contenders for the observed mechanism by showing that Synaptotagmin7 function and Ca-dependent phospholipase activity seem critical for the short term plasticity behavior. The authors go on to test the in vivo role of the mechanism by modulating Syt7 function and examining working memory tasks as well as overall changes in network activity using immediate early gene activity. Finally, they model their data, providing strong support for their interpretation of TS pool occupancy regulation.

      Strengths:

      This is a very thorough study, addressing the research question from many different angles and the experimental execution is superb. The impact of the work is high, as it applies recent models of short term plasticity behavior to in vivo circuits further providing insights how synapses provide dynamic control to enable working memory related behavior through nonpermanent changes in synaptic output.

      Weaknesses:

      While this work is carefully examined and the results are presented and discussed in a detailed manner, the reviewer is still not fully convinced that regulation of release provability is not a putative contributor to the observed behavior. No additional work is needed but at the moment I am not convinced that changes in release probability are not in play. One solution may be to extend the discussion of changes in release probability as an alternative.

      Quantal content (m) depends on n * pv, where n = RRP size and pv = vesicular release probability. The value for pv critically depends on the definition of RRP size. Recent studies revealed that docked vesicles have differential priming states: loosely or tightly docked state (LS or TS, respectively). Because the RRP size estimated by hypertonic solution or long presynaptic depolarization is larger than that by back extrapolation of a cumulative EPSC plot (Moulder & Mennerick, 2005; Sakaba, 2006) in glutamatergic synapses, the former RRP (denoted as RRPhyper) may encompass not only AP-evoked fast-releasing vesicles (TS vesicles) but also reluctant vesicles (LS vesicles). Because we measured pv based on AP-evoked EPSCs, such as strong paired pulse depression (PPD) and associated failure rates, pv in the present study denotes the vesicular fusion probability of TS vesicles, not that of LS plus TS vesicles.

      Recent studies suggest that release sites are not fully occupied by TS vesicles in the baseline (Miki et al., 2016; Pulido and Marty, 2018; Malagon et al., 2020; Lin et al., 2022). Instead, the occupancy (pocc) by TS vesicles is subject to dynamic regulation by forward and backward rate constants (denoted by k1 and b1, respectively). The number of TS vesicles (n) can be factored into the number of release sites (N) and pocc, among which N is a fixed parameter but pocc depends on k1/(k1+b1) under the framework of the simple refilling model (see Methods). Because these refilling rate constants are regulated by Ca2+ (Hosoi, et al., 2008), pocc is not a fixed parameter. Therefore, release probability should be re-defined as pocc x pv. In this regard, the increase in release probability is a major player in STF. Our study asserts that STF by 2.3 times can be attributed to an increase in pocc rather than pv, because pv is close to unity (Fig. S8). Moreover, strong PPD was observed not only in the baseline but also early in and in the middle of a train (Fig. 2 and 7) and during the recovery phase (Fig. 3), arguing against a gradual increase in pv of reluctant vesicles.
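
      The factorization above lends itself to a short numerical sketch. The code below is illustrative only: the rate constants passed in are placeholders rather than fitted values, the function names are hypothetical, and the bound in max_baseline_occupancy rests on two assumptions, namely that pv does not change during facilitation and that site occupancy cannot exceed 1.

```python
def occupancy(k1, b1):
    """Steady-state occupancy of release sites by TS vesicles: pocc = k1 / (k1 + b1)."""
    return k1 / (k1 + b1)

def release_probability(k1, b1, pv):
    """Release probability per site, re-defined as pocc x pv."""
    return occupancy(k1, b1) * pv

def max_baseline_occupancy(facilitation_ratio):
    """Upper bound on baseline pocc if facilitation arises from occupancy alone.

    With pv fixed and facilitated pocc <= 1, baseline pocc <= 1 / facilitation_ratio.
    """
    return 1.0 / facilitation_ratio

# Placeholder rate constants (per second), chosen only to illustrate the direction of change.
p_rest = release_probability(k1=1.0, b1=3.0, pv=0.9)    # occupancy 0.25
p_train = release_probability(k1=3.0, b1=3.0, pv=0.9)   # occupancy 0.50 after k1 rises
bound = max_baseline_occupancy(2.3)                      # about 0.43 for 2.3-fold STF
```

      Raising k1 doubles release probability here without any change in pv, and a 2.3-fold facilitation carried by occupancy alone would require a baseline occupancy of no more than roughly 0.43.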

      If the Reviewer meant vesicular release or fusion probability (pv) by ‘release provability’, pv (of TS vesicles) is not a major player in STF, because the baseline pv is already higher than 0.8 even under the most parsimonious estimate (Fig. 2). Moreover, considering the very high refilling rate (23/s), the high double failure rate cannot be explained without assuming that pv is close to unity (Fig. S8).

      Conventional models for facilitation assume a post-AP residual Ca2+-dependent step increase in pv of RRP (Dittman et al., 2000) or reluctant vesicles (Turecek et al., 2016). Given that pv of TS vesicles is close to one, an increase in pv of TS vesicles cannot account for facilitation. The possibility of an activity-dependent increase in fusion probability of LS vesicles (denoted as pv,LS) should be considered in two ways depending on whether LS and TS vesicles reside in distinct pools or in the same pool. Notably, strong PPD at short ISI implies that pv,LS is near zero at the resting state. Whereas LS vesicles do not contribute to baseline transmission, short-term facilitation (STF) may be mediated by a cumulative increase in pv,LS of vesicles that reside in a distinct pool. Because the increase in pv,LS during facilitation recruits new release sites (increase in N), the variance of EPSCs should become larger as stimulation frequency increases, resulting in upward deviation from a parabola in the V-M plane, as shown in recent studies (Valera et al., 2012; Kobbersmed et al., 2020). This prediction is not compatible with our results of V-M analysis (Fig. 3), showing that EPSCs during STF fell on the same parabola regardless of stimulation frequencies. Therefore, it is unlikely that an increase in fusion probability of reluctant vesicles residing in a distinct release pool mediates STF in the present study.

      For the latter case, in which LS and TS vesicles occupy the same release sites, it is hard to distinguish a step increase in fusion probability of LS vesicles from a conversion of LS vesicles to TS. Nevertheless, our results do not support the possibility of a gradual increase in pv,LS that occurs in parallel with STF. Strong PPD, indicative of high pv, was consistently found not only in the baseline (Fig. 2 and Fig. S6) but also during the post-tetanic augmentation phase (Fig. 3D) and even during the early development of facilitation (Fig. 2D-E and Fig. 7), arguing against a gradual increase in pv,LS. One may argue that STF may be mediated by a drastic step increase of pv,LS from zero to one, but this is not distinguishable from conversion of LS to TS vesicles.

      To address the reviewer’s concern, we will incorporate these perspectives into the discussion and further clarify the reasoning behind our conclusions.

      <References>

      Moulder KL, Mennerick S (2005) Reluctant vesicles contribute to the total readily releasable pool in glutamatergic hippocampal neurons. J Neurosci 25:3842–3850.

      Sakaba, T (2006) Roles of the fast-releasing and the slowly releasing vesicles in synaptic transmission at the calyx of Held. J Neurosci 26(22): 5863-5871.

      Fig 3: I am confused about the interpretation of the Variance-Mean analysis outcome. Since the data points follow the curve during induction of short term plasticity, aren't these suggesting that release probability and not the pool size increases? Relatedly, measuring the absolute release probability and failure rate using the optogenetic stimulation technique is not trivial, as the experimental paradigm biases the experiment to a given output strength, and therefore a change in release probability cannot be excluded.

      Under the recent definition, release probability can be factored into pv and pocc, which are the fusion probability of TS vesicles and the occupancy of release sites by TS vesicles, respectively. In this regard, our interpretation of the Variance-Mean results is consistent with the conventional one: different data points along a parabola represent a change in release probability (= pocc x pv). Our novel finding is that the increase in release probability should be attributed to an increase in pocc, not to that in pv.
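
      The Variance-Mean argument can be made concrete with a simple binomial sketch. This is a textbook simplification rather than the fitting procedure used in the study: it assumes N independent release sites, a uniform quantal size q with no quantal variance, and illustrative parameter values; the function names are hypothetical.

```python
def mean_var(n_sites, p_release, q):
    """Binomial mean and variance of the EPSC, with p_release = pocc * pv."""
    m = n_sites * p_release * q
    v = n_sites * p_release * (1.0 - p_release) * q ** 2
    return m, v

def parabola(m, n_sites, q):
    """Variance predicted for a given mean when N and q are fixed: V = q*M - M^2/N."""
    return q * m - m ** 2 / n_sites

N, q = 10, 1.0  # illustrative values only

# Changing pocc (and hence pocc * pv) with N fixed moves points along one parabola.
on_curve = []
for pocc in (0.3, 0.5, 0.7):
    m, v = mean_var(N, pocc * 0.95, q)
    on_curve.append(abs(v - parabola(m, N, q)))

# Recruiting extra sites (N = 15) lifts the point above the original N = 10 parabola.
m_more, v_more = mean_var(15, 0.5, q)
excess = v_more - parabola(m_more, N, q)
```

      In this sketch a change in pocc x pv leaves the points on the original parabola, whereas an increase in N produces the upward deviation in variance, which is the distinction drawn in the reply above.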

      Fig4B interprets the phorbol ester stimulation to be the result of pool overfilling, however, phorbol ester stimulation has also been shown to increase release probability without changing the size of the readily releasable pool. The high frequency of stimulation may occlude an increased paired pulse depression in presence of OAG, which others have interpreted in mammalian synapses as an increase in release probability.

      In our experience at calyx of Held synapses, OAG, a DAG analogue, increased the fast releasing vesicle pool (FRP) size (Lee JS et al., 2013), consistent with our interpretation (pool overfilling). Once the release sites are overfilled in the presence of OAG, it is expected that the maximal STF (ratio of facilitated to baseline EPSCs) becomes lower as long as the number of release sites (N) is limited. As aforementioned, the baseline pv is already close to one, and thus it cannot be further increased by OAG. Instead, the baseline pocc seems to be increased by OAG.

      <Reference>

      Lee JS, et al., Superpriming of synaptic vesicles after their recruitment to the readily releasable pool. Proc Natl Acad Sci U S A, 2013. 110(37): 15079-84.

      The literature on Syt7 function is still quite controversial. There is an observation in the literature that loss of Syt7 function in the fly synapse leads to an increase of release probability. Thus the observed changes in short term plasticity characteristics in the Syt7 KD experiments may contain a release probability component. Can the authors really exclude this possibility? Figure 5 shows for the Syt7 KD group a very prominent depression of the EPSC/IPSC with the second stimulus, particularly for the short interpulse intervals, usually a strong sign of increased release probability, as lack of pool refilling can unlikely explain the strong drop in synaptic output.

      The reviewer raises an interesting point regarding the potential link between Syt7 KD and increased initial pv, particularly in light of observations in Drosophila synapses (Guan et al., 2020; Fujii et al., 2021), in which Syt7 mutants exhibited elevated initial pv. However, it is important to note that these findings markedly differ from those in mammalian systems, where the role of Syt7 in regulating initial pv has been extensively studied. In rodents, consistent evidence indicates that Syt7 does not significantly affect initial pv, as demonstrated in several studies (Jackman et al., 2016; Chen et al., 2017; Turecek and Regehr, 2018). Furthermore, in our study of excitatory synapses in the mPFC layer 2/3, we observed an initial pv already near its maximal level, approaching a value of 1. Consequently, it is unlikely that the loss of Syt7 could further elevate the initial pv. Instead, such effects are more plausibly explained by alternative mechanisms, such as alterations in vesicle replenishment dynamics, rather than a direct influence on pv.

      <References>

      Chen, C., et al., Triple Function of Synaptotagmin 7 Ensures Efficiency of High-Frequency Transmission at Central GABAergic Synapses. Cell Rep, 2017. 21(8): 2082-2089.

      Fujii, T., et al., Synaptotagmin 7 switches short-term synaptic plasticity from depression to facilitation by suppressing synaptic transmission. Scientific reports, 2021. 11(1): 4059.

      Guan, Z., et al., Drosophila Synaptotagmin 7 negatively regulates synaptic vesicle release and replenishment in a dosage-dependent manner. Elife, 2020. 9: e55443.

      Jackman, S.L., et al., The calcium sensor synaptotagmin 7 is required for synaptic facilitation. Nature, 2016. 529(7584): 88-91.

      Turecek, J. and W.G. Regehr, Synaptotagmin 7 mediates both facilitation and asynchronous release at granule cell synapses. Journal of Neuroscience, 2018. 38(13): 3240-3251.

      Reviewer #3 (Public review):

      Summary:

      The report by Shin, Lee, Kim, and Lee entitled "Progressive overfilling of readily releasable pool underlies short-term facilitation at recurrent excitatory synapses in layer 2/3 of the rat prefrontal cortex" describes electrophysiological experiments of short-term synaptic plasticity during repetitive presynaptic stimulation at synapses between layer 2/3 pyramidal neurons and nearby target neurons. Manipulations include pharmacological inhibition of PLC and actin polymerization, activation of DAG receptors, and shRNA knockdown of Syt7. The results are interpreted as support for the hypothesis that synaptic vesicle release sites are vacant most of the time at resting synapses (i.e., p_occ is low) and that facilitation (and augmentation) components of short-term enhancement are caused by an increase in occupancy, presumably because of acceleration of the transition from not-occupied to occupied. The report additionally describes behavioural experiments where trace fear conditioning is degraded by knocking down syt7 in the same synapses.

      Strengths:

      The strength of the study is in the new information about short-term plasticity at local synapses in layer 2/3, and the major disruption of a memory task after eliminating short-term enhancement at only 15% of excitatory synapses in a single layer of a small brain region. The local synapses in layer 2/3 were previously difficult to study, but the authors have overcome a number of challenges by combining channel rhodopsins with in vitro electroporation, which is an impressive technical advance.

      Weaknesses:

      The question of whether or not short-term enhancement causes an increase in p_occ (i.e., "readily releasable pool overfilling") is important because it cuts to the heart of the ongoing debate about how to model short term synaptic plasticity in general. However, my opinion is that, in their current form, the results do not constitute strong support for an increase in p_occ, even though this is presented as the main conclusion. Instead, there are at least two alternative explanations for the results that both seem more likely. Neither alternative is acknowledged in the present version of the report.

      The evidence presented to support overfilling is essentially two-fold. The first is strong paired pulse depression of synaptic strength when the interval between action potentials is 20 or 25 ms, but not when the interval is 50 ms. Subsequent stimuli at frequencies between 5 and 40 Hz then drive enhancement. The second is the observation that a slow component of recovery from depression after trains of action potentials is unveiled after eliminating enhancement by knocking down syt7. Of the two, the second is predicted by essentially all models where enhancement mechanisms operate independently of release site depletion - i.e., transient increases in p_occ, p_v, or even N - so isn't the sort of support that would distinguish the hypothesis from alternatives (Garcia-Perez and Wesseling, 2008, https://doi.org/10.1152/jn.01348.2007).

      The apparent discrepancy in the interpretation of post-tetanic augmentation between the present and previous papers [Stevens and Wesseling (1999), Garcia-Perez and Wesseling (2008)] is an important issue that should be clarified. We noted that different meanings of ‘vesicular release probability’ in these papers are responsible for the discrepancy. We will add an explanation to the Discussion of how the meaning of ‘vesicular release probability’ differs between the present study and these previous studies. In summary, pv in the present study denotes the vesicular release probability of TS vesicles, whereas previous studies used it for the vesicular release probability of vesicles in the RRP, which includes LS and TS vesicles. Accordingly, pocc in the present study is the occupancy of release sites by TS vesicles.

      Not only double failure rate but also other failure rates upon paired pulse stimulation were best fitted at pv close to 1 (Fig. S8 and associated text). Moreover, strong PPD, indicating release of vesicles with high pv, was observed not only at the beginning of a train but also in the middle of a 5 Hz train (Fig. 2D), during the augmentation phase after a 40 Hz train (Fig 3D), and in the recovery phase after three pulse bursts (Fig. 7). Given that pv is close to 1 throughout the EPSC trains and that N does not increase during a train (Fig. 3), synaptic facilitation can be attained only by the increase in pocc (occupancy of release sites by TS vesicles). In addition, it should be noted that Fig. 7 demonstrates strong PPD during the recovery phase after depletion of TS vesicles by three pulse bursts, indicating that recovered vesicles after depletion display high pv too. Knock-down of Syt7 slowed the recovery of TS vesicles after depletion of TS vesicles, highlighting that Syt7 accelerates the recovery of TS vesicles following their depletion.

      As addressed in our reply to the first issue raised by Reviewer #2 and the third issue raised by Reviewer #3, our results do not support either the recruitment of new release sites (increase in N) with low pv or a gradual increase in the pv of reluctant vesicles during short-term facilitation.

      <Following statement will be added to _Discussion_ in the revised manuscript>

      Previous studies suggested that an increase in pv is responsible for post-tetanic augmentation (Stevens and Wesseling, 1999; Garcia-Perez and Wesseling, 2008) by observing invariance of the RRP size after tetanic stimulation. In these studies, the RRP size was estimated using hypertonic sucrose solution or as the sum of EPSCs evoked by a 20 Hz/60-pulse train (denoted as ‘RRPhyper’). Because reluctant vesicles (called LS vesicles) can be quickly converted to TS vesicles (16/s) and are released during a train (Lee et al., 2012), it is likely that the RRP size measured by these methods encompasses both LS and TS vesicles. In contrast, we assert high pv based on the observation of strong PPD and failure rates upon paired stimulations at ISI of 20 ms (Fig. 2 and Fig. S8). Given that single AP-induced vesicular release occurs from TS vesicles but not from LS vesicles, pv in the present study indicates the fusion probability of TS vesicles. For the same reason, pocc denotes the occupancy of release sites by TS vesicles. Note that our study does not provide a direct clue as to whether release sites are occupied by LS vesicles that are not tapped by a single AP, although an increase in the LS vesicle number may accelerate the recovery of TS vesicles. As suggested in Neher (2024), even if the number of LS plus TS vesicles is kept constant, an increase in pocc (occupancy by TS vesicles) would be interpreted as an increase in ‘vesicular release probability’ as in the previous studies (Stevens and Wesseling, 1999; Garcia-Perez and Wesseling, 2008) as long as it was measured based on RRPhyper.

      Regarding the paired pulse depression: The authors ascribe this to depletion of a homogeneous population of release sites, all with similar p_v. However, the details fit better with the alternative hypothesis that the depression is instead caused by quickly reversing inactivation of Ca2+ channels near release sites, as proposed by Dobrunz and Stevens to explain a similar phenomenon at a different type of synapse (1997, PNAS, https://doi.org/10.1073/pnas.94.26.14843). The details that fit better with Ca2+ channel inactivation include the combination of the sigmoid time course of the recovery from depression (plotted backwards in Fig1G,I) and observations that EGTA (Fig2B) increases the paired-pulse depression seen after 25 ms intervals. That is, the authors ascribe the sigmoid recovery to a delay in the activation of the facilitation mechanism, but the increased paired pulse depression after loading EGTA indicates, instead, that the facilitation mechanism has already caused p_r to double within the first 25 ms (relative to the value if the facilitation mechanism was not active). Meanwhile, Ca2+ channel inactivation would be expected to cause a sigmoidal recovery of synaptic strength because of the sigmoidal relationship between Ca2+-influx and exocytosis (Dodge and Rahamimoff, 1967, https://doi.org/10.1113/jphysiol.1967.sp008367).

      The Ca2+-channel inactivation hypothesis could probably be ruled in or out with experiments analogous to the 1997 Dobrunz study, except after lowering extracellular Ca2+ to the point where synaptic transmission failures are frequent. However, a possible complication might be a large increase in facilitation in low Ca2+ (Fig2B of Stevens and Wesseling, 1999, https://doi.org/10.1016/s0896-6273(00)80685-6).

      We appreciate the reviewer's thoughtful comment regarding the potential role of Ca2+ channel inactivation in the observed paired-pulse depression (PPD). As noted by the Reviewer, Dobrunz and Stevens (1997) suggested that the high double failure rate at short ISIs in synapses exhibiting PPD can be attributed to Ca2+ channel inactivation. This interpretation seems to be based on the premise that the number of RRP vesicles does not vary from trial to trial. The number of TS vesicles, however, can be dynamically regulated depending on the parameters k1 and b1, as shown in Fig. S8, implying that the high double failure rate at short ISIs cannot be solely attributed to Ca2+ channel inactivation. Nevertheless, we acknowledge the possibility that Ca2+ channel inactivation may contribute to PPD, and therefore, we have further investigated this possibility. Specifically, we measured action potential (AP)-evoked Ca2+ transients at individual axonal boutons of layer 2/3 pyramidal cells in the mPFC using two-dye ratiometry techniques. Our analysis revealed no evidence for Ca2+ channel inactivation during a 40 Hz train of APs. This finding indicates that voltage-gated Ca2+ channel inactivation is unlikely to contribute to the pronounced PPD.

      Author response image 1 below shows how we measured the total Ca2+ increments at axonal boutons. First, we estimated the endogenous Ca2+-binding ratio from analyses of single AP-induced Ca2+ transients at different concentrations of Ca2+ indicator dye (panels A to E). Then, using the Ca2+ buffer properties, we converted free [Ca2+] amplitudes to total calcium increments for the first four AP-evoked Ca2+ transients in a 40 Hz train (panels G-I). We will incorporate these results into the revised version of the reviewed preprint to provide evidence against Ca2+ channel inactivation.

      Author response image 1.

      On the other hand, even if the paired pulse depression is caused by depletion of release sites rather than Ca2+-channel inactivation, there does not seem to be any support for the critical assumption that all of the release sites have similar p_v. And indeed, there seems to be substantial emerging evidence from other studies for multiple types of release sites with 5 to 20-fold differences in p_v at a wide variety of synapse types (Maschi and Klyachko, eLife, 2020, https://doi.org/10.7554/elife.55210; Rodriguez Gotor et al, eLife, 2024, https://doi.org/10.7554/elife.88212 and refs. therein). If so, the paired pulse depression could be caused by depletion of release sites with high p_v, whereas the facilitation could occur at sites with much lower p_v that are still occupied. It might be possible to address this by eliminating assumptions about the distribution of p_v across release sites from the variance-mean analysis, but this seems difficult; simply showing how a few selected distributions wouldn't work - such as in standard multiple probability fluctuation analyses - wouldn't add much.

      We appreciate the reviewer’s insightful comments regarding the potential increase in pfusion of reluctant vesicles. It should be noted, however, that Maschi and Klyachko (2020) showed a distribution of release probability (pr) within a single active zone rather than a heterogeneity in pfusion of individual docked vesicles. Therefore both pocc and pv of TS vesicles would contribute to the pr distribution shown in Maschi and Klyachko (2020). 

      The Reviewer’s concern aligns closely with the first issue raised by Reviewer #2, which we addressed in detail. Briefly, new release sites may not be recruited during facilitation or post-tetanic augmentation, because the variance of EPSCs during and after a train fell on the same parabola (Fig. 3). Secondly, strong PPD was observed not only at baseline but also during the early and late phases of facilitation, indicating that vesicles with very high pv contribute to EPSCs throughout train stimulation (Fig. 2, 3, and 7). These findings argue against the recruitment of new release sites harboring low-pv vesicles and against a gradual increase in the fusion probability of reluctant vesicles.

      To address the reviewers’ concern, we will incorporate the perspectives into Discussion and further clarify the reasoning behind our conclusions.

      In any case, the large increase - often 10-fold or more - in enhancement seen after lowering Ca2+ below 0.25 mM at a broad range of synapses and neuro-muscular junctions noted above is a potent reason to be cautious about the LS/TS model. There is morphological evidence that the transitions from a loose to tight docking state (LS to TS) occur, and even that the timing is accelerated by activity. However, 10-fold enhancement would imply that at least 90 % of vesicles start off in the LS state, and this has not been reported. In addition, my understanding is that the reverse transition (TS to LS) is thought to occur within 10s of ms of the action potential, which is 10-fold too fast to account for the reversal of facilitation seen at the same synapses (Kusick et al, 2020, https://doi.org/10.1038/s41593-020-00716-1).

      As the reviewer suggested, low external Ca2+ concentration can lower release probability (pr). Given that both pv and pocc are regulated by [Ca2+]i, low external [Ca2+] may affect not only pv but also pocc, both of which would contribute to low pr. Under such conditions, it is plausible that the baseline pr becomes much lower than 0.1 due to low pv and pocc (for instance, if pv decreases from 1 to 0.5 and pocc from 0.3 to 0.1, then pr = 0.05), and pr (= pv x pocc) then has room to increase by a factor of ten (to 0.5, for example) through short-term facilitation as cytosolic [Ca2+] accumulates during a train.
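
      The arithmetic of this example can be checked with a short sketch. The numbers are the illustrative values quoted above, not measurements, and the function name `release_probability` is introduced here only for illustration.

```python
def release_probability(p_v: float, p_occ: float) -> float:
    """Release probability per site, pr = pv x pocc."""
    return p_v * p_occ


# Illustrative low-Ca2+ baseline from the text: pv = 0.5, pocc = 0.1.
baseline_pr = release_probability(p_v=0.5, p_occ=0.1)

# Illustrative facilitated value from the text.
facilitated_pr = 0.5

# Fold increase available to short-term facilitation in this example.
fold_increase = facilitated_pr / baseline_pr

print(f"baseline pr = {baseline_pr:.2f}, fold increase = {fold_increase:.1f}")
```

      In this sketch the ten-fold headroom comes from the low starting values of both pv and pocc; with pv near 1 and pocc near 0.3, the same product leaves much less room for an increase.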

      If pv is close to one, pr depends on pocc, and thus facilitation depends on the number of TS vesicles just before the arrival of each AP of a train. Thus, post-train recovery from facilitation would depend on restoration of the equilibrium between TS and LS vesicles to the baseline. Even if the transition between LS and TS vesicles is very fast (tens of ms), the equilibrium involved in de novo priming (reversible transitions between the recycling vesicle pool and partially docked LS vesicles) seems to be much slower (13 s in Fig. 5A of Wu and Borst 1999). Thus, we can consider a two-step priming model (recycling pool -> LS -> TS), which comprises a slow 1st step (-> LS) and a fast 2nd step (-> TS). Under the framework of the two-step model, the slow 1st step (de novo priming step) is the rate-limiting step regulating the development and recovery kinetics of facilitation. Given that the on and off rates for Ca2+ binding to Syt7 are slow, it is plausible that Syt7 may contribute to short-term facilitation (STF) by Ca2+-dependent acceleration of the 1st step (as shown in Fig. 9). During train stimulation, LS vesicles would slowly accumulate in a Syt7- and Ca2+-dependent manner, and this increase in LS vesicles would shift the LS/TS equilibrium towards TS, resulting in STF. After tetanic stimulation, the recovery kinetics from facilitation would be limited by the slow recovery of LS vesicles.

      <Reference>

      Wu, L.-G. and Borst J.G.G. (1999) The reduced release probability of releasable vesicles during recovery from short-term synaptic depression. Neuron, 23(4): 821-832.

      Individual points:

      (1) An additional problem with the overfilling hypothesis is that syt7 knockdown increases the estimate of p_occ extracted from the variance-mean analysis, which would imply a faster transition from unoccupied to occupied, and would consequently predict faster recovery from depression. However, recovery from depression seen in experiments was slower, not faster. Meanwhile, the apparent decrease in the estimate of N extracted from the mean-variance analysis is not anticipated by the authors' model, but fits well with alternatives where p_v varies extensively among release sites because release sites with low p_v would essentially be silent in the absence of facilitation.

      The slower recovery from depression observed in the Syt7 knockdown (KD) synapses (Fig. 7) may result from a deficiency in activity-dependent acceleration of TS vesicle recovery. Although basal occupancy was higher in the Syt7 KD synapses, this does not indicate a faster activity-dependent recovery.

      Higher baseline occupancy does not necessarily imply faster recovery of PPR either. In fact, PPR recovery was slower in Syt7 KD synapses than in WT synapses (18.5 vs. 23/s). Under the framework of the simple refilling model (Fig. S8Aa), the baseline occupancy and PPR recovery rate are calculated as k1 / (k1 + b1) and (k1 + b1), respectively. The baseline occupancy depends on the ratio k1/b1, while the PPR recovery rate depends on the absolute values of k1 and b1. Based on the pocc and PPR recovery time constants of WT and KD synapses, we expect a higher k1/b1 but a lower (k1 + b1) in Syt7 KD synapses compared to WT ones.
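
      The two expressions above can be inverted to show how occupancy and recovery rate constrain k1 and b1 separately. The following is a minimal sketch under the simple two-state refilling model; the recovery rates (23/s for WT, 18.5/s for KD) are those quoted above, whereas the occupancy values are hypothetical placeholders chosen only to illustrate the case of higher occupancy combined with slower recovery. The function name is likewise an assumption.

```python
def rates_from_occupancy(p_occ: float, recovery_rate: float) -> tuple:
    """Invert p_occ = k1 / (k1 + b1) and recovery_rate = k1 + b1.

    Returns (k1, b1) in the same units as recovery_rate (here 1/s).
    """
    k1 = p_occ * recovery_rate
    b1 = (1.0 - p_occ) * recovery_rate
    return k1, b1


# Occupancy values are hypothetical; recovery rates are from the text.
wt_k1, wt_b1 = rates_from_occupancy(p_occ=0.3, recovery_rate=23.0)
kd_k1, kd_b1 = rates_from_occupancy(p_occ=0.5, recovery_rate=18.5)

print(f"WT: k1/b1 = {wt_k1 / wt_b1:.2f}, k1 + b1 = {wt_k1 + wt_b1:.1f}/s")
print(f"KD: k1/b1 = {kd_k1 / kd_b1:.2f}, k1 + b1 = {kd_k1 + kd_b1:.1f}/s")
```

      With these placeholder occupancies, the KD case has the larger k1/b1 ratio and the smaller sum k1 + b1, which is the combination described above.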

      The lower number of release sites (N) in Syt7 KD synapses was not anticipated. As the reviewer suggested, such a low N might be ascribed to little recruitment of release sites during a train in KD synapses. But our results do not support this model. If silent release sites were recruited during a train, the variance should deviate upward from the parabola predicted under a fixed N (Valera et al., 2012; Kobbersmed et al., 2020). This was not the case in our results (Fig. 3). In the first version of the manuscript, we argued against this possibility in lines 203-208.
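
      The expected upward deviation can be illustrated with a simple binomial release model. This is a sketch only: it assumes N identical sites with uniform release probability and a fixed quantal size q with no quantal variability, and all parameter values (N, q, p) are hypothetical rather than taken from our data.

```python
def binomial_mean_variance(n_sites: int, p_release: float, q: float) -> tuple:
    """Mean and variance of the EPSC amplitude for a simple binomial model."""
    mean = n_sites * p_release * q
    variance = n_sites * p_release * (1.0 - p_release) * q ** 2
    return mean, variance


def parabola_variance(mean: float, n_sites: int, q: float) -> float:
    """Variance predicted at a given mean under a fixed N: q*mean - mean^2/N."""
    return q * mean - mean ** 2 / n_sites


N_FIXED, Q = 5, 10.0  # hypothetical site number and quantal size (pA)

# Facilitation by higher release probability at a fixed N stays on the parabola.
mean_fixed, var_fixed = binomial_mean_variance(N_FIXED, 0.8, Q)

# The same mean reached by recruiting extra sites (N = 8, p = 0.5) lies above it.
mean_recruit, var_recruit = binomial_mean_variance(8, 0.5, Q)

print(var_fixed, parabola_variance(mean_fixed, N_FIXED, Q))
print(var_recruit, parabola_variance(mean_recruit, N_FIXED, Q))
```

      In this toy case both conditions give the same mean amplitude, but only the recruitment case yields a variance above the fixed-N parabola.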

      As discussed in both the Results and Discussion sections, the baseline EPSC was unchanged by KD (Fig. S3) because of complementary changes in the number of docking sites and their baseline occupancy (Fig. 6). These findings suggest that Syt7 may be involved in maintaining additional vacant docking sites, which could be overfilled during facilitation. It remains to be determined whether the decrease in docking sites in Syt7 KD synapses is related to the specific localization of Syt7 at the plasma membrane of active zones, as proposed in previous studies (Sugita et al., 2001; Vevea et al., 2021).

      (2) Figure S4A: I like the TTX part of this control, but the 4-AP part needs a positive control to be meaningful (e.g., absence of TTX).

      We used 4-AP in the presence of TTX to increase the length constant of axon fibers and to facilitate the conduction of local depolarization from the illumination area to axon terminals. The lack of EPSCs in the presence of 4-AP and TTX indicates that the illumination area is distant enough from axon terminals that local depolarization induced by optical stimulation does not evoke synaptic transmission. This methodology has been employed in previous studies, including the work of Little and Carter (2013).

      <Reference>

      Little JP and Carter AG (2013) Synaptic mechanisms underlying strong reciprocal connectivity between the medial prefrontal cortex and basolateral amygdala. J Neurosci, 33(39): 15333-15342.

      (3) Line 251: At least some of the previous studies that concluded these drugs affect vesicle dynamics used logic that was based on some of the same assumptions that are problematic for the present study, so the reasoning is a bit circular.

      (4) Line 329 and Line 461: A similar problem with circularity for interpreting earlier syt7 studies.

      (Reply to #3 and #4) We selected the target molecules as candidates based on their well-characterized roles in vesicle dynamics, and aimed to investigate which aspects of STP are affected by these molecules in our experimental context. For example, we found that the baseline pocc and short-term facilitation (STF) are enhanced by the baseline DAG level and train stimulation-induced PLC activation, respectively. Notably, the effect of dynasore informed us that slow site clearing is responsible for the late depression of 40 Hz train EPSCs. The knock-down experiments also provided us with information on the critical role of Syt7 in the replenishment of TS vesicles. These approaches do not deviate from standard scientific reasoning but rather build upon prior knowledge to formulate and test hypotheses.

      Importantly, our conclusions do not rely solely on the assumption that altering the target molecule impacts synaptic transmission. Instead, our conclusions are derived from a comprehensive analysis of diverse outcomes obtained through both pharmacological and genetic manipulations. These interpretations align closely with prior literature, further validating our conclusions.

      Therefore, the use of established studies to guide candidate selection and the consistency of our findings with existing knowledge do not represent a logical circularity but rather a reinforcement of the proposed mechanism through converging lines of evidence.

    1. Author response:

      We were pleased to read the positive comments regarding our manuscript and thank the reviewers and editors for the constructive feedback which we believe will be very helpful to improve the current version of the manuscript.

      Prior to addressing all comments in a full response, we provide a response to three issues that were raised in this provisional plan for revision: validation of the tracking algorithm, biological replicates, and mosquito survival.

      (1) Validation of the tracking algorithm:

      Reviewer 2 mentions that there is "No external validation for the flight tracking algorithm using manual annotation". We will address this comment in our full response by creating a manually labelled dataset to validate our detection algorithm.

      However, we would like to point out two important points:

      i) Quantifying the accuracy of a detection algorithm using a manually annotated set is indeed common practice in deep/machine learning algorithms in which manually annotated data are used to train the algorithm, and another set of manually annotated data is used to validate it. However, our detection and tracking algorithm is based on conventional computer vision techniques (not using any deep learning) that have been in use for several decades. Given that these algorithms are completely transparent and deterministic (as opposed to deep learning algorithms that are difficult to dissect and are created using partly stochastic processes) it is not common practice to use human annotations for validation. However, to address Reviewer 2's comment we will provide validation metrics in our full response.

      ii) We would furthermore like to note that our main metrics of interest (e.g. fraction of mosquitoes flying) depend only on accurately detecting mosquitoes and quantifying movement; their accuracy is not affected by potential identity swaps (the typical bottleneck in tracking algorithms).

      (2) Replicates:

      Reviewer 3 states that "Most experiments are only done with single replicates". This statement is not accurate: In Figure 2 we used 3 independent biological replicates for 4 colonies, 2 of which are Aaa and 2 are Aaf. We indeed provide additional data for 6 more colonies using a single replicate. Combined, this data set comprises 588 days of continuous recordings. For Figures 3 and 4 we have 2 replicates for each perturbation experiment. For Figure 5 we provided 3 replicates for the host-seeking experiments. As outlined, the vast majority of our experiments have multiple replicates. We realize this may not have been described clearly enough in the manuscript, and we will clarify it in the revised manuscript.

      (3) Mosquito survival:

      Below we provide survival data for the experiments shown in Figures 1 - 4; we will include these data as supplementary material. Overall, mortality in all experiments was similar to the 'baseline' mortality we observe in our standard colony maintenance procedures. After three weeks, we typically observed that 70% of mosquitoes were still alive.

      Author response image 1.

      Survival curves for the data presented in Figures 1 - 4 of the main text. Day 0 indicates the day on which the BuzzWatch experiment started.

    1. Author response:

      Reply to Reviewer #1 (Public Review):

      The post-processing increases the number of putative neoantigens. As shown in Author response image 1, this is done through data augmentation, i.e. “mutations” that replace individual amino acids in a sequence with their most similar amino acid in the BLOSUM62 embedding. If most of the mutations result in a positive prediction (which we binarize using a score threshold of >0.5), the sequence changes its prediction.

      Author response image 1.

      Post-processing pipeline to increase the number of putative neoantigens. Sequences can either be predicted using the forward method, for which a raw score is produced, or it can be introduced to a majority-vote prediction of the ensemble prediction of similar protein sequences.
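
      The majority-vote step can be sketched as follows. This is an illustration under stated assumptions, not the NAP-CNB implementation: the `MOST_SIMILAR` lookup is a small hypothetical stand-in for a similarity table derived from BLOSUM62, variants are assumed to carry one substitution each, the vote is assumed to be taken over the variants only, and `toy_score` is a placeholder for the trained predictor.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a "most similar residue" table; the real pipeline
# derives similarity from the BLOSUM62 embedding, which is not reproduced here.
MOST_SIMILAR: Dict[str, str] = {
    "I": "V", "V": "I", "K": "R", "R": "K",
    "D": "E", "E": "D", "F": "Y", "Y": "F",
}


def single_substitution_variants(sequence: str) -> List[str]:
    """Return variants with one residue replaced by its most similar residue."""
    variants = []
    for i, residue in enumerate(sequence):
        substitute = MOST_SIMILAR.get(residue)
        if substitute is not None:
            variants.append(sequence[:i] + substitute + sequence[i + 1:])
    return variants


def majority_vote_prediction(
    sequence: str, score: Callable[[str], float], threshold: float = 0.5
) -> bool:
    """Positive if most variants score above the threshold."""
    variants = single_substitution_variants(sequence)
    if not variants:
        # No variant available: fall back to the direct (forward) prediction.
        return score(sequence) > threshold
    positive_votes = sum(score(v) > threshold for v in variants)
    return positive_votes > len(variants) / 2


def toy_score(sequence: str) -> float:
    """Placeholder scorer for demonstration only."""
    return 0.9 if ("V" in sequence or "R" in sequence) else 0.1


direct_call = toy_score("KI") > 0.5
voted_call = majority_vote_prediction("KI", toy_score)
print(direct_call, voted_call)
```

      In this toy case the direct prediction for the sequence is negative while the vote over its variants is positive, which is the kind of change in prediction that post-processing can produce.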

      In this article, we obtain the following candidates after post-processing.

      Author response table 1.

      As mentioned, the prediction column shows a binary label. The full list contained 402 sequences and did not include any other sequences that met the majority-vote criterion.

      As noted by the reviewer, Table 3 of our original paper includes the scores of the direct prediction, which has four sequences in common with the post-processing criteria (*Pnp, *Adar, *Lrrc28 and *Nr1h2). * indicates the mutated form of the peptide, i.e. the neoantigen.

      We selected the top 4 predicted antigens, present both by direct prediction and after post-processing (*Pnp, *Adar, *Lrrc28 and *Nr1h2) (Wert-Carvajal et al. 2021), but we encountered difficulty in synthesizing *Nr1h2 (mutated Nr1h2), and thus it could not be included in the study.

      We also decided to evaluate the immunogenicity of *Wiz, which was identified as a potential TNA only after post-processing. *Wiz exhibited lower levels of immunogenicity compared to *Pnp, *Adar, and *Lrrc28. However, unlike these, *Wiz is highly expressed in the tumor, and vaccination with *Wiz provided the strongest protection levels. These findings led us to incorporate post-processing into the NAP-CNB platform.

      We chose *Herc6 as a mutated antigen predicted not to be a TNA over other candidates because its expression in the tumor was similar to that of *Wiz.

      Depending on the experiment, we used 4 or 5 animals per group (this will be clarified in the revised version).

      The software used for statistical analysis was GraphPad Prism.

      Reply to Reviewer #2 (Public Review):

      This is true: binding affinity does not always predict immune responses, but in most cases high-affinity peptides are immunogenic. There are of course other parameters that drive the effective priming of tumor-reactive CD8+ T cells through antigen cross-presentation, but the mechanisms of antigen presentation are not yet completely understood. High-affinity peptides are desirable as good candidates in neoantigen-based vaccines.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The behavior of cells expressing constitutively active HRas is examined in mosaic monolayers, both in MCF10a breast epithelial and Beas2b bronchial epithelial cell lines, mimicking the potential initial phase of development of carcinoma. Single HRas-positive cells are excluded from MCF10a but not Beas2b monolayers. Most interestingly, however, when in groups, these cells are not excluded, but rather sharply segregated within a MCF10a monolayer. In contrast, they freely mix with wt Beas2b cells. Biophysical analysis identifies high tension at heterotypic interfaces between HRas and wild-type cells as the likely reason for segregation of MCF10a cells. The hypothesis is supported experimentally, as myosin inhibition abolishes segregation. The probable reason for the lack of segregation in the bronchial epithelium is to be found in the different intrinsic properties of these cells, which form a looser tissue with lower basal actomyosin activity. The behaviour of single cells and groups is recapitulated in a vertex model based on the principle of differential interfacial tension, under the condition of high heterotypic interfacial tension.

      Strengths:

      Despite being long recognized as a crucial event during cancer development, segregation of oncogenic cells has been a largely understudied question. This nice work addresses the mechanics of this phenomenon through a straightforward experimental design, applying the biophysical analytical approaches established in the field of morphogenesis. Comparison between two cell types provides some preliminary clues on the diversity of effects in various cancers.

      Weaknesses:

      Although not calling into question the main message of this study, there are a few issues that one may want to address:

      (1) One may be careful in interpreting the comparison between MCF10a and Beas2b cells as used in this study. The conditions may not necessarily be representative of the actual properties of breast and bronchial epithelia. How much of the epithelial organization is reconstituted under these experimental conditions remains to be established. This is particularly obvious for bronchial cells, which would need quite specific culture conditions to build a proper bronchial layer. In this study, they seemed to be on the verge of a mesenchymal phenotype (large gaps, huge protrusions, cells growing on top of each other, as mentioned in the manuscript).

      We thank the reviewer for this important point. We agree that our experimental conditions do not fully recapitulate the in vivo architecture of either breast or bronchial epithelia. However, our intention here is to compare two well-established epithelial lines with distinct intrinsic mechanical and organizational properties, rather than to reproduce the in vivo microenvironment. Nevertheless, to address this point, we have now strengthened our quantitative analysis of epithelial integrity in Beas2b monolayers by including ZO-1 immunofluorescence along with E-cadherin immunofluorescence. These measurements confirm that Beas2b monolayers under our culture conditions retain junctional organization, albeit with larger gaps and protrusions compared to MCF10a. We will revise the text to make this distinction explicit.

      As an alternative to Beas2b, comparison of MCF10a with another cell line capable of more robust in vitro epithelial organization, but ideally with different adhesive and/or tensile properties, would be highly interesting, as it may narrow down the parameters involved in segregation of oncogenic cells.

      We agree with the reviewer that the inclusion of an additional epithelial model system with distinct adhesive and organizational properties would provide valuable insights. In line with this suggestion, we are currently repeating the key experiments using Madin-Darby Canine Kidney (MDCK) cells, a well-established model epithelial cell line. We believe this complementary system will allow us to further dissect the behaviour of HRasV12-expressing cells.

      (2) While the seminal description of tissue properties based on interfacial tensions (Brodland 2002) is clearly key to interpreting these data, the actual "Differential Interfacial Tension Hypothesis" poses that segregation results from global differences, i.e., juxtaposition of two tissues displaying different intrinsic tensions. On the contrary, the results of the present work support a different scenario, where what counts is the actual difference in tension ALONG the tissue boundary, in other words, that segregation is driven by high HETEROTYPIC interfacial tension. This is an important distinction that should be clarified.

      We thank the reviewer for this insightful comment. As correctly noted, Brodland’s 2002 work provided a seminal formulation of the Differential Interfacial Tension Hypothesis (DITH), which frames tissue organization in terms of effective interfacial tensions. In its original form, DITH emphasized segregation as a consequence of global differences in the intrinsic (bulk) tensions of juxtaposed tissues.

      While our results specifically show that segregation is determined by local interfacial mechanics between transformed and host cells, our blebbistatin experiments, in which reducing global contractility led to a loss of segregation, lead us to believe that the differences in local interfacial mechanics also stem from global differences intrinsic to the tissues discussed here.

      To directly map global interfacial tension, in the revised manuscript we aim to stain for E-cadherin and actin in the two tissues and to measure cortical actin, stress fibers, and E-cadherin levels at the cell-cell junctions. Once the global tissue mechanics are mapped, we can be more confident about our claim on DITH. Nevertheless, we will also clarify this distinction in the text and explicitly state that, while DITH provided the foundation for conceptualizing tissue mechanics, our findings on transformed cell-healthy cell interactions specifically demonstrate that segregation is driven by high heterotypic interfacial tension at the tissue boundary.

      (3) Related: The fact that actomyosin accumulates at the heterotypic interface is key here. It would be quite informative to better document the pattern of this accumulation, which is not clear enough from the images of the current manuscript: Are we talking about the actual interface between mutant and wt cells (membrane/cortex of heterotypic contacts)? Or is it more globally overactivated in the whole cell layer along the border? Some better images and some quantification would help.

      We agree that more detailed visualization of actomyosin distribution would strengthen our conclusion. We are currently working on re-imaging the heterotypic interfaces at higher magnification and are quantifying fluorescence intensity of actin and myosin-II along cell–cell boundaries. All of this will be integrated in the next version of the manuscript.

      (4) In the case of Beas2b cells, mutant cells show higher actin than wt cells, while actin is, on the contrary, lower in mutant MCF10a cells (Author response image 2). Has this been taken into account in the model? It may be in line with the idea that HRas may have a different action on the two cell types, a possibility that would certainly be worth considering and discussing.

      Our current vertex model does not explicitly incorporate actin levels; rather, it captures their functional consequences indirectly through effective mechanical parameters such as cortical tension and adhesion strength. Nonetheless, we agree that the opposite trends in actin enrichment between Beas2b and MCF10a HRasV12 mutants raise the important possibility that HRas signaling may act through distinct mechanisms in the two cell types.

      To further investigate this, we are currently culturing MCF10a and Beas2b HRasV12 mutant populations separately (i.e., without wild-type cells) to assess their intrinsic organization and behavior in isolation. These experiments will help us disentangle how HRas activation differentially impacts epithelial architecture in these two cellular contexts, and we will discuss these ongoing efforts in the revised manuscript.

      From the modelling perspective, the model currently does not account for the different actin levels of mutants with respect to wt cells in the two tissues. This can be accounted for by having different  and  for mutants and wt in the two cases in simulation.

      In conclusion, the study conveys an important message, but, as it stands, the strength of evidence is incomplete. It would greatly benefit from a more detailed and complete analysis of the experimental data, a better fit between this analysis and the corresponding vertex model, and a more in-depth discussion of biological and biophysical aspects. These revisions should be rather easily done, and would then make the evidence much more solid.

      Reviewer #2 (Public review):

      Summary:

      The authors investigate the behavior of oncogenic cells in mammary and bronchial epithelia. They observe that individual oncogenic cells are preferentially excluded from the mammary epithelium, but they remain integrated in the bronchial epithelium. They also observe that clusters of oncogenic cells form a compact cluster in the mammary epithelium, but they disperse in the bronchial epithelium. The authors demonstrate experimentally and in the vertex model simulations that the difference in observed behavior is due to the differential tension between the mutant and wild-type cells due to a differential expression of actin and myosin.

      Strengths:

      (1) Very detailed analysis of experiments to systematically characterize and quantify differences between mammary and bronchial epithelia.

      (2) Detailed comparison between the experiments and vertex model simulations to identify the differential cell line tension between the oncogenic and wild-type cells as one of the key parameters that are responsible for the different behavior of oncogenic cells in mammary and bronchial epithelia

      Weaknesses:

      (1) It is unclear what the mechanistic origin of the shape-tension coupling is, which is used in the vertex model, and how important that coupling is for the presented results. The authors claim that the shape-tension coupling is due to the anisotropic distribution of stress fibers when cells are under external stress. It is unclear why the stress fibers should affect an effective line tension on the cell boundaries and why the stress fibers should be sensitive to the magnitude of the internal isotropic cell pressure. In experiments, it makes sense that stress fibers form when cells are stretched. Similar stress fibers form when the cytoskeleton or polymer networks are stretched. It is unclear why the stress fibers should be sensitive to the magnitude of internal isotropic cell pressure. If all the surrounding cells have the same internal pressure, then the cell would not be significantly deformed due to that pressure, and stress fibers would not form. The authors should better justify the use of the shape-tension coupling in the model and also present simulation results without that coupling. I expect that most of the observed behavior is already captured by the differential tension, even if there is no shape-tension coupling. 

      While the segregation behavior can be captured by the differential tension alone, without the shape-tension coupling, we also observed unjamming and aligned movement of wild-type cells at the mutant-cell interface. This was only captured when we incorporated shape-tension coupling in the model, suggesting that changes in cell shape due to differential interfacial tension are essential in driving the fate of the mutants. Below, the difference between the shape indices of cells at the interface and away from the boundary is plotted against the interfacial tension for the case without shape-tension coupling [Author response image 1]. The red dashed line represents the experimental value of the shape index difference. The blue line is the shape index difference between two randomly chosen groups of cells (each group contains half of the total number of cells). At zero line tension, the difference in shape index between interface cells and cells away from the interface is the same as that between randomly chosen groups of cells, which is expected since there should be no interface at zero line tension. The data without shape-tension coupling presented here are averaged over 19 seeds. Although the results without shape-tension coupling reach experimental values at high enough differential tension [Author response image 2], a closer inspection of the simulation results shows that the cells are merely squeezed and aligned perpendicular to the interface, which is contrary to what is seen in experiments.

      Author response image 1.

      Shape indices versus the interfacial line tension

      Calculating the average of the absolute value of the dot product of the nematic director and the interface edge for simulations with and without shape-tension coupling clearly shows that with shape-tension coupling, the cells align and elongate along the interface as is seen in experiment, given by an interface dot product value > 0.5 at high enough line-tension values. Further, shape-tension coupling or biased edge tension has been used before to model for cell elongation during embryo elongation [1] and here we use it as an active line-tension force, which elongates cells along the interface, in addition to the differential tension which is passive. This additional quantification of the alignment and elongation of cells along the interface will be added to the Supplementary Information (SI).

      [1] Dye, N. A., Popović, M., Iyer, K. V., Fuhrmann, J. F., Piscitello-Gómez, R., Eaton, S., & Jülicher, F. (2021). Self-organized patterning of cell morphology via mechanosensitive feedback. Elife, 10, e57964.
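
      The alignment measure described above (the absolute dot product between a cell's nematic director and the interface edge) can be sketched as follows. This is only an illustration under stated assumptions: the director is taken here as the principal axis of the second-moment tensor of the cell's vertices, which may differ from the definition used in the simulations, and the function names are hypothetical.

```python
import math

def polygon_director(vertices):
    """Unit vector along the long axis of a cell.

    Assumption: the director is the principal axis of the second-moment
    tensor of the vertex positions (not necessarily the authors' definition).
    """
    n = len(vertices)
    cx = sum(x for x, _ in vertices) / n
    cy = sum(y for _, y in vertices) / n
    qxx = sum((x - cx) ** 2 for x, _ in vertices) / n
    qyy = sum((y - cy) ** 2 for _, y in vertices) / n
    qxy = sum((x - cx) * (y - cy) for x, y in vertices) / n
    # Orientation of the largest-eigenvalue axis of a symmetric 2x2 tensor.
    theta = 0.5 * math.atan2(2.0 * qxy, qxx - qyy)
    return math.cos(theta), math.sin(theta)

def interface_alignment(vertices, edge_start, edge_end):
    """|director . edge|: 1 means elongated along the edge, 0 means perpendicular."""
    ex = edge_end[0] - edge_start[0]
    ey = edge_end[1] - edge_start[1]
    norm = math.hypot(ex, ey)
    dx, dy = polygon_director(vertices)
    return abs(dx * ex + dy * ey) / norm

# A 4 x 1 rectangle elongated along x, next to an interface edge along x.
cell = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0)]
print(interface_alignment(cell, (0.0, 0.0), (1.0, 0.0)))
```

      Averaging this quantity over all interface cells gives a single number per simulation; values above 0.5 correspond to the regime described in the response, where cells elongate along the interface rather than perpendicular to it.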

      Author response image 2.

      Change in interfacial tension with and without shape tension coupling

      (2) The observed difference of shape indices between the interfacial and bulk cells in simulations in the absence of differential line tension is concerning. This suggests that either there are not enough statistics from the simulations or that something is wrong with the simulations. For all presented simulation results, the authors should repeat multiple simulations and then present both averages and standard deviations. This way, it would be easier to determine whether the observed differences in simulations are statistically significant.

      The reviewer is right in pointing out that statistics for the plots must be shown. The difference in shape indices between the interfacial and bulk cells in simulations has been calculated over 11 different seed values. The observed differences in simulations, along with the standard deviations, are plotted below [Author response image 3]. The corresponding figure in the paper will be updated to include the standard deviations. The non-zero difference in shape index in the absence of differential line tension for low values of the stress threshold is due to the shape-tension coupling acting even at low differential tension. Thus, a non-zero, sufficiently high value of the stress threshold is required for our model with shape-tension coupling to make sense. This is also stated in section 4 of the paper. The importance of the shape-tension coupling is explained in our response to the previous point.

      Author response image 3.
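
      As a rough illustration of the statistics discussed in this response, the sketch below computes a per-seed difference in shape index between interface and bulk cells and then reports the mean and standard deviation across seeds. It assumes the dimensionless shape index commonly used in vertex models, perimeter divided by the square root of area; the response does not spell out the definition, and all function names are hypothetical.

```python
import math
import statistics

def shape_index(vertices):
    """Perimeter / sqrt(area) of a simple polygon given as ordered (x, y) vertices."""
    n = len(vertices)
    perimeter = 0.0
    twice_area = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        perimeter += math.hypot(x1 - x0, y1 - y0)
        twice_area += x0 * y1 - x1 * y0  # shoelace term
    return perimeter / math.sqrt(abs(twice_area) / 2.0)

def seed_difference(interface_cells, bulk_cells):
    """Mean shape index of interface cells minus that of bulk cells, for one seed."""
    interface_mean = statistics.mean(shape_index(c) for c in interface_cells)
    bulk_mean = statistics.mean(shape_index(c) for c in bulk_cells)
    return interface_mean - bulk_mean

def summarize(differences_per_seed):
    """Mean and sample standard deviation across independent seeds."""
    return statistics.mean(differences_per_seed), statistics.stdev(differences_per_seed)

# Toy input: one elongated interface cell and one square bulk cell per "seed".
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rectangle = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0)]
print(seed_difference([rectangle], [square]))
print(summarize([0.1, 0.2, 0.3]))
```

      Reporting the pair returned by `summarize` for each parameter value is one way to make it visible whether a non-zero difference at zero differential tension exceeds the seed-to-seed scatter.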

      (3) The authors should also analyze the cell line tension data in simulations and make a comparison with experiments.

      We agree with the reviewer that cell line tension data should also be analyzed and compared with experiments. This will be added to the next version of the paper.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors report a new bioinformatics pipeline ("SPICE") to predict pairwise cooperative binding-sites based on input ChIP-seq data for transcription factor (TF)-of-interest, analyzed against DNA-binding sites (DNA motifs) in a database (HOCOMOCO). The pipeline also predicts the optimal distance between the paired binding sites. The pipeline correctly predicted known/reported transcription factor cooperations, and also predicted new cooperations, not yet reported in literature. The authors choose to follow up on the predicted interaction between Ikaros and Jun. Using ChIP-seq in mouse B cells, they show extensive overlap in binding regions between Ikaros and Jun in LPS+IL21 stimulated cells. In a human B-lineage cell line (MINO) they show that anti-Ikaros Ab can co-immunoprecipitate Jun protein, and that the MINO cell extracts contain protein(s) that can bind to the CNS9 probe (conserved region upstream of IL10 gene), and that binding is lost upon mutation of two basepairs in the AP1 binding motif, and reduced upon mutation of two basepairs in the non-canonical Ikaros binding motif. Part of this protein complex is super-shifted with an anti-Jun antibody, and more DNA is shifted with addition of an anti-Ikaros antibody.

      The authors perform EMSA showing that recombinant Jun can bind to the tested DNA-region (IL10 CNS9) and that addition of recombinant Ikaros (or anti-Ikaros antibody in Fig 3E) can enhance binding (increase amount of DNA shifted). The authors lastly show that the IL10 CNS9 DNA region can enhance transcription in B- and T-cells with a luciferase reporter assay, and that 2 bp mutation of the Ikaros or Jun DNA motifs greatly reduce or abolish this activity.

      This is interesting work, with two main contributions: the SPICE pipeline (if made available to the scientific community), and the report of interaction between Ikaros and Jun. However, the distinction between DNA motifs and the proteins actually binding and having a biological function should be made clear consistently throughout the manuscript. The same DNA motifs can be bound by multiple factors, for instance within transcription factor families with high homology in the DNA-binding regions of the proteins.

      The reviewer has correctly assessed the content of our manuscript.

      Some specific points:

      SPICE: It is unclear if this is uploaded somewhere to be available to the scientific community.

      Thanks for this comment. We will upload the SPICE pipeline and its associated scripts (R and shell) via GitHub.

      It was unclear if the Ikaros-Jun interaction was initially found from primary Jun ChIP-seq (and secondary Ikaros motif from HOCOMOCO) or from primary Ikaros ChIP-seq (and secondary Jun motif from HOCOMOCO). And what were the two DNA motifs (primary and secondary, and their distance) from the SPICE analysis?

      The IKZF1-JUN interaction was found from primary JUN ChIP-seq data and searching for secondary IKZF1 motifs identified in the HOCOMOCO database. We will provide the primary and secondary motifs in our revised manuscript.
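
      To make the primary/secondary terminology concrete, the toy sketch below tallies the offsets between occurrences of a primary motif and a secondary motif within peak sequences. It is not the SPICE or SpaMo implementation: exact string matching stands in for position-weight-matrix scanning, no enrichment statistic is computed, and the motif strings, sequences, and function names are illustrative placeholders.

```python
from collections import Counter

def find_all(sequence, motif):
    """Start positions of every (possibly overlapping) exact match of motif."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions

def spacing_counts(sequences, primary, secondary, max_offset=30):
    """Count signed start-to-start offsets from primary to secondary motif hits."""
    counts = Counter()
    for seq in sequences:
        for p in find_all(seq, primary):
            for s in find_all(seq, secondary):
                # Skip hits that overlap the primary site.
                if s < p + len(primary) and p < s + len(secondary):
                    continue
                offset = s - p
                if abs(offset) <= max_offset:
                    counts[offset] += 1
    return counts

# Placeholder inputs: an AP-1-like site as primary, an Ikaros-like core as secondary.
peaks = ["TGACTCAATGGGAA", "CCTGACTCAATGGGAAC"]
counts = spacing_counts(peaks, primary="TGACTCA", secondary="GGGAA")
print(counts.most_common(1))
```

      In a real analysis, a spacing would only be called preferred if its count were significantly higher than expected from the background distribution of offsets.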

      Authors have mostly careful considerations and statements. One additional comment is that binding does not equal function (Fig 2D), and that opening of chromatin (by any other factor(s)) can give DNA-binding factors (like Ikaros and Jun) the opportunity to bind, without functional consequence for the biological process studied.

      We appreciate that the reviewer finds our considerations and statements careful. We agree that opening of chromatin can give factors the opportunity to bind, and we now make this point in the manuscript.

      Figure 2E: Ikaros is reported to be expressed at baseline in murine B cells, yet the Ikaros ChIP-seq in unstimulated cells had what looks to be no significant or low peaks. LPS stimulation induced strong Ikaros ChIP-seq signal. A western blot showing the Ikaros protein levels in the 3 conditions could help understand if the binding pattern is due to protein expression level induction. Similar for Jun (western in the 3 conditions), which seemed to mainly bind in the LPS+IL21 condition. Furthermore, as also suggested below, tracks showing Ikaros and Jun binding from all conditions (unstimulated, LPS only and LPS+IL21 stimulated cells), at select genomic loci, would be helpful in illustrating this difference in signal between the different cell conditions. This is relevant in regards to the point of cooperativity of binding.

      The main point of the paper was to show functional cooperation and proximity of binding. However, the experiments using purified JUN and Ikaros proteins suggest cooperative binding. Exhaustive evaluation of the JUN-Ikaros association is left for future studies.

      ChIP-seq in mouse B cells showed that Ikaros bound strongly in LPS stimulated cells, in the (relative) absence of Jun binding (Fig. 2C). However, in EMSA (Fig 3C), there is no binding when the AP1 site is mutated, and the authors describe this as Ikaros binding site. What does the Ikaros binding look like at this genomic location in LPS (only) stimulated cells? The authors could show the same figure as in Fig 2F but show Ikaros and Jun ChIP-seq tracks at IL10 CNS9 locus from all conditions to compare binding in unstimulated, LPS and LPS+IL21 cells.

      As requested, we now show Ikaros and Jun ChIP-seq tracks from unstimulated, LPS-treated, and LPS + IL21-treated cells. Both Ikaros and cJUN were bound to the Il10 upstream CNS9 region with LPS treatment of cells (see Author response image 1, highlighted in red box), but binding was weaker than that observed with LPS + IL21.

      Author response image 1.

      Also: How does this reconcile with the luciferase assay in Fig 4E, where LPS (only) stimulation is used, which in Fig 2E only/mainly induced Ikaros, and not Jun ChIP-seq signal (while EMSA indicate Ikaros cannot bind the site alone, but can enhance Jun-dependent binding).

      As shown above, in the LPS (only) condition, both IKZF1 (Ikaros) and cJUN bind to the Il10 CNS9 locus. Thus, this is not in conflict with our luciferase assay data in Fig. 4E, which showed that Ikaros is dependent on AP-1 binding. Moreover, the AP-1 site in Fig. 4D and 4E can be bound by other AP-1 factors as well, such as JUND, JUNB, and BATF. These factors can potentially compete with cJUN binding, and their roles remain to be explored. These points can be made in the manuscript.

      Comment on statements in results section: The luciferase assays in B and T cells do not demonstrate the role of the proteins Ikaros or Jun directly (page 10, lines 208 and surrounding text). The assay measures an effect of the DNA sequences (implying binding of some transcription factor(s)), but does not identify which protein factors bind there.

      We agree with the reviewer. It is reasonable and even likely that different family members may be partially redundant. This point is now made in our revised manuscript.

      Lastly, the authors only discuss Ikaros (using the term IKZF1 which is the gene symbol for the Ikaros protein). There are other Ikaros family members that have high homology and that are reported to bind similar DNA sequences (for instance Aiolos and Helios), which are expressed in B-cells and T-cells. A discussion of this is of relevance, as these are different proteins, although belonging to the same family (the Ikaros-family) of transcription factors. For instance, western for Aiolos and Helios will likely detect Aiolos in the B cells used, and Helios in the T cells used.

      We agree with the reviewer. As requested, we now discuss the possibility that Aiolos or Helios may also contribute.

      Reviewer #2 (Public Review):

      The study was performed with an old tool, SpaMo (12 years old), source data from ENCODE (2010-2012), and even an old version of the peak caller MACS (~2013). The de novo motif search tool is old too (the newer STREME is not mentioned). No composite element search tools published in the last 12 years are cited, and there are some issues in the data analysis and presentation. Almost all references are from about 8-10 years ago (the most recent is from 2019).

      The title is misleading

      Instead of “A new pipeline SPICE identifies novel JUN-IKZF1 composite elements”

      It should be written as “Application of SpaMo tool identifies novel JUN-IKZF1 composite elements”

      It reflects the pipeline better but honestly shows that the novelty is missed.

      Regarding the above two points, we respectfully disagree with the reviewer. Although SpaMo was used, the pipeline we developed is new and our findings are distinctive. The pipeline can systematically screen and predict novel protein-protein binding complexes, and our discovery of the IKZF1-JUN composite element is new, with distinctive biological findings and validation. This point is now made in the revised manuscript. As requested, we have added some additional references.

      The study was performed on data from ENCODE that are too old. The authors mention 343 ENCODE ChIP-seq libraries, but did not even take care to give the name of the target TF for each library (Figure 1E, Figure S2, Table 2).

      Although we used ENCODE data, which was in part when we initially developed the algorithm, those data are valid and using them allowed us to demonstrate the functionality of SPICE, which is versatile and can be used on datasets of one’s choice as well. As requested, in the revised manuscript we have added the names of the TFs in Figs, Fig. S2, and Table 1.

      Reviewer #3 (Public Review):

      The authors of this study have designed a novel screening pipeline to detect DNA motif spacing preferences between TF partners using publicly available data. They were able to recapitulate previously known composite elements, such as the AP-1/IRF4 composite elements (AICE), and to predict many composite elements that are expected to be very useful to the community of researchers interested in dissecting the regulatory logic of mammalian enhancers and promoters. The authors then focus on a novel, SPICE-predicted interaction between JUN and IKZF1, and show that under LPS and IL-21 treatment, JUN and IKZF1 in B cells have significant overlap in their genomic localization. Next, to determine whether the two TFs physically interact, a co-immunoprecipitation experiment was performed. While JUN immunoprecipitated with an anti-IKZF1 antibody, curiously IKZF1 did not immunoprecipitate with an anti-JUN antibody. Finally, EMSA and luciferase experiments were performed to show that the two TFs bind cooperatively at an IL10 upstream probe.

      The reviewer has described the basic results of the study.

      Major strengths:

      1) SPICE was able to recapitulate previously known composite elements, such as the AP-1/IRF4 composite elements (AICE).

      2) Under LPS and IL-21 treatment, JUN and IKZF1 in B cells have significant overlap in their genomic localization. This is very good supporting evidence for the efficacy of SPICE in detecting TF partners.

      We are glad that the reviewer believes that SPICE is effective in detecting TF partners.

      Major weaknesses:

      1) The authors fail to convincingly show that IKZF1 and Jun physically interact. A quantitative measurement of their interaction strength would have been ideal.

      We agree that it is not conclusive that the factors interact directly as opposed to binding to nearby sites on DNA, which is what SPICE was intended to detect. We never intended to claim that we established a definite physical interaction. The coIP worked in one direction, but not reliably in the other, even though we have tried a total of four different antibodies. We now mention in the revised manuscript that we have tried the additional anti-JUN antibodies, cJun (60A8, CST) and JunD (D17G2, CST).

      2) The super-shift experiment to show that the proteins bound to their EMSA probe were indeed IKZF1 and JUN is not very convincing and would benefit from efforts to quantify the shift (Figure 3E). Nuclear extracts from cells with single or double CRISPR knockouts of the two TFs would have been ideal.

      We agree that using single or double knockouts would be helpful, but other Ikaros family or Jun family members could be involved, so such studies might not be definitive. That is why we used purified proteins to show apparent cooperative binding (Figure 4C).

      3) There is a second band beneath the more prominent band in the EMSA experiment with recombinant IKZF1 and JUN (Figure 4C). This second band is most probably bound by IKZF1 because it becomes weaker when the IKZF1 site is mutated and is completely absent when only JUN is added. This is completely ignored by the authors. Therefore, experiments with EMSA fail to convincingly show that IKZF1 and Jun bind cooperatively. They could just as well bind independently to the two sites.

      The second band has a faster mobility and might relate to IKZF1, although this is difficult to know. We comment on this band in the revised manuscript. As noted above, the purified protein experiments do suggest cooperativity. However, our overall intent was to identify factors binding in proximity, which SPICE has successfully done, even if the binding was “independent”.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) It is a nice study but lacks some functional data required to determine how useful these alleles will be in practice, especially in comparison with the figure line that stimulated their creation.

      We are grateful for this comment. Regarding the usefulness of these alleles, Figure 3 shows that specific and efficient genetic manipulation of one cell subpopulation can be achieved by crossing the DreER mouse strain with the roxCre mouse strain. In addition, Figure 6 shows that R26-loxCre-tdT can effectively ensure Cre-loxP recombination on some gene alleles for genetic manipulation. The expression of the tdT protein is aligned with the expression of the Cre protein (Alb roxCre-tdT and R26-loxCre-tdT, Figure 2 and Figure 5), which ensures the accuracy of the tracing experiments. We believe more functional data can be shown in future articles that use the mouse lines described in this manuscript.

      (2) The data in Figure 5 show strong activity at the Confetti locus, but the design of the newly reported R26-loxCre line lacks a WPRE sequence that was included in the iSure-Cre line to drive very robust protein expression.

      Thank you for raising this point. In the R26-loxCre-tdT knock-in strategy, the WPRE sequence is added behind the loxCre-P2A-tdT sequence.

      (3) The most valuable experiment for such a new tool would be a head-to-head comparison with iSure (or the latest iSure version from the Benedito lab) using the same CreER and target floxed allele. At the very least a comparison of Cre protein expression between the two lines using identical CreER activators is needed.

      According to the reviewer’s suggestion, we will compare iSuRe-Cre with R26-loxCre-tdT by using Alb-CreER and target R26-Confetti in the revised manuscript.

      (4) Why did the authors not use the same driver to compare mCre 1, 4, 7, and 10? The study in Figure 2 uses Alb-roxCre for 1 and 7 and Cdh5-roxCre for 4 and 10, with clearly different levels of activity driven by the two alleles in vivo. Thus whether mCre1 is really better than mCre4 or 10 is not clear.

      Thank you for raising this concern. After identifying four robust versions of mCre by screening, we generated these four roxCre knock-in mice. We could not predict which mCre version would be the most robust in vivo; one or two versions might work efficiently. For example, if Alb-mCre1 were competitive with Cdh5-mCre10, we could use them to target genes in different cell types, broadening the potential utility of these mice.

      (5) Technical details are lacking. The authors provide little specific information regarding the precise way that the new alleles were generated, i.e. exactly what nucleotide sites were used and what the sequence of the introduced transgenes is. Such valuable information must be gleaned from schematic diagrams that are insufficient to fully explain the approach.

      Thank you for your careful suggestions.

      We will provide schematic figures as well as nucleotide sequences for mice generation in the revised manuscript.

      Reviewer #2 (Public Review):

      (1) The scenario where the lines would demonstrate their full potential compared to existing models has not been tested.

      We are grateful for this suggestion. We will compare iSuRe-Cre with R26-loxCre-tdT by using Alb-CreER and target R26-Confetti in the revised manuscript.

      (2) The challenge lies in performing such experiments, as low doses of tamoxifen needed for inducing mosaic gene deletion may not be sufficient to efficiently recombine multiple alleles in individual cells while at the same time accurately reporting gene deletion. Therefore, a demonstration of the efficient deletion of multiple floxed alleles in a mosaic fashion would be a valuable addition.

      Thank you for your constructive comments. Mosaic analysis using sparse labeling and efficient gene deletion would be our future direction using roxCre and loxCre strategies. We will include some discussion of using such strategy in the revised manuscript.

      (3) When combined with the confetti line, the reporter cassette will continue flipping, potentially leading to misleading lineage tracing results.

      Thank you for your professional comments. Indeed, the Confetti reporter used in this study can continue flipping, which could lead to misleading lineage tracing results. We used R26-Confetti to demonstrate the robustness of mCre for recombination. Some multicolor mouse lines that do not flip have been published, for example R26-Confetti2 (10.1038/s41588-019-0346-6) and Rainbow (10.1161/CIRCULATIONAHA.120.045750). These reporters could be used for tracing Cre-expressing cells without concerns about flipping of the reporter cassette.

      (4) Constitutive expression of Cre is also associated with toxicity, as discussed by the authors in the introduction.

      Thank you for your professional comments. The toxicity of constitutive Cre expression and the toxicity associated with tamoxifen treatment in CreER mouse lines (10.1038/s44161-022-00125-6) are known to the field. This study cannot resolve the toxicity of constitutive Cre expression. Many mouse lines with constitutive Cre driven by different promoters are in use across various fields and present similar toxicity. To address this issue, it would be possible to design a new strategy that enables the removal of Cre after its expression.

      Reviewer #3 (Public Review):

      (1) Although leakiness is rather minor according to the original publication and the senior author of the study wrote in a review a few years ago that there is no leakiness (https://doi.org/10.1016/j.jbc.2021.100509).

      Thank you so much for your careful check. In this review (https://doi.org/10.1016/j.jbc.2021.100509), the writer’s comments on iSuRe-Cre are on the reader's side, and all summary words are based on the original published paper (10.1038/s41467-019-10239-4). We have now tested iSuRe-Cre in our hands. We did detect some leakiness in the heart and muscle, but hardly any in other tissues, as shown in the following figure.

      Author response image 1.

      Leakiness in Alb CreER;iSuRe-Cre mouse line. Pictures are representative results for 5 mice. Scale bars, white 100 µm.

      (2) I would have preferred to see a study, which uses the wonderful new tools to address a major biological question, rather than a primarily technical report, which describes the ongoing efforts to further improve Cre and Dre recombinase-mediated recombination.

      We gratefully appreciate your valuable comment. The roxCre and loxCre mice mentioned in this study provide more effective methods for inducible genetic manipulation in studying gene function. We hope that the application of our new genetic tools could help address some major biological questions in different biomedical fields in the future.

      (3) Very high levels of Cre expression may cause toxic effects as previously reported for the hearts of Myh6-Cre mice. Thus, it seems sensible to test for unspecific toxic effects, which may be done by bulk RNA-seq analysis, cell viability, and cell proliferation assays. It should also be analyzed whether the combination of R26-roxCre-tdT with the Tnni3-Dre allele causes cardiac dysfunction, although such dysfunctions should be apparent from potential changes in gene expression.

      We are sorry that we mistakenly wrote R26-roxCre-tdT instead of R26-loxCre-tdT in our manuscript. We have not generated an R26-roxCre-tdT mouse line. We also thank the reviewer for the concerns about the toxicity of high Cre expression. The toxicity of constitutive Cre expression and the toxicity of tamoxifen treatment in CreER mouse lines (10.1038/s44161-022-00125-6) are known to the field. This study cannot resolve the toxicity of constitutive Cre expression. Many mouse lines with constitutive Cre driven by different promoters are in use across various fields and present similar toxicity. To solve this issue, it would be possible to design a new strategy that enables the removal of Cre after its expression.

      (4) Is there any leakiness when the inducible DreER allele is introduced but no tamoxifen treatment is applied? This should be documented. The same also applies to loxCre mice.

      In this study, we developed new mouse tool lines, including Alb roxCre1-tdT, Cdh5 roxCre4-tdT, Alb roxCre7-GFP, Cdh5 roxCre10-GFP and R26-loxCre-tdT. As shown in supplementary figure 1, supplementary figure 2, and figure 4D, these lines are not leaky. Therefore, if there is any leakiness when the inducible DreER or CreER allele is introduced, it derives from the DreER or CreER allele itself. We will supplement relevant experimental data in the revision.

      (5) It would be very helpful to include a dose-response curve for determining the minimum dosage required in Alb-CreER; R26-loxCre-tdT; Ctnnb1flox/flox mice for efficient recombination.

      Thank you for your suggestion. We understand the reviewer’s concern. We can generate a dose-response curve in the revision.

      (6) In the liver panel of Figure 4F, tdT signals do not seem to colocalize with the VE-cad signals, which is odd. Is there any compelling explanation?

      Because the file-upload website has a file size limitation, image compression left some signals unclear. The zoomed-out figures are shown below. The staining in Figure 4F will be optimized and high-resolution images will be provided in the revision.

      Author response image 2.

      (7) The authors claim that "virtually all tdT+ endothelial cells simultaneously expressed YFP/mCFP" (right panel of Figure 5D). Well, it seems that the abundance of tdT is much lower compared to YFP/mCFP. If the recombination of R26-Confetti was mainly triggered by R26-loxCre-tdT, the expression of tdT and YFP/mCFP should be comparable. This should be clarified.

      Thank you so much for your careful check. We checked these signals carefully and did not find a “much lower” tdT signal. Because the file-upload website has a file size limitation, image compression left some signals unclear. We have attached clear high-resolution images here. The following figure shows how we split the tdT signal and compared it with YFP/mCFP.

      Author response image 3.

      (8) In several cases, the authors seem to have mixed up "R26-roxCre-tdT" with "R26-loxCre-tdT". There are errors in #251 and #256 and, furthermore, in the passage from line #278 to #301. In the lines #297 and #300 it should probably read "Alb-CreER; R26-loxCre-tdT;Ctnnb1flox/flox" rather than "Alb-CreER;R26-tdT2;Ctnnb1flox/flox".

      We are grateful for these careful observations. We have corrected these typos accordingly.

    1. Author Response

      Reviewer #1 (Public Review):

      We thank the Reviewer for their comments.

      Reviewer #2 (Public Review):

      1) In Figure 4, the authors injected a retrograde tracer in the NA and an anterograde tracer in DCN to find potential "nodes" of overlap. From this experiment, the authors identify the VTA and regions of the thalamus as potential areas of tracer overlap, but it is unclear how many other brain regions were examined. Did the authors jump straight to likely locations of overlap based on previous findings, or were large swaths of the brain examined systematically? If other brain regions were examined, which regions and how was this done? A table listing which brain regions were examined and the presence/intensity of ctb-Alexa568 and GFP fluorescence would be helpful.

      We thank the Reviewer for their comments. Exhaustive characterizations of inputs into nucleus accumbens (NAc) as well as of direct outputs of the deep cerebellar nuclei (DCN) have appeared elsewhere (e.g., Ma et al., 2020 doi: 10.3389/fnsys.2020.00015; Novello et al., 2022 doi: 10.1007/s12311-022-01499-w). Our anatomical investigations with retrograde and anterograde tracers were focused on putative intermediary nodal regions with robust inputs from the DCN, clear outputs to NAc, and limbic functionality. Only a handful of brain regions fulfill these criteria, and from those, we chose to target the VTA and intralaminar thalamus based on the observation that cerebellar activation induces dopamine release in the NAc medial shell and core (Holloway et al., 2019 doi: 10.1007/s12311-019-01074-w; Low et al., 2021 doi: 10.1038/s41586-021-04143-5) and on our previous work on DCN projections to the midline thalamus (Jung et al., 2022 doi: 10.3389/fnsys.2022.879634).

      2) In Figure 5, the authors inject AAV1-Cre in DCN and AAV-FLEX-tdTomato in VTA or thalamus. This is an interesting experiment, but controls are missing. An important control is to inject AAV-FLEX-tdTomato in the VTA or thalamus in the absence of AAV1-Cre injection in DCN. Cre-independent expression of tdTomato should be assessed in the VTA/thalamus and the NA.

      We thank the reviewer for bringing up this important control. We routinely perform this control experiment to test for any “leakiness” of floxed vectors prior to use but we did not include it in the paper. In response to the Reviewer’s comment, we show results from this control below. Briefly, we performed a large injection (300 nl) of AAV-FLEX-tdTomato in the thalamus area together with AAV-EGFP (to visualize the injection). No Cre-expressing virus was injected anywhere in the brain. PFA-fixed brain slices were obtained at 3 weeks post-injection and imaged for EGFP and tdTomato. Author Response Figure 1 shows images of the injected thalamus area. No tdTomato expression (Fig. 1C, red) was observed despite abundant EGFP expression (Fig. 1B, green), which confirms that in the absence of Cre the floxed construct does not “leak”.

      Author response image 1.

      (related to Fig. 5 of manuscript). Control experiment for “leakiness” of floxed tdTomato. A, Epifluorescence image of thalamus region in brain slice with EGFP (green) and tdTomato (red) channels merged. Gain settings in the red channel were increased intentionally, to ensure detection of any “leaky” cells. B, Cellular EGFP expression marks successful viral injection. C, No cellular expression of tdTomato without Cre. Note diffuse red signal from background fluorescence.

      Reviewer #3 (Public Review):

      1) The novelty of this paper lies in the mapping of projections from the interposed and the lateral nuclei of the cerebellum, as the authors themselves mention. However, in some of the experiments the medial nucleus is also clearly injected (Fig. 4B and 6B). In those experiments, it is impossible to distinguish which nucleus these projections come from, and they could be the ones from the medial nucleus that were previously described (see above).

      We thank the Reviewer for their comments. As stated in the Results and in the legend of Fig. 4, in addition to experiments with injections in all DCN (Fig. 4B-D), we also performed experiments with injections in only the lateral nucleus (Fig. 4E and F). The results of these experiments support the existence of lateral DCN projections that overlap with NAc-projecting neurons in VTA and thalamus. This finding is further corroborated by our transsynaptic experiments with lateral DCN-only injections (Fig. 5E,F). Regarding the optophysiological experiments, as mentioned in the Results, all DCN were injected (Fig. 6B) in order to maximize ChR2 expression and the chances of successful stimulation of projections.

      2) A strength of the paper is the use of both electrical and optogenetic stimulation. However, the responses to the two in the NAc are very different - electrical stimulation results in both excitation and inhibition, whereas opto stimulation mostly results in only excitation.

      We thank the Reviewer for this comment. At least two different explanations could account for the observed differences in the prevalence of inhibitory responses elicited by optogenetic vs electrical stimulation: i) inhibitory response prevalence is sensitive to stimulation intensity (Table 1 and Fig. 2B). Because of inherent differences between optogenetic and electrical stimulation, it is not possible to directly compare intensities between the two modalities in order to determine at which intensities, if at all, the prevalence of responses should match. The observation that, at least in the VTA, the prevalence of inhibitory responses elicited by 1 mW light intensity (the lowest intensity that we tested) was comparable to the prevalence of inhibitory responses elicited by 100 µA electrical stimulation is in line with this explanation; ii) DCN electrical stimulation is not node-specific. To our knowledge, there is currently no evidence to support mesoscale topographic organization in lateral and interposed DCN that is node-specific. Consequently, electrical stimulation of DCN could, in principle, result in NAc responses through various polysynaptic pathways and/or nodes. This possibility would still exist even if electrical stimulation had limited spread of a few hundred microns (as in our experiments) and is at least partly supported by the observed long latencies of electrically-evoked inhibitory responses (NAcCore: 268 ± 25 ms (10th percentile: 42 ms), NAcMed: 259 ± 14 ms (10th percentile: 60 ms)). Our recognition of this intrinsic limitation of DCN electrical stimulation was the impetus behind the node-specific optogenetic experiments.

      3) The stimulation frequency at which the electrical stimulation in Fig 1 is done to identify responses in the NAc is 200 Hz for 25 ms. Is this physiological? In addition, responses in the NAc are measured for 500 ms after, which is a very long response time.

      Regarding stimulation frequency, DCN neurons readily reach firing rates between 100-200 Hz in vivo and ex vivo (e.g., Beekhof et al., 2021 doi.org/10.3390/cells10102686; Sarnaik & Raman, 2018 doi:10.7554/eLife.29546; Canto et al., 2016 doi:10.1371/journal.pone.0165887). Regarding the duration of the response window, at the outset of our experiments we were agnostic to the type of responses that our stimulation protocols would evoke in NAc. We therefore established a response time window that would allow us to capture both fast neurotransmitter-mediated responses as well as neuromodulatory (e.g., dopaminergic) responses, which are known to occur at hundred-millisecond latencies or longer (Wang et al., 2017 doi.org/10.1016/j.celrep.2017.02.062; Chuhma et al., 2014 doi:10.1016/j.neuron.2013.12.027; Gonon, 1997). A posteriori analysis indicated that even if we reduced the response time window by 50%, the ratio of DCN-evoked excitatory vs inhibitory responses in NAc would not change substantially (E/I500: 4.3 vs E/I250: 5).

      4) Previous studies have described how different cell types within the DCN have different downstream projections (Fujita et al. 2020). However, the experiments here bundle together all this known heterogeneity.

      We agree with the Reviewer that dissecting the contributions of specific DCN cell types to NAc circuits is an important next step, which we are excited to undertake in future studies. Here we have broken new ground by identifying for the first time nodes and functional properties of DCN-NAc connectivity. Performing these studies with DCN cell type-specificity was not justified or feasible, given that the molecular identity of participating DCN neurons is currently unknown.

      5) Previous studies have also highlighted the importance of different cell types within the NAc and how input streams are differentially targeted to them. Here, that heterogeneity is also obscured.

      Along the same lines as #4, we agree with the Reviewer about the importance of examining the cellular identity of NAc neurons that participate in DCN-NAc circuitry. We are excited to undertake these examinations using ex vivo approaches, which are well suited to dissect cellular and molecular mechanisms.

      6) In Fig. 4C, E and F, the experiments on overlap between anterograde and retrograde tracers are not particularly convincing - it's hard to see the overlap.

      We thank the reviewer for this comment and have included revised figure panels 4C5, E3, Author response image 1 and Figure 2 below. Single optical sections with altered color scheme and orthogonal confocal views are presented in order to show the overlap between DCN projections and NAc-projecting nodal neurons more clearly. The findings of these imaging experiments are corroborated by the results of our transsynaptic labeling approach (Fig. 5), which we have validated elsewhere (Jung et al., 2022 doi:10.3389/fnsys.2022.879634; and Author response image 1).

      Author response image 2.

      (related to Fig. 4 of manuscript). Co-localization of NAc-projecting neurons with DCN axonal projections in VTA and thalamus. A-D, Single optical sections and orthogonal views are shown. Green: EGFP-expressing DCN axons; white: ctb-Alexa 568; red: anti-TH (A-B; VTA) or NeuN (C-D; thalamus). White arrows in orthogonal views point to regions of overlap.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Cho et al. present a comprehensive and multidimensional analysis of glutamine metabolism in the regulation of B cell differentiation and function during immune responses. They further demonstrate how glutamine metabolism interacts with glucose uptake and utilization to modulate key intracellular processes. The manuscript is clearly written, and the experimental approaches are informative and well-executed. The authors provide a detailed mechanistic understanding through the use of both in vivo and in vitro models. The conclusions are well supported by the data, and the findings are novel and impactful. I have only a few, mostly minor, concerns related to data presentation and the rationale for certain experimental choices.

      Detailed Comments:

      (1) In Figure 1b, it is unclear whether total B cells or follicular B cells were used in the assay. Additionally, the in vitro class-switch recombination and plasma cell differentiation experiments were conducted without BCR stimulation, which makes the system appear overly artificial and limits physiological relevance. Although the effects of glutamine concentration on the measured parameters are evident, the results cannot be confidently interpreted as true plasma cell generation or IgG1 class switching under these conditions. The authors should moderate these claims or provide stronger justification for the chosen differentiation strategy. Incorporating a parallel assay with anti-BCR stimulation would improve the rigor and interpretability of these findings. 

      We will edit the manuscript to be more explicit that total splenic B cells were used in this set-up figure and the rest of the paper. In addition, we will try to perform new experiments to improve this "set-up figure" (and add old and new data for Supplemental Figure presentation). Specifically, we will increase the range of conditions tested - e.g., styles of stimulating proliferation and differentiation - to foster an increased sense of generality. We plan to compare mitogenic stimulation with anti-CD40 to anti-IgM and to anti-IgM + anti-CD40, all with BAFF, IL-4, and IL-5, bearing in mind excellent work from Aiba et al, Immunity 2006; 24: 259-268, and similar papers. We also will try to present some representative flow cytometric profiles (presumably in new Supplemental Figure panels).

      To be transparent and add to a more open public discussion (using the virtues of this forum), the senior author and colleagues would caution about whether any in vitro conditions exist that warrant complete confidence. That is the reason for proceeding to immunization experiments in vivo. That is not said to cast doubt on our own in vitro data - there are some experiments (such as those of Fig. 1a-c and associated Supplemental Fig. 1) that only can be done in vitro or are better done that way (e.g., because of rapid uptake of early apoptotic B cells in vivo).

      For instance: Well-respected papers use the CD40LB and NB21.2D9 systems to activate B cells and generate plasma cells. Those appear to be BCR-independent and unfortunately, we found that they cannot be used with amino acid deprivation or these inhibitors due to effects on the engineered stroma-like cells. In considering BCR engagement, Reth has published salient points about signaling and concentrations of the Ab, the upshot being that this means of activating mitogenesis and plasma cell differentiation (when the B cells are costimulated via CD40 or TLR (4 or 7/8)) is probably more than a bit artificial. Moreover, although Aiba et al, Immunity 2006; 24: 259-268 is a laudable exception, one rarely finds papers using BAFF in the cultures, despite the strong evidence that it is an essential part of the equation of B cell regulation in vivo and a cytokine that modulates BCR signaling.

      (2) In Figure 1c, the DMK alone condition is not presented. This hinders readers' ability to properly assess the glutaminolysis dependency of the cells for the measured readouts. Also, CD138+ in developing PCs goes hand in hand with decreased B220 expression. A representative FACS plot showing the gating strategy for the in vitro PCs should be added as a supplementary figure. Similarly, division number (going all the way to #7) may be tricky to gate and interpret. A representative FACS plot showing the separation of B cells according to their division numbers and a subsequent gating of CD138 or IgG1 in these gates would be ideal for demonstrating the authors' ability to distinguish these populations effectively.

      We agree that exact placement of divisions in deconvolution by FlowJo is more fraught than might be thought from presentations in many or most papers. For the revision, we will try to add one or several representative FACS plot(s) with old and new data to provide the gating on CTV fluorescence, bearing these points in mind when extending the experiments from ~7 years ago (Fig. 1b, c). With the representative examples of the old data pasted in here, we will aver, however, that using divisions 0-6 and ≥7 was reasonable.

      Ditto for DMK with normal glutamine. However, in the spirit of eLife transparency lacking in many other journals, this comparison is more fraught than the referee comment would make things seem. The concentration tolerated by cells is highly dependent on the medium and glutamine concentration, and perhaps on rates of glutaminolysis (due to its generation of ammonia). In practice, we find that DMK becomes more toxic to B cells unless glutamine is low or glutaminolysis is restricted. Thus, the concentration of DMK that is tolerated and used in Fig. 1b, c can become toxic to the B cells when using the higher levels of glutamine in typical culture media (2 mM or more) - at which point the "normal conditions + DMK" "control" involves the surviving cells in conditions with far greater cell death and less population expansion than the "low glutamine + DMK" condition. Overall, we appreciate the suggestion to show more DMK data and will work to do so for the earlier proliferation data (shown above) and the new experiments.

      Author response image 1.

       

      (3) A brief explanation should be provided for the exclusive use of IgG1 as the readout in class-switching assays, given that naïve B cells are capable of switching to multiple isotypes. Clarifying why IgG1 was preferentially selected would aid in the interpretation of the results.

      We will edit the text to be more explicit and harmonize in light of the referee's suggestion that we focus the presentation of serologic data on IgG1 in the immunization experiments.

      [IgG1 provides the strongest signal and hence better signal/noise both in vitro and with the alum-based immunizations that are avatars for the adjuvant used in the majority of protein-based vaccines for humans.]

      (4) The immunization experiments presented in Figures 1 and 2 are well designed, and the data are comprehensively presented. However, to prevent potential misinterpretation, it should be clarified that the observed differences between NP and OVA immunizations cannot be attributed solely to the chemical nature of the antigens - hapten versus protein. A more significant distinction lies in the route of administration (intraperitoneal vs. intranasal) and the resulting anatomical compartment of the immune response (systemic vs. lung-restricted). This context should be explicitly stated to avoid overinterpretation of the comparative findings.

      We agree with the referee and will edit the text accordingly. Certainly, the difference in how the anti-ova response is elicited compared to the anti-NP response in the same mice, or with a slightly different immunization regimen, might be another factor - or the major factor - that could contribute towards explaining why glutaminolysis was important after ovalbumin inhalations (used because emergence of anti-ova Ab / ASCs is suppressed by the NP hapten after NP-ova immunization) but not needed for the anti-NP response unless Slc2a1 or Mpc2 also was inactivated. Thank you for prompting the addition of this caveat.

      Nevertheless, it seems fair to note that in Figures 1 and 2, the ASCs and Ab are being analyzed for NP and ova in the same mice, albeit with the NP-specific components not being driven by the inhalations of ovalbumin. With that in mind, when one compares the IgG1 anti-NP ASC and Ab to those for IgG1 anti-ovalbumin (ASC in bone marrow; Ab), the ovalbumin-specific response was reduced whereas the anti-NP response was not.

      (5) NP immunization is known to be an inducer of an IgG1-dominant Th2-type immune response in mice. IgG2c is not a major player unless a nanoparticle delivery system is used. However, the authors arbitrarily included IgG2c in their assays in Figures 2 and 3. This may be confusing for the readers. The authors should either justify the IgG2c-mediated analyses or remove them from the main figures. (It can be added as supplemental information with proper justification). 

      We will rearrange the Figure panels to move the IgM and IgG2c data to Supplemental Figures.

      For purposes of public discourse, we note that the data of previous Figure 3(c, g) show a very strong NP-specific IgG2c response that seems to contradict the concept that IgG2c responses necessarily are weak in this setting. We also note the important role of IgG2c (mouse - IgG1 in humans) in controlling or clearing various pathogens as well as in autoimmunity. So from the standpoint of providing a better sense of generality to the loss-of-function effects, we continue to think that these measurements are quite important. That said, the main text has many figure panels and, as the reviewer notes, the class switching and in vitro ASC generation were done with IL-4 / IgG1-promoting conditions. If possible, we will try to assay in vitro class switching with IFN-γ rather than IL-4 but there may not be enough resources (time before lab closure; money).

      [As a collegial aside, we speculate that a greater or lesser IgG2c anti-NP response may arise due to different preparations of NP-carrier obtained from the vendor (Biosearch) having different amounts of TLR (e.g., TLR4) ligand. In any case, the points of presenting the IgG2c (and IgM) data were to push against the limiting boundaries of convention (which risks perpetuating a narrow view of potential outcomes) and make the breadth of results more apparent to readers.]

      (6) Similarly, in affinity maturation analyses, including IgM is somewhat uncommon. I do not see any point in showing high affinity (NP2/NP20) IgMs (Figure 3d), since that data probably does not mean much.

      As noted in the reply immediately preceding this one, we appreciate this suggestion from the reviewer and will move the IgM and IgG2c to Supplemental status.

      Nonetheless, in collegial discourse we disagree a bit with the referee in light of our data as well as of work that (to our minds) leads one to question why inclusion of affinity maturation of IgM is so uncommon - as the referee accurately notes. Of course a defect in the capacity to class-switch is highly deleterious in patients but that is not the same as concluding that recall IgM or its affinity is of little consequence.

      In some of the pioneering work back in the 1980s, Bothwell showed that NP-carrier immunization generated hybridomas producing IgM Ab with extensive SHM (~11% of the 18 lineages; ~1/3 of the IgM hybridomas) [PMID: 8487778]. IgM B cells appear to move into GC, and there is at least a reasonable published basis for the view that there are GC-derived IgM (unswitched) memory B cells (MBC) that would be more likely, upon recall activation, to differentiate into ASCs. As an example, albeit with the Jenkins lab anti-rPE response, Taylor, Pape, and Jenkins generated quantitative estimates of the numbers of Ag-specific IgM<sup>+</sup> vs switched MBC that were GC-derived (or not) [PMID: 22370719]. While they emphasized that ~90% of IgM<sup>+</sup> MBC appeared to be GC-independent, their data also indicated that ~1/2 of all GC-derived MBC were IgM<sup>+</sup> rather than switched (their Fig. 8, B vs C; also 8E, which includes alum-PE). And while we immensely respect the referee, we are perhaps less confident that IgM or high-affinity Ag-specific IgM doesn't mean that much, if only because of evidence that localized Ab compete for Ag and may thus influence selective processes [PMCID: PMC2747358; PMID: 15953185; PMID: 23420879; PMID: 27270306].

      (7) Following on my comment for the PC generation in Figure 1 (see above), in Figure 4, a strategy that relies solely on CD40L stimulation is performed. This is highly artificial for the PC generation and needs to be justified, or more physiologically relevant PC generation strategies involving anti-BCR, CD40L, and various cytokines should be shown. 

      In line with our response to point (1), we plan and will try to self-fund testing BCR-stimulated B cells (anti-CD40 to anti-IgM and to anti-IgM + anti-CD40, all with BAFF, IL-4, and IL-5).

      (8) The effects of CB839 and UK5099 on cell viability are not shown. Including viability data under these treatment conditions would be a valuable addition to the supplementary materials, as it would help readers more accurately interpret the functional outcomes observed in the study. 

      We will add to the supplemental figures to present data that provide cues as to relative viability / survival under the experimental conditions used. [FSC X SSC as well as 7AAD or Ghost dye panels; we also hope to generate new data that include further experiments scoring annexin V staining.]

      (9) It is not clear how the RNA seq analysis in Figure 4h was generated. The experimental strategy and the setup need to be better explained.

      The revised manuscript will include more information (at minimum in the Methods, Legend), and we apologize that in this and a few other instances sufficiency of detail was sacrificed on the altar of brevity.

      [Adding a brief synopsis for any reader before the final version of record, given the many months it will take to generate new data, thoroughly revise the manuscript, etc.:

      In three temporally and biologically independent experiments, cultures were harvested 3.5 days after splenic B cells were purified and cultured as in the experiments of Fig. 4a-e. Total cellular RNA prepared from the twelve samples (three replicates for each of four conditions - DMSO vehicle control, CB839, UK5099, and CB839 + UK5099) was analyzed by RNA-seq. The RNA-seq data were initially processed using the pipeline described in the Methods. For panels g & h of Fig. 4, DESeq2 was used to quantify and compare read counts in the three CB839 + UK5099 samples relative to the three independent vehicle controls and to identify all genes for which variances yielded P<0.05. In Fig. 4g, all such genes for which the difference was 'statistically significant' (i.e., P<0.05) were entered into the Immgen tool and thereby mapped to the B lineage subsets shown in the figure panels (i.e., g, h). In (g), these are displayed using one format, whereas (h) uses the 'heatmap' tool in MyGeneSet.]
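      As a purely illustrative aside, the gene-selection step in the synopsis above (keeping genes with P<0.05 in the CB839 + UK5099 vs vehicle comparison) might be sketched as follows. This is a minimal sketch, not the authors' pipeline: the `GeneResult` record, the `select_significant` helper, and the gene names and numbers are all hypothetical, and the input is assumed to be a table of per-gene results already exported from a differential-expression tool.

```python
from typing import List, NamedTuple


class GeneResult(NamedTuple):
    """One row of a differential-expression results table (hypothetical layout)."""
    gene: str
    log2_fold_change: float
    pvalue: float


def select_significant(results: List[GeneResult], alpha: float = 0.05) -> List[str]:
    """Return gene names with p-value strictly below alpha, ordered by p-value."""
    hits = [r for r in results if r.pvalue < alpha]
    return [r.gene for r in sorted(hits, key=lambda r: r.pvalue)]


# Made-up values for illustration only; these are not data from the study.
example = [
    GeneResult("GeneA", -1.8, 0.001),
    GeneResult("GeneB", 0.2, 0.40),
    GeneResult("GeneC", 1.1, 0.03),
    GeneResult("GeneD", -0.1, 0.05),  # exactly 0.05 is excluded by the strict cut-off
]

significant = select_significant(example)
print(significant)
```

      The resulting list of gene names is the kind of input that would then be pasted into a gene-set mapping tool; whether an unadjusted or adjusted P value is used for the cut-off is a choice this sketch does not settle.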

      Reviewer #2 (Public review): 

      Summary: 

      In this manuscript, the authors investigate the functional requirements for glutamine and glutaminolysis in antibody responses. The authors first demonstrate that the concentrations of glutamine in lymph nodes are substantially lower than in plasma, and that at these levels, glutamine is limiting for plasma cell differentiation in vitro. The authors go on to use genetic mouse models in which B cells are deficient in glutaminase 1 (Gls), the glucose transporter Slc2a1, and/or mitochondrial pyruvate carrier 2 (Mpc2) to test the importance of these pathways in vivo. 

      Interestingly, deficiency of Gls alone showed clear antibody defects when ovalbumin was used as the immunogen, but not the hapten NP. For the latter response, defects in antibody titers and affinity were observed only when both Gls and either Mpc2 or Slc2a1 were deleted. These latter findings form the basis of the synthetic auxotrophy conclusion. The authors go on to test these conclusions further using in vitro differentiations, Seahorse assays, pharmacological inhibitors, and targeted quantification of specific metabolites and amino acids. Finally, the authors document reduced STAT3 and STAT1 phosphorylation in response to IL-21 and interferon (both type 1 and 2), respectively, when both glutaminolysis and mitochondrial pyruvate metabolism are prevented. 

      Strengths:

      (1) The main strength of the manuscript is the overall breadth of experiments performed. Orthogonal experiments are performed using genetic models, pharmacological inhibitors, in vitro assays, and in vivo experiments to support the claims. Multiple antigens are used as test immunogens--this is particularly important given the differing results. 

      (2) B cell metabolism is an area of interest but understudied relative to other cell types in the immune system. 

      (3) The importance of metabolic flexibility and caution when interpreting negative results is made clear from this study.

      Weaknesses:

      (1) All of the in vivo studies were done in the context of boosters at 3 weeks and recall responses 1 week later. This makes specific results difficult to interpret. Primary responses, including germinal centers, are still ongoing at 3 weeks after the initial immunization. Thus, untangling what proportion of the defects are due to problems in the primary vs. memory response is difficult.

      (2) Along these lines, the defects shown in Figure 3h-i may not be due to the authors' interpretation that Gls and Mpc2 are required for efficient plasma cell differentiation from memory B cells. This interpretation would only be correct if the absence of Gls/Mpc2 leads to preferential recruitment of low-affinity memory B cells into secondary plasma cells. The more likely interpretation is that ongoing primary germinal centers are negatively impacted by Gls and Mpc2 deficiency, and this, in turn, leads to reduced affinities of serum antibodies.

      We provisionally plan to edit the wording of the conclusion to add a possibility we consider unlikely, and thereby to avoid concluding outright that MBCs bearing switched BCRs are affected once reactivated. We also will perform a new experiment to investigate, but unfortunately time before lab closure has been and remains our enemy both for performance and multiple replication of the work presented in Figure 3, panels h & i, and the related Supplemental Data (Supplemental Fig. 3a-j). Unfortunately, it will not be possible to do a memory experiment with recall immunization out at 8 weeks. Despite the grant funding running out and institutional belt-tightening, however, we'll try to perform a new head-to-head comparison of 4 wk post-immunization with and without the boost at three weeks.

      The intriguing concern (points 1 & 2) provides a springboard for consideration of generalizations and simplifications. Germinal center durability is not at all monolithic, and instead is quite variable**. The premise (cognitive bias, perhaps?) in the interpretation is that in our previous work we found few if any GC B cells - NP-APC-binding or otherwise - above the background (non-immunized controls) three weeks after immunization with NP-ovalbumin in alum. While recognizing that the immunizations in that study were not NP-carrier in alum, we note for the readers and referee that Fig. 1 of the Taylor, Pape, & Jenkins paper considered above [PMID: 22370719] reported 10-fold more Ag-specific MBCs than GC B cells at day 29 post-immunization (the point at which the boost / recall challenge was performed in our Figure 3h, i).

      Viewed from that perspective, the surmise of the comment is that a major contribution to the differences in both all-affinity and high-affinity anti-NP IgG1 shown in Fig. 3i derives from the immunization at 4 wk stimulating GC B cells we cannot find as opposed to memory B cells. However, it is true that in the literature (especially with the experimentally different approach of transferring BCR-transgenic / knock-in versions of an NP-biased BCR) there may be meaningful pools of IgG1 and IgG2c GC B cells. Alternatively, our current reagents for immunizations may have become better at maintaining GC than those in the past - which we will try to test.

      The issue and question also relate to rates of output of plasma cells or rises in the serum concentrations of class-switched Ab. To this point, our prior experiences agree with the long-published data of the Kurosaki lab in Figure 3c of the Aiba et al paper noted above (Immunity, 2006) (and other such time courses). Readers can note that the IgG1 anti-NP response (alum adjuvant, as in our work) hit its plateau at 2 wk, and did not increase further from 2 to 3 wk. In other words, GC are on the decline and Ab production has reached its plateau by the time of the 2nd immunization (Fig. 3h).

      Assuming we understand the comment and line of reasoning correctly, we also lean towards disagreeing with the statement "This interpretation would only be correct if the absence of Gls/Mpc2 leads to preferential recruitment of low-affinity memory B cells into secondary plasma cells." Our evidence shows that both low-affinity and high-affinity anti-NP Ab (IgG1) went down as a result of combined gene-inactivation after the peak primary response (Fig. 3i). Recent papers show that affinity maturation is attributable to greater proliferation of plasmablasts with high-affinity BCR. Accordingly, the findings with loss of GLS and MPC function are quite consistent with the interpretation that much of the response after the second immunization draws on MBC differentiation into plasmablasts and then plasma cells, where the proliferative advantage of high-affinity cells is blunted by the impaired metabolism. The provisional plan, however, is to note the alternative, if less likely, interpretation proposed by the review.

      ** In some contexts, of course, especially certain viral infections or vaccination with lipid nanoparticles carrying modified mRNA, germinal centers are far more persistent; also, in humans even the seasonal flu vaccine **

      (3) The gating strategies for germinal centers and memory B cells in Supplemental Figure 2 are problematic, especially given that these data are used to claim only modest and/or statistically insignificant differences in these populations when Gls and Mpc2 are ablated. Neither strategy shows distinct flow cytometric populations, and it does not seem that the quantification focuses on antigen-specific cells.

      We will enhance these aspects of the presentation, using old and hopefully new data, but note for readers that many other papers in the best journals show plots in which the separation of, say, GC-Tfh from overall Tfh is based on a cut-off within what essentially is a continuous spectrum of emission as adjusted or compensated by the cytometer (spectral or conventional).

      Perhaps incorrectly, we omitted presenting data that included the results with NP-APC-staining - in part because within the GC B cell gate the frequencies of NP-binding events (GCB cells) were similar in double-knockout samples and controls. In practice, that would mean that the metabolic requirement applied about equally to NP+ and the total population. We will try to rectify this point in the revision.

      (4) Along these lines, the conclusions in Figure 6a-d may need to be tempered if the analysis was done on polyclonal, rather than antigen-specific cells. Alum induces a heavily type 2-biased response and is not known to induce much of an interferon signature. The authors' observations might be explained by the inclusion of other ongoing GCs unrelated to the immunization. 

      We will make sure the text is clear that the in vitro experiments do not represent GC B cells and that the RNA-seq data were not an Ag (SRBC)-specific subset.

      We also will try to work in a schematic along with expanding the Legends to make it more readily clear that the RNA-seq data (and hence the GSEA) involved immunizations with SRBC (not the alum / NP system which - it may be noted - in these experiments actually generated a robust IgG2c (type 1-driven) response along with the type 2-enhanced IgG1 response).

      Reviewer #3 (Public review): 

      Summary: 

      In their manuscript, the authors investigate how glutaminolysis (GLS) and mitochondrial pyruvate import (MPC2) jointly shape B cell fate and the humoral immune response. Using inducible knockout systems and metabolic inhibitors, they uncover a "synthetic auxotrophy": When GLS activity/glutaminolysis is lost together with either GLUT1-mediated glucose uptake or MPC2, B cells fail to upregulate mitochondrial respiration, IL 21/STAT3 and IFN/STAT1 signaling is impaired, and the plasma cell output and antigen-specific antibody titers drop significantly. This work thus demonstrates the promotion of plasma cell differentiation and cytokine signaling through parallel activation of two metabolic pathways. The dataset is technically comprehensive and conceptually novel, but some aspects leave the in vivo and translational significance uncertain.

      Strengths:

      (1) Conceptual novelty: the study goes beyond single-enzyme deletions to reveal conditional metabolic vulnerabilities and fate-deciding mechanisms in B cells.

      (2) Mechanistic depth: the study uncovers a novel "metabolic bottleneck" that impairs mitochondrial respiration and elevates ROS, and directly ties these changes to cytokine-receptor signaling. This is both mechanistically compelling and potentially clinically relevant.

      (3) Breadth of models and methods: inducible genetics, pharmacology, metabolomics, seahorse assay, ELISpot/ELISA, RNA-seq, two immunization models.

      (4) Potential clinical angle: the synergy of CB839 with UK5099 and/or hydroxychloroquine hints at a druggable pathway targeting autoantibody-driven diseases.

      We agree and thank the referee for the positive comments and this succinct summary of what we view as contributions of the paper.

      Weaknesses: 

      (1) Physiological relevance of "synthetic auxotrophy"

      The manuscript demonstrates that GLS loss is only crippling when glucose influx or mitochondrial pyruvate import is concurrently reduced, which the authors name "synthetic auxotrophy". I think it would help readers to clarify the terminology more and add a concise definition of "synthetic auxotrophy" versus "synthetic lethality" early in the manuscript and justify its relevance for B cells.

      We will edit the Abstract, Introduction, and Discussion to try to do better on this score. Conscious of how expansive the prose and data are even in the original submission, we appear to have taken some shortcuts that we will try to rectify. Thank you for highlighting this need to improve on a key concept!

      That said, we punctiliously & perhaps pedantically encourage readers to be completely accurate, in that under one condition of immunization GLS loss substantially reduced the anti-ovalbumin response (Fig. 1, Fig. 2a-c). And for this provisional response, we will expand a bit on the notion that synthetic auxotrophy represents effects on differentiation that appear to go beyond and not simply to be selective death, even though decreased population expansion is observed and one cannot exclude some contribution of enhanced death in vivo. Finally, we will note that this comment of the review raises interesting semantic questions about what represents "physiological relevance" but leave it at that.

      While the overall findings, especially the subset specificity and the clinical implications, are generally interesting, the "synthetic auxotrophy" condition feels a little engineered.

      One can readily say that CAR-T cells are 'a little engineered' so it is a matter of balancing this perspective of the referee against the strengths they highlight in points 1, 2, and 4. In any case, we will probably try to expand and be more explicit in the Discussion of the revised manuscript.

      In brief, even were the money not all gone, we would not believe that expanding the heft of this already rather large manuscript and set of data would be appropriate. As matters stand, a basic new insight about metabolic flexibility and its limits leads to evidence of a way to reduce generation of Ab and a novel impairment of STAT transcription factor induction by several cytokine receptors. The vulnerability that could be tested in later work on B cell-dependent autoimmunity includes the capacity to test a compound that already has been to or through FDA phase II in patients together with an FDA-approved standard-of-care agent.

      Put a different way, the point is that a basic curiosity to understand why decreasing glucose influx did not have an even more profound effect than what was observed, combined with curiosity as to why glutaminolysis was dispensable in relatively standard vaccine-like models of immunize / boost, provided a springboard to identification of new vulnerabilities. As above, we appreciate being made aware that this point merits being made more explicit in the Discussion of the edited version.

      Therefore, the findings strongly raise the question of the likelihood of such a "double hit" in vivo and whether there are conditions, disease states, or drug regimens that would realistically generate such a "bottleneck".

      Hence, the authors should document or at least discuss whether GC or inflamed niches naturally show simultaneous downregulation/lack of glutamine and/or pyruvate. The authors should also aim to provide evidence that infections (e.g., influenza), hypoxia, treatments (e.g., rapamycin), or inflammatory diseases like lupus co-limit these pathways. 

      Again, we appreciate some 'licensing' to be more expansive and explicit, and will try to balance editing in such points against undue tedium or tendentiously speculative length in the Discussion. In particular, we will note that a clear, simple implication of the work is to highlight an imperative to test CB839 in lupus patients already on hydroxychloroquine as standard-of-care, and to suggest development of UK5099 (already tested many times in mouse models of cancer) to complement glutaminase inhibition. 

      As backdrop, we note that the failure to advance imaging mass spectrometry to the capacity to quantify relative or absolute (via nano-DESI) concentrations of nutrients in localized interstitia is a critical gap in the entire field. Techniques that sample the interstitial fluid of tumor masses or in our case LN as a work-around have yielded evidence that there can be meaningful limitations of glucose and glutamine, but it needs to be acknowledged that such findings may be very model-specific and, as can be the case with cutting-edge science, are not without controversy. That said, yes, we had found that hypoxia reduced glutamine uptake but given the norms of focused, tidy packages only reported on leucine in an earlier paper [PMID: 27501247; PMCID: PMC5161594].

      It would hence also be beneficial to test the CB839 + UK5099/HCQ combinations in a short, proof-of-concept treatment in vivo, e.g., shortly before and after the booster immunization or in an autoimmune model. Likewise, it may also be insightful to discuss potential effects of existing treatments (especially CB839, HCQ) on human memory B cell or PC pools.

      We certainly agree that the suggestions offered in this comment are important next steps and the right approach to test if the findings reported here translate toward the treatment of autoimmune diseases that involve B cells, interferons, and pathophysiology mediated by auto-Ab. As practical points, performance and replication of such studies would take more time than the year allotted for return of a revised manuscript to eLife and in any case neither funds nor a lab remain to do these important studies. 

      Concrete evidence for our concurrence was embodied in a grant application to NIH that was essential for keeping a lab and doing any such studies. [We note, as a suggestion to others, that an essential component of such studies would be to test the effects of these compounds on B cells from patients and mice with autoimmunity]. Perhaps unfortunately for SLE patients, the review panelists did not agree about the importance of such studies. However, it can be hoped that the patent-holder of CB839 (and perhaps other companies developing glutaminase inhibitors) will see this peer-reviewed pre-print and the public dialogue, and recognize how positive results might open a valuable contribution to mitigation of diseases such as SLE.

      (2) Cell survival versus differentiation phenotype

      Claims that the phenotypes (e.g., reduced PC numbers) are "independent of death" and are not merely the result of artificial cell stress would benefit from Annexin-V/active-caspase 3 analyses of GC B cells and plasmablasts. Please also show viability curves for inhibitor-treated cells.

      This comment leads us to see that the wording on this point may have been overly terse in the interests of brevity, and thereby open to some misunderstanding. Accordingly, we will expand out the text of the Abstract and elsewhere in the manuscript, to be more clear. In addition, we will add in some data on the point, hopefully including some results of new experiments.

      To clarify in this public context, it is not that an increase in death (along with the reported decrease in cell cycling) can be or is excluded - and in fact it likely exists in vitro. The point is that beyond any such increase, and taking into account division number (since there is evidence that PC differentiation and output numbers involve a 'division-counting' mechanism), the frequencies of CD138+ cells and of ASCs among the viable cells are lower, as is the level of Prdm1-encoded mRNA even before the big increase in CD138+ cells in the population. 

      (3) Subset specificity of the metabolic phenotype

      Could the metabolic differences, mitochondrial ROS, and membrane-potential changes shown for activated pan-B cells (Figure 5) also be demonstrated ex vivo for KO mouse-derived GC B cells and plasma cells? This would also be insightful to investigate following NP-immunization (e.g., NP+ GC B cells 10 days after NP-OVA immunization).

      We agree that such data could be nice and add to the comprehensiveness of the work. We will try to scrounge the resources (time; money; human) to test this roughly as indicated. That said, we would note that the frequencies and hence numbers of NP+ GC B cells are so low that even in the flow cytometer we suspect there will not be enough "events" to rely on the results with DCFDA in the tiny sub-sub-subset. It also bears noting that reliable flow cytometric identification of the small NP-specific plasmablast/plasma cell subset amidst the overall population, little of which arose from immunization or after deletion of the floxed segments in B cells, would potentially be misleading.

      (4) Memory B cell gating strategy

      I am not fully convinced that the memory-B-cell gate in Supplementary Figure 2d is appropriate. The legend implies the population is defined simply as CD19+GL7-CD38+ (or CD19+CD38++?), with no further restriction to NP-binding cells. Such a gate could also capture naïve or recently activated B cells. From the descriptions in the figure and the figure legend, it is hard to verify that the events plotted truly represent memory B cells. Please clarify the full gating hierarchy and, ideally, restrict the MBC gate to NP+CD19+GL7-CD38+ B cells (or add additional markers such as CD80 and CD273). Generally, the manuscript would benefit from a more transparent presentation of gating strategies.

      We will further expand the supplemental data displays to include more of the gating and analytic scheme, and hope to be able to have performed new experiments and analyses (including additional markers) that could mitigate the concern noted here. In addition, we will include flow data from the non-immunized control mice that had been analyzed concurrently in the experiments illustrated in this Figure.

      Although it should be noted that the labeling indicated that the gating included the important criterion that cells be IgD- (Supplemental Fig. 2b), which excludes the vast majority of naive B cells, in principle marginal zone (MZ) B cells might fall within this gate. However, the MZ B population is unlikely to explain the differences shown in Supplemental Fig. 2b-d.

      (5) Deletion efficiency - [The] mRNA data show residual GLS/MPC2 transcripts (Supplementary Figure 8). Please quantify deletion efficiency in GC B cells and plasmablasts.

      Even were there resources to do this, the degree of reduction in target mRNA (Gls; Mpc2) renders this question superfluous.

      Are there likely to be some cells with only one, or even neither, allele converted from fl to Δ? Yes, but they would be a minor subset in light of the magnitude of mRNA reduction, in contrast to our published observations with Slc2a1. As to plasmablasts and plasma cells, the pre-existing populations make such an analysis misleading, while the number of such cells recoverable with antigen capture techniques is so low as to make both RNA and genomic DNA analyses questionable.

    1. Author Response

      Reviewer #1 (Public Review):

      Comment 1:

      The pharmacological tools used in this study are highly non-selective. Gd3+, used here to block NALCN, is actually more commonly used to block TRP channels. 2-APB inhibits not only TRPC channels, but also TRPM and IP3 receptors while stimulating TRPV channels (Bon and Beech, 2013), while FFA actually stimulates TRPC6 channels while inhibiting other TRPCs (Foster et al., 2009).

      We agree with the reviewer that the substances mentioned are not specific. Although we performed shRNA experiments against NALCN and TRPC6, we do plan to use more specific pharmacological modulators for these two channels; for this, L703,606 (an antagonist of NALCN) [1] and larixyl acetate (a potent TRPC6 inhibitor) [2] will be used. In fact, we have completed the experiments using larixyl acetate, and the results are shown in Author response image 1.

      Author response image 1.

      Example time-course (A), traces (B) and the summarized data (C) for the effect of larixyl acetate (LA), an antagonist of the TRPC6 channel, on the spontaneous firing activity of VTA DA neurons. Paired-sample t test, ** P < 0.01. n is the number of neurons recorded and N is the number of mice used.

      Comment 2:

      The multimodal approach including shRNA knockdown experiments alleviates much of the concern about the non-specific pharmacological agents. Therefore, the authors' claim that NALCN is involved in VTA dopaminergic neuron pacemaking is well-supported.

      However, the claim that TRPC6 is the key TRPC channel in VTA spontaneous firing is somewhat, but not completely supported. As with NALCN above, the pharmacology alone is much too non-specific to support the claim that TRPC6 is the TRP channel responsible for pacemaking. However, unlike the NALCN condition, there is an issue with interpreting the shRNA knockdown experiments. The issue is that TRPC channels often form heteromers with TRPC channels of other types (Goel, Sinkins and Schilling, 2002; Strübing et al., 2003). Therefore, it is possible that knocking down TRPC6 is interfering with the normal function of another TRPC channel, such as TRPC7 or TRPC4.

      In accordance with your advice, we plan to perform single-cell qPCR experiments to check the expression level of other TRPC channels after selective knockdown of TRPC6 in VTA DAT+ neurons; the results will be shown in the revised version. In our single-cell RNA-seq results, TRPC7 and TRPC4 were not found to be as broadly present as TRPC6 in the VTA DA neurons; it is therefore possible that knocking down TRPC6 does not interfere with the normal function of another TRPC channel, such as TRPC7 or TRPC4.

      Comment 3:

      The claim that TRPC6 channels in the VTA are involved in the depressive-like symptoms of CMUS is supported.

      However, the connection between the mPFC-projecting VTA neurons, TRPC6 channels, and the chronic unpredictable stress model (CMUS) of depression is not well supported. In Figure 2, it appears that the mPFC-projecting VTA neurons have very low TRPC6 expression compared to VTA neurons projecting to other targets. However, in figure 6, the authors focus on the mPFC-projecting neurons in their CMUS model and show that it is these neurons that are no longer sensitive to pharmacological agents non-specifically blocking TRPC channels (2-APB, see above comment). Finally, in figure 7, the authors show that shRNA knockdown of TRPC6 channels (in all VTA dopaminergic neurons) results in depressive-like symptoms in CMUS mice. Due to the low expression of TRPC6 in mPFC-projecting VTA neurons, the authors' claim of "broad and strong expression of TRPC6 channels across VTA DA neurons" is not fully supported. Because of the messy pharmacological tools used, it cannot be claimed that TRPC6 in the mPFC-projecting VTA neurons is altered after CMUS. And because the knockdown experiments are not specific to mPFC-projecting VTA neurons, it cannot be claimed that reducing TRPC6 in these specific neurons is causing depressive symptoms.

      The reason we focused on the mPFC-projecting VTA DA neurons is that this pathway is implicated in the depressive-like behaviors of the CMUS model [3-5]. Although mPFC-projecting VTA DA neurons seem to have a lower level of TRPC6, we reason that the channels are still functional there. However, we do agree with the reviewer that the statement "broad and strong expression of TRPC6 channels across VTA DA neurons" is not fully supported. We have changed the statements based on the reviewer's suggestion. Furthermore, we also plan to selectively knock down TRPC6 in the mPFC-projecting VTA DA neurons, and then study the behavior.

      Comment 4:

      It is important to note that the experiments presented in Figure 1 have all been previously performed in VTA dopaminergic neurons (Khaliq and Bean, 2010) including showing that low calcium increases VTA neuron spontaneous firing frequency and that replacement of sodium with NMDG hyperpolarizes the membrane potential.

      We agree with the reviewer that similar experiments have been performed previously [6]; we include them for the flow of our manuscript and for general readers.

      Comment 5:

      The authors' explanation for the increase in firing frequency in 0 calcium conditions is that calcium-activated potassium channels would no longer be activated. However, there is a highly relevant finding that low calcium enhances the NALCN conductance through the calcium sensing receptor from Dejian Ren's lab (Lu et al., 2010) which is not cited in this paper. This increase in NALCN conductance with low calcium has been shown in SNc dopaminergic neurons (Philippart and Khaliq, 2018), and is likely a factor contributing to the low-calcium-mediated increase in spontaneous VTA neuron firing.

      We agree with the reviewer and are grateful for the suggestion. A discussion of this point has been added.

      Comment 6:

      One of the only demonstrations of the expression and physiological significance of TRPCs in VTA DA neurons was published by Rasmus et al. (2011) and Klipec et al. (2016), which are not cited in this paper. In their study, TRPC4 expression was detected in a uniformly distributed subset of VTA DA neurons, and TRPC4 KO rats showed decreased VTA DA neuron tonic firing and deficits in cocaine reward and social behaviors.

      We thank the reviewer for the suggestion. The references and a discussion of this point have been added.

      Comment 7:

      Out of all seven TRPCs, TRPC5 is the only one reported to have basal/constitutive activity in heterologous expression systems (Schaefer et al., 2000; Jeon et al., 2012). Other TRPCs such as TRPC6 are typically activated by Gq-coupled GPCRs. Why would TRPC6 be spontaneously/constitutively active in VTA DA neurons?

      In the complex neuronal environment where VTA DA neurons are located, multiple modulatory factors, including GPCRs, could be dynamically active; this could lead to the activation of TRP channels, including TRPC6.

      Comment 8:

      A new paper from the group of Myoung Kyu Park (Hahn et al., 2023) shows in great detail the interactions between NALCN and TRPC3 channels in pacemaking of SNc DA neurons.

      The reference mentioned has been added. We thank the reviewer.

      Reviewer #2 (Public Review):

      Comment 1:

      These results do not show that TRPC6 mediates stress effects on depression-like behavior. As stated by the authors in the first sentence of the final paragraph, "downregulation of TRPC6 proteins was correlated with reduced firing activity of the VTA DA neurons, the depression-like behaviors, and that knocking down of TRPC6 in the VTA DA neurons confer the mice with depression behaviors." Therefore, the results show associations between TRPC6 downregulation and stress effects on behavior, occlusion of the effects of one by the other on some outcome measures, and cell manipulation effects that resemble stress effects. There is no experiment that shows reversal of stress effects with cell/circuit-specific TRPC6 manipulations. Please adjust the title, abstract and interpretation accordingly.

      We agree with the reviewer’s suggestion. The title was changed to “The cation channel mechanisms of subthreshold inward depolarizing currents in the VTA dopaminergic neurons and their roles in the chronic stress-induced depression-like behavior” and the abstract and interpretation were also adjusted accordingly.

      Comment 2:

      Statistical tests and results are unclear throughout. For all analyses, please report specific tests used, factors/groups, test statistic and p-value for all data analyses reported. In some cases, the chosen test is not appropriate. For example, in Figure 6E, it is not clear how an experiment with 2 factors (stress and drug) can be analyzed with a 1-way RM ANOVA. The potential impact of inappropriate statistical tests on results makes it difficult to assess the accuracy of data interpretation.

      We have redone the statistical analysis as suggested by the reviewer and added specific tests used, factors/groups, test statistic and p-value for all data analyses into the revised manuscript.
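      To illustrate the distinction raised in this comment, the sketch below partitions variance for a balanced design with two crossed factors (stress and drug) into two main effects and an interaction, which a one-way analysis cannot separate. It is a hedged illustration only: the numbers are placeholders rather than data from the study, the function name `two_way_anova` is hypothetical, and both factors are treated as between-subject. If the same neurons are recorded before and after drug application, a repeated-measures or mixed model would be needed instead. P-values are omitted because they require the F distribution, which is left to a statistics package.

```python
from statistics import mean

# Placeholder values (NOT data from the study), keyed by (stress, drug).
# The design is balanced: every cell has the same number of observations.
toy = {
    ("control", "vehicle"): [4.0, 6.0],
    ("control", "drug"): [1.0, 3.0],
    ("stress", "vehicle"): [2.0, 4.0],
    ("stress", "drug"): [2.0, 4.0],
}


def two_way_anova(cells):
    """Balanced two-way ANOVA; returns {effect: (SS, df, F)}."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # observations per cell
    grand = mean(x for xs in cells.values() for x in xs)

    def marginal(index, level):
        # Mean over all observations at one level of one factor.
        return mean(x for key, xs in cells.items() if key[index] == level for x in xs)

    ss_a = n * len(b_levels) * sum((marginal(0, a) - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((marginal(1, b) - grand) ** 2 for b in b_levels)
    ss_cells = n * sum((mean(xs) - grand) ** 2 for xs in cells.values())
    ss_ab = ss_cells - ss_a - ss_b  # interaction: what the main effects miss
    ss_within = sum((x - mean(xs)) ** 2 for xs in cells.values() for x in xs)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab = df_a * df_b
    df_within = len(a_levels) * len(b_levels) * (n - 1)
    ms_within = ss_within / df_within

    return {
        "factor_a": (ss_a, df_a, ss_a / df_a / ms_within),
        "factor_b": (ss_b, df_b, ss_b / df_b / ms_within),
        "interaction": (ss_ab, df_ab, ss_ab / df_ab / ms_within),
    }


for effect, (ss, df, f_stat) in two_way_anova(toy).items():
    print(effect, ss, df, f_stat)
```

      The interaction term is the quantity of interest when asking whether a drug effect differs between stressed and non-stressed animals.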

      Comment 3:

      Why were only male mice used? Please justify and discuss in the manuscript. Also, change the title to reflect this.

      Although most similar previous studies used male mice or rats [7, 8], we do agree with the reviewer that female animals should also be tested, in consideration of the possible role of sex hormones. We therefore plan to repeat some key experiments in female mice.

      Comment 4:

      Number of recorded cells is very low in Figure 1. Where in VTA did recordings occur? Given the heterogeneity in this brain region, this n may be insufficient. Additional information (e.g., location within VTA, criteria used to identify neurons) should be included. Report the number of mice (i.e., n = 6 cells from X mice) in all figures.

      Yes indeed, the number here is not high. More experiments will be performed to increase the N/n number. The location of recorded cells in VTA and the number of mice used are now shown in all figures; the criteria used to identify neurons are stated in the Methods (Identification of DA neurons and electrophysiological recordings). At the end of electrophysiological recordings, the recorded VTA neurons were collected for single-cell PCR. VTA DA neurons were identified by single-cell PCR for the presence of TH and DAT.

      Comment 5:

      Authors refer to VTA DA neurons as those that are DAT+ in line 276, although TH expression is considered the standard of DAergic identity, and studies (e.g., Lammel et al, 2008) have shown that a subset of VTA DA neurons have low levels of DAT expression. Authors should reword/clarify that these are DAT-expressing VTA DA neurons.

      The study published by Lammel [9] in 2015 showed the low dopamine specificity of transgene expression in the ventral midbrain of TH-Cre mice; on the other hand, DAT-Cre mice exhibit dopamine-specific Cre expression patterns, although DAT-Cre mice are likely to suffer from their own limitations (for example, low DAT expression in mesocortical DA neurons may make it difficult to target this subpopulation, see Lammel et al., 2008 [10]). Hence, in our study, DAT was used as the criterion to identify DA neurons. Of course, both TH and DAT were tested in single-cell PCR to identify whether the recorded cells were DA neurons.

      Comment 6:

      Neuronal subtype proportions should be quantified and reported (Fig. 1Aii).

      Neuronal subtype proportions are now quantified and reported in Fig. 1Aii.

      Comment 7:

      In addition to reporting projection specificity of neurons expressing specific channels, it would be ideal to report these data according to spatial location in VTA.

      The spatial locations of recorded cells in the VTA are now shown in all figures.

      Comment 8:

      The authors state that there are a small number of Glut neurons in VTA, then they state that a "significant proportion" of VTA neurons are glutamatergic.

      Thanks, “a significant proportion of neurons” has been changed to “less than half of sequenced DA neurons”.

      Comment 9:

      It is an overstatement that VTA DA neurons are the key determinant of abnormal behaviors in affective disorders.

      Thanks, we have amended the statement to “Dopaminergic (DA) neurons in the ventral tegmental area (VTA) play an important role in mood, reward and emotion-related behaviors”.

      Reviewer #3 (Public Review):

      Comment 1:

      The authors of this study have examined which cation channels specifically confer to ventral tegmental area dopaminergic neurons their autonomic (spontaneous) firing properties. Having brought evidence for the key role played by NALCN and TRPC6 channels therein, the authors aimed at measuring whether these channels play some role in so-called depression-like (but see below) behaviors triggered by chronic exposure to different stressors. Following evidence for a down-regulation of TRPC6 protein expression in ventral tegmental area dopaminergic cells of stressed animals, the authors provide evidence through viral expression protocols for a causal link between such a down-regulation and so-called depression-like behaviors. The main strength of this study lies on a comprehensive bottom-up approach ranging from patch-clamp recordings to behavioral tasks. However, the interpretation of the results gathered from these behavioral tasks might also be considered one main weakness of the abovementioned approach. Thus, the authors make a confusion (widely observed in numerous publications) with regard to the use of paradigms (forced swim test, tail suspension test) initially aimed (and hence validated) at detecting the antidepressant effects of drugs and which by no means provide clues on "depression" in their subjects. Indeed, in their hands, the authors report that stress elicits changes in these tests which are opposed to those theoretically seen after antidepressant medication. However, these results do not imply that these changes reflect "depression" but rather that the individuals under scrutiny simply show different responses from those seen in nonstressed animals. These limits are even more valid in nonstressed animals injected with TRPC6 shRNAs (how can 5-min tests be compared to a complex and chronic pathological state such as depression?). 
      With regard to anxiety, as investigated with the elevated plus-maze and the open field, the data, as reported, do not allow to check the author's interpretation as anxiety indices are either not correctly provided (e.g. absolute open arm data instead of percents of open arm visits without mention of closed arm behaviors) or subjected to possible biases (lack of distinction between central and peripheral components of the apparatus).

      We agree with the reviewer that it is debatable whether the behavioral tests we used represent a true depressive state; this is an open question that can be discussed from different perspectives. Since these tests (forced swimming and tail suspension) are, as the reviewer noted, “widely observed in numerous publications”, we used these seemingly only available options to reflect a “depression-like” state. One could argue that, since these tests were initially used (“validated”) for testing antidepressants, with decreased immobility time indicating an antidepressant effect, an increased immobility time could reasonably reflect a “depression-like” state. As for the anxiety tests, the absolute times in both the open and closed arms are now provided.
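      The percentage index that the reviewer refers to can be derived directly from the absolute open-arm and closed-arm values. The following is a minimal sketch, assuming times in seconds; the function name and the example values are hypothetical and are not data from the study.

```python
def percent_open_arm(open_value: float, closed_value: float) -> float:
    """Return the open-arm share (in percent) of total arm time or entries."""
    total = open_value + closed_value
    if total <= 0:
        raise ValueError("no arm time or entries recorded")
    return 100.0 * open_value / total


# Hypothetical animal: 60 s in the open arms, 180 s in the closed arms.
example = percent_open_arm(60.0, 180.0)
print(f"open-arm time: {example:.1f}% of total arm time")
```

      The same function applies to arm entries instead of time, since only the ratio of open to total arm activity matters.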

    1. Author response:

      Responses to Editors:

      We appreciate Reviewer 1’s first concern regarding the difficulty of disentangling the contributions of tightly-coupled brain regions to the speech-gesture integration process—particularly due to the close temporal and spatial proximity of the stimulation windows and the potential for prolonged disruption. We would like to provide clarification and evidence supporting the validity of our methodology.

      Our previous study (Zhao et al., 2021, J. Neurosci) employed the same experimental protocol—using inhibitory double-pulse transcranial magnetic stimulation (TMS) over the inferior frontal gyrus (IFG) and posterior middle temporal gyrus (pMTG) in one of eight 40-ms time windows. The findings from that study demonstrated a time-window-selective disruption of the semantic congruency effect (i.e., reaction time costs driven by semantic conflict), with no significant modulation of the gender congruency effect (i.e., reaction time costs due to gender conflict). This result establishes that double-pulse TMS provides sufficient temporal precision to independently target the left IFG and pMTG within these 40-ms windows during gesture-speech integration. Importantly, by comparing the distinctively inhibited time windows for IFG and pMTG, we offered clear evidence of distinct engagement and temporal dynamics between these regions during different stages of gesture-speech semantic processing.

      Furthermore, we reviewed prior studies utilizing double-pulse TMS on structurally and functionally connected brain regions to explore neural contributions across timescales as brief as 3–60 ms. These studies, which encompass areas from the tongue and lip areas of the primary motor cortex (M1) to high-level semantic regions such as the pMTG and ATL (Author response table 1), consistently demonstrate the methodological rigor and precision of double-pulse TMS in disentangling the neural dynamics of different regions within these short temporal windows.

      Author response table 1.

      Double-pulse TMS studies on brain regions over 3–60 ms time intervals

      Response to Reviewer #1:

      (1) For concern on the difficulty of disentangling the contributions of tightly-coupled brain regions to the speech-gesture integration process:

      We trust that the explanation provided above has clarified this issue.

      (2) For concern on the rationale for delivering HD-tDCS/TMS in set time windows for each region, as well as how these time windows were determined and how the current results compare to our previous studies from 2018 and 2023:

      The current study builds on a series of investigations that systematically examined the temporal and spatial dynamics of gesture-speech integration. In our earlier work (Zhao et al., 2018, J. Neurosci), we demonstrated that interrupting neural activity in the IFG or pMTG using TMS selectively disrupted the semantic congruency effect (reaction time costs due to semantic incongruence), without affecting the gender congruency effect (reaction time costs due to gender incongruence). These findings identified the IFG and pMTG as critical hubs for gesture-speech integration. This informed the brain regions selected for subsequent studies.

      In Zhao et al. (2021, J. Neurosci), we employed a double-pulse TMS protocol, delivering stimulation within one of eight 40-ms time windows, to further examine the temporal involvement of the IFG and pMTG. The results revealed time-window-selective disruptions of the semantic congruency effect, confirming the dynamic and temporally staged roles of these regions during gesture-speech integration.

      In Zhao et al. (2023, Frontiers in Psychology), we investigated the semantic predictive role of gestures relative to speech by comparing two experimental conditions: (1) gestures preceding speech by a fixed interval of 200 ms, and (2) gestures preceding speech at its semantic identification point. We observed time-window-selective disruptions of the semantic congruency effect in the IFG and pMTG only in the second condition, leading to the conclusion that gestures exert a semantic priming effect on co-occurring speech. These findings underscored the semantic advantage of gesture in facilitating speech integration, further refining our understanding of the temporal and functional interplay between these modalities.

      The design of the current study—including the choice of brain regions and time windows—was directly informed by these prior findings. Experiment 1 (HD-tDCS) targeted the entire gesture-speech integration process in the IFG and pMTG to assess whether neural activity in these regions, previously identified as integration hubs, is modulated by changes in informativeness from both modalities (i.e., entropy) and their interactions (mutual information, MI). The results revealed a gradual inhibition of neural activity in both areas as MI increased, evidenced by a negative correlation between MI and the tDCS inhibition effect in both regions. Building on this, Experiments 2 and 3 employed double-pulse TMS and event-related potentials (ERPs) to further assess whether the engaged neural activity was both time-sensitive and staged. These experiments also evaluated the contributions of various sources of information, revealing correlations between information-theoretic metrics and time-locked brain activity, providing insights into the ‘gradual’ nature of gesture-speech integration.
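      For readers less familiar with the information-theoretic metrics named above, the sketch below computes Shannon entropy and mutual information from categorical responses. It is an illustration only: it assumes that each item is described by a list of paired labels (one for gesture, one for speech), which is our simplification and not necessarily how the metrics were derived in the study. The function names and example labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Mutual information (in bits) of paired labels: H(X) + H(Y) - H(X, Y)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must be paired")
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical paired responses for one item (assumption, not study data).
gesture = ["hammer", "hammer", "knock", "hammer"]
speech = ["hammer", "hammer", "knock", "hit"]
print(entropy(gesture), entropy(speech), mutual_information(gesture, speech))
```

      Entropy grows as responses become more evenly spread across alternatives, whereas mutual information grows as the two sets of labels become more predictable from each other.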

      We acknowledge that the rationale for the design of the current study was not fully articulated in the original manuscript. In the revised version, we will provide a more comprehensive and coherent explanation of the logic behind the three experiments, ensuring clear alignment with our previous findings.

      (3) For concern about the use of Pearson correlation and the normality of EEG data.

      We appreciate the reviewer’s thoughtful consideration. In Figure 5 of the manuscript, we have already included normal distribution curves that illustrate the relationships between the average ERP amplitudes within each ROI or elicited clusters and the three information models. Additionally, multiple comparisons were addressed using FDR correction, as outlined in the manuscript.
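      As an illustration of FDR correction, the sketch below implements the Benjamini-Hochberg step-up procedure. This is an assumption on our part about which FDR variant is meant; the response does not name the procedure, and the function name, the alpha level, and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k (step-up rule).
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from four correlation tests (not study data).
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

      Because the rule is step-up, a p-value that misses its own rank threshold can still be rejected when a larger p-value meets a later threshold.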

      To further clarify the data, we will apply the Shapiro-Wilk test, a widely used method for assessing normality, to both the MI/entropy and averaged ERP data. The corresponding p-values will be provided in the follow-up point-by-point responses.

      (4) For concern about the ROI selection, and the suggestion of using whole-brain electrodes to build models of different variables (MI/entropy) to predict neural responses:

      For the EEG data, we conducted both a traditional region-of-interest (ROI) analysis, with ROIs defined based on a well-established work (Habets et al., 2011), and a cluster-based permutation approach, which utilizes data-driven permutations to enhance robustness and address multiple comparisons. The latter method complements the hypothesis-driven ROI analysis by offering an exploratory, unbiased perspective. Notably, the results from both approaches were consistent, reinforcing the reliability of our findings.
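      The logic of a cluster-based permutation test can be sketched in a deliberately simplified form. The example below works on one dimension only (time points), uses paired condition differences with sign flipping, and enumerates all sign patterns exhaustively, which is feasible only for small samples. It is not the pipeline used in the study; the cluster-forming threshold of 2.0, the function names, and the data are all hypothetical.

```python
import itertools
import math


def t_values(diffs):
    """One-sample t statistic at each time point (rows are subjects)."""
    n = len(diffs)
    ts = []
    for col in zip(*diffs):
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        # Simplification: a zero-variance column is given t = 0.
        ts.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return ts


def max_cluster_mass(ts, threshold):
    """Largest |sum of t| over runs of adjacent same-sign supra-threshold points."""
    best = mass = 0.0
    prev_sign = 0
    for t in ts:
        sign = (t > threshold) - (t < -threshold)
        if sign != 0 and sign == prev_sign:
            mass += t          # extend the current cluster
        else:
            mass = t if sign != 0 else 0.0  # start a new cluster or reset
        best = max(best, abs(mass))
        prev_sign = sign
    return best


def cluster_permutation_p(diffs, threshold=2.0):
    """Return (observed max cluster mass, exhaustive sign-flip p-value)."""
    observed = max_cluster_mass(t_values(diffs), threshold)
    count = total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        flipped = [[s * x for x in row] for s, row in zip(signs, diffs)]
        if max_cluster_mass(t_values(flipped), threshold) >= observed - 1e-9:
            count += 1
        total += 1
    return observed, count / total


# Hypothetical differences: 6 subjects x 5 time points, effect at points 1-2.
diffs = [
    [0.1, 1.0, 1.2, -0.2, 0.0],
    [-0.2, 1.1, 0.9, 0.1, 0.1],
    [0.0, 0.9, 1.1, 0.2, -0.1],
    [0.2, 1.2, 1.0, -0.1, 0.2],
    [-0.1, 0.8, 1.3, 0.0, -0.2],
    [0.1, 1.0, 0.8, -0.1, 0.1],
]
print(cluster_permutation_p(diffs))
```

      Comparing only the largest cluster mass against its permutation distribution is what controls for multiple comparisons across time points, since a single test statistic is evaluated per permutation.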

      To make the methods more accessible to a broader audience, we will provide a clear description of the methods used and how they relate to each other in the revised manuscript.

      Reference:

      Habets, B., Kita, S., Shao, Z.S., Ozyurek, A., and Hagoort, P. (2011). The Role of Synchrony and Ambiguity in Speech-Gesture Integration during Comprehension. J Cognitive Neurosci 23, 1845-1854. 10.1162/jocn.2010.21462

      (5) For concern about the median split of the data:

      To identify ERP components or spatiotemporal clusters that demonstrated significant semantic differences, we split each model into higher and lower halves, focusing on indexing information changes reflected by entropy or mutual information (MI). To illustrate the gradual activation process, the identified components and clusters were further analyzed for correlations with each information matrix. Remarkably, consistent results were observed between the ERP components and clusters, providing robust evidence that semantic information conveyed through gestures and speech significantly influenced the amplitude of these components or clusters. Moreover, the semantic information was shown to be highly sensitive, varying in tandem with these amplitude changes.
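      The median split described above can be sketched as follows. This is an illustration under stated assumptions: each item has one information score (entropy or MI) and one mean ERP amplitude, items exactly at the median are assigned to the lower half, and all names and values are hypothetical rather than taken from the study.

```python
from statistics import mean, median


def median_split(scores):
    """Label each item 'high' or 'low' relative to the median score."""
    cut = median(scores)
    # Assumption: ties at the median fall into the 'low' group.
    return ["high" if s > cut else "low" for s in scores]


def group_means(scores, amplitudes):
    """Mean amplitude for the lower and higher halves of the scores."""
    groups = {"low": [], "high": []}
    for label, amp in zip(median_split(scores), amplitudes):
        groups[label].append(amp)
    # Note: mean() raises StatisticsError if a group is empty (all scores tied).
    return {label: mean(values) for label, values in groups.items()}


# Hypothetical MI scores and ERP amplitudes in microvolts (not study data).
mi_scores = [0.1, 0.4, 0.2, 0.9]
erp_amplitudes = [-1.0, -3.0, -2.0, -4.0]
print(group_means(mi_scores, erp_amplitudes))
```

      A split of this kind only identifies where the halves differ; the item-wise correlation that follows it is what captures the graded relationship.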

      We acknowledge that the rationale behind this approach may not have been sufficiently clear in the initial manuscript. In our revision, we will ensure a more detailed and precise explanation to enhance the clarity and coherence of this logical framework.

      Response to Reviewer #2:

      We greatly appreciate Reviewer 2’s concern regarding whether "mutual information" adequately captures the interplay between the meanings of speech and gesture. We would like to clarify that the materials used in the present study involved gestures performed without actual objects, paired with verbs that precisely describe the corresponding actions. For example, a hammering gesture was paired with the verb “hammer”, and a cutting gesture was paired with the verb “cut”. In this design, all gestures conveyed redundant meaning relative to the co-occurring speech, creating significant overlap between the information derived from speech alone and that from gesture alone.

      We understand the reviewer’s concern about cases where gestures and speech may provide complementary rather than redundant information. To address this, we have developed an alternative metric for quantifying information gains contributed by supplementary multisensory cues, which will be explored in a subsequent study. However, for the present study, we believe that the observed overlap in information serves as an indicator of the degree of multisensory convergence, a central focus of our investigation.

      Regarding the reviewer’s concern about how the neural processes of speech-gesture integration may change with variations in the relative timing between speech and gesture stimuli, we would like to highlight findings from our previous study (Zhao et al., 2023, Frontiers in Psychology). In that study, we explored the semantic predictive role of gestures relative to speech under two conditions: (1) gestures preceding speech by a fixed interval of 200 ms, and (2) gestures preceding speech at its semantic identification point. Interestingly, only in the second condition did we observe time-window-selective disruptions of the semantic congruency effect in the IFG and pMTG. This led us to conclude that gestures play a semantic priming role for co-occurring speech. Building on this, we designed the present study with gestures preceding speech at its semantic identification point to reflect this semantic priming relationship. Additionally, ongoing research is exploring gesture and speech interactions in natural conversational settings to investigate whether the neural processes identified here are consistent across varying contexts.

      To prevent any similar concerns from causing doubt among the audience and to ensure clarity regarding the follow-up study, we will provide a detailed discussion of the two issues in the revised manuscript.

      Response to Reviewer #3:

      The primary aim of this study is to investigate whether the degree of activity in the established integration hubs, IFG and pMTG, is influenced by the information provided by gesture-speech modalities and/or their interactions. While we provided evidence for the differential involvement of the IFG and pMTG by delineating their dynamic engagement across distinct time windows of gesture-speech integration and associating these patterns with unisensory information and their interaction, we acknowledge that the mechanisms underlying these dynamics remain open to interpretation. Specifically, whether the observed effects stem from difficulties in semantic control processes, as suggested by Reviewer 3, or from resolving information uncertainty, as quantified by entropy, falls outside the scope of the current study. Importantly, we view these two interpretations as complementary rather than mutually exclusive, as both may be contributing factors. Nonetheless, we agree that addressing this question is a compelling avenue for future research. In the revised manuscript, we will include an exploratory analysis to investigate whether the confounding difficulty, stemming from the number of lexical or semantic representations, is limited to high-entropy items. Additionally, we will address and discuss alternative interpretations.

      Regarding the concern of conceptual equivocation, we would like to emphasize that this study represents the first attempt to focus on the relationship between information quantity and neural engagement. In our initial presentation, we inadvertently conflated the commonly used term "graded hub," which refers to anatomical distribution, with its usage in the present context. We sincerely apologize for this oversight and are grateful for the reviewer’s careful critique. In the revised manuscript, we will clearly articulate the study’s objectives, clarify the representations of entropy and mutual information, and accurately describe their association with neural engagement.

      Reference

      Teige, C., Mollo, G., Millman, R., Savill, N., Smallwood, J., Cornelissen, P. L., & Jefferies, E. (2018). Dynamic semantic cognition: Characterising coherent and controlled conceptual retrieval through time using magnetoencephalography and chronometric transcranial magnetic stimulation. Cortex, 103, 329-349.

      Amemiya, T., Beck, B., Walsh, V., Gomi, H., & Haggard, P. (2017). Visual area V5/hMT+ contributes to perception of tactile motion direction: a TMS study. Scientific reports, 7(1), 40937.

      Muessgens, D., Thirugnanasambandam, N., Shitara, H., Popa, T., & Hallett, M. (2016). Dissociable roles of preSMA in motor sequence chunking and hand switching—a TMS study. Journal of Neurophysiology, 116(6), 2637-2646.

      Vernet, M., Brem, A. K., Farzan, F., & Pascual-Leone, A. (2015). Synchronous and opposite roles of the parietal and prefrontal cortices in bistable perception: a double-coil TMS–EEG study. Cortex, 64, 78-88.

      Pitcher, D. (2014). Facial expression recognition takes longer in the posterior superior temporal sulcus than in the occipital face area. Journal of Neuroscience, 34(27), 9173-9177.

      Bardi, L., Kanai, R., Mapelli, D., & Walsh, V. (2012). TMS of the FEF interferes with spatial conflict. Journal of cognitive neuroscience, 24(6), 1305-1313.

      D’Ausilio, A., Bufalari, I., Salmas, P., & Fadiga, L. (2012). The role of the motor system in discriminating normal and degraded speech sounds. Cortex, 48(7), 882-887.

      Pitcher, D., Duchaine, B., Walsh, V., & Kanwisher, N. (2010). TMS evidence for feedforward and feedback mechanisms of face and body perception. Journal of Vision, 10(7), 671-671.

      Gagnon, G., Blanchet, S., Grondin, S., & Schneider, C. (2010). Paired-pulse transcranial magnetic stimulation over the dorsolateral prefrontal cortex interferes with episodic encoding and retrieval for both verbal and non-verbal materials. Brain Research, 1344, 148-158.

      Kalla, R., Muggleton, N. G., Juan, C. H., Cowey, A., & Walsh, V. (2008). The timing of the involvement of the frontal eye fields and posterior parietal cortex in visual search. Neuroreport, 19(10), 1067-1071.

      Pitcher, D., Garrido, L., Walsh, V., & Duchaine, B. C. (2008). Transcranial magnetic stimulation disrupts the perception and embodiment of facial expressions. Journal of Neuroscience, 28(36), 8929-8933.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This is an interesting study on the role of FGF signaling in the induction of primitive streak-like cells (PS-LC) in human 2D-gastruloids. The authors use a previously characterized standard culture that generates a ring of PS-LCs (TBXT+) and correlate this with pERK staining. A requirement for FGF signaling in TBXT induction is demonstrated via pharmacological inhibition of MEK and FGFR activity. A second set of culture conditions (with no exogenous FGFs) suggests that endogenous FGFs are required for pERK and TBXT induction. The authors then characterize, via scRNA-seq, various components of the FGF pathway (genes for ligands, receptors, ERK regulators, and HSPG regulation). They go on to characterize the pFGFR1, receptor isoforms, and polarized localization of this receptor. Finally, they perform FGF4 inhibition and use a cell line with a limited FGF17 inactivation (heterozygous null) and show that loss of these FGFs reduces PS-LC and derivative cell types.

      Strengths:

      (1) As the authors point out, the role of FGF signaling in gastrulation is less well understood than other signaling pathways. Hence this is a valuable contribution to that field.

      (2) The FGF4 and FGF17 loss-of-function experiments in Figure 5 are very intriguing. This is especially so given the intriguing observation that these FGFs appear to be dominating in this model of human gastrulation, in contrast to what FGFs dominate in mice, chicks, and frogs.

      (3) In general this paper is valuable as a further development of the human gastruloid system and the role of FGF signaling in the induction of PS-LCs. The wide net that the authors cast in characterizing the FGF ligand gene, receptor isoforms, and downstream components provides a foundation for future work. As the authors write near the beginning of the Discussion "Many questions remain."

      We thank the reviewer for these positive comments.

      Weaknesses:

      (1) FGFs are cell survival factors in various aspects of development. The authors fail to address cell death due to loss of FGF signaling in their experiments. For example, in Figure 1E (which requires statistical analysis) and 1G (the bottom FGFRi row), there appears to be a significant amount of cell loss. Is this due to cell death? The authors should address the question of whether the role of FGF/ERK signaling is to keep the cells alive.

      Indeed, FGF also strongly affects cell number, and it is an interesting question to what extent this depends on ERK. Our manuscript focuses instead on the role of FGF/ERK signaling in cell fate patterning. However, as mentioned in our discussion, Figure 1d,e shows that doxycycline-induced pERK leads to more TBXT+ cells than the control without restoring cell number, suggesting that the role of FGF in controlling cell number is independent of the requirement for FGF/ERK in PS-LC differentiation. Unpublished data below showing a MEK inhibitor dose response further support this: low doses of MEKi are sufficient to inhibit differentiation without affecting cell number. To address the reviewer’s question we will include these data in the revised manuscript and perform several additional experiments to determine in more detail how cell death and proliferation depend on FGF.

      Author response image 1.

      MEK affects differentiation and cell number at different doses. a-c) Control and MEKi (0.3 µM) treated colonies with similar cell number but different TBXT expression. d-f) Quantification of cell number per colony (d), percentage of TBXT-positive cells per colony (e), and the distribution of pERK intensities for different doses of MEK inhibitor (f). N>6 colonies per condition. MEKi = PD0325901. Scale bar = 50 µm.

      (2) Regarding the sparse cells in 1G, is there a reduction in cell number only with FGFRi and not MEKi? Is this reproducible? Gattiglio et al (Development, 2023, PMID: 37530863) present data supporting a "community effect" in the FGF-induced mesoderm differentiation of mouse embryonic stem cells. Could a community effect be at play in this human system (especially given the images in the bottom row of 1G)? If the authors don't address this experimentally they should at least address the ideas in Gattiglio et al.

      Indeed, FGFRi reproducibly affects cell number more than MEKi, in line with the fact that pathways downstream of FGF other than MAPK/ERK (e.g. PI3K) play important roles in cell survival and growth. We think the lack of differentiation with MEKi and FGFRi in Fig. 1g cannot be attributed to a loss of cells combined with a community effect. This is because, without FGFRi or MEKi, cells also differentiate to primitive streak at much lower densities than those shown, consistent with the data we show above in response to (1), which argue against a primarily indirect effect of FGF on PS-LC differentiation through cell density. In the context of directed differentiation (rather than 2D gastruloids), we will show this in a controlled manner by repeating the experiment in Fig. 1g while adjusting cell seeding densities to obtain similar final cell densities in all three conditions. We will also include Gattiglio et al. in our revised discussion.

      (3) Do the FGF4 and FGF17 LOF experiments in Figure 5 affect cell numbers like FGFRi in Figure 1?

      It seems the effect on cell number is small, but we will analyze this carefully and include it in the revised manuscript. A small effect would be consistent with our unpublished data below showing a near-uniform proliferation rate. This in turn suggests that low levels of pERK in the center are sufficient to maintain proliferation there, while the much higher pERK levels in the PS-LC ring (that we think depend on FGF4 and FGF17) do not significantly increase the proliferation rate (see Fig. 1 in the manuscript for the pERK pattern). Thus, loss of high pERK in the PS-LC ring while maintaining low pERK throughout would not be expected to have a major impact on cell number but would impact differentiation. In contrast, loss of all FGF signaling through FGFRi does dramatically affect cell number. This is again consistent with the data provided in response to (1) showing that ERK levels can be reduced to a point where PS-LC differentiation is lost without significantly affecting cell number. We will include the data below in the revised manuscript.

      Author response image 2.

      Why examine PS-LC induction only in FGF17 heterozygous cells and not homozygous FGF17 nulls?

      We were unable to obtain homozygous FGF17 nulls, and it is not clear whether there is a reason for this. We will try again and otherwise attempt to corroborate our findings with further knockdown data.

      (4) The idea that FGF8 plays a dominant role during gastrulation of other species but not humans is so intriguing it warrants deeper testing. The authors dismiss FGF8 because its mRNA "...levels always remained low." (line 363) as well as the data published in Zhai et al (PMID: 36517595) and Tyser et al (PMID: 34789876). But there are cases in mouse development where a gene was expressed at levels so low, that it might be dismissed, and yet LOF experiments revealed it played a role or even was required in a developmental process. The authors should consider FGF8 inhibition or inactivation to explore its potential role, despite its low levels of expression.

      We agree with the reviewer that FGF8 is worth investigating further and we will now pursue this.

      (5) Redundancy is a common feature in FGF genetics. What is the effect of inhibiting FGF4 in FGF17 LOF cells?

      We will attempt to do the experiment the reviewer suggests.

      (6) I suggest that the authors take more caution in describing FGF gradients. For example, in one Results heading they write "Endogenous FGF4 and FGF17 gradients underly the ERK activity pattern.", implying an FGF protein gradient. However, they only present data for FGF mRNA, not protein. This issue would be clarified if they used proper nomenclature for gene, mRNA (italics), and protein (no italics) throughout the paper.

      We will edit the paper to more clearly distinguish protein and mRNA.

      Reviewer #2 (Public review):

      Summary:

      The role of FGFs in embryonic development and stem cell differentiation has remained unclear due to its complexity. In this study, the authors utilized a 2D human stem cell-based gastrulation model to investigate the functions of FGFs. They discovered that FGF-dependent ERK activity is closely linked to the emergence of primitive streak cells. Importantly, this 2D model effectively illustrates the spatial distribution of key signaling effectors and receptors by correlating these markers with cell fate markers, such as T and ISL1. Through inhibition and loss-of-function studies, they further corroborated the needs of FGF ligands. Their data shows that FGFR1 is the primary receptor, and FGF2/4/17 are the key ligands for primitive streak development, which aligns with observations in primate embryos. Additional experiments revealed that the reduction of FGF4 and FGF17 decreases ERK activity.

      Strengths:

      This study provides comprehensive data and improves our understanding of the role of FGF signaling in primate primitive streak formation. The authors provide new insights related to the spatial localization of the key components of FGF signaling and attempt to reveal the temporal dynamics of the signal propagation and cell fate decision, which has been challenging.

      Weaknesses:

      Given the solid data, the work only partially clarifies the complex picture of FGF signaling, so details remain somewhat elusive. The findings lack a strong punchline, which may limit their broader impact.

      We thank this reviewer for their valuable feedback and the compliment on the solidity of our data. The punchline of our work is that FGF4- and FGF17-dependent ERK signaling plays a key role in human PS-LC differentiation, and that these are different FGFs than those thought to drive mouse gastrulation. A second key point is that, like BMP and TGFβ signaling, FGF signaling is restricted to the basolateral sides of pluripotent stem cell colonies due to polarized receptor expression, which is crucial for understanding the response to exogenous ligands added to the cell medium. Indeed, many facets of FGF signaling remain to be investigated in the future, such as how FGF regulates and is regulated by other signals, to which we will dedicate a separate manuscript.

      Reviewer #3 (Public review):

      Jo and colleagues set out to investigate the origins and functions of localized FGF/ERK signaling for the differentiation and spatial patterning of primitive streak fates of human embryonic stem cells in a well-established micropattern system. They demonstrate that endogenous FGF signaling is required for ERK activation in a ring-domain in the micropatterns, and that this localized signaling is directly required for differentiation and spatial patterning of specific cell types. Through high-resolution microscopy and transwell assays, they show that cells receive FGF signals through basally localized receptors. Finally, the authors find that there is a requirement for exogenous FGF2 to initiate primitive streak-like differentiation, but endogenous FGFs, especially FGF4 and FGF17, fully take over at later stages.

      Even though some of the authors' findings - such as the localized expression of FGF ligands during gastrulation and the importance of FGF/ERK signaling for cell differentiation in the primitive streak - have been reported in model organisms before, this is one of the first studies to investigate the role of FGF signaling during primitive streak-like differentiation of human cells. In doing so, the paper reports a number of interesting and valuable observations, namely the basal localization of FGF receptors which mirrors that of BMP and Nodal receptors, as well as the existence of a positive feedback loop centered on FGF signaling that drives primitive-streak differentiation. The authors also perform a comparison of the role of different FGFs across species and try to assign specific functions to individual FGFs. In the absence of clean genetic loss-of-function cell lines, this part of the work remains less strong.

      We thank the reviewer for emphasizing the value of our findings in a human model for gastrulation. We agree more loss-of-function experiments would provide further insight into the role of different FGFs, and we plan to provide additional data along these lines in the revised manuscript.

    1. Author Response

      We thank the reviewers and editorial team for the positive reaction to our paper and for the constructive recommendations and comments on our work. Here we provide a brief provisional response to key points that were identified. We will give a detailed point-by-point response with highlighted changes in our manuscript when we upload the revised version of our paper.

      Reviewer 1:

      Statistical evaluation of the null

      In Experiment 2, we inferred the existence of a null effect of image category on suppression depth based on frequentist statistics. At the reviewer’s suggestion we performed a statistical evaluation of the evidence in favour of the null effect using a Bayesian repeated measures ANOVA implemented in JASP. That analysis provides strong evidence for the null (BF01 = 20.38) and will be included in the final version of the paper.

      Likelihood of exceptional cases

      We acknowledge that our selection of categories is only a sampling of possible categories to which our novel tCFS method can be applied for deriving suppression depth. Other possibilities that come to mind include objects that emerge from specific configurations of simple 'tokens' such as dots (such as actions defined by biological motion (Watson et al., 2004)) or different shaped tokens configured to generate pareidolia faces (Zhou et al., 2021). We will expand on the possibility of these exceptional cases impacting bCFS and reCFS thresholds in the discussion of our revised manuscript.

      Reviewer 2:

      In response to the claim “the paper overreaches by claiming breakthrough thresholds are insufficient for drawing certain conclusions about subconscious processing.”

      We agree that breakthrough thresholds can provide useful information to draw conclusions about unconscious processing – as our procedure is predicated on breakthrough thresholds. Our key point is that breakthrough provides only half of the needed information and will amend our manuscript accordingly. In so doing, we will also shift our focus toward the influence of semantics and low-level factors, including discussion of the possibility that suppression depth and bCFS thresholds could be driven by statistically orthogonal factors.

      Reviewer 3:

      On the appropriateness of log-transformed contrast

      Our motivation to quantify suppression depth after log-transform to decibel scale was two-fold. First, we recognised that the traditional use of a linear contrast ramp in bCFS is at odds with the well-characterised profile of contrast discrimination thresholds which obey a power law (Legge, 1981) and the observations that neural contrast response functions show the same compressive non-linearity in many different cortical processing areas (e.g.: V1, V2, V3, V4, MT, MST, FST, TEO. See Ekstrom et al., 2009). Increasing contrast in linear steps could thus lead to a rapid saturation of the response function, which may account for the overshoot that has been reported in many canonical bCFS studies. For example, in Jiang et al. (2007), target contrast reached 100% after 1 second, yet average suppression times for faces and inverted faces were 1.36 and 1.76 seconds respectively. As contrast response functions in visual neurons saturate at high contrast, the upper levels of a linear contrast ramp have less and less effect on the target's strength. This approach to response asymptote may have exaggerated small differences between stimulus conditions and may have inflated some previously reported differences. In sum, the use of a log-transformed contrast ramp allows finer increments in contrast to be explored before saturation, a simple manipulation which we hope will be adopted by our field.

      Second, by quantifying suppression depth as a decibel change, we enable the comparison of suppression depth between experiments and laboratories, which inevitably differ in presentation environments. As a comparison, a reaction-time for bCFS of 1.36 s cannot easily be compared without access to near-identical stimulation and testing environments. In addition, once ramp contrast is log-transformed it effectively linearises the neural contrast response function. This means that different studies that use different contrast levels for masker or target can be directly compared because a given suppression depth (for example, 15 dB) is the same proportionate difference between bCFS and reCFS regardless of the contrasts used in the particular study.
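
      The proportionality argument can be illustrated with a minimal sketch. It assumes that contrast is expressed in decibels using the common 20·log10 amplitude convention, and the threshold contrasts below are hypothetical values chosen only for illustration, not data from our experiments.

```python
import math

def suppression_depth_db(bcfs_contrast, recfs_contrast):
    """Suppression depth in dB between the breakthrough (bCFS) and
    re-suppression (reCFS) contrast thresholds.
    Assumes the 20*log10 amplitude convention (an assumption of this sketch)."""
    return 20.0 * math.log10(bcfs_contrast / recfs_contrast)

# Contrast ratio implied by a 15 dB suppression depth
ratio_15db = 10 ** (15 / 20)

# Two hypothetical studies using very different absolute contrast ranges
study_a = suppression_depth_db(0.50, 0.50 / ratio_15db)
study_b = suppression_depth_db(0.05, 0.05 / ratio_15db)

print(f"15 dB corresponds to a contrast ratio of {ratio_15db:.2f}")
print(f"Study A depth: {study_a:.2f} dB, Study B depth: {study_b:.2f} dB")
```

      Under these assumptions both hypothetical studies yield the same depth, because the decibel measure depends only on the ratio of the two thresholds and not on their absolute values.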

      We also acknowledge that different stimulus categories may engage neural and visual processing associated with different contrast gain values (e.g., magno- vs parvo-mediated processing). But the breaks and returns to suppression of a given stimulus category would be dependent on the same contrast gain function appropriate for that stimulus which thus permits their direct comparison. Indeed, this is why our novel approach offers a promising technique for comparing suppression depth associated with various stimulus categories (a point mentioned above). Viewed in this way, differences in actual durations of break times (such as we report in our paper) may tell us more about differences in gain control within neural mechanisms responsible for processing of those categories.

      Consider that preferential processing could shift both bCFS and reCFS thresholds together

      This is related to the point raised in the previous comment. A stimulus that is preferentially processed (such as a face) could have lower bCFS and reCFS thresholds than other stimuli such that it emerges into awareness at a lower contrast but also remains visible at lower contrasts. We plan to address this interpretation of our data in our revised discussion and highlight that this type of preferential processing could well occur, and yet could still produce the same uniform suppression depth.

      Can the effect of contrast ramp be explained by slower RTs?

      A 500 ms reaction time estimate would not account for the magnitude of the changes observed in Experiment 3. Suppression depths in our slow, medium, and fast contrast ramps were 9.64 dB, 14.64 dB and 18.97 dB, respectively (produced by step sizes of .035, .07 and .105 dB per video frame at 60 fps). At each rate, assuming a 500 ms reaction time for both thresholds (1 second total, or 60 frames) would capture a change of 2.1 dB, 4.2 dB and 6.3 dB, respectively. This contribution cannot account for the size of the effects observed between our different ramp speeds.
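
      The arithmetic behind these figures can be reproduced with a short sketch. The step sizes, frame rate and suppression depths are the values quoted above; the variable names and the "remaining depth" subtraction are ours and serve only as an illustration.

```python
FRAME_RATE_HZ = 60
TOTAL_RT_S = 1.0  # assumed 500 ms reaction time at each of the two thresholds

# Contrast step per video frame (dB) and measured suppression depth (dB)
STEP_DB_PER_FRAME = {"slow": 0.035, "medium": 0.07, "fast": 0.105}
OBSERVED_DEPTH_DB = {"slow": 9.64, "medium": 14.64, "fast": 18.97}

def rt_change_db(step_db_per_frame, rt_s=TOTAL_RT_S, fps=FRAME_RATE_HZ):
    """Contrast change (dB) that accrues during the assumed reaction time."""
    return step_db_per_frame * fps * rt_s

rt_db = {name: rt_change_db(step) for name, step in STEP_DB_PER_FRAME.items()}
remaining_db = {name: OBSERVED_DEPTH_DB[name] - rt_db[name] for name in rt_db}

for name in STEP_DB_PER_FRAME:
    print(f"{name:>6}: RT-related change {rt_db[name]:.1f} dB, "
          f"remaining depth {remaining_db[name]:.2f} dB")
```

      Even after subtracting the reaction-time contribution, the remaining depth in this sketch still increases from the slow to the fast ramp.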

      Non-zero switch rate probability affecting ramping

      We agree that for a given ramp speed there is a variable probability of a switch in perceptual state for both bCFS and reCFS portions of the trial. To put it in other words, for a given ramp speed and a given observer the distribution of durations at which transitions occur will exhibit variance. We see that variance in our data (just as it’s present in conventional binocular rivalry duration histograms), as a non-zero probability of switches at very short durations (for example). One might surmise that slower ramp speeds would afford more opportunity for stochastic transitions to occur and that the measured suppression depths for slow ramps are underestimates of the suppression depth produced by contrast adaptation. Yet by the same token, the same underestimation would occur during fast ramp speeds, indicating that that difference may be even larger than we reported. In our revision we will spell this out in more detail, and indicate that a non-zero probability of switches at any time may lead to an underestimation of all recorded suppression depths.

      In our data, we believe the contribution of these stochastic switches is minimal. Our current Supplementary Figure 1(d) indicates that there is a non-zero probability of responses early in each ramp (e.g. durations < 2 seconds), yet these are a small proportion of all percept durations. This small proportion is clear in the empirical cumulative distribution function of percept durations, which we include in Author response image 1, and will address in our detailed response. Notably, during slow-ramp conditions, average percept durations actually increased, implying a resistance to any effect of early stochastic switching. We plan to expand on our analysis of these reaction-time differences in our revised manuscript.

      Author response image 1.

      The specificity of the DHO fit

      In our revised manuscript we will increase the justification for this model, and plan to include a comparison of model fits over time (as opposed to response number in the current manuscript).

      References

      Ekstrom, L. B., Roelfsema, P. R., Arsenault, J. T., Kolster, H., & Vanduffel, W. (2009). Modulation of the contrast response function by electrical microstimulation of the macaque frontal eye field. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 29(34), 10683–10694.

      Jiang, Y., Costello, P., & He, S. (2007). Processing of invisible stimuli: advantage of upright faces and recognizable words in overcoming interocular suppression. Psychological Science, 18(4), 349–355.

      Legge, G. E. (1981). A power law for contrast discrimination. Vision Research, 21(4), 457–467.

      Watson, T. L., Pearson, J., & Clifford, C. W. G. (2004). Perceptual grouping of biological motion promotes binocular rivalry. Current Biology: CB, 14(18), 1670–1674.

      Zhou, L.-F., Wang, K., He, L., & Meng, M. (2021). Twofold advantages of face processing with or without visual awareness. Journal of Experimental Psychology. Human Perception and Performance, 47(6), 784–794.

    1. Author response:

      Data replicability

      There are no replicates contained in the manuscript. (Reviewer #1)

      We respectfully disagree with this statement. In this manuscript, we included both cell and animal replicates. For cell replicates, we analyzed over 50,000 cells using RNAscope and over 10,000 cells using RNAseq, employing two independent methods on different animals. We believe this extensive analysis is sufficient by any standard. Regarding animal replicates, we generated four different transgenic lines (two knock-in lines and two BAC transgenic lines), which is an uncommon and rigorous effort. We analyzed dozens of animals, consistently observing the expression pattern of Smim32 and its derived transgenes across multiple experiments, including crosses between transgenics and various reporter lines. These experiments were conducted on animals from different litters to ensure robustness. Additionally, our longitudinal study, which includes 13 animals harvested at two-day intervals from E16 to P20, further supports the consistency of our data.

      However, to underscore the consistency of endogenous Smim32 expression, when submitting a revised manuscript, we will present Smim32 expression levels across individuals in single-cell RNA-seq data. Furthermore, we will pool data from different transgenic animals to demonstrate interindividual variability in the claustrum of adult animals. 

      Additional examples of female mice should also be included and separately quantified. (Reviewer #1)

      We initially analyzed both males and females for one line (the Smim32-Cre knock-in line). Since we observed no differences between males and females (which we will note in the revised manuscript), we subsequently limited our analyses to males to minimize the use of animals. 

      Claustrum definition

      Weaknesses lie in poor anatomical definitions of the claustrum (and endopiriform nucleus). (Reviewer #2)

      No other orthogonal approaches were used to define the claustrum, such as retrograde neuroanatomical tracing from cortex. (Reviewer #3)

      We share the reviewers’ opinion that the claustrum (CLA) and endopiriform nucleus (EN) are poorly defined anatomically in rodent brains due to the limited development of white matter tracts. This ambiguity has led to many conflicting descriptions of CLA/EN boundaries in various papers and atlases, including those by Paxinos and the Allen Brain Institute. Notably, the Allen Institute frequently updates the shape and anatomical location of the CLA/EN in their reference atlas, resulting in different websites displaying various versions (as illustrated in rebuttal figure 1 at comparable levels of the anteroposterior brain axis). It remains uncertain which version, if any, would most effectively satisfy the entire scientific community. Indeed, after many years of working on these structures and surveying the literature, we regret to note that there is currently no consensus on the anatomical definition of the CLA and EN, even among expert laboratories using tracing or staining methods. At one end of the spectrum, some authors define the CLA as a small nucleus that could be, for example, characterized by the PV-rich plexus. At the other end, other authors consider it part of a larger complex that includes the EN and extends dorsally to the S2 cortex. Additionally, differing definitions of the core and shell regions, as well as the precise anteroposterior extent of the nucleus, further complicate the issue.

      Author response image 1.

      Comparison of CLA and EN shapes in two recent versions of the Allen brain atlas

      Given this lack of consensus, we deliberately opted for a molecular definition of the claustrum and its projection neurons. We used a set of well-documented canonical markers for the claustrum and neighboring neurons to determine the expression pattern of Smim32. The claustrum-specific markers we selected (Nr4a2, Lxn, Gnb4, Car3, etc.) have been extensively studied and allow us to distinguish claustrum projection neurons from neighboring and intermingled populations. Although none of these individual markers are exclusively specific to CLA and EN neurons, the combined expression of these markers provides greater confidence in identifying the different neuronal populations in space.

      Smim32 expression is used to define claustrum anatomical boundaries, rather than first using several structural, molecular, and connectivity lines of evidence to define the claustrum anatomically and then to assess whether Smim32 expression fits within this anatomical definition. (Reviewer #2)

      Contrary to the reviewer's suggestion, we do not define the claustrum based on Smim32 expression. Instead, Figures 1 and 2 demonstrate that Smim32 expression is highly correlated with the expression of known claustrum markers (Nr4a2, Lxn, Gnb4, Car3, etc.), both regionally and at the cellular level. As suggested by Peng et al. (2021, Fig. 4 and Extended Data Fig. 11), this population of cells, which includes the claustrum, a specific subset of cells in cortical layer 6, and the dorsal endopiriform nucleus, forms a discrete group of neurons sharing the same transcriptomic identity. Given what is known about the connectivity of claustrum and endopiriform nucleus projection neurons, this population obviously includes neurons projecting to various areas, likely fulfilling distinct functions. Whether these cells should be subdivided based on projection area, developmental origin, or structural features is beyond the scope of this article.

      Specificity issues

      Cre/Flp expression driven by the Smim32 promoter is present in non-claustrum regions, including the neighboring cortex, striatum, and endopiriform nucleus as well as the more distant thalamic reticular nucleus. (Reviewer #2)

      The Smim32 gene is not specific to the claustrum. (Reviewer #3)

      We do not claim that endogenous Smim32 expression is exclusive to the claustrum or that the knock-in lines, by themselves, are sufficient to isolate claustrum neurons without combined approaches based on the transgenic lines presented here. However, there are significant differences in the expression pattern between endogenous Smim32 and the expression of Cre in the various derived transgenic lines, which might not have been clear in the current manuscript. Notably, there is no expression of Cre in the striatum and the thalamic reticular nucleus, and only sparse expression in the endopiriform nucleus in Tg61(Smim32-cre). Each transgenic line provides different levels of overlap with the endogenous Smim32 expression, with the Tg61(Smim32-cre)  line allowing for the most specific genetic access to claustrum neurons. Again, for greater specificity, any of these lines could be used in combined approaches, such as viral targeting (as shown in Figure 6A and B) or using transgenic intersectional (dual recombinase) approaches based on Cre- and Flp-expressing mice with an overlap in the claustrum, leading to circuit-specific and/or claustrum-only labeling.

      This means that our claims are supported by the observed data. However, we acknowledge that we may not have clearly explained the specificity of the random transgenes, which could have led some reviewers to believe that “the data do not support the claims”.

      We will clarify these points in the revised manuscript and include additional examples and quantifications to highlight the differences between endogenous Smim32 expression and Cre expression in the transgenic Tg61(Smim32-cre) line.

      Regarding Cre-expressing cells in the neighboring cortex (layer 6 projection neurons), these cells are genetically distinct from other layer 6 cortical neurons and express the same canonical markers as claustrum projection neurons, likely sharing also the same transcriptomic identity. We will provide a more detailed characterization of these cells in the revised manuscript.

      Since Smim32 driven recombinase (in 61 or 62lrod) is not exclusively expressed in the claustrum, it is not clear how Smim32 is an advantage over possible Nr4a2 or, the more selective, GNB4 Cre driver lines. (Reviewer #2)

      Over the years, we have found a limited number of Cre lines used in the literature for targeting claustrum neurons. These include Gnb4-cre, Slc17a6-cre (also known as Vglut2-cre), Egr2-cre, Tg(Tbx21-cre), Ntng2-cre, Cux2-cre and Esr2-cre lines. We have not found any study describing and/or using an Nr4a2-cre line. Although a Nr4a2-Dre line exists (that we have studied in our laboratories), caution is warranted in its use, as it lacks the complete coding sequence of the Nr4a2 gene.

      One problem with Nr4a2 is its documented expression in the adjacent Layer 6b cortical neurons, which discards it as a suitable candidate to selectively target the claustrum. Furthermore, Nr4a2 is also expressed in a majority of the endopiriform nucleus neurons, whereas endogenous Smim32 is expressed in a smaller proportion of these cells, and is restricted mainly to the dorsal endopiriform nucleus. These reasons led us to select Smim32 over Nr4a2.

      Author response image 2.

      (A) In situ hybridization for various CLA/EN marker genes. (B) Developmental recombination observed outside the CLA/EN in various cre lines (all data from the Allen brain databases)

      What are the advantages of using the different Smim32-cre lines over the existing Cre lines mentioned above?

      Let’s first consider the Gnb4-cre line, which is considered one of the best available. Although the endogenous Gnb4 gene appears to have a similar expression pattern to Nr4a2, Slc17a6, and Smim32 in the striato-claustro-insular region of adult mice (Rebuttal Figure 2A), the results observed with the Gnb4-cre line either show otherwise or indicate that the Cre line does not fully recapitulate Gnb4 endogenous expression (Rebuttal Figure 3). Indeed, some neurons in the insular cortex, piriform cortex, and putamen express the Cre recombinase (possibly due to low Gnb4 expression not detected in the in situ hybridization data of the Allen Brain Institute or due to nonspecific transgene expression) and will recombine viral vectors injected in adult mice (Rebuttal Figure 3). Therefore, this Cre expression outside the CLA/EN neurons in the Gnb4-cre line presents complications for data interpretation, depending on the viral injection coordinates and the quantity of injected vectors.

      Author response image 3.

      Specificity of the Gnb4-Cre line tested with viral transduction in adult mice (all data from the Allen Brain Institute database). The top and middle rows display the same data but with different scaling of the lookup tables to highlight either the patterns of axonal projections (top) or the infected neurons themselves (middle). The bottom row shows a higher magnification of the infection site. Note that individual neurons cannot be resolved in experiment 485903475 due to signal saturation.  

      Cre expression in the CLA appears more specific in the various Smim32-cre transgenic lines than in many of the lines mentioned above. Although we have no doubt that the different existing transgenic lines can target CLA neurons, the selectivity of the targeting (for example, the fraction and types of CLA neurons versus potential non-CLA neurons) remains to be fully described for most of the lines. It is particularly true in the case of Tbx21 and Esr2 (used as drivers for the Tg(Tbx21-cre) and Esr2-cre transgenic lines). Tbx21 is not endogenously expressed in adult CLA neurons (evaluated by in situ and RNAseq data) and Egr2, if expressed in the claustrum, is not restricted to CLA neurons as it is an immediate early gene expressed in recently active neurons (Rebuttal Figure 2A). 

      Cre expression in the EN is observed in all Cre-expressing transgenic lines used to target the claustrum (with the exception of Slc17a6-cre). This can naturally be problematic for some approaches. Luckily, the random integrant Tg61(Smim32-cre) we describe in our manuscript shows a strong expression in the claustrum, and very limited expression outside the CLA (a very weak activity in the EN), representing a novel tool with improved claustrum selectivity. An advantage of the Tg61(Smim32-cre) over the Slc17a6-cre is that more CLA neurons can be targeted with the Tg61(Smim32-cre) line. 

      Another advantage of our four transgenic lines is their versatility; they can be used to recombine reporter lines as well as FRT-floxed and loxP-floxed knockouts in limited neuronal populations. They will be employed in the future for intersectional genetics to exclusively target CLA neurons. Existing transgenic lines cannot offer these possibilities because their marker genes are broadly expressed in the brain during embryogenesis, leading to recombination in a large number of non-CLA/EN neurons. This is evident in the Gnb4-cre and Slc17a6-cre lines crossed with the Ai14 reporter line expressing the fluorescent protein tomato (Rebuttal Figure 2B, right panels). Similar observations have been made for the Ntng2-Cre and Cux2-cre lines (see the Allen Brain Institute database for these data). Alternatively, inducible recombinase systems, such as the Gnb4-IRES2-CreERT2-D line, could be used. However, this line requires tamoxifen to induce Cre recombination, which can be problematic depending on the research context, and it also shows recombination in the absence of tamoxifen treatment (see experiments 560948627 and 560948194 in the Allen Brain database).

      It is unclear how Smim32 relates to claustrum in other mammalian species (e.g. primates) (Reviewer #3)

      As mentioned in the last paragraph of the introduction of the initial manuscript, Smim32 is specifically expressed in the claustrum of a primate species, Homo sapiens (reference 37 of the initially submitted manuscript).

      Availability of the transgenic mice

      These mice should be made available to the community through commercial vendors. (Reviewer #1 and #2 in private comments)

      We are pleased to see that two of the three reviewers would like to see these mice available. These mice will not be kept for ourselves, and we will distribute them at some point in time, but this will naturally occur after the publication of the revised manuscript.

      Critical comments on discussion and other topics

      A clear description of the search in the Allen Mouse Brain Atlas is missing. A search for Smim32 in the ISH mouse atlas did not provide any hits and so it would be useful to include in the methods or results section the exact query used for examination of Smim32 expression as well as other genes identified in this process. (Reviewer #2)

      Smim32 has been referred to by different names in various versions of the mouse genome. For the readers not versed in navigating genomes and annotations, before being officially named Smim32, this gene was originally called Gm6753 (as noted in the Allen Brain Institute database, see Rebuttal Figure 2A for an example of their in situ data) and later Gm45623.

      Several sentences highlighting the shortfalls of other approaches are overstated and should be toned down. (Reviewer #1)

      Very concerning is problematic language in the abstract and introduction sections that diminish the impact of several published studies (not cited) that have led to important findings regarding claustrum function. The authors create an argument that all the research performed thus far on the claustrum is unreliable because targeting the structure has been sub-optimal. (Reviewer #2)

      A more balanced discussion of the strengths and weaknesses of these mice should be included. (Reviewer #1)

      We regret if our choice of language inadvertently appeared to undermine the contributions of our colleagues; that was certainly not our intention. The paragraph in question was meant to address certain studies that we believe have led to inconsistent findings and unreliable data due to a lack of rigorous methodology in targeting claustrum projection neurons. To avoid singling out specific works, we chose not to cite them directly. We understand that some colleagues whose research does not fall under the “various cases” mentioned may feel unfairly targeted by this statement. We will revise this section to better clarify our intent and ensure it is respectful of all contributions. We will rephrase passages in the abstract, introduction, and discussion to provide a balanced view of the strengths and weaknesses of these mice.

      Our main goal is to provide tools to specifically target claustrum cells based on their transcriptomic identity, which we believe is the best means to assess the function of any neuronal population. Due to the intermingling of claustrum neurons with neighboring populations, employing stereotaxic injections in the claustrum without genetic segregation will always infect and label physically adjacent cells that do not belong to the claustrum, ontologically and functionally speaking. 

      Similarly, targeting claustrum neurons retrogradely by injecting into claustrum projection sites likely labels neurons from different populations. For instance, as reviewer 1 mentions Erwin et al. (2021), infecting retrosplenial projections without genetic specificity labels many claustrum Synpr+ neurons (considered the claustrum core), a small proportion of claustrum Nnat+ neurons (considered the claustrum shell by some, and non-claustrum neurons by others), and some neighboring cortical L6b neurons. These three populations have very different transcriptomic identities, connectivity patterns, and likely distinct functions.

      Thus, we believe that genetic specificity provides an important added value for selectively targeting the claustrum or claustro-insular complex.

      A better characterization of all data should be undertaken. (Reviewer #1)

      Having generated hundreds of transgenic lines over the years, we have never performed a more thorough analysis of transgenic lines, nor do we recall reading a publication that evaluated the expression pattern of transgenes in mice at such a precise level. We therefore do not see exactly what the reviewer means by this remark. It is possible, as we are not native English speakers, that we did not grasp a certain form of joke.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Paturi et al. present a detailed structural and mechanistic study of the DRB7.2:DRB4 complex in plants, focusing on its role in sequestering endogenous inverted-repeat dsRNA precursors and inhibiting Dicer-like protein 3 (DCL3) activity. By truncating the two proteins, they systematically identify the domains involved in direct interaction between DRB7.2 and DRB4 and study the interactions between the two using biophysical techniques (ITC and NMR). They show using NMR that the interacting domains between the two proteins are likely partially unfolded or aggregated in the absence of the binding partner, and they determine the NMR structure of the individual interacting domains in the presence of the isotopically unlabelled partner using sparse restraint data combined with Rosetta. They also determine the complex structure of the interacting DRB7.2 dsRBD domain and the DRB4 D3 domain using X-ray crystallography.

      Strengths:

      Overall, the manuscript is well written, provides molecular details at high resolution between the interaction of DRB7.2 and DRB4, and the data in the manuscript strongly supports the proposed model where DRB7.2:DRB4 complex sequesters the DCL3 substrates inhibiting its function of producing epigenetically activated siRNAs.

      Weaknesses:

      Major comments:

      (1) The manuscript, unfortunately, completely lacks functional validation of the determined DRB7.2:DRB4 complex structure, which is required for the rigorous validation of the proposed model. For functional validation of the determined structures, the author should at least present the mutational analysis (impact on complex formation, RNA affinity) of the point mutants derived from the structure of the DRB7.2:DRB4 complex.

      We thank the reviewer for pointing out a crucial aspect that is missing from our manuscript. Following the inputs and experiments proposed above, we would certainly like to perform additional mutational analysis to determine the impact on heterodimeric complex formation and to identify the key residues involved in RNA binding.

      We expect that we can accomplish this study in the next ~ 4-6 months as we may have to create a combination of mutations for residues involved in the dimerization interface, namely, T131, V132, E134, F136, W156, and V161 on DRB7.2M. Having said that, the disruption of the heterodimer interface would probably lead to DRB7.2M and DRB4D3 returning to their fast-intermediate timescale exchanging native homo-oligomeric state/partially folded state.

      For dsRNA binding, six residues (i.e., A85 and K86 (a1), H112 and K114 (b1-b2 loop), and K142 and K144 (a2)) involved in the RNA binding interface and a few other residues based on the mutational data will be considered.

      (2) The proposed model shows the DRB7.2M and DRB4D3 as partially folded/aggregated proteins in the absence of the complex, understandably from the presented NMR data of the individual domains. However, in the cellular context, when the RNAs are present, especially DRB7.2M might be properly folded/not aggregated. Could the authors support or negate this by showing the <sup>15</sup>N HSQC spectrum of DRB7.2M in complex with the 13 bp dsRNA?

      While we have no direct proof that DRB7.2M is folded/not aggregated in the presence of RNAs in the cellular context, the in vitro NMR-based titration studies of DRB7.2 alone (Author response image 1A) with two molar equivalents of 13 bp dsRNA (Author response image 1B and 1C) indicate that there is no change in the overall spectral pattern (except for the apparent chemical shift perturbations expected from fast-intermediate exchange timescale binding of DRB7.2M with 13 bp dsRNA), implying that the dsRNA alone is neither necessary nor sufficient to disrupt the native fast-exchange oligomeric states sampled by individual DRB7.2 and DRB7.2M.

      Author response image 1.

      DRB7.2M binding interaction with 13bp dsRNA (A) 1H-15N TROSY-HSQC of U[15N, 2H] DRB7.2M. (B) 1H-15N TROSY-HSQC of U[15N, 2H] DRB7.2M in the presence of 13 bp dsRNA with 1:2 molar equivalence. (C) An overlay of (A) and (B) indicates no evident changes in the broadening of resonances. (D) The 15N linewidth analysis of unbound (red) and bound (green) forms of U[15N, 2H] DRB7.2M resonances for which the assignment could be traced from the assignments of the DRB7.2M:DRB4D3 complex.

      Furthermore, the line-width analysis, shown in Author response image 1D, implies that the ~R<sub>2</sub> rates are roughly identical in the presence of dsRNA, indicating that the native oligomeric state of DRB7.2M remains unperturbed by the presence of dsRNA. Our observation is also consistent with the crystal structure presented in the manuscript, where we have observed that the heterodimeric interface lies on the opposite side of the dsRNA binding interface of the DRB7.2M:DRB4D3 complex.
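
      As a rough guide to how a linewidth comparison translates into relaxation rates, the sketch below assumes a purely Lorentzian lineshape, for which the full width at half maximum Δν (in Hz) relates to the transverse relaxation rate as R2 ≈ π·Δν. Contributions from field inhomogeneity and unresolved couplings are ignored. The function name and the example linewidths are hypothetical and are not taken from the data in Author response image 1D.

```python
import math

def r2_from_linewidth(fwhm_hz: float) -> float:
    """Approximate R2 (s^-1) from a Lorentzian full width at half maximum (Hz)."""
    if fwhm_hz < 0:
        raise ValueError("linewidth must be non-negative")
    return math.pi * fwhm_hz

# Hypothetical 15N linewidths (Hz) for one resonance, free and dsRNA-bound.
free_hz, bound_hz = 20.0, 21.0
ratio = r2_from_linewidth(bound_hz) / r2_from_linewidth(free_hz)
print(f"R2(bound) / R2(free) = {ratio:.2f}")
```

      Under this assumption, a ratio close to 1 across assigned resonances would be read as an unchanged apparent R2, and hence an unchanged oligomeric state.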

      Therefore, the dsRNA substrate does not have any role in the native partially folded/oligomeric state of DRB7.2M.

      (3) It remains unclear from the manuscript if DRB7.1 will have a similar or different mechanism of interaction with DRB4. Based on the sequence comparisons of the two proteins, the authors should comment on this in the discussion section.

      Pairwise sequence alignment of full-length DRB7.2 and DRB7.1 reveals 50.7% similarity and 33.2% identity, as calculated with EMBOSS Needle (Author response image 2).

      Author response image 2.

      ClustalW alignment of full-length DRB7.2 and DRB7.1. The secondary structure elements are derived from the crystal structure of DRB7.2M (PDB ID: 8IGD). Identical residues are marked with red highlights, whereas similar residues are marked with yellow highlights, and the consensus residues (> 50%) are annotated below the sequence alignment.

      As expected, for the dsRBD region (corresponding to DRB7.2M), we observe a much higher degree of conservation, with 76.7% similarity and 54.7% identity (Author response image 3).

      Author response image 3.

      ClustalW alignment of the dsRBD region of DRB7.2 and DRB7.1. The secondary structure elements are derived from the crystal structure of DRB7.2M (PDB ID: 8IGD). Identical residues are marked with red highlights, whereas similar residues are marked with yellow highlights, and the consensus residues (> 50%) are annotated below the sequence alignment.
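
      For readers who want to see how identity and similarity percentages of this kind are obtained from a pairwise alignment, a minimal sketch follows. It counts identical and similar columns and divides by the full alignment length, gaps included. The residue groups used to define "similar" are a simplifying assumption of this sketch; EMBOSS Needle instead counts positions with a positive substitution-matrix score. The toy sequences are hypothetical and are not DRB7.1 or DRB7.2.

```python
from typing import Tuple

# Toy similarity groups (an assumption of this sketch, not the EMBOSS definition).
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("ST"), set("NQ")]

def is_similar(a: str, b: str) -> bool:
    """True if two residues are identical or fall in the same toy group."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)

def identity_similarity(aln_a: str, aln_b: str) -> Tuple[float, float]:
    """Percent identity and similarity over all alignment columns (gaps included)."""
    if len(aln_a) != len(aln_b) or not aln_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = similar = 0
    for a, b in zip(aln_a, aln_b):
        if a == "-" or b == "-":
            continue  # gap columns count toward the length only
        if a == b:
            identical += 1
        if is_similar(a, b):
            similar += 1
    n = len(aln_a)
    return 100.0 * identical / n, 100.0 * similar / n

# Hypothetical eight-column alignment with one gap.
identity, similarity = identity_similarity("TVEF-WKV", "TIEFAWRV")
print(f"identity = {identity:.1f}%, similarity = {similarity:.1f}%")
```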

      Moreover, the residues involved in the heterodimerization interface in DRB7.2M are identical to those in DRB7.1. Specifically, the interface residues T131, V132, E134, F136, W156, and V161 in DRB7.2M are unchanged in DRB7.1, suggesting that DRB7.1M may interact with DRB4D3 in a manner similar to that illustrated for DRB7.2M:DRB4D3 in the manuscript.

      Future studies will shed more light on the binding preference of DRB4D3 with DRB7.1 versus DRB7.2. One interesting thing to note is that DRB7.2 is exclusively present in the nucleus, whereas DRB7.1 is observed to localize in the nucleus as well as the cytoplasm. Therefore, spatial restriction may be one of the mechanisms that bring exclusivity to the interaction partner despite having a conserved interaction interface.

      Minor comments:

      (1) There are no errors for the N, dH, and dS values of the ITC measurements in Table 1. Also, it seems that the measurements are done only once. Values derived from at least triplicates should be presented. This would be helpful to increase confidence in the values derived from ITC, especially for the titration between DRB7.2, DRB4C, and DRB4D3, as the N value there is substantially lower than 1, which does not agree with the other data.

      We plan to estimate the errors as proposed by the reviewer in the revised manuscript to ensure that the presented data is of high confidence.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Paturi and colleagues uses an approach that combines structural biology and biochemistry to probe protein-protein and protein-RNA interactions for two protein factors related to the dsRNA pathway in plants.

      Strengths:

      A key finding in the research is the direct demonstration of the ability of the single dsRBD (double-strand RNA binding domain) of DRB7.2 to interact simultaneously with dsRNA as well as the C-terminal domain of DRB4. The heterodimerization of DRB7.2 and DRB4 is demonstrated to make a high-affinity complex with dsRNA, and it is proposed that this atypical use of the dsRBD domain to bridge the protein and RNA may contribute to the ability to prevent cleavage that would otherwise occur for dsRNA. The primary results for the interactions are generally well-supported by the data, and the conclusions are taken from the available results without excessive speculation.

      Weaknesses:

      There is a need for some statistical repeats, as well as a suggested movement of many protein characterization findings in the solution state to support data or to better indicate how these properties could play a role in the final proposed mechanism. There is also the need for certain measurement replicates, such as for the ITC data, which are derived from single measurements and lack sufficient estimates of error.

      We plan to restructure the manuscript along the lines proposed by the reviewer in the revised version. Moreover, as mentioned in the response to the comments of Reviewer 1, we plan to estimate the errors to ensure that the presented data is of high confidence in the revised version.

    1. Author Response

      We thank the reviewers for their useful and constructive comments. In this provisional response, we will address a few of the major issues and plan to submit a detailed, point-by-point response along with the revised manuscript.

      1. Robustness of activated combination of neurons (the ‘activated ensemble’).

      The reviewers have asked for additional analyses and visualization of the group of neurons activated and a classification analysis to illustrate the point that the activated set of neurons would allow discrimination between different concentrations even after the spiking activity reduced significantly in the later trials. We relied on visualization using PCA (Manuscript Fig. 4) and quantification using correlation analysis (Manuscript Fig. 5a and Manuscript Supplementary Figure 2). But this point can be easily amplified further to support our conclusions and address a major concern raised by the reviewers.

      Visualization of neural responses across trials and odorants: As recommended, we followed the procedures used in Stopfer et al., 2003 (Fig. 6c) and Miura et al., 2012 (Fig. 3C) to image neural responses across recorded PNs as a matrix (Author response image 1).

      Author response image 1.

      Author response image 1: Spike counts averaged over the entire 4s odor presentation window across all recorded neurons are shown as a function of trial number (columns). The sorting is the same across different panels. Note that there are 80 neurons whose response was monitored for hexanol and octanol responses (Dataset 1; first row of panels), and 81 neurons whose response was monitored for isoamyl acetate and benzaldehyde (Dataset 2; second row of panels). As can be noted, across the 25 trials the pattern of activation remains consistent. Also, the activated combination of neurons varied robustly with odor identity and intensity.

      Classification analysis: To illustrate that there is enough information to recognize an odorant and discriminate between different intensities, we performed a leave-one-trial-out classification analysis. The left-out trial was assigned the class label of its nearest neighbor (using correlation distance metric). The results from this classification analysis are shown below in Author response image 2. As a control, we shuffled the odor class labels and repeated the leave-one-trial-out classification analysis.

      Author response image 2.

      Author response image 2: Results from classification analysis are shown for the two datasets: hexanol–octanol at different concentrations (dataset 1; 80 PNs), and isoamyl acetate and benzaldehyde (dataset 2; 81 PNs). We did a leave-one-trial-out validation. The true odor label is shown along the x-axis and the predicted odor label is shown along the y-axis. As can be noted, the class labels for every single trial were correctly predicted in both datasets. The result after class labels were shuffled is also shown for comparison. These results strongly support our conclusion that odor intensity information is preserved and odor concentration can be recognized independent of adaptation.
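
      The leave-one-trial-out procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for Author response image 2. Each trial is a vector of spike counts across neurons, the held-out trial takes the label of its nearest neighbor under correlation distance (1 minus the Pearson correlation), and the control repeats the analysis with shuffled labels. The toy response patterns, the adaptation gains, and all function names are hypothetical.

```python
import math
import random

def correlation_distance(u, v):
    """Return 1 - Pearson correlation of two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return 1.0 - num / den  # undefined if either vector is constant

def leave_one_trial_out(trials, labels):
    """Predict each trial's label from its nearest neighbor among the other trials."""
    predicted = []
    for i, held_out in enumerate(trials):
        others = [j for j in range(len(trials)) if j != i]
        nearest = min(others, key=lambda j: correlation_distance(held_out, trials[j]))
        predicted.append(labels[nearest])
    return predicted

def accuracy(predicted, true):
    """Fraction of trials whose predicted label matches the given label."""
    return sum(p == t for p, t in zip(predicted, true)) / len(true)

# Toy data: two hypothetical odor-specific patterns over six neurons, with
# responses shrinking across four trials to mimic adaptation.
pattern_a = [5.0, 1.0, 0.0, 3.0, 0.0, 2.0]
pattern_b = [0.0, 4.0, 5.0, 0.0, 2.0, 1.0]
gains = [1.0, 0.8, 0.6, 0.4]
trials = [[g * x for x in p] for p in (pattern_a, pattern_b) for g in gains]
labels = ["odor A"] * 4 + ["odor B"] * 4

predicted = leave_one_trial_out(trials, labels)
print("accuracy:", accuracy(predicted, labels))

# Control: repeat the analysis after shuffling the class labels.
rng = random.Random(0)
shuffled = labels[:]
rng.shuffle(shuffled)
print("shuffled-label accuracy:",
      accuracy(leave_one_trial_out(trials, shuffled), shuffled))
```

      Because correlation distance is insensitive to overall response amplitude, a uniform reduction in spike counts across trials leaves the nearest-neighbor assignment unchanged in this toy example.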

      Correlation with the first trial:

      We had shown the correlation across odorants and concentrations as a function of the trial (manuscript Figure 5A). To complement these analyses, here we focus on the correlations with the response evoked in the first trial of each odorant at high and low concentrations and plot this information as a function of trial number (Author response image 3, 4). As can be noted, the correlation across different trials of a given odorant at specific concentrations remains much higher than all other conditions.

      Author response image 3.

      Author response image 3: (top-left) Correlation between 80-dimensional neural responses (averaged over the entire 4s odor presentation window) with the first trial of hexanol at high intensity (hex-H; 1% v/v) is plotted as a function of trial number. (top-right) similar plots but correlation computed with neural responses evoked during the first trial of octanol at high intensity (oct-H; 1% v/v). (bottom-left) similar plots but correlation computed with neural responses evoked in the first trial of hexanol at low intensity (hex-L; 1% v/v). (bottom-right) similar plots but correlation computed with neural responses evoked in the first trial of octanol at low intensity (oct-L; 1% v/v).

      Author response image 4.

      Author response image 4: (top-left) Correlation between 81-dimensional neural responses (averaged over the entire 4s odor presentation window) with the first trial of isoamyl acetate at high intensity (iaa-H; 1% v/v) is plotted as a function of trial number. (top-right) similar plots but correlation computed with neural responses evoked in the first trial of benzaldehyde at a high intensity (bza-H; 1% v/v). (bottom-left) similar plots but correlation computed with neural responses evoked in the first trial of isoamyl acetate at low intensity (iaa-L; 1% v/v). (bottom-right) similar plots but correlation computed with neural responses evoked in the first trial of benzaldehyde at low intensity (bza-L; 1% v/v).

      Behavioral significance and dynamics: The reviewers had wondered about the relevance of the behavior to the organism. The maxillary palps are sensory organs close to the mouth parts that are used to grab food and help with the feeding process. In our previous studies, we had shown that these palp-opening responses are innately triggered by many ‘appetitive odorants.’ However, the probability of palp opening varied across different odorants (Chandak and Raman, 2023). Some odorants evoked higher palp-opening responses and others reduced the probability of palp-opening response (below the median value across odorants). Since all other parameters (such as the clicking sound of valves and mechanical cues due to airflow during odor presentation) are the same across these different odorants, these observed differences in palp-opening response probability are attributed to the identity of the odorants presented.

      Author response image 5.

      Author response image 5: Preference indices were calculated for all odors tested and are shown as a bar plot (n = 26 locusts). Blue bars indicate odors classified as appetitive, gray bars indicate neutral odors and red bars indicate unappetitive odors. Locusts with a significant deviation from the median response (one-sided binomial test, P < 0.1, were classified as either being appetitive or unappetitive; P < 0.1, P < 0.05, **P < 0.01). Error bars indicate s.e.m. [Replotted Fig 1.c from Chandak and Raman, 2023].
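
      A one-sided binomial test of the kind named in this caption can be sketched as below. The counts and the null probability are hypothetical, since the caption gives neither. The sketch only shows how an upper-tail or lower-tail probability would be computed for a number of palp-opening responses out of n presentations against a reference probability p0, such as the median response.

```python
from math import comb

def binom_pmf(i: int, n: int, p0: float) -> float:
    """Probability of exactly i successes in n Bernoulli(p0) trials."""
    return comb(n, i) * p0 ** i * (1.0 - p0) ** (n - i)

def upper_tail(k: int, n: int, p0: float) -> float:
    """One-sided p-value P(X >= k), testing for a response above p0."""
    return sum(binom_pmf(i, n, p0) for i in range(k, n + 1))

def lower_tail(k: int, n: int, p0: float) -> float:
    """One-sided p-value P(X <= k), testing for a response below p0."""
    return sum(binom_pmf(i, n, p0) for i in range(0, k + 1))

# Hypothetical example: 20 palp-opening responses in 26 presentations,
# tested against a reference probability of 0.5.
p_value = upper_tail(20, 26, 0.5)
print(f"one-sided p = {p_value:.4f}")
```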

      We had also shown that we could train locusts to have stereotyped palp-opening responses using the classical conditioning approach (odor as the conditioned stimulus and food reward as the unconditioned stimulus; Video: https://static-content.springer.com/esm/art%3A10.1038%2Fncomms7953/MediaObjects/41467_2015_BFncomms7953_MOESM483_ESM.mov; Saha et al., 2015). The dynamics of those conditioned palp-opening responses have been well characterized.

      We will use similar tracking procedures to monitor and quantify the dynamics of innate palp-opening responses as well. We will add supplementary videos to fully capture this behavior.

      Early vs. late neural responses:

      Since behavioral responses are more likely to start as soon as the odorant is presented, the reviewers wondered whether there are differences in the observed findings if we focus only on the early neural activity (as it might be more important for triggering behavior). Note that the median response time for conditioned palp-opening responses is less than 750 ms (Saha et al., 2015, Chandak and Raman, 2023). Hence, we divided the neural dataset and analyzed the neural response patterns during these early (0-750 ms after odor onset) and late (2-4 s after odor onset) time windows. In both these epochs, we found that the total spike counts across neurons reduced as a function of trial number or repetition and the combination of neurons activated remained robust (Author response images 6-11). Hence, we conclude that while the neural responses in different time windows would be important for shaping other parameters of behavioral response dynamics, the overall behavioral response probability that we used in our analysis had a similar relationship with early, late, or total neural activity during the entire odor presentation (i.e., the time window of the neural response did not matter for the analyses presented in the manuscript).
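
      The windowing described above amounts to counting each neuron's spikes within a fixed interval after odor onset. A minimal sketch follows, assuming spike times are stored in seconds relative to odor onset. The window boundaries are taken from the text, while the function names and the toy spike times are hypothetical.

```python
# Analysis windows in seconds after odor onset (values from the text above).
WINDOWS = {"early": (0.0, 0.75), "late": (2.0, 4.0), "full": (0.0, 4.0)}

def count_spikes(spike_times_s, start_s, stop_s):
    """Number of spikes with start_s <= t < stop_s."""
    return sum(1 for t in spike_times_s if start_s <= t < stop_s)

def population_vector(neurons, window):
    """Spike-count vector across neurons for one trial and one named window."""
    start_s, stop_s = WINDOWS[window]
    return [count_spikes(spikes, start_s, stop_s) for spikes in neurons]

# Toy trial: spike times (s) for three hypothetical neurons.
trial = [[0.1, 0.3, 2.5], [0.9, 1.5], [0.2, 3.1, 3.9, 4.2]]
for name in WINDOWS:
    print(name, population_vector(trial, name))
```

      The resulting vectors can then be summed to give total spike counts per trial, or correlated across trials, separately for each window.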

      Author response image 6.

      Author response image 6: Total spike counts reduced as a function of trial number. This reduction was observed for the total spike counts during the entire odor presentation window and during both the early (0-750 ms) and late (2-4 s) response time windows. Dataset 1: 80 PNs, hexanol, and octanol odorants.

      Author response image 7.

      Author response image 7: Total spike counts reduced as a function of trial number. This reduction was observed for the total spike counts during the entire odor presentation window and during both the early (0-750 ms) and late (2-4 s) response time windows. Dataset 2: 81 PNs, isoamyl acetate, and benzaldehyde odorants.

      Author response image 8.

      Author response image 8: Similar plots as in Author response images 3 and 4 but analyzing 80-dimensional spike count vectors calculated using only the first 750 ms of odor-evoked response. Note that the correlation with the odor-evoked response in the first trial remains high across trials. But between different odorants or different intensities of the same odorant, the response correlation drops significantly. Dataset 1: 80 PNs, hexanol, and octanol odorants.

      Author response image 9.

      Author response image 9: Similar plots as in Author response images 3 and 4 but analyzing 80-dimensional spike count vectors calculated using only the last 2 seconds of odor-evoked response. Note that the correlation with the odor-evoked response in the first trial remains high across trials. But between different odorants or different intensities of the same odorant, the response correlation drops significantly. Dataset 1: 80 PNs, hexanol, and octanol odorants.

      Author response image 10.

      Author response image 10: Similar plots as in Author response images 3 and 4 but analyzing 81-dimensional spike count vectors calculated using only the first 750 ms of odor-evoked response. Note that the correlation with the odor-evoked response in the first trial remains high across trials. But between different odorants or different intensities of the same odorant, the response correlation drops significantly. Dataset 2: 81 PNs, isoamyl acetate, and benzaldehyde odorants.

      Author response image 11.

      Author response image 11: Similar plots as in Author response images 3 and 4 but analyzing 81-dimensional spike count vectors calculated using only the last 2 seconds of odor-evoked response. Note that the correlation with the odor-evoked response in the first trial remains high across trials. But between different odorants or different intensities of the same odorant, the response correlation drops significantly. Dataset 2: 81 PNs, isoamyl acetate, and benzaldehyde odorants.

      Other Statistical Tests:

      The reviewers felt that in many analyses, we did not include error bars to indicate the sample size, SEM, or SD. We will fix this by adding the sample size information to each panel, as appropriate. However, we would also like to point out that many of the analyses are done in a trial-by-trial fashion (e.g. Manuscript Figures 3 – 6). For these analyses, it would not be possible to add SEM or SD. One condition (hex-H or iaa-H) was repeated in each dataset, and we have added them in the results shown in this response letter to demonstrate repeatability. We will do our best to add these statistics where appropriate, but this cannot be done for the trial-by-trial analyses.

      References:

      Stopfer M, Jayaraman V, Laurent G. Intensity versus identity coding in an olfactory system. Neuron. 2003 Sep 11;39(6):991-1004. doi: 10.1016/j.neuron.2003.08.011. PMID: 12971898.

      Miura K, Mainen ZF, Uchida N. Odor representations in olfactory cortex: distributed rate coding and decorrelated population activity. Neuron. 2012 Jun 21;74(6):1087-98. doi: 10.1016/j.neuron.2012.04.021. PMID: 22726838; PMCID: PMC3383608.

      Chandak, R., Raman, B. Neural manifolds for odor-driven innate and acquired appetitive preferences. Nat Commun 14, 4719 (2023). https://doi.org/10.1038/s41467-023-40443-2

      Saha, D., Li, C., Peterson, S. et al. Behavioural correlates of combinatorial versus temporal features of odour codes. Nat Commun 6, 6953 (2015). https://doi.org/10.1038/ncomms7953

    1. Author response:

      We thank the reviewers for their thoughtful comments and constructive suggestions. We describe how we will address each point below and are grateful for the guidance on areas where our work could be clarified or expanded. In particular, we note the following:

      Selection scan summary statistics: In our revised manuscript, we will include summary statistics from the selection scans. We believe this addition will enhance transparency and provide additional context for readers.

      Reporting of outliers: As highlighted by the editor, the reviewers expressed differing views on the most appropriate way to report outliers. To provide a comprehensive and balanced presentation, we will report both the empirical selection statistics and the corresponding converted p-values. This dual approach will allow readers to fully interpret the results under both perspectives.

      Methodological considerations: We have carefully considered the reviewers' methodological suggestions and will incorporate them into our revisions where possible. These changes strengthen the rigor and clarity of the analyses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper reports an analysis of whole-genome sequence data from 40 Faroese. The authors investigate aspects of demographic history and natural selection in this population. The key findings are that the Faroese (as expected) have a small population size and are broadly of Northwest European ancestry. Accordingly, selection signatures are largely shared with other Northwest European populations, although the authors identify signals that may be specific to the Faroes. Finally, they identify a few predicted deleterious coding variants that may be enriched in the Faroes.

      Strengths:

      The data are appropriately quality-controlled and appear to be of high quality. Some aspects of the Faroese population history are characterized, in particular, by the relatively (compared to other European populations) high proportion of long runs of homozygosity, which may be relevant for disease mapping of recessive variants. The selection analysis is presented reasonably, although as the authors point out, many aspects, for example differences in iHS, can reflect differences in demographic history or population-specific drift and thus can't reliably be interpreted in terms of differences in the strength of selection.

      Weaknesses:

      The main limitations of the paper are as follows:

      (1) The data are not available. I appreciate that (even de-identified) genotype data cannot be shared; however, that does substantially reduce the value of the paper. Minimally, I think the authors should share summary statistics for the selection scans, in line with the standard of the field.

      We agree with the reviewer that sharing the selection scan results is important, so in the next revision of this manuscript we will make the selection scan summary statistics publicly available, and clearly lay out the guidelines and research questions for which the data can be accessed.

      (2) The insight into the population history of the Faroes is limited, relative to what is already known (i.e., they were settled around 1200 years ago, by people with a mixture of Scandinavian and British ancestry, have a small effective population size, and any admixture since then comes from substantially similar populations). It's obvious, for example, that the Faroese population has a smaller bottleneck than, say, GBR.

      More sophisticated analyses (for example, ARG-based methods, or IBD or rare variant sharing) would be able to reveal more detailed and fine-scale information about the history of the populations that is not already known. PCA, ADMIXTURE, and HaploNet analysis are broad summaries, but the interesting questions here would be more specific to the Faroes, for example, what are the proportions of Scandinavian vs Celtic ancestry? What is the date and extent of sex bias (as suggested by the uniparental data) in this admixture? I think that it is a bit of a missed opportunity not to address these questions.

      We clarify that we did quantify the proportions of various ancestry components as estimated by HaploNet in main text Figure 5 and Supplementary Figures S5 and S6. In our revisions, we will include the average global ancestry of the various components in the main text so that this result is clearer.

      We agree that more fine-scale demographic analyses would be informative. We have begun working on an estimation of the admixture date, for example, but have encountered problems with using different standard date estimation software, which give very inconsistent and unstable results. We suspect this might be due to the strong bottleneck experienced in the history of the Faroe Islands breaking one or more of the assumptions of these methods. We will continue working on this problem in coming months, possibly using simulations to assess where the problem might be. We recognize that our relatively small sample size places limits on the fine-scale demographic analyses that can be performed. We are addressing this in ongoing work by generating a larger cohort, which we hope will enable more detailed inference in the future.

      (3) I don't really understand the rationale for looking at HLA-B allele frequencies. The authors write that "ankylosing spondylitis (AS) may be at a higher prevalence in the Faroe Islands (unpublished data), however, this has not been confirmed by follow-up epidemiological studies". So there's no evidence (certainly no published evidence) that AS is more prevalent, and hence nothing to explain with the HLA allele frequencies?

      We agree that no published studies have confirmed a higher prevalence of ankylosing spondylitis (AS) in the Faroe Islands. Our recruitment data suggest that AS might be more common than in other European populations, but we understand that this is only based on limited, unpublished observations and what we are hearing from the community. We emphasized in our original manuscript that this is based on observational evidence from the FarGen project. However, as this reviewer pointed out, we can be more clear that this prevalence has not been formally studied.

      In our next revision we will clarify in the text that our recruitment data suggest a higher prevalence of AS may be possible, but more formal epidemiological studies are needed to confirm this observation. The reason we study HLA-B allele frequencies is to see if the genetic background of the Faroese population could help explain this possible difference, since HLA-B27 is already known to play a strong role in AS.

      Reviewer #2 (Public review):

      In this paper, Hamid et al present 40 genomes from the Faroe Islands. They use these data (a pilot study for an anticipated larger-scale sequencing effort) to discuss the population genetic diversity and history of the sample, and the Faroes population. I think this is an overall solid paper; it is overall well-polished and well-written. It is somewhat descriptive (as might be expected for an explorative pilot study), but does make good use of the data.

      The data processing and annotation follows a state-of-the-art protocol, and at least I could not find any evidence in the results that would pinpoint towards bioinformatic issues having substantially biased some of the results, and at least preliminary results lead to the identification of some candidate disease alleles, showing that small, isolated cohorts can be an efficient way to find populations with locally common, but globally rare disease alleles.

      I also enjoyed the population structure analysis in the context of ancient samples, which gives some context to the genetic ancestry of Faroese, although it would have been nice if that could have been quantified, and it is unfortunate that the sampling scheme effectively precludes within-Faroes analyses.

      We note that although the ancestry proportions are not specified in the main text, we did quantify ancestry proportions in the modern Faroese individuals and other ancient samples, and we visualized these proportions in Figure 5 and Supplementary Figures S5 and S6. As stated in our response to Reviewer #1, in our revisions, we will more clearly state the average global ancestry of the various components in the Main Text.

      I am unfortunately quite critical of the selection analysis, both on a statistical level and, more importantly, I do not believe it measures what the authors think it does.

      Major comments:

      (1) Admixture timing/genomic scaling/localization:

      As the authors lay out, the Faroes were likely colonized in the last 1,000-1,500 years, i.e., 40-60 generations ago. That means most genomic processes that have happened on the Faroese should have signatures that are on the order of ~1-2cM, whereas more local patterns likely indicate genetic history predating the colonization of the islands. Yet, the paper seems to be oblivious to this (to me) fascinating and somewhat unique premise. Maybe this thought is wrong, but I think the authors miss a chance here to explain why the reader should care beyond the fact that the small populations might have high-frequency risk alleles and the Faroes are intrinsically interesting, but more importantly, it also makes me think it leads to some misinterpretations in the selection analysis.

      See response to point #3

      (2) ROH:

      Would the sampling scheme impact ROH? How would it deal with individuals with known parental coancestry? As an example of what I mean by my previous comment, 1MB is short enough in that I would expect most/many 1MB ROH-tracts to come from pedigree loops predating the colonization of the Faroes. (i.e, I am actually quite surprised that there isn't much more long ROH, which makes me wonder if that would be impacted by the sampling scheme).

      The sampling scheme was designed to choose 40 Faroese individuals that were representative of the different regions and were minimally related. There were no pairs of third-degree relatives or closer (pi-hat > 0.125) in either the Faroese cohort or the reference populations. It is possible that this sampling scheme would reduce the amount of longer ROHs in the population, but we should still be able to see overall patterns of ROH reflective of bottlenecks in the past tens of generations. Additionally, based on this reviewer's earlier comment, 1 Mb ROHs would still be relevant to demographic events in the last 40-60 generations given that on average 1 cM corresponds to 1 Mb in humans, though we recognize that is not an exact conversion.
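
      The correspondence between segment length and time depth invoked here can be made concrete with a standard approximation: two haplotypes that coalesce in a common ancestor g generations back are separated by 2g meioses, so the expected length of the shared segment is about 100/(2g) cM. The sketch below applies this to the 40-60 generation range discussed above. The function name is hypothetical, and the conversion to megabases relies on the approximate genome-wide average of 1 cM per Mb mentioned in the text.

```python
def expected_ibd_length_cm(generations: float) -> float:
    """Expected length (cM) of a segment shared via an ancestor g generations back."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 100.0 / (2.0 * generations)

for g in (40, 50, 60):
    length_cm = expected_ibd_length_cm(g)
    # Assumes roughly 1 cM per Mb, which is not an exact conversion.
    print(f"g = {g}: about {length_cm:.2f} cM (roughly {length_cm:.2f} Mb)")
```

      Under this approximation, ROH of around 1 Mb fall within the range expected for common ancestors at about the time of settlement, while the length distribution around that expectation is broad.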

      That said, the “sum total amount of the genome contained in long ROH” as we described in the manuscript includes all ROHs greater than 1Mb. Although we group all ROHs longer than 1Mb into one category in the current manuscript, we can look more specifically at the distribution of the longer ROH in future revisions and add discussion into what this might tell us about the timing of bottlenecks. 

      For now, we share a plot of the distribution in ROH lengths across all individuals for each cohort. As this plot shows, there certainly are ROHs longer than 1Mb in the Faroese cohort, and on average there is a higher proportion of long ROH particularly in the 5-15 Mb range in the Faroese cohort relative to the other cohorts.

      Author response image 1.
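
      A per-individual summary by ROH length class, of the kind that could accompany this plot, can be sketched as follows. The 1 Mb lower threshold and the 5-15 Mb range come from the text above; the 15 Mb upper class, the function name, and the example lengths are assumptions of this sketch.

```python
# Length classes in Mb as (lower bound inclusive, upper bound exclusive).
BINS_MB = [(1.0, 5.0), (5.0, 15.0), (15.0, float("inf"))]

def roh_totals_by_bin(roh_lengths_mb):
    """Total ROH length (Mb) per class for one individual; ROH under 1 Mb are ignored."""
    totals = [0.0] * len(BINS_MB)
    for length in roh_lengths_mb:
        for i, (low, high) in enumerate(BINS_MB):
            if low <= length < high:
                totals[i] += length
                break
    return totals

# Hypothetical ROH lengths (Mb) called in one individual.
lengths = [0.4, 1.2, 3.0, 6.5, 12.0, 20.0]
totals = roh_totals_by_bin(lengths)
print("per-class totals (Mb):", totals)
print("total in ROH longer than 1 Mb:", sum(totals))
```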

      (3) Selection scan:

      We are talking about a bottlenecked population that is recently admixed (Faroese), compared to a population (GBR) putatively more closely related to one of its sources. My guess would be that selection in such a scenario would be possibly very hard to detect, and even then, selection signals might not differentiate selection in Faroese vs. GBR, but rather selection/allele frequency differences between different source populations. I think it would be good to spell out why XP-EHH/iHS measures selection at the correct time scale, and how/if these statistics are expected to behave differently in an admixed population.

      The reviewer brings up good points about the utility of classical selection statistics in populations that are admixed or bottlenecked, and whether the timescale at which these statistics detect selection is relevant for understanding the selective history of the Faroese population. We break down these concerns separately.

      (1) Bottlenecks: Recent bottlenecks result in higher LD within a population. However, demographic events such as bottlenecks affect global genomic patterns while positive selection is expected to affect local genomic patterns. For this reason, iHS and XP-EHH statistics are standardized against the genome-wide background, to account for population-specific demographic history.

      (2) Admixture: The term “admixture” has different interpretations depending on the line of inquiry and the populations being studied. Across various time and geographic scales, all human populations are admixed to some degree, as gene flow between groups is a common fixture throughout our history. For example, even the modern British population has “admixed” ancestry from North / West European sources as well, dating to at least as recently as the Medieval & Viking periods (Gretzinger et al. 2022, Leslie et al. 2015), yet we do not commonly consider it an “admixed” population, and we are not typically concerned about applying haplotype-based statistics in this population. This is due to the low divergence between the source populations. In the case of the Faroe Islands, we believe admixture likely occurred on a similar timescale. We see low variance in ancestry proportions estimated by HaploNet, both from the historical Faroese individuals (250BP) and the modern samples. This indicates admixture predating the settlement of the Faroe Islands, where recombination has had time to break up long ancestry tracts and the global ancestry proportions have reached an equilibrium. That is, these ancestry patterns suggest that the modern Faroese are most likely descended from already admixed founders. We mention this as a likely possibility in the main text: “This could have occurred either via a mixture of the original “West Europe” ancestry with individuals of predominantly “North Europe” ancestry, or a by replacement with individuals that were already of mixed ancestry at the time of arrival in the islands (the latter are not uncommon in Viking Age mainland Europe).” And, as with the case of the British population, the closely-related ancestral sources for the Faroese founders were likely not so diverged as to have differences in allele frequencies and long-range haplotypes that would disrupt signals of selection from iHS or XP-EHH.

      (3) Time scale: It is certainly possible, and in fact likely, that iHS measures selection older than the settlement of the Faroe Islands. In our manuscript, we calculated iHS in both the Faroese and the closely related British cohort, and we highlight in the main text that the top signals, with the exception of LCT, are shared between the two cohorts, indicative of selection that began prior to the population split. iHS is a commonly calculated statistic, and it is often calculated in a single population without comparing to others, so we feel it is important to show our result demonstrating these shared selection signals. In future revisions, we will emphasize in the main text that we are not claiming to have identified selection that occurred in the Faroese population post-settlement with the iHS statistic. As for XP-EHH, it is a statistic designed to identify differentiated variants that are fixed or approaching fixation in one population but not others. The time-scale of selection that XP-EHH can detect would therefore depend on the populations used for comparison. As XP-EHH has the best power to identify alleles that are fixed or approaching fixation in one population but not others, it is less likely to detect older selection events / incomplete sweeps from the source populations.

      In our next revision, we will more clearly state limitations of these statistics under various population histories, and clarify the time-scale at which we are detecting selection for iHS vs XP-EHH.
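
      To make the idea of standardizing against the genome-wide background concrete, the following is a minimal sketch of frequency-binned z-scoring, one common convention for iHS. It is an illustration under stated assumptions, not the pipeline used in the manuscript: the function name, the number of bins, and the choice to bin by derived allele frequency are all hypothetical.

```python
import numpy as np


def standardize_by_frequency_bin(raw_scores, derived_freqs, n_bins=20):
    """Z-score raw selection-scan scores within derived-allele-frequency bins.

    Hypothetical helper: each score is centred and scaled using the mean and
    standard deviation of all genome-wide scores in the same frequency bin,
    so that demography-driven shifts shared across the genome are removed.
    """
    raw = np.asarray(raw_scores, dtype=float)
    freqs = np.asarray(derived_freqs, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Use interior edges only, so bin indices run from 0 to n_bins - 1.
    bin_index = np.digitize(freqs, edges[1:-1])

    standardized = np.full(raw.shape, np.nan)
    for b in range(n_bins):
        in_bin = bin_index == b
        if in_bin.sum() < 2:
            continue  # too few variants to estimate a standard deviation
        sd = raw[in_bin].std(ddof=1)
        if sd > 0:
            standardized[in_bin] = (raw[in_bin] - raw[in_bin].mean()) / sd
    return standardized
```

      Because every variant is compared with the genome-wide distribution for its own population, a genome-wide shift in haplotype length (as expected after a bottleneck) moves the background as well as the candidate locus, whereas a local sweep remains an outlier.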

      (4) Similarly, for the discussion of LCT, I am not convinced that the haplotypes depicted here are on the right scale to reflect processes happening on the Faroes. Given the admixture/population history, it at the very least should be discussed in the context of whether the 13910 allele frequency on the Faroes is at odds with what would be expected based on the admixture sources.

      We agree that more investigation into the LCT allele frequency in the other ancient samples may provide some insight into the selection history, particularly in light of ancient admixture. Please note, we did look at the allele frequency of the LCT allele rs4988235 and stated in the main text that it was present at high frequencies in the historical (250BP) Faroese samples. The frequency of this allele in the imputed historical Faroese samples is 82% while the allele is present at ~74% frequency in modern samples. We did not report the exact percentage in the main text because the sample size of the historical samples (11 individuals) is small and coverage of ancient samples is low, leading to potential errors in imputation. However, we can try to calculate the LCT allele frequency in other ancient samples, and assuming that we have good proxies for the sources at the time of admixture, we may calculate the expected allele frequency in the admixed ancestors of the Faroese founders in the next revision.
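
      The two calculations mentioned above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the expected frequency assumes a simple linear mixture of source frequencies weighted by ancestry proportions, and the interval treats imputed genotypes as if they were observed without error, which understates the true uncertainty.

```python
import math


def expected_admixed_frequency(source_freqs, ancestry_proportions):
    """Expected allele frequency in an admixed population.

    Assumes a linear mixture: sum over sources of
    (ancestry proportion) * (allele frequency in that source).
    """
    if len(source_freqs) != len(ancestry_proportions):
        raise ValueError("one ancestry proportion is needed per source")
    if abs(sum(ancestry_proportions) - 1.0) > 1e-9:
        raise ValueError("ancestry proportions must sum to 1")
    return sum(f * a for f, a in zip(source_freqs, ancestry_proportions))


def wilson_interval(allele_count, n_chromosomes, z=1.96):
    """Wilson score interval for a frequency estimated from few chromosomes."""
    if n_chromosomes <= 0:
        raise ValueError("n_chromosomes must be positive")
    p = allele_count / n_chromosomes
    denom = 1.0 + z ** 2 / n_chromosomes
    centre = (p + z ** 2 / (2 * n_chromosomes)) / denom
    half_width = (
        z
        * math.sqrt(p * (1 - p) / n_chromosomes + z ** 2 / (4 * n_chromosomes ** 2))
        / denom
    )
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical source frequencies and ancestry proportions, for illustration.
    print(expected_admixed_frequency([0.60, 0.80], [0.5, 0.5]))
    # 11 diploid individuals give 22 chromosomes; 18 copies is an assumed count
    # chosen only because 18/22 is close to the reported 82%.
    print(wilson_interval(18, 22))
```

      Comparing the observed modern frequency with the expected admixed frequency, while keeping the width of the interval in mind, is one way to ask whether the allele frequency is at odds with the admixture sources.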

      (5) I am lacking information to evaluate the procedure for turning the outliers into p-values. Both iHS and XP-EHH are ratio statistics, meaning they might be heavy-tailed if one is not careful, and the central limit theorem may not apply. It would be much easier (and probably sufficient for the points being made here) to reframe this analysis in terms of empirical outliers.

      Given that there are disagreements on the best approach to reporting selection scan results from the reviewers, in our revision, we can additionally supply both the standardized iHS / XP-EHH values in the supplementary information as well as these values transformed to p-values. As the p-values are derived from the empirical distribution, the “significant” p-values are also empirical outliers from the empirical distribution, so the conclusions of the manuscript do not change. We found that the p-value approach and controlling for FDR is more conservative, with fewer signals reaching “significance” than are considered empirical outliers based on common approaches such as IQR or arbitrary percentile cutoffs.
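
      The relationship between empirical p-values and empirical outliers can be illustrated with a short sketch. It assumes a rank-based, two-sided definition (the fraction of genome-wide scores at least as extreme in absolute value), which may differ from the exact transformation used in the manuscript; the function name and the toy heavy-tailed scores are hypothetical.

```python
import numpy as np


def empirical_p_values(scores):
    """Two-sided rank-based p-values from the genome-wide score distribution.

    For each score, returns the fraction of all scores whose absolute value
    is at least as large. No distributional assumption (such as normality)
    is made, so heavy tails do not invalidate the ranking.
    """
    abs_scores = np.abs(np.asarray(scores, dtype=float))
    sorted_abs = np.sort(abs_scores)
    # Index of the first sorted value >= each score; everything from there on
    # is at least as extreme.
    first_at_least = np.searchsorted(sorted_abs, abs_scores, side="left")
    n_at_least = abs_scores.size - first_at_least
    return n_at_least / abs_scores.size


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_scores = rng.standard_t(df=3, size=10_000)  # heavy-tailed toy data
    p = empirical_p_values(toy_scores)
    print("smallest p-value:", p.min())
```

      Under this definition, thresholding the p-values at a level alpha selects exactly the top alpha fraction of scores by absolute value, so the two ways of reporting differ in presentation rather than in which loci are ranked as most extreme.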

      (6) Oldest individual predating gene flow: It seems impossible to make any statements based on a single individual. Why is it implausible that this person (or their parents), e.g., moved to the Faroes within their lifetime and died there?

      We agree with the reviewer that this is a plausible explanation, and in future revisions we will update the main text to acknowledge this possibility.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Mast cells have previously been reported to play an important role in bacterial immune defense and act protectively in sepsis. However, many of these findings were based on studies using Kit mutant mice. In this study, the authors conducted a detailed investigation using mast cell-deficient Cpa3 Cre-Master mice. As a result, the authors found that the Cpa3 Cre-Master mice exhibited responses similar to wild-type mice in terms of bacterial immune defense. This suggests that the observed phenotype is not due to mast cell-dependent bacterial immune defense, but rather is associated with dysbiosis of the gut microbiota.

      Strengths:

      Mast cells have long been reported to play an important role in the protective response against sepsis, and their function in infection defense has been demonstrated. However, Kit mutant mice have been reported to exhibit impaired peristalsis, and several mast cell-specific genetically modified mouse lines have since been developed and examined in detail. This study presents an important finding by logically demonstrating that the exacerbation of sepsis in Kit mice is due to alterations in the gut microbiota, and that the phenotype previously thought to be mast cell-dependent was, in fact, not.

      In addition, the experiments were carefully designed using mice with matched genetic backgrounds. These findings underscore the importance of microbiota composition in interpreting immune phenotypes and highlight the need for co-housing controls in mutant mouse studies.

      A major strength of this work is the robustness of the CLP data, generated over eight years by three independent researchers across two institutions with large sample sizes, lending strong support to the conclusions.

      Weaknesses:

      The study assesses only a limited subset of gut bacterial species, leaving the extent to which E. coli expansion contributes to the observed phenotype unclear.

      We will add new data based on 16S rRNA sequencing to the revised version.

      Moreover, in the cohousing experiments, there is no evidence provided to confirm successful microbiota normalization between groups.

      We note that co-housing is a generally accepted method for microbiota equalization or conversion (Caruso et al., Cell Rep. 2019, Ridaura et al., Science 2013, and reviewed in Moore et al., Clin. Transl. Immunol. 2016). In any case, Kit<sup>W/Wv</sup> mutants were made resistant to CLP by co-housing. Similar microbiota sequencing results between groups, while useful, would again only be correlative.

      A more detailed analysis of the microbial composition would be necessary to strengthen the reliability of the findings.

      See above

      It is also important to note that Cpa3-deficient mice exhibit not only mast cell depletion but also defects in basophils and T cells. These additional immunological alterations may counterbalance one another, potentially masking phenotypic changes and complicating interpretation.

      Regarding basophils in Cpa3<sup>Cre</sup> mice, compared to wild-type mice, basophils are reduced to about 39% of normal (Feyerabend et al., Immunity 2011). In Kit<sup>W/Wv</sup> mice, compared to wild-type mice, basophils are reduced to about 11% of normal. To our knowledge, there has been no phenotype reported in which a reduction in basophils compensates for the loss of mast cells. Given that Kit<sup>W/Wv</sup> mice have about threefold lower numbers of basophils, and are highly susceptible to sepsis, there is no evidence that a reduction in basophils is protective in mast cell-deficient mice. On the contrary, mice that were normal for mast cells but had their basophils depleted were more susceptible to sepsis (Piliponsky et al., Nat. Immunol. 2019). Hence, basophils appear to be protective, and their reduction increases susceptibility. In light of these data and considerations, there is no evidence for a reduction in basophils to counterbalance the loss of mast cells in Cpa3<sup>Cre</sup> mice.

      Regarding T cells, there is no evidence, and there are no reports, that Cpa3<sup>Cre</sup> mice have defects in T cells (Feyerabend et al., Immunity 2011, Feyerabend et al., Cell Metabolism 2016). Cpa3 is weakly and transiently expressed early in the T cell lineage (Feyerabend et al., Immunity 2009; for expression levels in T cells versus mast cells, see below figure from the Immgen Database). In summary, in contrast to the reviewer's claim, there are no known defects in T cell development or functions in Cpa3<sup>Cre</sup> mice.

      Author response image 1.

      Generated from the Immgen database. Shown are RNAseq gene expression levels of diverse T-cell and mast cell populations.

      Furthermore, it remains to be determined whether the altered gut microbiota observed in Kit<sup>W/Wv</sup> mice is a consequence of impaired intestinal motility, whether a similar phenotype is observed in KitW-sh/W-sh mice, and whether comparable results occur in SCF-deficient models. Addressing these questions would provide greater clarity on the contribution of mast cells versus secondary factors in the observed phenotypes.

      Mice without mast cells (Cpa3<sup>Cre</sup> mice) are as resistant to sepsis as wild-type mice. Hence, mast cells are not involved in the immunity against sepsis, and 'secondary factors' are not involved in this simple experiment (both groups of mice, wild type and Cpa3<sup>Cre</sup> mice, were on the identical genetic background). Second, Kit<sup>W/Wv</sup> mice are also as resistant to sepsis as wild-type mice when confronted with the identical intestinal slurry. Therefore, Kit<sup>W/Wv</sup> mice have no immune deficit in response to sepsis. Hence, in our view, the underlying immunological question regarding the role of mast cells in sepsis has been conclusively addressed by our data. Future studies may address the mechanism that causes dysbiosis in Kit<sup>W/Wv</sup> mice, and other Kit mutants and steel mutants could be examined as well. These questions are, however, unrelated to the role of mast cells in sepsis, or the response of Kit<sup>W/Wv</sup> mice to sepsis, and would therefore not affect the central conclusion of our manuscript ("Susceptibility of Kit-mutant mice to sepsis caused by enteral dysbiosis, not mast cell deficiency").

      Given that Kit<sup>W/Wv</sup> mice exhibit impaired peristalsis, is the observed increase in E. coli a consequence of this dysfunction?

      See above

      Previous studies with BMMC reconstitution experiments have indicated that mast cells are a source of TNF - how does this align with the current findings?

      It is possible that cultured and transplanted mast cells (BMMC) produce TNF. Given that we did not find a reduction in TNF levels in the peritoneal lavage or serum in mice without mast cells undergoing sepsis, under physiological conditions mast cell-derived TNF does not seem to have a measurable impact on total TNF levels.

      Reviewer #2 (Public review):

      Summary:

      This study presents a useful finding that the high susceptibility to CLP sepsis of Kit-mutant mice is not due to mast cell deficiency, but to dysbiosis.

      However, the present data are insufficient and incomplete to support the conclusion, and would benefit from more rigorous approaches. With the mechanism part strengthened, this paper would be of interest to researchers on mast cell biology and mucosal immunology.

      We disagree with this view that our data are insufficient and incomplete. Our results demonstrate that mice lacking mast cells (Cpa3<sup>Cre</sup> mice) are as resistant to sepsis as wild-type mice, indicating that mast cells do not play a detectable role in immunity against sepsis. Additionally, we show that Kit<sup>W/Wv</sup> mice exhibit the same resistance to sepsis as wild-type mice when confronted with the identical intestinal slurry. This finding demonstrates that Kit<sup>W/Wv</sup> mice have no immune deficit in response to sepsis. These central data are both sufficient and complete, given that our data fully address the immunological questions regarding the role of mast cells in sepsis. Our study aimed to investigate the role of mast cells in sepsis, not to examine the mechanisms of dysbiosis or associated pathological phenotypes in Kit mutant controls.

      Recommendations:

      (1) The authors showed that E. coli increases in the cecum of Kit-mutant mice, which causes high CLP susceptibility. However, they did not provide any evidence E. coli is responsible for the high susceptibility.

      We showed that E. coli CFUs were increased in the cecum of Kit-mutant mice, but we did not state that this causes CLP susceptibility. We wrote: 'Hence, Kit<sup>W/Wv</sup> microbiota contains high levels of E. coli, which may underlie the observed pathogenicity'. We demonstrated that intestinal slurry from Kit<sup>W/Wv</sup> mice is more pathogenic compared to intestinal slurry from wild-type mice. However, we did not search for, or identify, the bacterial species that causes this increased pathogenicity because we were addressing the role of mast cells in sepsis.

      In the Figure 3 experiments, the authors administered the same number of cecal bacteria and did not show the number of E. coli after the administration.

      The samples were split: one aliquot was analysed by microbiology and the other aliquot was injected intraperitoneally. Fig. 3d shows the colony forming units (for Lactobacilli and E. coli) from aliquots of cecal slurry used in the intraperitoneal injection experiments shown in Fig. 3a-c. Hence, our data show the colony forming units that were injected into the mice. It is unclear to us why this is not the key information rather than 'the number of E. coli after the administration'.

      The authors should provide evidence showing that depletion of E. coli decreases susceptibility.

      See response to point 1 above.

      (2) The author should provide direct evidence of dysbiosis by, for example, shotgun sequencing of cecal and fecal contents.

      The large increase in E. coli counts in Kit<sup>W/Wv</sup> mice is evidence of dysbiosis. To obtain data beyond classical microbiology, we also performed 16S rRNA sequencing, which will be included in the revision.

      (3) In case the authors find dysbiosis, they should analyze the mechanisms by which Kit mutation causes dysbiosis.

      The mechanism that causes dysbiosis in Kit<sup>W/Wv</sup> mice (which emerged from our work) belongs to other research areas that address the role of Kit in intestinal pathophysiology. These questions are unrelated to the role of mast cells in sepsis, or the response of Kit<sup>W/Wv</sup> mice to sepsis. Regardless of the results of such experiments, the conclusion ("Susceptibility of Kit-mutant mice to sepsis caused by enteral dysbiosis, not mast cell deficiency") remains unaffected. In brief, further explorations of pathological phenotypes of a control mutant will not add to the core message. Along these lines, the review process and the revision shall center on making the core of a paper as conclusive as possible, and not widen a paper by requests 'tangential to the main conclusion' (Kaelin Jr. Nature 2017).

      References

      Caruso, R., Ono, M., Bunker, M. E., Núñez, G. & Inohara, N. Dynamic and Asymmetric Changes of the Microbial Communities after Cohousing in Laboratory Mice. Cell Rep. 27, 3401-3412.e3 (2019).

      Feyerabend, T. B. et al. Deletion of Notch1 Converts Pro-T Cells to Dendritic Cells and Promotes Thymic B Cells by Cell-Extrinsic and Cell-Intrinsic Mechanisms. Immunity 30, 67–79 (2009).

      Feyerabend, T. B. et al. Cre-Mediated Cell Ablation Contests Mast Cell Contribution in Models of Antibody- and T Cell-Mediated Autoimmunity. Immunity 35, 832–844 (2011).

      Feyerabend, T. B., Gutierrez, D. A. & Rodewald, H.-R. Of Mouse Models of Mast Cell Deficiency and Metabolic Syndrome. Cell Metab 24, 1–2 (2016).

      Kaelin Jr, W. G. Publish houses of brick, not mansions of straw. Nature 545, 387–387 (2017).

      Moore, R. J. & Stanley, D. Experimental design considerations in microbiota/inflammation studies. Clin. Transl. Immunol. 5, e92 (2016).

      Piliponsky, A. M. et al. Basophil-derived tumor necrosis factor can enhance survival in a sepsis model in mice. Nat. Immunol. 20, 129–140 (2019).

      Ridaura, V. K. et al. Gut Microbiota from Twins Discordant for Obesity Modulate Metabolism in Mice. Science 341, 1241214 (2013).

    1. Author response:

      Reviewer #1 (Public review):

      Wang et al., recorded concurrent EEG-fMRI in 107 participants during nocturnal NREM sleep to investigate brain activity and connectivity related to slow oscillations (SO), sleep spindles, and in particular their co-occurrence. The authors found SO-spindle coupling to be correlated with increased thalamic and hippocampal activity, and with increased functional connectivity from the hippocampus to the thalamus and from the thalamus to the neocortex, especially the medial prefrontal cortex (mPFC). They concluded the brain-wide activation pattern to resemble episodic memory processing, but to be dissociated from task-related processing and suggest that the thalamus plays a crucial role in coordinating the hippocampal-cortical dialogue during sleep.

      The paper offers an impressively large and highly valuable dataset that provides the opportunity for gaining important new insights into the network substrate involved in SOs, spindles, and their coupling. However, the paper does unfortunately not exploit the full potential of this dataset with the analyses currently provided, and the interpretation of the results is often not backed up by the results presented. I have the following specific comments.

      Thank you for your thoughtful and constructive feedback. We greatly appreciate your recognition of the strengths of our dataset and findings. Below, we address your specific comments and provide responses to each point you raised to ensure our methods and results are as transparent and comprehensible as possible. We hope these revisions address your comments and further strengthen our manuscript.

      (1) The introduction is lacking sufficient review of the already existing literature on EEG-fMRI during sleep and the BOLD-correlates of slow oscillations and spindles in particular (Laufs et al., 2007; Schabus et al., 2007; Horovitz et al., 2008; Laufs, 2008; Czisch et al., 2009; Picchioni et al., 2010; Spoormaker et al., 2010; Caporro et al., 2011; Bergmann et al., 2012; Hale et al., 2016; Fogel et al., 2017; Moehlman et al., 2018; Ilhan-Bayrakci et al., 2022). The few studies mentioned are not discussed in terms of the methods used or insights gained.

      We acknowledge the need for a more comprehensive review of prior EEG-fMRI studies investigating BOLD correlates of slow oscillations and spindles. However, not all of these articles concern sleep SOs or spindles. Some (Hale et al., 2016; Horovitz et al., 2008; Laufs, 2008; Laufs, Walker, & Lund, 2007; Spoormaker et al., 2010) mainly focus on methodology for EEG-fMRI, sleep stages, or brain networks, which are not the focus of our study. Thank you again for your attention to the comprehensiveness of our literature review. We will expand the introduction to include a more detailed discussion of the existing literature, ensuring that the contributions of previous EEG-fMRI sleep studies are adequately acknowledged.

      Introduction, Page 4 Lines 62-76

      “Investigating these sleep-related neural processes in humans is challenging because it requires tracking transient sleep rhythms while simultaneously assessing their widespread brain activation. Recent advances in simultaneous EEG-fMRI techniques provide a unique opportunity to explore these processes. EEG allows for precise event-based detection of neural signal, while fMRI provides insight into the broader spatial patterns of brain activation and functional connectivity (Horovitz et al., 2008; Huang et al., 2024; Laufs, 2008; Laufs, Walker, & Lund, 2007; Schabus et al., 2007; Spoormaker et al., 2010). Previous EEG-fMRI studies on sleep have focused on classifying sleep stages or examining the neural correlates of specific waves (Bergmann et al., 2012; Caporro et al., 2012; Czisch et al., 2009; Fogel et al., 2017; Hale et al., 2016; Ilhan-Bayrakcı et al., 2022; Moehlman et al., 2019; Picchioni et al., 2011). These studies have generally reported that slow oscillations are associated with widespread cortical and subcortical BOLD changes, whereas spindles elicit activation in the thalamus, as well as in several cortical and paralimbic regions. Although these findings provide valuable insights into the BOLD correlates of sleep rhythms, they often do not employ sophisticated temporal modeling (Huang et al., 2024), to capture the dynamic interactions between different oscillatory events, e.g., the coupling between SOs and spindles.”

      (2) The paper falls short in discussing the specific insights gained into the neurobiological substrate of the investigated slow oscillations, spindles, and their interactions. The validity of the inverse inference approach ("Open ended cognitive state decoding"), assuming certain cognitive functions to be related to these oscillations because of the brain regions/networks activated in temporal association with these events, is debatable at best. It is also unclear why eventually only episodic memory processing-like brain-wide activation is discussed further, despite the activity of 16 of 50 feature terms from the NeuroSynth v3 dataset were significant (episodic memory, declarative memory, working memory, task representation, language, learning, faces, visuospatial processing, category recognition, cognitive control, reading, cued attention, inhibition, and action).

      Thank you for pointing this out, particularly regarding the use of inverse inference approaches such as “open-ended cognitive state decoding.” Given the concerns about the indirectness of this approach, we decided to remove its related content and results from Figure 3 in the main text and include it in Supplementary Figure 7. We will refocus the main text on direct neurobiological insights gained from our EEG-fMRI analyses, particularly emphasizing the hippocampal-thalamocortical network dynamics underlying SO-spindle coupling, and we will acknowledge the exploratory nature of these findings and highlight their limitations.

      Discussion, Page 17-18 Lines 323-332

      “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      (3) Hippocampal activation during SO-spindles is stated as a main hypothesis of the paper - for good reasons - however, other regions (e.g., several cortical as well as thalamic) would be equally expected given the known origin of both oscillations and the existing sleep-EEG-fMRI literature. However, this focus on the hippocampus contrasts with the focus on investigating the key role of the thalamus instead in the Results section.

      We appreciate your insight regarding the relative emphasis on hippocampal and thalamic activation in our study. We recognize that the manuscript may currently present an inconsistency between our initial hypothesis and the main focus of the results. To address this concern, we will ensure that our Introduction and Discussion sections explicitly discuss both regions, highlighting the complementary roles of the hippocampus (memory processing and reactivation) and the thalamus (spindle generation and cortico-hippocampal coordination) in SO-spindle dynamics.

      Introduction, Page 5 Lines 87-103

      “To address this gap, our study investigates brain-wide activation and functional connectivity patterns associated with SO-spindle coupling, and employs a cognitive state decoding approach (Margulies et al., 2016; Yarkoni et al., 2011)—albeit indirectly—to infer potential cognitive functions. In the current study, we used simultaneous EEG-fMRI recordings during nocturnal naps (detailed sleep staging results are provided in the Methods and Table S1) in 107 participants. Although directly detecting hippocampal ripples using scalp EEG or fMRI is challenging, we expected that hippocampal activation in fMRI would coincide with SO-spindle coupling detected by EEG, given that SOs, spindles, and ripples frequently co-occur during NREM sleep. We also anticipated a critical role of the thalamus, particularly thalamic spindles, in coordinating hippocampal-cortical communication.

      We found significant coupling between SOs and spindles during NREM sleep (N2/3), with spindle peaks occurring slightly before the SO peak. This coupling was associated with increased activation in both the thalamus and hippocampus, with functional connectivity patterns suggesting thalamic coordination of hippocampal-cortical communication. These findings highlight the key role of the thalamus in coordinating hippocampal-cortical interactions during human sleep and provide new insights into the neural mechanisms underlying sleep-dependent brain communication. A deeper understanding of these mechanisms may contribute to future neuromodulation approaches aimed at enhancing sleep-dependent cognitive function and treating sleep-related disorders.”

      Discussion, Page 16-17 Lines 292-307

      “When modeling the timing of these sleep rhythms in the fMRI, we observed hippocampal activation selectively during SO-spindle events. This suggests the possibility of triple coupling (SOs–spindles–ripples), even though our scalp EEG was not sufficiently sensitive to detect hippocampal ripples—key markers of memory replay (Buzsáki, 2015). Recent iEEG evidence indicates that ripples often co-occur with both spindles (Ngo, Fell, & Staresina, 2020) and SOs (Staresina et al., 2015; Staresina et al., 2023). Therefore, the hippocampal involvement during SO-spindle events in our study may reflect memory replay from the hippocampus, propagated via thalamic spindles to distributed cortical regions.

      The thalamus, known to generate spindles (Halassa et al., 2011), plays a key role in producing and coordinating sleep rhythms (Coulon, Budde, & Pape, 2012; Crunelli et al., 2018), while the hippocampus is found essential for memory consolidation (Buzsáki, 2015; Diba & Buzsáki, 2007; Singh, Norman, & Schapiro, 2022). The increased hippocampal and thalamic activity, along with strengthened connectivity between these regions and the mPFC during SO-spindle events, underscores a hippocampal-thalamic-neocortical information flow. This aligns with recent findings suggesting the thalamus orchestrates neocortical oscillations during sleep (Schreiner et al., 2022). The thalamus and hippocampus thus appear central to memory consolidation during sleep, guiding information transfer to the neocortex, e.g., mPFC.”

      (4) The study included an impressive number of 107 subjects. It is surprising though that only 31 subjects had to be excluded under these difficult recording conditions, especially since no adaptation night was performed. Since only subjects were excluded who slept less than 10 min (or had excessive head movements) there are likely several datasets included with comparably short durations and only a small number of SOs and spindles and even less combined SO-spindle events. A comprehensive table should be provided (supplement) including for each subject (included and excluded) the duration of included NREM sleep, number of SOs, spindles, and SO+spindle events. Also, some descriptive statistics (mean/SD/range) would be helpful.

      We appreciate your recognition of our sample size and the challenges associated with simultaneous EEG-fMRI sleep recordings. We acknowledge the importance of transparently reporting individual subject data, particularly regarding sleep duration and the number of detected SOs, spindles, and SO-spindle events. To address this, we will provide comprehensive tables in the supplementary materials, containing descriptive information about sleep-related characteristics (Table S1) as well as detailed information about sleep waves at each sleep stage for all 107 subjects (Tables S2-S4), listing for each subject: (1) duration of each sleep stage; (2) number of detected SOs; (3) number of detected spindles; (4) number of detected SO-spindle coupling events; (5) density of detected SOs; (6) density of detected spindles; (7) density of detected SO-spindle coupling events.

      However, most of the excluded participants were unable to fall asleep or slept too briefly, so they had essentially no NREM sleep. For these participants it was therefore impossible to report NREM sleep duration or the numbers of SOs, spindles, and coupling events.

      Supplementary Materials, Page 42-54, Table S1-S4

      (Given their length, we do not list all the tables here. Please refer to the revised manuscript.)
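
      The per-subject quantities requested above (event counts, densities, and descriptive statistics) can be derived as in the following sketch. It is illustrative only: the record layout, field names, and the definition of density as events per minute of NREM sleep are assumptions, not the format of Tables S1-S4.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class SubjectSleepEvents:
    """Hypothetical per-subject record of NREM duration and event counts."""
    subject_id: str
    nrem_minutes: float
    n_so: int
    n_spindle: int
    n_coupled: int


def event_densities(record):
    """Events per minute of NREM sleep, or None if there was no NREM sleep."""
    if record.nrem_minutes <= 0:
        return None  # density is undefined without any NREM sleep
    return {
        "so_per_min": record.n_so / record.nrem_minutes,
        "spindle_per_min": record.n_spindle / record.nrem_minutes,
        "coupled_per_min": record.n_coupled / record.nrem_minutes,
    }


def describe(values):
    """Mean, standard deviation, and range for a list of at least two values."""
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "min": min(values),
        "max": max(values),
    }
```

      Returning `None` for a subject without NREM sleep mirrors the point made above: for participants who did not fall asleep, these quantities cannot be computed rather than being zero.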

      (5) Was the 20-channel head coil dedicated for EEG-fMRI measurements? How were the electrode cables guided through/out of the head coil? Usually, the 64-channel head coil is used for EEG-fMRI measurements in a Siemens PRISMA 3T scanner, which has a cable duct at the back that allows to guide the cables straight out of the head coil (to minimize MR-related artifacts). The choice for the 20-channel head coil should be motivated. Photos of the recording setup would also be helpful.

      Thank you for your comment regarding our choice of the 20-channel head coil for EEG-fMRI measurements. We acknowledge that the 64-channel head coil is commonly used in Siemens PRISMA 3T scanners; however, the 20-channel coil was selected due to specific practical and technical considerations in our study. In particular, the 20-channel head coil was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil allowed us to maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.

      We have made this clearer in the revised manuscript.

      Methods, Page 20 Lines 385-392

      “All MRI data were acquired using a 20-channel head coil on a research-dedicated 3-Tesla Siemens Magnetom Prisma MRI scanner. Earplugs and cushions were provided for noise protection and head motion restriction. We chose the 20-channel head coil because it was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil helped maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.”

      (6) Was the EEG sampling synchronized to the MR scanner (gradient system) clock (the 10 MHz signal; not referring to the volume TTL triggers here)? This is a requirement for stable gradient artifact shape over time and thus accurate gradient noise removal.

      Thank you for raising this important point. We confirm that the EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This synchronization was achieved using the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift. As a result, the gradient artifact waveform remained stable across volumes, allowing for more effective artifact correction during preprocessing. We appreciate your attention to this critical aspect of EEG-fMRI data acquisition.

      We have made this clearer in the revised manuscript.

      Methods, Page 19-20 Lines 371-383

      “EEG was recorded simultaneously with fMRI data using an MR-compatible EEG amplifier system (BrainAmps MR-Plus, Brain Products, Germany), along with a specialized electrode cap. The recording was done using 64 channels in the international 10/20 system, with the reference channel positioned at FCz. In order to adhere to polysomnography (PSG) recording standards, six electrodes were removed from the EEG cap: one for electrocardiogram (ECG) recording, two for electrooculogram (EOG) recording, and three for electromyogram (EMG) recording. EEG data was recorded at a sample rate of 5000 Hz, the resistance of the reference and ground channels was kept below 10 kΩ, and the resistance of the other channels was kept below 20 kΩ. To synchronize the EEG and fMRI recordings, the BrainVision recording software (BrainProducts, Germany) was utilized to capture triggers from the MRI scanner. The EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This was achieved via the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift.”
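
      The benefit of clock synchronization described above can be illustrated with a toy simulation. The sketch below is only an illustration under stated assumptions: the artifact waveform, the number of volumes, the 10 ppm clock error, and the use of simple template averaging are all hypothetical and do not represent the preprocessing pipeline used in the study. Only the 5000 Hz sampling rate and the 2000 ms TR are taken from the Methods text.

```python
import math

FS = 5000.0       # nominal EEG sampling rate in Hz (from the Methods text)
TR = 2.0          # repetition time in s (TR = 2000 ms in the sequence specification)
N_VOLUMES = 20    # hypothetical number of volumes for this toy example
SAMPLES_PER_TR = int(FS * TR)  # 10000 EEG samples per fMRI volume


def toy_artifact(t):
    """Hypothetical artifact waveform that repeats exactly once per TR."""
    phase = (t % TR) / TR
    return math.sin(2 * math.pi * 17 * phase) + 0.5 * math.sin(2 * math.pi * 250 * phase)


def residual_rms(relative_clock_error):
    """RMS left after subtracting the across-volume mean template."""
    fs_actual = FS * (1.0 + relative_clock_error)
    # Cut the recording into epochs of the nominal sample count per volume.
    epochs = [
        [toy_artifact((v * SAMPLES_PER_TR + k) / fs_actual) for k in range(SAMPLES_PER_TR)]
        for v in range(N_VOLUMES)
    ]
    template = [sum(col) / N_VOLUMES for col in zip(*epochs)]
    sq = sum((x - m) ** 2 for ep in epochs for x, m in zip(ep, template))
    return math.sqrt(sq / (N_VOLUMES * SAMPLES_PER_TR))


rms_synced = residual_rms(0.0)      # EEG clock locked to the scanner clock
rms_drifting = residual_rms(1e-5)   # hypothetical 10 ppm free-running clock error
print(rms_synced, rms_drifting)
```

      With a locked clock, every volume is sampled at the same artifact phases, so the averaged template cancels the artifact almost exactly. With even a small drift, the sampling points slide relative to the artifact and a residual remains after subtraction.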

      (7) The TR is quite long and the voxel size is quite large in comparison to state-of-the-art EPI sequences. What was the rationale behind choosing a sequence with relatively low temporal and spatial resolution?

      We acknowledge that our chosen TR and voxel size are relatively long and large compared to state-of-the-art EPI sequences. This decision was made to optimize the signal-to-noise ratio (SNR) and reduce susceptibility-related distortions, which are particularly critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. A longer TR allowed us to sample whole-brain activity with sufficient coverage, while a larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures such as the thalamus and hippocampus, which are key regions of interest in our study. We appreciate your concern and hope this clarification provides sufficient rationale for our sequence parameters.

      We have made this clearer in the revised manuscript.

      Methods, Page 20-21 Lines 398-408

      “Then, the “sleep” session began after the participants were instructed to try and fall asleep. For the functional scans, whole-brain images were acquired using k-space and steady-state T2*-weighted gradient echo-planar imaging (EPI) sequence that is sensitive to the BOLD contrast. This measures local magnetic changes caused by changes in blood oxygenation that accompany neural activity (sequence specification: 33 slices in interleaved ascending order, TR = 2000 ms, TE = 30 ms, voxel size = 3.5 × 3.5 × 4.2 mm<sup>3</sup>, FA = 90°, matrix = 64 × 64, gap = 0.7 mm). A relatively long TR and larger voxel size were chosen to optimize SNR and reduce susceptibility-related distortions, which are critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. The longer TR allowed whole-brain coverage with sufficient temporal resolution, while the larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures (e.g., the thalamus and hippocampus), which are key regions of interest in this study.”

      (8) The anatomically defined ROIs are quite large. It should be elaborated on how this might reduce sensitivity to sleep rhythm-specific activity within sub-regions, especially for the thalamus, which has distinct nuclei involved in sleep functions.

      We appreciate your insight regarding the use of anatomically defined ROIs and their potential limitations in detecting sleep rhythm-specific activity within sub-regions, particularly in the thalamus. Given the distinct functional roles of thalamic nuclei in sleep processes, we acknowledge that using a single, large thalamic ROI may reduce sensitivity to localized activity patterns. To address this, we will discuss this limitation in the revised manuscript, acknowledging that our approach prioritizes whole-structure effects but may not fully capture nucleus-specific contributions.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (9) The study reports SO & spindle amplitudes & densities, as well as SO+spindle coupling, to be larger during N2/3 sleep compared to N1 and REM sleep, which is trivial but can be seen as a sanity check of the data. However, the amount of SOs and spindles reported for N1 and REM sleep is concerning, as per definition there should be hardly any (if SOs or spindles occur in N1 it becomes by definition N2, and the interval between spindles has to be considerably large in REM to still be scored as such). Thus, on the one hand, the report of these comparisons takes too much space in the main manuscript as it is trivial, but on the other hand, it raises concerns about the validity of the scoring.

      We appreciate your concern regarding the reported presence of SOs and spindles in N1 and REM sleep and its potential implications. Our methods for detecting SOs, spindles, and their coupling were originally designed only for N2 and N3 sleep data, based on the characteristics of those stages, and they are widely recognized and used in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because the SO and spindle detection methods are percentile-based, they will always detect a certain number of events when applied to data from other stages (N1 and REM), and how these events differ from those detected in N2/N3 remains unclear. We will acknowledge the reasons for these results in the Methods section and emphasize that they are used only for sanity checks.

      Methods, Page 25 Lines 515-524

      “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”
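
      The point that a percentile-based criterion always returns events, whatever the stage, can be shown with a minimal sketch. The 75th-percentile cutoff, the candidate counts, and the simulated amplitude distributions below are assumptions chosen for illustration only; they are not the detection parameters or data of the study.

```python
import random


def detect_events(amplitudes, percentile=75.0):
    """Keep candidates whose amplitude exceeds a percentile of the same data."""
    ranked = sorted(amplitudes)
    cutoff_index = min(int(len(ranked) * percentile / 100.0), len(ranked) - 1)
    threshold = ranked[cutoff_index]
    events = [a for a in amplitudes if a > threshold]
    return events, threshold


rng = random.Random(0)
# Hypothetical candidate amplitudes (in microvolts) for two stages:
# large oscillations in N2/N3, small background fluctuations in N1.
candidates_n23 = [abs(rng.gauss(0.0, 40.0)) for _ in range(1000)]
candidates_n1 = [abs(rng.gauss(0.0, 10.0)) for _ in range(1000)]

events_n23, thr_n23 = detect_events(candidates_n23)
events_n1, thr_n1 = detect_events(candidates_n1)

# The same share of candidates is flagged in both stages,
# even though the amplitude threshold itself differs widely.
print(len(events_n23), len(events_n1))
print(thr_n23 > thr_n1)
```

      Because the threshold adapts to whatever data it is given, the event count alone cannot indicate whether genuine SOs or spindles are present; the absolute threshold and event amplitudes carry that information.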

      (10) Why was electrode F3 used to quantify the occurrence of SOs and spindles? Why not a midline frontal electrode like Fz (or a number of frontal electrodes for SOs) and Cz (or a number of centroparietal electrodes) for spindles to be closer to their maximum topography?

      We appreciate your suggestion regarding electrode selection for SO and spindle quantification. Our choice of F3 was primarily based on previous studies (Massimini et al., 2004; Molle et al., 2011), where bilateral frontal electrodes are commonly used for detecting SOs and spindles. Additionally, we considered the impact of MRI-related noise and, after a comprehensive evaluation, determined that F3 provided an optimal balance between signal quality and artifact minimization. We also acknowledge that alternative electrode choices, such as Fz for SOs and Cz for spindles, could provide additional insights into their topographical distributions.

      (11) Functional connectivity (hippocampus -> thalamus -> cortex (mPFC)) is reported to be increased during SO-spindle coupling and interpreted as evidence for coordination of hippocampo-neocortical communication likely by thalamic spindles. However, functional connectivity was only analysed during coupled SO+spindle events, not during isolated SOs or isolated spindles. Without the direct comparison of the connectivity patterns between these three events, it remains unclear whether this is specific for coupled SO+spindle events or rather associated with one or both of the other isolated events. The PPIs need to be conducted for those isolated events as well and compared statistically to the coupled events.

      We appreciate your critical perspective on our functional connectivity analysis and the interpretation of hippocampus-thalamus-cortex (mPFC) interactions during SO-spindle coupling. We acknowledge that, in the current analysis, functional connectivity was only examined during coupled SO-spindle events, without direct comparison to isolated SOs or isolated spindles. To address this concern, we have conducted PPI analyses for all three ROIs (hippocampus, thalamus, mPFC) and all three event types (SO-spindle couplings, isolated SOs, and isolated spindles). Our results indicate that neither isolated SOs nor isolated spindles yielded significant connectivity changes in any of the three ROIs, as none survived multiple comparison correction. This suggests that the observed connectivity increase is specific to SO-spindle coupling, rather than being independently driven by either SOs or spindles alone.

      Results, Page 14 Lines 248-255

      “Crucially, the interaction between FC and SO-spindle coupling revealed that only the functional connectivity of hippocampus -> thalamus (ROI analysis, t<sub>(106)</sub> = 1.86, p = 0.0328) and thalamus -> mPFC (ROI analysis, t<sub>(106)</sub> = 1.98, p = 0.0251) significantly increased during SO-spindle coupling, with no significant changes in all other pathways (Fig. 4e). We also conducted PPI analyses for the other two events (SOs and spindles), and neither yielded significant connectivity changes in the three ROIs, as all failed to survive whole-brain FWE correction at the cluster level (p < 0.05). Together, these findings suggest that the thalamus, likely via spindles, coordinates hippocampal-cortical communication selectively during SO-spindle coupling, but not isolated SOs or spindle events alone.”

      (12) The limited temporal resolution of fMRI does indeed not allow for easily distinguishing between fMRI activation patterns related to SO-up- vs. SO-down-states. For this, one could try to extract the amplitudes of SO-up- and SO-down-states separately for each SO event and model them as two separate parametric modulators (with the risk of collinearity as they are likely correlated).

      We appreciate your insightful comment regarding the challenge of distinguishing fMRI activation patterns related to SO-up vs. SO-down states due to the limited temporal resolution of fMRI. While our current analysis does not differentiate between these two phases, we acknowledge that separately modeling SO-up and SO-down states using parametric modulators could provide a more refined understanding of their distinct neural correlates. However, as you note, this approach carries the risk of collinearity, and there is indeed a high correlation between the two amplitudes across all subjects in our results (r = 0.98). Future studies could explore this question further using high-temporal-resolution techniques. While implementing this in the current study is beyond our scope, we will acknowledge this limitation in the Discussion section.
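
      To give a sense of scale for this collinearity, the reported correlation can be converted into a variance inflation factor (VIF). The sketch below assumes the simplest case of a model containing only the two correlated modulators plus an intercept, where VIF = 1 / (1 - r^2); a full fMRI design matrix contains further regressors, so this is an illustration rather than the study's actual design diagnostics.

```python
import math


def variance_inflation_factor(r):
    """VIF for one of two predictors whose Pearson correlation is r."""
    return 1.0 / (1.0 - r ** 2)


r_up_down = 0.98  # correlation between SO-up and SO-down amplitudes reported above
vif = variance_inflation_factor(r_up_down)
se_inflation = math.sqrt(vif)  # factor by which coefficient standard errors grow

print(round(vif, 2))           # 25.25
print(round(se_inflation, 2))  # 5.03
```

      Under this simplified assumption, the standard errors of the two modulator estimates would be roughly five times larger than for uncorrelated regressors, which is why separating the two states is statistically difficult with these data.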

      Discussion, Page 17 Lines 308-322

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (13) L327: "It is likely that our findings of diminished DMN activity reflect brain activity during the SO DOWN-state, as this state consistently shows higher amplitude compared to the UP-state within subjects, which is why we modelled the SO trough as its onset in the fMRI analysis." This conclusion is not justified as the fact that SO down-states are larger in amplitude does not mean their impact on the BOLD response is larger.

      We appreciate your concern regarding our interpretation of diminished DMN activity as reflecting the SO down-state. We acknowledge that the current wording is somewhat misleading. Our interpretation is that the effect could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained when modelled at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from the DOWN-state. We will make this clear in the Discussion section.

      Discussion, Page 17 Lines 308-322

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”

      (14) Line 77: "In the current study, while directly capturing hippocampal ripples with scalp EEG or fMRI is difficult, we expect to observe hippocampal activation in fMRI whenever SOs-spindles coupling is detected by EEG, if SOs- spindles-ripples triple coupling occurs during human NREM sleep". Not all SO-spindle events are associated with ripples (Staresina et al., 2015), but hippocampal activation may also be expected based on the occurrence of spindles alone (Bergmann et al., 2012).

      We appreciate your clarification regarding the relationship between SO-spindle coupling and hippocampal ripples. We acknowledge that not all SO-spindle events are necessarily accompanied by ripples (Staresina et al., 2015). However, previous research indicates that hippocampal ripples are significantly more likely to occur during SO-spindle coupling events. This suggests that while ripple occurrence is not guaranteed, SO-spindle coupling creates a favorable network state for ripple generation and potential hippocampal activation. To ensure accuracy, we will delete the misleading sentence from the Introduction and acknowledge in the Discussion that our data do not allow us to directly observe the triple coupling of SOs, spindles, and hippocampal ripples.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      Reviewer #2 (Public review):

      In this study, Wang and colleagues aimed to explore brain-wide activation patterns associated with NREM sleep oscillations, including slow oscillations (SOs), spindles, and SO-spindle coupling events. Their findings reveal that SO-spindle events corresponded with increased activation in both the thalamus and hippocampus. Additionally, they observed that SO-spindle coupling was linked to heightened functional connectivity from the hippocampus to the thalamus, and from the thalamus to the medial prefrontal cortex, three key regions involved in memory consolidation and episodic memory processes.

      This study's findings are timely and highly relevant to the field. The authors' extensive data collection, involving 107 participants sleeping in an fMRI while undergoing simultaneous EEG recording, deserves special recognition. If shared, this unique dataset could lead to further valuable insights. While the conclusions of the data seem overall well supported by the data, some aspects with regard to the detection of sleep oscillations need clarification.

      The authors report that coupled SO-spindle events were most frequent during NREM sleep (2.46 ± 0.06 events/min), but they also observed a surprisingly high occurrence of these events during N1 and REM sleep (2.23 ± 0.09 and 2.32 ± 0.09 events/min, respectively), where SO-spindle coupling would not typically be expected. Combined with the relatively modest SO amplitudes reported (~25 µV, whereas >75 µV would be expected when using mastoids as reference electrodes), this raises the possibility that the parameters used for event detection may not have been conservative enough - or that sleep staging was inaccurately performed. This issue could present a significant challenge, as the fMRI findings are largely dependent on the reliability of these detected events.

      Thank you very much for your thorough and encouraging review. We appreciate your recognition of the significance and relevance of our study and dataset, particularly in highlighting how simultaneous EEG-fMRI recordings can provide complementary insights into the temporal dynamics of neural oscillations and their associated spatial activation patterns during sleep. In the sections that follow, we address each of your comments in detail. We have revised the text and conducted additional analyses wherever possible to strengthen our argument and clarify our methodological choices. We believe these revisions improve the clarity and rigor of our work, and we thank you for helping us refine it.

      We appreciate your insightful comments regarding the detection of sleep oscillations. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM. We will acknowledge the reasons for these results in the Methods section and emphasize that they are used only for sanity checks.

      Regarding the reported SO amplitudes (~25 µV), during preprocessing, we applied the Signal Space Projection (SSP) method to more effectively remove MRI gradient artifacts and cardiac pulse noise. While this approach enhances data quality, it also reduces overall signal power, leading to systematically lower reported amplitudes. Despite this, the SOs we detected in NREM sleep (especially N2/N3) remain physiologically meaningful and are consistent with previous fMRI studies using similar artifact removal techniques. We appreciate your careful evaluation and valuable suggestions.

      In addition, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics (Table S1), as well as detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: (1) duration of each sleep stage; (2) number of detected SOs; (3) number of detected spindles; (4) number of detected SO-spindle coupling events; (5) density of detected SOs; (6) density of detected spindles; (7) density of detected SO-spindle coupling events.

      Methods, Page 25 Lines 515-524

      “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”

      Supplementary Materials, Page 42-54, Table S1-S4

      (Given their length, we do not list all the tables here. Please refer to the revised manuscript.)

      Reviewer #3 (Public review):

      Summary:

      Wang et al., examined the brain activity patterns during sleep, especially when locked to those canonical sleep rhythms such as SO, spindle, and their coupling. Analyzing data from a large sample, the authors found significant coupling between spindles and SOs, particularly during the upstate of the SO. Moreover, the authors examined the patterns of whole-brain activity locked to these sleep rhythms. To understand the functional significance of these brain activities, the authors further conducted open-ended cognitive state decoding and found a variety of cognitive processing may be involved during SO-spindle coupling and during other sleep events. The authors next investigated the functional connectivity analyses and found enhanced connectivity between the hippocampus, the thalamus, and the medial PFC. These results reinforced the theoretical model of sleep-dependent memory consolidation, such that SO-spindle coupling is conducive to systems-level memory reactivation and consolidation.

      Strengths:

      There are obvious strengths in this work, including the large sample size, state-of-the-art neuroimaging and neural oscillation analyses, and the richness of results.

      Weaknesses:

      Despite these strengths and the insights gained, there are weaknesses in the design, the analyses, and inferences.

      Thank you for your detailed and thoughtful review of our manuscript. We are delighted that you recognize the large sample size, the advanced neuroimaging and neural oscillation analyses, and the richness of the results. In the following sections, we provide detailed responses to each of your comments. We have revised the text and conducted additional analyses to strengthen our arguments and clarify our methodological choices. We believe these revisions enhance the clarity and rigor of our work, and we sincerely appreciate your thoughtful feedback in helping us refine the manuscript.

      (1) A repeating statement in the manuscript is that brain activity could indicate memory reactivation and thus consolidation. This is indeed a highly relevant question that could be informed by the current data/results. However, an inherent weakness of the design is that there is no memory task before and after sleep. Thus, it is difficult (if not impossible) to make a strong argument linking SO/spindle/coupling-locked brain activity with memory reactivation or consolidation.

      We appreciate your suggestion regarding the lack of a pre- and post-sleep memory task in our study design. We acknowledge that, in the absence of behavioral measures, it is hard to directly link SO-spindle coupling to memory consolidation in an outcome-driven manner. Our interpretation is instead based on the well-established role of these oscillations in memory processes, as demonstrated in previous studies. We sincerely appreciate this feedback and will adjust our Discussion accordingly to reflect a more precise interpretation of our findings.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (2) Relatedly, to understand the functional implications of the sleep rhythm-locked brain activity, the authors employed the "open-ended cognitive state decoding" method. While this method is interesting, it is rather indirect given that there were no behavioral indices in the manuscript. Thus, discussions based on these analyses are speculative at best. Please either tone down the language or find additional evidence to support these claims.

      Moreover, the results from this method are difficult to understand. Figure 3e showed that for all three types of sleep events (SO, spindle, SO-spindle), the same mental states (e.g., working memory, episodic memory, declarative memory) showed opposite directions of activation (left and right panels showed negative and positive activation, respectively). How to interpret these conflicting results? This ambiguity is also reflected by the term used: declarative memory and episodic memories are both indexed in the results. Yet these two processes can be largely overlapped. So which specific memory processes do these brain activity patterns reflect? The Discussion shall discuss these results and the limitations of this method.

      We appreciate your critical assessment of the open-ended cognitive state decoding method and its interpretational challenges. Given the concerns about the indirectness of this approach, we decided to remove its related content and results from Figure 3 in the main text and include it in Supplementary Figure 7.

      Due to the complexity of memory-related processes, we acknowledge that distinguishing between episodic and declarative memory based solely on this approach is not straightforward. We will revise the Supplementary Materials to explicitly discuss these limitations and clarify that our findings do not isolate specific cognitive processes but rather suggest general associations with memory-related networks.

      Discussion, Page 17-18 Lines 323-332

      “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      (3) The coupling strength is somehow inconsistent with prior results (Hahn et al., 2020, eLife, Helfrich et al., 2018, Neuron). Specifically, Helfrich et al. showed that among young adults, the spindle is coupled to the peak of the SO. Here, the authors reported that the spindles were coupled to down-to-up transitions of SO and before the SO peak. It is possible that participants' age may influence the coupling (see Helfrich et al., 2018). Please discuss the findings in the context of previous research on SO-spindle coupling.

      We appreciate your concern regarding the temporal characteristics of SO-spindle coupling. We acknowledge that the SO-spindle coupling phase results in our study are not identical to those reported by Hahn et al. (2020) and Helfrich et al. (2018). However, these differences may arise from slight variations in event detection parameters, which can influence the precise phase estimation of coupling. Notably, Hahn et al. (2020) also reported slight discrepancies in their group-level coupling phase results, highlighting that methodological differences can contribute to variability across studies. Furthermore, our findings are consistent with those of Schreiner et al. (2021), further supporting the robustness of our observations.

      That said, we acknowledge that our original description of SO-spindle coupling as occurring at the "transition from the lower state to the upper state" was not entirely precise. The -π/2 phase represents the true transition point, while our observed coupling phase is actually closer to the SO peak rather than strictly at the transition. We will revise this statement in the manuscript to ensure clarity and accuracy in describing the coupling phase.
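
      The phase convention discussed above can be made concrete with a minimal sketch. It assumes a cosine convention in which 0 is the SO peak, ±π is the trough, and -π/2 is the down-to-up transition. The function names and the example phases are hypothetical illustration values, not data or code from the study.

```python
import cmath
import math

def circular_mean(phases):
    """Return the mean direction (radians, in [-pi, pi]) of a list of phase angles."""
    if not phases:
        raise ValueError("need at least one phase value")
    # Average the unit vectors, then take the angle of the resultant.
    resultant = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return cmath.phase(resultant)

def describe(phase):
    """Label a phase under the assumed convention (0 = SO peak, -pi/2 = transition)."""
    if -math.pi / 2 <= phase < 0:
        return "between the down-to-up transition and the SO peak"
    if 0 <= phase < math.pi / 2:
        return "at or after the SO peak"
    return "around the SO trough"

# Hypothetical per-participant preferred phases (radians), for illustration only.
example_phases = [-0.6, -0.5, -0.4, -0.3, -0.2]
group_phase = circular_mean(example_phases)
label = describe(group_phase)
```

      A circular mean is used rather than an arithmetic mean because phases wrap around at ±π; averaging unit vectors avoids the artifact of two near-trough phases averaging to the peak.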

      Discussion, Page 16 Lines 283-291

      “Our data provide insights into the neurobiological underpinnings of these sleep rhythms. SOs, originating mainly in neocortical areas such as the mPFC, alternate between DOWN- and UP-states. The thalamus generates sleep spindles, which in turn couple with SOs. Our finding that spindle peaks consistently occurred slightly before the UP-state peak of SOs (in 83 out of 107 participants), concurs with prior studies, including Schreiner et al. (2021). Yet it differs from some results suggesting spindles might peak right at the SO UP-state (Hahn et al., 2020; Helfrich et al., 2018). Such discrepancies could arise from differences in detection algorithms, participant age (Helfrich et al., 2018), or subtle variations in cortical-thalamic timing. Nonetheless, these results underscore the importance of coordinated SO-spindle interplay in supporting sleep-dependent processes.”

      (4) The discussion is rather superficial with only two pages, without delving into many important arguments regarding the possible functional significance of these results. For example, the author wrote, "This internal processing contrasts with the brain patterns associated with external tasks, such as working memory." This statement comes without any references to working memory and without delineating why WM is considered an external task, even though working memory operations can be internal. Similarly, for the interesting results on SO and reduced DMN activity, the authors wrote "The DMN is typically active during wakeful rest and is associated with self-referential processes like mind-wandering, daydreaming, and task representation (Yeshurun, Nguyen, & Hasson, 2021). Its reduced activity during SOs may signal a shift towards endogenous processes such as memory consolidation." This argument is flawed. DMN is active during self-referential processing and mind-wandering, i.e., when the brain shifts from external stimuli processing to internal mental processing. During sleep, endogenous memory reactivation and consolidation are also part of the internal mental processing given the lack of external environmental stimulation. So why would DMN activity be reduced during SOs or during memory consolidation? Were there differences in DMN activity between SO and SO-spindle coupling events?

      We appreciate your concerns regarding the brevity of the discussion and the need for clearer theoretical arguments. We will expand this section to provide more in-depth interpretations of our findings in the context of prior literature. Regarding working memory (WM), we acknowledge that our phrasing was ambiguous. We will modify this statement in the Discussion section.

      For the SO-related reduction in DMN activity, we recognize the need for a more precise explanation. This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of the DMN during SO events is related to a shift from internal cognition to external responses, given that it occurs during deep sleep. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained if modelled at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition of the DOWN-state.

      To address your final question, we conducted an additional post hoc comparison of DMN activity between isolated SOs and SO-spindle coupling events. Our results indicate that DMN activation during SOs was significantly lower than during SO-spindle coupling (t<sub>(106)</sub> = -4.17, p < 1e-4). This suggests that SO-spindle coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. We appreciate your constructive feedback and will integrate these expanded analyses and discussions into our revised manuscript.
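
      For readers who want to see how a within-subject comparison of this kind yields a t statistic, here is a minimal sketch. The reported degrees of freedom (106) are consistent with a paired comparison across 107 participants, but treating it as a paired t-test is an assumption on our part; the function name and the numbers are hypothetical and are not the study's ROI values.

```python
import math
import statistics

def paired_t(values_a, values_b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    if len(values_a) != len(values_b) or len(values_a) < 2:
        raise ValueError("need two matched samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(values_a, values_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)          # sample SD (n - 1 denominator)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical per-subject ROI estimates for two conditions (illustration only).
so_only = [1.0, 2.0, 3.0, 4.0]
coupled = [2.0, 4.0, 5.0, 8.0]
t_stat, dof = paired_t(so_only, coupled)
```

      A negative statistic here means the first condition is lower on average than the second, matching the direction of the comparison reported above.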

      Results, Page 11 Lines 199-208

      “Spindles were correlated with positive activation in the thalamus (ROI analysis, t<sub>(106)</sub> = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t<sub>(106)</sub> = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t<sub>(106)</sub> = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t<sub>(106)</sub> = -4.17, p < 1e-4). For more detailed activation patterns, see Table S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”

      Discussion, Page 17-18 Lines 308-332

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of the DMN during SO events is related to a shift from internal cognition to external responses, given that it occurs during deep sleep. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition of the DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.

      To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      Reviewing Editor Comment:

      The reviewers think that you are working on a relevant and important topic. They are praising the large sample size used in the study. The reviewers are not all in line regarding the overall significance of the findings, but they all agree the paper would strongly benefit from some extra work, as all reviewers raise various critical points that need serious consideration.

      We appreciate your recognition of the relevance and importance of our study, as well as your acknowledgment of the large sample size as a strength of our work. We understand that there are differing perspectives regarding the overall significance of our findings, and we value the constructive critiques provided. We are committed to addressing the key concerns raised by all reviewers, including refining our analyses, clarifying our interpretations, and incorporating additional discussions to strengthen the manuscript. Below, we address your specific recommendations and provide responses to each point you raised to ensure our methods and results are as transparent and comprehensible as possible. We believe that these revisions will significantly enhance the rigor and impact of our study, and we sincerely appreciate your thoughtful feedback in helping us improve our work.

      Reviewer #1 (Recommendations for the authors):

      (1) The phrase "overnight sleep" suggests an entire night, while these were rather "nocturnal naps". Please rephrase.

      Thank you for pointing this out. We have revised the phrasing in our manuscript to "nocturnal naps" instead of "overnight sleep" to more accurately reflect the duration of the sleep recordings.

      (2) Sleep staging results (macroscopic sleep architecture) should be provided in more detail (at least min and % of the different sleep stages, sleep onset latency, total sleep duration, total recording duration), at least mean/SD/range.

      Thank you for this suggestion. We will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics. This information will help provide a clearer overview of the macroscopic sleep architecture in our dataset.

      Supplementary Materials, Page 42, Table S1

      Author response table 1.

      Descriptive results of demographic information and sleep characteristics. Note: The total recorded time is equal to the awake time plus the total sleep time. The sleep onset latency is the time taken to reach the first sleep epoch. The Sleep Efficiency is the ratio of actual sleep time to total recording time.

      Reviewer #2 (Recommendations for the authors):

      In order to allow for a better estimation of the reliability of the detected sleep events, please:

      (1) Provide densities and absolute numbers of all detected SOs and spindles (N1, NREM, and REM sleep).

      Thank you for pointing this out. We will provide comprehensive tables in the supplementary materials containing detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: 1) Different sleep stage duration; 2) Number of detected SOs; 3) Number of detected spindles; 4) Number of detected SO-spindle coupling events; 5) Density of detected SOs; 6) Density of detected spindles; 7) Density of detected SO-spindle coupling events.

      Supplementary Materials, Page 43-54, Table S2-S4

      (Given their length, we do not list all the tables here. Please refer to the revised manuscript.)

      (2) Show ERPs for all detected SOs and spindles (per sleep stage).

      Thank you for the suggestion. We will provide ERPs for all detected SOs and spindles, separated by sleep stage (N1, N2&N3, and REM) in supplementary Fig. S2-S4. These ERP waveforms will help illustrate the characteristic temporal profiles of SOs and spindles across different sleep stages.

      Methods, Page 25, Line 525-532

      “Event-related potentials (ERP) analysis. After completing the detection of each sleep rhythm event, we performed ERP analyses for SOs, spindles, and coupling events in different sleep stages. Specifically, for SO events, we took the trough of the DOWN-state of each SO as the zero-time point, then extracted data in a [-2 s to 2 s] window from the broadband (0.1–30 Hz) EEG and used [-2 s to -0.5 s] for baseline correction; the results were then averaged across 107 subjects (see Fig. S2a). For spindle events, we used the peak of each spindle as the zero-time point and applied the same data extraction window and baseline correction before averaging across 107 subjects (see Fig. S2b). Finally, for SO-spindle coupling events, we followed the same procedure used for SO events (see Fig. 2a, Figs. S3–S4).”
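
      The epoching and baseline-correction steps described in this Methods excerpt can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the sampling rate, and the toy signal are hypothetical, and the sketch averages events within a single recording, whereas the manuscript additionally averages across 107 subjects.

```python
def extract_epoch(signal, event_idx, sfreq, tmin=-2.0, tmax=2.0):
    """Cut a [tmin, tmax] s window around an event; return None if it overruns the data."""
    start = event_idx + int(round(tmin * sfreq))
    stop = event_idx + int(round(tmax * sfreq)) + 1   # +1 so tmax is included
    if start < 0 or stop > len(signal):
        return None
    return signal[start:stop]

def baseline_correct(epoch, sfreq, tmin=-2.0, baseline=(-2.0, -0.5)):
    """Subtract the mean of the baseline window (both ends inclusive) from the epoch."""
    b0 = int(round((baseline[0] - tmin) * sfreq))
    b1 = int(round((baseline[1] - tmin) * sfreq)) + 1
    base_mean = sum(epoch[b0:b1]) / (b1 - b0)
    return [v - base_mean for v in epoch]

def average_erp(signal, event_indices, sfreq):
    """Average baseline-corrected epochs time-locked to the given event samples."""
    epochs = []
    for idx in event_indices:
        epoch = extract_epoch(signal, idx, sfreq)
        if epoch is not None:                          # skip events too close to the edges
            epochs.append(baseline_correct(epoch, sfreq))
    if not epochs:
        raise ValueError("no complete epochs available")
    return [sum(samples) / len(epochs) for samples in zip(*epochs)]

# Toy example: a flat signal with one negative deflection at the event sample.
signal = [5.0] * 100
signal[50] = -5.0
erp = average_erp(signal, [5, 50], sfreq=10)   # the event at sample 5 is dropped
```

      For SO events the zero-time sample would be the DOWN-state trough, and for spindles the spindle peak, as stated in the excerpt; only the list of event samples changes.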

      Supplementary Materials, Page 36-38, Fig. S2-S4

      Author response image 1.

      ERPs of SOs and spindles during different sleep stages across all 107 subjects. a. ERP of SOs in different sleep stages using the broadband (0.1–30 Hz) EEG data. We align the trough of the DOWN-state of each SO at time zero (see Methods for details). The orange line represents the SO ERP in the N1 stage, the black line represents the SO ERP in the N2&N3 stage, and the green line represents the SO ERP in the REM stage. b. ERP of spindles in different sleep stages using the broadband (0.1–30 Hz) EEG data. We align the peak of each spindle at time zero (see Methods for details). The color scheme is the same as in panel a.

      Author response image 2.

      ERP and time-frequency patterns of SO-spindle coupling in the N1 stage. The averaged temporal frequency pattern and ERP across all instances of SO-spindle coupling, computed over all subjects, following the same procedure as in Fig. 2a, but for N1 stage.

      Author response image 3.

      ERP and time-frequency patterns of SO-spindle coupling in the REM stage. The averaged temporal frequency pattern and ERP across all instances of SO-spindle coupling, computed over all subjects, again following the same procedure as in Fig. 2a, but for REM stage.

      (3) Provide detailed info concerning sleep characteristics (time spent in each sleep stage etc.).

      Thank you for this suggestion. As in our response above, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics.

      Supplementary Materials, Page 42, Table S1 (same as above)

      (4) What would happen if more stringent parameters were used for event detection? Would the authors still observe a significant number of SO spindles during N1 and REM? Would this affect the fMRI-related results?

      Thank you for this suggestion. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).

      Furthermore, to explore the impact of this on our fMRI results, we conducted an additional sensitivity analysis by applying different detection parameters for SOs. Specifically, we adjusted the amplitude percentile threshold for SO detection (the parameter that has the greatest impact on the results). Using the hippocampal activation value during N2&N3 stage SO-spindle coupling as an anchor, we found that as the threshold became gradually stricter, the results were at first similar to or even better than the current results. However, as we continued to increase the threshold, the activation gradually weakened until the threshold was raised to 80%, at which point the results were no longer significant. This indicates that our results are robust within a specific range of parameters, but as the threshold increases, the number of trials decreases, ultimately weakening the statistical power of the fMRI analysis.
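
      The trade-off described above, where a stricter percentile threshold leaves fewer events for the fMRI model, can be illustrated with a minimal sketch. It covers only the amplitude-percentile step; the full detection procedure (filtering, zero-crossing and duration criteria) is not shown, and the function names and amplitude values are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac

def select_events(amplitudes, q):
    """Indices of candidate events whose amplitude reaches the q-th percentile."""
    cutoff = percentile(amplitudes, q)
    return [i for i, amp in enumerate(amplitudes) if amp >= cutoff]

# Hypothetical candidate amplitudes (arbitrary units), for illustration only.
amplitudes = [float(v) for v in range(1, 101)]
counts = {q: len(select_events(amplitudes, q)) for q in (70, 75, 80, 90)}
```

      Each detected event contributes a trial to the GLM design matrix, so the shrinking counts at higher thresholds correspond to the loss of statistical power noted in the response.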

      Thank you again for your suggestions on sleep rhythm event detection. We will add the results in Supplementary and revise our manuscript accordingly.

      Results, Page 11, Line 199-208

      “Spindles were correlated with positive activation in the thalamus (ROI analysis, t<sub>(106)</sub> = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t<sub>(106)</sub> = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t<sub>(106)</sub> = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t<sub>(106)</sub> = -4.17, p < 1e-4). For more detailed activation patterns, see Table S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”

      Supplementary Materials, Page 40, Fig. S6

      Author response image 4.

      Influence of the percentile threshold for SO detection on hippocampal activation (ROI) during SO-spindle coupling. We changed the percentile threshold for SO event detection in the EEG data analysis and then reconstructed the GLM design matrix based on the SO events detected at each threshold. The brain-wide activation pattern of SO-spindle couplings in the N2/3 stage was extracted using the same method as shown in Fig. 3. The gray horizontal line represents the significant range (71%–80%). * p < 0.05.

      Finally, we sincerely thank you all again for your thoughtful and constructive feedback. Your insights have been invaluable in refining our analyses, strengthening our interpretations, and improving the clarity and rigor of our manuscript. We appreciate the time and effort you have dedicated to reviewing our work, and we are grateful for the opportunity to enhance our study based on your recommendations.

      References:

      Bergmann, T. O., Mölle, M., Diedrichs, J., Born, J., & Siebner, H. R. (2012). Sleep spindle-related reactivation of category-specific cortical regions after learning face-scene associations. NeuroImage, 59(3), 2733-2742.

      Buzsáki, G. (2015). Hippocampal sharp wave‐ripple: A cognitive biomarker for episodic memory and planning. Hippocampus, 25(10), 1073-1188.

      Caporro, M., Haneef, Z., Yeh, H. J., Lenartowicz, A., Buttinelli, C., Parvizi, J., & Stern, J. M. (2012). Functional MRI of sleep spindles and K-complexes. Clinical neurophysiology, 123(2), 303-309.

      Coulon, P., Budde, T., & Pape, H.-C. (2012). The sleep relay—the role of the thalamus in central and decentral sleep regulation. Pflügers Archiv-European Journal of Physiology, 463, 53-71.

      Crunelli, V., Lőrincz, M. L., Connelly, W. M., David, F., Hughes, S. W., Lambert, R. C., Leresche, N., & Errington, A. C. (2018). Dual function of thalamic low-vigilance state oscillations: rhythm-regulation and plasticity. Nature Reviews Neuroscience, 19(2), 107-118.

      Czisch, M., Wehrle, R., Stiegler, A., Peters, H., Andrade, K., Holsboer, F., & Sämann, P. G. (2009). Acoustic oddball during NREM sleep: a combined EEG/fMRI study. PLoS ONE, 4(8), e6749.

      Diba, K., & Buzsáki, G. (2007). Forward and reverse hippocampal place-cell sequences during ripples. Nature Neuroscience, 10(10), 1241.

      Diekelmann, S., & Born, J. (2010). The memory function of sleep. Nature Reviews Neuroscience, 11(2), 114-126.

      Fogel, S., Albouy, G., King, B. R., Lungu, O., Vien, C., Bore, A., Pinsard, B., Benali, H., Carrier, J., & Doyon, J. (2017). Reactivation or transformation? Motor memory consolidation associated with cerebral activation time-locked to sleep spindles. PLoS ONE, 12(4), e0174755.

      Hahn, M. A., Heib, D., Schabus, M., Hoedlmoser, K., & Helfrich, R. F. (2020). Slow oscillation-spindle coupling predicts enhanced memory formation from childhood to adolescence. eLife, 9, e53730.

      Halassa, M. M., Siegle, J. H., Ritt, J. T., Ting, J. T., Feng, G., & Moore, C. I. (2011). Selective optical drive of thalamic reticular nucleus generates thalamic bursts and cortical spindles. Nature Neuroscience, 14(9), 1118-1120.

      Hale, J. R., White, T. P., Mayhew, S. D., Wilson, R. S., Rollings, D. T., Khalsa, S., Arvanitis, T. N., & Bagshaw, A. P. (2016). Altered thalamocortical and intra-thalamic functional connectivity during light sleep compared with wake. NeuroImage, 125, 657-667.

      Helfrich, R. F., Lendner, J. D., Mander, B. A., Guillen, H., Paff, M., Mnatsakanyan, L., Vadera, S., Walker, M. P., Lin, J. J., & Knight, R. T. (2019). Bidirectional prefrontal-hippocampal dynamics organize information transfer during sleep in humans. Nature Communications, 10(1), 3572.

      Helfrich, R. F., Mander, B. A., Jagust, W. J., Knight, R. T., & Walker, M. P. (2018). Old brains come uncoupled in sleep: slow wave-spindle synchrony, brain atrophy, and forgetting. Neuron, 97(1), 221-230. e224.

      Horovitz, S. G., Fukunaga, M., de Zwart, J. A., van Gelderen, P., Fulton, S. C., Balkin, T. J., & Duyn, J. H. (2008). Low frequency BOLD fluctuations during resting wakefulness and light sleep: A simultaneous EEG‐fMRI study. Human brain mapping, 29(6), 671-682.

      Huang, Q., Xiao, Z., Yu, Q., Luo, Y., Xu, J., Qu, Y., Dolan, R., Behrens, T., & Liu, Y. (2024). Replay-triggered brain-wide activation in humans. Nature Communications, 15(1), 7185.

      Ilhan-Bayrakcı, M., Cabral-Calderin, Y., Bergmann, T. O., Tüscher, O., & Stroh, A. (2022). Individual slow wave events give rise to macroscopic fMRI signatures and drive the strength of the BOLD signal in human resting-state EEG-fMRI recordings. Cerebral Cortex, 32(21), 4782-4796.

      Laufs, H. (2008). Endogenous brain oscillations and related networks detected by surface EEG‐combined fMRI. Human brain mapping, 29(7), 762-769.

      Laufs, H., Walker, M. C., & Lund, T. E. (2007). ‘Brain activation and hypothalamic functional connectivity during human non-rapid eye movement sleep: an EEG/fMRI study’—its limitations and an alternative approach. Brain, 130(7), e75.

      Margulies, D. S., Ghosh, S. S., Goulas, A., Falkiewicz, M., Huntenburg, J. M., Langs, G., Bezgin, G., Eickhoff, S. B., Castellanos, F. X., & Petrides, M. (2016). Situating the default-mode network along a principal gradient of macroscale cortical organization. Proceedings of the National Academy of Sciences, 113(44), 12574-12579.

      Massimini, M., Huber, R., Ferrarelli, F., Hill, S., & Tononi, G. (2004). The sleep slow oscillation as a traveling wave. Journal of Neuroscience, 24(31), 6862-6870.

      Moehlman, T. M., de Zwart, J. A., Chappel-Farley, M. G., Liu, X., McClain, I. B., Chang, C., Mandelkow, H., Özbay, P. S., Johnson, N. L., & Bieber, R. E. (2019). All-night functional magnetic resonance imaging sleep studies. Journal of neuroscience methods, 316, 83-98.

      Mölle, M., Bergmann, T. O., Marshall, L., & Born, J. (2011). Fast and slow spindles during the sleep slow oscillation: disparate coalescence and engagement in memory processing. Sleep, 34(10), 1411-1421.

      Ngo, H.-V., Fell, J., & Staresina, B. (2020). Sleep spindles mediate hippocampal-neocortical coupling during long-duration ripples. eLife, 9, e57011.

      Picchioni, D., Horovitz, S. G., Fukunaga, M., Carr, W. S., Meltzer, J. A., Balkin, T. J., Duyn, J. H., & Braun, A. R. (2011). Infraslow EEG oscillations organize large-scale cortical–subcortical interactions during sleep: a combined EEG/fMRI study. Brain research, 1374, 63-72.

      Schabus, M., Dang-Vu, T. T., Albouy, G., Balteau, E., Boly, M., Carrier, J., Darsaud, A., Degueldre, C., Desseilles, M., & Gais, S. (2007). Hemodynamic cerebral correlates of sleep spindles during human non-rapid eye movement sleep. Proceedings of the National Academy of Sciences, 104(32), 13164-13169.

      Schreiner, T., Kaufmann, E., Noachtar, S., Mehrkens, J.-H., & Staudigl, T. (2022). The human thalamus orchestrates neocortical oscillations during NREM sleep. Nature Communications, 13(1), 5231.

      Schreiner, T., Petzka, M., Staudigl, T., & Staresina, B. P. (2021). Endogenous memory reactivation during sleep in humans is clocked by slow oscillation-spindle complexes. Nature Communications, 12(1), 3112.

      Singh, D., Norman, K. A., & Schapiro, A. C. (2022). A model of autonomous interactions between hippocampus and neocortex driving sleep-dependent memory consolidation. Proceedings of the National Academy of Sciences, 119(44), e2123432119.

      Spoormaker, V. I., Schröter, M. S., Gleiser, P. M., Andrade, K. C., Dresler, M., Wehrle, R., Sämann, P. G., & Czisch, M. (2010). Development of a large-scale functional brain network during human non-rapid eye movement sleep. Journal of Neuroscience, 30(34), 11379-11387.

      Staresina, B. P., Bergmann, T. O., Bonnefond, M., van der Meij, R., Jensen, O., Deuker, L., Elger, C. E., Axmacher, N., & Fell, J. (2015). Hierarchical nesting of slow oscillations, spindles and ripples in the human hippocampus during sleep. Nature Neuroscience, 18(11), 1679-1686.

      Staresina, B. P., Niediek, J., Borger, V., Surges, R., & Mormann, F. (2023). How coupled slow oscillations, spindles and ripples coordinate neuronal processing and communication during human sleep. Nature Neuroscience, 1-9.

      Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Nature methods, 8(8), 665-670.

      Yeshurun, Y., Nguyen, M., & Hasson, U. (2021). The default mode network: where the idiosyncratic self meets the shared social world. Nature Reviews Neuroscience, 1-12.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors describe a massively parallel reporter assay (MPRA) screen focused on identifying polymorphisms in 5' and 3' UTRs that affect translation efficiency and thus might have a functional impact on cells. The topic is of timely interest, and indeed, several related efforts have recently been published and preprinted (e.g., https://pubmed.ncbi.nlm.nih.gov/37516102/ and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10635273/). This study has several major issues with the results and their presentation.

      Major comments:

      (1) The main issue is that it appears that the screen has largely failed, yet the reasons for that are unclear, which makes it difficult to interpret. The authors start with a library that includes approximately 6,000 variants, which makes it a medium-sized MPRA. But then, only 483 pairs of WT/mutated UTRs yield high-confidence information, which is already a small number for any downstream statistical analysis, particularly since most don't actually affect translation in the reporter screen setting (which is not unexpected). It is unclear why >90% of the library did not give high-confidence information. The profiles presented as base-case examples in Figure 2B don't look very informative or convincing. All the subsequent analysis is done on a very small set of UTRs that have an effect, and it is unclear to this reviewer how these can yield statistically significant and/or biologically relevant associations.

      To ensure that our final results are technically and statistically sound, we applied stringent selection criteria and cutoffs in our analysis workflow. First, from our RNA-seq dataset, we retained only the UTRs with at least 20 reads in a polysome profile across all three replicate experiments. Second, in the subsequent main analysis using a negative binomial generalized linear model (GLM), we further excluded the UTRs that displayed a batch effect, i.e., those whose batch-related main effect and interaction were significant. We believe these measures safeguard the retained observations (UTRs) against the potentially high variation of our massively parallel translation assays and thus give high confidence in our results.
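
      The read-depth filter can be sketched as follows. The response does not spell out exactly how the 20-read cutoff is applied, so this sketch assumes one possible reading: the reads summed over the three polysome fractions must reach 20 in every one of the three replicates. The function name, UTR labels, and counts are hypothetical.

```python
MIN_READS = 20  # cutoff stated in the response; how it is aggregated is assumed here

def passes_read_filter(counts_by_replicate, min_reads=MIN_READS):
    """True if every replicate's polysome profile has at least min_reads reads in total.

    counts_by_replicate: list of dicts mapping fraction name -> read count,
    one dict per replicate.
    """
    return all(sum(rep.values()) >= min_reads for rep in counts_by_replicate)

# Hypothetical read counts for two UTR constructs across three replicates.
library = {
    "UTR_A": [
        {"Mono": 12, "Light": 9, "Heavy": 7},
        {"Mono": 10, "Light": 8, "Heavy": 7},
        {"Mono": 15, "Light": 10, "Heavy": 6},
    ],
    "UTR_B": [
        {"Mono": 30, "Light": 20, "Heavy": 10},
        {"Mono": 3, "Light": 2, "Heavy": 1},    # too shallow in this replicate
        {"Mono": 25, "Light": 15, "Heavy": 9},
    ],
}
kept = sorted(name for name, reps in library.items() if passes_read_filter(reps))
```

      Requiring the cutoff in every replicate, rather than on average, is what makes a filter of this kind stringent: a single shallow replicate is enough to drop a construct.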

      Regarding the interpretation of Figure 2B, since we aimed to identify the UTRs whose interaction term of genotype and fractions is significant in our generalized linear model, it is statistically conventional to double-check the interaction of the two variables using such a graph. For instance, in the top left panel of Figure 2B (5'UTR of ANK2:c.-39G>T), we can see that read counts of WT samples congruously decreased from Mono to Light, whereas the read counts of mutant samples were roughly the same in the two fractions – the trend is different between WT and mutant. Ergo, the distinct distribution patterns of two genotypes across three fractions in Figure 2B offer the readers a convincing visual supplement to our statistics from GLM.

      In contrast to Figure 2B, the graphs of nonsignificant UTRs (shown below) reveal that the trends between the two genotypes are similar across the 'Mono and Light' and 'Light and Heavy' polysome fractions. Importantly, our analysis remains unaffected by differential expression levels between WT and mutant, as it specifically distinguishes polysome profiles with different distributions. This consistent trend further supports the lack of interaction between genotype and polysome fractions for these UTRs.

      Author response image 1.

      Figure: Examples of non-significant UTR pairs in massively parallel polysome profiling assays.

      (2) From the variants that had an effect, the authors go on to carry out some protein-level validations and see some changes, but it is not clear if those changes are in the same direction as observed in the screen.

      To infer the directionality of translation efficiency from polysome profiles, a common approach involves pooling polysome fractions and comparing them with free or monosome fractions to identify 'translating' fractions. However, this method has two major potential pitfalls: (i) it sacrifices resolution and does not account for potential bias toward light or heavy polysomes, and (ii) it fails to account for discrepancies between polysome load and actual protein output (as discussed in https://doi.org/10.1016/j.celrep.2024.114098 and https://doi.org/10.1038/s41598-019-47424-w). Therefore, our analysis focused on the changes within polysome profiles themselves. 'Significant' candidates were identified based on a significant interaction between genotype and polysome distribution using a negative binomial generalized linear model, without presupposing the direction of change on protein output. 

      (3) The authors follow up on specific motifs and specific RBPs predicted to bind them, but it is unclear how many of the hits in the screen actually have these motifs, or how significant motifs can arise from such a small sample size.

      We calculated the Δmotif enrichment in significant UTRs versus nonsignificant UTRs using Fisher’s exact test. For example, the enrichment of the Δ‘AGGG’ motif in 3’ UTRs is shown below:

      Author response table 1.

      This test yields a P-value of 0.004167 by Fisher’s exact test. The P-values and Odds ratios of Δmotifs in relation to polysome shifting are included in Supplementary Table S4, and we will update the detailed motif information in the revised Supplementary Table S4.

      (4) It is particularly puzzling how the authors can build a machine learning predictor with >3,000 features when the dataset they use for training the model has just a few dozens of translation-shifting variants.

      We understand the concern regarding the relatively small number of translation-shifting variants compared to the large number of features. To address this, we employed LASSO regression, which, according to The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman, is particularly suitable for datasets where the number of features p is much larger than the number of samples N. LASSO effectively performs feature selection by shrinking less important coefficients to zero, allowing us to build a robust and generalizable model despite the limited number of variants.
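      The feature-selection behaviour of LASSO when p is much larger than N can be illustrated with a short sketch. The code below is a generic cyclic coordinate-descent solver with soft-thresholding, not the authors' implementation; the data are synthetic, the penalty value is arbitrary, and the inputs are assumed to be centred so that no intercept is fitted.

```python
import random


def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of n rows with p columns each; y is a list of n responses.
    No intercept is fitted (data assumed centred).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def demo(seed=0):
    """Synthetic example: 20 samples, 200 features, 3 of which carry signal."""
    rng = random.Random(seed)
    n, p = 20, 200
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 * r[0] - 2 * r[1] + 1.5 * r[2] + rng.gauss(0, 0.1) for r in X]
    beta = lasso_coordinate_descent(X, y, lam=0.5)
    selected = [j for j, b in enumerate(beta) if b != 0.0]
    print(f"{len(selected)} of {p} coefficients are non-zero:", selected)


if __name__ == "__main__":
    demo()
```

      The L1 penalty sets most coefficients exactly to zero, which is why the method remains usable with few samples. The stability of the selected features still depends on the penalty strength and would normally be checked by cross-validation.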

      (5) The lack of meaningful validation experiments altering the SNPs in the endogenous loci by genome editing limits the impact of the results.

      We plan to assess the endogenous effect by generating CRISPR knock-in clones carrying the UTR variant.

      Reviewer #2 (Public Review):

      Summary:

      In their paper "Massively Parallel Polyribosome Profiling Reveals Translation Defects of Human Disease‐Relevant UTR Mutations" the authors use massively parallel polysome profiling to determine the effects of 5' and 3' UTR SNPs (from dbSNP/ClinVar) on translational output. They show that some UTR SNPs cause a change in the polysome profile with respect to the wild-type and that pathogenic SNPs are enriched in the polysome-shifting group. They validate that some changes in polysome profiles are predictive of differences in translational output using transiently expressed luciferase reporters. Additionally, they identify sequence motifs enriched in the polysome-shifting group. They show that 2 enriched 5' UTR motifs increase the translation of a luciferase reporter in a protein-dependent manner, highlighting the use of their method to identify translational control elements.

      Strengths:

      This is a useful method and approach, as UTR variants have been more difficult to study than coding variants. Additionally, their evidence that pathogenic mutations are more likely to cause changes in polysome association is well supported.

      Weaknesses:

      The authors acknowledge that they "did not intend to immediately translate the altered polysome profile into an increase or decrease in translation efficiency, as the direction of the shift was not readily evident. Additionally, sedimentation in the sucrose gradient may have been partially affected by heavy particles other than ribosomes." However, shifted polysome distribution is used as a category for many downstream analyses. Without further clarity or subdivision, it is very difficult to interpret the results (for example in Figure 5A, is it surprising that the polysome shifting mutants decrease structure? Are the polysome "shifts" towards the untranslated or heavy fractions?)

      Our approach, combining polysome fractionation of the UTR library with negative binomial generalized linear model (GLM) analysis of RNA-seq data, systematically identifies variants that affect translational efficiency. The GLM model is specifically designed to detect UTR pairs with significant interactions between genotype and polysome fractions, relying solely on changes in polysome profiles to identify variants that disrupt translation. Consequently, our analytical method does not determine the direction of translation alteration.

      Following the massively parallel polysome profiling, we sought to understand how these polysome-shifting variants influence the translation process. To do this, we examined their effects on RNA characteristics related to translation, such as RBP binding and RNA structure. In Figure 5A, we observed a notable trend among significant hits within 5’ UTRs: they tend to increase ΔG (weaker folding energy) in response to changes in polysome profiles, regardless of whether protein production increases or decreases (Fig. 3).

    1. Author Response:

      Reviewer #1 (Public Review):

      Despite numerous studies on quinidine therapies for epilepsies associated with GOF mutant variants of Slack, there is no consensus on its utility due to contradictory results. In this study Yuan et al. investigated the role of different sodium-selective ion channels in the sensitization of Slack to quinidine block. The study employed electrophysiological approaches, FRET studies, genetically modified proteins and biochemistry to demonstrate that the Nav1.6 N- and C-tails interact with Slack's C-terminus and significantly increase Slack sensitivity to quinidine blockade in vitro and in vivo. This finding inspired the authors to investigate whether they could rescue Slack GOF mutant variants by simply disrupting the interaction between Slack and Nav1.6. They find that the isolated C-terminus of Slack can reduce the current amplitude of Slack GOF mutant variants co-expressed with Nav1.6 in HEK cells and prevent Slack-induced seizures in mouse models of epilepsy. This study adds to the growing list of channels that are modulated by protein-protein interactions, and is of great value for future therapeutic strategies.

      I have a few comments with regard to how Nav1.6 sensitizes Slack to block by quinidine.

      (1) It is not clear to me if the Slack induced current amplitude varies depending on the specific Nav subtype. To this end, it would be valuable to test if Slack open probability is affected by the presence of specific Nav subtypes. Nav induced differences in Slack current amplitude and open probability could explain why individual Nav subtypes show varied ability to sensitize Slack to quinidine blockade.

      We thank the reviewer for raising this point. To address whether the whole-cell current amplitudes of Slack vary depending on the specific NaV subtype, we examined Slack current amplitudes upon co-expression of Slack with specific NaV subtypes in HEK293 cells. The results show no significant differences in Slack current amplitudes upon co-expression of Slack with different NaV channel subtypes (Author response image 1), suggesting that whole-cell Slack current amplitudes cannot explain the varied ability of NaV subtypes to sensitize Slack to quinidine blockade. To investigate the effect of different NaV channel subtypes on Slack open probability, we will perform single-channel recordings in future studies.

      Author response image 1.

      The amplitudes of Slack currents upon co-expression of Slack with specific NaV subtypes in HEK293 cells. ns, p > 0.05, one-way ANOVA followed by Bonferroni’s post hoc test.

      (2) It has previously been shown that INaP (persistent sodium current) is important for inducing Slack currents. Here the authors show that INaT (transient sodium current) of Nav1.6 is necessary for the sensitization of Slack to quinidine block whereas INaP surprisingly has no effect. The authors then show that the N-tail together with the C-tail of Nav1.6 can induce the same effect on Slack as full-length Nav1.6 in the presence of high intracellular concentrations of sodium. However, it is not clear to me how the isolated N- and C-tails of Nav1.6 can induce sensitization of Slack to quinidine by interacting with the C-terminus of Slack, while sensitization is also dependent on INaT. The authors speculate on different Slack open conformations, but one could speculate whether there is a missing link, such as an unidentified additional interacting protein that causes the coupling.

      We fully agree on the importance of investigating the detailed mechanism underlying the sensitization of Slack to quinidine blockade mediated by the N- and C-termini of NaV1.6. Regarding the possibility of additional interacting proteins (“missing link”) that mediate the coupling between Slack and NaV1.6, our GST-pull down assays involving Slack and the N- and C-termini of NaV1.6 (Fig. S7) suggest a direct interaction between Slack and NaV1.6 channels. This finding leads us to consider that a requirement for additional interacting proteins might be excluded. To further address these questions, we plan to employ structural biology methods, such as cryo-electron microscopy (cryo-EM).

      Reviewer #2 (Public Review):

      This is a very interesting paper about the coupling of Slack and Nav1.6 and the insight this brings to the effects of quinidine to treat some epilepsy syndromes.

      Slack is a sodium-activated potassium channel that is important to hyperpolarization of neurons after an action potential. Slack is encoded by KCNT1 which has mutations in some epilepsy syndromes. These types of epilepsy are treated with quinidine but this is an atypical antiseizure drug, not used for other types of epilepsy. For sufficient sodium to activate Slack, Slack needs to be close to a channel that allows robust sodium entry, like Na channels or AMPA receptors, but more mechanistic information is not available. Of particular interest to the authors is what allows quinidine to be effective in reducing Slack.

      In the manuscript, the authors show that Nav, not AMPA receptors, are responsible for Slack activation, at least in cultured neurons (HEK293, primary cortical neurons). Most of the paper focuses on the evidence that Nav1.6 promotes Slack sensitivity to quinidine.

      (1) The paper is very well written although there are reservations about the use of non-neuronal cells or cultured primary neurons rather than a more intact system.

      We appreciate the reviewer's positive evaluation of our work. We acknowledge that utilizing a more intact system would provide valuable insights into the inhibitory effect of quinidine on Slack-NaV1.6. However, there are certain challenges associated with studying Slack currents in their entirety.

      First, in our experiments, isolating Slack currents from Na+-activated K+ currents in an intact system is challenging as selective inhibitors for Slick are currently unavailable. To address this, we propose using Slick gene knockout mice to specifically measure Slack currents under physiological conditions in future investigations. Second, we have observed that the interaction between Slack and NaV1.6 primarily occurs at the axon initial segment of neurons. This poses a difficulty when using brain slices for measurements, as employing the whole-cell voltage-clamp technique to assess Slack at the axon initial segment may introduce systemic errors.

      We believe that testing the pharmacological effects of quinidine on Slack-NaV1.6 in primary neurons remains the optimal approach. Although non-neuronal cells or cultured primary neurons may not fully replicate the complexity of an intact system, they still provide valuable insights into the interactions between Slack and NaV1.6, and the effects of quinidine.

      (2) I also have questions about the figures.

      We will make the necessary modifications and clarifications based on the reviewer's comments.

      (3) Finally, riluzole is not a selective drug, so the limitations of this drug should be discussed.

      We thank the reviewer for raising this point. We will discuss the limitations of riluzole in our revised version of the manuscript.

      (4) On a minor point, the authors use the term in vivo but there are no in vivo experiments.

      We thank the reviewer for raising this point. Although we did not conduct experiments directly in living organisms, our results demonstrated the co-immunoprecipitation of NaV1.6 with Slack in homogenates from mouse cortical and hippocampal tissues (Fig. 3C). This result may support the conclusion that the interaction between Slack and NaV1.6 occurs in vivo.

      Reviewer #3 (Public Review):

      Yuan et al., set out to examine the role of functional and structural interaction between Slack and NaVs on the Slack sensitivity to quinidine. Through pharmacological and genetic means they identify NaV1.6 as the privileged NaV isoform in sensitizing Slack to quinidine. Through biochemical assays, they then determine that the C-terminus of Slack physically interacts with the N- and C-termini of NaV1.6. Using the information gleaned from the in vitro experiments the authors then show that virally-mediated transduction of Slack's C-terminus lessens the extent of SlackG269S-induced seizures. These data uncover a previously unrecognized interaction between a sodium and a potassium channel, which contributes to the latter's sensitivity to quinidine.

      The conclusions of this paper are mostly well supported by data, but some aspects of the functional and structural studies in vivo, as well as the physical interaction, need to be clarified and extended.

      (1) Immunolabeling of the hippocampus CA1 suggests sodium channels as well as Slack colocalization with AnkG (Fig 3A). Proximity ligation assay for NaV1.6 and Slack or a super-resolution microscopy approach would be needed to increase confidence in the presented colocalization results. Furthermore, coimmunoprecipitation studies on the membrane fraction would bolster the functional relevance of NaV1.6-Slack interaction on the cell surface.

      We thank the reviewer for these helpful suggestions. We acknowledge that employing proximity ligation assay and high-resolution techniques would significantly enhance our understanding of the localization of the Slack-NaV1.6 coupling.

      At present, the technical capabilities available in our laboratory and institution do not support high-resolution testing. However, we are enthusiastic about exploring potential collaborations to address these questions in the future. Furthermore, we fully recognize the importance of conducting co-immunoprecipitation (Co-IP) assays from membrane fractions. While we have already completed Co-IP assays for total protein and quantified the FRET efficiency values between Slack and NaV1.6 in the membrane region, the Co-IP assays on membrane fractions will be conducted in our future investigations.

      (2) Although hippocampal slices from Scn8a+/- were used for studies in Fig. S8, it is not clear whether Scn8a-/- or Scn8a+/- tissue was used in other studies (Fig 1J & 1K). It will be important to clarify whether genetic manipulation of NaV1.6 expression (Fig. 1K) has an impact on sodium-activated potassium current, level of surface Slack expression, or that of NaV1.6 near Slack.

      We thank the reviewer for pointing this out. In Fig. 1G,J,K, primary cortical neurons from homozygous NaV1.6 knockout (Scn8a-/-) mice were used. We will clarify this information in the revised manuscript. In terms of the effects of genetic manipulation of NaV1.6 expression on IKNa and surface Slack expression, we compared the amplitudes of IKNa measured from homozygous NaV1.6 knockout (NaV1.6-KO) neurons and wild-type (WT) neurons. The results showed that homozygous knockout of NaV1.6 does not alter the amplitudes of IKNa (Author response image 2). The level of surface Slack expression will be tested further.

      Author response image 2.

      The amplitudes of IKNa in WT and NaV1.6-KO neurons (data from manuscript Fig. 1K). ns, p > 0.05, unpaired two-tailed Student’s t test.

      (3) Did the epilepsy-related Slack mutations have an impact on NaV1.6-mediated sodium current?

      We thank the reviewer for this question. We examined the amplitudes of NaV1.6 sodium current upon expression alone or co-expression of NaV1.6 with epilepsy-related Slack mutations (K629N, R950Q, K985N). The results showed that the tested epilepsy-related Slack mutations do not alter the amplitudes of NaV1.6 sodium current (Author response image 3).

      Author response image 3.

      The amplitudes of NaV1.6 sodium currents upon co-expression of NaV1.6 with epilepsy-related Slack mutant variants (SlackK629N, SlackR950Q, and SlackK985N). ns, p>0.05, one-way ANOVA followed by Bonferroni’s post hoc test.

      (4) Showing the impact of quinidine on persistent sodium current in neurons and on NaV1.6-expressing cells would further increase confidence in the role of persistent sodium current on sensitivity of Slack to quinidine.

      We appreciate the reviewer’s question. Previous studies have shown that quinidine can inhibit persistent sodium currents at low concentrations1. In our experiments, blocking persistent sodium currents by application of riluzole in the bath solution showed no significant effects on the sensitivity of Slack to quinidine blockade upon co-expression of Slack with NaV1.6 (Fig. 2F,H). This result suggested that persistent sodium currents were not involved in the sensitization of Slack to quinidine blockade.

      1. Ju YK, Saint DA, Gage PW. Effects of lignocaine and quinidine on the persistent sodium current in rat ventricular myocytes. Br J Pharmacol. Oct 1992; 107(2):311-6. doi:10.1111/j.1476-5381.1992.tb12743.x
    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The authors describe a method for gastruloid formation using mouse embryonic stem cells (mESCs) to study YS and AGM-like hematopoietic differentiation. They characterise the gastruloids during nine days of differentiation using a number of techniques including flow cytometry and single-cell RNA sequencing. They compare their findings to a published data set derived from E10-11.5 mouse AGM. At d9, gastruloids were transplanted under the adrenal gland capsule of immunocompromised mice to look for the development of cells capable of engrafting the mouse bone marrow. The authors then applied the gastruloid protocol to study overexpression of Mnx1 which causes infant AML in humans.

      In the introduction, the authors define their interpretation of the different waves of hematopoiesis that occur during development. 'The subsequent wave, known as definitive, produces: first, oligopotent erythro-myeloid progenitors (EMPs) in the YS (E8-E8.5); and later myelo-lymphoid progenitors (MLPs - E9.5-E10), multipotent progenitors (MPPs - E10-E11.5), and hematopoietic stem cells (HSCs - E10.5-E11.5), in the aorta-gonad-mesonephros (AGM) region of the embryo proper.' Herein they designate the yolk sac-derived wave of EMP hematopoiesis as definitive, according to convention, although paradoxically it does not develop from intraembryonic mesoderm or give rise to HSCs.

      The apparent perplexity of the Reviewer with our definition of primitive and definitive waves is somewhat surprising, as it is widely used in the field (e.g. PMID: 18204427; PMID: 28299650; PMID: 33681211). Definitive haematopoiesis, encompassing EMP, MLP, MPP and HSC, highlights their origin from haemogenic endothelium, generation of mature cells with adult characteristics from progenitors with multilineage potential and direct and indirect developmental contributions to the intra-embryonic and time-restricted generation of HSCs.

      General comments

      The authors make the following claims in the paper:

      (1) The development of a protocol for hemogenic gastruloids (hGx) that recapitulates YS and AGM-like waves of blood from HE.

      (2) The protocol recapitulates both YS and EMP-MPP embryonic blood development 'with spatial and temporal accuracy'.

      (3) The protocol generates HSC precursors capable of short-term engraftment in an adrenal niche.

      (4) Overexpression of MNX1 in hGx transforms YS EMP to 'recapitulate patient transcriptional signatures'.

      (5) hGx is a model to study normal and leukaemic embryonic hematopoiesis.

      There are major concerns with the manuscript. The statements and claims made by the authors are not supported by the data presented, data is overinterpreted, and the conclusions cannot be justified. Furthermore, the data is presented in a way that makes it difficult for the reader to follow the narrative, causing confusion. The authors have not discussed how their hGx compares to the previously published mouse embryoid body protocols used to model early development and hematopoiesis.

      Specific points

      (1) It is claimed that HGxs capture cellularity and topography of developmental blood formation. The hGx protocol described in the manuscript is a modification of a previously published gastruloid protocol (Rossi et al 2022). The rationale for the protocol modifications is not fully explained or justified. There is a lack of novelty in the presented protocol as the only modifications appear to be the inclusion of Activin A and an extension of the differentiation period from 7 to 9 days of culture. No direct comparison has been made between the two versions of gastruloid differentiation to justify the changes.

      The Reviewer paradoxically claims that the protocol is not novel and that it differs from a previous publication in at least 2 ways – the patterning pulse and the length of the protocol. Of these, the patterning pulse is key. As documented in Fig. S1, we cannot obtain Flk1-GFP expression in the absence of Activin A. Expression of Flk1 is a fundamental step in haemato-endothelial specification and, accordingly, we do not see CD41 or CD45+ cells in the absence of Activin A. Also, in our hands, there is a clear time-dependent progression of marker expression, with sequential acquisition of CD41 and CD45, with the latter not detectable until 192h (Fig. 1C-D), another key difference relative to the Rossi et al (2022) protocol. The 192h-timepoint, we argue in the manuscript, and present further evidence for in this rebuttal, corresponds to the onset of AGM-like haematopoiesis. We have empirically extended the protocol to maximise the CD45+ cell output (Fig. S1B-D).

      The inclusion of Activin A at high concentration at the beginning of differentiation would be expected to pattern endoderm rather than mesoderm. BMP signaling is required to induce Flk1+ mesoderm, even in the presence of Wnt.

      Again, we call the Reviewer’s attention to Fig. S1 which clearly shows that Activin A (with no BMP added) is required for induction of Flk1 expression, in the presence of Wnt. Activin A, in combination with Wnt, is used in other protocols of haemato-endothelial differentiation from pluripotent cells, with no BMP added in the same step of patterning and differentiation (PMID: 39227582; PMID: 39223325). In the latter protocol, we also call the Reviewer’s attention to the fact that a higher concentration of Activin A precludes the need for BMP4 addition. Finally, one of us has recently reported that Activin A, on its own, will induce FLK1, as well as other anterior mesodermal progenitors (https://www.biorxiv.org/content/10.1101/2025.01.11.632562v1). In addressing the Reviewer’s concerns with the dose of Activin A used, we titrated its concentration against activation of Flk1, confirming optimal Flk1-GFP expression at the 100ng/ml dose used in the manuscript.

      Author response image 1.

      Dose-dependent requirement of Activin A for induction of Flk1 expression in haemogenic gastruloids. Composite GFP and brightfield live imaging of Flk1-GFP haemogenic gastruloids at 96h. Images were acquired using a Cytation5 instrument (Thermo). Images are representative of 12 gastruloids per condition.

      FACS analysis of the hGx during differentiation is needed to demonstrate the co-expression of Flk1-GFP and lineage markers such as CD34 to indicate patterning of endothelium from Flk1+ mesoderm. The FACS plots in Fig. 1 show c-Kit expression but very little VE-cadherin which suggests that CD34 is not induced. Early endoderm expresses c-Kit, CXCR4, and Epcam, but not CD34 which could account for the lack of vascular structures within the hGx as shown in Fig. 1E.

      We were surprised by the Reviewer’s comment that there are no endothelial structures in our gastruloids. The presence of a Flk1-GFP+ network is visible in the GFP images in Fig. 1B, from 144h onwards, also shown in Author response image 2A. In addition, our single-cell RNA-seq data, included in the manuscript, confirms the presence of endothelial cells with a developing endothelial, including arterial, programme. This can be seen in Fig. 2B, F of the manuscript and is represented in Author response image 2B. In contrast with the Reviewer’s claims that no endothelial cells are formed, the data show that Kdr (Flk1)+ cells co-express Cdh5/VE-Cadherin and indeed Cd34, attesting to the presence of an endothelial programme. Arterial markers Efnb2, Flt1, and Dll4 are present. A full-blown programme, which also includes haemogenic markers including Sox17, Esam, Cd44 and Mecom, is clear at early (144h) and, particularly, at late (192h) timepoints in cells sorted on detection of surface c-Kit (Author response image 2B). Further to the data shown in B, already present in the manuscript, we also document co-expression of Flk1-GFP and CD34 by flow cytometry (Author response image 2C).

      Author response image 2.

      Haemogenic gastruloids have a branched vascular network. A. Whole-mount confocal imaging of 144h-haemogenic gastruloids. B. Differentiation of an arterial endothelial programme in haemogenic gastruloids; single-cell RNA-seq data of differentiating haemogenic gastruloids, sorted on cell surface expression of c-Kit at 144 and 192h; gene expression colour scale from yellow (low) to orange (high); grey = no detectable expression. C. Flow cytometry plots of 216h-haemogenic gastruloids showing detection of haemato-endothelial marker CD34.

      (2) The protocol has been incompletely characterised, and the authors have not shown how they can distinguish between either wave of Yolk Sac (YS) hematopoiesis (primitive erythroid/macrophage and erythro-myeloid EMP) or between YS and intraembryonic Aorta-Gonad-Mesonephros (AGM) hematopoiesis. No evidence of germ layer specification has been presented to confirm gastruloid formation, organisation, and functional ability to mimic early development. Furthermore, differentiation of YS primitive and YS EMP stages of development in vitro should result in the efficient generation of CD34+ endothelial and hematopoietic cells. There is no flow cytometry analysis showing the kinetics of CD34 cell generation during differentiation. Benchmarking the hGx against developing mouse YS and embryo data sets would be an important verification.

      The Reviewer is correct that we have not provided detailed characterisation of the different germ layers, as this was not the focus of the study. In that context, we were surprised by the earlier comment assuming co-expression of c-kit, Cxcr4 and Epcam, which we did not show, while overlooking the endothelial programme reiterated above, which we have presented.

      Given our focus on haemato-endothelial specification, we have started the single-cell RNA-seq characterisation of the haemogenic gastruloid at 120h and have not looked specifically at earlier timepoints of embryo patterning.

      This said, we show the presence of neuroectodermal cells in cluster 9; on the other hand, cluster 7 includes hepatoblast-like cells, denoting endodermal specification. We are happy to include this characterisation, to the extent that it is present, in a revised version of the manuscript. However, in the absence of earlier timepoints and given the bias towards mesodermal specification, we expect that specification of ectodermal and endodermal programmes may be incomplete.

      In respect of the contention regarding the capture of YS-like and AGM-like haematopoiesis, we have presented evidence in the manuscript that haemogenic cells generated during gastruloid differentiation, particularly at the late 192h and 216h timepoints, project onto highly purified c-Kit+ CD31+ Gfi1-expressing cells from mouse AGM (PMID: 38383534), providing support for the recapitulation of the corresponding developmental stage. In distinguishing between YS-like and AGM-like haematopoiesis, we call the Reviewer’s attention to the replotting of the single-cell RNA-seq data already in the manuscript, which we provided in response to point 1 (Author response image 2B), which highlights an increase in Sox17, but not Sox18, expression in the 192h haemogenic endothelium, which suggests an association with AGM haematopoiesis (PMID: 20228271). A significant association of Cd44 and Procr expression with the same timepoint (Fig. 2F in the manuscript) further supports an AGM-like endothelial-to-haematopoietic transition at the 192h timepoint.

      Following on the Reviewer’s comments about CD34, we also inspected co-expression of CD34 with CD41 and CD45, the latter co-expression present in, although not necessarily exclusive to, AGM haematopoiesis.

      Reassuringly, we observed clear co-expression with both markers (Author response image 3), in addition to a CD41+CD34- population, which likely reflects YS EMP-independent erythropoiesis. Interestingly, marker expression is responsive to the levels of Activin A used in the patterning pulse, with the 100ng/ml Activin A used in our protocol superior to 75ng/ml.

      Author response image 3.

      Association of CD34 with CD41 and CD45 expression is Activin A-responsive and supports the presence of definitive haematopoiesis. A. Flow cytometry analysis of CD34 and CD41 expression in 216h-haemogenic gastruloids; two doses of Activin A were used in the patterning pulse with CHIR99021 between 48-72h. FMO controls shown. B. Flow cytometry analysis of CD34 and CD45 at 216h in the same experimental conditions.

      We agree that it remains challenging to identify markers exclusive to AGM haematopoiesis, which is operationally equated with generation of transplantable haematopoietic stem cells. While HSC generation is a key event characteristic of the AGM, not all AGM haematopoiesis corresponds to HSCs, an important point in evaluating the data presented in the manuscript, and indeed acknowledged by us.

      Author response image 4.

      Clustering of haemogenic gastruloid cells sorted on the basis of haemato-endothelial surface markers CD41, c-Kit and CD45. A. Leiden clustering of single-cell RNA-seq data. B. Time stamps of sorted haemogenic gastruloid cells in A. C. Surface marker stamps of cells in A.

      Given the centrality of this point in comments by all the Reviewers, we have conducted projections of our single-cell RNA-seq data against two studies which (1) capture arterial and haemogenic specification in the para-splanchnopleura (pSP) and AGM region between E8.0 and E11 (Hou et al, PMID: 32203131), and (2) uniquely capture YS, AGM and FL progenitors and the AGM endothelial-to-haematopoietic transition (EHT) in the same scRNA-seq dataset (Zhu et al, PMID: 32392346).

      Focusing the analysis on the subsets of haemogenic gastruloid cells sorted as CD41+ (144h), c-Kit+ (144h and 192h) and CD45+ (192h and 216h) (Author response image 4A-C), we show:

      (1) That a subset of haemato-endothelial cells from haemogenic gastruloids at 144h to 216h projects onto intra-embryonic cells spanning E8.25 to E10 (Author response image 5A-B). This is in agreement with our interpretation that 216h haemogenic gastruloids are no later than the MPP/pre-HSC state of embryonic development, requiring further maturation to generate long-term engrafting HSC.

      (2) That haemogenic gastruloids contain YS-like (including EMP-like) and AGM-like haematopoietic cells (Author response image 6A-B). Significantly, some of the cells, particularly c-Kit-sorted cells with a candidate endothelial and HE-like signature, project onto AGM pre-HE and HE, as well as IAHC, while later, predominantly 216h, cells have characteristics of MPP/LMPP-like cells from the FL.

      Altogether, the data support the notion that haemogenic gastruloids capture YS and AGM haematopoiesis until E10, as suggested by us in the manuscript. We thought it was important to share these preliminary data with the Editors at an early stage, and we will incorporate a deeper analysis in a revised version of the manuscript.

      Single-cell RNA sequencing was used to compare hGx with mouse AGM. The authors incorrectly conclude that '...specification of endothelial and HE cells in hGx follows with time-dependent developmental progression into putative AGM-like HE...' And, '...HE-projected hGx cells...expressed Gata2 but not Runx1, Myb, or Gfi1b...' Hemogenic endothelium is defined by the expression of Runx1, and Gfi1b is downstream of Runx1.

      As a hierarchy of regulation, Gata2 precedes and drives Runx1 expression at the specification of HE (PMID: 17823307; PMID: 24297996), while Runx1 drives the EHT, upstream of Gfi1b in haematopoietic clusters (PMID: 34517413).

      Author response image 5.

      Projection of sorted haemogenic gastruloid cells onto the Hou et al dataset (PMID: 32203131) analysing development of mouse intra-embryonic haematopoiesis. A. Time signatures of Hou et al data. B. Projection of Leiden clusters in Author response image 4A. Methodology as described in our manuscript; 68% of gastruloid cells projected.

      Author response image 6.

      Projection of sorted haemogenic gastruloid cells onto the Zhu et al dataset (PMID: 32392346), capturing arterial endothelial and haemogenic endothelial development, in reference to YS, AGM and FL haematopoietic progenitors. A. Functional cluster classification as per Zhu et al. B. Projection of Leiden clusters in Author response image 4A. Methodology as detailed in our manuscript; 58% of gastruloid cells projected. Haematopoietic clusters annotated as in A.

      (3) The hGx protocol 'generates hematopoietic SC precursors capable of short-term engraftment' is not supported by the data presented. Short-term engraftment would be confirmed by flow cytometric detection of hematopoietic cells within the recipient bone marrow, spleen, thymus, and peripheral blood that expressed the BFP transgene. This analysis was not provided. PCR detection of transcripts, following an unspecified number of amplification cycles, as shown in Figure 3G (incorrectly referred to as Figure 3F in the legend) is not acceptable evidence for engraftment.

      We provide the full flow cytometry analysis of spleen engraftment in the 5 mice which received implantation of 216h-haemogenic gastruloids in the adrenal gland; an additional (control) animal received adrenal injection of PBS (Author response image 7). The animals were analysed at 4 weeks. In this experiment, the bone marrow collection was limiting, and material was prioritised for PCR.

      We had previously provided only representative plots of flow cytometry analysis of bone marrow and spleen in Fig. S4E, which we described as low-level engraftment. The analysis was complemented with genomic DNA PCR, where detection was present in only some of the replicates tested per animal. We confirm that the PCR analysis used a conventional 40 cycles; the sensitivity was shown in Fig. S4F. As shown in Fig. 3A-C, no more than 7 CD45+CD144+ multipotent cells are present per haemogenic gastruloid, with 3 haemogenic gastruloids implanted in the adrenal gland of each transplanted animal. We argue that the low level of cytometric and molecular engraftment at 4 weeks, from haemogenic gastruloid-derived progenitors that have not progressed beyond a stage equivalent to E10 (Author response image 5A-B) and that we have described as requiring additional maturation in vivo, is not surprising.
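
      The input-cell argument above can be made explicit with a short calculation. This is a minimal sketch that uses only the two figures quoted in this response (at most 7 CD45+CD144+ cells per haemogenic gastruloid, 3 gastruloids per recipient); the function and variable names are illustrative assumptions, not part of the manuscript.

```python
# Upper bound on candidate multipotent cells delivered per recipient animal,
# using only the figures quoted in this response.
MAX_CELLS_PER_GASTRULOID = 7  # "no more than 7 CD45+CD144+ multipotent cells"
GASTRULOIDS_PER_ANIMAL = 3    # gastruloids implanted per adrenal gland


def max_input_cells(cells_per_gastruloid: int, gastruloids: int) -> int:
    """Return the upper bound on candidate cells implanted per animal."""
    return cells_per_gastruloid * gastruloids


upper_bound = max_input_cells(MAX_CELLS_PER_GASTRULOID, GASTRULOIDS_PER_ANIMAL)
print(f"At most {upper_bound} candidate multipotent cells per recipient")
```

      The bound counts cells present at the time of implantation only; it says nothing about how many arise or mature in vivo afterwards.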

      Author response image 7.

      BFP engraftment of Nude recipient mice 4 weeks after unilateral adrenal implantation of 216h-haemogenic gastruloids. Flow cytometry analysis of spleen engraftment. Genomic PCR analysis is shown in Fig. 3G of the manuscript.

      Transplanted hGx formed teratoma-like structures, with hematopoietic cells present at the site of transplant only analysed histologically. Indeed, the quality of the images provided does not provide convincing validation that donor-derived hematopoietic cells were present in the grafts.

      As stated in the text, the images are meant to illustrate that the haemogenic gastruloids developed in situ. The observation of donor-derived blood cells in the implanted haemogenic gastruloids would not correspond to engraftment, as we have amply demonstrated that they have generated blood cells in vitro. There is no evidence that there are remaining pluripotent cells in the haemogenic gastruloid after 9 days of differentiation, and it is therefore not clear that these are teratomas.

      There is no justification for the authors' conclusion that '... the data suggest that 216h hGx generate AGM-like pre-HSC capable of at least short-term multilineage engraftment upon maturation...'. Indeed, this statement is in conflict with previous studies demonstrating that pre-HSCs in the dorsal aorta of the mouse embryo are immature and actually incapable of engraftment.

      We have clearly stated that we do not see haematopoietic engraftment through transplantation of dissociated haemogenic gastruloids, which reach the E10 state containing pre-HSC (Author response image 5). Instead, we observed rare myelo-erythroid (in the manuscript) and myelo-lymphoid (Author response image 9 below, in response to Reviewer 2) engraftment upon in vivo maturation of haemogenic gastruloids with preserved 3D organisation. These statements are not contradictory.

      The statement '...low-level production of engrafting cells recapitulates their rarity in vivo, in agreement with the embryo-like qualities of the gastruloid system....' is incorrect. Firstly, no evidence has been provided to show the hGx has formed a dorsal aorta facsimile capable of generating cells with engrafting capacity. Secondly, although engrafting cells are rare in the AGM, approximately one per embryo, they are capable of robust and extensive engraftment upon transplantation.

      We are happy to rephrase the statement to simply say that “…the data suggest that 216h haemogenic gastruloids contain candidate AGM-like progenitors with some short-term engraftment potential but incomplete functional maturation.” To be clear, with our existing statement we meant to highlight that the production of definitive AGM-like haematopoietic progenitors (not all of which are engrafting) in haemogenic gastruloids does not correspond to non-physiological single-lineage programming. We did not claim that we achieved production of HSC, which would be long-term engrafting.

      (4) Expression of MNX1 transcript and protein in hematopoietic cells in MNX1 rearranged acute myeloid leukaemia (AML) is one cause of AML in infants. In the hGx model of this disease, Mnx1 is overexpressed in the mESCs that are used to form gastruloids. Mnx1 overexpression seems to confer an overall growth advantage on the hGx and increase the serial replating capacity of the small number of hematopoietic cells that are generated. The inefficiency with which the hGx model generates hematopoietic cells makes it difficult to model this disease. The poor quality of the cytospin images prevents accurate identification of cells. The statement that the kit-expressing cells represent leukemic blast cells is not sufficiently validated to support this conclusion. What other stem cell genes are expressed? Surface kit expression also marks mast cells, frequently seen in clonogenic assays of blood cells. Flow cytometric and gene expression analyses using known markers would be required.

      The haemogenic gastruloid model generates haematopoietic and haemato-endothelial cells. MNX1 expands Kit+ cells at 144h, which we show to have a haemato-endothelial signature (manuscript Fig. 2B, which we replotted in Author response image 2B).

      Serial replating of CFC assays is a conventional in vitro assay of leukaemia transformation. Critically, colony replating is not maintained in EV control cells, attesting to the transformation potential of MNX1.

      Although we have not fully traced the cellular hierarchy of MNX1-driven transformation in the haemogenic gastruloid system, the in vitro replating expands a Kit+ cell (Fig. 5E), which reflects the surface phenotype of the leukaemia, also recapitulated in the mouse model initiated by MNX1-overexpressing FL cells. Importantly, it recapitulates the transcriptional profile of MNX1-leukaemia patients (Fig. 6C), which is uniquely expressed by MNX1 144h and replated colony cells, but not by MNX1 216h gastruloid cells, arguing against a generic signature of MNX1 overexpression (Fig. 6B). Moreover, the MNX1-transformation of haemogenic gastruloid cells is superior to the FL leukaemia model at capturing the unique transcriptional features of MNX1-driven leukaemia, distinct from other forms of AML in the same age group (Fig. S7). It is possible that this corresponds to a preleukaemia event, and we will explore this in future studies, which are beyond the proof-of-principle nature of this paper.

      (5) In human infant MNX1 AML, the mutation is thought to arise at the fetal liver stage of development. There is no evidence that this developmental stage is mimicked in the hGx model.

      We never claim that the haemogenic gastruloid model mimics the foetal liver. We propose that susceptibility to MNX1 is at the HE-to-EMP transition. Moreover, and importantly, contrary to the Reviewer’s statement, there is no evidence in the literature that the mutation arises in the foetal liver stage, just that the mutation arises before birth (PMID: 38806630), which is different. In a mouse model of MNX1 overexpression, the authors achieve leukaemia engraftment upon MNX1 overexpression in foetal liver, but not in bone marrow cells (PMID: 37317878). This is in agreement with a vulnerability of embryonic / foetal, but not adult cells to the MNX1 expression caused by the translocation. However, haematopoietic cells in the foetal liver originate from YS and AGM precursors, so the origin of the MNX1-susceptible cells can be in those locations, rather than the foetal liver itself.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors develop an exciting new hemogenic gastruloid (hGX) system, which they claim reproduces the sequential generation of various blood cell types. The key advantage of this cellular system would be its potential to more accurately recapitulate the spatiotemporal emergence of hematopoietic progenitors within their physiological niche compared to other available in vitro systems. The authors present a large set of data and also validate their new system in the context of investigating infant leukemia.

      Strengths:

      The development of this new in vitro system for generating hematopoietic cells is innovative and addresses a significant drawback of current in vitro models. The authors present a substantial dataset to characterize this system, and they also validate its application in the context of investigating infant leukemia.

      Weaknesses:

      The thorough characterization and full demonstration that the cells produced truly represent distinct waves of hematopoietic progenitors are incomplete. The data presented to support the generation of late yolk sac (YS) progenitors, such as lymphoid cells, and aortic-gonad-mesonephros (AGM)-like progenitors, including pre-hematopoietic stem cells (pre-HSCs), by this system are not entirely convincing. Given that this is likely the manuscript's most crucial claim, it warrants further scrutiny and direct experimental validation. Ideally, the identity of these progenitors should be further demonstrated by directly assessing their ability to differentiate into lymphoid cells or fully functional HSCs. Instead, the authors primarily rely on scRNA-seq data and a very limited set of markers (e.g., Ikzf1 and Mllt3) to infer the identity and functionality of these cells. Many of these markers are shared among various types of blood progenitors, and only a well-defined combination of markers could offer some assurance of the lymphoid and pre-HSC nature of these cells, although this would still be limited in the absence of functional assays.

      The identification of a pre-HSC-like CD45⁺CD41⁻/lo c-Kit⁺VE-Cadherin⁺ cell population is presented as evidence supporting the generation of pre-HSCs by this system, but this claim is questionable. This FACS profile may also be present in progenitors generated in the yolk sac such as early erythro-myeloid progenitors (EMPs). It is only within the AGM context, and in conjunction with further functional assays demonstrating the ability of these cells to differentiate into HSCs and contribute to long-term repopulation, that this profile could be strongly associated with pre-HSCs. In the absence of such data, the cells exhibiting this profile in the current system cannot be conclusively identified as true pre-HSCs.

      At this preliminary response stage, we present 2 additional pieces of evidence to support our claims that we capture YS and AGM stages of haematopoietic development. In future experiments, we can complement these with functional assays, including co-culture with OP9 and OP9-DL stroma.

      Author response image 8.

      EZH2 inhibition affects CD41+ cellular output in haemogenic gastruloids at 144h, but not 216h. A. Flow cytometry analysis of CD41 expression in 144h-haemogenic gastruloids treated with 0.5μM EZH2 inhibitor GSK126 from 120h. DMSO (0.05%), vehicle. 1 of 2 independent experiments (average CD41+: DMSO, 21.20%; GSK126, 12.10%; CD45 not detected). B. Flow cytometry analysis of CD41 and CD45 expression in 216h gastruloids, treated with DMSO or GSK126. (DMSO: average CD41+, 15.28%; average CD45+, 0.46%. GSK126: average CD41+, 23.78%; average CD45+, 2.08%).

      In Author response images 5 and 6, we project our single-cell RNA-seq data onto (1) developing intra-embryonic pSP and AGM between E8 and E11 (Author response image 5) and (2) a single-cell RNA-seq study of HE development which combines haemogenic and haematopoietic cells from the YS, the developing HE and IAHC in the AGM, and FL (Author response image 6). Our data map onto E8.25-E10 (Author response image 5) and capture YS EMP and erythroid and myeloid progenitors, as well as AGM pre-HE, HE and IAHC, with some cells matching HSPC and LMPP (Author response image 6), as suggested by the projection onto the Thambyrajah et al data set (Fig. S3 in the manuscript).

      Given the difficulty in finding markers that specifically associate with AGM haematopoiesis, we inspected the possibility of capturing different regulatory requirements at different stages of gastruloid development mirroring differential effects in the embryo. Polycomb EZH2 is specifically required for EMP differentiation in the YS, but does not affect AGM-derived haematopoiesis; it is also not required for primitive erythroid cells (PMID: 29555646; PMID: 34857757). We treated haemogenic gastruloids from 120h onwards with either DMSO (0.05%) or GSK126 (0.5μM), and inspected the cellularity of gastruloids at 144h, which we equate with YS-EMP, and 216h – putatively AGM haematopoiesis (Author response image 8). We show that EZH2 inhibition / GSK126 treatment specifically reduces %CD41+ cells at 144h (Author response image 8A), but does not reduce %CD41+ or %CD45+ cells at 216h (Author response image 8B).
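
      The direction and size of the GSK126 effect at each timepoint can be read directly from the averages quoted in the legend of Author response image 8. The sketch below computes the ratio of GSK126 to vehicle for each population; it is an illustrative calculation on the reported averages only, not a statistical test, and the dictionary layout and function name are assumptions made for this example.

```python
# Average percentages of positive cells, as quoted in the legend of
# Author response image 8 (DMSO = vehicle, GSK126 = EZH2 inhibitor).
averages = {
    ("144h", "CD41+"): {"DMSO": 21.20, "GSK126": 12.10},
    ("216h", "CD41+"): {"DMSO": 15.28, "GSK126": 23.78},
    ("216h", "CD45+"): {"DMSO": 0.46, "GSK126": 2.08},
}


def ratio_to_vehicle(values: dict) -> float:
    """Return the GSK126 average divided by the DMSO (vehicle) average."""
    return values["GSK126"] / values["DMSO"]


ratios = {key: round(ratio_to_vehicle(v), 2) for key, v in averages.items()}

for (timepoint, population), ratio in ratios.items():
    direction = "reduced" if ratio < 1 else "not reduced"
    print(f"{timepoint} {population}: GSK126/DMSO = {ratio} ({direction})")
```

      A ratio below 1 at 144h and ratios above 1 at 216h correspond to the pattern described above; with the small number of experiments reported, the ratios are descriptive only.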

      Although preliminary, these data, together with the scRNA-seq projections described, provide evidence for our claim that 144h haemogenic gastruloids capture YS EMPs, while CD41+ and CD45+ cells isolated at 216h reflect AGM progenitors. We cannot conclude as to the functional nature of the AGM cells from this experiment.

      The engraftment data presented are also not fully convincing, as the observed repopulation is very limited and evaluated only at 4 weeks post-transplantation. The cells detected after 4 weeks could represent the progeny of EMPs that have been shown to provide transient repopulation rather than true HSCs.

      We clearly state that there is low-level engraftment and do not claim to have generated HSC. We describe cells with short-term engraftment potential. Although the cells we show in the manuscript at 4 weeks could be EMPs (Author response image 7 and Fig. 3 and S3), we now have an 8-week analysis of implant recipients, in which we observed engraftment of the recipient bone marrow, again low-level, in 1 of 3 animals (Author response image 9). This engraftment is myeloid-lymphoid and therefore likely to have originated in a later progenitor. To be clear, we do not claim that this corresponds to the presence of HSC. It nevertheless supports the maturation of progenitors with engraftment potential.

      Author response image 9.

      Flow cytometry analysis of BFP engraftment of recipient bone marrow 8 weeks post-implantation of 216h-haemogenic gastruloids in the adrenal gland of Nude mice. 1 of 3 animals shows BFP CD45+ engraftment in the myeloid (Mac1+) and B-lymphoid (B220+) lineages. 3 haemogenic gastruloids were implanted unilaterally in the adrenal gland of each animal. A. Engrafted animal, showing CD45+ BFP cells of myeloid (CD11b) and B-lymphoid affiliation (B220). B. Non-engrafted mouse recipient of haemogenic gastruloid implants.

      Reviewer #3 (Public review):

      In this study, the authors employ a mouse ES-derived "hemogenic gastruloid" model which they generated and which they claim to be able to deconvolute YS and AGM stages of blood production in vitro. This work could represent a valuable resource for the field. However, in general, I find the conclusions in this manuscript poorly supported by the data presented. Importantly, it isn't clear what exactly are the "YS" and the "AGM"-like stages identified in the culture and where is the data that backs up this claim. In my opinion, the data in this manuscript lack convincing evidence that can enable us to identify what kind of hematopoietic progenitor cells are generated in this system. Therefore, the statement that "our study has positioned the MNX1-OE target cell within the YS-EMP stage (line 540)" is not supported by the evidence presented in this study. Overall, the system seems to be very preliminary and requires further optimization before those claims can be made.

      Specific comments below:

      (1) The flow cytometric analysis of gastruloids presented in Figure 1 C-D is puzzling. There is a large % of c-Kit+ cells generated, but few VE-Cad+ Kit+ double positive cells. Similarly, there are many CD41+ cells, but very few CD45+ cells, which one would expect to appear toward the end of the differentiation process if blood cells are actually generated. It would be useful to present this analysis as consecutive gating (i.e. evaluating CD41 and CD45 within VE-Cad+ Kit+ cells, especially if the authors think that the presence of VE-Cad+ Kit+ cells is suggestive of EHT). The quantification presented in D is misleading as the scale of each graph is different.

      Fig. 1C-D provides an overview of haemogenic markers during the timecourse of haemogenic gastruloid differentiation, and does indeed show a late up-regulation of CD45, as the Reviewer points out would be expected. The percentage of CD45+ cells is indeed low. However, we should point out that the haemogenic gastruloid protocol, although biased towards mesodermal outputs, does not aim to achieve pure haematopoietic specification, but rather to place it in its embryo-like context. Consecutive gating at the 216h timepoint is shown and quantified in Fig. 3A-B. We dispute that the scale is misleading. It is a necessity to represent the data in a way that is interpretable by the reader: the gates (in C) are truly representative and annotated, as are the plot axes (in D).

      (2) The imaging presented in Figure 1E is very unconvincing. C-Kit and CD45 signals appear as speckles and not as membrane/cell surfaces as they should. This experiment should be repeated and nuclear stain (i.e. DAPI) should be included.

      We include the requested images below (Author response image 10).

      Author response image 10.

      Confocal images of haematopoietic production in haemogenic gastruloids. Wholemount, cleared haemogenic gastruloids were stained for CD45 (pseudo-coloured red) and c-Kit antigens (pseudo-coloured yellow) with indirect staining, as described in the manuscript. Flk1-GFP signal is shown in green. Nuclei are contrasted with DAPI. (A) 192h. (B) 216h.

      (3) Overall, I am not convinced that hematopoietic cells are consistently generated in these organoids. The authors should sort hematopoietic cells and perform May-Grunwald Giemsa stainings as they did in Figure 6 to confirm the nature of the blood cells generated.

      The data are reproducible and complemented by functional assays shown in Fig. 3, which clearly demonstrate haematopoietic output. The single-cell RNA-seq data also show expression of a haematopoietic programme. Nevertheless, we include Giemsa-Wright’s stained cytospins obtained at 216h to illustrate haematopoietic output (Author response image 11). Inevitably, the cytospins will be inconclusive as to the presence of endothelial-to-haematopoietic transition or the generation of haematopoietic stem/progenitor cells, as these cells do not have a distinctive morphology.

      Author response image 11.

      Cytospin of dissociated haemogenic gastruloids at 216h. Cytospins were stained with Giemsa-Wright’s stain and are visualised with a 40x objective. Annotated are cells in the monocytic (dashed open arrow), granulocytic (solid open arrow), megakaryocytic (solid arrow) and erythroid (asterisk) lineages; arrowheads indicate cells with a non-specific blast-like morphology. Representative image.

      (4) The scRNAseq in Figure 2 is very difficult to interpret. Specific points related to this:

      - Cluster annotation in Figure 2a is missing and should be included.

      - Why do the heatmaps show the expression of genes within sorted cells? Couldn't the authors show expression within clusters of hematopoietic cells as identified transcriptionally (which ones are they? See previous point)? Gene names are illegible.

      - I see no expression of Hlf or Myb in CD45+ cells (Figure 2G). Hlf is not expressed by any of the populations examined (panels E, F, G). This suggests no MPP or pre-HSC are generated in the culture, contrary to what is stated in lines 242-245. (PMID 31076455 and 34589491).

      Later on, it is again stated that "hGx cells... lacked detection of HSC genes like Hlf, Gfi1, or Hoxa9" (lines 281-283). To me, this is proof of the absence of AGM-like hematopoiesis generated in those gastruloids.

      Author response image 12.

      Expression of endothelial, haemogenic and haematopoietic genes in haemogenic gastruloid cells sorted at 144h, 192h and 216h. UMAP as in Author response image 4. Pecam (CD31) and CD34 represent endothelial genes also detected in haemogenic endothelium. CD44 is specifically enriched at the endothelial-to-haemogenic transition. Mecom is detected in haemogenic endothelium and haematopoietic progenitors. Mllt3 and Runx1 are haematopoietic markers. Hoxa9 and Hlf are associated with haematopoietic stem and progenitor cells and their detection is rare in haemogenic gastruloids at 216h.

      For a combination of logistic and technical reasons, we performed single-cell RNA-seq using the Smart-Seq2 platform, which is inherently low throughput. We overcame the issue of cell coverage by complementing whole-gastruloid transcriptional profiling at successive time-points with sorting of subpopulations of cells based on individual markers documented in Fig. 1. We clearly stated which platform was used as well as the number and type of cells profiled (Fig. S2A and lines 172-179 of the manuscript), and our approach is standard. We will review our representation of the data in a revised manuscript. Nevertheless, at this stage, we provide plots of the expression of key haematopoietic markers over UMAPs of haemogenic gastruloid timecourse (Author response image 12). We also show preliminary qRT-PCR data with increased Hlf expression upon extension of the protocol to 264h (Author response image 13), further confirming haematopoietic specification, including of candidate definitive progenitor cells, in the haemogenic gastruloid model.

      Author response image 13.

      Hlf expression is up-regulated in late stage haemogenic gastruloids. Quantitative RT-PCR analysis of Hlf expression in unfractionated haemogenic gastruloids cultured for 264h. From 168h onwards, haemogenic gastruloids were cultured in N2B27 in the presence of VEGF, SCF, FLT3L and TPO, all recombinant mouse cytokines, as described in the manuscript. Shown are mean±standard deviation of n=5 replicates from 2 mouse ES cell lines, respectively Flk1-GFP and Rosa26-BFP::Flk1-GFP, reported in the manuscript; 2-tailed unpaired t-test with Welch correction.
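
      For readers unfamiliar with the test named in this legend, the sketch below shows how a Welch (unequal-variance) t statistic and its Welch-Satterthwaite degrees of freedom are computed. It is a generic illustration using only the Python standard library: the input values are placeholders and are NOT data from the study, the two groups being compared are an assumption, and the p-value (which requires the t distribution) is not computed here.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(a), len(b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a, se2_b = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Placeholder relative-expression values for two hypothetical groups.
group_reference = [1.0, 1.2, 0.9, 1.1, 0.8]
group_late = [2.0, 3.5, 2.8, 4.1, 2.6]

t_stat, dof = welch_t(group_late, group_reference)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

      In practice a statistics package would normally be used; for example, SciPy's `scipy.stats.ttest_ind` with `equal_var=False` performs the two-sided Welch test and also returns the p-value.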

      (5) Mapping of scRNA-Seq data onto the dataset by Thambyrajah et al. is not proof of the generation of AGM HE. The dataset they are mapping to only contains AGM cells, therefore cells do not have the option to map onto something that is not AGM. The authors should try mapping to other publicly available datasets also including YS cells.

      We have done this and the data are presented in Author response images 5 and 6. As detailed in response to Reviewer 1, we have conducted projections of our single-cell RNA-seq data against two studies which (1) capture arterial and haemogenic specification in the para-splanchnopleura (pSP) and AGM region between E8.0 and E11 (Hou et al, PMID: 32203131) (Author response image 5), and (2) uniquely capture YS, AGM and FL progenitors and the AGM endothelial-to-haematopoietic transition (EHT) in the same scRNA-seq dataset (Zhu et al, PMID: 32392346) (Author response image 6). Specifically in answering the Reviewer’s point, we show that different subsets of haemogenic gastruloid cells sorted on haemogenic surface markers c-Kit, CD41 and CD45 cluster onto pre-HE and HE, intra-aortic clusters and FL progenitor compartments, and onto YS EMP and erythroid and myeloid progenitors. This lends support to our claim that the haemogenic gastruloid system specifies both YS-like and AGM-like cells.

      (6) Conclusions in Figure 3, named "hGx specify cells with preHSC characteristics" are not supported by the data presented here. Again, I am not convinced that hematopoietic cells can be efficiently generated in this system, and certainly not HSCs or pre-HSCs.

      We have provided evidence, both in the manuscript and in this response to Reviewers, that there is haematopoietic specification, including of progenitor cells, in the haemogenic gastruloid system (Fig. 3 and Author response image 7,9). We have added data in this response that supports the specification of YS-like and AGM-like cells (Author response image 5-6, 8). Importantly, we have never claimed that haemogenic gastruloids generate HSC. We accept the Reviewer’s comment that we have not provided sufficient evidence for the specification of pre-HSC-like cells. We will re-phrase Fig. 3 conclusion as “Haemogenic gastruloids specify cells with characteristics of definitive haematopoietic progenitors”.

      - FACS analysis in 3A is again very unconvincing. I do not think the population identified as c-Kit+ CD144+ is real. Also, why not try gating the other way around, as commonly done (e.g. VE-Cad+ Kit+ and then CD41/CD45)?

      There is nothing unconventional about our gating strategy, which was done from a more populated gate onto the less abundant one to ensure that the results are numerically more robust. In the case of haemogenic gastruloids, unlike the AGM preparations the Reviewer may be referring to, CD41+ and CD45+ cells are more abundant as there is no circulation of more differentiated haematopoietic cells away from the endothelial structures. This said, we did perform the gating as suggested (Author response image 14), indeed confirming that most VE-cad+ Kit+ cells are CD45+. Interestingly, VE-cad+ Kit- cells are predominantly CD41+, reinforcing the true haematopoietic nature of these cells.
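
      The point about gating order can be illustrated with a toy example: the order in which positive gates are applied does not change which cells end up in the final multi-positive population, only the parent population against which its frequency is reported. The sketch below uses a handful of made-up events with binary marker calls; the events, marker keys and `gate` helper are hypothetical and are not derived from the flow cytometry data in the manuscript.

```python
# Hypothetical events with binary marker calls (illustration only).
events = [
    {"VEcad": True, "Kit": True, "CD45": True, "CD41": False},
    {"VEcad": True, "Kit": True, "CD45": True, "CD41": False},
    {"VEcad": True, "Kit": True, "CD45": False, "CD41": False},
    {"VEcad": True, "Kit": False, "CD45": False, "CD41": True},
    {"VEcad": False, "Kit": False, "CD45": True, "CD41": False},
    {"VEcad": False, "Kit": True, "CD45": False, "CD41": True},
    {"VEcad": False, "Kit": False, "CD45": True, "CD41": False},
    {"VEcad": False, "Kit": False, "CD45": False, "CD41": False},
]


def gate(cells, **markers):
    """Keep the cells whose marker calls match every requested state."""
    return [c for c in cells if all(c[m] == s for m, s in markers.items())]


# Order 1: the more populated CD45+ gate first, then VE-cad+ Kit+.
cd45_parent = gate(events, CD45=True)
order1 = gate(cd45_parent, VEcad=True, Kit=True)

# Order 2: VE-cad+ Kit+ first, then CD45+.
vecad_kit_parent = gate(events, VEcad=True, Kit=True)
order2 = gate(vecad_kit_parent, CD45=True)

fraction_of_cd45 = len(order1) / len(cd45_parent)
fraction_of_vecad_kit = len(order2) / len(vecad_kit_parent)
print(len(order1), len(order2), fraction_of_cd45, fraction_of_vecad_kit)
```

      Both orders recover the same triple-positive cells, whereas the reported percentage differs because the denominator (parent gate) differs.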

      Author response image 14.

      Flow cytometry analysis of VE-cadherin+ cells in haemogenic gastruloids at 216h of the differentiation protocol, probing co-expression of CD45, CD41 and c-Kit.

      - The authors must have tried really hard, but the lack of short- or long-engraftment in a number of immunodeficient mouse models (lines 305-313) really suggests that no blood progenitors are generated in their system. I am not familiar with the adrenal gland transplant system, but it seems like a very non-physiological system for trying to assess the maturation of putative pre-HSCs. The data supporting the engraftment of these mice, essentially seen only by PCR and in some cases with a very low threshold for detection, are very weak, and again unconvincing. It is stated that "BFP engraftment of the Spl and BM by flow cytometry was very low level albeit consistently above control (Fig. S4E)" (lines 337-338). I do not think that two dots in a dot plot can be presented as evidence of engraftment.

      We have presented the data with full disclosure and do not deny that the engraftment achieved is low-level and short-term, indicating incomplete maturation of definitive haematopoietic progenitors in the current haemogenic gastruloid system. However, we call the Reviewer’s attention to the fact that detection of BFP+ cells by PCR and flow cytometry in the recipient animals at 4 weeks is consistent between the 2 methods (Author response image 7).

      Furthermore, we have now also been able to detect low-level myelo-lymphoid engraftment in the bone marrow 8 weeks after adrenal implantation, again suggesting the presence of a small number of definitive haematopoietic progenitors that potentially mature from the 3 haemogenic gastruloids implanted (Author response image 9).

      (7) Given the above, I find that the foundations needed for extracting meaningful data from the system when perturbed are very shaky at best. Nevertheless, the authors proceed to overexpress MNX1 by LV transduction, a system previously shown to transform fetal liver cells, mimicking the effect of the t(7;12) AML-associated translocation. Comments on this section:

      - The increase in the size of the organoid when MNX1 is expressed is a very unspecific finding and not necessarily an indication of any hematopoietic effect of MNX1 OE.

      We agree with the Reviewer on this point; it is nevertheless a reproducible observation which we thought relevant to describe for completeness and data reproducibility.

      - The mild increase of cKit+ cells (Figure 4E) at the 144hr timepoint and the lack of any changes in CD41+ or CD45+ cells suggests that the increase in Kit+ cells % is not due to any hematopoietic effect of MNX1 OE. No hematopoietic GO categories are seen in RNA seq analysis, which supports this interpretation. Could it be that just endothelial cells are being generated?

      The Reviewer is correct that the MNX1-overexpressing cells have a strong endothelial signature, which is present in the patients (Fig. 4A). We investigated a potential link with c-Kit by staining cells from the replating colonies during the process of in vitro transformation with CD31. We observed that 40-50% of c-Kit+ cells (20-30% total colony cells) co-expressed CD31 (Author response image 15), at least at early plating. These cells co-exist with haematopoietic cells, namely Ter119+ cells, as expected from the YS-like erythroid and EMP-like affiliation of haematopoietic output from 144h-haemogenic gastruloids (Fig. 5F).

      Author response image 15.

      Endothelial affiliation of MNX1-oe replating cells from haemogenic gastruloid. A. Representative flow cytometry plot of plate 1 CFC from MNX1-overexpressing haemogenic gastruloids at 144h. B. Quantification of the proportion of CD31+c-Kit+ cells in plates 1 and 2 of MNX1-oe-driven in vitro transformation.

      (8) There seems to be a relatively convincing increase in replating potential upon MNX1-OE, but this experiment has been poorly characterized. What type of colonies are generated? What exactly is the "proportion of colony forming cells" in Figures 5B-D? The colony increase is accompanied by an increase in Kit+ cells; however, the flow cytometry analysis has not been quantified.

      Given the inability to replate control EV cells, there is not a population to compare with in terms of quantification. The level of c-Kit+ represented in Fig. 5E is achieved at plate 2 or 3 (depending on the experiment), both of which are significantly enriched for colony-forming cells relative to control (Fig. 5B, D).

      (9) Do hGx cells engraft upon MNX1-OE? This experiment, which appears not to have been performed, is essential to conclude that leukemic transformation has occurred.

      For the purpose of this study, we are satisfied with confirmation of in vitro transformation potential of MNX1 haemogenic gastruloids, which can be used for screening purposes. Although interesting, in vivo leukaemia engraftment from haemogenic gastruloids is beyond the scope of this study.

    1. Author response:

      We thank the senior editor, the reviewing editor, and the reviewers for their invaluable insights, which have helped us improve our manuscript. The assessment and feedback, particularly on the role of directly released versus OMV-delivered bacterial ATP and their effects on neutrophils, on addressing study limitations, and on discussing our models, are highly appreciated.

      The points you raised led us to critically rethink our approach, our results, and our conclusions. Furthermore, they gave us the chance to elaborate on some critical aspects that you mentioned. With your help, we will make clarifications throughout the manuscript, and we will add the data about neutrophil numbers in the different organs (reviewer #1, weaknesses #3).

      Reviewer #1 (Public Review):

      Summary:

      • Extracellular ATP represents a danger-associated molecular pattern associated to tissue damage and can act also in an autocrine fashion in macrophages to promote proinflammatory responses, as observed in a previous paper by the authors in abdominal sepsis. The present study addresses an important aspect possibly conditioning the outcome of sepsis that is the release of ATP by bacteria. The authors show that sepsis-associated bacteria do in fact release ATP in a growth dependent and strain-specific manner. However, whether this bacterial derived ATP play a role in the pathogenesis of abdominal sepsis has not been determined. To address this question, a number of mutant strains of E. coli has been used first to correlate bacterial ATP release with growth and then, with outer membrane integrity and bacterial death. By using E. coli transformants expressing the ATP-degrading enzyme apyrase in the periplasmic space, the paper nicely shows that abdominal sepsis by these transformants results in significantly improved survival. This effect was associated with a reduction of peritoneal macrophages and CX3CR1+ monocytes, and an increase in neutrophils. To extrapolate the function of bacterial ATP from the systemic response to microorganisms, the authors exploited bacterial OMVs either loaded or not with ATP to investigate the systemic effects devoid of living microorganisms. This approach showed that ATP-loaded OMVs induced degranulation of neutrophils after lysosomal uptake, suggesting that this mechanism could contribute to sepsis severity.

      Strengths:

      • A strong part of the study is the analysis of E. coli mutants to address different aspects of bacterial release of ATP that could be relevant during systemic dissemination of bacteria in the host.

      We want to thank the reviewer for recognizing this important aspect of our experimental approach.

      Weaknesses:

      • As pointed out in the limitations of the study whether ATP-loaded OMVs provide a mechanistic proof of the pathogenetic role of bacteria-derived ATP independently of live microorganisms in sepsis is interesting but not definitively convincing. It could be useful to see whether degranulation of neutrophils is differentially induced by apyrase-expressing vs control E. coli transformants.

      We thank the reviewer for raising several important points. In our study, we assessed local and systemic effects of released bacterial ATP. The consequences of local bacterial ATP release were assessed using an apyrase-expressing E. coli transformant. Locally, bacterial ATP resulted in a decrease in neutrophil numbers and we hypothesize that directly released bacterial ATP either leads to neutrophil death (e.g. via P2X7 receptor (Proietti et al., 2019)) or interferes with the recruitment of neutrophils (e.g. via P2Y receptors (Junger, 2011)).

      The systemic consequences were assessed using ATP-loaded and empty OMV. We have shown that degranulation is induced by OMV-derived bacterial ATP. ATP-containing OMV are engulfed by neutrophils, reach their endolysosomal compartment and might activate purinergic receptors, which then leads to aberrant degranulation. This concept, which needs to be explored in future studies, is fundamentally different from classical purinergic signaling via bacterial ATP released directly into the extracellular space.

      It is possible that neutrophil degranulation is also modulated by directly released bacterial ATP. We agree that this should be assessed in future studies. In addition, the role of OMV-derived bacterial ATP should be assessed locally, and the relative importance of directly released versus OMV-mediated bacterial ATP should be dissected locally. Based on our measurements (Figure 4-figure supplement 1A and Figure 5C), we estimate that the effect of OMV-derived bacterial ATP might be much smaller than the effects of directly released bacterial ATP. Thus, direct ATP release might predominate locally. However, we fully agree that this has to be investigated in a future study to reconcile the different aspects of bacterial ATP signaling. A paragraph discussing this particular issue will be added to the manuscript.

      • Also, the increase of neutrophils in bacterial ATP-depleted abdominal sepsis, which has better outcomes than "ATP-proficient" sepsis, seems difficult to correlate to the hypothesized tissue damage induced by ATP delivered via non-infectious OMVs.

      We fully acknowledge the mentioned discrepancy. What we propose is that bacterial ATP exhibits different functions that are dependent on the release mechanism (see above). Locally, in the peritoneal cavity, neutrophil numbers are decreased by directly released bacterial ATP. Remotely, ATP is delivered via OMV and impacts on neutrophil function. We agree that, in particular, in the peritoneal cavity, both effects may play a role. However, the impact of directly released bacterial ATP seems to be dominant (see above).

      We propose that neutrophils are decreased locally because of directly released bacterial ATP, which prevents efficient infection control and, therefore, impairs sepsis survival. In addition, these fewer neutrophils might even be dysregulated by the engulfment of bacterial ATP delivered via OMV, which leads to an upregulated and possibly aberrant degranulation process worsening local and remote tissue damage. We agree that in addition to neutrophil numbers, the function of local neutrophils should be assessed with and without the influence of OMV-delivered bacterial ATP. This could be done by RNA sequencing of primary neutrophils from the peritoneal cavity or neutrophil cell lines as well as degranulation assays.

      • Are the neutrophils counts affected by ATP delivered via OMVs?

      This is difficult to show in the peritoneal cavity, where both directly released bacterial ATP and OMV-derived bacterial ATP are present. We did, however, assess such a putative difference in the systemic organs and the blood, where we did not find any differences in neutrophil numbers. We will include the figure in the revised manuscript as Figure 6-figure supplement 3C.

      Author response image 1.

      • A comparison of cytokine profiles in the abdominal fluids of E. coli and OMV treated animals could be helpful in defining the different responses induced by OMV-delivered vs bacterial-released ATP. The analyses performed on OMV treated versus E. coli infected mice are not closely related and difficult to combine when trying to draw a hypothesis for bacterial ATP in sepsis.

      We fully agree that several open questions remain to be elucidated, in particular how to differentiate the local roles of directly released versus OMV-delivered bacterial ATP. In this study, we laid the foundation for future in vivo research to examine the specific role of bacterial ATP in sepsis. Future research avenues might include investigating the local effects of OMV-delivered bacterial ATP, and how neutrophil migration, apoptosis and degranulation are altered. We agree that exploration of the local secretory immune response and cytokine profiles is relevant to understanding the different mechanisms by which bacterial ATP alters sepsis. However, such experiments should ideally be performed in systems where the source and the delivery of ATP can be modulated locally.

      • Also it was not clear why lung neutrophils were used for the RNAseq data generation and analysis.

      Thank you for this remark. We have chosen primary lung neutrophils for four reasons:

      (1) Isolation of primary lung neutrophils allowed us to assess an in vivo response that would not have been possible with cell lines.

      (2) The lung and the respiratory system are among the clinically most important organs affected during sepsis, and their impairment is a significant cause of mortality.

      (3) We show in Figure 6C that specifically in the lung, OMV are engulfed by neutrophils, which shows the relevance of the lung also in our study context.

      (4) And finally, lung neutrophils were chosen to examine specifically distant and not local effects.

      Reviewer #2 (Public Review):

      Summary:

      • In their manuscript "Released Bacterial ATP Shapes Local and Systemic Inflammation during Abdominal Sepsis", Daniel Spari et al. explored the dual role of ATP in exacerbating sepsis, revealing that ATP from both host and bacteria significantly impacts immune responses and disease progression.

      Strengths:

      • The study meticulously examines the complex relationship between ATP release and bacterial growth, membrane integrity, and how bacterial ATP potentially dampens inflammatory responses, thereby impairing survival in sepsis models. Additionally, this compelling paper implies a concept that bacterial OMVs act as vehicles for the systemic distribution of ATP, influencing neutrophil activity and exacerbating sepsis severity.

      We thank the reviewer for mentioning these key points and supporting the relevance of our study.

      Weaknesses:

      (1) The researchers extracted and cultivated abdominal fluid on LB agar plates, then randomly picked 25 colonies for analysis. However, they did not conduct 16S rRNA gene amplicon sequencing on the fluid itself. It is worth noting that the bacterial species present may vary depending on the individual patients. It would be beneficial if the authors could specify whether they've verified the existence of unculturable species capable of secreting high levels of Extracellular ATP.

      Most septic complications are caused by a limited spectrum of bacteria, belonging mainly to either the Firmicutes or the Proteobacteria phyla, including E. coli, K. pneumoniae, S. aureus and E. faecalis (Diekema et al., 2019; Mureșan et al., 2018). We validated this well-documented evidence by randomly assessing 25 colonies. For the planned experiments, it was crucial to work with culturable bacteria; otherwise, ATP measurements, the modulation of ATP generation and the loading of OMV would not have been possible. Using such culturable bacteria allowed us to describe mechanisms of ATP release.

      We fully agree that hard-to-culture or unculturable bacteria might contribute significantly to septic complications. This, however, would need to be explored in future studies using extensive culturing methods (Cheng et al., 2022).

      (2) Do mice lacking commensal bacteria show a lack of extracellular ATP following cecal ligation puncture?

      ATP is typically secreted by many host cells, both actively and passively, in response to any injury, including cecal ligation and puncture (Burnstock, 2016; Dosch et al., 2018; Eltzschig et al., 2012; Idzko et al., 2014). We hypothesize that bacterial ATP is a potential priming agent at early stages of sepsis, and indeed, at such early time points, a comparison of peritoneal ATP levels between germ-free and colonized mice could support our hypothesis. Future studies addressing this question must, however, correct for the different immune responses between germ-free and colonized mice. This is of utmost importance, especially for the cecal ligation and puncture model, since the cecum of germ-free mice is extremely large, making such experiments hard to control.

      (3) The authors isolated various bacteria from abdominal fluid, encompassing both Gram-negative and Gram-positive types. Nevertheless, their emphasis appeared to be primarily on the Gram-negative E. coli. It would be beneficial to ascertain whether the mechanisms of Extracellular ATP release differ between Gram-positive and Gram-negative bacteria. This is particularly relevant given that the Gram-positive bacterium E. faecalis, also isolated from the abdominal fluid, is recognized for its propensity to release substantial amounts of Extracellular ATP.

      We fully agree with this comment. In this paper, we used E. coli as our model organism to determine the principles of sepsis-associated bacterial ATP release and therefore focused on gram-negative bacteria. In addition to the direct, growth-dependent release, we found a relevant impact of OMV-delivered bacterial ATP. For this latter purpose, a gram-negative strain, in which OMV generation has been well described (Schwechheimer & Kuehn, 2015), was chosen. Recently, gram-positive bacteria have been shown to secrete ATP and OMV as well (Briaud & Carroll, 2020; Hironaka et al., 2013; Iwase et al., 2010). Given the fundamental differences in the structure of the cell wall of gram-positive bacteria and the mechanisms of OMV generation and release, future studies are required to assess the relevance of directly released and OMV-delivered ATP in gram-positive bacteria.

      (4) The authors observed changes in the levels of LPM, SPM, and neutrophils in vivo. However, it remains uncertain whether the proliferation or migration of these cells is modulated or inhibited by ATP receptors like P2Y receptors. This aspect requires further investigation to establish a convincing connection.

      We fully agree with this comment. The decrease in LPM and the consequent predominance of SPM have been well described after inflammatory stimuli in the context of the macrophage disappearance reaction (Ghosn et al., 2010). Also, it has been shown that purinergic signaling modulates infiltration of neutrophils and can lead to cell death as a consequence of P2Y and P2X receptor activation (Junger, 2011; Proietti et al., 2019). In our study, we propose that intracellular purinergic receptors contribute to neutrophil function during sepsis. Having introduced the general principles and fundamentals of bacterial ATP with our studies, we fully agree that additional experiments need to address downstream purinergic receptor activation. That, however, would go beyond the scope of our study.

      (5) Additionally, is it possible that the observed in vivo changes could be triggered by bacterial components other than Extracellular ATP? In this research field, a comprehensive collection of inhibitors is available, so it is desirable to utilize them to demonstrate clearer results.

      This question is of utmost importance and defined the choice of our model and experimental approach. When we started the project, we used two different E. coli mutants that release low (ompC) and high (eaeH) amounts of ATP. However, the limitation of this approach is that these are different bacteria, which may also differ in the components they secrete or the surface proteins they express. We, therefore, decided against that approach. With the approach we finally used (same bacterium, just with and without ATP), we aimed to minimize the influence of non-ATP bacterial components.

      (6) Have the authors considered the role of host-derived Extracellular ATP in the context of inflammation?

      Yes, the role of host-derived extracellular ATP in inflammation and sepsis is well established, although with contradictory results (Csóka et al., 2015; Ledderose et al., 2016). These conflicting data were the rationale for testing the relevance of bacterial ATP. We suggest that bacterial ATP is essential in the early phase of sepsis, when bacteria invade the sterile compartment and before an efficient host response, including the eukaryotic release of ATP, is established.

      (7) The authors mention that Extracellular ATP is rapidly hydrolyzed by ectonucleotidases in vivo. Are the changes of immune cells within the peritoneal cavity caused by Extracellular ATP released from bacterial death or by OMVs?

      This is a relevant question that was also asked by reviewer #1, and we answered it in detail above (weaknesses comment #1 and #2). From our ATP measurements (Figure 4-figure supplement 1A and Figure 5C), we conclude that locally, the role of directly released (extracellular) bacterial ATP predominates over that of OMV-derived bacterial ATP. Furthermore, directly released and OMV-derived bacterial ATP (the latter contained within OMV, engulfed and transported to the endolysosomal compartment) act through different mechanisms; extracellular ATP in particular has been described to lead to apoptosis via P2X7 signaling.

      (8) In the manuscript, the sample size (n) for the data consistently remains at 2. I would suggest expanding the sample size to enhance the robustness and rigor of the results.

      Two biological replicates (independent cultures) were used only for the bacterial cultures in Figure 1, Figure 2, and Figure 3; these gave similar results with very small standard deviations, indicating robustness. In the in vitro experiments in Figure 5, we used a sample size of 6 (three biological replicates measured in technical duplicates), since we saw larger deviations in our measurements. For the in vivo experiments, we always used 5 or more animals in at least two independent experiments.

      References

      Briaud, P., & Carroll, R. K. (2020). Extracellular Vesicle Biogenesis and Functions in Gram-Positive Bacteria. Infection and Immunity, 88(12), e00433-20. https://doi.org/10.1128/iai.00433-20

      Burnstock, G. (2016). P2X ion channel receptors and inflammation. Purinergic Signalling, 12(1), 59–67. https://doi.org/10.1007/s11302-015-9493-0

      Cheng, A. G., Ho, P.-Y., Aranda-Díaz, A., Jain, S., Yu, F. B., Meng, X., Wang, M., Iakiviak, M., Nagashima, K., Zhao, A., Murugkar, P., Patil, A., Atabakhsh, K., Weakley, A., Yan, J., Brumbaugh, A. R., Higginbottom, S., Dimas, A., Shiver, A. L., … Fischbach, M. A. (2022). Design, construction, and in vivo augmentation of a complex gut microbiome. Cell, 185(19), 3617-3636.e19. https://doi.org/10.1016/j.cell.2022.08.003

      Csóka, B., Németh, Z. H., Törő, G., Idzko, M., Zech, A., Koscsó, B., Spolarics, Z., Antonioli, L., Cseri, K., Erdélyi, K., Pacher, P., & Haskó, G. (2015). Extracellular ATP protects against sepsis through macrophage P2X7 purinergic receptors by enhancing intracellular bacterial killing. The FASEB Journal, 29(9), 3626–3637. https://doi.org/10.1096/fj.15-272450

      Diekema, D. J., Hsueh, P.-R., Mendes, R. E., Pfaller, M. A., Rolston, K. V., Sader, H. S., & Jones, R. N. (2019). The Microbiology of Bloodstream Infection: 20-Year Trends from the SENTRY Antimicrobial Surveillance Program. Antimicrobial Agents and Chemotherapy, 63(7), e00355-19. https://doi.org/10.1128/AAC.00355-19

      Dosch, M., Gerber, J., Jebbawi, F., & Beldi, G. (2018). Mechanisms of ATP Release by Inflammatory Cells. International Journal of Molecular Sciences, 19(4), 1222. https://doi.org/10.3390/ijms19041222

      Eltzschig, H. K., Sitkovsky, M. V., & Robson, S. C. (2012). Purinergic Signaling during Inflammation. New England Journal of Medicine, 367(24), 2322–2333. https://doi.org/10.1056/NEJMra1205750

      Ghosn, E. E. B., Cassado, A. A., Govoni, G. R., Fukuhara, T., Yang, Y., Monack, D. M., Bortoluci, K. R., Almeida, S. R., Herzenberg, L. A., & Herzenberg, L. A. (2010). Two physically, functionally, and developmentally distinct peritoneal macrophage subsets. Proceedings of the National Academy of Sciences, 107(6), 2568–2573. https://doi.org/10.1073/pnas.0915000107

      Hironaka, I., Iwase, T., Sugimoto, S., Okuda, K., Tajima, A., Yanaga, K., & Mizunoe, Y. (2013). Glucose Triggers ATP Secretion from Bacteria in a Growth-Phase-Dependent Manner. Applied and Environmental Microbiology, 79(7), 2328–2335. https://doi.org/10.1128/AEM.03871-12

      Idzko, M., Ferrari, D., & Eltzschig, H. K. (2014). Nucleotide signalling during inflammation. Nature, 509(7500), 310–317. https://doi.org/10.1038/nature13085

      Iwase, T., Shinji, H., Tajima, A., Sato, F., Tamura, T., Iwamoto, T., Yoneda, M., & Mizunoe, Y. (2010). Isolation and Identification of ATP-Secreting Bacteria from Mice and Humans. Journal of Clinical Microbiology, 48(5), 1949–1951. https://doi.org/10.1128/JCM.01941-09

      Junger, W. G. (2011). Immune cell regulation by autocrine purinergic signalling. Nature Reviews Immunology, 11(3), 201–212. https://doi.org/10.1038/nri2938

      Ledderose, C., Bao, Y., Kondo, Y., Fakhari, M., Slubowski, C., Zhang, J., & Junger, W. G. (2016). Purinergic Signaling and the Immune Response in Sepsis: A Review. Clinical Therapeutics, 38(5), 1054–1065. https://doi.org/10.1016/j.clinthera.2016.04.002

      Mureșan, M. G., Balmoș, I. A., Badea, I., & Santini, A. (2018). Abdominal Sepsis: An Update. The Journal of Critical Care Medicine, 4(4), 120–125. https://doi.org/10.2478/jccm-2018-0023

      Proietti, M., Perruzza, L., Scribano, D., Pellegrini, G., D’Antuono, R., Strati, F., Raffaelli, M., Gonzalez, S. F., Thelen, M., Hardt, W.-D., Slack, E., Nicoletti, M., & Grassi, F. (2019). ATP released by intestinal bacteria limits the generation of protective IgA against enteropathogens. Nature Communications, 10(1), Article 1. https://doi.org/10.1038/s41467-018-08156-z

      Schwechheimer, C., & Kuehn, M. J. (2015). Outer-membrane vesicles from Gram-negative bacteria: Biogenesis and functions. Nature Reviews Microbiology, 13(10), 605–619. https://doi.org/10.1038/nrmicro3525

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work made a lot of efforts to explore the multifaceted roles of the inferior colliculus (IC) in auditory processing, extending beyond traditional sensory encoding. The authors recorded neuronal activity from the IC at single unit level when monkeys were passively exposed or actively engaged in behavioral task. They concluded that 1) IC neurons showed sustained firing patterns related to sound duration, indicating their roles in temporal perception, 2) IC neuronal firing rates increased as sound sequences progress, reflecting modulation by behavioral context rather than reward anticipation, 3) IC neurons encode reward prediction error and their capability of adjusting responses based on reward predictability, 4) IC neural activity correlates with decision-making. In summary, this study tried to provide a new perspective on IC functions by exploring its roles in sensory prediction and reward processing, which are not traditionally associated with this structure.

      Strengths:

      The major strength of this work is that the authors performed electrophysiological recordings from the IC of behaving monkeys. Compared with the auditory cortex and thalamus, the IC in monkeys has not been adequately explored.

      We appreciate the reviewer’s acknowledgment of the efforts and strengths of our study. Indeed, our goal was to provide a comprehensive exploration of the multifaceted roles of the inferior colliculus (IC) in auditory processing and beyond, particularly in sensory prediction and reward processing. The use of electrophysiological recordings in behaving monkeys was central to our approach, as we sought to uncover the underexplored aspects of IC function in these complex cognitive domains. We are pleased that the reviewer recognizes the value of investigating the IC, a structure that has not been adequately explored in primates compared to other auditory regions like the cortex and thalamus. This feedback reinforces our belief that our work contributes significantly to advancing the understanding of the IC's roles in cognitive processing.

      We look forward to addressing any further points the reviewers may have and refining our manuscript accordingly. Thank you for your constructive feedback and for recognizing the strengths of our research approach.

      Weaknesses:

      (1) The authors cited several papers focusing on dopaminergic inputs in the IC to suggest the involvement of this brain region in cognitive functions. However, all those cited work were done in rodents. Whether monkey's IC shares similar inputs is not clear.

      We appreciate the reviewer's insightful comment on the limitations of extrapolating findings from rodent models to monkeys, particularly concerning dopaminergic inputs to the inferior colliculus (IC). Existing studies on dopaminergic inputs to the IC have been conducted in rodents; to our knowledge, none have been conducted specifically in primates. To address the reviewer's concern, we have added a statement in both the introduction and discussion sections of our manuscript:

      - Introduction: "However, these studies were conducted in rodents, and the existence and role of dopaminergic inputs in the primate IC remain underexplored."

      - Discussion: "However, the exact mechanisms and functions of dopamine modulation in the inferior colliculus are still not fully understood, particularly in primates."

      (2) The authors confused the two terms, novelty and deviation. According to their behavioral paradigm, deviation rather than novelty should be used in the paper because all the stimuli have been presented to the monkeys during training. Therefore, there is actually no novel stimuli but only deviant stimuli. This reflects that the author has misunderstood the basic concept.

      We appreciate the reviewer's clarification regarding the distinction between "novelty" and "deviation" in the context of our behavioral paradigm. We agree that, because all stimuli in our experimental design were already familiar to the monkeys from training, the term "deviation" describes the stimuli used in our study more accurately than "novelty."

      To address this, we have revised the manuscript to replace the term "novelty" with "deviation" wherever applicable. This change has been made to ensure accurate terminology is used throughout the paper, thereby eliminating any potential misunderstanding of the concepts involved in our study.

      We thank the reviewer for pointing out this important distinction, which has improved the clarity and precision of our manuscript.

      (3) Most of the conclusions were made based on correlational analysis or speculation without providing causal evidences.

      We appreciate the reviewer’s concern regarding the reliance on correlational analyses in our study. Indeed, we acknowledge that the conclusions drawn primarily reflect correlations between neuronal activity and behavioral outcomes, rather than direct causal evidence. This limitation is inherent to many electrophysiological studies, particularly those conducted in behaving primates, where direct manipulation of specific neural circuits to establish causality is often challenging.

      This limitation becomes even more complex when considering the IC’s role as a key lower-level relay station in the auditory pathway. Manipulating IC activity could potentially affect auditory responses in downstream pathways, which, in turn, may influence sensory prediction and decision-making processes. Moreover, we hypothesize that the sensory prediction and reward signals observed in the IC may not have direct causal effects but may instead be driven by top-down projections from higher cognitive regions. However, it is important to emphasize that our study provides novel evidence that the IC may exhibit multiple facets of cognitive signaling, which could inspire future research into the underlying mechanisms and broader functional implications of these signals.

      To address this, we have taken the following steps in our revised manuscript:

      (1) Clarified the Scope of Conclusions: We have revised the language in the Results and Discussion sections to explicitly state that our findings represent correlational relationships rather than causal mechanisms. For example, we now refer to the associations observed between IC activity and behavioral outcomes as "correlational" and have refrained from making definitive causal claims without supporting experimental evidence.

      (2) Proposed Future Directions: In the Discussion section, we have included suggestions for future studies to directly test the causality of the observed relationships. We acknowledge the need for further investigation to substantiate the causal links between IC activity and cognitive functions such as sensory prediction, decision-making, and reward processing.

      We believe these revisions provide a more balanced interpretation of our findings while emphasizing the importance of future research to build on our results and establish causal relationships. Thank you for raising this critical point, which has led to a more rigorous and transparent presentation of our study.

      (4) Results are presented in a very "straightforward" manner, with too many detailed descriptions of phenomena but a lack of summary and information synthesis. For example, the first section of Results is very long but did not convey clear information.

      We appreciate the reviewer’s feedback regarding the presentation of our results. We understand that the detailed descriptions of phenomena may have made it difficult to discern the key findings and overarching themes in the study. We recognize the importance of balancing detailed reporting with clear summaries and synthesis to effectively communicate our findings.

      To address this concern, we have made the following revisions to the manuscript:

      (1) Condensed and Synthesized Key Findings: We have streamlined the presentation of the Results section by condensing overly detailed descriptions and focusing on the most critical aspects of the data. Key findings are now summarized at the end of each subsection to ensure that the main points are clearly conveyed.

      (2) Enhanced Section Summaries: We have added summary statements at the end of each major results section to synthesize the findings and highlight their significance. This should help guide the reader through the narrative and emphasize the key takeaways from each part of the study.

      (3) Improved Flow and Clarity: We have revised the structure and organization of the Results section to improve the flow of information. By rearranging certain paragraphs and refining the language, we aim to present the results in a more cohesive and coherent manner.

      We believe these changes will make the Results section more accessible and informative, allowing readers to more easily grasp the significance of our findings. Thank you for your valuable suggestion, which has significantly improved the clarity and impact of our manuscript.

      (5) The logic between different sections of Results is not clear.

      We appreciate the reviewer’s observation regarding the lack of clear logical connections between different sections of the Results. We acknowledge that a coherent flow is essential for effectively communicating the progression of findings and their implications.

      To address this concern, we have made the following revisions:

      (1) Enhanced Transitions Between Sections: We have introduced clearer transitional statements between sections of the Results. These transitions explicitly state how each new section builds upon or relates to the previous findings, creating a more cohesive narrative.

      (2) Integration of Findings: In several places within the Results, we have added brief synthesis paragraphs that integrate findings across sections. These integrative summaries help to tie together the different aspects of our study, demonstrating how they collectively contribute to our understanding of the Inferior Colliculus’s (IC) role in sensory prediction, decision-making, and reward processing.

      (3) Clarified Rationale: At the beginning of each major section, we have clarified the rationale behind why certain experiments were conducted, connecting them more clearly to the overarching goals of the study. This should help the reader understand the purpose of each set of results in the context of the broader research objectives.

      We believe these changes improve the overall coherence and readability of the Results section, allowing readers to better follow the logical progression of our study. We are grateful for this constructive feedback and believe it has significantly enhanced the manuscript.

      (6) In the Discussion, there is excessive repetition of results, and comparison with and discussion of potentially related work is insufficient. For example, Metzger, R.R., et al. (J Neurosci, 2006) have shown similar firing patterns of IC neurons and correlated their findings with reward.

      We appreciate the reviewer's insightful critique regarding the excessive repetition in the Discussion and the lack of sufficient comparison with related work. We acknowledge that a well-balanced Discussion should not only interpret findings but also place them in the context of existing literature to highlight the novelty and significance of the study.

      To address these concerns, we have made the following revisions:

      (1) Reduction of Repetition: We have carefully revised the Discussion to minimize redundant repetition of the Results. Instead of restating the findings, we now focus more on their implications, limitations, and how they advance the current understanding of the Inferior Colliculus (IC) and its broader cognitive roles.

      (2) Incorporation of Related Work: We have expanded the Discussion to include a more comprehensive comparison with existing literature, specifically highlighting studies that have reported similar findings. For example, we now discuss the work by Metzger et al. (2006), which demonstrated similar firing patterns of IC neurons and correlated these with reward-related processes. This comparison helps contextualize our results and emphasizes the novel contributions our study makes to the field.

      We believe these revisions have significantly improved the quality of the Discussion by reducing unnecessary repetition and providing a more thorough engagement with the relevant literature. We are grateful for the reviewer's valuable feedback, which has helped us refine and strengthen the manuscript.

      Reviewer #2 (Public review):

      Summary:

      The inferior colliculus (IC) has been explored for its possible functions in behavioral tasks and has been suggested to play roles beyond simple sensory transmission. The authors revealed the climbing effect of neurons in the IC during decision-making tasks, and tried to explore the reward effect in this condition.

      Strengths:

      Complex cognitive behaviors can be regarded, in idealized form, as generating output based on information input, which depends on all kinds of input from sensory systems. The auditory system has hierarchical structures no less complex than those of areas in charge of complex functions. Meanwhile, the IC receives projections from higher areas, such as the auditory cortex, which implies that the IC is involved in complex behaviors. Experiments in behaving monkeys are always time-consuming and difficult, and this work will offer a closer approximation of how the human brain works.

      We greatly appreciate the reviewer's positive summary of our work and recognition of the effort involved in conducting experiments on behaving monkeys. We agree with the reviewer that the inferior colliculus (IC) plays a significant role beyond mere sensory transmission, particularly in integrating sensory inputs with higher cognitive functions. Our study aims to shed light on these complex functions by revealing the climbing effect of IC neurons during decision-making tasks and exploring how reward influences this dynamic.

      We are encouraged that the reviewer acknowledges the importance of investigating the IC's role within the broader framework of complex cognitive behaviors and appreciates the hierarchical nature of the auditory system. The reviewer's comments reinforce the value of our research in contributing to a more nuanced understanding of how the IC might contribute to sensory-cognitive integration.

      We thank the reviewer for highlighting the significance of using behavioral monkey models to approximate human brain function. We are hopeful that our findings will serve as a stepping stone for further research exploring the multifaceted roles of the IC in cognition and behavior.

      We will now proceed to address the specific concerns and suggestions provided by the reviewer in the following sections.

      Weaknesses:

      These findings are more about correlation but not causality of IC function in behaviors. And I have a few major concerns.

      We appreciate the reviewer’s concern regarding the reliance on correlational analyses in our study. We acknowledge the importance of distinguishing between correlation and causality. As detailed in our response to Question 3 from Reviewer #1, we recognize the limitations of relying on correlational data and the challenges of establishing direct causal links in electrophysiological studies involving behaving primates.

      We have taken steps to clarify this distinction throughout our manuscript. Specifically, we have revised the Results and Discussion sections to ensure that the findings are presented as correlational, not causal, and we have proposed future studies utilizing more direct manipulation techniques to assess causality. We hope these revisions adequately address your concerns.

      Comparing neurons' spike activities in different tests, a 'climbing effect' was found in the oddball paradigm. The effect is clearly related to the training and learning process, but it still requires more exploration to rule out a few explanations. First, repeated white noise bursts with a fixed inter-stimulus interval of 0.6 seconds were presented, so monkeys might remember the sounds by rhythm, which is some sort of learned auditory response. It would be interesting to know the monkeys' responses and neurons' activities if the inter-stimulus interval were variable. Second, the task only asked monkeys to press one button, and the reward ratio (the ratio of correct response trials) was around 78% (based on the number from Line 302). In the sessions with reward, monkeys therefore had high expectations of reward; does this expectation cause the climbing effect?

      We thank the reviewer for raising these insightful points regarding the 'climbing effect' observed in the oddball paradigm and its potential relationship with training, learning processes, and reward expectation. Below, we address each of the reviewer's specific concerns:

      (1) Inter-Stimulus Interval (ISI) and Rhythmic Auditory Response:

      The reviewer suggests that the fixed inter-stimulus interval (ISI) of 0.6 seconds might lead to a rhythmic auditory response, where monkeys could anticipate the sounds. We appreciate this perspective. However, we believe that rhythm is unlikely to play a significant role in the 'climbing effect' for the following reason: the 'climbing effect' starts from the second sound in the block (Fig. 2D and Fig. 3B), before any rhythm or pattern could be fully established, as a rhythm generally requires at least three repetitions to form. Unfortunately, we did not explore variable ISIs in the current study, so we cannot directly address this concern with the data at hand.

      (2) Reward Expectation and Climbing Effect:

      The reviewer raises an important concern about whether the 'climbing effect' could be influenced by the monkeys' high reward expectation, especially given the high reward ratio (~78%) in the sessions. While it is plausible that reward expectation could contribute to the observed increase in neuronal firing rates, we believe the results from our reward experiment (Fig. 4) suggest otherwise. In this experiment, even though reward expectation was likely formed due to the consistent pairing of sounds with rewards (100%), we did not observe a climbing effect in the auditory response. The presence of reward prediction error (Fig. 4D) further suggests that while the monkeys may form reward expectations, these expectations do not directly drive the climbing effect.

      To clarify this point, we have added sentences in the revised manuscript to explicitly discuss the relationship between reward expectation and the climbing effect, emphasizing that our findings indicate the climbing effect is not primarily due to reward expectation.

      We believe these revisions provide a clearer understanding of the factors contributing to the climbing effect and address the reviewer's concerns effectively. Thank you for these valuable suggestions.

      "Reward effect" on IC neurons' responses was shown in Fig. 4. Is this auditory response caused by the physical reward action or not? In reward sessions, IC neurons show an obvious response related to the onset of the water reward. An electromagnetic valve is often used in water-reward systems and gives out a loud click every time the reward is triggered. IC neurons' responses may simply be caused by the click sound if an electromagnetic valve is used. It is important to find a way to rule out this simple possibility.

      We appreciate the reviewer’s concern regarding the potential confounding factor introduced by the electromagnetic valve’s click sound during water reward delivery, which could be misinterpreted as an auditory response rather than a response to the reward itself. Anticipating this possibility, we took measures to eliminate it by placing the electromagnetic valve outside the soundproof room where the neuronal recordings were performed.

      To address your concern more explicitly, we have added sentences in the Methods section of the revised manuscript detailing this setup, ensuring that readers are aware of the steps we took to eliminate this potential confound. By doing so, we believe that the observed reward-related neural activity in the IC is attributable to the reward processing itself rather than an auditory response to the valve click. We appreciate you bringing this important aspect to our attention, and we hope our clarification strengthens the interpretation of our findings.

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to investigate the multifaceted roles of the Inferior Colliculus (IC) in auditory and cognitive processes in monkeys. Through extracellular recordings during a sound duration-based novelty detection task, the authors observed a "climbing effect" in neuronal firing rates, suggesting an enhanced response during sensory prediction. Observations of reward prediction errors within the IC further highlight its complex integration in both auditory and reward processing. Additionally, the study indicated IC neuronal activities could be involved in decision-making processes.

      Strengths:

      This study has the potential to significantly impact the field by challenging the traditional view of the IC as merely an auditory relay station and proposing a more integrative role in cognitive processing. The results provide valuable insights into the complex roles of the IC, particularly in sensory and cognitive integration, and could inspire further research into the cognitive functions of the IC.

      We appreciate the reviewer’s positive summary of our work and recognition of its potential impact on the field. We are pleased that the reviewer acknowledges the significance of our findings in challenging the traditional view of the Inferior Colliculus (IC) as merely an auditory relay station and in proposing its integrative role in cognitive processing.

      Our study indeed aims to provide new insights into the multifaceted roles of the IC, particularly in the context of sensory and cognitive integration. We believe that this research could pave the way for future studies that further explore the cognitive functions of the IC and its involvement in complex behavioral processes.

      We are encouraged by the reviewer’s positive assessment and are committed to continuing to refine our work in response to the constructive feedback provided. We hope that our findings will contribute to advancing the understanding of the IC’s role in the broader context of neuroscience.

      We will now proceed to address the specific concerns and suggestions provided by the reviewer in the following sections.

      Weaknesses:

      Major Comments:

      (1) Structural Clarity and Logic Flow:

      The manuscript investigates three intriguing functions of IC neurons: sensory prediction, reward prediction, and cognitive decision-making, each of which is a compelling topic. However, the logical flow of the manuscript is not clearly presented and needs to be well recognized. For instance, Figure 3 should be merged into Figure 2 to present population responses to the order of sounds, thereby focusing on sensory prediction. Given the current arrangement of results and figures, the title could be more aptly phrased as "Beyond Auditory Relay: Dissecting the Inferior Colliculus's Role in Sensory Prediction, Reward Prediction, and Cognitive Decision-Making."

      We appreciate the reviewer’s detailed feedback on the structural clarity and logical flow of the manuscript. We understand the importance of presenting our findings in a clear and cohesive manner, especially when addressing multiple complex topics such as sensory prediction, reward prediction, and cognitive decision-making.

      To address the reviewer's concerns, we have made the following revisions:

      (1) Reorganization of Figures and Results:

      We agree with the suggestion to merge Figure 3 into Figure 2. By doing so, we can present the population responses to the order of sounds more effectively, thereby streamlining the focus on sensory prediction. This will allow readers to more easily follow the progression of the results related to this key function of the IC.

      We have reorganized the Results section to ensure a smoother transition between the different aspects of IC function that we are investigating. The new structure will better guide the reader through the narrative, aligning with the themes of sensory prediction, reward prediction, and cognitive decision-making.

      (2) Revised Title:

      In line with the reviewer's suggestion, we have revised the title to "Beyond Auditory Relay: Dissecting the Inferior Colliculus's Role in Sensory Prediction, Reward Prediction, and Cognitive Decision-Making." We believe this title more accurately reflects the scope and focus of our study, as it highlights the three core functions of the IC that we are investigating.

      (3) Improved Logic Flow:

      We have added introductory statements at the beginning of each section within the Results to clarify the rationale behind the experiments and the logical connections between them. This should help to improve the overall flow of the manuscript and make the progression of our findings more intuitive for readers.

      We believe these changes significantly enhance the clarity and logical structure of the manuscript, making it easier for readers to understand the sequence and importance of our findings. Thank you for your valuable suggestion, which has led to a more coherent and focused presentation of our work.

      (2) Clarification of Data Analysis:

      Key information regarding data analysis is dispersed throughout the results section, which can lead to confusion. Providing a more detailed and cohesive explanation of the experimental design would significantly enhance the interpretation of the findings. For instance, including a detailed timeline and reward information for the behavioral paradigms shown in Figures 1C and D would offer crucial context for the study. More importantly, clearly presenting the analysis temporal windows and providing comprehensive statistical analysis details would greatly improve reader comprehension.

      We appreciate the reviewer’s insightful comment regarding the need for clearer and more cohesive explanations of the data analysis and experimental design. We recognize that a well-structured presentation of this information is essential for the reader to fully understand and interpret our findings. To address this, we have made the following revisions:

      (1) Detailed Explanation of Experimental Design:

      We have included a more detailed explanation of the experimental design, particularly for the behavioral paradigms shown in Figures 1C and 1D. This includes a comprehensive timeline of the experiments, along with explicit information about the reward structure and timing. By providing this context upfront, we aim to give readers a clearer understanding of the conditions under which the neuronal recordings were obtained.

      (2) Cohesive Presentation of Data Analysis:

      Key information regarding data analysis, which was previously dispersed throughout the Results section, has been consolidated and moved to a dedicated subsection within the Methods. This subsection now provides a step-by-step description of the analysis process, including the temporal windows used for examining neuronal activity, as well as the specific statistical methods employed.

      We have also ensured that the temporal windows used for different analyses (e.g., onset window, late window, etc.) are clearly defined and consistently referenced throughout the manuscript. This will help readers track the use of these windows across different figures and analyses.

      (3) Enhanced Statistical Analysis Details:

      We have expanded the description of the statistical analyses performed in the study, including the rationale behind the choice of tests, the criteria for significance, and any corrections for multiple comparisons. These details are now presented in a clear and accessible format within the Methods section, with relevant information also highlighted in the Results section or the figure legends to facilitate understanding.

      We believe these changes will significantly improve the clarity and comprehensibility of the manuscript, allowing readers to better follow the experimental design, data analysis, and the conclusions drawn from our findings. Thank you for this valuable feedback, which has helped us to enhance the rigor and transparency of our presentation.

      (3) Reward Prediction Analysis:

      The conclusion regarding the IC's role in reward prediction is underdeveloped. While the manuscript presents evidence that IC neurons can encode reward prediction, this is only demonstrated with two example neurons in Figure 6. A more comprehensive analysis of the relationship between IC neuronal activity and reward prediction is necessary. Providing population-level data would significantly strengthen the findings concerning the IC's complex functionalities. Additionally, the discussion of reward prediction in lines 437-445, which describes IC neuron responses in control experiments, does not sufficiently demonstrate that IC neurons can encode reward expectations. It would be valuable to include the responses of IC neurons during trials with incorrect key presses or no key presses to better illustrate this point.

      We deeply appreciate the detailed feedback provided regarding the conclusions on the inferior colliculus (IC)'s role in reward prediction within our manuscript. We acknowledge the importance of a robust and comprehensive presentation of our findings, particularly when discussing complex neural functionalities.

      In response to the reviewer's concerns, we have made the following revisions to strengthen our manuscript:

      (1) Inclusion of Population-Level Data for IC Neurons:

      In the revised manuscript, we have included population-level results for IC neurons in a supplementary figure. Initially, we focused on two example neurons that did not exhibit motor-related responses to key presses, in order to isolate reward-related signals. Most IC neurons, however, exhibit motor responses during key presses (as indicated in Fig. 7), which makes it difficult to distinguish reward-related activity from motor responses. To clarify this point, we have added sentences in the Results section to explain the rationale behind our selection of neurons and to address the potential overlap between motor and reward responses in the IC.

      (2) Addition of Data on Key Press Errors and No-Response Trials:

      In response to the reviewer’s suggestion, we show below peri-stimulus time histograms (PSTHs) for two example neurons during error trials, including incorrect key presses and no-response trials. Given that the monkeys performed the task with high accuracy, the number of error trials is relatively small, especially for the control condition (as shown in the top row of the figure). While we remain cautious in drawing definitive conclusions from this limited number of trials, we observed no clear reward signals during the corresponding window (typically centered around 150 ms after the end of the sound). It is important to note that the experiment was initially designed to explore decision-making signals in the IC, rather than focusing specifically on reward processing. However, the data in Fig. 6 demonstrated intriguing signals of reward prediction error, which is why we believe it is important to present them.

      When combined with the results from our reward experiment (Fig. 5), we believe these findings provide compelling evidence of reward prediction errors being processed by IC neurons. Additionally, we observed that the reward prediction error in the IC appears to be signed, meaning that IC neurons showed robust responses to unexpected rewards but not to unexpected no-reward scenarios. However, the sign of the reward prediction error should be explored in greater depth with specifically designed experiments in future studies.

      Author response image 1.

      (A) PSTH of the neuron from Figure 6a during a key press trial under the control condition. The number in parentheses in the legend represents the number of trials for the control condition. (B) PSTHs of the neuron from Figure 6a during non-key press trials under experimental conditions. The numbers in parentheses in the legend represent the number of trials for the experimental conditions. (C-D) Equivalent PSTHs as in A-B but from the neuron in Figure 6b.
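
      For readers less familiar with the measure, the way a PSTH averages spiking across trials can be sketched as follows. This is a minimal illustration only, not the analysis code used in the study: the function name `psth`, the 50 ms bin width, and the spike times in the example are all assumptions made for the sketch.

```python
from typing import List, Sequence


def psth(trials: Sequence[Sequence[float]],
         t_start: float,
         t_stop: float,
         bin_width: float) -> List[float]:
    """Mean firing rate (spikes/s) per time bin, averaged across trials.

    `trials` holds one sequence of spike times (in seconds) per trial,
    each already aligned to the event of interest (e.g. sound offset).
    All names and units here are illustrative assumptions.
    """
    if not trials:
        raise ValueError("need at least one trial")
    n_bins = int(round((t_stop - t_start) / bin_width))
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            if t_start <= t < t_stop:
                # Clamp guards against floating-point spill into a bin past the end.
                idx = min(int((t - t_start) / bin_width), n_bins - 1)
                counts[idx] += 1
    # Divide by trial count and bin width to convert counts to spikes/s.
    return [c / (len(trials) * bin_width) for c in counts]


# Hypothetical example: two trials, 0-100 ms window, 50 ms bins.
example_rates = psth([[0.01, 0.06], [0.02]], 0.0, 0.1, 0.05)
```

      Because the rate in each bin is divided by the number of trials, conditions with very few trials (such as the error trials shown here) yield coarse, noisy estimates, which is why conclusions drawn from them warrant caution.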

      We are grateful for the reviewer's insightful suggestions, which have allowed us to improve the depth and rigor of our analysis. We believe these revisions significantly strengthen our manuscript's conclusions regarding the complex functionalities of the IC.

    1. Reviewer #2 (Public Review):

      Patterns scored into or painted on durable media have long been considered important markers of the cognitive capabilities of hominins. More specifically, the association of such markers with Homo sapiens has been used to argue that our evolutionary success was in part shaped by our unique ability to code, store and convey information through abstract conventions.

      That singularity of association has been cast into doubt in the last decade with finds of designs apparently painted or carved by Neanderthals, and potentially by even earlier hominins. Even allowing for these developments, however, extending the capability to generate putatively abstract designs to a relatively small-brained hominin like Homo naledi is contentious. The evidential bar for such claims is necessarily high, and I don't believe that it has been cleared here.

      The central issue is that the engravings themselves are not dated. As the authors themselves note, the minimum age constraint provided by U/Th on flowstone does not necessarily relate to the last occupation of the Dinaledi cave system, as the earlier ESR age on teeth does not necessarily document first use of the cave. The authors state that "At present we have no evidence limiting the time period across which H. naledi was active in the cave system". On those grounds though, assigning the age range of presently dated material within the cave system to the engravings - as the current title unambiguously does - is not justifiable.

      Because we don't know when they were made, the association between the engravings and Homo naledi rests on the assertion that no humans entered and made alterations to the cave system between its last occupation by Homo naledi, and its recent scientific recording. This is argued on page 6 with the statement that "No physical or cultural evidence of any other hominin population occurs within this part of the cave system".

      There is an important contrast between the quotes I have referred to in the last two paragraphs. In the earlier quote, the absence of evidence for Homo naledi in the cave system >335 ka and <241 ka is not considered evidence for their absence before or after these ages. Just because we have no evidence that Homo naledi was in the cave at 200 ka doesn't mean they weren't there, which is an argument I think most archaeologists would accept. When it comes to other kinds of humans, though - per the latter quote - the opposite approach is taken. Specifically, the present lack of physical evidence of more recent humans in the cave is considered evidence that no such humans visited the cave until its exploration by cavers 40 years ago. I don't think many archaeologists would consider that argument compelling. I can see why the authors would be drawn to make that assertion, but an absence of evidence cannot be used to argue in one way for use of the cave by Homo naledi and in another way for use of the cave by all other humans.

      A second problem is with what Homo naledi might have made engravings. The authors state that "The lines appear to have been made by repeatedly and carefully passing a pointed or sharp lithic fragment or tool into the grooves". The authors then describe one rock with superficial similarities to a flake from the more recent site of Blombos to suggest that sharp-edge stones with which to make the engravings were available to Homo naledi. Blombos is considered relevant here presumably because it has evidence for Middle Stone Age engravings. The authors do not, however, demonstrate any usewear on that stone object such as might be expected if it was used to carve dolomite. Given that it is presented as the only such find in the cave system so far, this seems important.

      My greater concern is that the authors did not compare the profile morphology of the Dinaledi engravings with the extensive literature on the morphology of scored lines caused by sharp-edge stone implements (e.g., Braun et al. 2016, Pante et al. 2017). I appreciate that the research group is reticent to undertake any invasive work until necessary, but non-destructive techniques could have been used to produce profiles with which to test the proposition that the engravings were made with a sharp edge stone.

      One thing I noticed in this respect is that the engravings seem very wide, both in absolute terms and relative to their depth. The data I collected from the Middle Stone Age engraved ochre from Klein Kliphuis suggested average line widths typically around 0.1-0.2 mm (Mackay and Welz 2008). The engraved lines at Dinaledi appear to be much wider, perhaps 2-5 mm. This doesn't discount the possibility that the engravings in the Dinaledi system were carved with a sharp edge stone - the range of outcomes for such engravings in soft rock can be quite variable (Hodgskiss 2010) - only that detailed analysis should precede rather than follow any assertion about their mode of formation.

      None of this is to say that the arguments mounted here are wrong. It should be considered possible that Homo naledi made the engravings in the Dinaledi cave system. The problem is that other explanations are not precluded.

      As an example, the western end of the Dinaledi subsystem has a particular geometry to the intersection of its passages, with three dominant orientations, one vertical (which is to say, north-south), and two diagonal (Figure 1). The major lines on Panel A have one repeated vertical orientation and two repeated diagonal orientations (Figure 16), particularly in the upper area not impacted by stromatolites. The lines in both the cave system and engravings in Panel A appear to intersect at similar angles. Several of the cave features appear, superficially at least, to be replicated. In fact, scaled, rotated, and super-imposed, Figure 16 is a plausible 'mud map' of the western end of the Dinaledi system carved incrementally by people exploring the caves. A figure showing this is included here:

      Of course, there are problems with this suggestion. The choice of the upper part of Panel A is selective, the similarity is superficial, and the scales are not necessarily comparable. (Note, btw, that all of those caveats hold equally well for the comparison the authors make between the unmodified rock from Dinaledi and the flake from Blombos in Figure 19). However, the point is that such a 'mud map hypothesis' is, as with the arguments mounted in this paper, both plausible and hard to prove.

      Having read this paper a few times, I am intrigued by the engravings in the Dinaledi system and look forward to learning more about them as this research unfolds. Based on the evidence presently available, however, I feel that we have no robust grounds for asserting when these engravings were made, by whom they were made, or for what reason they were made.

      References:

      • Braun, D. R., et al. (2016). "Cut marks on bone surfaces: influences on variation in the form of traces of ancient behaviour." Interface Focus 6: 20160006.

      • Hodgskiss, T. (2010). "Identifying grinding, scoring and rubbing use-wear on experimental ochre pieces." Journal of Archaeological Science 37: 3344-3358.

      • Mackay, A. & A. Welz (2008). "Engraved ochre from a Middle Stone Age context at Klein Kliphuis in the Western Cape of South Africa." Journal of Archaeological Science 35: 1521-1532.

      • Pante, M. C., et al. (2017). "A new high-resolution 3-D quantitative method for identifying bone surface modifications with implications for the Early Stone Age archaeological record." J Hum Evol 102: 1-11.

    1. Author Response:

      Points from reviewer 1 (Public Review):

      In this manuscript, Yong and colleagues link perturbations in lysosomal lipid metabolism with the generation of protein aggregates resulting from proteasome inhibition.

      We apologize for any confusion in the explanation of the results. We found that both proteasome inhibition and, independently, perturbations to lysosomal lipid metabolism lead to accumulation of protein aggregates in the lysosome. There was no evidence of proteasome inhibition in the context of lysosomal lipid perturbations (Figure 4J).

      Despite using various tools of lysosomal function, acidity, permeability, etc, the authors couldn't identify the link between lysosomal lipid metabolism and protein aggregate formation.

      Indeed, despite testing numerous mechanistic hypotheses, we have yet to explain how perturbation of lysosomal lipid metabolism causes protein aggregates. However, we have demonstrated that lipids are both necessary (via epistasis and serum delipidation) and sufficient (media supplementation) to drive these phenotypes.

      Although this work is interesting and thought-provoking, their approach to identify novel pathways involved in proteostasis is limited and this weakens the contribution of the paper in its current form.

      We are glad the reviewer found the work to be thought-provoking. As a fundamental cellular process critical for longevity, we agree that the connections made here between lipids, lysosomes and protein aggregates are interesting and broaden the impact of cellular health on proteostasis. Though we have falsified multiple hypotheses for how perturbation of lysosomal lipid metabolism could influence protein aggregation, we agree that a major weakness of the current work is our limited mechanistic understanding of this process. We hope that by engaging the thoughtful and creative eLife readership, novel mechanistic hypotheses will emerge.

      Points from reviewer 2 (Public Review):

      This might be too much of an ask, but they should go further in excluding one very attractive alternative model: effects on proteasome activity. This explanation should be addressed definitively because the transcription factor that regulates proteasome subunit gene expression (Nrf1/NFE2L1) is processed in the ER and is therefore well placed to be influenced by membrane conditions, and because it is shown here that proteasome inhibition increases ProteoStat puncta.

      We appreciate the constructive suggestion to examine loss of proteasome expression as a relevant mechanism linking cellular dyslipidemia with proteostasis impairment. We analyzed the genome-wide Perturb-seq data from Replogle et al. [1], which were generated in K562 cells cultured under conditions similar to those of our screen. As expected, perturbation of Nrf1/NFE2L1 reduced expression of proteasome subunits, whereas perturbation of proteasome subunits that increased ProteoStat staining (e.g. PSMD2, PSMD13) homeostatically increased expression of multiple proteasome subunits. In contrast, other top hits, including lipid-related perturbations (e.g. MYLIP, PSAP), did not reduce the expression of genes encoding the proteasome (Author response image 1).

      Author response image 1.

      The relative expression of genes encoding proteasomal subunits for representative genes was re-plotted from genome-wide Perturb-seq data in K562 cells [1]. Shown are hit genes that increase ProteoStat staining along with non-targeting controls and the positive control gene NFE2L1. Proteasome expression was induced by proteasome impairment (PSMD2 and PSMD13) and repressed by NFE2L1 knockdown. Other hit genes related to lipid metabolism and lysosome function did not consistently impact the expression of proteasome subunits.

      The authors address proteasome activity only by using a dye that is not referenced. Here a much more solid answer is needed.

      We thank Reviewer #2 for bringing to our attention the missing reference for the proteasome activity probe we used (Me4BodipyFL-Ahx3Leu3VS). Both this probe [2] and its close derivative [3], BodipyFL-Ahx3Leu3VS, were fully characterized previously. We’ll include these references in the revision. In our hands, this probe behaved as expected under MG132 and Bortezomib treatment when quantified by flow cytometry (Fig. 4I), and by in-blot fluorescence scan (data will be included as supplementary in the revision). We further observed that HMGCR KD increased proteasome activity, consistent with what’s suggested by current literature. This validated our use of this probe and strongly suggested that proteasome activity was not perturbed by impaired lipid homeostasis.

      In general, most conclusions in the paper rely essentially solely on ProteoStat assays. The entire study would be greatly strengthened if the authors incorporated biochemical or other modalities to substantiate their results.

      We agree that orthogonal characterization of proteostasis impairment would be valuable. We chose the ProteoStat stain as a reporter of proteostasis because it is capable of integrating the aggregation states of multiple endogenously expressed proteins, and in the absence of exogenous stressors such as the overexpression of aggregation-prone proteins. With aging, a context where ProteoStat staining increases, hundreds of proteins exhibit reduced solubility [4], thus motivating the focus on endogenously expressed proteins. Despite the biochemical limitations, we think our work is differentiated from published screens focused on specific metastable proteins by our focus on regulators of endogenous proteostasis.

      The presentation would be improved greatly if the authors provided diagrams illustrating the pathways implicated in their results, as well as their models.

      We thank Reviewer #2 for the helpful suggestion. We have provided the suggested diagrams below (Author response image 2).

      Author response image 2.

      Mechanistic models linking screen hits to accrual of lysosomal protein aggregates, related to Figure 4. Perturbations that increased cholesterol and sphingolipid levels were evaluated for effects on lysosomal pH, lysosomal proteolytic capacity, lysosomal membrane permeability, lipid peroxidation and proteasome activity. None of these mechanisms appear to play a causal role in protein aggregation in response to elevated lipids.

      Author Response References

      1. Replogle, J. M. et al. Mapping information-rich genotype-phenotype landscapes with genome-scale Perturb-seq. Cell 185, 2559-2575.e28 (2022).

      2. Berkers, C. R. et al. Probing the Specificity and Activity Profiles of the Proteasome Inhibitors Bortezomib and Delanzomib. Mol Pharmaceut 9, 1126–1135 (2012).

      3. Berkers, C. R. et al. Profiling Proteasome Activity in Tissue with Fluorescent Probes. Mol. Pharmaceutics 4, 739–748 (2007).

      4. David, D. C. et al. Widespread Protein Aggregation as an Inherent Part of Aging in C. elegans. Plos Biol 8, e1000450 (2010).

    1. Author Response

      We would like to thank the reviewers for providing constructive feedback on the manuscript. To address the weaknesses identified, we are performing additional experiments and generating additional data, to be added to the updated manuscript.

      (1) The utility of a pipeline depends on the generalization properties.

      While the proposed pipeline seems to work for the data the authors acquired, it is unclear if this pipeline will actually generalize to novel data sets possibly recorded by a different microscope (e.g. different brand), or different imaging conditions (e.g. illumination or different imaging artifacts) or even to different brain regions or animal species, etc.

      The authors provide a 'black-box' approach that might work well for their particular data sets and image acquisition settings but it is left unclear how this pipeline is actually widely applicable to other conditions as such data is not provided.

      In my experience, without well-defined image pre-processing steps and without training on a wide range of image conditions pipelines typically require significant retraining, which in turn requires generating sufficient amounts of training data, partly defying the purpose of the pipeline. It is unclear from the manuscript, how well this pipeline will perform on novel data possibly recorded by a different lab or with a different microscope.

      To address generalizability, we are performing several validation experiments with data from different 1) channels, 2) species (rat), and 3) microscopes, to highlight the robustness of our deep learning (DL) segmentation model to out-of-distribution data with different characteristics and acquisition protocols. We first used our model to segment three images (507x507 x&y, 250-170 um z) from three C57BL/6 mice acquired on the same two-photon fluorescence microscope following the same imaging protocol. The vasculature was labelled with Texas Red dextran, as in the current experiment. In place of the EYFP signal from pyramidal neurons (2nd channel), Gaussian noise was generated with a mean and standard deviation identical to those of the acquired vascular channel. A second set of two images (507x507 x&y, 300-400 um z) came from two Fischer rats with Alexa680-dextran label in the plasma; these rats were imaged on the same two-photon fluorescence microscope, but with galvano scanners (instead of resonant scanners). A second channel of random Gaussian noise was also added here. Finally, an image of vasculature from an ex-vivo cleared mouse brain (1665x1205x780 um) imaged on a light sheet fluorescence microscope (Miltenyi UltraMicroscope Blaze) was also segmented with our model. Lectin-DyLight 649 was used to label the vasculature in this cohort. The Dice Score, Precision, Recall, Hausdorff 95%, and Mean surface distance will be reported for all of these additional image segmentations, upon generation of ground truth images. Examples of the generated segmentation masks are presented in Author response image 1 for visual comparison. Of final note, should the segmentation results on a new data set be unsatisfactory, the methods downstream from segmentation are still applicable and the model can be further fine-tuned on other out-of-distribution data.

      Author response image 1.

      Examples of the deep learning model output on out of distribution data from a different mouse strain, from a different species (Fischer rat), and on a different microscope using a different imaging modality.
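
      For readers who want to see how the voxel-overlap metrics named in the reply to point (1) are computed, a minimal sketch follows. It is not the authors' evaluation code: the function name `overlap_metrics` and the flat 0/1 mask representation are assumptions made for illustration, and the two surface-distance metrics (Hausdorff 95% and mean surface distance) are not covered.

```python
def overlap_metrics(pred, truth):
    """Dice, precision and recall for two flat binary masks (sequences of 0/1).

    Hypothetical helper for illustration; a 3-D mask would first be flattened.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Count voxel-wise agreements and disagreements.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)        # vessel in both
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)    # predicted only
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)    # ground truth only
    # Two empty masks agree perfectly, so Dice is defined as 1 in that case.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"dice": dice, "precision": precision, "recall": recall}


if __name__ == "__main__":
    predicted = [1, 1, 1, 0]
    manual = [1, 1, 0, 0]
    print(overlap_metrics(predicted, manual))
```

      In this toy case the prediction contains one false-positive voxel and no false negatives, so recall is 1 while precision and Dice fall below 1.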

      (2) Some of the chosen analysis results seem to not fully match the shown data, or the visualization of the data is hard to interpret in the current form.

      We are updating the visualizations to make them more accessible and we will ensure matching between tables and figures.

      (3) Additionally, some measures seem not fully adapted to the current situation (e.g. the efficiency measure does not consider possible sources or sinks). Thus, some additional analysis work might be required to account for this.

      Thank you for your comment. The efficiency metric was selected as it does not consider sources or sinks. We do agree that accounting for vessel subtypes in the analysis (thus classifying larger vessels as either supplying or draining) would be uniquely useful: notwithstanding, it is extremely laborious. We are therefore leveraging machine learning in a parallel project to afford vessel classification by subtype. The source/sink analysis is also confounded by the small field-of-view of in situ 2PFM. Future work will investigate network remodelling across the whole brain with ex-vivo light sheet fluorescence microscopy.

      (4) The authors apply their method to in vivo data. However, there are some weaknesses in the design that make it hard to accept many of the conclusions and even to see that the method could yield much useful data with this type of application. Primarily, the acquisition of a large volume of tissue is very slow. In order to obtain a network of vascular activity, large volumes are imaged with high resolution. However, the volumes are scanned once every 42 seconds following stimulation. Most vascular responses to neuronal activation have come and gone in 42 seconds so each vessel segment is only being sampled at a single time point in the vascular response. So all of the data on diameter changes are impossible to compare since some vessels are sampled during the initial phase of the vascular response, some during the decay, and many probably after it has already returned to baseline. The authors attempt to overcome this by alternating the direction of the scan (from surface to deep and vice versa). But this only provides two sample points along the vascular response curve and so the problem still remains.

      We thank the Reviewer for bringing up this important point.

      Although vessels can show relatively rapid responses to perturbation, vascular responses to photostimulation of Channelrhodopsin-2 in neighbouring neurons are typically long lasting: they do not come and go in 42 seconds. To demonstrate this point, we acquired higher temporal-resolution images of smaller volumes of tissue over 5 minutes preceding and following the 5-s photoactivation with the original parameters. The imaging protocol differed in that we utilized a piezoelectric motor, a smaller field of view, and only 3x frame averaging, resulting in a temporal resolution of 1.57-2.63 seconds. This acquisition was repeated at 4 different cortical depths (325 um, 250 um, 150 um, and 40 um) in a single mouse. The vascular radii were estimated using our presented pipeline. Raw data and LOESS fits are shown in Author response image 2 (below). Vessels shorter than 20 um in length were excluded from the analysis. A video of one of the acquisitions is shown along with the timecourses of select vessels’ caliber changes in Author response image 3. The vascular caliber changes following photostimulation persisted for several minutes, consistent with earlier observations by us and others (1–4). These higher temporal-resolution scans of smaller tissue volumes will be repeated in two more mice; we will therein assess the repeatability of individual vessel responses to repeated stimulations.

      Author response image 2.

      A. The vascular radii of multiple vessels were imaged at 4 different cortical depths, each within a 507 x (75-150) x (30-45) um tissue volume. Baseline scanning lasted for 5 minutes, followed by 5 seconds of blue or green light stimulation at 4.3 mW/mm2, and culminating in 5 minutes of post-stimulation scanning. B. LOESS fits of the vessel radius estimates for each vessel segment identified.
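
      The LOESS fits mentioned in this caption can be illustrated with a small, dependency-free sketch of local linear regression with tricube weights. The span (`frac`), the local polynomial degree, and the function name `loess_fit` are assumptions; the response does not state which LOESS settings or software were used.

```python
def loess_fit(times, radii, frac=0.5):
    """Local linear regression with tricube weights, evaluated at each time point.

    Illustrative only: `frac` (fraction of points in each local fit) is an
    assumed default, not a value taken from the author response.
    """
    n = len(times)
    if n != len(radii) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    k = max(3, min(n, round(frac * n)))  # number of neighbours per local fit
    fitted = []
    for t0 in times:
        dists = [abs(t - t0) for t in times]
        h = sorted(dists)[k - 1]  # bandwidth: distance to the k-th nearest point
        if h == 0:
            h = 1.0  # k coincident time points: fall back to a unit bandwidth
        # Tricube kernel: weight 1 at t0, falling to 0 at distance h and beyond.
        weights = [(1 - min(d / h, 1.0) ** 3) ** 3 for d in dists]
        sw = sum(weights)
        mt = sum(w * t for w, t in zip(weights, times)) / sw
        mr = sum(w * r for w, r in zip(weights, radii)) / sw
        stt = sum(w * (t - mt) ** 2 for w, t in zip(weights, times))
        stra = sum(w * (t - mt) * (r - mr)
                   for w, t, r in zip(weights, times, radii))
        # Weighted least-squares slope; flat line if the local design is degenerate.
        slope = stra / stt if stt > 1e-12 else 0.0
        fitted.append(mr + slope * (t0 - mt))
    return fitted


if __name__ == "__main__":
    t = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]   # seconds (made-up values)
    r = [2.0, 2.1, 1.9, 2.4, 2.6, 2.5, 2.7, 2.6]       # radius in um (made-up values)
    print([round(v, 3) for v in loess_fit(t, r)])
```

      A smaller `frac` follows the raw radius estimates more closely, while a larger one smooths more strongly; the choice affects how sharp any stimulation-evoked change appears.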

      Author response image 3.

      Estimated vascular radius at each timepoint for select vessels from the imaging stack shown in the following video: https://flip.com/s/kB1eTwYzwMJE

      (5) A second problem is the use of optogenetic stimulation to activate the tissue. First, it has been shown that blue light itself can increase blood flow (Rungta et al 2017). The authors note the concern about temperature increases but that is not the same issue. The discussion mentions that non-transgenic mice were used to control for this with "data not shown". This is very important data given these earlier reports that have found such effects and so should be included.

      We will update the manuscript to incorporate the data on volumetric scanning in nontransgenic C57BL/6 mice undergoing blue light stimulation, with parameters identical to those used in Thy1-ChR2 mice. As before, responders were identified as vessels that, following blue light stimulation, show a radius change greater than 2 standard deviations of their baseline radius: their estimated radii changes are shown in Author response image 4 below. There was no statistically significant difference between the radii distributions of any of the photostimulation conditions and the pre-photostimulation baseline. A comparison of this with the transgenic Thy1-ChR2-EYFP mice will be included in manuscript updates.

      Author response image 4.

      Radius change measurements for responding vessels from the Thy1-ChR2 mice described in the manuscript (top row) vs. 4 wild-type C57BL/6J mice (bottom row). Response to photostimulation was defined as a change above twice their baseline standard deviation. 458nm light was applied at 1.1 mW/mm^2 and 4.3 mW/mm^2; while 552 nm light was applied at 4.3 mW/mm^2. No statistically significant differences were observed between the radii distributions in any condition, Wilcoxon test, Bonferroni correction.
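
      The responder criterion used here (a radius change exceeding twice the vessel's baseline standard deviation) can be sketched as follows. The function name `classify_response`, the use of the baseline mean as the reference radius, and the choice of the sample standard deviation are assumptions for illustration; the response does not spell out these details.

```python
from statistics import mean, stdev


def classify_response(baseline_radii, post_radius, n_sd=2.0):
    """Label one vessel as 'dilator', 'constrictor' or 'non-responder'.

    Assumptions (not stated in the source): the change is measured against the
    mean baseline radius, and the sample standard deviation is used.
    """
    base_mean = mean(baseline_radii)
    base_sd = stdev(baseline_radii)  # requires at least 2 baseline estimates
    change = post_radius - base_mean
    if change > n_sd * base_sd:
        return "dilator", change
    if change < -n_sd * base_sd:
        return "constrictor", change
    return "non-responder", change


if __name__ == "__main__":
    baseline = [2.0, 2.1, 1.9, 2.0]  # um, made-up repeated baseline estimates
    for post in (2.5, 1.7, 2.1):
        print(post, classify_response(baseline, post))
```

      Because the threshold scales with each vessel's own baseline variability, a vessel with a noisy baseline needs a larger absolute change to count as a responder, which is relevant to the question raised in point (12).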

      (6) Secondly, there doesn't seem to be any monitoring of neural activity following the photo-stimulation. The authors repeatedly mention "activated" neurons and claim that vessel properties change based on distance from "activated" neurons. But I can't find anything to suggest that they know which neurons were active versus just labeled. Third, the stimulation laser is focused at a single depth plane. Since it is single-photon excitation, there is likely a large volume of activated neurons. But there is no way of knowing the spatial arrangement of neural activity and so again, including this as a factor in the analysis of vascular responses seems unjustified.

      Given the high fidelity of Channelrhodopsin-2 activation with blue light, we assume that all labeled neurons within the volume of photostimulation are being activated. Depending on their respective connectivities, their postsynaptic neurons (whether or not they are labelled) are also activated. We indeed agree with the reviewer that the spatial distribution of neuronal activation is not well defined. We will revise the manuscript to update the terminology from activated to labeled neurons and stress in the Discussion that the motivation for assessing the distance to the closest labelled neuron as one of our metrics is purely to demonstrate the possibility of linking vascular response to activations in some of their neighbouring neurons and including morphological metrics in the computational pipeline. Of final note, the depth-dependence of the distance between labelled neurons and responding vessels can also readily be assessed using our computational pipeline.

      (7) The study could also benefit from more clear illustration of the quality of the model's output. It is hard to tell from static images of 3-D volumes how accurate the vessel segmentation is. Perhaps some videos going through the volume with the masks overlaid would provide some clarity. Also, a comparison to commercial vessel segmentation programs would be useful in addition to benchmarking to the ground truth manual data.

      We generated a video demonstrating the deep-learning model outputs and have made the video available here: https://flip.com/s/_XBs4yVxisNs Additional videos will be uploaded.

      (8) Another useful metric for the model's success would be the reproducibility of the vessel responses. Seeing such a large number of vessels showing constrictions raises some flags and so showing that the model pulled out the same response from the same vessels across multiple repetitions would make such data easier to accept.

      We have generated a figure demonstrating the repeatability of the vascular responses following photoactivation in a volume, and presented them next to the corresponding raw acquisitions for visual inspection. It is important to note that there is a significant biological variability in vessels’ responses to repeated stimulation, as described previously (2,5). Constrictions have been reported in the literature by our group and others (1,3,4,6,7), though their prevalence has not been systematically studied to date. Concerning the reproducibility of our analysis, we will demonstrate model reproducibility (as a metric of its success) in the updated manuscript.

      Author response image 5.

      Registered acquisitions of the vasculature before and after optogenetic stimulation for 5 scan pairs over 3 different stimulation conditions. The estimated radii along vessel segments are presented.

      Author response image 6.

      Sample capillary constrictions from maximum intensity projections at repeated timepoints following optogenetic stimulation. Baseline (pre-stimulation) image is shown on the left and the post-stimulation image, on the right, with the estimated radius changes listed to the left.

      (9) A number of findings are questionable, at least in part due to these design properties. There are unrealistically large dilations and constrictions indicated. These are likely due to artifacts of the automated platform. Inspection of these results by eye would help understand what is going on.

      Some of the dilations were indeed large in magnitude. We present select examples of large dilations and constrictions ranging in magnitude from 2.08 to 10.80 um for visual inspection (for reference, the average magnitude of radius changes across vessels and stimuli was 0.32 +/- 0.54 um). Diameter changes above 5 um were visually inspected.

      Author response image 7.

      Additional views of diameter changes in maximum intensity projections ranging in magnitude from 2.08 um to 10.80 um.

      (10) In Figure 6, there doesn't seem to be much correlation between vessels with large baseline level changes and vessels with large stimulus-evoked changes. It would be expected that large arteries would have a lot of variability in both conditions and veins much less. There is also not much within-vessel consistency. For instance, the third row shows what looks like a surface vessel constricting to stimulation but a branch coming off of it dilating - this seems biologically unrealistic.

      We now plot photostimulation-elicited vesselwise radius changes vs. their corresponding baseline radius standard deviations (Author response image 8 below). The Pearson correlation between the baseline standard deviation and the radius change was 0.08 (p<1e-5) for 552nm 4.3 mW/mm^2 stimulation, -0.08 (p<1e-5) for 458nm 1.1 mW/mm^2 stimulation, and -0.04 (p<1e-5) for 458nm 4.3 mW/mm^2 stimulation. For non-control (i.e. blue) photostimulation conditions, the change in the radius is thus negatively correlated to the vessel’s baseline radius standard deviation. The within-vessel consistency is explicitly evaluated in Figure 8 of the manuscript. As for the instance of a surface vessel constricting while a downstream vessel dilates, it is important to remember that the 2PFM FOV restricts us to imaging a very small portion of the cortical microvascular network: one (among many) daughter vessel showing changes in the opposite direction to the parent vessel does not violate the conservation of mass.

      Author response image 8.

      A plot of the vessel radius change elicited by photostimulation vs. baseline radius standard deviation.
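
      The Pearson correlation reported for this plot can be computed as in the sketch below, shown on made-up numbers. The function name `pearson_r` is hypothetical, and the p-values quoted in the response would require an additional significance test that is not included here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    baseline_sd = [0.10, 0.25, 0.15, 0.40, 0.30]       # um, made-up values
    radius_change = [0.30, -0.10, 0.20, -0.40, 0.05]   # um, made-up values
    print(round(pearson_r(baseline_sd, radius_change), 3))
```

      As a point of arithmetic, a coefficient of -0.08 corresponds to r^2 of about 0.006, so the sign of the relationship and its strength are worth reading separately.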

      (11) As mentioned, the large proportion of constricting capillaries is not something found in the literature. Do these happen at a certain time point following the stimulation? Did the same vessel segments show dilation at times and constriction at other times? In fact, the overall proportion of dilators and constrictors is not given. Are they spatially clustered? The assortativity result implies that there is some clustering, and the theory of blood stealing by active tissue from inactive tissue is cited. However, this theory would imply a region where virtually all vessels are dilating and another region away from the active tissue with constrictions. Was anything that dramatic seen?

      The kinetics of the vascular responses are not accessible via the current imaging protocol and acquired data; however, this computational pipeline can readily be adapted to test hypotheses surrounding the temporal evolution of the vascular responses, as shown in Author response image 2 (with higher temporal-resolution data). Some vessels dilate at some time points and constrict at others as shown in Author response image 2. As listed in Table 2, 4.4% of all vessels constrict and 7.5% dilate for 458nm stimulation at 4.3 mW/mm^2. There was no obvious spatial clustering of dilators or constrictors: we expect such spatial patterns to more likely result from different modes of stimulation and/or in the presence of a pathology. The assortativity peaked at 0.4 (i.e. is quite far from 1 where each vessel’s response exactly matches that of its neighbour).
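
      To make the assortativity value concrete, the sketch below computes a scalar assortativity coefficient in the standard sense: the Pearson correlation of a node attribute across the two ends of every edge, with each undirected edge counted in both directions. Treating vessel segments as nodes and shared junctions as edges is an assumption made here for illustration; the manuscript's exact graph construction is not given in this response, and `scalar_assortativity` is a hypothetical name.

```python
def scalar_assortativity(edges, value):
    """Correlation of a scalar node attribute across the ends of each edge.

    edges: iterable of (u, v) pairs for an undirected graph.
    value: mapping (or list) from node to its scalar attribute,
           e.g. the radius change of a vessel segment (assumed representation).
    """
    xs, ys = [], []
    for u, v in edges:
        # Count each undirected edge once in each direction.
        xs.extend([value[u], value[v]])
        ys.extend([value[v], value[u]])
    if not xs:
        raise ValueError("graph has no edges")
    m = sum(xs) / len(xs)  # by symmetry this is also the mean of ys
    cov = sum((x - m) * (y - m) for x, y in zip(xs, ys))
    var = sum((x - m) ** 2 for x in xs)
    if var == 0:
        raise ValueError("assortativity is undefined when all values are equal")
    return cov / var


if __name__ == "__main__":
    # Four segments in a chain; the first two dilate, the last two constrict.
    chain = [(0, 1), (1, 2), (2, 3)]
    change = [1.0, 1.0, -1.0, -1.0]
    print(scalar_assortativity(chain, change))
```

      A value of 1 means every segment responds exactly like its neighbours, 0 means neighbouring responses are uncorrelated, and negative values mean neighbours tend to respond in opposite directions.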

      (12) Why were nearly all vessels > 5um diameter not responding >2SD above baseline? Did they have highly variable baselines or small responses? Usually, bigger vessels respond strongly to local neural activity.

      In Author response image 9, we now present the stimulation-induced radius changes vs. baseline radius variability across vessels with a radius greater than 5 um. The Pearson correlation between the radius change and the baseline radius standard deviation was 0.04 (p=0.5) for 552nm 4.3 mW/mm^2 stimulation, -0.26 (p<1e-5) for 458nm 1.1 mW/mm^2 stimulation, and -0.24 (p<1e-5) for 458nm 4.3 mW/mm^2 stimulation. We will incorporate an additional analysis to address this issue by identifying responding vessels as those showing supra-threshold percent change in their radius (instead of SD).

      Author response image 9.

      A plot of the vessel radius change elicited by photostimulation vs. baseline radius standard deviation in vessels with a baseline radius greater than 5 um.

      References

      (1) Alarcon-Martinez L, Villafranca-Baughman D, Quintero H, et al. Interpericyte tunnelling nanotubes regulate neurovascular coupling. Nature. 2020;585(7823):91-95. doi:10.1038/s41586-020-2589-x

      (2) Mester JR, Bazzigaluppi P, Weisspapir I, et al. In vivo neurovascular response to focused photoactivation of Channelrhodopsin-2. NeuroImage. 2019;192:135-144. doi:10.1016/j.neuroimage.2019.01.036

      (3) O’Herron PJ, Hartmann DA, Xie K, Kara P, Shih AY. 3D optogenetic control of arteriole diameter in vivo. Nelson MT, Calabrese RL, Nelson MT, Devor A, Rungta R, eds. eLife. 2022;11:e72802. doi:10.7554/eLife.72802

      (4) Hartmann DA, Berthiaume AA, Grant RI, et al. Brain capillary pericytes exert a substantial but slow influence on blood flow. Nat Neurosci. Published online February 18, 2021:1-13. doi:10.1038/s41593-020-00793-2

      (5) Mester JR, Bazzigaluppi P, Dorr A, et al. Attenuation of tonic inhibition prevents chronic neurovascular impairments in a Thy1-ChR2 mouse model of repeated, mild traumatic brain injury. Theranostics. 2021;11(16):7685-7699. doi:10.7150/thno.60190

      (6) Mester JR, Rozak MW, Dorr A, Goubran M, Sled JG, Stefanovic B. Network response of brain microvasculature to neuronal stimulation. NeuroImage. 2024;287:120512. doi:10.1016/j.neuroimage.2024.120512

      (7) Hall CN, Reynell C, Gesslein B, et al. Capillary pericytes regulate cerebral blood flow in health and disease. Nature. 2014;508(7494):55-60. doi:10.1038/nature13165

    1. Author Response

      We thank both the editors and the Reviewers for their thoughtful comments and recommendations, which will certainly help us improve the manuscript. Below we address in a brief format some of the comments made, and then outline the changes to the manuscript that we plan to implement in the revision.

      We see three interrelated issues in the comments of the Reviewers:

      • the length and complexity of the manuscript;

      • the link to previously proposed formalisms;

      • the impact of adopting the proposed information-theoretic framework.

      With regard to all of these issues, we would first like to highlight that the overall goal of our effort was to integrate contributions to understanding the mechanisms underlying cognitive control across multiple different disciplines, using the information theoretic framework as a common formalism, while respecting and building on prior efforts as much as possible. Accordingly, we sought to be as explicit as possible about how we bridge from prior work using information theory, as well as neural networks and dynamical systems theory, which contributed to the length of the original manuscript. While we continue to consider this an important goal, we will do our best to shorten and clarify the main exposition by reorganizing the manuscript as suggested by Reviewer #1 (i.e., in a way that is similar to what we did in our previous Nature Physics paper on multitasking). Specifically, we will move a substantially greater amount of the bridging material to the Supplementary Information (SI), including the detailed discussion of the Stroop task, and the description of the link to Koechlin & Summerfield’s [L1] information theory formalism. We will also now include an outline of the full model at the beginning of the manuscript, which includes control and learning, and then more succinctly describe simplifications that focus on specific issues and applications in the remainder of the document.

      Along similar lines, we will revise and harmonize our presentation of the formalism and notations, to make these more consistent, clearer and more concise throughout the document. Again, some of the inconsistencies in notation arose from our initial description of previous work, in particular that of Koechlin & Summerfield [L1], which was an important inspiration for our work but used slightly different notations. An important motivation for our introduction of new notation was that their formulation focused on the performance of a single task at a time, whereas a primary goal of our work was to extend the information theoretic treatment to simultaneous performance of multiple tasks. That is, in focusing on single tasks, Koechlin & Summerfield could refer to a task simply as a direct association between stimuli and responses, whereas we required a way of being able to refer to sets of tasks performed at once (“multitasks”), which in turn required specification of internal pathways. Moreover, they do not provide a mechanism to explicitly compute the conditional information Q(a|s) of a response/action a conditioned on a stimulus s. Our formalism instead provides a way to explicitly unpack this expression in terms of the efficacies, automatic (Eq. 5) or controlled (Eq. 15), which can also account for the competition between different stimuli {s1, s2, ..., sn}. It also describes explicitly the competition between multiple tasks (Eq. 18, and Eq. 25 for multiple layers), because different processing schemes for the same combinations of stimuli/responses can incur different levels of internal dependencies and thus require different control strategies.

      To mitigate any confusion over terminology we will, as noted above, move a detailed discussion of Koechlin & Summerfield’s formulation, and how it maps to the one we present, to the SI, while taking care to introduce ours clearly at the beginning of the main document and to use it consistently throughout the remainder of the document. We will also draw more clearly an important distinction, between informational and cognitive costs, that we did not make adequately in the original manuscript.

      Finally, to more clearly and concretely convey what we consider to be the most important contributions, we will restrict the number of examples we present to ones that relate most directly to the central points (e.g., the effect and limits of control in the presence of interference, and the differences in control strategy under limited temporal horizons). Accompanying our revision, we will also provide a full point-by-point response to the comments and questions raised by the Reviewers. We summarize some of the key points we will address below.

      PRELIMINARY REPLY TO THE REPORT OF REVIEWER #1

      We want to thank the Reviewer for the time and effort put into reviewing our paper and for the constructive feedback provided. We also thank the Reviewer for recognizing the need for a clear computational account of how “control” manages conflicts by scheduling tasks to be executed in parallel versus serially, and for the positive evaluation of the “efforts of the authors to give these intuitions a more concrete computational grounding.” As noted in the general reply above, we regret the lack of clarity in several parts of the manuscript and in our introduction and use of the formalism. We consider the following to be the main points to be addressed:

      • the role of task graphs and their mapping to standard neural architectures

      • the description of entropy and related information-theoretic concepts;

      • confusing choice of symbols in our notation between stimuli/responses and serialization/reconfiguration costs;

      • missing definition of response time;

      Regarding the first point, we acknowledge that the network architectures we focus on do not draw direct inspiration from conventional machine learning models. Instead, our approach is rooted in the longstanding tradition of using (often simpler, but also more readily interpretable) neural network models to address human cognitive function and how this may be implemented in the brain [L2]; and, in particular, the mechanisms underlying cognitive control (e.g., [L3, L4]). In this context, we emphasize that, for analytical clarity, we deliberately abstract away from many biological details, in an effort to identify those principles of function that are most relevant to cognitive function. Nevertheless, our network architecture is inspired by two concepts that are central to neurobiological mechanisms of control: inhibition and gain modulation. Specifically, we incorporate mutual inhibition among neural processing units, a feature represented by the parameter β. This aspect of our model is consistent with biologically inspired frameworks of neural processing, such as those discussed by Munakata et al. (2011)[L5], reflecting the competitive dynamics observed in neural circuits. Moreover, we introduce the parameter ν to represent a strictly modulatory form of control, akin to the role of neuromodulators in the brain. This modulatory control adjusts the sensitivity of a node to differences among its inputs (e.g., Servan-Schreiber, Printz, & Cohen (1990)[L6]; Aston-Jones & Cohen (2005)[L7]). Finally, as the Reviewer notes, additional hidden layers can improve expressivity in neural networks, enabling the efficient implementation of more complex tasks, and are a universal feature of biological and artificial neural systems. We thus examined multitasking capability under the assumption that multiple hidden layers are present in a network, irrespective of whether they are needed to implement the corresponding tasks.

      Regarding the second point, as noted above, we believe that the confusion arose from our review of the work by Koechlin & Summerfield. In their formalism, in which an action a is chosen (from a set of potential actions) with probability p(a), the cost of choosing that action is −log p(a). This is usually referred to as the information content or, alternatively, the localized entropy [L8]. As the Reviewer correctly observed, the canonical (Shannon) entropy is actually the expectation E_a[−log p(a)] over the localized entropies of a set of actions. In summarizing their formulation, we misleadingly stated that “they used standard Shannon entropy formalism as a measure of the information required to select the action a.” We will now correct this to state: “[..] they used local entropy (−log p(a)) as a measure of the information required to select the action a, that can be treated as the cost of choosing that action.” We follow this formulation in our own, referring to informational cost as Ψ, and generalizing this to include cases in which more than one action may be chosen to perform at a time.
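
      The distinction between the local entropy of a single action and the Shannon entropy of the whole action set can be illustrated with a minimal sketch. The function names and the action probabilities below are hypothetical and are not taken from the manuscript; natural logarithms (nats) are assumed.

```python
import math


def local_entropy(p):
    """Information content -log p(a) of a single action, in nats."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return -math.log(p)


def shannon_entropy(probs):
    """Expectation of the local entropies, E_a[-log p(a)]."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Zero-probability actions contribute nothing to the expectation.
    return sum(p * local_entropy(p) for p in probs if p > 0.0)


# Hypothetical distribution over three candidate actions.
action_probs = {"a1": 0.5, "a2": 0.25, "a3": 0.25}

# Cost of selecting each individual action (local entropy).
costs = {a: local_entropy(p) for a, p in action_probs.items()}

# Average cost over the action set (Shannon entropy).
H = shannon_entropy(list(action_probs.values()))
```

      In this sketch the rarer actions carry a higher individual cost, while H is a single number summarizing the whole distribution, which is why the two quantities should not be used interchangeably.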

      Regarding the third point, the confusion is due to our use of the letters S and R for both the stimulus and response units (in Sec. II.B) and then serialization and reconstruction costs (in Eqs. 31-33). We will fix this by renaming the serialization and reconstruction costs more explicitly as Ser and Rec.

      Finally, we realized we never explicitly stated the expression of the response time we used, but only pointed to it in the literature. In the manuscript we used the expression given in Eq. 53 of [L9], which provides response times as a function of the error rates ER and the number of options.

      PRELIMINARY REPLY TO THE REPORT OF REVIEWER #2

      We want to thank the Reviewer for recognizing our effort to “rigorously synthesize ideas about multi-tasking within an information-theoretic framework” and its potential. We also thank the Reviewer for the careful comments.

      To our best understanding, and similarly to Reviewer #1, the main comments of the Reviewer are on:

      • the length and density of the paper;

      • the presentation of Koechlin & Summerfield’s formalism, and the mismatch/lack of clarity of ours in certain points;

      • the added value of the information theoretic formalism.

      Regarding the first two points, which are common to Reviewer #1, we plan to move a significant part of the manuscript to the Supplementary Information, both to improve readability and make the manuscript shorter, as well as to provide one consistent and cleaner formalism (in particular with regard to the typos and errors highlighted by the Reviewer). In particular, with respect to the comment on Eqs. 4-6, we will clarify that the probability p[f_ij] is the probability that a certain input dimension (i in this case) is selected by node j to produce its response (averaged over the individual inputs in each input dimension). We will also take care to make sure that the definition and domain of the various probabilities and probability distributions we use are clearly delineated (e.g. where the costs computed for tasks and task pathways come from).

      Regarding the third point, we hope that our work offers value in at least two ways: i) it helps bring unity to ideas and descriptions about the capacity constraints associated with cognitive control that have previously been articulated in different forms (viz., neural networks, dynamical systems, and statistical mechanical accounts); and ii) doing so within an information theoretic framework not only lends rigor and precision to the formulation, but also allows us to cast the allocation of control in normative form – that is, as an optimization problem in which the agent seeks to minimize costs while maximizing gains. While we do not address specific empirical phenomena or datasets in the present treatment, we have done our best to provide examples showing that: a) our information theoretic formulation aligns with treatments using other formalisms that have been used to address empirical phenomena (e.g., with neural network models of the Stroop task); and b) our formulation can be used as a framework for providing a normative approach to widely studied empirical phenomena (e.g., the transition from control-dependent to automatic processing during skill acquisition) that, to date, have been addressed largely from a descriptive perspective; and that it can provide a formally rigorous approach to addressing such phenomena.

      [L1] E. Koechlin and C. Summerfield, Trends in Cognitive Sciences 11, 229 (2007).

      [L2] J. L. McClelland, D. E. Rumelhart, P. R. Group, et al., Explorations in the Microstructure of Cognition 2, 216 (1986).

      [L3] J. D. Cohen, K. Dunbar, and J. L. McClelland, Psychological Review 97, 332 (1990).

      [L4] E. K. Miller and J. D. Cohen, Annual Review of Neuroscience 24, 167 (2001).

      [L5] Y. Munakata, S. A. Herd, C. H. Chatham, B. E. Depue, M. T. Banich, and R. C. O’Reilly, Trends in Cognitive Sciences 15, 453 (2011).

      [L6] D. Servan-Schreiber, H. Printz, and J. D. Cohen, Science 249, 892 (1990).

      [L7] G. Aston-Jones and J. D. Cohen, Annual Review of Neuroscience 28, 403 (2005).

      [L8] T. F. Varley, PLOS ONE 19, e0297128 (2024).

      [L9] T. McMillen and P. Holmes, Journal of Mathematical Psychology 50, 30 (2006).

    1. Author response:

      We would like to thank the three reviewers for the careful review and thoughtful comments on our manuscript. In addition to providing useful suggestions, they uncovered some embarrassing oversights on our part, related to experimental details including number of embryos, and quantification of variance in the observed changes for some of the experiments, which were inadvertently omitted in the submission. We provide below an initial response to the reviewers’ public reviews and expect to submit a revised manuscript comprehensively addressing all their concerns.

      I would like to start by addressing some of their most critical comments related to validation of the tools used to reduce soxB1 gene family function in the embryo. In the absence of the critical supplementary data that we inadvertently failed to include, the reviewers were left with an understandable but, we feel, erroneous impression that there was insufficient validation of mutant and knockdown tools.

      Reviewer #2 says “The sox2y589 mutant line is not properly verified in this manuscript, which could be done by examining anti-Sox2 antibody labeling, Western blot analysis or…”

      This validation, which had been performed previously both with antibody staining and with western blot analysis, was inadvertently omitted from the supplementary data submitted with the paper. The western blot data is shown here.

      Author response image 1.

      Validation of sox2 mutant phenotype with Western blot.

      Lysates were prepared from 25 embryos selected as wild type or potentially mutant based on the “loss of L1” phenotype at 6 dpf. This polyclonal antibody recognizes an epitope within the last 16 amino acids of the C-terminus.

      Author response image 2.

      Validation of sox2 mutant phenotype with antibody staining.

      Though in this experiment there was considerable background in the red channel, and it shows the lateral line nerve, loss of nuclear Sox2 expression is evident in the deposited neuromast of an embryo identified as a mutant based on its delayed deposition of the L1 neuromast.

      This data and a repeat of the antibody staining showing the primordium with loss of Sox2 will be included in a revised manuscript.

      Furthermore, Reviewer #2 comments “the authors show that the anti-Sox2 and anti-Sox3 antibody labeling is reduced but not absent in sox2 MO1 and sox3 MO-injected embryos, but do not show antibody labeling of the sox2 MO and sox3 MO-double injected embryos to determine if there is an additional knockdown”

      This will be included in a revised manuscript.

      Reviewer #2:

      The authors acknowledge that the sox2 MO1 used in this manuscript also alters sox3 function, but do not redo the experiments with a specific sox2 MO

      This is not exactly true. Having discovered that sox2 MO1 simultaneously reduces sox2 and sox3 function, we obtained three new morpholinos based on another paper (Kamachi et al 2008), which had quantitatively assessed the efficacy of three sox2 specific morpholinos (sox2 MO2, sox2 MO3, and sox2 MO4). The effects of these morpholinos on the pattern of L1 deposition were compared to that of sox2 MO1. This comparison was shown in supplementary Figure 2 and is included below. It shows that the sox2 specific morpholinos resulted in a poorly penetrant delay in deposition of L1, comparable to that of a sox2 mutant, which was quantified in supplementary Figure 3B. The observations with these three sox2 specific morpholinos independently supported the observations made with the sox2 mutant that reduction of sox2 on its own results in a delay in deposition of the first neuromast with low penetrance, and that to effectively examine the role of these SoxB1 genes in the primordium their function needs to be compromised in a combinatorial manner. This conclusion was independently supported by observations made by crossing sox1a, sox2 and sox3 mutants (Figure 3 and Supplementary Figure 3). Therefore, even though the initial use of a sox2 morpholino that simultaneously knocks down sox3 was unintentional, it turned out to be useful. It allowed us to examine the effects of knocking down sox2 and sox3 with a single morpholino. Furthermore, though this project was initiated more than 15 years ago to specifically understand sox2 function, our focus had shifted to understanding the role of soxB1 family members sox1a, sox2 and sox3 functioning together as an interacting system that regulates Wnt activity in the primordium. Considering this broader focus, reflected in the title of the paper, it was not a priority to repeat every experiment previously done with sox2 MO1 with the new sox2 specific morpholinos. Instead, having acknowledged the “limitations” of sox2 MO1, we used it to better understand the effects of combinatorial reduction of SoxB1 function.

      Reviewer #1:

      It is not exactly clear what underlies the apparent redundancy. It would be helpful if the soxb gene family member expression was reported after loss of each.

      As suggested by reviewer #1, we had previously looked at changes in expression of each of the soxB1 factors following loss of individual soxB1 factors but had not included these data in the supplementary data with the original submission. Apart from a reproducible and consistent expansion of sox1a expression into the trailing zone following loss of sox2 function, which is reported in the paper and quantified here, where 10/10 mutant embryos showed the expansion (compare region within bracket in WT and sox2<sup>-/-</sup>), no consistent changes in the expression of other soxB1 family members were observed that might account for compensation when function of a particular soxB1 factor is lost. The data shown here, together with more extensive quantification of changes, will be included in a revised version of the manuscript. At this time the only consistent change was the expansion of sox1a into the trailing zone when sox2 function is lost. This change reflects dependence of sox1a on Wnt activity and the fact that Wnt activity expands into the trailing zone when sox2 function is lost.

      Author response image 3.

      Reviewer #3:

      Given that the expression patterns of Sox1a and Sox3 are not merely different but are largely reciprocal, the mechanistic basis of their very similar double mutant phenotypes with Sox2 remains opaque.

      The simplest way to think about compensation for gene function in a network is to think of it being determined by expression of a homolog or another gene with a similar function being expressed in a similar or overlapping domain. However, it is more useful to think of Sox2 function in the primordium as part of an interacting network of SoxB1 factors whose differential regulatory mechanisms create a robust system that simultaneously regulates two key aspects of Wnt activity in the primordium: how high Wnt activity is allowed to get in the leading zone and how effectively it is shut off to facilitate protoneuromast maturation in the trailing zone. These features of Wnt activity influence both when and where nascent protoneuromasts will form in the wake of a progressively shrinking Wnt system and where they undergo effective maturation and stabilization prior to deposition. Changes in individual SoxB1 expression patterns provide some hints about how some SoxB1 factors may compensate when function of one or more of these factors is compromised. However, a deeper understanding of robustness and “compensation” will require a systems level understanding of this gene regulatory network with computational models, which we are currently working on in our group. It remains possible, for example, that how far into the trailing zone the Wnt activity has an influence is regulated at least in part by how high it is allowed to get in the leading zone by sox1a. Conversely, how high Wnt activity gets in the leading zone may be influenced by how effectively it is shut off in the trailing zone by sox2 and sox3, as this influences the size of the Wnt system, which in turn can influence the overall level of Wnt activity. In this manner Sox1a may cooperate with Sox2 and Sox3 both to limit how high Wnt activity is allowed to get in the primordium and to effectively shut it off in the trailing zone.

      Reviewer #3:

      Related to this, the authors discuss that Sox1a/Sox2 double knockdown produces a more severe phenotype than Sox2/Sox3 double knockdown, yet this difference is not obviously reflected in the data.

      The severity of the sox1a/sox2 double mutant phenotype compared to that of the sox2/sox3 double mutant is shown in Figure 3 K and N, and quantified in Supplementary Figure 3A. Simultaneous loss of sox2 and sox3 results in a small but relatively penetrant delay in where the first stable neuromast is deposited (Figure 3 N). By contrast, loss of sox2 and sox1a together consistently results in a longer delay in deposition of the first stable neuromast (Figure 3 K). A new graph, shown below, which will be incorporated in the revised paper, shows that there is a significant difference in the pattern of L1 deposition in sox1a<sup>-/-</sup>, sox2<sup>-/-</sup> and sox2<sup>-/-</sup>, sox3<sup>-/-</sup> double mutants.

      Author response image 4.

      All 3 datasets were found to be normally distributed by the Shapiro-Wilk test. One-way ANOVA showed significance (p<0.0001), with Tukey’s multiple comparisons test showing a significant difference between all 3 conditions (***p=0.0008, ****p<0.0001).
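
      For readers unfamiliar with the test named in this legend, the F statistic of a one-way ANOVA can be sketched as follows. This is an illustration only: the three groups are hypothetical numbers, not the L1 deposition measurements, and the function name is an assumption. Obtaining the p-value, the Shapiro-Wilk normality check and the Tukey post hoc comparisons would normally be left to a statistics package (for example `scipy.stats.f_oneway` and `scipy.stats.shapiro`).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F is the ratio of the between-group mean square to the
    within-group mean square.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three conditions (illustration only).
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
```

      A large F relative to the F distribution with (df_between, df_within) degrees of freedom indicates that at least one group mean differs, which is what motivates the follow-up pairwise comparisons.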

      Reviewer #1:

      It would be good to more clearly state why sox3 is not regulated by Wnt given its expression is inhibited by the delta TCF construct (Figure 2M).

      The explanation for why we believe sox3 expression is determined by Fgf signaling, and not Wnt activity, requires integrating what is observed both with induction of the delta TCF construct and the dominant negative Fgf receptor (DN FgfR). Loss of sox3 expression with induced expression of the delta TCF construct could result from loss of Wnt activity or the downstream loss of Fgf activity, which is ultimately dependent on Fgfs secreted by Wnt active cells in the leading domain. Distinguishing between these possibilities is based on inhibition of Fgf signaling with the DN FgfR, described in the next paragraph. Heat shock-induced expression of DN FgfR results in loss of Fgf signaling and the simultaneous expansion of Wnt activity into the trailing zone. As explained in the original text, loss of sox3 expression in this context, rather than its expansion, suggests its expression is determined by Fgf signaling, not Wnt activity. We will emphasize this point in the revised text.

      Reviewer #2:

      The manuscript lacks quantification of many of the experiments, making it difficult to conclude their significance.

      One of the biggest inadvertent omissions of the paper was the inadequate quantification of some of the results. Quantification of results with considerable variation in the outcome, like the pattern of L1 deposition, was provided following manipulations where various combinations of sox1a, sox2, and sox3 function were lost (Figure 3, Supplementary Figures 2 and 3) or where sox2MO1/sox3MO was used with or without IWR (Figure 5 and Figure 6). However, numbers for the experiments in Figure 2 were omitted from the Figure legend; typically about 10 embryos for each manipulation were photographed and scored, and a representative image was used to make the figure. In these experiments there was a very consistent result, with 100% of the embryos showing the changes represented by each panel in Figure 2. The only exception was Figure 2Y, where 9/10 embryos showed the described change. Similarly, in Figure 4 there was a consistent result and 100% of embryos showed the change shown. Numbers and statistics for these results will be included in a revised manuscript.

      Reviewer #2:

      The statistical analysis in Figure 5 and Supplementary Figures 2 and 3 should be one-way ANOVA or Kruskal-Wallis with a Dunn's multiple comparisons test rather than pair-wise comparisons.

      The analysis has been re-done following the reviewer’s suggestions. The analysis confirms the primary conclusions of the original submission, and this analysis will be incorporated in a revised manuscript. However, to improve the power of the analysis, experiments with low numbers of embryos will be repeated.

      See redone graphs in Figure 5 and Supplementary Figures 2 and 3.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their manuscript entitled 'The domesticated transposon protein L1TD1 associates with its ancestor L1 ORF1p to promote LINE-1 retrotransposition', Kavaklıoğlu and colleagues delve into the role of L1TD1, an RNA binding protein (RBP) derived from a LINE1 transposon. L1TD1 proves crucial for maintaining pluripotency in embryonic stem cells and is linked to cancer progression in germ cell tumors, yet its precise molecular function remains elusive. Here, the authors uncover an intriguing interaction between L1TD1 and its ancestral LINE-1 retrotransposon.

      The authors delete the DNA methyltransferase DNMT1 in a haploid human cell line (HAP1), inducing widespread DNA hypo-methylation. This hypomethylation prompts abnormal expression of L1TD1. To scrutinize L1TD1's function in a DNMT1 knock-out setting, the authors create DNMT1/L1TD1 double knock-out cell lines (DKO). Curiously, while the loss of global DNA methylation doesn't impede proliferation, additional depletion of L1TD1 leads to DNA damage and apoptosis.

      To unravel the molecular mechanism underpinning L1TD1's protective role in the absence of DNA methylation, the authors dissect L1TD1 complexes in terms of protein and RNA composition. They unveil an association with the LINE-1 transposon protein L1-ORF1 and LINE-1 transcripts, among others.

      Surprisingly, the authors note fewer LINE-1 retro-transposition events in DKO cells than in DNMT1 KO alone.

      Strengths:

      The authors present compelling data suggesting the interplay of a transposon-derived human RNA binding protein with its ancestral transposable element. Their findings spur interesting questions for cancer types, where LINE1 and L1TD1 are aberrantly expressed.

      Weaknesses:

      Suggestions for refinement:

      The initial experiment, inducing global hypo-methylation by eliminating DNMT1 in HAP1 cells, is intriguing and warrants a more detailed description. How many genes experience misregulation or aberrant expression? What phenotypic changes occur in these cells?

      The transcriptome analysis of DNMT1 KO cells showed hundreds of deregulated genes upon DNMT1 ablation. As expected, the majority were up-regulated and gene ontology analysis revealed that among the strongest up-regulated genes were gene clusters with functions in “regulation of transcription from RNA polymerase II promoter” and “cell differentiation” and genes encoding proteins with KRAB domains. In addition, the de novo methyltransferases DNMT3A and DNMT3B were up-regulated in DNMT1 KO cells suggesting the set-up of compensatory mechanisms in these cells. We will include this data set in the revised version of the manuscript.

      Why did the authors focus on L1TD1? Providing some of this data would be helpful to understand the rationale behind the thorough analysis of L1TD1.

      We have previously discovered that conditional deletion of the maintenance DNA methyltransferase DNMT1 in the murine epidermis results not only in the up-regulation of mobile elements, such as IAPs, but also in the induced expression of L1TD1 (Beck et al, 2021; Suppl. Table 1 and Author response image 1). Similarly, L1TD1 expression was induced by treatment of primary human keratinocytes or squamous cell carcinoma cells with the DNMT inhibitor aza-deoxycytidine (Author response image 2 and 3). These findings are in accordance with the observation that inhibition of DNA methyltransferase activity by aza-deoxycytidine in human non-small cell lung cancer cells (NSCLCs) results in upregulation of L1TD1 (Altenberger et al, 2017). Our interest in L1TD1 was further fueled by reports on a potential function of L1TD1 as a prognostic tumor marker. We will include this information in the revised manuscript.

      Author response image 1.

      RT-qPCR of L1TD1 expression in cultured murine control and Dnmt1 Δ/Δker keratinocytes. mRNA levels of L1td1 were analyzed in keratinocytes isolated at P5 from conditional Dnmt1 knockout mice (Beck et al., 2021). Hprt expression was used for normalization of mRNA levels and wildtype control was set to 1. Data represent means ±s.d. with n=4. **P < 0.01 (paired t-test).

      Author response image 2.

      RT-qPCR analysis of L1TD1 expression in primary human keratinocytes. Cells were treated with 5-aza-2′-deoxycytidine for 24 hours or 48 hours, with PBS for 48 hours or were left untreated. 18S rRNA expression was used for normalization of mRNA levels and PBS control was set to 1. Data represent means ±s.d. with n=3. **P < 0.01 (paired t-test).

      Author response image 3.

      Induced L1TD1 expression upon DNMT inhibition in squamous cell carcinoma cell lines SCC9 and SCCO12. Cells were treated with 5-aza-2′-deoxycytidine for 24 hours, 48 hours or 6 days. (A) Western blot analysis of L1TD1 protein levels using beta-actin as loading control. (B) Indirect immunofluorescence microscopy analysis of L1TD1 expression in SCC9 cells. Nuclear DNA was stained with DAPI. Scale bar: 10 µm. (C) RT-qPCR analysis of L1TD1 expression in primary human keratinocytes. Cells were treated with 5-aza-2′-deoxycytidine for 24 hours or 48 hours, with PBS for 48 hours or were left untreated. 18S rRNA expression was used for normalization of mRNA levels and PBS control was set to 1. Data represent means ±s.d. with n=3. *P < 0.05, **P < 0.01 (paired t-test).

      The finding that L1TD1/DNMT1 DKO cells exhibit increased apoptosis and DNA damage but decreased L1 retro-transposition is unexpected. Considering the DNA damage associated with retro-transposition and the DNA damage and apoptosis observed in L1TD1/DNMT1 DKO cells, one would anticipate the opposite outcome. Could it be that the observation of fewer transposition-positive colonies stems from the demise of the most transposition-positive colonies? Further exploration of this phenomenon would be intriguing.

      This is an important point and we were aware of this potential problem. Therefore, we calibrated the retrotransposition assay by transfection with a blasticidin resistance gene vector to take into account potential differences in cell viability and blasticidin sensitivity. Thus, the observed reduction in L1 retrotransposition efficiency is not an indirect effect of reduced cell viability.

      Based on previous studies with hESCs, it is likely that, in addition to its role in retrotransposition, L1TD1 has additional functions in the regulation of cell proliferation and differentiation. L1TD1 might therefore attenuate the effect of DNMT1 loss in KO cells generating an intermediate phenotype (as pointed out by Reviewer 2) and simultaneous loss of both L1TD1 and DNMT1 results in more pronounced effects on cell viability.

      Reviewer #2 (Public Review):

      In this study, Kavaklıoğlu et al. investigated and presented evidence for the role of domesticated transposon protein L1TD1 in enabling its ancestral relative, L1 ORF1p, to retrotranspose in HAP1 human tumor cells. The authors provided insight into the molecular function of L1TD1 and shed some clarifying light on previous studies that showed somewhat contradictory outcomes surrounding L1TD1 expression. Here, L1TD1 expression was correlated with L1 activation in a hypomethylation-dependent manner, due to DNMT1 deletion in the HAP1 cell line. The authors then identified L1TD1-associated RNAs using RIP-Seq, which displays a disconnect between transcript and protein abundance (via Tandem Mass Tag multiplex mass spectrometry analysis). The one exception was for L1TD1 itself, which is consistent with a model in which the RNA transcripts associated with L1TD1 are not directly regulated at the translation level. Instead, the authors found the L1TD1 protein associated with L1-RNPs, and this interaction is associated with increased L1 retrotransposition, at least in the context of HAP1 cells. Overall, these results support a model in which L1TD1 is restrained by DNA methylation, but in the absence of this repressive mark, L1TD1 is expressed and collaborates with L1 ORF1p (either directly or through interaction with L1 RNA, which remains unclear based on current results), leading to enhanced L1 retrotransposition. These results establish the feasibility of this relationship existing in vivo in either development, disease, or both.

    1. Author Response

      We are grateful for the constructive comments of the reviewers and for the succinct assessment of our work by the editors. Here we provide a brief summary of our response to the major criticisms of our reviewers. We will give a detailed point-by-point response soon when we upload a revision of our paper.

      1) The MATLAB code for the spatial autocorrelation analysis is now freely available at the following site: https://github.com/dcsabaCD225/Moran_Matlab/blob/main/moran_local.m If any questions arise during its implementation, please contact Csaba Dávid (david.csaba@koki.hu).

      2) Concerning the computer resources and times required to perform Moran’s I image analysis, here we provide a brief description of the hardware and the calculations for images with different sizes.

      Hardware used for performing the analysis:

      Intel(R) Xeon(R) Silver 4112 CPU @ 2.60GHz, 2594 MHz, 4 cores, 64 GB RAM, NVIDIA GeForce GTX 1080 graphics card.

      MATLAB R2021b software was used for implementation.

      Computation times are shown in Author response table 1.

      Author response table 1.

      3) In response to the comment:

      “While the method's avoidance of AI training appeals to those lacking computational know-how and shows improved accuracy over basic threshold-based techniques, there are valid concerns regarding its performance in comparison to advanced methodologies”.

      Comparison of Moran’s I image analysis with AI-based segmentations raises conceptual problems, which will be addressed in detail in the revised version. Briefly, AI-based analyses assume that the ground truth is known; using a large training set, the AI learns to extract the information relevant for image segmentation. In several cases, however (such as protein distribution in the membrane), the ground truth is not known and cannot be easily determined by any single observer. Defining spatial inhomogeneities in protein distribution, and differentiating proteins involved vs not involved in clusters, is highly subjective. Indeed, our analysis showed that the 23 expert human observers varied hugely in establishing the boundaries of a protein cluster. As a consequence, establishing and using a training set would be highly contentious in these cases. In an average laboratory setting, generating a training set from hundreds of images examined by two dozen people would not be impossible, but it is not really practical. The beauty of Moran’s I analysis is that it is able to extract the relevant signals from images generated in different, often noisy conditions using a simple algorithm that allows quantitative characterization and identification of changes in many biological and non-biological samples.
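      To illustrate why no training set is needed, the following is a minimal sketch of a local Moran's I calculation on a small pixel grid. It is not the MATLAB implementation linked above: the function names, the 4-neighbour (rook) contiguity and the row-standardised weights are assumptions chosen for illustration, and the published code may use a different neighbourhood or weighting.

```python
def local_morans_i(image):
    """Return a grid of local Moran's I values for a 2D list of intensities.

    Assumption (illustrative only): rook (4-neighbour) contiguity with
    row-standardised weights, so each pixel is compared with the mean
    deviation of its direct neighbours.
    """
    rows, cols = len(image), len(image[0])
    n = rows * cols
    mean = sum(sum(row) for row in image) / n
    # Deviation of every pixel from the global mean intensity.
    dev = [[value - mean for value in row] for row in image]
    variance = sum(d * d for row in dev for d in row) / n
    if variance == 0:
        # A perfectly flat image carries no spatial structure.
        return [[0.0] * cols for _ in range(rows)]

    result = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            neighbours = [
                dev[a][b]
                for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                if 0 <= a < rows and 0 <= b < cols
            ]
            # Spatial lag: mean deviation of the neighbouring pixels.
            lag = sum(neighbours) / len(neighbours) if neighbours else 0.0
            out_row.append(dev[i][j] * lag / variance)
        result.append(out_row)
    return result


def global_morans_i(image):
    """Global Moran's I; with row-standardised weights it equals the mean local value."""
    local = local_morans_i(image)
    n = len(local) * len(local[0])
    return sum(sum(row) for row in local) / n
```

      Under these assumptions, positive local values mark pixels whose neighbours deviate from the mean in the same direction (clustered signal), and negative values mark pixels surrounded by opposite deviations. A checkerboard of alternating 0 and 1 pixels gives a global value of -1, whereas an image split into a dark half and a bright half gives a positive value.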

    1. Author response:

      We deeply appreciate the editors’ and reviewers’ invaluable time and effort. We would also like to extend our gratitude to eLife for its unwavering commitment to a transparent review and publication model. Below, we present our point-by-point responses to the comments.  

      Besides the WT allele, equivalent to the mouse TMEM173 gene, the human TMEM173 gene has two common alleles: the HAQ and AQ alleles, carried by billions of people. The main conclusions and interpretations, summarized in the Title and Abstract, are: (i) unlike the WT TMEM173 allele, the HAQ and AQ alleles are resistant to STING activation-induced cell death; (ii) STING residue 293 is critical for cell death; (iii) the HAQ and AQ alleles are dominant to the SAVI allele; (iv) one copy of the AQ allele rescues the SAVI disease in mice. We propose that STING research and STING-targeting immunotherapy should consider human TMEM173 heterogeneity. These interpretations and conclusions were based on data and logic. We welcome alternative, logical interpretations from our peers and potential collaborations to advance human TMEM173 research.  

      Reviewer #1 (Public Review):

      Responses to Comment 1: We greatly appreciate Reviewer 1's insights. We will change “lymphocytes” to “splenocytes” (line 134) as suggested. We respectfully disagree with Reviewer 1’s comments on TBK1 (lines 129 – 134). First, we used two different TBK1 inhibitors: BX795 and GSK8612. Second, because BX795 also inhibits PDK1, we also used a PDK1 inhibitor, GSK2334470. Third, both BX795 and GSK8612 completely inhibited diABZI-induced splenocyte cell death (Figure 1B). The logical conclusion is that “TBK1 activation is required for STING-mediated mouse spleen cell death ex vivo” (line 118). 

      This manuscript uncovers a significant aspect of the interplay between the common human TMEM173 alleles and the rare SAVI mutation (lines 23-26). Our discovery that the common human TMEM173 alleles are resistant to STING activation-induced cell death is a substantial finding. It further strengthens the argument that the HAQ and AQ alleles are functionally distinct from the WT allele 1-3. We wish to underscore the crucial message of this study, that 'STING research and STING-targeting immunotherapy should consider TMEM173 heterogeneity in humans' (line 37), which has been largely overlooked in current STING clinical trials 4.  

      Regarding STING cell death, as we stated in the Introduction (lines 62-79): (i) STING-mediated cell death is cell type-dependent 5-7 and type I IFNs-independent 5,7,8. (ii) The in vivo biological significance of STING-mediated cell death is not clear 7,8. (iii) The mechanisms of STING cell death remain controversial. Multiple cell death pathways, i.e., apoptosis, necroptosis, pyroptosis, ferroptosis, and PANoptosis, have been proposed 7,9,10. SAVI patients (WT/SAVI) and mouse models had CD4 T cellpenia 8,11. SAVI/HAQ and SAVI/AQ restored T cells in mice. Thus, the manuscript provides some answers to the biological significance of STING cell death. Next, splenocytes from Q293/Q293 mice are resistant to STING cell death. The logical conclusion is that amino acid 293 is critical for STING cell death. How aa293 mediates this function needs future investigation. Similarly, how TBK1 mediates STING cell death, independent of type I IFNs and NFκB induction, needs future investigation.

      Responses to Comment 2: These are all very interesting questions that we will address in future studies. This manuscript, titled “The common TMEM173 HAQ, AQ alleles rescue CD4 T cellpenia, restore T-regs, and prevent SAVI (N153S) inflammatory disease in mice”, does not focus on Q293 mice. We have been researching the common human TMEM173 alleles since 2011, from the discovery 12 to the mouse models 1,3, a human clinical trial 2, and human genetics studies 3. This manuscript is another step towards understanding these common human TMEM173 alleles, with the new discovery that HAQ and AQ are resistant to STING cell death. 

      Responses to Comment 3: We aim to address these worthy questions in future studies. In this manuscript, Figure 6 shows AQ/SAVI had more T-regs than HAQ/SAVI (lines 246 – 256). In our previous publication on HAQ, AQ knockin mice, we showed that AQ T-regs have more IL-10 and mitochondria activity than HAQ T-regs 3. We propose that increased IL-10+ Tregs in AQ mice may contribute to an improved phenotype in AQ/SAVI compared to HAQ/SAVI. However, we are not excluding other contributions (e.g. metabolic difference) by the AQ allele. We will explore these possibilities in future research.   

      Responses to Comment 4: Figure 2 is necessary because it reveals the difference between mouse and human STING cell death. Figure 2A-2B showed that STING activation killed human CD4 T cells, but not human CD8 T cells or B cells. This observation is different from Figure 1A, where STING activation killed mouse CD4, CD8 T cells, and CD19 B cells, revealing the species-specific STING cell death responses. Regarding human CD8 T cells, as we stated in the Discussion (lines 318-320), human CD8 T cells (PBMC) are not as susceptible as the CD4 T cells to STING-induced cell death 8. We used lung lymphocytes that showed similar observations (Figure 2A). For Figure 2C, we used 2 WT/HAQ and 3 WT/WT individuals (lines 738-739). We generated HAQ, AQ THP-1 cells in STING-KO THP-1 cells (Invivogen, cat no. thpd-kostg) (lines 740-741). 

      A recent study found that STING agonist SHR1032 induces cell death in STING-KO THP-1 cells expressing WT(R232) human STING 10 (line 182) independent of type I IFNs. SHR1032 suppressed THP1-STING-WT(R232) cell growth at GI50: 23 nM while in the parental THP1-STING-HAQ cells, the GI50 of SHR1032 was >103 nM 10. Cytarabine was used as an internal control where SHR1032 killed more robustly than cytarabine in the THP1-STING-WT(R232) cells but much less efficiently than cytarabine in the THP1-STING-HAQ cells 10.   

      This manuscript rigorously uses mouse splenocytes, human lung lymphocytes, THP-1 reconstituted with HAQ, AQ, and HAQ/SAVI, AQ/SAVI mice, to demonstrate that the common human HAQ, AQ alleles are resistant to STING cell death in vitro and in vivo.

      We agree with reviewer 1 that STING-mediated cell death mechanisms in myeloid and lymphoid cells may be different and likely contribute to the different mechanisms proposed in STING cell death research 7,9,10. Our study focuses on the in vivo mechanism of T cellpenia.  

      Responses to Comment 5: We stated in the Introduction that “AQ responds to CDNs and produce type I IFNs in vivo and in vitro 3,13,14” (lines 94, 95). We reported that the AQ knock-in mice responded to STING activation 3. We previously showed that there was negative natural selection on the AQ allele in individuals outside of Africa 3. 28% of Africans are WT/AQ but only 0.6% of East Asians are WT/AQ 3. Future research on the AQ allele will address this interesting question that may shed new mechanistic light on STING action.

      Responses to Comment 6: The comment here is similar to comment 3. In this manuscript, Figure 6 shows AQ/SAVI had more T-regs than HAQ/SAVI (lines 246 – 256). In our previous publication on HAQ, AQ knockin mice, we showed that AQ T-regs have more IL-10 and mitochondria activity than HAQ T-regs 3. We propose that increased IL-10+ Tregs in AQ mice may contribute to an improved phenotype in AQ/SAVI compared to HAQ/SAVI. However, we are not excluding other contributions (e.g. metabolic difference) by the AQ allele.

      Responses to Comment 7: Both radioresistant parenchymal and/or stromal cells and hematopoietic cells influence SAVI pathology in mice 15,16. Nevertheless, the lack of CD4 T cells, including the anti-inflammatory T-regs, likely contributes to the inflammation in SAVI mice and patients. We characterized lung function, lung inflammation (Figure 4), lung neutrophils, and inflammatory monocyte infiltration (Figure S4). 

      Responses to Comment 8: Several publications have linked STING to HIV pathogenesis 17-22 (line 271). The manuscript studies STING activation-induced cell death. It is not a stretch to ask, for example, whether preventing STING cell death, without affecting type I IFN production, could restore CD4 T cell counts and improve care for AIDS patients.

      Reviewer #2 (Public Review):

      Response to Comment 1: Please see the Figure below for cell death by diABZI, DMXAA in splenocytes from WT/WT, WT/HAQ, HAQ/SAVI, AQ/SAVI mice. The HAQ/SAVI and AQ/SAVI splenocytes showed similar partial resistance to STING activation-induced cell death. 

      Responses to Comment 2: We examined HAQ, AQ mouse splenocytes, HAQ human lung lymphocytes, THP-1 reconstituted with HAQ, AQ, and HAQ/SAVI, AQ/SAVI mice, to demonstrate that the common human HAQ, AQ alleles are resistant to STING cell death in vitro and in vivo. Additional human T cell line work would not add much. 

      Responses to Comment 3: This is possibly a misunderstanding. We use BMDM for the purpose of comparing STING signaling (TBK1, IRF3, NFκB, STING activation) by WT/SAVI, HAQ/SAVI, AQ/SAVI. Ideally, we would like to compare STING signaling in CD4 T cells from WT/SAVI to HAQ/SAVI, AQ/SAVI mice. However, WT/SAVI has no CD4 T cells. Here, we are making the assumption that the basic STING signaling (TBK1, IRF3, NFκB, STING activation) is conserved between T cells and macrophages. 

      Responses to Comment 4: Reviewer 2 suggests looking for evidence of inflammation and STING activation in the lungs of HAQ/SAVI, AQ/SAVI. We would like to elaborate further. First, anti-inflammatory treatments, e.g. steroids, DMARDs, IVIG, Etanercept, rituximab, Nifedipine, amlodipine, and others, all failed in SAVI patients 11. Second, Figure S4 examined lung neutrophils and inflammatory monocyte infiltration. Interestingly, while AQ/SAVI mice had better lung function than HAQ/SAVI mice (Figure 4D, 4E vs 4H, 4I), HAQ/SAVI and AQ/SAVI lungs had comparable neutrophil and inflammatory monocyte infiltration. Last, SAVI is classified as a type I interferonopathy 11, but the lung diseases of SAVI are mainly independent of type I IFNs 23-26. The AQ allele suppresses SAVI in vivo. Understanding the mechanisms by which AQ rescues SAVI can generate curative care for SAVI patients.  

      Author response image 1.

      (A-B). Flow cytometry of HAQ/SAVI, AQ/SAVI, WT/WT or WT/HAQ splenocytes treated with diABZI (100ng/ml) or DMXAA (20µg/ml) for 24hrs. Cell death was determined by PI staining. Data are representative of three independent experiments. Graphs represent the mean, with error bars indicating s.e.m. p values were determined by one-way ANOVA with Tukey’s multiple comparison test. * p<0.05. n.s: not significant.

      References.

      (1)             Patel, S. et al. The Common R71H-G230A-R293Q Human TMEM173 Is a Null Allele. J Immunol 198, 776-787 (2017). 

      (2)             Sebastian, M. et al. Obesity and STING1 genotype associate with 23-valent pneumococcal vaccination efficacy. JCI Insight 5 (2020). 

      (3)             Mansouri, S. et al. MPYS Modulates Fatty Acid Metabolism and Immune Tolerance at Homeostasis Independent of Type I IFNs. J Immunol 209, 2114-2132 (2022). 

      (4)             Sivick, K. E. et al. Comment on "The Common R71H-G230A-R293Q Human TMEM173 Is a Null Allele". J Immunol 198, 4183-4185 (2017). 

      (5)             Gulen, M. F. et al. Signalling strength determines proapoptotic functions of STING. Nat Commun 8, 427 (2017). 

      (6)             Kabelitz, D. et al. Signal strength of STING activation determines cytokine plasticity and cell death in human monocytes. Sci Rep 12, 17827 (2022). 

      (7)             Murthy, A. M. V., Robinson, N. & Kumar, S. Crosstalk between cGAS-STING signaling and cell death. Cell Death Differ 27, 2989-3003 (2020). 

      (8)             Kuhl, N. et al. STING agonism turns human T cells into interferon-producing cells but impedes their functionality. EMBO Rep 24, e55536 (2023). 

      (9)             Li, C., Liu, J., Hou, W., Kang, R. & Tang, D. STING1 Promotes Ferroptosis Through MFN1/2-Dependent Mitochondrial Fusion. Front Cell Dev Biol 9, 698679 (2021). 

      (10)         Song, C. et al. SHR1032, a novel STING agonist, stimulates anti-tumor immunity and directly induces AML apoptosis. Sci Rep 12, 8579 (2022). 

      (11)         Liu, Y. et al. Activated STING in a vascular and pulmonary syndrome. N Engl J Med 371, 507-518 (2014). 

      (12)         Jin, L. et al. Identification and characterization of a loss-of-function human MPYS variant. Genes Immun 12, 263-269 (2011). 

      (13)         Yi, G. et al. Single nucleotide polymorphisms of human STING can affect innate immune response to cyclic dinucleotides. PLoS One 8, e77846 (2013). 

      (14)         Patel, S. et al. Response to Comment on "The Common R71H-G230A-R293Q Human TMEM173 Is a Null Allele". J Immunol 198, 4185-4188 (2017). 

      (15)         Gao, K. M. et al. Endothelial cell expression of a STING gain-of-function mutation initiates pulmonary lymphocytic infiltration. Cell Rep 43, 114114 (2024). 

      (16)         Gao, K. M., Motwani, M., Tedder, T., Marshak-Rothstein, A. & Fitzgerald, K. A. Radioresistant cells initiate lymphocyte-dependent lung inflammation and IFNgamma-dependent mortality in STING gain-of-function mice. Proc Natl Acad Sci U S A 119, e2202327119 (2022). 

      (17)         Monroe, K. M. et al. IFI16 DNA sensor is required for death of lymphoid CD4 T cells abortively infected with HIV. Science 343, 428-432 (2014). 

      (18)         Doitsh, G. et al. Cell death by pyroptosis drives CD4 T-cell depletion in HIV-1 infection. Nature 505, 509-514 (2014). 

      (19)         Jakobsen, M. R., Olagnier, D. & Hiscott, J. Innate immune sensing of HIV-1 infection. Curr Opin HIV AIDS 10, 96-102 (2015). 

      (20)         Silvin, A. & Manel, N. Innate immune sensing of HIV infection. Curr Opin Immunol 32, 54-60 (2015). 

      (21)         Altfeld, M. & Gale, M., Jr. Innate immunity against HIV-1 infection. Nat Immunol 16, 554-562 (2015). 

      (22)         Krapp, C., Jonsson, K. & Jakobsen, M. R. STING dependent sensing - Does HIV actually care? Cytokine Growth Factor Rev 40, 68-76 (2018). 

      (23)         Luksch, H. et al. STING-associated lung disease in mice relies on T cells but not type I interferon. J Allergy Clin Immunol 144, 254-266 e258 (2019). 

      (24)         Stinson, W. A. et al. The IFN-gamma receptor promotes immune dysregulation and disease in STING gain-of-function mice. JCI Insight 7 (2022). 

      (25)         Warner, J. D. et al. STING-associated vasculopathy develops independently of IRF3 in mice. J Exp Med 214, 3279-3292 (2017). 

      (26)         Fremond, M. L. et al. Overview of STING-Associated Vasculopathy with Onset in Infancy (SAVI) Among 21 Patients. J Allergy Clin Immunol Pract 9, 803-818 e811 (2021).

    1. Author Response:

      Reviewer #1 (Public Review):

      Force sensing and gating mechanisms of the mechanically activated ion channels is an area of broad interest in the field of mechanotransduction. These channels perform important biological functions by converting mechanical force into electrical signals. To understand their underlying physiological processes, it is important to determine gating mechanisms, especially those mediated by lipids. The authors in this manuscript describe a mechanism for mechanically induced activation of TREK-1 (TWIK-related K+ channel). They propose that force-induced disruption of ganglioside (GM1) and cholesterol causes relocation of TREK-1 associated with phospholipase D2 (PLD2) to phosphatidylinositol 4,5-bisphosphate (PIP2) clusters, where PLD2 catalytic activity produces phosphatidic acid that can activate the channel. To test their hypothesis, they use dSTORM to measure TREK-1 and PLD2 colocalization with either GM1 or PIP2. They find that shear stress decreases TREK-1/PLD2 colocalization with GM1 and relocates the complex to cluster with PIP2. These movements are affected by TREK-1 C-terminal or PLD2 mutations, suggesting that the interaction is important for channel relocation. The authors then draw a correlation to cholesterol, suggesting that TREK-1 movement is cholesterol dependent. It is important to note that this is not the only method of channel activation and that one not involving PLD2 also exists. Overall, the authors conclude that force is sensed by ordered lipids and PLD2 associates with TREK-1 to selectively gate the channel. Although the proposed mechanism is solid, some concerns remain.

      1) Most conclusions in the paper heavily depend on the dSTORM data. But the images provided lack resolution. This makes it difficult for the readers to assess the representative images.

      The images provided are at 300 dpi. Perhaps the reviewer is referring to contrast in Figure 2? We are happy to increase the contrast or resolution.

      As a side note, we feel the main conclusion of the paper, mechanical activation of TREK-1 through PLD2, depended primarily on the electrophysiology in Figure 1b-c, not the dSTORM. But both complement each other.

      2) The experiments in Figure 6 are a bit puzzling. The entire premise of the paper is to establish gating mechanism of TREK-1 mediated by PLD2; however, the motivation behind using flies, which do not express TREK-1 is puzzling.

      The fly experiment shows that PLD mechanosensitivity is more evolutionarily conserved than TREK-1 mechanosensitivity. We should have made this clearer.

      -Figure 6B, the image is too blown out and looks over saturated. Unclear whether the resolution in subcellular localization is obvious or not.

      Figure 6B is a confocal image; it is not dSTORM. There is no dSTORM in Figure 6. This should have been made clear in the figure legend. For reference, only a few cells would fit in the field of view with dSTORM.

      -Figure 6C-D, the differences in activity threshold is 1 or less than 1g. Is this physiologically relevant? How does this compare to other conditions in flies that can affect mechanosensitivity, for example?

      Yes, 1g is physiologically relevant. It is almost the force needed to wake a fly from sleep (1.2-3.2g). See ref 33. Murphy Nature Pro. 2017.

      3) 70mOsm is a high degree of osmotic stress. How confident are the authors that a. cell health is maintained under this condition and b. this does indeed induce membrane stretch? For example, does this stimulation activate TREK-1?

      Yes, osmotic swelling activates TREK-1. This was shown in ref 19 (Patel et al 1998). We agree that 70 mOsm is a high degree of stress. This needs to be stated better in the paper.

      Reviewer #2 (Public Review):

      This manuscript by Petersen and colleagues investigates the mechanistic underpinnings of activation of the ion channel TREK-1 by mechanical inputs (fluid shear or membrane stretch) applied to cells. Using a combination of super-resolution microscopy, pair correlation analysis and electrophysiology, the authors show that the application of shear to a cell can lead to changes in the distribution of TREK-1 and the enzyme phospholipase D2 (PLD2), relative to lipid domains defined by either GM1 or PIP2. The activation of TREK-1 by mechanical stimuli was shown to be sensitized by the presence of PLD2, but not a catalytically dead xPLD2 mutant. In addition, the activity of PLD2 is increased when the molecule is more associated with PIP2, rather than GM1 defined lipid domains. The presented data do not exclude direct mechanical activation of TREK-1, rather suggest a modulation of TREK-1 activity, increasing sensitivity to mechanical inputs, through an inherent mechanosensitivity of PLD2 activity. The authors additionally claim that PLD2 can regulate transduction thresholds in vivo using Drosophila melanogaster behavioural assays. However, this section of the manuscript overstates the experimental findings, given that it is unclear how the disruption of PLD2 is leading to behavioural changes, given the lack of a TREK-1 homologue in this organism and the lack of supporting data on molecular function in the relevant cells.

      We agree, the downstream effectors of PLD2 mechanosensitivity are not known in the fly. Other anionic lipids have been shown to mediate pain (see refs 46 and 47). We do not wish to make any claim beyond PLD2 being an in vivo contributor to a fly’s response to mechanical force.

      That said, we do believe we have established a molecular function at the cellular level. We showed that PLD is robustly mechanically activated in a cultured fly cell line (BG2-c2; Figure 6a of the manuscript), and our previous publication established mechanosensation of PLD through mechanical disruption of the lipids (Petersen et al., Nature Comm. 2016). At a minimum, the experiments show that PLD's mechanosensitivity is evolutionarily better conserved across species than that of TREK-1.

      This work will be of interest to the growing community of scientists investigating the myriad mechanisms that can tune mechanical sensitivity of cells, providing valuable insight into the role of functional PLD2 in sensitizing TREK-1 activation in response to mechanical inputs, in some cellular systems.

      The authors convincingly demonstrate, post application of shear, an alteration in the distribution of TREK-1 and mPLD2 (in HEK293T cells) from being correlated with GM1 defined domains (no shear) to increased correlation with PIP2 defined membrane domains (post shear). These data were generated using super-resolution microscopy to visualise, at sub-diffraction resolution, the localisation of labelled protein, compared to labelled lipids. The use of super-resolution imaging enabled the authors to visualise changes in cluster association that would not have been achievable with diffraction-limited microscopy. However, the conclusion that this change in association reflects TREK-1 leaving one cluster and moving to another overinterprets these data, as the data were generated from static measurements of fixed cells, rather than dynamic measurements capturing molecular movements.

      When assessing molecular distribution of endogenous TREK-1 and PLD2, these molecules are described as "well correlated" in C2C12 cells; however, it is challenging to assess what "well correlated" means precisely in this context. This limitation is compounded by the conclusion that TREK-1 displayed little pair correlation with GM1 and the authors describe a "small amount of TREK-1 trafficked to PIP2". As such, these data may suggest that the findings outlined for HEK293T cells may be influenced by artefacts arising from overexpression.

      The changes in TREK-1 sensitivity to mechanical activation could also reflect changes in the amount of TREK-1 in the plasma membrane. The authors suggest that the presence of a leak current accounts for the presence of TREK-1 in the plasma membrane; however, they do not account for whether there are significant changes in the membrane localisation of the channel in the presence of mPLD2 versus xPLD2. The supplementary data provide some images of fluorescently labelled TREK-1 in cells, and the authors state that truncating the C-terminus has no effect on expression at the plasma membrane; however, these data provide inadequate support for this conclusion. In addition, the data reporting the P50 should be noted with caution, given the lack of saturation of the current in response to the stimulus range.

      We thank the reviewer for his/her concern about expression levels. We did test TREK-1 expression. mPLD decreases TREK-1 expression ~two-fold (see Author response image 1). We did not include the mPLD data since TREK-1 was mechanically activated with mPLD. For expression to account for the loss of TREK-1 stretch current (Figure 1b), xPLD would need to block surface expression of TREK-1. The opposite was true: xPLD2 increased TREK-1 expression (see Figure S2c). Furthermore, we tested the leak current of TREK-1 at 0 mV and 0 mmHg of stretch. Basal leak current was no different with xPLD2 compared to endogenous PLD (Figure 1d; red vs grey bars respectively), suggesting TREK-1 is in the membrane and active when xPLD2 is present. If anything, the magnitude of the effect with xPLD would be larger if the expression levels were equal.

      Author response image 1.

      TREK expression at the plasma membrane. TREK-1 fluorescence was measured by GFP at points along the plasma membrane. Overexpression of mouse PLD2 (mPLD) decreases the amount of full-length TREK-1 (FL TREK) on the surface more than 2-fold compared to endogenously expressed PLD (enPLD) or truncated TREK (TREKtrunc), which is missing the PLD binding site in the C-terminus. Overexpression of mPLD had no effect on TREKtrunc.

      Finally, by manipulating PLD2 in D. melanogaster, the authors show changes in behaviour when larvae are exposed to either mechanical or electrical inputs. The depletion of PLD2 is concluded to lead to a reduction in activation thresholds and to suggest an in vivo role for PA lipid signaling in setting thresholds for both mechanosensitivity and pain. However, while the data provided demonstrate convincing changes in behaviour and these changes could be explained by changes in transduction thresholds, these data only provide weak support for this specific conclusion. As the authors note, there is no TREK-1 in D. melanogaster, as such the reported findings could be accounted for by other explanations, not least including potential alterations in the activation threshold of Nav channels required for action potential generation. To conclude that the outcomes were in fact mediated by changes in mechanotransduction, the authors would need to demonstrate changes in receptor potential generation, rather than deriving conclusions from changes in behaviour that could arise from alterations in resting membrane potential, receptor potential generation or the activity of the voltage gated channels required for action potential generation.

      We are willing to restrict the conclusion about the fly behavior as the reviewers see fit. We have shown PLD is mechanosensitive in a fly cell line, and when we knock out PLD from a fly, the animal exhibits a mechanosensation phenotype.

      This work provides further evidence of the astounding flexibility of mechanical sensing in cells. By outlining how mechanical activation of TREK-1 can be sensitised by mechanical regulation of PLD2 activity, the authors highlight a mechanism by which TREK-1 sensitivity could be regulated under distinct physiological conditions.

      Reviewer #3 (Public Review):

      The manuscript "Mechanical activation of TWIK-related potassium channel by nanoscopic movement and second messenger signaling" presents a new mechanism for the activation of TREK-1 channel. The mechanism suggests that TREK1 is activated by phosphatidic acids that are produced via a mechanosensitive motion of PLD2 to PIP2-enriched domains. Overall, I found the topic interesting, but several typos and unclarities reduced the readability of the manuscript. Additionally, I have several major concerns on the interpretation of the results. Therefore, the proposed mechanism is not fully supported by the presented data. Lastly, the mechanism is based on several previous studies from the Hansen lab, however, the novelty of the current manuscript is not clearly stated. For example, in the 2nd result section, the authors stated, "fluid shear causes PLD2 to move from cholesterol dependent GM1 clusters to PIP2 clusters and this activated the enzyme". However, this is also presented as a new finding in section 3 "Mechanism of PLD2 activation by shear."

      For PLD2 dependent TREK-1 activation. Overall, I found the results compelling. However, two key results are missing. 1. Do HEK cells have endogenous PLD2? If so, it's hard to claim that the authors can measure PLD2-independent TREK1 activation.

      Yes, there is endogenous PLD (enPLD). We calculated the relative expression of xPLD2 vs enPLD: xPLD2 is >10x more abundant (Fig. S3d of Pavel et al PNAS 2020, ref 14 of the current manuscript). Hence, as with anesthetic sensitivity, we expect the xPLD to outcompete the endogenous PLD, which is what we see. We should have described this more carefully in the paper and pointed to the studies that establish this conclusion.

      2. Does the plasma membrane trafficking of TREK1 remain the same under different conditions (PLD2 overexpression, truncation)? From Figure S2, the truncated TREK1 seems to have very poor trafficking. The change of trafficking could significantly contribute to the interpretation of the data in Figure 1.

      Yes, if the PLD2 binding site is removed (TREK-1trunc), trafficking to the plasma membrane is unaffected by the expression of xPLD and mPLD (Author response image 1 above). For full-length TREK-1 (FL-TREK-1), co-expression of mPLD decreases TREK expression (Author response image 1) and co-expression with xPLD increases TREK expression (Figure S2). This is exactly the opposite of what one would expect if surface expression accounted for the change in pressure currents. Hence, we conclude that surface expression does not account for the loss of TREK-1 mechanosensitivity with xPLD2.

      For shear-induced movement of TREK1 between nanodomains. The section is convincing, however I'm not an expert on super-resolution imaging. Also, it would be helpful to clarify whether the shear stress was maintained during fixation. If not, what is the time gap between reduced shear and the fixed state? Lastly, it's unclear why shear flow changes the level of TREK1 and PIP2.

      Shear was maintained during fixation. We do not know why shear changes PIP2 and TREK-1 levels. Presumably endocytosis and/or release of other lipid-modifying enzymes affect the system. The change in TREK-1 levels appears to occur directly through an interaction with PLD, as TREKtrunc is not affected by overexpression of xPLD or mPLD.

      For the mechanism of PLD2 activation by shear. I found this section not convincing. Therefore, the question of how PLD2 senses mechanical force on the membrane is not fully addressed. Particularly, it's hard to imagine an acute 25% decrease in cholesterol level by shear - where did the cholesterol go? Details on the measurements of free cholesterol level are unclear and additional/alternative experiments are needed to prove the reduction in cholesterol by shear.

      The question “how does PLD2 sense mechanical force on the membrane” was addressed in our paper published in Nature Comm. in 2016, titled “Kinetic disruption of lipid rafts is a mechanosensor for phospholipase D” (see ref 13, Petersen et al.). PLD is a soluble protein associated with the membrane through palmitoylation. There is no transmembrane domain, which narrows the possible mechanism of its mechanosensation to disruption.

      The Nature Comm. reviewer identified as “an expert in PLD signaling” wrote the following of our data and the proposed mechanism:

      "This is a provocative report that identifies several unique properties of phospholipase D2 (PLD2). It explains in a novel way some long established observations including that the enzyme is largely regulated by substrate presentation which fits nicely with the authors model of segregation of the two lipid raft domains (cholesterol ordered vs PIP2 containing). Although PLD has previously been reported to be involved in mechanosensory transduction processes (as cited by the authors) this is the first such report associating the enzyme with this type of signaling... It presents a novel model that is internally consistent with previous literature as well as the data shown in this manuscript. It suggests a new role for PLD2 as a force transduction tied to the physical structure of lipid rafts and uses parallel methods of disruption to test the predictions of their model."

      Regarding cholesterol, we use a fluorescent cholesterol oxidase assay, which we described in the methods. This is an appropriate assay for determining cholesterol levels in a cell, and we use it routinely. We have published in multiple journals using this method (see references 28, 30, 31). Working out the metabolic fate of cholesterol after shear is indeed interesting but well beyond the scope of this paper. Furthermore, we indirectly confirmed our finding using dSTORM cluster analysis (Figure 3d-e). The cluster analysis shows a decrease in GM1 cluster size consistent with our previous experiments where we chemically depleted cholesterol and saw a similar decrease in cluster size (see ref 13). All the data are internally consistent, and the cholesterol assay is properly done. We see no reason to reject the data.

      Importantly, there is no direct evidence for "shear thinning" of the membrane and the authors should avoid claiming shear thinning in the abstract and summary of the manuscript.

      We previously established a kinetic model for PLD2 activation (see ref 13, Petersen et al., Nature Comm. 2016). In that publication we discussed both entropy and heat as mechanisms of disruption. Here we controlled for heat, which narrowed that model to entropy (i.e., shear thinning) (see Figure 3c). We provide an overall justification below. But this is a small refinement of our previous paper, and we prefer not to complicate the current paper. We believe the proper rheological term is shear thinning. The following justification, which is largely adapted from ref 13, could be added to the supplement if the reviewer wishes.

      Justification: To establish shear thinning in a biological membrane, we initially used a soluble enzyme that has no transmembrane domain, phospholipase D2 (PLD2). PLD2 is a soluble enzyme and is associated with the membrane by palmitate, a saturated 16-carbon lipid attached to the enzyme. In the absence of a transmembrane domain, mechanisms of mechanosensation involving hydrophobic mismatch, tension, midplane bending, and curvature can largely be excluded. Rather, the mechanism appears to be a change in fluidity (i.e., kinetic in nature). GM1 domains are ordered, and the palmitate forms van der Waals bonds with the GM1 lipids. The bonds must be broken for PLD to no longer associate with GM1 lipids. We established this in our 2016 paper, ref 13. In that paper we called it a kinetic effect; however, we did not experimentally distinguish enthalpy (heat) vs. entropy (order). Heat is Newtonian and entropy (i.e., shear thinning) is non-Newtonian. In the current study we paid closer attention to the heat and ruled it out (see Figure 3c and methods). We could propose a mechanism based on kinetic disruption, but we know the disruption is not due to melting of the lipids (enthalpy), which leaves shear thinning (entropy) as the plausible mechanism.

      The authors should also be aware that hypotonic shock is a very dirty assay for stretching the cell membrane. Often, there is only a transient increase in membrane tension, accompanied by many biochemical changes in the cells (including acidification, changes of concentration etc). Therefore, I would not consider this as definitive proof that PLD2 can be activated by stretching membrane.

      Comment noted. We trust the reviewer is correct. In 1998 osmotic shock was used to activate the channel. We only intended to show that the system is consistent with previous electrophysiologic experiments.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      For many years, there has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, using state-of-the-art imaging techniques, they examined spike synchrony of hippocampal cells during locomotion and immobility states. In contrast to conventional understanding of the hippocampus, the authors demonstrated that hippocampal place cells exhibit prominent synchronous spikes locked to theta oscillations.

      Strengths:

      The voltage imaging used in this study is a highly novel method that allows recording not only suprathreshold-level spikes but also subthreshold-level activity. With its high frame rate, it offers time resolution comparable to electrophysiological recordings. Moreover, it enables the visualization of actual cell locations, allowing for the examination of spatial properties (e.g., Figure 4G).

      We thank the reviewer for pointing out the technical novelty of this work.

      Weaknesses:

      There is a notable deviation from several observations obtained through conventional electrophysiological recordings. Particularly, as mentioned below in detail, the considerable differences in baseline firing rates and no observations of ripple-triggered firing patterns raise some concerns about potential artifacts from imaging and analysis, such as cell toxicity, abnormal excitability, and false detection of spikes. While these findings are intriguing if the validity of these methods is properly proven, accepting the current results as new insights is challenging.

      We appreciate the reviewer’s insightful comments regarding the intriguing aspect of our findings. Indeed, the emergence of a novel form of CA1 population synchrony presents exciting implications for hippocampal memory research and beyond.

      While we acknowledge the deviations from conventional electrophysiological recordings, we respectfully contend that these differences do not necessarily imply methodological flaws. All experiments and analyses were conducted with meticulous adherence to established standards in the field.

      Regarding the observed variations in average firing rates, it is important to note the well-documented heterogeneity in CA1 pyramidal neuron firing rates, spanning from 0.01 to 10 Hz, with a skewed distribution toward lower frequencies (Mizuseki et al., 2013). Our exclusion criteria for neurons with low estimated firing rates may have inadvertently biased the selection towards more active neurons. Moreover, prior research has indicated that average firing rates tend to increase during exposure to novel environments (Karlsson et al., 2008), and among deep-layer CA1 pyramidal neurons (Mizuseki et al., 2011). Given our recording setup in a highly novel environment and the predominance of deep CA1 pyramidal neurons in our sample, the observed higher average firing rates could be influenced by these factors. Considering these points, our mean firing rates (3.2 Hz) are reasonable estimates compared to previously reported values obtained from electrophysiological recordings (2.1 Hz in McHugh et al., 1996 and 2.4-2.6 Hz in Buzsaki et al., 2003).

      Regarding concerns about potential cell toxicity, previous studies have shown that Voltron expression and illumination do not significantly alter membrane resistance, membrane capacitance, resting membrane potentials, spike amplitudes, or spike widths (see Abdelfattah 2019, Science, Supplementary Figures 11 and 12). In our recordings, imaged neurons exhibit preserved membrane and dendritic morphology during and after experiments (Author response image 1), supporting the absence of significant toxicity.

      Author response image 1.

      Voltron-expressing neurons exhibit preserved membrane and dendritic morphology. (A) Images of two-photon z-stack maximum intensity projection showing Voltron-expressing neurons taken after voltage image experiments in vivo. (B) Post-hoc histological images of neurons being voltage-imaged.

      Regarding spike detection, we use validated algorithms (Abdelfattah et al., 2019 and 2023) to ensure robust and reliable detection of spikes. Spiking activity was first separated from slower subthreshold potentials using high-pass filtering. This way, a slow fluorescence increase will not be detected as a spike, even if its amplitude is large. We benchmarked the detection algorithm in computer simulations. The sensitivity and specificity of the algorithm exceed 98% at the signal-to-noise ratio of our recordings. While we acknowledge that a small number of spikes, particularly those occurring later in a burst, might be missed due to their smaller amplitudes (as illustrated in Figures 1 and 2 of the manuscript), we anticipate that any missed spikes would lead to a decrease rather than an increase in synchrony between neurons. Overall, we are confident that spike detection is performed in a rigorous and robust manner.
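
      The idea of separating spikes from slow subthreshold signals before thresholding can be sketched as follows. This is only an illustration and not the validated algorithm cited above: the moving-average high-pass filter, the window length, the MAD-based noise estimate, and the threshold factor `k` are all assumptions chosen for the example.

```python
def high_pass(trace, window):
    """Subtract a centered moving average to remove slow subthreshold drift."""
    n = len(trace)
    half = window // 2
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        baseline = sum(trace[lo:hi]) / (hi - lo)
        out.append(trace[i] - baseline)
    return out


def detect_spikes(trace, window=21, k=5.0):
    """Return indices of local maxima that exceed k robust standard deviations.

    `window` (samples) and `k` are hypothetical values for illustration only.
    """
    hp = high_pass(trace, window)
    med = sorted(hp)[len(hp) // 2]
    mad = sorted(abs(x - med) for x in hp)[len(hp) // 2]
    sigma = 1.4826 * mad  # MAD-based noise estimate (assumes roughly Gaussian noise)
    threshold = med + k * sigma
    return [
        i
        for i in range(1, len(hp) - 1)
        if hp[i] > threshold and hp[i] >= hp[i - 1] and hp[i] > hp[i + 1]
    ]
```

      Because the threshold is applied to the filtered trace, a slow deflection that is larger than a spike in the raw signal contributes almost nothing after filtering and is not counted.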

      To further strengthen these points, we will include the following in the revision:

      (1) Histological images of recorded neurons during and after experiments.

      (2) Further details regarding the validation of spike detection algorithms.

      (3) Analysis of publicly available electrophysiological datasets.

      (4) Discussion regarding the reasons behind the novelty of some of our findings compared to previous observations.

      In conclusion, we assert that our experimental and analysis approach upholds rigorous standards. We remain committed to reconciling our findings with previous observations and welcome further scrutiny and engagement from the scientific community to explore the intriguing implications of our findings.

      Reviewer #2 (Public Review):

      Summary:

      This study employed voltage imaging in the CA1 region of the mouse hippocampus during the exploration of a novel environment. The authors report synchronous activity, involving almost half of the imaged neurons, occurred during periods of immobility. These events did not correlate with SWRs, but instead, occurred during theta oscillations and were phase-locked to the trough of theta. Moreover, pairs of neurons with high synchronization tended to display non-overlapping place fields, leading the authors to suggest these events may play a role in binding a distributed representation of the context.

      We thank the reviewer for a thorough and thoughtful review of our paper.

      Strengths:

      Technically this is an impressive study, using an emerging approach that allows single-cell resolution voltage imaging in animals, that while head-fixed, can move through a real environment. The paper is written clearly and suggests novel observations about population-level activity in CA1.

      We thank the reviewer for pointing out the technical strength and the novelty of our observations.

      Weaknesses:

      The evidence provided is weak, with the authors making surprising population-level claims based on a very sparse data set (5 data sets, each with less than 20 neurons simultaneously recorded) acquired with exciting, but less tested technology. Further, while the authors link these observations to the novelty of the context, both in the title and text, they do not include data from subsequent visits to support this. Detailed comments are below:

      We understand the reviewer’s concerns regarding the size of the dataset. Despite this limitation, it is important to note that synchronous ensembles beyond what could be expected from chance (jittering) were detected in all examined data. In the revision, we plan to add more data, including data from subsequent visits, to further strengthen our findings.

      (1) My first question for the authors, which is not addressed in the discussion, is why these events have not been observed in the countless extracellular recording experiments conducted in rodent CA1 during the exploration of novel environments. Those data sets often have 10x the neurons simultaneously recorded compared to the present data, thus the highly synchronous firing should be very hard to miss. Ideally, the authors could confirm their claims via the analysis of publicly available electrophysiology data sets. Further, the claim of high extra-SWR synchrony is complicated by the observation that their recorded neurons fail to spike during the limited number of SWRs recorded during behavior, again not agreeing with much of the previous electrophysiological recordings.

      We understand the reviewer’s concern. We will examine publicly available electrophysiology datasets to gain further insights into any similarities and differences to our findings. Based on these results, we will discuss why these events have not been previously observed/reported.

      (2) The authors posit that these events are linked to the novelty of the context, both in the text, as well as in the title and abstract. However, they do not include any imaging data from subsequent days to demonstrate the failure to see this synchrony in a familiar environment. If these data are available it would strengthen the proposed link to novelty if they were included.

      We thank the reviewer for the constructive suggestion. We will acquire more datasets from subsequent visits to gain further insights into these synchronous events.

      (3) In the discussion the authors begin by speculating the theta present during these synchronous events may be slower type II or attentional theta. This can be supported by demonstrating a frequency shift in the theta recording during these events/immobility versus the theta recording during movement.

      We thank the reviewer for the constructive suggestion. We did demonstrate a shift to a lower frequency in the synchrony-associated theta during immobility compared with locomotion (see Fig. 4B, the red vs. blue curves). We will enlarge this panel and specifically refer to it in the corresponding discussion paragraph.

      (4) The authors mention in the discussion that they image deep-layer PCs in CA1, however, this is not mentioned in the text or methods. They should include data, such as imaging of a slice of a brain post-recording with immunohistochemistry for a layer-specific gene to support this.

      We thank the reviewer for the constructive suggestion. We do have images of brain slices post-recordings (Author response image 2). Imaged neurons are clearly located in the deep CA1 pyramidal layer. We will add these images and quantification in the revised manuscript.

      Author response image 2.

      Imaged neurons are located in the deep pyramidal layer of the dorsal hippocampal CA1 region.

      Reviewer #3 (Public Review):

      Summary:

      In the present manuscript, the authors use a few minutes of voltage imaging of CA1 pyramidal cells in head-fixed mice running on a track while local field potentials (LFPs) are recorded. The authors suggest that synchronous ensembles of neurons are differentially associated with different types of LFP patterns, theta and ripples. The experiments are flawed in that the LFP is not "local" but rather collected in the other side of the brain, and the investigation is flawed due to multiple problems with the point process analyses. The synchrony terminology refers to dozens of milliseconds as opposed to the millisecond timescale referred to in prior work, and the interpretations do not take into account theta phase locking as a simple alternative explanation.

      We genuinely appreciate the reviewer’s feedback and acknowledge the concerns raised. However, we believe these concerns can be effectively addressed without undermining the validity of our conclusions. With this in mind, we respectfully disagree with the assessment that our experiments and investigation are flawed. Please allow us to address these concerns and offer additional context to support the validity of our study.

      Weaknesses:

      The two main messages of the manuscript indicated in the title are not supported by the data. The title gives two messages that relate to CA1 pyramidal neurons in behaving head-fixed mice: (1) synchronous ensembles are associated with theta (2) synchronous ensembles are not associated with ripples.

      There are two main methodological problems with the work:

      (1) Experimentally, the theta and ripple signals were recorded using electrophysiology from the opposite hemisphere to the one in which the spiking was monitored. However, both signals exhibit profound differences as a function of location: theta phase changes with the precise location along the proximo-distal and dorso-ventral axes, and importantly, even reverses with depth. And ripples are often a local phenomenon - independent ripples occur within a fraction of a millimeter within the same hemisphere, let alone different hemispheres. Ripples are very sensitive to the precise depth - 100 micrometers up or down, and only a positive deflection/sharp wave is evident.

      We appreciate the reviewer’s consideration regarding the collection of LFP from the contralateral hemisphere. While we acknowledge the limitation of this design, we believe that our findings still offer valuable insights into the dynamics of synchronous ensembles. Despite potential variations in theta phases with recording locations and depth, we find that the occurrence and amplitudes of theta oscillations are generally coordinated across hemispheres (Buzsaki et al., Neurosci., 2003). Therefore, the presence of prominent contralateral LFP theta around the times of synchronous ensembles in our study (see Figure 4A of the manuscript) strongly supports our conclusion regarding their association with theta oscillations, despite the collection of LFP from the opposite hemisphere.

      In addition, in our manuscript, we specifically mentioned that the “preferred phases” varied from session to session, likely due to the variability of recording locations (see Lines 254-256). Therefore, we think that the reviewer’s concern regarding theta phase variability has already been addressed in the present manuscript.

      Regarding ripple oscillations, while we recognize that they can sometimes occur locally, the majority of ripples occur synchronously in both hemispheres (up to 70%, see Szabo et al., Neuron, 2022; Buzsaki et al., Neurosci., 2003). Therefore, using contralateral LFP to infer ripple occurrence on the ipsilateral side has been a common practice in the field, employed by many studies published in respectable journals (Szabo et al., Neuron, 2022; Terada et al., Nature, 2021; Dudok et al., Neuron, 2021; Geiller et al., Neuron, 2020). Furthermore, we observed that the 446 synchronous ensembles during immobility do not co-occur with contralateral ripples, and the remaining 313 ensembles during locomotion are not associated with ripples, as ripples rarely occur during locomotion. Therefore, our conclusion that synchronous ensembles are not associated with ripple oscillations is supported by the data.

      (2) The analysis of the point process data (spike trains) is entirely flawed. There are many technical issues: complex spikes ("bursts") are not accounted for; differences in spike counts between the various conditions ("locomotion" and "immobility") are not accounted for; the pooling of multiple CCGs assumes independence, whereas even conditional independence cannot be assumed; etc.

      We acknowledge the reviewer’s concern regarding spike train analysis. Indeed, complex bursts or different behavioral conditions can lead to differences in spike counts that could potentially affect the detection of synchronous ensembles. However, our jittering procedure (see Lines 121-132) is designed to control for the variation of spike counts. Importantly, while the jittered spike trains contain the same spike count variations, we found 7.8-fold more synchronous events in our data compared to jitter controls (see Figure 1G of the manuscript), indicating that these factors cannot account for the observed synchrony.
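
      The logic of a jitter control can be sketched as follows. This is a minimal illustration and not the procedure described in the manuscript: the bin width, the minimum number of co-active cells, the jitter range, and the number of surrogates are hypothetical values, and all function names are introduced here for the example only.

```python
import random


def count_sync_events(spike_trains, bin_ms=25.0, min_cells=3):
    """Count time bins in which at least `min_cells` distinct cells fire."""
    cells_per_bin = {}
    for cell, spikes in enumerate(spike_trains):
        for t in spikes:
            cells_per_bin.setdefault(int(t // bin_ms), set()).add(cell)
    return sum(1 for cells in cells_per_bin.values() if len(cells) >= min_cells)


def jitter(spike_trains, jitter_ms, rng):
    """Shift every spike by an independent uniform offset in [-jitter_ms, +jitter_ms].

    Spike counts per cell are preserved; fine temporal structure is destroyed.
    """
    return [
        sorted(t + rng.uniform(-jitter_ms, jitter_ms) for t in spikes)
        for spikes in spike_trains
    ]


def jitter_control(spike_trains, jitter_ms=75.0, n_surrogates=200, seed=0, **kw):
    """Return (observed event count, mean event count over jittered surrogates)."""
    rng = random.Random(seed)
    observed = count_sync_events(spike_trains, **kw)
    surrogate = [
        count_sync_events(jitter(spike_trains, jitter_ms, rng), **kw)
        for _ in range(n_surrogates)
    ]
    return observed, sum(surrogate) / n_surrogates
```

      Since each surrogate keeps every cell's spike count, an excess of observed events over the surrogate mean cannot be attributed to differences in spike counts alone.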

      To explicitly demonstrate that complex bursts cannot account for the observed synchrony, we have performed an additional analysis in which we removed all later spikes in bursts and counted only single spikes and the first spikes of bursts. Importantly, we found that this procedure did not change the rate and size of synchronous ensembles, nor did it significantly alter the grand-average CCG (see Author response image 3). The results of this analysis explicitly rule out a significant effect of complex spikes on the analysis of synchronous ensembles.
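
      Keeping only single spikes and the first spike of each burst can be illustrated with the sketch below. The burst criterion used here (an inter-spike interval of at most 10 ms) is an assumption made for the example; the criterion actually used in the analysis is not stated in this response.

```python
def first_spikes_only(spike_times_ms, max_isi_ms=10.0):
    """Keep isolated spikes and the first spike of each burst.

    A spike is treated as a later burst spike if it follows the previous
    spike by no more than `max_isi_ms` (a hypothetical burst criterion).
    """
    kept = []
    prev = None
    for t in sorted(spike_times_ms):
        if prev is None or t - prev > max_isi_ms:
            kept.append(t)
        prev = t  # compare against the previous spike, kept or not
    return kept
```

      For example, `first_spikes_only([0, 5, 9, 100, 250, 256, 400])` returns `[0, 100, 250, 400]`.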

      Author response image 3.

      Population synchrony remains after the removal of later spikes in bursts. (A) The grand-average cross correlogram (CCG) was calculated using spike trains without later spikes in bursts. The gray line represents the mean grand-average CCG between reference cells and randomly selected cells from different sessions. (B) Pairwise comparison of the event rates of population synchrony between spike trains containing all spikes and spike trains without later spikes in bursts. Bar heights indicate group means (n=10 segments, p=0.036, Wilcoxon signed-rank test). (C) Histogram of the ensemble sizes as percentages of cells participating in the synchronous ensembles.

      Beyond those methodological issues, there are two main interpretational problems: (1) the "synchronous ensembles" may be completely consistent with phase locking to the intracellular theta (as even shown by the authors themselves in some of the supplementary figures).

      We agree with the reviewer that the synchronous ensembles are indeed consistent with theta phase locking. However, it is important to note that theta phase locking alone does not necessarily imply population synchrony. In fact, theta phase locking has been shown to “reduce” population synchrony in a previous study (Mizuseki et al., 2014, Phil. Trans. R. Soc. B.). Thus, the presence of theta phase locking cannot be taken as a simple alternative explanation of the synchronous ensembles.

      To directly assess the contribution of theta phase locking to synchronous ensembles, we have performed a new analysis to randomize the specific theta cycles in which neurons spike, while keeping the spike phases constant. This manipulation disrupts spike co-occurrence while preserving theta phase locking, allowing us to test whether theta phase locking alone can explain the population synchrony, or whether spike co-occurrence in specific cycles is required. The grand-average CCG shows a much smaller peak compared to the original peak (Author response image 4A). Moreover, synchronous event rates show a 4.5-fold decrease in the randomized data compared to the original event rates (Author response image 4B). Thus, the new analysis reveals that theta phase locking alone cannot account for the population synchrony.
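
      A phase-preserving cycle shuffle of this kind can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code used for the analysis: theta cycles are assumed to be given as a sorted list of boundary times (`cycle_edges`), phase is taken as the fractional position within a cycle, and each spike is reassigned independently to a uniformly chosen cycle (which may occasionally be its original one).

```python
import bisect
import random


def shuffle_theta_cycles(spike_times, cycle_edges, rng):
    """Move each spike to a randomly chosen theta cycle, keeping its phase.

    `cycle_edges` is a sorted list of cycle boundary times; cycle c spans
    [cycle_edges[c], cycle_edges[c + 1]). Spikes outside these cycles are dropped.
    """
    n_cycles = len(cycle_edges) - 1
    shuffled = []
    for t in spike_times:
        c = bisect.bisect_right(cycle_edges, t) - 1
        if c < 0 or c >= n_cycles:
            continue  # spike falls outside the annotated theta cycles
        phase = (t - cycle_edges[c]) / (cycle_edges[c + 1] - cycle_edges[c])
        new_c = rng.randrange(n_cycles)
        start, end = cycle_edges[new_c], cycle_edges[new_c + 1]
        shuffled.append(start + phase * (end - start))
    return sorted(shuffled)
```

      Applying the function separately to each neuron leaves every neuron's phase distribution unchanged while breaking the cycle-by-cycle co-occurrence of spikes across neurons.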

      Author response image 4.

      Drastic reduction of population synchrony by randomizing spikes to other theta cycles while preserving the phases. (A) The grand-average cross correlogram (CCG) was calculated using original spike trains (black) and randomized spike trains where theta phases of the spikes are kept the same but spike timings were randomly moved to other theta cycles (red). (B) Pairwise comparison of the event rates of population synchrony between the original spike trains and randomized spike trains (n=10 segments, p=0.002, Wilcoxon signed-rank test). Bar heights indicate group means. ** p<0.01

      (2) The definition of "synchrony" in the present work is very loose and refers to timescales of 20-30 ms. In previous literature that relates to synchrony of point processes, the timescales discussed are 1-2 ms, and longer timescales are referred to as the "baseline" which is actually removed (using smoothing, jittering, etc.).

      Regarding the timescale of synchronous ensembles, we acknowledge that it varies considerably across studies and cell types. However, it is important to note that a timescale of dozens, or even hundreds of milliseconds is common for synchrony terminology in CA1 pyramidal neurons (see Csicsvari et al., Neuron, 2000; Harris et al., Science, 2003; Malvache et al., Science, 2016; Yagi et al., Cell Reports, 2023). In fact, a timescale of 20-30 ms is considered particularly important for information transmission and storage in CA1, as it matches the membrane time constant of pyramidal neurons, the period of hippocampal gamma oscillations, and the time window for synaptic plasticity. Therefore, we believe that this timescale is relevant and in line with established practices in the field.