  1. Jan 2026
    1. eLife Assessment

      The current work uses DNA-tethered motor trapping to reduce vertical forces and improve datasets for kinesin-1 motility under load. The evidence is compelling and the significance is important to the kinesin field. Kinesin-1 is more robust and less prone to premature detachment than previously indicated. This represents a significant advancement in the field and is generally applicable to work with optical tweezers.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Hensley and Yildiz studies the mechanical behavior of kinesin under conditions where the z-component of the applied force is minimized. This is accomplished by tethering the kinesin to the trapped bead with a long double-stranded DNA segment as opposed to directly binding the kinesin to the large bead. It complements several recent studies that have used different approaches to looking at the mechanical properties of kinesin under low z-force loads. The study shows that much of the mechanical information gleaned from the traditional "one bead" with attached kinesin approach was probably profoundly influenced by the direction of the applied force. The authors speculate that when moving small vesicle cargos (particularly membrane bound ones) the direction of resisting force on the motor has much less of a z-component than might be experienced if the motor were moving large organelles like mitochondria.

      Strengths:

      The approach is sound and provides an alternative method to examine the mechanics of kinesin under conditions where the z-component of the force is lessened. The data show that kinesin has very different mechanical properties compared to those extensively reported using the "single-bead" assay where the molecule is directly coupled to a large bead which is then trapped.

      Weaknesses:

      The substoichiometric binding of kinesins to the multivalent DNA complicates the interpretation of the data.

      Comments on revisions:

      The authors have addressed my concerns.

    3. Reviewer #2 (Public review):

      This short report by Hensley and Yildiz explores kinesin-1 motility under more physiological load geometries than previous studies. Large Z-direction (or radial) forces are a consequence of certain optical trap experimental geometries, and likely do not occur in the cell. Use of a long DNA tether between the motor and the bead can alleviate Z-component forces. The authors perform three experiments. In the first, they use two assay geometries - one with kinesin attached directly to a bead and the other with kinesin attached via a 2 kbp DNA tether - with a constant-position trap to determine that reducing the Z component of force leads to a difference in stall time but not stall force. In the second, they use the same two assay geometries with a constant-force trap to replicate the asymmetric slip bond of kinesin-1; reducing the Z component of force leads to a small but uniform change in the run lengths and detachment rates under hindering forces but not assisting forces. In the third, they connect two or three kinesin molecules to each DNA, and measure a stronger scaling in stall force and time when the Z component of force is reduced. They conclude that kinesin-1 is a more robust motor than previously envisaged, where much of its weakness came from the application of axial force. If forces are instead along the direction of transport, kinesin can hold on longer and work well in teams. The experiments are rigorous, and the data quality is very high. There is little to critique or discuss. The improved dataset will be useful for modeling and understanding multi-motor transport. The conclusions complement other recent works that used different approaches to low-Z component kinesin force spectroscopy, and provide strong value to the kinesin field.

      Comments on revisions:

      The authors have satisfied all of my comments. I commend them on an excellent paper.

    4. Reviewer #3 (Public review):

      Hensley et al. present an important study into the force-detachment behaviour of kinesin-1, using a newly adapted methodological approach. This new method of DNA-tethered motor trapping is effective in reducing vertical forces and can be easily optimised for other motors and protein characterisation. The major strength of the paper is characterising kinesin-1 under low z-forces, which is likely to reflect the physiological scenario. They find kinesin-1 is more robust and less prone to premature detachment. The motors exhibit higher stall rates and times. Under hindering and assisting loads, kinesin-1 detachment is more asymmetric and sensitive, and with low z-force shows that slip-behaviour kinetics prevail. Another achievement of this paper is the demonstration of the multi-motor kinesin-1 assay using their low-z force method, showing that multiple kinesin-1 motors are capable of generating higher forces (up to 15 pN, and nearly proportional to motor number), thus opening an avenue to study multiple motor coordination. Overall, the data have been collected in a rigorous manner, the new technique is sound and effective, and results presented are compelling.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations for the authors):

      (1) My primary concern is that in some of the studies, there are not enough data points to be totally convincing. This is particularly apparent in the low z-force condition of Figure 1C.

      We agree that adequate sampling is essential for drawing robust conclusions. To address this concern, we performed a post hoc sensitivity analysis to assess the statistical power of our dataset. Given our sample sizes (N = 85 and 45) and observed variability, the experiment had 80% power (α = 0.05) to detect a difference in stall force of approximately 0.36 pN (Cohen’s d ≈ 0.38). The actual difference observed between conditions was 0.25 pN (d ≈ 0.26), which lies below the minimum detectable effect size. Thus, the non-significant result (p = 0.16) likely reflects that any true difference, if present, is smaller than the experimental sensitivity, rather than a lack of sufficient sampling.

      Importantly, both measured stall forces fall within the reported range for kinesin-1 in the literature, supporting that the dataset is representative and the measurements are reliable.

      (2) I'm also concerned about Figure 2B. Does each data point in the three graphs represent only a single event? If so, this should probably be repeated several more times to ensure that the data are robust.

      Each data point shown corresponds to the average of many processive runs, ranging from 32 to 167. This has been updated in the figure caption accordingly.

      (3) Figure 3. I'm surprised that the authors could not obtain a higher occupancy of the multivalent DNA tether with kinesin motors. They were adding up to a 30X higher concentration of kinesin, but still did not achieve stoichiometric labeling. The reasons for this should be discussed. This makes interpretation of the mechanical data much tougher. For instance, only 6-7% of the beads would be driven by three kinesins. Unless the movement of hundreds of beads were studied, I think it would be difficult to draw any meaningful insight, since most of the events would be reflective of beads with only one or sometimes two kinesins bound. I think more discussion is required to describe how these data were treated.

      The mass-photometry data in Figure 3B were acquired in the presence of a 3-fold molar excess of kinesin (Supplemental Figure 4) relative to the DNA chassis. In comparison, optical trapping studies were performed at a 10-20-fold molar excess of kinesin, resulting in a substantially higher percentage of chassis with multiple motors. The reason why we had to perform mass photometry measurements at lower molar excess than the optical trap is that at higher kinesin concentrations, the “kinesin-only” peak dominated and obscured 2- or 3-kinesin-bound species, preventing reliable fitting of the mass photometry data. 

      We have now used the mass photometry measurements to extrapolate occupancies under trapping conditions. We estimate 76-93% of 2-motor chassis are bound to two kinesins and ~70% of 3-motor chassis are bound to three kinesins under our trapping conditions. Moreover, the mean forces in Figures 3C–D exceed those expected for a single kinesin, consistent with occupancy substantially greater than one motor per chassis.

      We wrote: “To estimate the percentage of chassis with two and three motors bound, we performed mass photometry measurements at a 3-fold molar excess of kinesin to the chassis, as higher ratios would obscure the distinction of complexes from the kinesin-only population. Assuming there is no cooperativity among the binding sites, we modeled motor occupancy using a Binomial distribution (Figure 3_figure supplement 2). We observed 17-29% of particles corresponded to the two-motor species on the 2-motor chassis in mass photometry, indicating that 45-78% of the 2-motor chassis was bound to two kinesins. Similarly, 15% and 40% of the 3-motor chassis were bound to two and three kinesins, respectively.

      In optical trapping assays, we used 10-fold and 20-fold molar excess of kinesin for 2-motor and 3-motor chassis, respectively, to substantially increase the percentage of the chassis carried by multiple kinesins. Under these conditions, we estimate 76-93% of the 2-motor chassis were bound to two kinesins, and 30% and 70% of 3-motor chassis were bound to two and three kinesins, respectively.”

      “Multi-motor trapping assays were performed similarly using 10x and 20x kinesin for 2- and 3-motor chassis, respectively. To estimate the percentage of chassis with multiple motors, we used the probability of kinesin binding to a site on a chassis from mass photometry in the 3x excess condition to compute an effective dissociation constant where r is the molar ratio of kinesin to chassis. Single-site occupancy at higher molar excesses of kinesin was calculated using this parameter.”

      We also added Figure 3_figure supplement 2 to explain our Binomial model.
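
      The extrapolation described in this response can be sketched as follows. This is a minimal illustration, not the authors' code: the expression for the effective dissociation constant is not shown in the response, so the single-site form p = r / (r + K) used here is an assumption, as are all function names. Sites are treated as independent (no cooperativity), as stated in the quoted text.

```python
from math import comb


def site_occupancy_from_full(frac_full, n_sites):
    """Single-site occupancy p implied by the fraction of fully loaded chassis (p**n)."""
    return frac_full ** (1.0 / n_sites)


def effective_k(p, ratio):
    """Effective dissociation constant in molar-ratio units.

    ASSUMPTION: single-site occupancy follows p = r / (r + K); the response
    does not show the actual expression.
    """
    return ratio * (1.0 - p) / p


def occupancy_at_ratio(k_eff, ratio):
    """Single-site occupancy at a different kinesin:chassis molar ratio."""
    return ratio / (ratio + k_eff)


def binomial_distribution(p, n_sites):
    """Probability of 0..n_sites bound motors for independent sites."""
    return [comb(n_sites, k) * p**k * (1 - p) ** (n_sites - k)
            for k in range(n_sites + 1)]


def extrapolate_full(frac_full_measured, n_sites, ratio_measured, ratio_target):
    """Fraction of fully loaded chassis at ratio_target, given the measured fraction."""
    p = site_occupancy_from_full(frac_full_measured, n_sites)
    k_eff = effective_k(p, ratio_measured)
    p_target = occupancy_at_ratio(k_eff, ratio_target)
    return binomial_distribution(p_target, n_sites)[n_sites]


if __name__ == "__main__":
    # 2-motor chassis: measured at 3x excess, extrapolated to 10x excess.
    for frac in (0.45, 0.78):
        print(frac, round(extrapolate_full(frac, 2, 3, 10), 3))
```

      Under this assumed binding form, full-occupancy fractions of 45% and 78% at 3x excess map to roughly 76% and 93% at 10x excess, in line with the 2-motor range quoted above. The 3-motor values may depend on details of the authors' model that are not given here.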

      (4) Page 5, 1st paragraph. Here, the authors are comparing time constants from stall experiments to data obtained with dynein from Ezber et al. This study used the traditional "one bead" trapping approach with dynein bound directly to the bead under conditions where it would experience high z-forces. Thus, the comparison between the behavior of kinesin at low z-forces is not necessarily appropriate. Has anyone studied dynein's mechanics under low z-force regimes?

      We thank the reviewer for catching a citation error. The text has been corrected to reference Elshenawy et al. 2020, which reported stall time constants for mammalian dynein. 

      To our knowledge, dynein’s mechanics under explicitly low z-force conditions have not yet been reported; however, given the more robust stalling behavior of dynein and greater collective force generation, the cited paper was chosen to compare low z-force kinesin to a motor that appears comparatively unencumbered by z-forces. Our study adds to growing evidence that high z-forces disproportionately limit kinesin performance. 

      For clarification, we modified that sentence as follows: “These time constants are comparable to those reported for minus-end-directed dynein under high z-forces”.

      Reviewer #2 (Recommendations for the authors):

      (1) P3 pp2, a DNA tensiometer cannot control the force, but it can measure it; get the distance between the two ends of the tensiometer, and apply WLC.

      The text has been updated to more accurately reflect the differences between optical trapping and kinesin motility against a DNA tensiometer with a fixed lattice position.
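
      The reviewer's suggestion (measure the end-to-end distance of the tensiometer and apply the worm-like chain model) can be sketched with the standard Marko-Siggia interpolation formula. The persistence length, thermal energy, and rise per base pair below are typical literature values chosen for illustration, not parameters taken from the manuscript.

```python
KT_PN_NM = 4.11         # thermal energy near room temperature, pN*nm (assumed)
PERSISTENCE_NM = 50.0   # typical dsDNA persistence length, nm (assumed)
RISE_NM_PER_BP = 0.34   # B-DNA rise per base pair, nm (assumed)


def contour_length_nm(n_bp, rise=RISE_NM_PER_BP):
    """Contour length of a dsDNA tether of n_bp base pairs."""
    return n_bp * rise


def wlc_force_pn(extension_nm, contour_nm, lp_nm=PERSISTENCE_NM, kt=KT_PN_NM):
    """Marko-Siggia worm-like chain interpolation: force (pN) at a given extension."""
    x = extension_nm / contour_nm
    if not 0.0 <= x < 1.0:
        raise ValueError("extension must be non-negative and below the contour length")
    return (kt / lp_nm) * (0.25 / (1.0 - x) ** 2 - 0.25 + x)


if __name__ == "__main__":
    length = contour_length_nm(2000)  # hypothetical 2 kbp tether
    for fraction in (0.5, 0.8, 0.9, 0.95):
        print(fraction, round(wlc_force_pn(fraction * length, length), 2))
```

      The formula is an interpolation and becomes less accurate at high fractional extension, where enthalpic stretching of the DNA also matters.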

      (2) Fig. 2b, SEM is a poor estimate of error for exponentially distributed run lengths. Other methods, like bootstrapping an exponential distribution fit, may provide a more realistic estimate.

      Run lengths were plotted as an inverse cumulative distribution function and fitted to a single exponential decay (Supplementary Figure S3). The plotted value represents the fitted decay constant (characteristic run length) ± SE (standard error of the fit), not the arithmetic mean ± SEM. Velocity values are reported as mean ± SEM. Detachment rate was computed as velocity divided by run length, except at 6 and 10 pN hindering loads, where minimal forward displacement necessitated fitting run-time decays directly. In those cases, the plotted detachment rate equals the inverse of the fitted time constant. The figure caption has been updated accordingly.
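
      For comparison, the bootstrap approach raised by the reviewer can be sketched as below. This is not the authors' procedure (they fit an inverse cumulative distribution and report the standard error of the fit); here the decay constant is estimated by the sample mean, which is the maximum-likelihood estimate for an untruncated exponential, and the data are synthetic. The detachment-rate helper follows the velocity-divided-by-run-length definition given in the response.

```python
import random
import statistics


def bootstrap_run_length(run_lengths, n_boot=2000, seed=0):
    """Point estimate, bootstrap standard error and 95% percentile interval
    for the characteristic run length of exponentially distributed runs."""
    rng = random.Random(seed)
    n = len(run_lengths)
    point = statistics.fmean(run_lengths)  # MLE of the decay constant
    boots = []
    for _ in range(n_boot):
        resample = rng.choices(run_lengths, k=n)  # resample with replacement
        boots.append(statistics.fmean(resample))
    boots.sort()
    lo = boots[int(0.025 * n_boot)]
    hi = boots[int(0.975 * n_boot) - 1]
    return point, statistics.stdev(boots), (lo, hi)


def detachment_rate(velocity, run_length):
    """Detachment rate as velocity divided by characteristic run length."""
    return velocity / run_length


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic run lengths (nm) with a true decay constant of 800 nm.
    runs = [rng.expovariate(1.0 / 800.0) for _ in range(100)]
    estimate, se, ci = bootstrap_run_length(runs)
    print(estimate, se, ci)
```

      Real run-length data are usually truncated at short distances by the detection threshold, in which case the sample mean is biased and a fit with an explicit lower cutoff would be more appropriate.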

      (3) Kinesin-1 is covalently bound to a DNA oligo, which then attaches to the DNA chassis by hybridization. This oligo is 21 nt with a relatively low GC%. At what force does this oligo unhybridize? Can the authors verify that their stall force measurements are not cut short by the oligo detaching from the chassis?

      The 21-nt attachment oligo (38% GC) is predicted to have ΔG(37 °C) ≈ -25 kcal/mol, or approximately 42 kT. If we assume this is the approximate amount of work required to unhybridize the oligo, we would expect the rupture force to be >15 pN. This significantly exceeds the stall force of a single kinesin. Since the stalling events rarely exceed a few seconds, it is unlikely that our oligos quickly detach from the chassis under such low forces.
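
      The unit conversion behind this estimate can be checked with a short calculation. The free energy comes from the response; the distance over which the work is done is not stated there, so the value used below (21 bp at 0.34 nm per bp) is a hypothetical choice for illustration only, and the result is a crude equilibrium estimate rather than a rupture-force prediction.

```python
AVOGADRO = 6.02214076e23           # 1/mol
BOLTZMANN_J_PER_K = 1.380649e-23   # J/K
J_PER_KCAL = 4184.0
PN_NM_PER_J = 1e21                 # 1 J = 1e12 pN * 1e9 nm


def kcal_per_mol_to_pn_nm(dg_kcal_per_mol):
    """Free energy per molecule in pN*nm."""
    return dg_kcal_per_mol * J_PER_KCAL / AVOGADRO * PN_NM_PER_J


def kcal_per_mol_to_kt(dg_kcal_per_mol, temp_k=298.15):
    """Free energy per molecule in units of kT at the given temperature."""
    kt_pn_nm = BOLTZMANN_J_PER_K * temp_k * PN_NM_PER_J
    return kcal_per_mol_to_pn_nm(dg_kcal_per_mol) / kt_pn_nm


def mean_force_pn(work_pn_nm, distance_nm):
    """Average force if the work is done over distance_nm."""
    return work_pn_nm / distance_nm


if __name__ == "__main__":
    work = kcal_per_mol_to_pn_nm(25.0)
    print(round(kcal_per_mol_to_kt(25.0, 298.15), 1))  # kT at 25 C
    print(round(kcal_per_mol_to_kt(25.0, 310.15), 1))  # kT at 37 C
    # HYPOTHETICAL distance: 21 bp * 0.34 nm/bp
    print(round(mean_force_pn(work, 21 * 0.34), 1))
```

      With these inputs, 25 kcal/mol corresponds to about 42 kT at 25 °C (about 41 kT at 37 °C), and the assumed distance gives an average force above the 15 pN bound quoted in the response.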

      Furthermore, optical trapping experiments are tuned such that no more than 30% of beads display motion within several minutes after they are brought near microtubules. After stalling events, the motor dissociates from the MT, and the bead snaps back to the trap center. Most beads robustly reengage with the microtubule, typically within 10 s, suggesting that the same motor chassis reengages with the microtubule after microtubule detachment. Successive runs of the same bead typically have similar stall forces, suggesting that the motors do not disengage from the chassis under resistive forces exerted by the trap.

      (4) Figure 1, a justification or explanation should be provided for why events lower than 1.5 pN were excluded. It appears arbitrary.

      Single-motor stall-force measurements used a trap stiffness of 0.08–0.10 pN/nm. At this stiffness, a 1.5 pN force corresponds to 15–19 nm bead displacement, roughly two kinesin steps, and events below this threshold could not be reliably distinguished from Brownian noise. For this reason, forces < 1.5 pN were excluded.

      In Methods, we wrote “Only peak forces above 1.5 pN (corresponding to a 15-19 nm bead displacement) were analyzed to clearly distinguish runs from the tracking noise.”
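
      The displacement range quoted for the 1.5 pN threshold follows from Hooke's law for the trap, x = F / k. A small sketch, assuming a kinesin step of about 8 nm for the comparison to step size:

```python
KINESIN_STEP_NM = 8.0  # approximate kinesin-1 step size, nm (assumed for illustration)


def bead_displacement_nm(force_pn, stiffness_pn_per_nm):
    """Bead displacement from the trap center for a linear (Hookean) trap."""
    return force_pn / stiffness_pn_per_nm


if __name__ == "__main__":
    for stiffness in (0.08, 0.10):
        x = bead_displacement_nm(1.5, stiffness)
        print(stiffness, x, x / KINESIN_STEP_NM)
```

      For the stated stiffness range of 0.08 to 0.10 pN/nm, this gives 18.75 nm and 15 nm, respectively, or roughly two steps.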

      (5) Figure 2b, is the difference in velocity statistically significant?

      The difference in velocity is statistically significant for most conditions. We did not compare velocities for -10 and -6 pN as these conditions resulted in little forward displacement. However, the p-values for all of the other conditions are -4 pN: 0.0026, -2 pN: 0.0001, -1 pN: 0.0446, +0.5 pN: 0.3148, +2 pN: 0.0001, +3 pN: 0.1191, +4 pN: 0.0004.

      (6) The number of measurements for each experimental datapoint should be provided in the corresponding figure caption. SEM is used, but N is not reported in the caption.

      Figure captions have now been updated to report the number of trajectories (N) for each data point.

      Reviewer #3 (Recommendations for the authors):  

      (1) The method of DNA-tethered motor trapping to enable low z-force is not entirely novel, but adapted from Urbanska (2021) for use in conventional optical trapping laboratories without reliance on microfluidics. However, I appreciate that they have fully established it here to share with the community. The authors could strengthen their methods section by being transparent about protein weight, protein labelling, and DNA ladders shown in the supplementary information. What organism is the protein from? Presumably human, but this should be specified in the methods. While the figures show beautiful data and exemplary traces, the total number of molecules analysed or events is not consistently reported. Overall, certain methodological details should be made sufficient for reproducibility.

      We appreciate the reviewer’s attention to methodological clarity. The constructs used are indeed human kinesin-1, KIF5B. The Methods now specify protein origin, molecular weights, and labeling details, and all figure captions report the number of trajectories analyzed to ensure reproducibility.

      (2) The major limitation the study presents is overarching generalisability, starting with the title. I recommend that the title be specific to kinesin-1. 

      The title has been revised to specify kinesin-1. 

      The study uses two constructs: a truncated K560 for conventional high-force assays, and full-length Kif5b for the low z-force method. However, for the multi-motor assay, the authors use K560 with the rationale of preventing autoinhibition due to binding with DNA, but that would also have limited characterisation in the single-molecule assay. Overall, the data generated are clear, high-quality, and exciting in the low z-force conditions. But why have they not compared or validated their findings with the truncated construct K560? This is especially important in the force-feedback experiments and in comparison with Andreasson et al. and Carter et al., who use Drosophila kinesin-1. Could kinesin-1 across organisms exhibit different force-detachment kinetics? It is quite possible. 

      Construct choice was guided by physiological relevance and considerations of autoinhibition: K560 was used for high z-force single-motor assays. The results of these assays are consistent with conventional bead assays performed by Andreasson et al. and Carter et al. using kinesin from a different organism. Therefore, we do not believe there are major differences between force properties of Drosophila and human kinesin-1.

      For low z-force assays, we used full-length KIF5B, which has nearly identical velocity and stall force to K560 in standard bead assays. We used this construct for low z force assays because it has a longer and more flexible stalk than K560 and better represents the force behavior of kinesin under physiological conditions. We then used constitutively-active K560 motors for multi-motor experiments to avoid potential complications from autoinhibition of full-length kinesin.

      Similarly, the authors test backward slipping of Kif5b and K560 and measure dwell times in multi-motor assays. Why not detail the backward slippage kinetics of Kif5b and any step-size impact under low z-forces? For instance, with the traces they already have, the authors could determine slip times, distances, and frequency in horizontal force experiments. Overall, the manuscript could be strengthened by analysing both constructs more fully.

      Slip or backstep analyses were not performed on single-motor data because such events were rare; kinesin typically detached rather than slipped. In contrast, multi-motor assays exhibited frequent slip events corresponding to the detachment of individual motors, which were analyzed in detail.

      We wrote “In comparison, slipping events were rarely observed in beads driven by a single motor, suggesting that kinesin typically detaches rather than slipping back on the microtubule under hindering loads.”

      Appraisal and impact:

      This study contributes to important and debated evidence on kinesin-1 force-detachment kinetics. The authors conclude that kinesin-1 exhibits a slip-bond interaction with the microtubule under increasing forces, while other recent studies (Noell et al. and Kuo et al.), which also use low z-force setups, conclude catch-bond behaviour under hindering loads. I find the results not fully aligned with their interpretation. The first comparison of low z-forces in their setup with Noell et al. (2024), based on stall times, does not hold, because it is an apples-to-oranges comparison. Their data show a stall time constant of 2.52 s, which is comparable to the 3 s reported by Noell et al., but the comparison is made with a weighted average of 1.49 s. The authors do report that detachment rates are lower in low z-force conditions under unloaded scenarios. So, to completely rule out catch-bond-like behaviour is unfair. That said, their data quality is good and does show that higher hindering forces lead to higher detachment rates. However, on closer inspection, the range of 0-5 pN shows either a decrease or no change in detachment rate, which suggests that under a hindering force threshold, catch-bond-like or ideal-bond-like behaviour is possible, followed by slip-bond behaviour, which is amazing resolution. Under assisting loads, the slip-bond character is consistent, as expected. Overall, the study contributes to an important discussion in the biophysical community and is needed, but requires cautious framing, particularly without evidence of motor trapping in a high microtubule-affinity state rather than genuine bond strengthening.

      We are not completely ruling out the catch bond behavior in our manuscript. As the reviewer pointed out, our results are consistent with the asymmetric slip bond model, whereas DNA tensiometer assays are more consistent with the catch bond behavior. The advantage of our approach is the capability to directly control the magnitude and direction of load exerted on the motor in the horizontal axis and measure the rate at which the motor detaches from the microtubule as it walks under constant load. In comparison, DNA tensiometer assays cannot control the force, but measure the time it takes the motor to fall off from the microtubule after a brief stall. The extension of the DNA tether is used to estimate the force exerted on the motor during a stall in those assays. The slight disadvantage of our method is the presence of low z-forces, whereas DNA tensiometer assays are expected to have little to no z-force. We wrote that the discrepancy between our results can be attributed to the presence of low z-forces in our DNA-tethered trapping assembly, which may result in a higher-than-normal detachment rate under high hindering loads, thereby resulting in less asymmetry in the force-detachment kinetics. We also added that this discrepancy can be addressed by future studies that directly control and measure horizontal force and measure the motor detachment rate in the absence of z-forces. Optical trapping assays with small nanoparticles (Sudhakar et al. Science 2021) may be well suited to conclusively reveal the bond characteristics of kinesin under hindering loads.

      Reviewing Editor Comments:

      The reviewers are in agreement with the importance of the findings and the quality of the results. The use of the DNA tether reduces the z-force on the motor and provides biologically relevant insight into the behavior of the motor under load. The reviewers' suggestions are constructive and focus on bolstering some of the data points and clarifying some of the methodological approaches. My major suggestion would be to clarify the rationale for concluding that kinesin-1 exhibits slip-bond behavior with increasing force in light of the work of Noell et al. (10.1101/2024.12.03.626575) and Kuo et al. (2022, 10.1038/s41467-022-31069-x), both of which take advantage of DNA tethers.

      Please see our response to the previous comment. In the revised manuscript, we first clarified that our results are in agreement with previous theoretical (Khataee & Howard, 2019) and experimental studies (Kuo et al., 2022; Noell et al., 2024; Pyrpassopoulos et al., 2020) that kinesin exhibits slower detachment under hindering load. This asymmetry became clear when the z-force was reduced or eliminated. 

      We clarified the differences between our results and DNA tensiometer assays and provided a potential explanation for these discrepancies. We also proposed that future studies might be required to fully distinguish between asymmetric slip, ideal, or catch bonding of kinesin under hindering loads.

      We wrote:

      “Our results agree with the theoretical prediction that kinesin exhibits higher asymmetry in force-detachment kinetics without z-forces (Khataee & Howard, 2019), and are consistent with optical trapping and DNA tensiometer assays that reported more persistent stalling of kinesin in the absence of z-forces (Kuo et al., 2022; Noell et al., 2024; Pyrpassopoulos et al., 2020).

      Force-detachment kinetics of protein-protein interactions have been modeled as either a slip, ideal, or catch bond, which exhibit an increase, no change, or a decrease in detachment rate, respectively, under increasing force (Thomas et al., 2008). Slip bonds are most commonly observed in biomolecules, but studies on cell adhesion proteins reported a catch bond behavior (Marshall et al., 2003). Although previous trapping studies of kinesin reported a slip bond behavior (Andreasson et al., 2015; Carter & Cross, 2005), recent DNA tensiometer studies that eliminated the z-force showed that the detachment rate of the motor under hindering forces is lower than that of an unloaded motor walking on the microtubule (Kuo et al., 2022; Noell et al., 2024), consistent with the catch bond behavior. Unlike these reports, we observed that the stall duration of kinesin is shorter than the motor run time under unloaded conditions, and the detachment rate of kinesin increases with the magnitude of the hindering force. Therefore, our results are more consistent with the asymmetric slip bond behavior. The difference between our results and the DNA tensiometer assays (Kuo et al., 2022; Noell et al., 2024) can be attributed to the presence of low z-forces in our DNA-tethered optical trapping assays, which may increase the detachment rate under high hindering forces. Future studies that could directly control hindering forces and measure the motor detachment rate in the absence of z-forces would be required to conclusively reveal the bond characteristics of kinesin under hindering loads.”
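
      The slip, ideal, and catch definitions quoted above can be made concrete with a small classifier over detachment rates measured at increasing force. The Bell-type exponential used to generate the example rates is a common textbook model introduced here only for illustration; it is not the model fitted in the manuscript, and the unloaded rate, distance parameter, and tolerance are hypothetical.

```python
import math

KT_PN_NM = 4.11  # thermal energy near room temperature, pN*nm (assumed)


def bell_rate(force_pn, k0, delta_nm, kt=KT_PN_NM):
    """Bell-type detachment rate: k(F) = k0 * exp(F * delta / kT).
    delta > 0 gives a slip bond; delta < 0 mimics a catch bond."""
    return k0 * math.exp(force_pn * delta_nm / kt)


def classify_bond(forces, rates, rel_tol=0.05):
    """Classify by comparing the rate at the highest force with that at the lowest:
    'slip' if it increases, 'catch' if it decreases, 'ideal' if unchanged
    within rel_tol."""
    pairs = sorted(zip(forces, rates))
    first, last = pairs[0][1], pairs[-1][1]
    change = (last - first) / first
    if change > rel_tol:
        return "slip"
    if change < -rel_tol:
        return "catch"
    return "ideal"


if __name__ == "__main__":
    forces = [0.0, 2.0, 4.0, 6.0]  # magnitude of hindering load, pN
    for delta in (0.5, 0.0, -0.5):  # hypothetical distance parameters, nm
        rates = [bell_rate(f, k0=1.0, delta_nm=delta) for f in forces]
        print(delta, classify_bond(forces, rates))
```

      An endpoint comparison like this cannot capture mixed behavior, such as a flat or decreasing rate at low hindering force followed by an increase at higher force, which is the regime debated in the reviews.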

    1. eLife Assessment

      This paper undertakes an important investigation to determine whether movement slowing in microgravity is due to a strategic conservative approach or rather due to an underestimation of the mass of the arm. The experimental dataset is unique, the coupled experimental and computational analyses comprehensive, and the effect is strong. However, the authors present incomplete results to support the claim that movement slowing is due to mass underestimation. Further analysis is needed to rule out alternative explanations.

    2. Reviewer #1 (Public review):

      The authors have conducted substantial additional analyses to address the reviewers' comments. However, several key points still require attention. I was unable to see the correspondence between the model predictions and the data in the added quantitative analysis. In the rebuttal letter, the delta peak speed time displays values in the range of [20, 30] ms, whereas the data were negative for the 45° direction. Should the reader directly compare panel B of Figure 6 with Figure 1E? The correspondence between the model and the data should be made more apparent in Figure 6. Furthermore, the rebuttal states that a quantitative prediction was not expected, yet it subsequently argues that there was a quantitative match. Overall, this response remains unclear.

      A follow-up question concerns the argument about strategic slowing. The authors argue that this explanation can be rejected because the timing of peak speed should be delayed, contrary to the data. However, there appears to be a sign difference between the model and the data for the 45° direction, which means that it was delayed in this case. Did I understand correctly? In that regard, I believe that the hypothesis of strategic slowing cannot yet be firmly rejected and the discussion should more clearly indicate that this argument is based on some, but not all, directions. I agree with the authors on the importance of the mass underestimation hypothesis, and I am not particularly committed to the strategic slowing explanation, but I do not see a strong argument against it. If the conclusion relies on the sign of the delta peak speed, then the authors' claims are not valid across all directions, and greater caution in the interpretation and discussion is warranted. Regarding the peak acceleration time, I would be hesitant to draw firm conclusions based on differences smaller than 10 ms (Figures R3 and 6D).

      The authors state in the rebuttal that the two hypotheses are competing. This is not accurate, as they are not mutually exclusive and could even vary as a function of movement direction. The abstract also claims that the data "refutes" strategic slowing, which I believe is too strong. The main issue is that, based on the authors' revised manuscript, the lack of quantitative agreement between the model and the data for the mass underestimation hypothesis is considered acceptable because a precise quantitative match is not expected, and the predictions overall agree for some (though not all) directions and phases (excluding post-in). That is reasonable, but by the same logic, the small differences between the model prediction and the strategic slowing hypothesis should not be taken as firm evidence against it, as the authors seem to suggest. In practice, I recommend a more transparent and cautious interpretation to avoid giving readers the false impression that the evidence is decisive. The mass underestimation hypothesis is clearly supported, but the remaining aspects are less clear, and several features of the data remain unexplained.

    3. Reviewer #2 (Public review):

      This study explores the underlying causes of the generalized movement slowness observed in astronauts in weightlessness compared to their performance on Earth. The authors argue that this movement slowness stems from an underestimation of mass rather than a deliberate reduction in speed for enhanced stability and safety.

      Overall, this is a fascinating and well-written work. The kinematic analysis is thorough and comprehensive. The design of the study is solid, the collected dataset is rare, and the model adds confidence to the proposed conclusions.

      Compared to the previous version, the authors have thoroughly addressed my concerns. The model is now clear and well-articulated, and alternative hypotheses have been ruled out convincingly. The paper is improved and suitable for publication in my opinion, making a significant contribution to the field.

      Strengths:

      - Comprehensive analysis of a unique data set of reaching movement in microgravity
      - Use of a sensible and well-thought-out experimental approach
      - State-of-the-art analyses of main kinematic parameters
      - Computational model simulations of arm reaching to test alternative hypotheses and support the mass underestimation one

      This work has no major weakness as it stands, and the discussion provides a fair evaluation of the findings and conclusions.

    4. Reviewer #3 (Public review):

      Summary:

      The authors describe an interesting study of arm movements carried out in weightlessness after a prolonged exposure to the so-called microgravity conditions of orbital spaceflight. Subjects performed radial point-to-point motions of the fingertip on a touch pad. The authors note a reduction in movement speed in weightlessness, which they hypothesize could be due to either an overall strategy of lowering movement speed to better accommodate the instability of the body in weightlessness or an underestimation of body mass. They conclude in favor of the latter, mainly based on two effects. One, slowing in weightlessness is greater for movement directions with higher effective mass at the end effector of the arm. Two, they present evidence for an increased number of corrective submovements in weightlessness. They contend that this provides conclusive evidence to accept the hypothesis of an underestimation of body mass.

      Strengths:

      In my opinion, the study provides a valuable contribution, the theoretical aspects are well presented through simulations, the statistical analyses are meticulous, the applicable literature is comprehensively considered and cited and the manuscript is well written.

      Weaknesses:

      I nevertheless am of the opinion that the interpretation of the observations leaves room for other possible explanations of the observed phenomenon, thus weakening the strength of the arguments.

      To strengthen the conclusions, I feel that the following points would need to be addressed:

      (1) The authors model the movement control through equations that derive the input control variable in terms of the force acting on the hand and treating the arm as a second-order low-pass filter (Eq. 13). Underestimation of the mass in the computation of a feedforward command would lead to a lower-than-expected displacement in response to that command. But it is not clear if and how the authors account for a potential modification of the time constants of the second-order system. The CNS does not effectuate movements with pure torque generators. Muscles have elastic properties that depend on their tonic excitation level, reflex feedback and other parameters. Indeed, Fisk et al.* showed variations of movement characteristics consistent with lower muscle tone, lower bandwidth and lower damping ratio in 0g compared to 1g. Could the variations in the response to the initial feedforward command be explained by a misrepresentation of the limb's damping and natural frequency, leading to greater uncertainty about the consequences of the initial command? This would still be an argument for un-adapted feedforward control of the movement, leading to the need for more corrective movements. But it would not necessarily reflect an underestimation of body mass.

      *Fisk, J., Lackner, J. R., & DiZio, P. (1993). Gravitoinertial force level influences arm movement control. Journal of Neurophysiology, 69(2), 504-511.

      The authors attempt to differentiate their study from previous studies, in which limb neuromechanical impedance was shown to be modified in weightlessness, by emphasizing that in the current study the movements were rapid and the initial movement is "feedforward". But this incorrectly implies that the limb's mechanical response to the motor command is determined only by active feedback mechanisms. In fact:

      (a) All commands to the muscle pass through the motor neurons. These neurons receive descending activations related not only to the volitional movement, but also to the dynamic state of the body and the influence of other sensory inputs, including the vestibular system. A decrease in descending influences from the vestibular organs will lower the background sensitivity to all other neural influences on the motor neuron. Thus, the motor neuron may be less sensitive to the other volitional and reflexive synaptic inputs that it may receive.

      (b) Muscle tone plays a significant role in determining the force and the time course of the muscle contraction. In a weightless environment, where tonic muscle activity is likely to be reduced, there is the distinct possibility that muscles will react more slowly and with lower amplitude to an otherwise equivalent descending motor command, particularly in the initial moments before spinal reflexes come into play. These, and other neuronal mechanisms could lead to the "under-actuation" effect observed in the current study, without necessarily being reflective of an underestimation of mass per se.

      (2) The subject's body in weightlessness is much more sensitive to reaction forces in interactions with the environment in the absence of the anchoring effect of gravity pushing the body into the floor and in the absence of anticipatory postural adjustments that typically accompany upper-limb motions in Earth gravity in order to maintain an upright posture. The authors dismiss this possibility because the taikonauts were asked to stabilize their bodies with the contralateral hand. But the authors present no evidence that this was sufficient to maintain the shoulder and trunk at a strictly constant position, as is supposed by the simplified biomechanical model used in their optimal control framework. Indeed, a small backward motion of the shoulder would result in a smaller acceleration of the fingertip and a smaller extent of the initial ballistic motion of the hand with respect to the measurement device (the tablet), consistent with the observations reported in the study. Note that stability of the base might explain why 45º movements were apparently less affected in weightlessness, according to many of the reported analyses, including those related to corrective movements (Fig. 5 B, C, F; Fig. 6D), than the other two directions. If the trunk is being stabilized by the left arm, the same reaction forces on the trunk due to the acceleration of the hand will result in less effective torque on the trunk, given that the reaction forces act with a much smaller moment arm with respect to the left shoulder (the hand movement axis passes approximately through the left shoulder for the 45º target) compared to either the forward or rightward motions of the hand.

      (3) The above is exacerbated by potential changes in the frictional forces between the fingertip and the tablet. The movements were measured by having the subjects slide their finger on the surface of a touch screen. In weightlessness, the implications of this contact can be expected to be quite different than on the ground. While these forces may be low on Earth, the fact is that we do not know what forces the taikonauts used on orbit. In weightlessness, the taikonauts would need to actively press downward to maintain contact with the screen, while on Earth gravity will do the work. The tangential forces that resist movement due to friction might therefore be different in 0g. Indeed, given the increased instability of the body and the increased uncertainty of movement direction of the hand, taikonauts may have been induced to apply greater forces against the tablet in order to maintain contact in weightlessness, which would in turn slow the motion of the finger on the tablet and increase the reaction forces acting on the trunk. This could be particularly relevant given that the effect of friction would interact with the limb in a direction-dependent fashion, given the anisotropy of the equivalent mass at the fingertip evoked by the authors.

      I feel that the authors have done an admirable job of exploring how to explain the modifications to movement kinematics that they observed on orbit within the constraints of optimal control theory applied to a simplified model of the human motor system. While I fully appreciate the value of such models to provide insights into questions of human sensorimotor behaviour, to draw firm conclusions on what humans are actually experiencing based only on manipulations of the computational model, without testing the model's implicit assumptions and without considering the actual neurophysiological and biomechanical mechanisms, can be misleading. One way to do this could be to examine these questions through extensions to the model used in the simulations (changing activation dynamics of the torque generators, allowing for potential backward motion of the shoulder and trunk, etc.). A better solution would be to emulate the physiological and biomechanical conditions on Earth (supporting the arm against gravity to reduce muscle tone, placing the subject on a moveable base that requires that the body be stabilized with the other hand) in order to distinguish the hypothesis of an underestimation of mass vs. other potential sources of under-actuation and other potential effects of weightlessness on the body.

      In sum, my opinion is that the authors are relying too much on a theoretical model as a ground truth and thus overstate their conclusions. To provide a convincing argument that humans truly underestimate mass in weightlessness, they should consider more judiciously the neurophysiology and biomechanics that fall outside the purview of the simplified model that they have chosen. If a more thorough assessment of this nature is not possible, then I would argue that a more measured conclusion of the paper should be 1) that the authors observed modifications to movement kinematics in weightlessness consistent with an under-actuation for the intended motion, 2) that a simplified model of human physiology and biomechanics that incorporates principles of optimal control suggests that the source of this under-actuation might be an underestimation of mass in the computation of an appropriate feedforward motor command, and 3) that other potential neurophysiological or biomechanical effects cannot be excluded due to limitations of the computational model.

    5. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This paper undertakes an important investigation to determine whether movement slowing in microgravity is due to a strategic conservative approach or rather due to an underestimation of the mass of the arm. While the experimental dataset is unique and the coupled experimental and computational analyses comprehensive, the authors present incomplete results to support the claim that movement slowing is due to mass underestimation. Further analysis is needed to rule out alternative explanations.

      We thank the editor and reviewers for the thoughtful and constructive comments, which helped us substantially improve the manuscript. In this revised version, we have made the following key changes:

      - Directly presented the differential effect of microgravity in different movement directions, showing its quantitative match with model predictions.

      - Showed that changing the cost function to implement a conservative strategy is not a viable alternative.

      - Showed our model predictions remain largely the same after adding Coriolis and centripetal torques.

      - Discussed alternative explanations including neuromuscular deconditioning, friction, body stability, etc.

      - Detailed the model description and moved it to the main text, as suggested.

      Our point-to-point response is numbered to facilitate cross-referencing.

      We believe the revisions and the responses adequately address the reviewers’ concerns, and the new analysis results strengthen our conclusion that mass underestimation is the major contributor to movement slowing in microgravity.

      Reviewer #1 (Public review):

      Summary:

      This article investigates the origin of movement slowdown in weightlessness by testing two possible hypotheses: the first is based on a strategic and conservative slowdown, presented as a scaling of the motion kinematics without altering its profile, while the second is based on the hypothesis of a misestimation of effective mass by the brain due to an alteration of gravity-dependent sensory inputs, which alters the kinematics following a controller parameterization error.

      Strengths:

      The article convincingly demonstrates that trajectories are affected in 0g conditions, as in previous work. It is interesting, and the results appear robust. However, I have two major reservations about the current version of the manuscript that prevent me from endorsing the conclusion in its current form.

      Weaknesses:

      (1) First, the hypothesis of a strategic and conservative slow down implicitly assumes a similar cost function, which cannot be guaranteed, tested, or verified. For example, previous work has suggested that changing the ratio between the state and control weight matrices produced an alteration in movement kinematics similar to that presented here, without changing the estimated mass parameter (Crevecoeur et al., 2010, J Neurophysiol, 104 (3), 1301-1313). Thus, the hypothesis of conservative slowing cannot be rejected. Such a strategy could vary with effective mass (thus showing a statistical effect), but the possibility that the data reflect a combination of both mechanisms (strategic slowing and mass misestimation) remains open.

      Response (1): Thank you for raising this point. The basic premise of this concern is that changing the cost function to implement strategic slowing can reproduce our empirical findings, so the alternative hypothesis that we aimed to refute in the paper remains possible. At the least, it could co-exist with our hypothesis of mass underestimation. In the revision, we show that changing the cost function alone, as suggested here, cannot produce the behavioral patterns observed in microgravity.

      As suggested, we modified the relative weighting of the state and control cost matrices (i.e., Q and R in the cost function Eq 15) without considering mass underestimation. While this cost function scaling can decrease peak velocity – a hallmark of strategic slowing – it also inevitably leads to later peak timings. This is opposite to our robust findings: the taikonauts consistently “advanced” their peak velocity and peak acceleration in time. Note, these model simulation patterns have also been shown in Crevecoeur et al. (2010), the paper mentioned by the reviewer (see their Figure 7B).

      We systematically changed the ratio between the state and control weight matrices in the simulation, as suggested. We divided Q and multiplied R by the same factor α, the cost function scaling parameter defined in Crevecoeur et al. (2010). This adjustment models a shift in movement strategy in microgravity, and we tested a wide range of α to cover the reasonable parameter space. Simulation results for α = 3 and α = 0.3 are shown in Figure 1—figure supplement 2 and Figure 1—figure supplement 3, respectively. As expected, with α = 3 (higher control effort penalty), peak velocities and accelerations are reduced, but their timing is delayed. Conversely, with α = 0.3, peak amplitudes increase and the peaks occur earlier. Hence, changing the cost function to implement a conservative strategy cannot produce the kinematic pattern observed in microgravity, which is a combination of movement slowing and peak timing advance.
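
      The coupling between peak amplitude and peak timing under cost scaling can be illustrated with a deliberately minimal sketch. It is not the model used in the manuscript: it assumes a one-dimensional point mass, an infinite-horizon LQR with a position-only state cost, and placeholder values for the mass, weights, and reach distance. All function and parameter names are hypothetical.

```python
import math

def lqr_gains(mass, q_pos, r_ctrl):
    """Closed-form infinite-horizon LQR gains for a point mass (m * a = u).

    Assumed cost: integral of q_pos * (p - target)^2 + r_ctrl * u^2.
    """
    k_pos = math.sqrt(q_pos / r_ctrl)
    k_vel = math.sqrt(2.0 * mass * k_pos)
    return k_pos, k_vel

def simulate_reach(alpha, mass=1.0, q_pos=1.0e4, r_ctrl=1.0,
                   distance=0.12, dt=1e-4, duration=3.0):
    """Return (peak speed in m/s, time of peak in s) for cost scaling alpha.

    Q is divided by alpha and R is multiplied by alpha, as described above.
    """
    k_pos, k_vel = lqr_gains(mass, q_pos / alpha, r_ctrl * alpha)
    pos, vel = 0.0, 0.0
    peak_speed, peak_time = 0.0, 0.0
    for step in range(int(duration / dt)):
        force = -k_pos * (pos - distance) - k_vel * vel
        vel += (force / mass) * dt      # semi-implicit Euler integration
        pos += vel * dt
        if vel > peak_speed:
            peak_speed, peak_time = vel, (step + 1) * dt
    return peak_speed, peak_time

for alpha in (0.3, 1.0, 3.0):
    speed, t_peak = simulate_reach(alpha)
    print(f"alpha={alpha}: peak speed={speed:.3f} m/s at {t_peak * 1000:.0f} ms")
```

      In this toy setting, a larger α lowers the velocity peak and moves it later, whereas a smaller α raises it and moves it earlier, so cost scaling alone does not yield a lower peak that also arrives earlier.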

      Therefore, we conclude that a change in optimal control strategy alone is insufficient to explain our empirical findings. Logically speaking, we cannot refute the possibility of strategic slowing, which can still exist on top of the mass underestimation we proposed here. However, our data does not support its role in explaining the slowing of goal-directed hand reaching in microgravity. We have added these analyses to the Supplementary Materials and expanded the Discussion to address this point.

      (2) The main strength of the article is the presence of directional effects expected under the hypothesis of mass estimation error. However, the article lacks a clear demonstration of such an effect: indeed, although there appears to be a significant effect of direction, I was not sure that this effect matched the model's predictions. A directional effect is not sufficient because the model makes clear quantitative predictions about how this effect should vary across directions. In the absence of a quantitative match between the model and the data, the authors' claims regarding the role of misestimating the effective mass remain unsupported.

      Response (2): First, we have to clarify that our study does not aim to quantitatively fit the observed hand trajectories. The two-link arm model simulates an ideal case of moving a point mass (effective mass) on a horizontal plane without friction (Todorov, 2004; 2005). In contrast, in the experiment, participants moved their hand on a tabletop without vertical arm support, so the movement was not strictly planar and was affected by friction. Thus, this kind of model can only illustrate qualitative differences between conditions, as in the majority of similar modeling studies (e.g., Shadmehr et al., 2016). In our study, qualitative simulation means the model is intended to reproduce the directional differences between conditions (not exact numeric values) in key kinematic measures. Specifically, it should capture how the peak velocity and acceleration amplitudes and their timings differ between normal gravity and microgravity (particularly under the mass-underestimation assumption).

      Second, the reviewer rightfully pointed out that the directional effect is essential for our theorization of the importance of mass underestimation. However, the directional effect has two aspects, which were not clearly presented in our original manuscript. We now clarify both here and in the revision. The first aspect is that key kinematic variables (peak velocity/acceleration and their timing) are affected by movement direction, even before any potential microgravity effect. This is shown by the ranking order of directions for these variables (Figure 1C-H). The direction-dependent ranking, confirmed by pre-flight data, indicates that effective mass is a determining factor for reaching kinematics, which motivated us to study its role in eliciting movement slowing in space. This was what our original manuscript emphasized and clearly presented.
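
      To make the notion of direction-dependent effective mass concrete, the following sketch computes it for a planar two-link arm using the standard relation m_eff(u) = 1 / (uᵀ J M⁻¹ Jᵀ u), where M is the joint-space inertia matrix and J the hand Jacobian. The posture, segment lengths, masses, and inertias are placeholder assumptions, not the values used in the manuscript, and the direction convention (angle measured from the x-axis) is likewise assumed.

```python
import math

def effective_mass(theta_deg, q1_deg=45.0, q2_deg=90.0,
                   m1=1.9, m2=1.5, l1=0.30, l2=0.33,
                   lc1=0.13, lc2=0.16, i1=0.02, i2=0.02):
    """Effective mass (kg) at the hand along direction theta_deg.

    All default parameters are hypothetical: shoulder/elbow angles (deg),
    segment masses (kg), lengths and centre-of-mass distances (m),
    and segment inertias (kg m^2).
    """
    q1, q2 = math.radians(q1_deg), math.radians(q2_deg)
    # Joint-space inertia matrix M = [[a, b], [b, d]] of a planar two-link arm
    a = i1 + i2 + m1 * lc1**2 + m2 * (l1**2 + lc2**2 + 2 * l1 * lc2 * math.cos(q2))
    b = i2 + m2 * (lc2**2 + l1 * lc2 * math.cos(q2))
    d = i2 + m2 * lc2**2
    det = a * d - b * b
    # Hand Jacobian J (2 x 2)
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    j = ((-l1 * s1 - l2 * s12, -l2 * s12),
         (l1 * c1 + l2 * c12, l2 * c12))
    # w = J^T u, with u the unit vector along the movement direction
    theta = math.radians(theta_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    w1 = j[0][0] * ux + j[1][0] * uy
    w2 = j[0][1] * ux + j[1][1] * uy
    # Mobility u^T J M^{-1} J^T u = w^T M^{-1} w, using the 2 x 2 inverse of M
    mobility = (d * w1 * w1 - 2 * b * w1 * w2 + a * w2 * w2) / det
    return 1.0 / mobility

for direction in (45, 90, 135):
    print(f"{direction} deg: effective mass = {effective_mass(direction):.2f} kg")
```

      Because the mobility term is a quadratic form in the direction vector, the effective mass is the same for opposite directions and generally varies between directions at any non-singular posture.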

      The second aspect is that the hypothetical mass underestimation might also differentially affect movements in different directions. This was not clearly presented in the original manuscript. However, we would not expect a quantitative match between model predictions and empirical data, for the reasons mentioned above. We now show this directional ranking in microgravity-elicited kinematic changes in both model simulations and empirical data. The overall trend is that the microgravity effect indeed differs between directions, and the model predictions and the data showed a reasonable qualitative match (Author response image 1 below).

      As shown in Author response image 1, for amplitude changes (Δ peak speed, Δ peak acceleration) both the model and the mean of the empirical data show the same directional ordering (45° > 90° > 135°) in the pre-in and post-in comparisons. For timing (Δ peak-speed time, Δ peak-acceleration time), which we consider the most diagnostic, the same directional ranking was observed. We found only one deviation: the predicted sign (earlier peaks) was confirmed at 90° and 135°, but not at 45°. As discussed in Response (6), the absence of a timing advance at 45° may reflect limitations of our simplified model, which did not consider that the 45° direction is essentially a single-joint reach. Taken together, the directional pattern is largely consistent with the model predictions based on mass underestimation. The model successfully reproduces the directional ordering of the amplitude measures, peak velocity and peak acceleration. It also captures the sign of the timing changes in two of the three directions. We added these new analysis results in the revision and expanded the Discussion accordingly.

      The details of our analysis on directional effects: We compared the model predictions (Author response image 1, left) with the experimental data (Author response image 1, right) across the three tested directions (45°, 90°, 135°). In the experimental data panels, both Δ(pre-in) (solid bars) and Δ(post-in) (semi-transparent bars) with standard error are shown. The directional trends are remarkably similar between model prediction and actual data. The post-in comparison is less aligned with the model prediction; we postulate that the incomplete after-flight recovery (i.e., post data had not returned to pre-flight baselines) might obscure the microgravity effect. Incomplete recovery was also shown in our original manuscript: peak speed and peak acceleration did not fully recover in post-flight sessions when compared to pre-flight sessions. To further quantify the correspondence between model and data, we performed repeated-measures correlation (rm-corr) analyses. We found significant within-subject correlations for three of the four metrics. For pre–in, Δ peak speed time (r<sub>rm</sub> = 0.627, t(23) = 3.858, p < 0.001), Δ peak acceleration time (r<sub>rm</sub> = 0.591, t(23) = 3.513, p = 0.002), and Δ peak acceleration (r<sub>rm</sub> = 0.573, t(23) = 3.351, p = 0.003) were significant, whereas Δ peak speed was not (r<sub>rm</sub> = 0.334, t(23) = 1.696, p = 0.103). These results thus show that the directional effect, as predicted by our model, is observed both before spaceflight and in spaceflight (the pre-in comparison).
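
      As a consistency check on the reported statistics, the t values can be recovered from the correlation coefficients with the usual relation t = r·√df / √(1 − r²). The short sketch below assumes that this relation applies to the rm-corr output and takes the 23 error degrees of freedom from the text; the function name is hypothetical.

```python
import math

def t_from_r(r, dof):
    """t statistic implied by a correlation coefficient r with dof error degrees of freedom."""
    return r * math.sqrt(dof) / math.sqrt(1.0 - r * r)

# (r_rm, reported t) pairs copied from the paragraph above
reported = {
    "delta peak speed time": (0.627, 3.858),
    "delta peak acceleration time": (0.591, 3.513),
    "delta peak acceleration": (0.573, 3.351),
    "delta peak speed": (0.334, 1.696),
}

for name, (r, t_reported) in reported.items():
    t_implied = t_from_r(r, 23)
    print(f"{name}: implied t = {t_implied:.3f}, reported t = {t_reported:.3f}")
```

      The implied and reported values agree to within the rounding of r, which suggests the four statistics are internally consistent.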

      Author response image 1.

      Directional comparison between model predictions and experimental data across the three reach directions (45°, 90°, 135°). Left: model outputs. Right: experimental data shown as Δ relative to the in-flight session; solid bars = Δ(in − pre) and semi-transparent bars = Δ(in − post). Colors encode direction consistently across panels (e.g., 45° = darker hue, 90° = medium, 135° = lighter/orange). Panels (clockwise from top-left): Δ peak speed (cm/s), Δ peak speed time (ms), Δ peak acceleration time (ms), and Δ peak acceleration (cm/s²). Bars are group means; error bars denote standard error across participants.

      Citations:

      Todorov, E. (2004). Optimality principles in sensorimotor control. Nature Neuroscience, 7(9), 907.

      Todorov, E. (2005). Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural Computation, 17(5), 1084–1108.

      Shadmehr, R., Huang, H. J., & Ahmed, A. A. (2016). A Representation of Effort in Decision-Making and Motor Control. Current Biology: CB, 26(14), 1929–1934.

      In general, both the hypotheses of slowing motion (out of caution) and misestimating mass have been put forward in the past, and the added value of this article lies in demonstrating that the effect depended on direction. However, (1) a conservative strategy with a different cost function can also explain the data, and (2) the quantitative match between the directional effect and the model's predictions has not been established.

      We agree that both hypotheses have been put forward before; however, they are competing hypotheses that have not been resolved. Furthermore, the mass underestimation hypothesis has so far been a conjecture without solid evidence; previous reports on mass underestimation of objects cannot directly translate to underestimation of the body. As detailed in our responses above, we have shown that a conservative strategy implemented via a different cost function cannot reproduce the key findings in our dataset, thereby supporting the alternative hypothesis of mass underestimation. Moreover, we found qualitative agreement between the model predictions and the experimental data in terms of directional effects, which further strengthens our interpretation.

      Specific points:

      (1) I noted a lack of presentation of raw kinematic traces, which would be necessary to convince me that the directional effect was related to effective mass as stated.

      Response (3): We are happy to include exemplary speed and acceleration trajectories. Kinematic profiles from one example participant are shown in Figure 2—figure supplement 6.

      (2) The presentation and justification of the model require substantial improvement; the reason for their presence in the supplementary material is unclear, as there is space to present the modelling work in detail in the main text. Regarding the model, some choices require justification: for example, why did the authors ignore the nonlinear Coriolis and centripetal terms?

      Response (4): Great suggestion. In the revision, we have moved the model into the main text and added further justification for using this simple model.

      We initially omitted the nonlinear Coriolis and centripetal terms in order to start with a minimal model. Importantly, excluding these terms does not affect the model’s main conclusions. In the revision we added simulations that explicitly include these terms. The full explanation and simulations are provided in Supplementary Note 2 (placed in the Supplementary Materials to limit the amount of text devoted to the model). More explanations can also be found in our response to Reviewer 2 (Response (6)). The results indicate that, although these velocity-dependent forces show some directional anisotropy, their contribution is substantially smaller than that of the included inertial component; specifically, they have only a negligible impact on the predicted peak amplitudes and peak times.

      (3) The increase in the proportion of trials with subcomponents is interesting, but the explanatory power of this observation is limited, as the initial percentage was already quite high (from 60-70% during the initial study to 70-85% in flight). This suggests that the potential effect of effective mass only explains a small increase in a trend already present in the initial study. A more critical assessment of this result is warranted.

      Response (5): Thank you for your thoughtful comment. You are correct that the increase in the percentage of trials with submovements is modest, but a more critical change was observed in the timing between submovement peaks—specifically, the inter-peak interval (IPI). These intervals became longer during flight. Taken together with the percentage increase, the submovement changes significantly predicted the increase in movement duration, as shown by our linear mixed-effects model, which indicated that IPI increased.

      Reviewer #2 (Public review):

      This study explores the underlying causes of the generalized movement slowness observed in astronauts in weightlessness compared to their performance on Earth. The authors argue that this movement slowness stems from an underestimation of mass rather than a deliberate reduction in speed for enhanced stability and safety.

      Overall, this is a fascinating and well-written work. The kinematic analysis is thorough and comprehensive. The design of the study is solid, the collected dataset is rare, and the model tends to add confidence to the proposed conclusions. That being said, I have several comments that could be addressed to consolidate interpretations and improve clarity.

      Main comments:

      (1) Mass underestimation

      a) While this interpretation is supported by data and analyses, it is not clear whether this gives a complete picture of the underlying phenomena. The two hypotheses (i.e., mass underestimation vs deliberate speed reduction) can only be distinguished in terms of velocity/acceleration patterns, which should display specific changes during the flight with a mass underestimation. The experimental data generally show the expected changes, but for the 45° condition no changes are observed during flight compared to the pre- and post-phases (Figure 4). In Figure 5E, only a change in the primary submovement peak velocity is observed for 45°, but this finding relies on a more involved decomposition procedure. It suggests that there is something specific about 45° (beyond its low effective mass). In such planar movements, 45° often corresponds to a movement which is close to single-joint, whereas 90° and 135° involve multi-joint movements. If so, the increased proportion of submovements in 90° and 135° could indicate that participants had more difficulties in coordinating multi-joint movements during flight. Besides inertia, Coriolis and centripetal effects may be non-negligible in such fast planar reaching (Hollerbach & Flash, Biol Cyber, 1982) and, interestingly, they would also be affected by a mass underestimation (thus, this is not necessarily incompatible with the authors' view; yet predicting the effects of a mass underestimation on Coriolis/centripetal torques would require a two-link arm model). Overall, I found the discrepancy between the 45° direction and the other directions under-exploited in the current version of the article. In sum, could the corrective submovements be due to a misestimation of Coriolis/centripetal torques in the multi-joint dynamics (caused specifically -or not- by a mass underestimation)?

      Response (6): Thank you for raising these important questions. We unpacked the paragraph into two concerns: 1) the possibility that misestimation of Coriolis and centripetal torques might lead to corrective submovements, and 2) the under-exploited weak effect in the 45° direction. These two concerns are valid but addressable, and they do not change our general conclusions based on our empirical findings (see Supplementary Note 2: Coriolis and centripetal torques have minimal impact).

      Possible explanation for the 45° discrepancy

      We agree with the reviewer that the 45° direction likely involves more single-joint (elbow-dominant) movement, whereas the 90° and 135° directions require greater multi-joint (elbow + shoulder) coordination. This is particularly relevant when the workspace is near the body midline (e.g., Haggard & Richardson, 1995), as is the case in our experimental setup. To demonstrate this, we examined the curvature of the hand trajectories across directions. Using cumulative curvature (positive = counterclockwise), we obtained average values of 6.484° ± 0.841°, 1.539° ± 0.462°, and 2.819° ± 0.538° for the 45°, 90°, and 135° directions, respectively. The significantly larger curvature in the 45° condition suggests that these movements deviate more from a straight-line path, a hallmark of more elbow-dominant movements.
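
      The exact definition of cumulative curvature is not spelled out in this response. One plausible reading, sketched below purely as an assumption, is the sum of signed heading changes between successive trajectory segments, with counterclockwise turns counted as positive. The function name and the synthetic test path are hypothetical.

```python
import math

def cumulative_curvature_deg(points):
    """Sum of signed heading changes (degrees) along a 2-D path.

    points: sequence of (x, y) samples; positive result = counterclockwise turning.
    """
    total = 0.0
    prev_heading = None
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0.0 and dy == 0.0:
            continue  # skip repeated samples, which have no defined heading
        heading = math.atan2(dy, dx)
        if prev_heading is not None:
            turn = heading - prev_heading
            # wrap the turn into [-pi, pi) so that crossing +/-180 deg is handled
            turn = (turn + math.pi) % (2.0 * math.pi) - math.pi
            total += turn
        prev_heading = heading
    return math.degrees(total)

# Synthetic check: a straight path and a counterclockwise quarter-circle arc
straight = [(0.0, 0.01 * i) for i in range(13)]
arc = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in range(91)]
print(cumulative_curvature_deg(straight))
print(cumulative_curvature_deg(arc))
```

      Under this definition a perfectly straight reach scores zero, and reversing the direction of travel along a curved path flips the sign of the measure.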

      Importantly, this curvature pattern was present in both the pre-flight and in-flight phases, indicating that it is a general movement characteristic rather than a microgravity-induced effect. Thus, the 45° reaches are less suitable for modeling with a simplified two-link arm model compared to the other two directions. We believe this is the main reason why the model predictions based on effective mass become less consistent with the empirical data for the 45° direction.

      We have now incorporated this new analysis in the Results and discussed it in the revised Discussion.

      Citation: Haggard, P., Hutchinson, K., & Stein, J. (1995). Patterns of coordinated multi-joint movement. Experimental Brain Research, 107(2), 254-266.

      b) Additionally, since the taikonauts are tested after 2 or 3 weeks in flight, one could also assume that neuromuscular deconditioning explains (at least in part) the general decrease in movement speed. Can the authors explain how to rule out this alternative interpretation? For instance, weaker muscles could account for slower movements within a classical time-effort trade-off (as more neural effort would be needed to generate a similar amount of muscle force, thereby suggesting a purposive slowing down of movement). Therefore, could the observed results (slowing down + more submovements) be explained by some neuromuscular deconditioning combined with a difficulty in coordinating multi-joint movements in weightlessness (due to a misestimation of Coriolis/centripetal torques)?

      Response (7): Neuromuscular deconditioning is indeed an effect of spaceflight; thank you for bringing this up, as we omitted discussion of this confound in our original manuscript. A prolonged stay in microgravity can lead to a reduction of muscle strength, but this is mostly limited to the lower limbs. For example, a recent well-designed large-sample study has shown that while lower leg muscles showed significant strength reductions, no changes in mean upper body strength were found (Scott et al., 2023), consistent with previous propositions that muscle weakness is less pronounced for upper-limb muscles than for postural and lower-limb muscles (Tesch et al., 2005). Furthermore, muscle weakness is unlikely to play a major role here since our reaching task involves small movements (~12 cm) with joint torques of a magnitude of ~2 N·m. Of course, we cannot completely rule out the contribution of muscle weakness; we can only postulate, based on the task itself (12 cm reaching) and the systematic microgravity effect (the increase in submovements, the increase in the inter-submovement intervals, and their significant prediction of movement slowing), that muscle weakness is unlikely to be a major contributor to the movement slowing.

      The reviewer suggests that poor coordination in microgravity might contribute to slowing down + more submovements. This is also a possibility, but we did not find evidence to support it. First, there is no clear evidence or report of poor coordination for simple upper-limb movements such as the reaching movements investigated here. Note that reaching or aiming is one of the most studied tasks among astronauts. Second, we further analyzed our reaching trajectories and found no sign of increased curvature, a hallmark of poor coordination of Coriolis/centripetal torques, in our large collection of reaching movements. We probably have the largest dataset of reaching movements collected in microgravity thus far, given that we had 12 taikonauts and each of them performed about 480 to 840 reaching trials during their spaceflight. We believe the probability of a Type II error is quite low here.

      Citation: Tesch, P. A., Berg, H. E., Bring, D., Evans, H. J., & LeBlanc, A. D. (2005). Effects of 17-day spaceflight on knee extensor muscle function and size. European journal of applied physiology, 93(4), 463-468.

      Scott J, Feiveson A, English K, et al. Effects of exercise countermeasures on multisystem function in long duration spaceflight astronauts. npj Microgravity. 2023;9(11).

      (2) Modelling

      a) The model description should be improved as it is currently a mix of discrete time and continuous time formulations. Moreover, an infinite-horizon cost function is used, but I thought the authors used a finite-horizon formulation with the prefixed duration provided by the movement utility maximization framework of Shadmehr et al. (Curr Biol, 2016). Furthermore, was the mass underestimation reflected both in the utility model and the optimal control model? If so, did the authors really compute the feedback control gain with the underestimated mass but simulate the system with the real mass? This is important because the mass appears both in the utility framework and in the LQ framework. Given the current interpretations, the feedforward command is assumed to be erroneous, and the feedback command would allow for motor corrections. Therefore, it could be clarified whether the feedback command also misestimates the mass or not, which may affect its efficiency. For instance, if both feedforward and feedback motor commands are based on wrong internal models (e.g., due to the mass underestimation), one may wonder how the astronauts would execute accurate goal-directed movements.

      b) The model seems to be deterministic in its current form (no motor and sensory noise). Since the framework developed by Todorov (2005) is used, sensorimotor noise could have been readily considered. One could also assume that motor and sensory noise increase in microgravity, and the model could inform on how microgravity affects the number of submovements or endpoint variance due to sensorimotor noise changes, for instance.

      c) Finally, how does the model distinguish the feedforward and feedback components of the motor command that are discussed in the paper, given that the model only yields a feedback control law? Does 'feedforward' refer to the motor plan here (i.e., the prefixed duration and arguably the precomputed feedback gain)?

      Response (8): We thank the reviewer for raising these important and technically insightful points regarding our modeling framework. We first clarify the structure of the model and key assumptions, and then address the specific questions in points (a)–(c) below.

      We used Todorov’s (2005) stochastic optimal control method to compute a finite-horizon LQG policy under sensory noise and signal-dependent motor noise (state noise set to zero). The cost function is: (see details in updated Methods). The resulting time-varying gains {L<sub>k</sub>, K<sub>k</sub>} correspond to the feedforward mapping and the feedback correction gain, respectively. The control law can be expressed as:

      where u<sub>k</sub> is the control input, is the nominal planned state, is the estimated state, L<sub>k</sub> is the feedforward (nominal) control associated with the planned trajectory, and K<sub>k</sub> is the time-varying feedback gain that corrects deviations from the plan.

      To define the motor plan for comparison with behavior, we simulate the deterministic open-loop trajectory by turning off noise and disabling feedback corrections, i.e., . In this framework, “feedforward” refers to this nominal motor plan. Thus, sensory and signal-dependent noise influence the computed policy (via the gains), but are not injected when generating the nominal trajectory. This mirrors the minimum-jerk practice used to obtain nominal kinematics in prior utility-based work (Shadmehr, 2016), while optimal control provides a more physiologically grounded nominal plan. In the revision, we have updated the equations, provided more modeling details, and moved the model description to the main text to reduce possible confusion.

      In the implementation of the “mass underestimation” condition, the mass used to compute the policy is the underestimated mass (), whereas the actual mass is used when simulating the feedforward trajectories. Corrective submovements are analyzed separately and are not required for the planning-deficit findings reported here.

      Answers to the three specific questions:

      a) We mistakenly wrote a continuous-time infinite-horizon cost function in our original manuscript, whereas our controller is actually implemented as a discrete-time finite-horizon LQG with a terminal cost, over a horizon set by the utility-based optimal movement duration T<sub>opt</sub>. The underestimated mass is used in both the utility model (to determine T<sub>opt</sub>) and in the control computation (i.e., internal model), while the true mass is used when simulating the movement. This mismatch captures the central idea of feedforward planning based on an incorrect internal model.

      b) As described, our model includes signal-dependent motor noise and sensory noise, following Todorov (2005). We also evaluated whether increased noise levels in microgravity could account for the observed behavioral changes. Simulation results showed that increasing either source of noise did not alter the main conclusions or reverse the trends in our key metrics. Moreover, our experimental data showed no significant increase in endpoint variability in microgravity (see analyses and results in Figure 2—figure supplement 3 & 4), making it unlikely that increased sensorimotor noise alone accounts for the observed slowing and submovement changes.

      c) In our framework, the time-varying gains {L<sub>k</sub>, K<sub>k</sub>} define the feedforward and feedback components of the control policy. While both gains are computed based on a stochastic optimal control formulation (including noise), for comparison with behavior we simulate only the nominal feedforward plan, by turning off both noise and feedback: . This defines a deterministic open-loop trajectory, which we use to capture planning-level effects such as peak timing shifts under mass underestimation. Feedback corrections via gains exist in the full model but are not involved in these specific analyses. We clarified this modeling choice and its behavioral relevance in the revised text.

      We have updated the equations and moved the model description into the main text in the revised manuscript to avoid confusion.
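The mismatch described in (a) and (c), where the plan is computed with an underestimated mass and then executed by the true mass, can be illustrated with a minimal sketch. This is not the model used in the manuscript: it assumes a 1-D point mass, a noise-free finite-horizon LQR with a soft terminal penalty, and hypothetical values for the masses and cost weights (`M_TRUE`, `M_HAT`, `r`, `w_pos`, `w_vel`). Only the 12 cm amplitude, the roughly 350 ms duration, and the 100 Hz step are taken from the text.

```python
import numpy as np


def lqr_gains(mass, dt, n_steps, r=1e-6, w_pos=100.0, w_vel=1.0):
    """Backward Riccati recursion for a 1-D point mass with a terminal cost only."""
    A = np.array([[1.0, dt], [0.0, 1.0]])   # state: [position, velocity]
    B = np.array([[0.0], [dt / mass]])      # control: force
    S = np.diag([w_pos, w_vel])             # soft terminal penalty (hypothetical weights)
    R = np.array([[r]])                     # effort penalty (hypothetical weight)
    gains = []
    for _ in range(n_steps):
        K = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)
        S = A.T @ S @ (A - B @ K)
        gains.append(K)
    return A, B, gains[::-1]                # reorder gains from first step to last


def plan_commands(mass_hat, target, dt, n_steps):
    """Roll the policy out on the internal model to obtain the nominal command sequence."""
    A, B, gains = lqr_gains(mass_hat, dt, n_steps)
    goal = np.array([target, 0.0])
    x = np.zeros(2)
    commands = []
    for K in gains:
        u = -(K @ (x - goal)).item()
        commands.append(u)
        x = A @ x + B[:, 0] * u
    return np.array(commands)


def replay_open_loop(commands, mass, dt):
    """Apply a fixed command sequence to a plant of the given mass (no feedback, no noise)."""
    pos, vel = 0.0, 0.0
    speeds = []
    for u in commands:
        pos, vel = pos + dt * vel, vel + dt * u / mass
        speeds.append(vel)
    return pos, np.array(speeds)


DT, N_STEPS, TARGET = 0.01, 35, 0.12        # 100 Hz, ~350 ms, 12 cm
M_TRUE, M_HAT = 1.5, 1.05                   # hypothetical: internal model is 70% of true mass

commands = plan_commands(M_HAT, TARGET, DT, N_STEPS)
planned_pos, planned_speed = replay_open_loop(commands, M_HAT, DT)
actual_pos, actual_speed = replay_open_loop(commands, M_TRUE, DT)
print(planned_pos, actual_pos, actual_speed.max() / planned_speed.max())
```

Because this sketch is linear and starts from rest, replaying the same commands on the heavier plant scales displacement and speed by `M_HAT / M_TRUE`. It therefore shows only the undershoot of the feedforward plan. The shifts in peak timing discussed in the response depend on the utility-based choice of movement duration, which is not modeled here.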

      (3) Brevity of movements and speed-accuracy trade-off

      The tested movements are much faster (average duration approx. 350 ms) than similar self-paced movements that have been studied in other works (e.g., Wang et al., J Neurophysiology, 2016; Berret et al., PLOS Comp Biol, 2021, where movements can last about 900-1000 ms). This is consistent with the instructions to reach quickly and accurately, in line with a speed-accuracy trade-off. Was this instruction given to highlight the inertial effects related to the arm's anisotropy? One may, however, wonder if the same results would hold for slower self-paced movements (are they also performed with reduced speed compared to Earth performance?). Moreover, a few other important questions might need to be addressed for completeness: how can one ensure that astronauts remembered this instruction during the flight? (Could the control group move faster because they better remembered the instruction?) Did the taikonauts perform the experiment on their own during the flight, or did one taikonaut assume the role of the experimenter?

      Response (9): Thanks for highlighting the brevity of movements in our experiment. Our intention in emphasizing fast movements is to rigorously test whether movement is indeed slowed down in microgravity. The observed prolonged movement duration clearly shows that microgravity affects people’s movement duration, even when they are pushed to move fast. The second reason for using fast movement is to highlight that feedforward control is affected in microgravity. Mass underestimation specifically affects feedforward control in the first place, shown by the microgravity-related changes in peak velocity/acceleration. Slow movement would inevitably have online corrections that might obscure the effect of mass underestimation. Note that movement slowing is not only observed in our speed-emphasized reaching task, but also in whole-arm pointing in other astronauts’ studies (Berger, 1997; Sangals, 1999), which have been quoted in our paper. We thus believe these findings are generalizable.

      Regarding the consistency of instructions: all our experiments conducted in the Tiangong space station were monitored in real time by experimenters in the control center located in Beijing. The task instructions were presented on the initial display of the data acquisition application and ample reading time was allowed. All the pre-, in-, and post-flight test sessions were administered by the same group of personnel with the same instruction. It is common for astronauts to serve as both participants and experimenters, and they were well trained for this role on the ground. Note that we had multiple pre-flight test sessions to familiarize them with the task. All these rigorous measures were in place to obtain high-quality data. In the revision, we included these experimental details for readers who are not familiar with space studies, and provided the rationale for emphasizing fast movements.

      Citations:

      Berger, M., Mescheriakov, S., Molokanova, E., Lechner-Steinleitner, S., Seguer, N., & Kozlovskaya, I. (1997). Pointing arm movements in short- and long-term spaceflights. Aviation, Space, and Environmental Medicine, 68(9), 781–787.

      Sangals, J., Heuer, H., Manzey, D., & Lorenz, B. (1999). Changed visuomotor transformations during and after prolonged microgravity. Experimental Brain Research, 129(3), 378–390.

      (4) No learning effect

      This is a surprising effect, as mentioned by the authors. Other studies conducted in microgravity have indeed revealed an optimal adaptation of motor patterns in a few dozen trials (e.g., Gaveau et al., eLife, 2016). Perhaps the difference is again related to single-joint versus multi-joint movements. This should be better discussed given the impact of this claim. Typically, why would a "sensory bias of bodily property" persist in microgravity and be a "fundamental constraint of the sensorimotor system"?

      Response (10): We believe that the presence or absence of adaptation between our study and Gaveau et al.’s study cannot be simply attributed to single-joint versus multi-joint movements. Their adaptation concerned incorporating microgravity into movement control to minimize effort, whereas ours concerned accurately perceiving body mass. Gaveau et al.’s task involved large-amplitude vertical reaching, a scenario in which gravity strongly affects joint torques and movement execution. Thus, adaptation to microgravity can lead to better execution, providing a strong incentive for learning. By contrast, our task consisted of small-amplitude horizontal movements, where the gravitational influence on biomechanics is minimal.

      More importantly, we believe the lack of adaptation for mass underestimation is not totally surprising. When an inertial change is perceived (such as an extra weight attached to the forearm, as in previous motor adaptation studies), people can adapt their reaching within tens of trials. In that case, sensory cues are veridical, as they correctly signal the inertial perturbation. However, in microgravity, reduced gravitational pull and proprioceptive inputs constantly inform the controller that the body mass is less than its actual magnitude. In other words, sensory cues in space are misleading for estimating body mass. The resulting sensory bias prevents the sensorimotor system from adapting. Our initial explanation on this matter was too brief; we expanded it in the revised Discussion.

      Reviewer #3 (Public review):

      Summary:

      The authors describe an interesting study of arm movements carried out in weightlessness after a prolonged exposure to the so-called microgravity conditions of orbital spaceflight. Subjects performed radial point-to-point motions of the fingertip on a touch pad. The authors note a reduction in movement speed in weightlessness, which they hypothesize could be due to either an overall strategy of lowering movement speed to better accommodate the instability of the body in weightlessness or an underestimation of body mass. They conclude for the latter, mainly based on two effects. One, slowing in weightlessness is greater for movement directions with higher effective mass at the end effector of the arm. Two, they present evidence for an increased number of corrective submovements in weightlessness. They contend that this provides conclusive evidence to accept the hypothesis of an underestimation of body mass.

      Strengths:

      In my opinion, the study provides a valuable contribution, the theoretical aspects are well presented through simulations, the statistical analyses are meticulous, the applicable literature is comprehensively considered and cited, and the manuscript is well written.

      Weaknesses:

      Nevertheless, I am of the opinion that the interpretation of the observations leaves room for other possible explanations of the observed phenomenon, thus weakening the strength of the arguments.

      First, I would like to point out an apparent (at least to me) divergence between the predictions and the observed data. Figures 1 and S1 show that the difference between predicted values for the 3 movement directions is almost linear, with predictions for 90º midway between predictions for 45º and 135º. The effective mass at 90º appears to be much closer to that of 45º than to that of 135º (Figure S1A). But the data shown in Figure 2 and Figure 3 indicate that movements at 90º and 135º are grouped together in terms of reaction time, movement duration, and peak acceleration, while both differ significantly from those values for movements at 45º.

      Furthermore, in Figure 4, the change in peak acceleration time and relative time to peak acceleration between 1g and 0g appears to be greater for 90º than for 135º, which appears to me to be at least superficially in contradiction with the predictions from Figure S1. If the effective mass is the key parameter, wouldn't one expect as much difference between 90º and 135º as between 90º and 45º? It is true that peak speed (Figure 3B) and peak speed time (Figure 4B) appear to follow the ordering according to effective mass, but is there a mathematical explanation as to why the ordering is respected for velocity but not acceleration? These inconsistencies weaken the authors' conclusions and should be addressed.

      Response (11): Indeed, the model predicts an almost equal separation between 45° and 90° and between 90° and 135°, while the data indicate that the spacing between 45° and 90° is much smaller than between 90° and 135°. We do not regard the divergence as evidence undermining our main conclusion, for two reasons. 1) The model is a simplification of the actual situation. For example, the model simulates an ideal case of moving a point mass (effective mass) without friction and without considering Coriolis and centripetal torques. 2) Our study does not make quantitative predictions of all the key kinematic measures; that would require model fitting, parameter estimation, and posture-constrained reaching experiments. Instead, our study uses well-established (though simplified) models to qualitatively predict the overall behavioral pattern we would observe. For this purpose, our results are well in line with our expectations: though we did not find equal spacing between direction conditions, we do confirm that the key kinematic measures (Figure 2 and Figure 3, as questioned) show consistent directional trends between model predictions and empirical data. We added new analysis results on this matter: the directional effect we observed (how the key measures changed in microgravity across direction conditions) is significantly correlated with our model predictions in most cases. Please check our detailed response (2) above. These results are also added in the revision.

      We also highlight in the revision that our modeling is not intended to quantitatively predict reaching behavior in space, but to show qualitatively that mass underestimation and the conservative control strategy make divergent predictions about key kinematic measures of fast reaching.

      Then, to strengthen the conclusions, I feel that the following points would need to be addressed:

      (1) The authors model the movement control through equations that derive the input control variable in terms of the force acting on the hand and treat the arm as a second-order low-pass filter (Equation 13). Underestimation of the mass in the computation of a feedforward command would lead to a lower-than-expected displacement to that command. But it is not clear if and how the authors account for a potential modification of the time constants of the 2nd order system. The CNS does not effectuate movements with pure torque generators. Muscles have elastic properties that depend on their tonic excitation level, reflex feedback, and other parameters. Indeed, Fisk et al. showed variations of movement characteristics consistent with lower muscle tone, lower bandwidth, and lower damping ratio in 0g compared to 1g. Could the variations in the response to the initial feedforward command be explained by a misrepresentation of the limbs' damping and natural frequency, leading to greater uncertainty about the consequences of the initial command? This would still be an argument for unadapted feedforward control of the movement, leading to the need for more corrective movements. But it would not necessarily reflect an underestimation of body mass.

      Fisk, J., Lackner, J. R., & DiZio, P. (1993). Gravitoinertial force level influences arm movement control. Journal of Neurophysiology, 69(2), 504-511.

      Response (12): We agree that muscle properties, tonic excitation level, and proprioception-mediated reflexes all contribute to reaching control. The study by Fisk et al. (1993) indeed showed that arm movement kinematics change, possibly owing to lower muscle tone and/or damping. However, reduced muscle damping and reduced spindle activity are more likely to affect feedback-based movements. In Fisk et al.’s study, participants performed continuous arm movements with eyes closed; thus their movements largely relied on proprioceptive control. Our major findings concern feedforward control, i.e., the reduced and “advanced” peak velocity/acceleration in discrete, ballistic reaching movements. Note that peak acceleration occurs as early as approximately 90-100 ms into the movement, clearly showing that feedforward control is affected, a different effect from Fisk et al.’s findings. It is unlikely that people “advanced” their peak velocity/acceleration because they felt the need for more corrective movements later. Thus, underestimation of body mass remains the most plausible explanation.

      (2) The movements were measured by having the subjects slide their finger on the surface of a touch screen. In weightlessness, the implications of this contact are expected to be quite different than those on the ground. In weightlessness, the taikonauts would need to actively press downward to maintain contact with the screen, while on Earth, gravity will do the work. The tangential forces that resist movement due to friction might therefore be different in 0g. This could be particularly relevant given that the effect of friction would interact with the limb in a direction-dependent fashion, given the anisotropy of the equivalent mass at the fingertip evoked by the authors. Is there some way to discount or control for these potential effects?

      Response (13): We agree that friction might play a role here, but normal interaction with a touch screen typically involves friction forces between 0.1 N and 0.5 N (e.g., Ayyildiz et al., 2018). We believe that the directional variation of the friction is even smaller than 0.1 N, which is very small compared to the force used to accelerate the arm for the reaching movement (10-15 N). Thus, friction anisotropy is unlikely to explain our data. Because readers might share this concern, we have added some discussion of the possible effect of friction.
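As a rough check on the magnitudes quoted in this response, the friction terms can be expressed as fractions of the accelerating force. The numbers are those stated above. Pairing the largest friction value with the smallest accelerating force is a worst-case assumption made here for illustration only.

```latex
\frac{F_{\text{friction}}^{\max}}{F_{\text{accel}}^{\min}}
  = \frac{0.5\,\mathrm{N}}{10\,\mathrm{N}} = 5\%,
\qquad
\frac{\Delta F_{\text{friction}}^{\text{directional}}}{F_{\text{accel}}^{\min}}
  < \frac{0.1\,\mathrm{N}}{10\,\mathrm{N}} = 1\%.
```

Under these figures, the directional component of friction would amount to less than about one percent of the force driving the reach.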

      Citation: Ayyildiz M, Scaraggi M, Sirin O, Basdogan C, Persson BNJ. Contact mechanics between the human finger and a touchscreen under electroadhesion. Proc Natl Acad Sci U S A. 2018 Dec 11;115(50):12668-12673.

      (3) The carefully crafted modelling of the limb neglects, nevertheless, the potential instability of the base of the arm. While the taikonauts were able to use their left arm to stabilize their bodies, it is not clear to what extent active stabilization with the contralateral limb can reproduce the stability of the human body seated in a chair in Earth gravity. Unintended motion of the shoulder could account for a smaller-than-expected displacement of the hand in response to the initial feedforward command and/or greater propensity for errors (with a greater need for corrective submovements) in 0g. The direction of movement with respect to the anchoring point could lead to the dependence of the observed effects on movement direction. Could this be tested in some way, e.g., by testing subjects on the ground while standing on an unstable base of support or sitting on a swing, with the same requirement to stabilize the torso using the contralateral arm?

      Response (14): Body stabilization is always a challenge for human movement studies in space. We minimized its potential confounding effects by using left-hand grasping and foot straps for postural support throughout the experiment. We think shoulder stability is an unlikely explanation because unexpected shoulder instability should not affect the feedforward (early) part of the ballistic reaching movement: the reduced peak acceleration and its early peak were observed at about 90-100 ms after movement initiation. This effect is too early to be explained by an unexpected stability issue. This argument is now mentioned in the revised Discussion.

      The arguments for an underestimation of body mass would be strengthened if the authors could address these points in some way.

      Recommendations for the authors:

      Reviewing Editor Comments:

      General recommendation

      Overall, the reviewers agreed this is an interesting study with an original and strong approach. Nonetheless, there were significant weaknesses identified. The main criticism is that there is insufficient evidence for the claim that the movement slowing is due to mass underestimation, rather than other explanations for the increased feedback corrections. To bolster this claim, the reviewers have requested a deeper quantitative analysis of the directional effect and comparison to model predictions. They have also suggested that a 2-dof arm model could be used to predict how mass underestimation would influence multi-joint kinematics, and this should be compared to the data. Alternatively, or additionally, a control experiment could be performed (described in the reviews). We do realize that some of these options may not be feasible or practical. Ultimately, we leave it to you to determine how best to strengthen and solidify the argument for mass underestimation, rather than other causes.

      As an alternative approach, you could consider tempering the claim regarding mass underestimation and focus more on the result that slower movements in microgravity are not simply a feedforward rescaling of the movement trajectories, but rather have greater feedback corrections. In this case, the reviewers feel it would still be critical to explain and discuss potential reasons for the corrections beyond mass underestimation.

      We hope that these points are addressable, either with new analyses, experiments, or with a tempering of the claims. Addressing these points would help improve the eLife assessment.

      Reviewer #1 (Recommendations for the authors):

      (1) Move model descriptions to the main text to present modelling choices in more detail

      Response (15): Thank you for the suggestion. We have moved the model descriptions to the main text to present the modeling choices in more detail and to allow readers to better cross-reference the analyses.

      (2) Perform quantitative comparisons of the directional effect with the model's predictions, and add raw kinematic traces to illustrate the effect in more detail.

      Response (16): Thanks for the suggestion. We have added a figure showing raw kinematics from a representative participant; please refer to Response (2) above for the comparisons of the directional effect.

      (3) Explore the effect of varying cost parameters in addition to mass estimation error to estimate the proportion of data explained by the underestimation hypothesis.

      Response (17): Thank you for the suggestion. This has already been done—please see Response (1) above.

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      (1) It must be justified early on why reaction times are being analyzed in this work. I understood later that it is to rule out any global slowing down of behavioral responses in microgravity.

      Response (18): Exactly, the RT results are informative about the absence of a global slowing down. Contrary to the conservative-strategy hypothesis, taikonauts did not show generalized slowing; they actually had faster reaction times during spaceflight, which is incompatible with a generalized slowing strategy. Thanks for pointing this out; we now justify this early in the text.

      (2) Since the results are presented before the methods, I suggest stressing from the beginning that the reaching task is performed on a tablet and mentioning the instructions given to the participants, to improve the reading experience. The "beep" and "no beep" conditions also arise without obvious justification while reading the paper.

      Response (19): Great suggestions. We now provide some experimental details and rationale at the beginning of the Results.

      (3) Figure 1C: The vel profiles are not returning to 0 at the end, why? Is it because the feedback gain is computed based on the underestimated mass or because a feedforward controller is applied here? Is it compatible with the experimental velocity traces?

      Response (20): Figure 1C shows the forward simulation under the optimal control policy. In our LQG formulation, the terminal velocity is softly penalized (finite weight) rather than hard-constrained to zero; with a fixed horizon, the optimal solution can therefore end with a small residual velocity.

      In the behavioral data, the hand does come to rest: this is achieved by corrective submovements during the homing phase.

      (4) Left-skewed -> I believe this is right-skewed since the peak velocity is earlier.

      Response (21): Yes, it should be right-skewed; thanks for pointing that out.

      (5) What was the acquisition frequency of the positional data points? (on the tablet).

      Response (22): The sampling frequency is 100 Hz. Thanks for pointing that out; we’ve added this information to the Methods.

      (6) Figure S1. The planned duration seems to be longer than in the experiment (it is more around 500 ms for the 135-degree direction in simulation versus less than 400 ms in the experiment). Why?

      Response (23): We apologize for a coding error that inadvertently multiplied the body-mass parameter by an extra factor, making the simulated mass too high. We have corrected the code, rerun the simulations, and updated Figures 1 and S1; all qualitative trends remain unchanged, and the revised movement durations (≈300–400 ms) are closer to the experimental values.

      (7) After Equation 13: "The control law is given by". This is not the control law, which should have a feedback form u=K*x in the LQ framework. This is just the dynamic equations for the auxiliary state and the force. Please double-check the model description.

      Response (24): Thank you for pointing this out. We have updated and refined all model equations and descriptions, and moved the model description from the Supplementary Materials to the main text; please see the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) I have a concern about the interpretation of the anisotropic "equivalent mass". From my understanding, the equivalent mass would be what an external actor would feel as an equivalent inertia if pushing on the end effector from the outside. But the CNS does not push on the arm with a pure force generator acting at the hand to effectuate movement. It applies torque around the joints by applying forces across joints with muscles, causing the links of the arm to rotate around the joints. If the analysis is carried out in joint space, is the effective rotational inertia of the arm also anisotropic with respect to the direction of the movement of the hand? In other words, can the authors reassure me that the simulations are equivalent to an underestimation of the rotational inertia of the links when applied to the joints of the limb? It could be that these are mathematically the same; I have not delved into the mathematics to convince myself either way. But I would appreciate it if the authors could reassure me on this point.

      Response (25): Thank you for raising this point. In our work, “equivalent mass” denotes the operational-space inertia projected along the hand-movement direction u, computed as:

      This formulation describes the effective mass perceived at the end effector along a given direction, and is standard in operational-space control.

      Although the CNS may encode the motor command as either joint torques or endpoint forces, the resulting execution is equivalent in either case, since force and torque are related by τ = J<sup>T</sup>F, where J is the arm Jacobian. For small excursions as investigated here, this makes the directional anisotropy in endpoint inertia consistent with the anisotropy of the effective joint-space inertia required to produce the same endpoint motion. Conceptually, therefore, our “mass underestimation” manipulation in operational space corresponds to underestimating the required joint-space inertia mapped through the Jacobian. Since our behavioral data are hand positions, the operational-space representation is the most direct and appropriate choice for modeling.
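To make the notion of a direction-dependent effective mass concrete, the sketch below uses a common textbook form from operational-space control, m(u) = 1 / (uᵀ J M⁻¹ Jᵀ u), applied to a planar two-link arm. This form is assumed here and is not necessarily the exact expression used in the manuscript. The segment parameters, the posture (`Q1`, `Q2`), and the function names are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical upper-arm / forearm parameters (illustrative, not from the study)
M1, M2 = 2.0, 1.5          # segment masses (kg)
L1, L2 = 0.30, 0.33        # segment lengths (m)
R1, R2 = 0.13, 0.16        # joint-to-centre-of-mass distances (m)
I1, I2 = 0.025, 0.020      # moments of inertia about each centre of mass (kg m^2)


def joint_inertia(q2):
    """Joint-space inertia matrix M(q) of a planar two-link arm."""
    c2 = np.cos(q2)
    m11 = I1 + I2 + M1 * R1**2 + M2 * (L1**2 + R2**2 + 2 * L1 * R2 * c2)
    m12 = I2 + M2 * (R2**2 + L1 * R2 * c2)
    m22 = I2 + M2 * R2**2
    return np.array([[m11, m12], [m12, m22]])


def jacobian(q1, q2):
    """Hand Jacobian mapping joint velocities to endpoint velocities."""
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [L1 * c1 + L2 * c12, L2 * c12]])


def effective_mass(q1, q2, direction_deg):
    """Effective endpoint mass along a unit direction u: 1 / (u^T J M^-1 J^T u)."""
    angle = np.radians(direction_deg)
    u = np.array([np.cos(angle), np.sin(angle)])
    J = jacobian(q1, q2)
    mobility = J @ np.linalg.solve(joint_inertia(q2), J.T)   # J M^-1 J^T
    return 1.0 / (u @ mobility @ u)


Q1, Q2 = np.radians(45.0), np.radians(90.0)   # hypothetical shoulder and elbow angles
masses = {d: effective_mass(Q1, Q2, d) for d in (45, 90, 135)}
for direction, mass in masses.items():
    print(f"{direction:3d} deg: {mass:.2f} kg")
```

With this particular hypothetical posture, the effective mass is smallest at 45° and largest at 135°. The ordering and the spacing depend on the assumed posture and segment parameters, so the output should be read as a qualitative illustration of anisotropy, not as a reproduction of the values in Figure S1.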

      (2) I would also like to suggest one more level of analysis to test their hypothesis. The authors decomposed the movements into submovements and measured the prevalence of corrective submovements in weightlessness vs. normal gravity. The increase in corrective submovements is consistent with the hypothesis of a misestimation of limb mass, leading to an unexpectedly smaller displacement due to the initial feedforward command, leading to the need for corrections, leading to an increased overall movement duration. According to this hypothesis, however, the initial submovement, while resulting in a smaller than expected displacement, should have the same duration as the analogous movements performed on Earth. The authors could check this by analyzing the duration of the extracted initial submovements.

      Response (26): We appreciate the reviewer’s suggestion regarding the analysis of the initial submovement duration. In our decomposition framework, each submovement is modeled as a symmetric log-normal (bell-shaped) component, such that the time to peak speed is always half of the component duration. Thus, the initial submovement duration is directly reflected in the initial submovement peak-speed time already reported in our original manuscript (Figure 5F).

      However, we respectfully disagree with the assumption that mass underestimation would necessarily yield the same submovement duration as on Earth. Under mass underestimation, the movement is effectively under-actuated, and the initial submovement can terminate prematurely, leading to a shorter duration. This is indeed what we observed in the data. Therefore, our reported metrics already address the reviewer’s proposal and support the conclusion that mass underestimation reduces the initial submovement duration in microgravity. Per your suggestion, we have now added a sentence explaining to the reader that the initial submovement peak-speed time reflects the duration of the initial submovement.

      Some additional minor suggestions:

      (1) I believe that it is important to include the data from the control subjects, in some form, in the main article. Perhaps shading behind the main data from the taikonauts to show similarities or differences between groups. It is inconvenient to have to go to the supplementary material to compare the two groups, which is the main test of the experiment.

      Response (27): Thank you for the suggestion. For all the core performance variables, the control group showed flat patterns, with no changes across test sessions at all. Thus, including these figures (together with null statistical results) in the main text would obscure our central message, especially given the expanded length of the revised manuscript (we added model details and new analysis results). Instead, following eLife’s format, we have reorganized the Supplementary Material so that each experimental figure has a corresponding supplementary figure showing the control data. This way, readers can quickly locate the control results and directly compare them with the experimental data, while keeping the main text focused.

      (2) "Importantly, sensory estimate of bodily property in microgravity is biased but evaded from sensorimotor adaptation, calling for an extension of existing theories of motor learning." Perhaps "immune from" would be a better choice of words.

      Response (28): Thanks for the suggestion, we edited our text accordingly.

      (3) "First, typical reaching movement exhibits a symmetrical bell-shaped speed profile, which minimizes energy expenditure while maximizing accuracy according to optimal control principles (Todorov, 2004)." While Todorov's analysis is interesting and well accepted, it might be worthwhile citing the original source on the phenomenon of bell-shaped velocity profiles that minimize jerk (derivative of acceleration) and therefore, in some sense, maximize smoothness. Flash and Hogan, 1985.

      Response (29): Thanks for the suggestion, we added the citation of minimum jerk.
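
      As a concrete illustration of the symmetric bell-shaped profile discussed in this exchange, the sketch below evaluates the minimum-jerk speed profile of Flash and Hogan (1985). It is not the authors' decomposition model (their components are described as log-normal); the function names and the example duration and amplitude are hypothetical. It only shows that, for a symmetric profile, peak speed falls at half the movement duration.

```python
def min_jerk_speed(t, duration, amplitude):
    """Speed of a minimum-jerk reach at time t, for 0 <= t <= duration.

    Derived from x(t) = A * (10 s^3 - 15 s^4 + 6 s^5) with s = t / duration.
    """
    s = t / duration
    return amplitude / duration * 30.0 * s ** 2 * (1.0 - s) ** 2


def peak_speed_time(duration, amplitude, n=1000):
    """Locate the time of peak speed on a uniform grid of n + 1 samples."""
    times = [duration * i / n for i in range(n + 1)]
    return max(times, key=lambda t: min_jerk_speed(t, duration, amplitude))


if __name__ == "__main__":
    # Hypothetical reach: 0.1 m in 0.5 s.
    t_peak = peak_speed_time(0.5, 0.1)
    print(t_peak, min_jerk_speed(t_peak, 0.5, 0.1))
```

      For any symmetric profile of this kind, halving the time to peak speed halves the inferred duration, which is the link the authors draw between peak-speed time and submovement duration.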

      (4) "Post-hoc analyses revealed slower reaction times for the 45° direction compared to both 90° (p < 0.001, d = 0.293) and 135° (p = 0.003, d = 0.284). Notably, reactions were faster during the in-flight phase compared to pre-flight (p = 0.037, d = 0.333), with no significant difference between in-flight and post-flight phases (p = 0.127)." What can one conclude from this?

      Response (30): Although these decreases reached statistical significance, their magnitudes were small. The parallel pattern across groups suggests the effect is not driven by microgravity, but is more plausibly a mild learning/practice effect. We now mention this in the Discussion.

      (5) "In line with predictions, peak acceleration appeared significantly earlier in the 45° direction than other directions (45° vs. 90°, p < 0.001, d = 0.304; 45° vs. 135°, p < 0.001, d = 0.271)." Which predictions? Because the effective mass is greater at 45º? Could you clarify the prediction?

      Response (31): We should be more specific here; thank you for raising this. The predictions are the ones about peak acceleration timing (shown in Fig. 1H). We have now modified this sentence to read:

      “In line with model predictions (Figure 1H), ….”.

      (6) Figure 2: Why do 45º movements have longer reaction times but shorter movement durations?

      Response (32): We appreciate your careful reading of the results. We believe this is possibly due to flexible motor control across conditions and trials, i.e., people tend to move faster when they react more slowly. This is reflected in the across-direction comparisons (as spotted by the reviewer here), and it is also seen both within and across participants: for both groups, we found a significant negative correlation between movement duration (MD) and reaction time (RT), both across and within individuals (Figure 2—figure supplement 5). This finding indicates that participants moved faster when their RT was slower, and vice versa. This flexible motor adjustment, likely due to the task requirement for rapid movements, remained consistent during spaceflight.

    1. eLife Assessment

      In this useful study, the authors conducted an impressive amount of atomistic simulations with a realistic asymmetric lipid bilayer to probe how the HIV-1 envelope glycoprotein (Env) transmembrane domain, cytoplasmic tail, and membrane environment influence ectodomain orientation and antibody epitope exposure. The simulations convincingly show that ectodomain motion is dominated by tilting relative to the membrane and explicitly demonstrate the role of membrane asymmetry in modulating the protein conformation and orientation. However, due to the qualitative nature of the conducted analyses, the evidence for the coupling between membrane-proximal regions and the antigenic surface is considered incomplete. With stronger integration of prior experimental and computational literature, this work has the potential to serve as a reference for how Env behaves in a realistic, glycosylated, membrane-embedded context.

    2. Reviewer #1 (Public review):

      Summary:

      In the manuscript "Conformational Variability of HIV-1 Env Trimer and Viral Vulnerability", the authors study the fully glycosylated HIV-1 Env protein using an all-atom forcefield. It combines long all-atom simulations of Env in a realistic asymmetric bilayer with careful data analysis. This work clarifies how the CT domain modulates the overall conformation of the Env ectodomain and characterizes different MPER-TMD conformations. The authors also carefully analyze the accessibility of different antibodies to the Env protein.

      Strengths:

      This paper is state-of-the-art, given the scale of the system and the sophistication of the methods. The biological question is important, the methodology is rigorous, and the results will interest a broad audience.

      Weaknesses:

      The manuscript lacks a discussion of previous studies. The authors should consider addressing or comparing their work with the following points:

      (1) Tilting of the Env ectodomain has also been reported in previous experimental and theoretical work:

      https://doi.org/10.1101/2025.03.26.645577

      (2) A previous all-atom simulation study has characterized the conformational heterogeneity of the MPER-TMD domain:

      https://doi.org/10.1021/jacs.5c15421

      (3) Experimental studies have shown that MPER-directed antibodies recognize the prehairpin intermediate rather than the prefusion state:

      https://doi.org/10.1073/pnas.1807259115

      (4) How does the CT domain modulate the accessibility of these antibodies studied? The authors are in a strong position to compare their results with the following experimental study:

      https://doi.org/10.1126/science.aaa9804

    3. Reviewer #2 (Public review):

      (1) Summary

      In this work, the authors aim to elucidate how a viral surface protein behaves in a membrane environment and how its large-scale motions influence the exposure of antibody-binding sites. Using long-timescale, all-atom molecular dynamics simulations of a fully glycosylated, full-length protein embedded in a virus-like membrane, the study systematically examines the coupling between ectodomain motion, transmembrane orientation, membrane interactions, and epitope accessibility. By comparing multiple model variants that differ in cleavage state, initial transmembrane configuration, and presence of the cytoplasmic tail, the authors aim to identify general features of protein-membrane dynamics relevant to antibody recognition.

      (2) Strengths

      A major strength of this study is the scope and ambition of the simulations. The authors perform multiple microsecond-scale simulations of a highly complex, biologically realistic system that includes the full ectodomain, transmembrane region, cytoplasmic tail, glycans, and a heterogeneous membrane. Such simulations remain technically challenging, and the work represents a substantial computational and methodological effort.

      The analysis provides a clear and intuitive description of large-scale protein motions relative to the membrane, including ectodomain tilting and transmembrane orientation. The finding that the ectodomain explores a wide range of tilt angles while the transmembrane region remains more constrained, with limited correlation between the two, offers useful conceptual insight into how global motions may be accommodated without large rearrangements at the membrane anchor.

      Another strength is the explicit consideration of membrane and glycan steric effects on antibody accessibility. By evaluating multiple classes of antibodies targeting distinct regions of the protein, the study highlights how membrane proximity and glycan dynamics can differentially influence access to different epitopes. This comparative approach helps place the results in a broader immunological context and may be useful for readers interested in antibody recognition or vaccine design.

      Overall, the results are internally consistent across multiple simulations and model variants, and the conclusions are generally well aligned with the data presented.

      (3) Weaknesses

      The main limitations of the study relate to sampling and model dependence, which are inherent challenges for simulations of this size and complexity. Although the simulations are long by current standards, individual trajectories explore only portions of the available conformational space, and several conclusions rely on pooling data across a limited number of replicas. This makes it difficult to fully assess the robustness of some quantitative trends, particularly for rare events such as specific epitope accessibility states.

      In addition, several aspects of the model construction, including the treatment of missing regions, loop rebuilding, and initial configuration choices, are necessarily approximate. While these approaches are reasonable and well motivated, the extent to which some conclusions depend on these modeling choices is not always fully clear from the current presentation.

      Finally, the analysis of antibody accessibility is based on geometric and steric criteria, which provide a useful first-order approximation but do not capture potential conformational adaptations of antibodies or membrane remodeling during binding. As a result, the accessibility results should be interpreted primarily as model-based predictions rather than definitive statements about binding competence.

      Despite these limitations, the study provides a valuable and carefully executed contribution, and its datasets and analytical framework are likely to be useful to others interested in protein-membrane interactions and antibody recognition.

    4. Reviewer #3 (Public review):

      Summary:

      This study uses large-scale all-atom molecular dynamics simulations to examine the conformational plasticity of the HIV-1 envelope glycoprotein (Env) in a membrane context, with particular emphasis on how the transmembrane domain (TMD), cytoplasmic tail (CT), and membrane environment influence ectodomain orientation and antibody epitope exposure. By comparing Env constructs with and without the CT, explicitly modeling glycosylation, and embedding Env in an asymmetric lipid bilayer, the authors aim to provide an integrated view of how membrane-proximal regions and lipid interactions shape Env antigenicity, including epitopes targeted by MPER-directed antibodies.

      Strengths:

      A key strength of this work is the scope and realism of the simulation systems. The authors construct a very large, nearly complete Env-scale model that includes a glycosylated Env trimer embedded in an asymmetric bilayer, enabling analysis of membrane-protein interactions that are difficult to capture experimentally. The inclusion of specific glycans at reported sites, and the focus on constructs with and without the CT, are well motivated by existing biological and structural data.

      The simulations reveal substantial tilting motions of the ectodomain relative to the membrane, with angles spanning roughly 0-30° (and up to ~50° in some analyses), while the ectodomain itself remains relatively rigid. This framing, that much of Env's conformational variability arises from rigid-body tilting rather than large internal rearrangements, is an important conceptual contribution. The authors also provide interesting observations regarding asymmetric bilayer deformations, including localized thinning and altered lipid headgroup interactions near the TMD and CT, which suggest a reciprocal coupling between Env and the surrounding membrane.

      The analysis of antibody-relevant epitopes across the prefusion state, including the V1/V2 and V3 loops, the CD4 binding site, and the MPER, is another strength. The study makes effective use of existing experimental knowledge in this context, for example, by focusing on specific glycans known to occlude antibody binding, to motivate and interpret the simulations.

      Weaknesses:

      While the simulations are technically impressive, the manuscript would benefit from more explicit cross-validation against prior experimental and computational work throughout the Results and Discussion, and better framing in the introduction. Many of the reported behaviors, such as ectodomain tilting, TMD kinking, lipid interactions at helix boundaries, and aspects of membrane deformation, have been described previously in a range of MD studies of HIV Env and related constructs (e.g., PMC2730987, PMC2980712, PMC4254001, PMC4040535, PMC6035291, PMC12665260, PMID: 33882664, PMC11975376). Clearly situating the present results relative to these studies would strengthen the paper by clarifying where the simulations reproduce established behavior and where they extend it to more complete or realistic systems.

      A related limitation is that the work remains largely descriptive with respect to conformational coupling. Numerous experimental studies have demonstrated functional and conformational coupling between the TMD, CT, and the antigenic surface, with effects on Env stability, infectivity, and antibody binding (e.g., PMC4701381, PMC4304640, PMC5085267). In this context, the statement that ectodomain and TMD tilting motions are independent is a strong conclusion that is not fully supported by the analyses presented, particularly given the authors' acknowledgment that multiple independent simulations are required to adequately sample conformational space. More direct analyses of coupling, rather than correlations inferred from individual trajectories, would help align the simulations with the existing experimental literature. Given the scale of these simulations, a more thorough analysis of coupling could be this paper's most seminal contribution to the field.

      The choice of membrane composition also warrants deeper discussion. The manuscript states that it relies on a plasma membrane model derived from a prior simulation-based study, which itself is based on host plasma membrane (PMID: 35167752), but experimental analyses have shown that HIV virions differ substantially from host plasma membranes (e.g., PMC46679, PMC1413831, PMC10663554, PMC5039752, PMC6881329). In particular, virions are depleted in PC, PE, and PI, and enriched in phosphatidylserine, sphingomyelins, and cholesterol. These differences are likely to influence bilayer thickness, rigidity, and lipid-protein interactions and, therefore, may affect the generality of the conclusions regarding Env dynamics and antigenicity. Notably, the citation provided for membrane composition is a laboratory self-citation, a secondary source, rather than a primary experimental study on plasma membrane composition.

      Finally, there are pervasive issues with citation and methodological clarity. Several structural models are referred to only by PDB ID without citation, and in at least one case, a structure described as cryo-EM is in fact an NMR-derived model. Statements regarding residue flexibility, missing regions in structures, and comparisons to prior dynamics studies are often presented without appropriate references. The Methods section also lacks sufficient detail for a system of this size and complexity, limiting readers' ability to assess robustness or reproducibility.

      With stronger integration of prior experimental and computational literature, this work has the potential to serve as a valuable reference for how Env behaves in a realistic, glycosylated, membrane-embedded context. The simulation framework itself is well-suited for future studies incorporating mutations, strain variation, antibodies, inhibitors, or receptor and co-receptor engagement. In its current form, the primary contribution of the study is to consolidate and extend existing observations within a single, large-scale model, providing a useful platform for future mechanistic investigations.

    5. Author response:

      In response to the comments raised, we outline below the revisions we plan to make to strengthen the manuscript.

      First, we will expand the Introduction and Discussion sections to provide clearer comparison with prior experimental and computational studies of ectodomain tilting, MPER–TMD conformational heterogeneity, and membrane deformation, and to discuss how our simulations reproduce and extend these earlier observations.

      Second, we plan to add analyses that more directly assess the coupling between ectodomain and TMD motions. We will also revise the text to emphasize the limits imposed by sampling and model dependence and to discuss the potential benefits of enhanced sampling methods.

      Third, we will clarify the rationale for the chosen membrane composition and discuss how differences in lipid content between host plasma membranes and HIV virions may influence bilayer properties and Env dynamics.

      Fourth, we will supplement the Methods section to improve clarity and address issues of citation throughout the manuscript.

      Finally, we intend to deposit MD trajectories to a public research data repository to the extent permitted by available storage capacity.

    1. eLife Assessment

      This valuable study uses NAD(P)H fluorescence lifetime imaging (FLIM) to map metabolic states in the Drosophila brain. The authors reveal subtype-specific metabolic profiles in Kenyon cells and report learning-related changes, supported by solid evidence and careful methodology. However, the FLIM shifts observed after memory formation in α/β neurons are small and only weakly significant, so the ability of FLIM to detect subtle physiological changes still requires further validation. Nevertheless, this work provides a strong starting point and demonstrates the promising potential of FLIM for probing neural metabolism in vivo.

    2. Reviewer #1 (Public review):

      Summary:

      The authors present a novel usage of fluorescence lifetime imaging microscopy (FLIM) to measure NAD(P)H autofluorescence in the Drosophila brain, as a proxy for cellular metabolic/redox states. This new method relies on the fact that both NADH and NADPH are autofluorescent, with a different excitation lifetime depending on whether they are free (indicating glycolysis) or protein-bound (indicating oxidative phosphorylation). The authors successfully use this method in Drosophila to measure changes in metabolic activity across different areas of the fly brain, with a particular focus on the main center for associative memory: the mushroom body.

      Strengths:

      The authors have made a commendable effort to explain the technical aspects of the method in accessible language. This clarity will benefit both non-experts seeking to understand the methodology and researchers interested in applying FLIM to Drosophila in other contexts.

      Weaknesses:

      Despite being statistically significant, the learning-induced change in f-free in α/β Kenyon cells is minimal (a decrease from 0.76 to 0.73, with a high variability). It is unclear whether this small effect represents a meaningful shift in neuronal metabolic state.

      Whether this method can be valuable to examine the effects of long-term memory (after spaced or massed conditioning) remains to be established.

    3. Reviewer #2 (Public review):

      This revised manuscript presents a valuable application of NAD(P)H fluorescence lifetime imaging (FLIM) to study metabolic activity in the Drosophila brain. The authors reveal regional differences in oxidative and glycolytic metabolism, with particular emphasis on the mushroom body, a key center for associative learning and memory. They also report metabolic shifts in α/β Kenyon cells following classical conditioning, in line with their known role in energy-demanding memory processes.

      The study is well-executed and the authors have added more detailed methodological descriptions in this version, which strengthen the technical contribution. The analysis pipeline is rigorous, with careful curve fitting and appropriate controls. However, the metabolic shifts observed after conditioning are small and only weakly significant, raising questions about the sensitivity of FLIM for detecting subtle physiological changes. The authors acknowledge these limitations in the revised discussion, which helps place the findings in proper context.

      Despite this, the work provides a solid foundation for future applications of label-free FLIM in vivo and serves as a valuable technical resource for researchers interested in neural metabolism. Overall, this study represents a meaningful step toward integrating metabolic imaging with the study of neural activity and cognitive function.

    4. Reviewer #3 (Public review):

      This study investigates the characteristics of the autofluorescence signal excited by 740 nm 2-photon excitation, in the range of 420-500 nm, across the Drosophila brain. The fluorescence lifetime (FL) appears bi-exponential, with a short 0.4 ns time constant followed by a longer decay. The lifetime decay and the resulting parameter fits vary across the brain. The resulting maps reveal anatomical landmarks, which simultaneous imaging of genetically encoded fluorescent proteins helps identify. Past work has shown that the autofluorescence decay time course reflects the balance of the redox enzyme NAD(P)H vs. its protein bound form. The ratio of free to bound NADPH is thought to indicate relative glycolysis vs. oxidative phosphorylation, and thus shifts in the free-to-bound ratio may indicate shifts in metabolic pathways. The basics of this measure have been demonstrated in other organisms, and this study is the first to use the FLIM module of the STELLARIS 8 FALCON microscope from Leica to measure autofluorescence lifetime in the brain of the fly. Methods include registering brains of different flies to a common template and masking out anatomical regions of interest using fluorescence proteins.

      The analysis relies on fitting a FL decay model with two free parameters, f_free and T_bound. F_free is the fraction of the normalized curve contributed by a decaying exponential with a time constant 0.4 ns, thought to represent the FL of free NADPH or NADH, which apparently cannot be distinguished. T_bound is the time constant of the second exponential, with scalar amplitude = (1-f_free). The T_bound fit is thought to represent the decay time constant of protein bound NADPH, but can differ depending on the protein. The study shows that across the brain, T_bound can range from 0 to >5 ns, whereas f_free can range from 0.5 to 0.9 (Figure 1a). The paper beautifully lays out the analysis pipeline, providing a valuable resource. The full range of fits is reported, including maximum likelihood quality parameters, and can serve as a benchmark for future studies.
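
      The two-parameter decay model described above can be written out in a few lines. The sketch below is an illustration only: the function names are hypothetical, it ignores the instrument response function, and it recovers the parameters with a coarse least-squares grid search rather than the maximum-likelihood fit the review attributes to the authors.

```python
import math

TAU_FREE_NS = 0.4  # fixed short lifetime (ns) quoted in the review


def decay(t_ns, f_free, tau_bound_ns):
    """Normalized bi-exponential decay: free fraction plus bound fraction."""
    free_part = f_free * math.exp(-t_ns / TAU_FREE_NS)
    bound_part = (1.0 - f_free) * math.exp(-t_ns / tau_bound_ns)
    return free_part + bound_part


def fit_grid(times_ns, values, f_grid, tau_grid):
    """Return the (f_free, tau_bound) grid point with the smallest squared error.

    A stand-in for a proper fit; assumed here purely for illustration.
    """
    best = None
    for f in f_grid:
        for tau in tau_grid:
            sse = sum((decay(t, f, tau) - v) ** 2
                      for t, v in zip(times_ns, values))
            if best is None or sse < best[0]:
                best = (sse, f, tau)
    return best[1], best[2]


if __name__ == "__main__":
    # Synthetic, noise-free curve with hypothetical parameters.
    times = [0.1 * i for i in range(100)]
    curve = [decay(t, 0.75, 3.0) for t in times]
    print(fit_grid(times, curve, [0.60, 0.65, 0.70, 0.75, 0.80],
                   [2.0, 2.5, 3.0, 3.5]))
```

      Because the two amplitudes sum to one, f_free is a dimensionless fraction between 0 and 1, while tau_bound carries units of nanoseconds.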

      The authors measure properties of NADPH related autofluorescence of Kenyon Cells (KCs) of the fly mushroom body. The somata and calyx of mushroom bodies have a longer average tau_bound than other regions (Figure 1e); the f_free fit is higher for the calyx (input synapses) region than for KC somata; and the average across flies of average f_free fits in alpha/beta KC somata decreases slightly following paired presentation of odor and shock, compared to unpaired presentation of the same stimuli. Though the change is slight, no comparable change is detected in gamma KCs, suggesting that distributions of f_free derived from FL may be sensitive enough to measure changes in metabolic pathways following conditioning.

      FLIM as a method is not yet widely prevalent in fly neuroscience, but recent demonstrations of its potential are likely to increase its use. Future efforts will benefit from the description of the properties of the autofluorescence signal to evaluate how autofluorescence may impact measures of FL of genetically engineered indicators.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors present a novel usage of fluorescence lifetime imaging microscopy (FLIM) to measure NAD(P)H autofluorescence in the Drosophila brain, as a proxy for cellular metabolic/redox states. This new method relies on the fact that both NADH and NADPH are autofluorescent, with a different excitation lifetime depending on whether they are free (indicating glycolysis) or protein-bound (indicating oxidative phosphorylation). The authors successfully use this method in Drosophila to measure changes in metabolic activity across different areas of the fly brain, with a particular focus on the main center for associative memory: the mushroom body.

      Strengths:

      The authors have made a commendable effort to explain the technical aspects of the method in accessible language. This clarity will benefit both non-experts seeking to understand the methodology and researchers interested in applying FLIM to Drosophila in other contexts.

      Weaknesses:

      (1) Despite being statistically significant, the learning-induced change in f-free in α/β Kenyon cells is minimal (a decrease from 0.76 to 0.73, with a high variability). The authors should provide justification for why they believe this small effect represents a meaningful shift in neuronal metabolic state.

      We agree with the reviewer that the observed f_free shift averaged per individual, while statistically significant, is small. However, to our knowledge, this is the first study to investigate a physiological (i.e., not pharmacologically induced) variation in neuronal metabolism using FLIM. As such, there are no established expectations regarding the amplitude of the effect. In the revised manuscript, we have included an additional experiment involving the knockdown of ALAT in α/β Kenyon cells, which further supports our findings. We have also expanded the discussion to set out two potential reasons why this effect may appear modest.

      (2) The lack of experiments examining the effects of long-term memory (after spaced or massed conditioning) seems like a missed opportunity. Such experiments could likely reveal more drastic changes in the metabolic profiles of KCs, as a consequence of memory consolidation processes.

      We agree with the reviewer that investigating the effects of long-term memory on metabolism represents a valuable future path of investigation. An intrinsic caveat of autofluorescence measurement, however, is the difficulty of identifying the cellular origin of the observed changes. In this respect, long-term memory formation is not an ideal case study, as its essential feature is expected to be a metabolic activation localized to Kenyon cells’ axons in the mushroom body vertical lobes (as shown in Comyn et al., 2024), where many different neuron subtypes send intricate processes. This is why we chose to first focus on middle-term memory, where changes at the level of the cell bodies could be expected from our previous work (Rabah et al., 2022). However, our pioneering exploration of the applicability of NAD(P)H FLIM to brain metabolism monitoring in vivo now paves the way to extending it to the effects of other forms of memory.

      (3) The discussion is mostly just a summary of the findings. It would be useful if the authors could discuss potential future applications of their method and new research questions that it could help address.

      The discussion has been expanded by adding interpretations of the findings and remaining challenges.

      Reviewer #2 (Public review):

      This manuscript presents a compelling application of NAD(P)H fluorescence lifetime imaging (FLIM) to study metabolic activity in the Drosophila brain. The authors reveal regional differences in oxidative and glycolytic metabolism, with a particular focus on the mushroom body, a key structure involved in associative learning and memory. In particular, they identify metabolic shifts in α/β Kenyon cells following classical conditioning, consistent with their established role in energy-demanding middle- and long-term memories.

      These results highlight the potential of label-free FLIM for in-vivo neural circuit studies, providing a powerful complement to genetically encoded sensors. This study is well-conducted and employs rigorous analysis, including careful curve fitting and well-designed controls, to ensure the robustness of its findings. It should serve as a valuable technical reference for researchers interested in using FLIM to study neural metabolism in vivo. Overall, this work represents an important step in the application of FLIM to study the interactions between metabolic processes, neural activity, and cognitive function.

      Reviewer #3 (Public review):

      This study investigates the characteristics of the autofluorescence signal excited by 740 nm 2-photon excitation, in the range of 420-500 nm, across the Drosophila brain. The fluorescence lifetime (FL) appears bi-exponential, with a short 0.4 ns time constant followed by a longer decay. The lifetime decay and the resulting parameter fits vary across the brain. The resulting maps reveal anatomical landmarks, which simultaneous imaging of genetically encoded fluorescent proteins helps to identify. Past work has shown that the autofluorescence decay time course reflects the balance of the redox enzyme NAD(P)H vs. its protein-bound form. The ratio of free-to-bound NADPH is thought to indicate relative glycolysis vs. oxidative phosphorylation, and thus shifts in the free-to-bound ratio may indicate shifts in metabolic pathways. The basics of this measure have been demonstrated in other organisms, and this study is the first to use the FLIM module of the STELLARIS 8 FALCON microscope from Leica to measure autofluorescence lifetime in the brain of the fly. Methods include registering the brains of different flies to a common template and masking out anatomical regions of interest using fluorescence proteins.

      The analysis relies on fitting an FL decay model with two free parameters, f_free and t_bound. F_free is the fraction of the normalized curve contributed by a decaying exponential with a time constant of 0.4 ns, thought to represent the FL of free NADPH or NADH, which apparently cannot be distinguished. T_bound is the time constant of the second exponential, with scalar amplitude = (1-f_free). The T_bound fit is thought to represent the decay time constant of protein-bound NADPH but can differ depending on the protein. The study shows that across the brain, T_bound can range from 0 to >5 ns, whereas f_free can range from 0.5 to 0.9 (Figure 1a). These methods appear to be solid; the full range of fits is reported, including maximum likelihood quality parameters, and can serve as benchmarks for future studies.
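
      For readers less familiar with FLIM, the two-parameter decay described above can be written out as a minimal sketch. It assumes an idealized, normalized decay with no instrument response or background term, which the actual fitting pipeline may well include; the function and argument names are illustrative and not taken from the manuscript.

```python
import math

# Short time constant attributed to free NAD(P)H, as quoted in the review (ns).
TAU_FREE_NS = 0.4


def biexp_decay(t_ns, f_free, tau_bound_ns, tau_free_ns=TAU_FREE_NS):
    """Normalized bi-exponential decay.

    I(t) = f_free * exp(-t / tau_free) + (1 - f_free) * exp(-t / tau_bound)

    Assumption: no instrument response function and no background offset.
    """
    free_part = f_free * math.exp(-t_ns / tau_free_ns)
    bound_part = (1.0 - f_free) * math.exp(-t_ns / tau_bound_ns)
    return free_part + bound_part
```

      In this form the curve equals 1 at t = 0, and a higher f_free shifts weight toward the fast 0.4 ns component.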

      The authors measure the properties of NADPH-related autofluorescence of Kenyon Cells (KCs) of the fly mushroom body. The results from the three main figures are:

      (1) Somata and calyx of mushroom bodies have a longer average tau_bound than other regions (Figure 1e);

      (2) The f_free fit is higher for the calyx (input synapses) region than for KC somata (Figure 2b);

      (3) The average across flies of average f_free fits in alpha/beta KC somata decreases from 0.734 to 0.718. Based on the first two findings, an accurate title would be "Autofluorescence lifetime imaging reveals regional differences in NADPH state in Drosophila mushroom bodies."

      The third finding is the basis for the title of the paper and the support for this claim is unconvincing. First, the difference in alpha/beta f_free (p-value of 4.98E-2) is small compared to the measured difference in f_free between somas and calyces. It's smaller even than the difference in average soma f_free across datasets (Figure 2b vs c). The metric is also quite derived; first, the model is fit to each (binned) voxel, then the distribution across voxels is averaged and then averaged across flies. If the voxel distributions of f_free are similar to those shown in Supplementary Figure 2, then the actual f_free fits could range between 0.6-0.8. A more convincing statistical test might be to compare the distributions across voxels between alpha/beta vs alpha'/beta' vs. gamma KCs, perhaps with bootstrapping and including appropriate controls for multiple comparisons.
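
      One way to make the bootstrapping suggestion concrete is a two-level resampling scheme, sketched below under stated assumptions. It resamples flies first and then voxels within each resampled fly, and returns a percentile interval for the difference in group means. The data layout (a list of flies, each a list of per-voxel f_free fits) and the function name are hypothetical, and no correction for multiple comparisons is applied here.

```python
import random
from statistics import mean


def bootstrap_diff_ci(group_a, group_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for mean(group_a) - mean(group_b).

    Each group is a list of flies; each fly is a list of per-voxel values.
    Flies are resampled with replacement, then voxels within each fly.
    """
    rng = random.Random(seed)

    def resampled_group_mean(group):
        flies = [rng.choice(group) for _ in group]  # resample flies
        fly_means = [mean(rng.choices(fly, k=len(fly))) for fly in flies]
        return mean(fly_means)  # each fly gets equal weight

    diffs = sorted(
        resampled_group_mean(group_a) - resampled_group_mean(group_b)
        for _ in range(n_boot)
    )
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

      An interval that excludes zero would suggest a difference that holds up under both fly-level and voxel-level variability; comparing several KC subtypes this way would still require an adjustment for multiple comparisons.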

      The difference observed is indeed modest relative to the variability of f_free measurements in other contexts. The fact that the difference observed between the somata region and the calyx is larger is not necessarily surprising. Indeed, these areas have different anatomical compositions that may result in different basal metabolic profiles. This is suggested by Figure 1b which shows that the cortex and neuropile have different metabolic signatures. Differences in average f_free values in the somata region can indeed be observed between naive and conditioned flies. However, all comparisons in the article were performed between groups of flies imaged within the same experimental batches, ensuring that external factors were largely controlled for. This absence of control makes it difficult to extract meaningful information from the comparison between naive and conditioned flies.

      We agree with the reviewer that the choice of the metric was indeed not well justified in the first manuscript. In the new manuscript, we have tried to illustrate the reasons for this choice with the example of the comparison of f_free in alpha/beta neurons between unpaired and paired conditioning (Dataset 8). First, the idea of averaging across voxels is supported by the fact that the distributions of decay parameters within a single image are predominantly unimodal. Examples for Dataset 8 are now provided in the new Sup. Figure 14. Second, an interpretable comparison between multiple groups of distributions is, to our knowledge, not straightforward to implement. It is now discussed in Supplementary information. To measure interpretable differences in the shapes of the distributions, we computed the first three moments of the distributions of f_free for Dataset 8 and compared the values obtained between conditions (see Supplementary information and new Sup. Figure 15). Third, averaging across individuals makes it possible to give each experimental subject the same weight in the comparisons.

      I recommend the authors address two concerns. First, what degree of fluctuation in autofluorescence decay can we expect over time, e.g. over circadian cycles? That would be helpful in evaluating the magnitude of changes following conditioning. And second, if the authors think that metabolism shifts to OXPHOS over glycolysis, are there further genetic manipulations they could make? They test LDH knockdown in gamma KCs, why not knock it down in alpha/beta neurons? The prediction might be that if it prevents the shift to OXPHOS, the shift in f_free distribution in alpha/beta KCs would be attenuated. The extensive library of genetic reagents is an advantage of working with flies, but it comes with a higher standard for corroborating claims.

      In the present study, we used control groups to account for broad fluctuations induced by external factors such as the circadian cycle. We agree with the reviewer that a detailed characterization of circadian variations in the decay parameters would be valuable for assessing the magnitude of conditioning-induced shifts. We have integrated this relevant suggestion in the Discussion. Conducting such an investigation lies unfortunately beyond the scope and means of the current project.

      In line with the suggestion of the reviewer, we have included a new experiment to test the influence of the knockdown of ALAT on the conditioning-induced shift measured in alpha/beta neurons. This choice is motivated in the new manuscript. The obtained result shows that no shift is detected in the mutant flies, in accordance with our hypothesis.

      FLIM as a method is not yet widely prevalent in fly neuroscience, but recent demonstrations of its potential are likely to increase its use. Future efforts will benefit from the description of the properties of the autofluorescence signal to evaluate how autofluorescence may impact measures of FL of genetically engineered indicators.

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors):

      (1) Y axes in Figures 1e, 2c, 3b,c are misleading. They must start at 0.

      Although we agree that making the Y axes start at 0 is preferable, in our case it makes it difficult to observe the dispersion of the data at the same time (your next suggestion). To make it clearer to the reader that the axes do not start at 0, a broken Y-axis is now displayed in every concerned figure.

      (2) These same plots should have individual data points represented, for increased clarity and transparency.

      Individual data points were added on all boxplots.

      Reviewer #2 (Recommendations for the authors):

      I am evaluating this paper as a fly neuroscientist with experience in neurophysiology, including calcium imaging. I have little experience with FLIM but anticipate its use growing as more microscopes and killer apps are developed. From this perspective, I value the opportunity to dig into FLIM and try to understand this autofluorescence signal. I think the effort to show each piece of the analysis pipeline is valuable. The figures are quite beautiful and easy to follow. My main suggestion is to consider moving some of the supplemental data to the main figures. eLife allows unlimited figures; moving key pieces of the pipeline to the main figures would make for smoother reading and emphasize the technical care taken in this study.

      We thank the reviewer for their feedback. Following their advice we have moved panels from the supplementary figures to the main text (see new Figure 2).

      Unfortunately, the scientific questions and biological data do not rise to the typical standard in the field to support the claims in the title, "In vivo autofluorescence lifetime imaging of the Drosophila brain captures metabolic shifts associated with memory formation". The authors also clearly state what the next steps are: "hypothesis-driven approaches that rely on metabolite-specific sensors" (Intro). The advantage of fly neuroscience is the extensive library of genetic reagents that enable perturbations. The key manipulation in this study is the electric shock conditioning paradigm that subtly shifts the distribution of a parameter fit to an exponential decay in the somas of alpha/beta KCs vs others. This feels like an initial finding that deserves follow-up; but is it a large enough result to motivate a future student to pick this project up? The larger effect appears to be the gradients in f_free across KCs overall (Figure 2b). How does this change with conditioning?

      We acknowledge that the observed metabolic shift is modest relative to the variability of f_free and agree that additional corroborating experiments would further strengthen this result. Nevertheless, we believe it remains a valid and valuable finding that will be of interest to researchers in the field. The reviewer is right in pointing out that the gradient across KCs is higher in magnitude, however, the fact that this technique can also report experience-dependent changes, in addition to innate heterogeneities across different cell types, is a major incentive for people who could be interested in applying NAD(P)H FLIM in the future. For this reason, we consider it appropriate to retain mention of the memory-induced shift in the title, while making it less assertive and adding a reference to the structural heterogeneities of f_free revealed in the study. We have also rephrased the abstract to adopt a more cautious tone and expanded the discussion to clarify why a low-magnitude shift in f_free can still carry biological significance in this context. Finally, we have added the results of a new set of data involving the knockdown of ALAT in Kenyon cells, to further support the relevance of our observation relative to memory formation, despite its small magnitude. We believe that these elements together form a good basis for future investigations and that the manuscript merits publication in its present form.

      Together, I would recommend reshaping the paper as a methods paper that asks the question, what are the spatial properties of NADPH FL across the brain? The importance of this question is clear in the context of other work on energy metabolism in the MBs. 2P FLIM will likely always have to account for autofluorescence, so this will be of interest. The careful technical work that is the strength of the manuscript could be featured, and whether conditioning shifts f_free could be a curio that might entice future work.

      By transferring panels of the supplementary figures to the main text (see new Figure 2) as suggested by Reviewer 2, we have reinforced the methodological part of the manuscript. For the reasons explained above, we however still mention the ‘biological’ findings in the title and abstract.

      Minor recommendations on science:

      Figure 2C. Plotting either individual data points or distributions would be more convincing.

      Individual data points were added on all boxplots.

      There are a few mentions of glia. What are the authors' expectations for metabolic pathways in glia vs. neurons? Are glia expected to use one more than the other? The work by Rabah suggests it should be different and perhaps complementary to neurons. Can a glial marker be used in addition to KC markers? This seems crucial to being able to distinguish metabolic changes in KC somata from those in glia.

      Drosophila cortex glia are thought to play a similar role to astrocytes in vertebrates (see Introduction). From that perspective, we expect cortex glia to display a higher level of glycolysis than neurons. The work by Rabah et al. is consistent with this hypothesis. Reviewer 2 is right in pointing out that using a glial marker would be interesting. However, current technical limitations make such experiments challenging. These limitations are now described in the discussion.

      The question of whether KC somata positions are stereotyped can probably be answered in other ways as well. For example, the KCs are in the FAFB connectomic data set and the hemibrain. How do the somata positions compare?

      The reviewer’s suggestion is indeed interesting. However, the FAFB and hemibrain connectomic datasets are based on only two individual flies, which probably limits their suitability for assessing the stereotypy of KC subtype distributions. In addition, aligning our data with the FAFB dataset would represent substantial additional work.

      The free parameter tau_bound is mysterious if it can be influenced by the identity of the protein. Are there candidate NADPH binding partners that have a spatial distribution in confocal images that could explain the difference between somas and calyx?

      There are indeed dozens of NADH- or NADPH-binding proteins. For this reason, in all studies implementing exponential fitting of metabolic FLIM data, tau_bound is considered a complex combination of the contributions from many different proteins. In addition, one should keep in mind that the number of cell types contributing to the autofluorescence signal in the mushroom body calyx (Kenyon cells, astrocyte-like and ensheathing glia, APL neurons, olfactory projection neurons, dopamine neurons) is much higher than in the somas (only Kenyon cells and cortex glia). This could also participate in the observed difference. Hence, focusing on intracellular heterogeneities of potential NAD(P)H binding partners seems premature at that stage.

      The phrase "noticeable but not statistically significant" is misleading.

      We agree with the reviewer and have removed “noticeable but” from the sentence in the new version of the manuscript.

      Minor recommendations on presentation:

      The Introduction can be streamlined.

      We agree that some parts of the Introduction can seem a bit long for experts of a particular field. However, we think that this level of detail makes the article easily accessible for neuroscientists working on Drosophila and other animal models but not necessarily with FLIM, as well as for experts in energy metabolism that may be familiar with FLIM but not with Drosophila neuroscience.

    1. eLife Assessment

      This study provides a useful application of computational modelling to examine how people with chronic pain learn under uncertainty, contributing to efforts to link pain with motivational processes. However, the evidence supporting the main claims is incomplete, as the modelling differences are not reflected in observable behaviour or pain measures, and the interpretation extends beyond what the data can substantiate. The conclusions would benefit from a clearer explanation of the behavioural differences that underlie the computational findings.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates how individuals with chronic temporomandibular disorder (TMD) learn from uncertain rewards, using a probabilistic three-armed bandit task and computational modelling. The authors aim to identify whether people living with chronic pain show altered learning under uncertainty and how such differences might relate to psychological symptoms.

      Strengths:

      The work addresses an important question about how chronic pain may influence cognition and motivation. The task design is appropriate for probing adaptive learning, and the modelling approach is novel. The findings of altered uncertainty updating in the TMD group are interesting.

      Weaknesses:

      Several aspects of the paper limit the strength of the conclusions. The group differences appear only in model-derived parameters, with no corresponding behavioural differences in task performance. Model parameters do not correlate with pain severity, making the proposed mechanistic link between pain and learning speculative. Some of the interpretations extend beyond what the data can directly support.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors report on a case-control study in which participants with chronic pain (TMD) were compared to controls on performance of a three-option learning task. The authors find no difference in task behavior, but fit a model to this behavior and suggest that differences in the model-derived metrics (specifically, change in learning rate/estimated volatility/model estimated uncertainty) reveal a relevant between-group effect. They report a mediation effect suggesting that group differences on self-report apathy may be partially mediated by this uncertainty adaptation result.

      Strengths:

      The role of sensitivity to uncertainty in pathological states is an interesting question and is the focus of a reasonable amount of research at present. This paper provides a useful assessment of these processes in people with chronic pain.

      Weaknesses:

      (1) The interpretation of the model in the absence of any apparent behavioral effect is not convincing. The model is quite complex with a number of free parameters (what these parameters are is not well explained in the methods, although they seem to be presented in the supplement). These parameters are fitted to participant choice behavior - that is, they explain some sort of group difference in this choice behavior. The authors haven't been able to demonstrate what this difference is. The graphs of learning rate per group (Figure 2) suggest that the control group has a higher initial learning rate and a lower later learning rate. If this were actually the case, you would expect to see it reflected in the choice data (the control group should show higher lose-shift behavior earlier on, with this then declining over time, and the TMD group should show no change). This behavior is not apparent. The absence of a clear effect on behavior suggests that the model results are more likely to be spurious.

      (2) As far as I could see, the actual parameters of the model are not reported. The results (Figure 2) illustrate the trial-level model estimated uncertainty/learning rate, etc, but these differ because the fitted model parameters differ. The graphs look like there are substantial differences in v0 (which was not well recovered), but presumably lambda, at least, also differs. The mean(SD) group values for these parameters should be reported, as should the correlations between them (it looks very much like they will be correlated).

      (3) The task used seems ill-suited to measuring the reported process. The authors report the performance of a restless bandit task and find an effect on uncertainty adaptation. The task does not manipulate uncertainty (there are no periods of high/low uncertainty) and so the only adaptation that occurs in the task is the change from what appears to be the participants' prior beliefs about uncertainty (which appear to be very different between groups - i.e. the lines in Figure 2a,b,c are very different at trial 0). If the authors are interested in measuring adaptation to uncertainty, it would clearly be more useful to present participants with periods of higher or lower uncertainty.

      (4) The main factor driving the better fit of the authors' preferred model over listed alternatives seems to be the inclusion of an additive uncertainty term in the softmax; this differentiates the chosen model from the other two Kalman filter-based models that perform less well. But a similar term is not included in the RW models. Given that the uncertainty of a binary outcome can be estimated as p(1-p), and the RW models are estimating p, this would seem relatively straightforward to do. It would be useful to know if the factor that actually drives better model fit is indeed in the decision stage (rather than the learning stage).
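
      The control model suggested in point (4) could look roughly like the following sketch: a Rescorla-Wagner learner whose softmax receives an additive bonus proportional to the Bernoulli variance p(1-p) of each option. The parameter names (alpha, beta, phi) and the exact form of the bonus are assumptions made for illustration and are not taken from the manuscript or its supplement.

```python
import math


def rw_update(p, reward, alpha):
    """Rescorla-Wagner update of the estimated reward probability p."""
    return p + alpha * (reward - p)


def choice_probabilities(p_estimates, beta, phi):
    """Softmax over value plus an additive uncertainty bonus.

    utility_i = beta * p_i + phi * p_i * (1 - p_i)
    where p_i * (1 - p_i) is the variance of a binary outcome.
    """
    utilities = [beta * p + phi * p * (1.0 - p) for p in p_estimates]
    u_max = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - u_max) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]
```

      If adding phi to the RW models closes most of the gap in model fit, that would point to the decision stage, rather than the Kalman-filter learning stage, as the source of the improvement.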

    4. Reviewer #3 (Public review):

      This paper applies a computational model to behavior in a probabilistic operant reward learning task (a 3-armed bandit) to uncover differences between individuals with temporomandibular disorder (TMD) compared with healthy controls. Integrating computational principles and models into pain research is an important direction, and the findings here suggest that TMD is associated with subtle changes in how uncertainty is represented over time as individuals learn to make choices that maximize reward. There are a number of strengths, including the comparison of a volatile Kalman filter (vKF) model to some standard base models (Rescorla Wagner with 1 or 2 learning rates) and parameter recovery analyses suggesting that the combination of task and vKF model may be able to capture some properties of learning and decision-making under uncertainty that may be altered in those suffering from chronic pain-related conditions.

      I've focused my comments in four areas: (1) Questions about the patient population, (2) Questions about what the findings here mean in terms of underlying cognitive/motivational processes, (3) Questions about the broader implications for understanding individuals with TMD and other chronic pain-related disorders, and (4) Technical questions about the models and results.

      (1) Patient population

      This is a computational modelling study, so it is light on characterization of the population, but the patient characteristics could matter. The paper suggests they were hospitalized, but this is not a condition that requires hospitalization per se. It would be helpful to connect and compare the patient characteristics with large-scale studies of TMD, such as the OPPERA study led by Maixner, Fillingim, and Slade.

      (2) What cognitive/motivational processes are altered in TMD

      The study finds a pattern of alterations in TMD patients that seems clear in Figure 2. Healthy controls (HC) start the task with high estimates of volatility, uncertainty, and learning rate, which drop over the course of the task session. This is consistent with a learner that is initially uncertain about the structure of the environment (i.e., which options are rewarded and how the contingencies change over time) but learns that there is a fixed or slowly changing mean and stationary variance. The TMD patients start off with much lower volatility, uncertainty, and learning rate - which are actually all near 0 - and they remain stable over the course of learning. This is consistent with a learner who believes they know the structure of the environment and ignores new information.

      What is surprising is that this pattern of changes over time was found in spite of null group differences in a number of aspects of performance: (1) stay rate, (2) switch rate, (3) win-stay/lose-switch behaviors, (4) overall performance (corrected for chance level), (5) response times, (6) autocorrelation, (7) correlations between participants' choice probability and each option's average reward rate, (8) choice consistency (though how operationalized is not described?), (9) win-stay-lose-shift patterns over time. I'm curious about how the patterns in Figure 2 would emerge if standard aspects of performance are essentially similar across groups (though the study cannot provide evidence in favor of the null). It will be important to replicate these patterns in larger, independent samples with preregistered analyses.

      The authors believe that this pattern of findings reveals that TMD patients "maintain a chronically heightened sensitivity to environmental changes" and relate the findings to predictive processing, a hallmark of which (in its simplest form) is precision-weighted updating of priors. They also state that the findings are not related to reduced overall attentiveness or failure to understand the task, but describe them as deficits or impairments in calibrating uncertainty.

      The pattern of differences could, in fact, result from differences in prior beliefs, conceptualization of the task, or learning. Unpacking these will be important steps for future work, along with direct measures of priors, cognitive processes during learning, and precision-weighted updating.

      (3) Implications for understanding chronic pain

      If the findings and conclusions of the paper are correct, individuals with TMD and perhaps other pain-related disorders may have fundamental alterations in the ways in which they make decisions about even simple monetary rewards. The broader questions for the field concern (1) how generalizable such alterations are across tasks, (2) how generalizable they are across patient groups and, conversely, how specific they are to TMD or chronic pain, (3) whether they are the result of neurological dysfunction, as opposed to (e.g.) adaptive strategies or assumptions about the environment/task structure.

      It will be important to understand which features of patients' and/or controls' cognition are driving the changes. For example, could the performance differences observed here be attributable to a reduced or altered understanding of the task instructions, more uncertainty about the rules of the game, different assumptions about environments (i.e., that they are more volatile/uncertain or less so), or reduced attention or interest in optimizing performance? Are the controls OVERconfident in their understanding of the environment?

      This set of questions will not be easy to answer and will be the work of many groups for many years to come. It is a judgment call how far any one paper must go to address them, but my view is that it is a collaborative effort. Start with a finding, replicate it across labs, take the replicable phenomena and work to unpack the underlying questions. The field must determine whether it is this particular task with this model that produces case-control differences (and why), or whether the findings generalize broadly. Would we see the same findings for monetary losses, sounds, and social rewards? Tasks with painful stimuli instead of rewards?

      Another set of questions concerns the space of computational models tested, and whether their parameters are identifiable. An alteration in estimated volatility or learning rate, for example, can come from multiple sources. In one model, it might appear as a learning rate change and in another as a confirmation bias. It would be interesting in this regard to compare the "mechanisms" (parameters) of other models used in pain neuroscience, e.g., models by Seymour, Mancini, Jepma, Petzschner, Smith, Chen, and others (just to name a few).

      One immediate next step here could be to formally compare the performance of both patients and controls to normatively optimal models of performance (e.g., Bayes optimal models under different assumptions). This could also help us understand whether the differences in patients reflect deficits and what further experiments we would need to pin that down.

      In addition, the volatility parameter in the computational model correlated with apathy. This is interesting. Is there a way to distinguish apathy as a particular clinical characteristic and feature of TMD from apathy in the sense of general disinterest in optimal performance that may characterize many groups?

      If we know this, what actionable steps does it lead us to take? Could we take steps to reduce apathy and thus help TMD patients better calibrate to environmental uncertainty in their lives? Or take steps to recalibrate uncertainty (i.e., increase uncertainty adaptation), with benefits on apathy? A hallmark of a finding that the field can build off of is the questions it raises.

      (4) Technical questions about the models and results

      Clarification of some technical points would help interpret the paper and findings further:

      (a) Was the reward probability truly random? Was the random walk different for each person, or constrained?

      (b) When were self-report measures administered, and how?

      (c) Pain assessments: What types of pain? Was a body map assessed? Widespreadness? Pain at the time of the test, or pain in general?

      (d) Parameter recovery: As you point out, r = 0.47 seems very low for recovery of the true quantity, but this depends on noise levels and on how the parameter space is sampled. Is this noise-free recovery, and is it robust to noise? Are the examples of true parameters drawn from the space of participants, or do they otherwise systematically sample the space of true parameters?

      (e) What are the covariances across parameter estimates and resultant confusability of parameter estimates (e.g., confusion matrix)?

      (f) It would be helpful to have a direct statistical comparison of controls and TMD on model parameter estimates.

      (g) Null statistical findings on differences in correlations should not be interpreted as a lack of a true effect. Bayes Factors could help, but an analysis of them will show that hundreds of people are needed before it is possible to say there are no differences with reasonable certainty. Some journals enforce rules around the kinds of language used to describe null statistical findings, and I think it would be helpful to adopt them more broadly.

      (h) What is normatively optimal in this task? Are TMD patients less so, or not? The paper states "aberrant precision (uncertainty) weighting and misestimation of environmental volatility". But: are they misestimates?

      (i) It's not clear how well the choice of prior variance for all parameters (6.25) is informed by previous research, as sensible values may be task- and context-dependent. Are the main findings robust to how priors are specified in the HBI model?

    1. eLife Assessment

      This manuscript proposes a lateralized, lobe-specific brain-liver sympathetic neurocircuit regulating hepatic glucose metabolism and presents anatomical evidence for sympathetic crossover at the porta hepatis using viral tracing and neuromodulation approaches. While the topic is of important significance and the methodologies are, in principle, state-of-the-art, significant concerns regarding experimental design, incomplete methodological reporting, sparse and ambiguous labeling, and overinterpretation of the data substantially weaken support for the study's central conclusions, thereby limiting the study's completeness. The work will be of interest to biologists, clinicians, and physiologists.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Wang et al. reports the potential involvement of an asymmetric neurocircuit in the sympathetic control of liver glucose metabolism.

      Strengths:

      The concept that the contralateral brain-liver neurocircuit preferentially regulates each liver lobe may be interesting.

      Weaknesses:

      However, the experimental evidence presented did not support the study's central conclusion.

      (1) Pseudorabies virus (PRV) tracing experiment:
      The liver not only possesses sympathetic innervations but also vagal sensory innervations. The experimental setup failed to distinguish whether the PRV-labeling of LPGi (Lateral Paragigantocellular Nucleus) is derived from sympathetic or vagal sensory inputs to the liver.

      (2) Impact on pancreas:
      The celiac ganglia not only provide sympathetic innervations to the liver but also to the pancreas, the central endocrine organ for glucose metabolism. The chemogenetic manipulation of LPGi failed to consider a direct impact on the secretion of insulin and glucagon from the pancreas.

      (3) Neuroanatomy of the brain-liver neurocircuit:
      The current study and its conclusion are based on a speculative brain-liver sympathetic circuit without the necessary anatomical information downstream of LPGi.

      (4) Local manipulation of the celiac ganglia:<br /> The left and right ganglia of mice are not separate from each other but rather anatomically connected. The claim that the local injection of AAV in the left or right ganglion without affecting the other side is against this basic anatomical feature.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Wang and colleagues aims to determine whether the left and right LPGi differentially regulate hepatic glucose metabolism and to reveal decussation of hepatic sympathetic nerves.

      The authors used tissue clearing to identify sympathetic fibers in the liver lobes, then injected PRV into the hepatic lobes. Five days post-injection, PRV-labeled neurons in the LPGi were identified. The results indicated contralateral dominance of premotor neurons and partial innervation of more than one lobe. Then the authors activated each side of the LPGi, resulting in a greater increase in blood glucose levels after right-sided activation than after left-sided activation, as well as changes in protein expression in the liver lobes. These data suggested modulation of HGP (hepatic glucose production) in a lobe-specific manner. Chemical denervation of a particular lobe did not affect glucose levels due to compensation by the other lobes. In addition, nerve bundles decussate in the hepatic portal region.

      Strengths:

      The manuscript is timely and relevant. It is important to understand the sympathetic regulation of the liver and the contribution of each lobe to hepatic glucose production. The authors use state-of-the-art methodology.

      Weaknesses:

      (1) The wording/terminology used in the manuscript is misleading, and it is not used in the proper context. For instance, the goal of the study is "to investigate whether cerebral hemispheres differentially regulate hepatic glucose metabolism..." (see abstract); however, the authors focus on the brainstem (a single structure without hemispheres). Similarly, symmetric is not the best word for the projections.

      (2) Sparse labeling of liver-related neurons was shown in the LPGi (Figure 1). It would be ideal to have lower magnification images to show the area. Higher quality images would be necessary, as it is difficult to identify brainstem areas. The low number of labeled neurons in the LPGi after five days of inoculation is surprising. Previous findings showed extensive labeling in the ventral brainstem at four days post-inoculation (Desmoulins et al., 2025). Unfortunately, it is not possible to compare the injection paradigm/methods because the PRV inoculation is missing from the methods section. If the PRV is different from the previously published viral tracers, time-dependent studies to determine the order of neurons and the time course of infection would be necessary.

      (3) Not all LPGi cells are liver-related. Was the entire LPGi population stimulated, or was it done in a cell-type-specific manner? What was the strain, sex, and age of the mice? What was the rationale for using the particular viral constructs?

      (4) The authors should consider the effect of stimulation of double-labeled neurons (innervating more than one lobe) and potential confounding effects regarding other physiological functions.

      (5) The authors state that "central projections directly descend along the sympathetic chain to the celiac-superior mesenteric ganglia". What they mean is unclear. Do the authors refer to pre-ganglionic neurons or premotor neurons? How does it fit with the previous literature?

      (6) How was the chemical denervation completed for the individual lobes?

      (7) The Western Blot images look like they are from different blots, but there are no details provided regarding protein amount (loading) or housekeeping. What was the reason to switch beta-actin and alpha-tubulin? In Figures 3F-G, the GS expression is not a good representative image. Were chemiluminescence or fluorescence antibodies used? Were the membranes reused?

      (8) Key references using PRV for liver innervation studies are missing (Stanley et al, 2010 [PMID: 20351287]; Torres et al., 2021 [PMID: 34231420]; Desmoulins et al., 2025 [PMID: 39647176]).

    4. Reviewer #3 (Public review):

      Summary:

      This study found a lobe-specific, lateralized control of hepatic glucose metabolism by the brain and provides anatomical evidence for sympathetic crossover at the porta hepatis. The findings are particularly insightful to the researchers in the field of liver metabolism, regeneration, and tumors.

      Strengths:

      Increasing evidence suggests spatial heterogeneity of the liver across many aspects of metabolism and regenerative capacity. The current study has provided interesting findings: neuronal innervation of the liver also shows anatomical differences across lobes. The findings could be particularly useful for understanding liver pathophysiology and treatment, such as metabolic interventions or transplantation.

      Weaknesses:

      Inclusion of detailed method and Discussion:

      (1) The quantitative results for PRV-labeled neurons are presented; please include the specific quantitative methods.

      (2) The Discussion can be expanded to include potential biological advantages of this complex lateralized innervation pattern.

    5. Reviewer #4 (Public review):

      Summary:

      The studies here are highly informative in terms of anatomical tracing and sympathetic nerve function in the liver related to glucose levels, but given that they are performed in a single species, it is challenging to translate them to humans, or to determine whether these neural circuits are evolutionarily conserved. Dual-labeling anatomical studies are elegant, and the addition of chemogenetic and optogenetic studies is mechanistically informative. Denervation studies lack appropriate controls, and the role of sensory innervation in the liver is overlooked.

      Specific Weaknesses - Major:

      (1) The species name should be included in the title.

      (2) Tyrosine hydroxylase was used to mark sympathetic fibers in the liver, but this marker also labels a portion of sensory fibers, which need to be ruled out in the whole-mount imaging data.

      (3) Chemogenetic and optogenetic data demonstrating hyperglycemia should be described in the context of prior work demonstrating liver nerve involvement in these processes. There is only a brief mention in the Discussion currently, but comparing methods and observations would be helpful.

      (4) Sympathetic denervation with 6-OHDA can drive compensatory increases to tissue sensory innervation, and this should be measured in the liver denervation studies to implicate potential crosstalk, especially given the increase in LPGi cFOS that may be due to afferent nerve activity. Compensatory sympathetic drive may not be the only culprit, though it is clearly assumed to be. The sensory or parasympathetic/vagal innervation of the liver is altogether ignored in this paper and could be better described in general.

    6. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Wang et al. reports the potential involvement of an asymmetric neurocircuit in the sympathetic control of liver glucose metabolism.

      Strengths:

      The concept that the contralateral brain-liver neurocircuit preferentially regulates each liver lobe may be interesting.

      Weaknesses:

      However, the experimental evidence presented did not support the study's central conclusion.

      We sincerely thank the reviewer for recognizing the conceptual novelty of our work and for constructive comments aimed at enhancing its rigor and clarity. In response, we will carry out targeted experiments to address the points raised, including: (i) further characterization of LPGi projections to vagal and sympathetic circuits; (ii) evaluation of potential pancreatic involvement; and (iii) validation of the specificity of chemogenetic activation within the proposed circuit. We anticipate completing the revised version within 8 weeks.

      (1) Pseudorabies virus (PRV) tracing experiment:

      The liver not only possesses sympathetic innervations but also vagal sensory innervations. The experimental setup failed to distinguish whether the PRV-labeling of LPGi (Lateral Paragigantocellular Nucleus) is derived from sympathetic or vagal sensory inputs to the liver.

      Thank you for raising this important point. We fully agree that the liver receives both sympathetic and vagal sensory innervation, and we acknowledge that PRV-based tracing alone does not definitively distinguish between these two pathways. This represents a limitation of the original experimental design.

      Based on established anatomical literature as well as our experimental observations, vagal sensory neuron cell bodies reside in the nodose ganglion (NG), and their central projections terminate predominantly in the nucleus of the solitary tract (NTS) (Nature. 2023;623(7986):387-396; Curr Biol. 2020;30(20):3986-3998.e5.), which is located in the dorsomedial medulla. In contrast, the LPGi, together with other sympathetic-related nuclei, is predominantly distributed in the ventral medulla (Cell Metab. 2025;37(11):2264-2279.e10; Nat Commun. 2022;13(1):5079.).

      To directly assess the contribution of vagal sensory pathways, we will perform an additional PRV tracing experiment using two groups of mice: one with bilateral nodose ganglion (NG) removal and a sham-operated control group. Identical PRV injections will be delivered to the liver in both groups, and PRV labeling in the LPGi will be quantitatively compared. Preservation of LPGi labeling following NG ablation would indicate that PRV transmission occurs primarily via sympathetic, rather than vagal sensory, pathways. These data will be incorporated into the revised manuscript and are expected to be completed within 3 weeks.

      (2) Impact on pancreas:

      The celiac ganglia not only provide sympathetic innervations to the liver but also to the pancreas, the central endocrine organ for glucose metabolism. The chemogenetic manipulation of LPGi failed to consider a direct impact on the secretion of insulin and glucagon from the pancreas.

      Thank you for this important comment. We agree that the celiac ganglia (CG) provide sympathetic innervation not only to the liver but also to the pancreas, which plays a central role in glucose homeostasis through the secretion of both insulin and glucagon. Therefore, the potential pancreatic effects of LPGi chemogenetic manipulation warrant careful consideration.

      To address this concern, we examined circulating glucagon levels following chemogenetic manipulation of the LPGi. As shown in the Supplementary Figure below, plasma glucagon (GCG) concentrations were not significantly altered at 30, 60, 90, or 120 minutes compared with control mice (n = 6), indicating that LPGi manipulation does not measurably affect glucagon secretion under our experimental conditions.

      We acknowledge that insulin secretion was not assessed in the study, which represents an important limitation given the pancreatic innervation of the CG. To further strengthen our interpretation, we are performing additional experiments in newly prepared mice to measure circulating insulin levels following LPGi manipulation. These data together with Author response image 1 below will be included in the revised manuscript upon completion.

      Author response image 1.

      Plasma concentrations of GCG in mice following activation of LPGi GABAergic neurons.

      (3) Neuroanatomy of the brain-liver neurocircuit:<br /> The current study and its conclusion are based on a speculative brain-liver sympathetic circuit without the necessary anatomical information downstream of LPGi.

      Thank you for raising this important point. A clear anatomical definition of the downstream pathways linking the brain to the liver is essential for interpreting the proposed brain-liver sympathetic circuit.

      The present study (Figure 4A) provides direct anatomical evidence supporting the organization of the brain–liver sympathetic neurocircuit. These observations are consistent with our recent detailed characterization of the brain-liver sympathetic circuit published in Cell Metabolism (Cell Metab. 2025;37(11):2264–2279). In that circuit, LPGi GABAergic neurons inhibit GABAergic neurons in the caudal ventrolateral medulla (CVLM). Inhibition of these CVLM neurons reduces GABAergic suppression of rostral ventrolateral medulla (RVLM) neurons, which are key excitatory drivers of sympathetic tone. RVLM neurons project to sympathetic preganglionic neurons in the sympathetic chain (Syc). These neurons synapse with postganglionic sympathetic neurons in ganglia such as the celiac-superior mesenteric ganglion (CG-SMG). Postganglionic sympathetic fibers then innervate the liver, releasing NE to activate hepatic β<sub>2</sub>-adrenergic receptors and stimulate HGP.

      Together, these data establish a coherent anatomical basis for the proposed brain-liver sympathetic pathway and clarify the downstream organization relevant to the functional experiments presented here.

      Author response image 2.

      Tracing scheme (Left) and whole-mount imaging (Right) of PRV-labeled brain-liver neurocircuit. Scale bars, 3,000 (whole mount) or 1,000 (optical sections) μm.

      (4) Local manipulation of the celiac ganglia:<br /> The left and right ganglia of mice are not separate from each other but rather anatomically connected. The claim that the local injection of AAV in the left or right ganglion without affecting the other side is against this basic anatomical feature.

      Thank you for raising this important anatomical point. We fully acknowledge that the left and right celiac ganglia (CG) in mice are interconnected, and that unilateral viral injection could theoretically affect the contralateral side. The celiac–superior mesenteric ganglion (CG-SMG) complex serves as a major sympathetic hub that regulates visceral organ functions. Recent transcriptomic, anatomical, and functional studies have revealed that the CG-SMG is not a homogeneous structure but is composed of molecularly and functionally distinct neuronal populations. These populations exhibit specialized projection patterns and regulate different aspects of gastrointestinal physiology, supporting a model of modular sympathetic control (Nature. 2025 Jan;637(8047):895-902). We were therefore aware of this anatomical feature during the initial stages of these experiments.

      To minimize unintended spread to the contralateral CG, we took two complementary approaches.

      First, we optimized the injection strategy by using an extremely small injection volume (100 nL per site), with a very slow infusion rate (50 nL/min), and fine glass micropipettes. With these refinements, contralateral viral spread was rarely observed.

      Second, and importantly, all animals included in the final analyses were subjected to post hoc anatomical verification. After completion of the experiments, CG were collected, sectioned, and examined for viral expression. As shown in Supplementary Figure 5F, only mice in which viral expression was strictly confined to the targeted CG, with no detectable infection in the contralateral ganglion, were included in the presented data.

      Together, these measures ensure that the reported effects are attributable to local manipulation of the intended CG. We will ensure that the Methods section more explicitly details these technical precautions and that the legend for Figure S5F clearly states its role in validating injection specificity.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Wang and colleagues aims to determine whether the left and right LPGi differentially regulate hepatic glucose metabolism and to reveal decussation of hepatic sympathetic nerves.

      The authors used tissue clearing to identify sympathetic fibers in the liver lobes, then injected PRV into the hepatic lobes. Five days post-injection, PRV-labeled neurons in the LPGi were identified. The results indicated contralateral dominance of premotor neurons and partial innervation of more than one lobe. Then the authors activated each side of the LPGi, resulting in a greater increase in blood glucose levels after right-sided activation than after left-sided activation, as well as changes in protein expression in the liver lobes. These data suggested modulation of HGP (hepatic glucose production) in a lobe-specific manner. Chemical denervation of a particular lobe did not affect glucose levels due to compensation by the other lobes. In addition, nerve bundles decussate in the hepatic portal region.

      We thank the reviewer for the thorough and constructive evaluation of our manuscript. In direct response, we will undertake comprehensive revisions to enhance the rigor and clarity of the study, including: (i) correcting ambiguous or misleading terminology pertaining to anatomical resolution and sympathetic circuit organization; (ii) expanding the Methods section with complete experimental details, improved image presentation, and explicit justification of our viral and genetic approaches; and (iii) strengthening data interpretation by addressing issues related to sparse PRV labeling, projection heterogeneity, and the functional implications of double-labeled neurons. All revisions are expected to be completed within 8 weeks.

      Strengths:

      The manuscript is timely and relevant. It is important to understand the sympathetic regulation of the liver and the contribution of each lobe to hepatic glucose production. The authors use state-of-the-art methodology.

      Weaknesses:

      (1) The wording/terminology used in the manuscript is misleading, and it is not used in the proper context. For instance, the goal of the study is "to investigate whether cerebral hemispheres differentially regulate hepatic glucose metabolism..." (see abstract); however, the authors focus on the brainstem (a single structure without hemispheres). Similarly, symmetric is not the best word for the projections.

      We thank the reviewer for raising these critical points regarding terminology and conceptual framing. We acknowledge that certain phrases in our original manuscript may have been overly broad or ambiguous, particularly in describing the scope of sympathetic heterogeneity and the specificity of neural projections. Due to practical constraints and the scope of our study, our investigation is focused on the brainstem, which represents the final common pathway for these lateralized commands. We acknowledge that terms referring to the cerebral hemispheres do not accurately describe our study.

      We are revising the manuscript to ensure accurate and consistent terminology and will submit the revised version with these corrections.

      (2) Sparse labeling of liver-related neurons was shown in the LPGi (Figure 1). It would be ideal to have lower magnification images to show the area. Higher quality images would be necessary, as it is difficult to identify brainstem areas. The low number of labeled neurons in the LPGi after five days of inoculation is surprising. Previous findings showed extensive labeling in the ventral brainstem at four days post-inoculation (Desmoulins et al., 2025). Unfortunately, it is not possible to compare the injection paradigm/methods because the PRV inoculation is missing from the methods section. If the PRV is different from the previously published viral tracers, time-dependent studies to determine the order of neurons and the time course of infection would be necessary.

      We sincerely thank the reviewer for these detailed and constructive comments regarding the PRV tracing experiments. We fully agree that careful presentation and interpretation of the anatomical data are essential for ensuring rigor and transparency. We address each point in detail below.

      (1) Image magnification and anatomical context of LPGi labeling

      We agree that the original images did not sufficiently convey the broader anatomical context of the LPGi. In the revised manuscript, we will replace the original panels in Figure 1 with new images that include lower-magnification overviews of the brainstem, alongside higher-magnification views of the LPGi. These images clearly delineate the LPGi with respect to established anatomical landmarks and atlas boundaries. Image contrast and resolution will also be optimized to allow unambiguous identification of PRV-labeled neurons and surrounding structures.

      (2) Sparse LPGi labeling at 5 days post-injection and methodological details

      We apologize for the omission of the detailed PRV injection protocol in the original Methods section. We deliberately used small-volume, focal injections (1 µL per liver lobe) to minimize viral spread and to restrict labeling to circuits specifically connected to the targeted hepatic region. Under these conditions, early-stage or intermediate-order upstream nuclei such as the LPGi are expected to exhibit relatively sparse labeling compared to more proximal autonomic nuclei. The full protocol will be added to the Methods section, including the PRV strain, viral titer, injection volume, precise injection coordinates, and surgical procedures.

      (3) Not all LPGi cells are liver-related. Was the entire LPGi population stimulated, or was it done in a cell-type-specific manner? What was the strain, sex, and age of the mice? What was the rationale for using the particular viral constructs?

      We thank the reviewer for this insightful and important question. We agree that not all neurons within the LPGi are liver-related, and we apologize that our rationale was not clearly articulated in the original manuscript.

      (1) Our decision to target GABAergic neurons in the LPGi using Gad1-Cre mice was based on prior experimental evidence rather than an assumption about the entire LPGi population. In our previous study (Cell Metab. 2025;37(11):2264-2279.e10), we performed single-cell RNA sequencing on retrogradely labeled LPGi neurons following liver tracing. These analyses revealed that the majority of liver-projecting LPGi neurons are GABAergic in nature. Based on these findings, we chose to selectively manipulate GABAergic neurons in the LPGi rather than the entire LPGi neuronal population, in order to achieve greater cellular specificity and to minimize potential confounding effects arising from heterogeneous neuron types within this region. We regret that this rationale was not clearly described in the original submission and have now revised the manuscript to explicitly state this reasoning.

      (2) In addition, we apologize for the omission of mouse strain, sex, and age information in the Methods section. These details will be fully added.

      (3) We selected AAV-based viral vectors, specifically the AAV9 serotype, due to their well-established efficiency in transducing neurons in the brainstem, relatively low toxicity, and widespread use in circuit-level chemogenetic and optogenetic studies. When combined with Cre-dependent viral constructs in Gad1-Cre mice, this approach enabled selective and reliable manipulation of LPGi GABAergic neurons.

      (4) The authors should consider the effect of stimulation of double-labeled neurons (innervating more than one lobe) and potential confounding effects regarding other physiological functions.

      We thank the reviewer for raising this important point. We agree that neurons innervating more than one liver lobe could, in principle, introduce potential confounding effects and may reflect higher-order integrative autonomic neurons.

      This consideration is consistent with a key finding of the cited study: the celiac-superior mesenteric ganglion (CG-SMG) contains molecularly distinct sympathetic neuron populations (e.g., RXFP1<sup>+</sup> vs. SHOX2<sup>+</sup>) that exhibit complementary organ projections and separate, non‑overlapping functions. Specifically, RXFP1<sup>+</sup> neurons innervate secretory organs (pancreas, bile duct) to regulate secretion, while SHOX2<sup>+</sup> neurons innervate the gastrointestinal tract to control motility. This functional segregation supports the concept of specialized autonomic modules rather than a uniform “fight or flight” response, reinforcing the need for careful interpretation of circuit-specific manipulations (Nature. 2025;637(8047):895-902; Neuron. Published online December 10, 2025).

      In our PRV tracing experiments, the proportion of double-labeled neurons was relatively small, suggesting that the majority of labeled LPGi neurons preferentially associate with individual hepatic lobes. Nevertheless, we recognize that activation of this minority population could contribute to broader physiological effects beyond strictly lobe-specific regulation. We also acknowledge that the absence of single-cell-level resolution in the current study limits our ability to further dissect the functional heterogeneity of these projection-defined neurons. We will explicitly state both points as limitations of the current study in the revised manuscript. We thank the reviewer for highlighting this important conceptual consideration.

      (5) The authors state that "central projections directly descend along the sympathetic chain to the celiac-superior mesenteric ganglia". What they mean is unclear. Do the authors refer to pre-ganglionic neurons or premotor neurons? How does it fit with the previous literature?

      We thank the reviewer for pointing out this imprecise wording. We agree that the original phrasing was anatomically inaccurate and potentially confusing. The pathways we intended to describe involve brainstem premotor neurons that project to sympathetic preganglionic neurons in the spinal cord. These preganglionic neurons then innervate neurons in the celiac–superior mesenteric ganglia, which in turn provide postganglionic input to the liver.

      We are revising the manuscript to clearly distinguish premotor from preganglionic neurons and to describe this pathway in a manner consistent with the established organization of sympathetic autonomic circuits reported in the previous literature. The revised wording will explicitly reflect this hierarchical relay structure.

      (6) How was the chemical denervation completed for the individual lobes?

      We thank the reviewer for raising this important methodological concern. We agree that potential diffusion of 6-OHDA is a critical issue when performing lobe-specific chemical denervation, and we apologize that our original description did not sufficiently clarify how this was controlled.

      In the revised Methods section, we will provide a detailed description of the denervation procedure, including the injection volume and concentration of 6-OHDA, as well as the physical separation and isolation of individual hepatic lobes during application to minimize diffusion to adjacent tissue.

      To directly assess the specificity of the chemical denervation, we included immunofluorescence and Western blot analyses demonstrating a selective reduction of sympathetic markers in the targeted lobe, with minimal effects on non-targeted lobes. These results support the effectiveness and relative spatial confinement of the 6-OHDA treatment under our experimental conditions.

      We thank the reviewer for highlighting this point, which has helped us improve both the clarity and rigor of the manuscript.

      (7) The Western Blot images look like they are from different blots, but there are no details provided regarding protein amount (loading) or housekeeping. What was the reason to switch beta-actin and alpha-tubulin? In Figures 3F-G, the GS expression is not a good representative image. Were chemiluminescence or fluorescence antibodies used? Were the membranes reused?

      We thank the reviewer for this careful and detailed evaluation of the Western blot data. We apologize that insufficient methodological detail was provided in the original submission.

      (1) We would like to clarify that the protein bands shown within each panel were derived from the same membrane. To improve transparency, we will provide full, uncropped images of the corresponding membranes in the supplementary materials. In addition, detailed information regarding protein loading amounts, gel conditions, and housekeeping controls will be added to the Methods section.

      (2) The use of different loading controls (β-actin or α-tubulin) reflects a technical consideration rather than an experimental inconsistency. In our experiments, the molecular weight of TH (62 kDa) was too close to that of α-tubulin (55 kDa), so β-actin (42 kDa) was used instead to avoid band overlap and to ensure accurate quantification.

      (3) Regarding the GS signal shown in Figures 3F–G, we agree that the original representative image was suboptimal. This appears to be related to antibody performance rather than sample quality. To address this, we are repeating the GS Western blot using a newly validated antibody. The original tissue samples had been aliquoted and stored at −80 °C, allowing reliable re-analysis. This work will be completed within 8 weeks.

      (4) All Western blot experiments were detected using chemiluminescence, and membrane stripping and reprobing procedures are now explicitly described in the Methods section.

      We thank the reviewer for highlighting these issues, which significantly improve the rigor and clarity of our data presentation.

      (8) Key references using PRV for liver innervation studies are missing (Stanley et al, 2010 [PMID: 20351287]; Torres et al., 2021 [PMID: 34231420]; Desmoulins et al., 2025 [PMID: 39647176]).

      We thank the reviewer for pointing out these important and highly relevant references that were inadvertently omitted in our initial submission. The studies by Stanley et al. (Proc Natl Acad Sci U S A, 2010), Torres et al. (Am J Physiol Regul Integr Comp Physiol, 2021), and Desmoulins et al. (Auton Neurosci, 2025) represent key PRV-based retrograde tracing work that has mapped central neural circuits innervating the liver and thus provide essential context for our anatomical analyses.

      We agree that inclusion of these studies is necessary to properly situate our findings within the existing literature. Accordingly, we will incorporate citations to these references in the revised manuscript and discuss their relationship to our results.

      Reviewer #3 (Public review):

      Summary:

      This study found a lobe-specific, lateralized control of hepatic glucose metabolism by the brain and provides anatomical evidence for sympathetic crossover at the porta hepatis. The findings are particularly insightful to the researchers in the field of liver metabolism, regeneration, and tumors.

      Strengths:

      Increasing evidence suggests spatial heterogeneity of the liver across many aspects of metabolism and regenerative capacity. The current study has provided interesting findings: neuronal innervation of the liver also shows anatomical differences across lobes. The findings could be particularly useful for understanding liver pathophysiology and treatment, such as metabolic interventions or transplantation.

      Weaknesses:

      Inclusion of detailed method and Discussion:

      We sincerely thank the reviewer for the positive and constructive feedback, which will significantly enhance both the methodological rigor and the broader biological interpretation of our study. In direct response, we will revise the Discussion to elaborate on the potential physiological advantages of a lateralized and lobe-specific pattern of liver innervation. Furthermore, we will expand the Methods section to include a comprehensive description of the quantitative analysis applied to PRV-labeled neurons. Together, these revisions will strengthen the manuscript’s clarity, depth, and relevance to researchers in hepatic metabolism, regeneration, and disease. We expect to complete all updates within 8 weeks.

      (1) The quantitative results of PRV-labeled neurons are presented, and please include the specific quantitative methods.

      We thank the reviewer for this helpful suggestion. We will add a detailed description of the quantitative methods used to analyze PRV-labeled neurons in the revised Methods section. This includes information on the counting criteria, the brain regions analyzed, how the regions of interest were delineated, and the normalization procedures applied to obtain the reported neuron counts.

      (2) The Discussion can be expanded to include potential biological advantages of this complex lateralized innervation pattern.

      We appreciate the reviewer’s suggestion. We will expand the Discussion to include a paragraph addressing the potential biological significance of lateralized liver innervation. We will highlight that this asymmetric organization could allow for more precise, lobe-specific regulation of hepatic metabolism, enable integration of distinct physiological signals, and potentially provide robustness against perturbations. These points will be discussed in the revised manuscript.

      Reviewer #4 (Public review):

      Summary:

      The studies here are highly informative in terms of anatomical tracing and sympathetic nerve function in the liver related to glucose levels, but given that they are performed in a single species, it is challenging to translate them to humans, or to determine whether these neural circuits are evolutionarily conserved. Dual-labeling anatomical studies are elegant, and the addition of chemogenetic and optogenetic studies is mechanistically informative. Denervation studies lack appropriate controls, and the role of sensory innervation in the liver is overlooked.

      We sincerely appreciate the reviewer's thoughtful evaluation and fully agree that findings derived from a single-species model must be interpreted with caution in relation to human physiology. In direct response, we will revise the manuscript to explicitly clarify that all experimental data were obtained in mice and to provide a discussion of the limitations regarding direct extrapolation to humans. Concurrently, we will expand the Discussion section by integrating our findings with recent human and translational studies, including a multicenter clinical trial demonstrating that catheter-based endovascular denervation of the celiac and hepatic arteries significantly improved glycemic control in patients with poorly controlled type 2 diabetes, without major adverse events (Signal Transduct Target Ther. 2025;10(1):371). While our current work focuses on defining the anatomical organization and functional asymmetry of this circuit in mice, the clinical findings suggest that the core principles, sympathetic control of hepatic glucose metabolism via CG-liver pathways, may be conserved and of translational relevance. Additionally, we will clarify the interpretation of tyrosine hydroxylase labeling and expand the discussion of hepatic sensory and parasympathetic innervation, acknowledging their important roles in liver–brain communication and identifying them as key directions for future research. Collectively, these revisions will provide a more balanced, clinically informed, and rigorous framework for interpreting our findings, and we aim to complete all updates within 8 weeks.

      Specific Weaknesses - Major:

      (1) The species name should be included in the title.

      We thank the reviewer for this suggestion. We agree that the species should be clearly indicated. The findings presented in this study were obtained in mice using tissue clearing and whole-organ imaging approaches. Due to technical limitations, these observations are currently limited to the mouse strain. We will update the title and clarify the species used throughout the manuscript.

      (2) Tyrosine hydroxylase was used to mark sympathetic fibers in the liver, but this marker also hits a portion of sensory fibers that need to be ruled out in whole-mount imaging data

      We thank the reviewer for pointing this out. We acknowledge that tyrosine hydroxylase (TH) labels not only sympathetic fibers but also a subset of sensory fibers. We will note this limitation in the revised manuscript. In addition, ongoing experiments using retrograde PRV labeling from the liver, combined with sectioning, are being used to distinguish sympathetic fibers from vagal and dorsal root ganglion–derived sensory fibers. These data will be included in a forthcoming update of the manuscript and are expected to be completed in approximately 6 weeks.

      (3) Chemogenetic and optogenetic data demonstrating hyperglycemia should be described in the context of prior work demonstrating liver nerve involvement in these processes. There is only a brief mention in the Discussion currently, but comparing methods and observations would be helpful.

      We thank the reviewer for this suggestion. Previous studies largely relied on electrical stimulation to modulate liver innervation, which provides relatively coarse control of neural activity (Eur J Biochem. 1992;207(2):399-411). By contrast, our use of chemogenetic and optogenetic approaches allows selective, cell-type–specific manipulation of LPGi neurons. We will revise the Discussion to place our functional data in the context of prior work, highlighting how these more precise approaches improve understanding of the contribution of liver-innervating neurons to hyperglycemia.

      (4) Sympathetic denervation with 6-OHDA can drive compensatory increases to tissue sensory innervation, and this should be measured in the liver denervation studies to implicate potential crosstalk, especially given the increase in LPGi cFOS that may be due to afferent nerve activity. Compensatory sympathetic drive may not be the only culprit, though it is clearly assumed to be. The sensory or parasympathetic/vagal innervation of the liver is altogether ignored in this paper and could be better described in general.

      We thank the reviewer for this insightful comment and agree that chemical sympathetic denervation with 6-OHDA may induce compensatory changes in non-sympathetic hepatic inputs, including sensory and parasympathetic (vagal) innervation. As the reviewer correctly points out, increased LPGi cFOS activity may reflect afferent nerve engagement rather than solely compensatory sympathetic drive.

      More broadly, we agree that the central nervous system functions as an integrated homeostatic network that continuously processes diverse afferent signals, including hepatic sensory and vagal inputs, as well as other interoceptive cues. From this perspective, the LPGi cFOS changes observed in our study likely represent one component of a complex integrative response rather than evidence for a single dominant pathway.

      We acknowledge that the present study did not directly assess hepatic sensory or parasympathetic innervation, which represents a limitation in scope. In the revised manuscript, we will expand the Discussion to explicitly note this limitation and provide a more balanced consideration of potential crosstalk among sympathetic, sensory, and parasympathetic pathways in shaping LPGi activity following hepatic denervation.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Although the findings are interesting, this reviewer has major concerns about the experimental design, methodology, results, and interpretation of the data. Experimental details are lacking, including basic information (age, sex, strain of mice, procedures, magnification, etc.).

      We thank the reviewer for this important recommendation. We agree that comprehensive reporting of experimental details is essential for rigor and reproducibility.

      In the revised manuscript, we will add complete information regarding mouse strain, sex, age, and sample size for each experiment. In addition, detailed descriptions of surgical procedures, viral constructs, injection parameters, imaging magnification, and analysis methods will be incorporated into the Methods section.

      These revisions will ensure that all experiments are described with sufficient technical detail and clarity to allow accurate interpretation and replication of our findings.

      Reviewer #3 (Recommendations for the authors):

      Addressing a few questions might help:

      (1) The study found that liver-associated LPGi neurons are predominantly GABAergic. It would be informative to molecularly characterize the PRV-traced, liver-projecting LPGi neurons to determine their neurochemical phenotypes.

      We thank the reviewer for this insightful suggestion. We agree that molecular characterization of liver-projecting LPGi neurons is important for understanding their functional identity.

      This issue has been addressed in detail in our recent study (Cell Metab. 2025;37(11):2264-2279.e10), in which we performed single-cell RNA sequencing on retrogradely traced LPGi neurons connected to the liver. These analyses demonstrated that the majority of liver-projecting LPGi neurons are GABAergic, with a defined transcriptional profile distinct from neighboring non–liver-related populations.

      Based on these findings, the current study selectively targets GABAergic LPGi neurons using Gad1-Cre mice. We are now explicitly referencing and summarizing these molecular results in the revised manuscript to clarify the neurochemical identity of the PRV-traced LPGi neurons.

      (2) Is it possible to do a local microinjection of a sodium channel blocker (e.g., lidocaine) or an adrenergic receptor antagonist into the porta hepatis? That would potentially provide additional evidence for the porta hepatis as the functional crossover point.

      We appreciate the reviewer’s thoughtful suggestion. While pharmacological blockade at the porta hepatis could modulate local neural activity, the proposed approach may not fully capture the distinction between ipsilateral and contralateral inputs, and may not conclusively establish neural crossover at this particular site.

      In our view, the anatomical evidence provided by whole-mount tissue clearing, dual-labeled tracing, and direct visualization of decussating nerve bundles at the porta hepatis offers a more definitive demonstration of sympathetic crossover. Pharmacological blockade would affect both crossed and uncrossed fibers simultaneously and therefore would not specifically resolve the anatomical organization of this decussation.

      Nevertheless, we agree that functional interrogation of the porta hepatis represents an interesting direction for future work, and we will now acknowledge this possibility in the Discussion.

      (3) Is it possible to investigate the effects of unilateral LPGi manipulation or ablation of one side of CG/SMG on liver metabolism, such as hyperglycemia?

      We thank the reviewer for this important suggestion. We agree that unilateral ablation or silencing of the CG-SMG could provide additional insight into lateralized sympathetic control of liver metabolism.

      However, precise and selective ablation of one side of the CG-SMG through 6-OHDA without affecting the contralateral ganglion or adjacent autonomic structures remains technically challenging, particularly given the anatomical connectivity between the two sides. We are currently optimizing approaches to achieve reliable unilateral manipulation.

      If successful within the revision timeframe, we will include these experiments and corresponding metabolic analyses in the revised manuscript. If not, we will explicitly discuss this experimental limitation and the predicted metabolic consequences of unilateral CG-SMG ablation as an important direction for future studies. We expect to complete this work within 6 weeks.

      Reviewer #4 (Recommendations for the authors):

      In the abstract and elsewhere, the use of the term 'sympathetic release' is unclear - do you mean release of nerve products, such as the neurotransmitter norepinephrine? This should be more clearly defined.

      We thank the reviewer for pointing out this ambiguity. We agree that the term “sympathetic release” was imprecise. In the revised manuscript, we will explicitly refer to the release of sympathetic neurotransmitters, primarily norepinephrine, from postganglionic sympathetic fibers.

      We will revise the wording throughout the manuscript to ensure accurate and consistent terminology and to avoid potential confusion regarding the underlying neurobiological mechanisms.

    1. eLife Assessment

      The findings are important, as they identify MIRO1 as a central regulator linking mitochondrial positioning and respiratory chain function to VSMC proliferation, neointima formation, and human vasoproliferative disease. Overall, the strength of evidence is convincing, with comprehensive in vivo and in vitro data, including human cells and added bioenergetic analyses, that broadly support the main claims despite some remaining limitations in mechanistic and mitochondrial assays.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors investigate the effects of Miro1 on VSMC biology after injury. Using conditional knockout animals, they provide the important observation that Miro1 is required for neointima formation. They also confirm that Miro1 is expressed in human coronary arteries. Specifically, in conditions of coronary diseases, it is localized in both media and neointima and, in atherosclerotic plaque, Miro1 is expressed in proliferating cells.

      However, the role of Miro1 in VSMC in CV diseases is poorly studied and the data available are limited; therefore, the authors decided to deepen this aspect. The evidence that Miro-/- VSMCs show impaired proliferation and an arrest in S phase is solid and further sustained by restoring Miro1 to control levels, normalizing proliferation. Miro1 also affects mitochondrial distribution, which is strikingly changed after Miro1 deletion. Both effects are associated with impaired energy metabolism due to the ability of Miro1 to participate in MICOS/MIB complex assembly, influencing mitochondrial cristae folding. Interestingly, the authors also show the interaction of Miro1 with NDUFA9, globally affecting super complex 2 assembly and complex I activity.

      Finally, these important findings also apply to human cells and can be partially replicated using a pharmacological approach, proposing Miro1 as a target for vasoproliferative diseases.

      Strengths:

      The discovery of Miro1 relevance in neointima formation is compelling, as well as the evidence in VSMC that MIRO1 loss impairs mitochondrial cristae formation, expanding observations previously obtained in embryonic fibroblasts.

      The identification of MIRO1 interaction with NDUFA9 is novel and adds value to this paper. Similarly, the findings that VSMC proliferation requires mitochondrial ATP support the new idea that these cells do not rely mostly on glycolysis.

      The revised manuscript includes additional data supporting mitochondrial bioenergetic impairment in MIRO1 knockout VSMCs. Measurements of oxygen consumption rate (OCR), along with Complex I (ETC-CI) and Complex V activity, have been added and analyzed across multiple experimental conditions. Collectively, these findings provide a more comprehensive characterization of the mitochondrial functional state. Following revision, the association between MIRO1 deficiency and impaired Complex I activity is more robust.

      Although the precise molecular mechanism of action remains to be fully elucidated, in this updated version, experiments using a MIRO1 reducing agent are presented with improved clarity.

      Although some limitations remain, the authors have addressed nearly all the concerns raised, and the manuscript has substantially improved.

      Weaknesses:

      Figure 6: The authors do not address the concern regarding the cristae shape; however, characterization of the cristae phenotype with MIRO1 ΔTM would have strengthened the mechanistic link between MIRO1 and the MIB/MICOS complex.

      Although the authors clarified their reasoning, they did not explore in vivo validation of key biochemical findings, which represents a limitation of the current study. While their justification is acknowledged, at least a preliminary exploratory effort could have been evaluated to reinforce the translational relevance of the study.

      Finally, in line with the explanations outlined in the rebuttal, the Discussion section should mention the limits of MIRO1 reducer treatment.

    3. Reviewer #2 (Public review):

      Summary:

      This study identifies the outer‑mitochondrial GTPase MIRO1 as a central regulator of vascular smooth muscle cell (VSMC) proliferation and neointima formation after carotid injury in vivo and PDGF-stimulation ex vivo. Using smooth muscle-specific knockout male mice, complementary in vitro murine and human VSMC cell models, and analyses of mitochondrial positioning, cristae architecture and respirometry, the authors provide solid evidence that MIRO1 couples mitochondrial motility with ATP production to meet the energetic demands of the G1/S cell cycle transition. However, a component of the metabolic analyses is suboptimal and would benefit from more robust methodologies. The work is valuable because it links mitochondrial dynamics to vascular remodelling and suggests MIRO1 as a therapeutic target for vasoproliferative diseases, although whether pharmacological targeting of MIRO1 in vivo can effectively reduce neointima after carotid injury has not been explored. This paper will be of interest to those working on VSMCs and mitochondrial biology.

      Strengths:

      The strength of the study lies in its comprehensive approach assessing the role of MIRO1 in VSMC proliferation in vivo, ex vivo and importantly in human cells. The subject provides mechanistic links between MIRO1-mediated regulation of mitochondrial mobility and optimal respiratory chain function to cell cycle progression and proliferation. Finally, the findings are potentially clinically relevant given the presence of MIRO1 in human atherosclerotic plaques and the available small-molecule MIRO1 reducer.

      Weaknesses:

      (1) High-resolution respirometry (Oroboros) to determine mitochondrial ETC activity in permeabilized VSMCs would be informative.

      (2) Therapeutic targeting of MIRO1 failed to prevent neointima formation; however, the technical difficulties of such an experiment are appreciated.

    4. Reviewer #3 (Public review):

      Summary:

      This study addresses the role of MIRO1 in vascular smooth muscle cell proliferation, proposing a link between MIRO1 loss and altered growth due to disrupted mitochondrial dynamics and function. While the findings are useful for understanding the importance of mitochondrial positioning and function in this specific cell type, the main bioenergetic and mechanistic claims are not strongly supported.

      Strengths:

      This study focuses on an important regulatory protein, MIRO1, and its role in vascular smooth muscle cell (VSMC) proliferation, a relatively underexplored context.

      This study explores the link between smooth muscle cell growth, mitochondrial dynamics, and bioenergetics, which is a significant area for both basic and translational biology.

      The use of both in vivo and in vitro systems provides a useful experimental framework to interrogate MIRO1 function in this context.

      Weaknesses:

      The proposed link between MIRO1 and respiratory supercomplex biogenesis or function is not clearly defined.

      Completeness and integration of mitochondrial assays are marginal, undermining the strength of the conclusions regarding oxidative phosphorylation.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors investigate the effects of Miro1 on VSMC biology after injury. Using conditional knockout animals, they provide the important observation that Miro1 is required for neointima formation. They also confirm that Miro1 is expressed in human coronary arteries. Specifically, in conditions of coronary diseases, it is localized in both media and neointima, and, in atherosclerotic plaque, Miro1 is expressed in proliferating cells.

      However, the role of Miro1 in VSMC in CV diseases is poorly studied, and the data available are limited; therefore, the authors decided to deepen this aspect. The evidence that Miro-/- VSMCs show impaired proliferation and an arrest in S phase is solid and further sustained by restoring Miro1 to control levels, normalizing proliferation. Miro1 also affects mitochondrial distribution, which is strikingly changed after Miro1 deletion. Both effects are associated with impaired energy metabolism due to the ability of Miro1 to participate in MICOS/MIB complex assembly, influencing mitochondrial cristae folding. Interestingly, the authors also show the interaction of Miro1 with NDUFA9, globally affecting super complex 2 assembly and complex I activity.

      Finally, these important findings also apply to human cells and can be partially replicated using a pharmacological approach, proposing Miro1 as a target for vasoproliferative diseases.

      Strengths:

      The discovery of Miro1 relevance in neointima formation is compelling, as well as the evidence in VSMC that MIRO1 loss impairs mitochondrial cristae formation, expanding observations previously obtained in embryonic fibroblasts.

      The identification of MIRO1 interaction with NDUFA9 is novel and adds value to this paper. Similarly, the findings that VSMC proliferation requires mitochondrial ATP support the new idea that these cells do not rely mostly on glycolysis.

      Weaknesses:

      (1) Figure 3:

      I appreciate the system used to assess mitochondrial distribution; however, I believe that time-lapse microscopy to evaluate mitochondrial movements in real time should be mandatory. The experimental timing is compatible with time-lapse imaging, and these experiments will provide a quantitative estimation of the distance travelled by mitochondria and the fraction of mitochondria that change position over time. I also suggest evaluating mitochondrial shape in control and MIRO1-/- VSMC to assess whether MIRO1 absence could impact mitochondrial morphology, altering fission/fusion machinery, since mitochondrial shape could differently influence the mobility.

      Mitochondrial motility experiments. WT and Miro1-/- VSMCs were transiently transfected with mito-ds-red and untargeted GFP adenoviruses to fluorescently label mitochondria and cytosol, respectively. Live-cell fluorescence confocal microscopy was used to acquire mitochondrial images at one-minute intervals over a 25-30-minute period. WT cells exhibited dynamic reorganization of the mitochondrial network, whereas Miro1-/- VSMCs displayed minimal mitochondrial movement, characterized only by limited oscillatory behavior without network remodeling (Supplemental Video 1).

      Mitochondrial shape (form factor) was assessed by confocal microscopy in WT and Miro1-/- VSMCs. Analysis of the mitochondrial form factor (defined as the ratio of mitochondrial length to width) during cell cycle progression revealed morphological changes in wild type (WT) cells, characterized by an increase in form factor. In contrast, Miro1-/- cells exhibited no significant alterations in mitochondrial morphology (Figure 3- Figure supplement 1B).

      (2) Figure 6:

      The evidence of MIRO1 ablation on cristae remodeling is solid; however, considering that the mechanism proposed to explain the finding is the modulation of MICOS/MIB complex, as shown in Figure 6D, I suggest performing EM analysis in each condition. In my mind, Miro1 KK and Miro1 TM should lead to different cristae phenotypes according to the different impact on MICOS/MIB complex assembly. Especially, Miro1 TM should mimic Miro1 -/- condition, while Miro1 KK should drive a less severe phenotype. This would supply a good correlation between Miro1, MICOS/MIB complex formation and cristae folding.

      I also suggest performing supercomplex assembly and complex I activity with each plasmid to correlate MICOS/MIB complex assembly with the respiratory chain efficiency.

      Complex I activity assays revealed that overexpression of MIRO1-WT fully restored enzymatic activity in MIRO1-/- cells, whereas MIRO1-KK provided partial rescue. In contrast, a MIRO1 mutant lacking the transmembrane domain failed to restore activity and resembled the Miro1-/- phenotype (Figure 6- Figure supplement 2).

      The Complex I activity in each Miro1 mutant correlated with the degree of MICOS/MIB complex assembly in pulldown assays, implying a functional link between Miro1 and mitochondrial cristae organization.

      Moreover, an in-gel Complex V activity assay was performed to evaluate the enzymatic activity of mitochondrial ATP synthase in a native gel following electrophoresis. To normalize the activity signal, a Blue Native PAGE of the same samples was probed for the ATP5F1 subunit. A modest, yet statistically significant reduction in Complex V activity was observed in Miro1-/- cells (Figure 6- Figure supplement 1).

      (3) I noticed that none of the in vitro findings have been validated in an in vivo model. I believe this represents a significant gap that would be valuable to address. In your animal model, it should not be too complex to analyze mitochondria by electron microscopy to assess cristae morphology. Additionally, supercomplex assembly and complex I activity could be evaluated in tissue homogenates to corroborate the in vitro observations.

      We appreciate the reviewer’s comment. However, our currently available samples have been processed for light microscopy and are therefore not suitable for embedding for electron microscopy.

      (4) I find the results presented in Figure S7 somewhat unclear. The authors employ a pharmacological strategy to reduce Miro1 and validate the findings previously obtained with the genetic knockout model. They report increased mitophagy and a reduction in mitochondrial mass. However, in my opinion, these changes alone could significantly impact cellular metabolism. A lower number of mitochondria would naturally result in decreased ATP production and reduced mitochondrial respiration. This, in turn, weakens the proposed direct link between Miro1 deletion and impaired metabolic function or altered electron transport chain (ETC) activity. I believe this section would benefit from additional experiments and a more in-depth discussion.

      We initially conducted experiments using the MIRO1 reducer to explore the translational potential of our findings. These experiments aimed to provide a foundation for in vivo studies. However, despite multiple attempts, we were unable to demonstrate a significant effect of the MIRO1 reducer, delivered via a Pluronic gel, on the mitochondria of the vascular wall. Of note, the role of MIRO1 in mitophagy has been well established in several studies (for example, PMID: 34152608), which show that genetic deletion of Miro1 delays the translocation of the E3 ubiquitin ligase Parkin onto damaged mitochondria, thereby reducing mitochondrial clearance in fibroblasts and cultured neurons. Furthermore, loss of Miro1 in the hippocampus and cortex increases mitofusin levels with the appearance of hyperfused mitochondria and activation of the integrated stress response. Thus, MIRO1 deletion in genetic models does not result in a substantial reduction of mitochondria but causes hyperfused mitochondria. The rationale for developing the MIRO1 reducer stems from genetic forms of Parkinson’s disease, where Miro1 is retained in PD cells but degraded in healthy cells following mitochondrial depolarization (PMID: 31564441). Consequently, the degradation of mutant MIRO1 by the reducer does not phenocopy the effects of genetic MIRO1 depletion. We therefore believe the data with the reducer demonstrate that MIRO1 can be acutely targeted in vitro, but the mechanism of action (as the reviewer points out, the reduction of mitochondrial mass may lead to decreased ATP levels, potentially reducing cell proliferation) differs from that of chronic genetic deletion. In fact, we observe somewhat increased mitochondrial length in MIRO1-/- cells. We acknowledge that this is complex and have revised the paragraph to clarify the use of the MIRO1 reducer.

      Reviewer #2 (Public review):

      Summary:

      This study identifies the outer mitochondrial GTPase MIRO1 as a central regulator of vascular smooth muscle cell (VSMC) proliferation and neointima formation after carotid injury in vivo and PDGF-stimulation ex vivo. Using smooth muscle-specific knockout male mice, complementary in vitro murine and human VSMC cell models, and analyses of mitochondrial positioning, cristae architecture, and respirometry, the authors provide solid evidence that MIRO1 couples mitochondrial motility with ATP production to meet the energetic demands of the G1/S cell cycle transition. However, a component of the metabolic analyses is suboptimal and would benefit from more robust methodologies. The work is valuable because it links mitochondrial dynamics to vascular remodeling and suggests MIRO1 as a therapeutic target for vasoproliferative diseases, although whether pharmacological targeting of MIRO1 in vivo can effectively reduce neointima after carotid injury has not been explored. This paper will be of interest to those working on VSMCs and mitochondrial biology.

      Strengths:

      The strength of the study lies in its comprehensive approach, assessing the role of MIRO1 in VSMC proliferation in vivo, ex vivo, and importantly in human cells. The subject provides mechanistic links between MIRO1-mediated regulation of mitochondrial mobility and optimal respiratory chain function to cell cycle progression and proliferation. Finally, the findings are potentially clinically relevant given the presence of MIRO1 in human atherosclerotic plaques and the available small-molecule MIRO1 reducer.

      Weaknesses:

      (1) There is a consistent lack of reporting across figure legends, including group sizes, n numbers, how many independent experiments were performed, or whether the data is mean +/- SD or SEM, etc. This needs to be corrected.

      These data were added in the revised manuscript.

      (2) The in vivo carotid injury experiments are in male mice fed a high-fat diet; this should be explicitly stated in the abstract, as it's unclear if there are any sex- or diet-dependent differences. Is VSMC proliferation/neointima formation different in chow-fed mice after carotid injury?

      This is an important point, and we appreciate the feedback. In this model, the transgene is located on the Y chromosome. As a result, only male mice can be studied. However, in our previous experiments, we have not observed any sex-dependent changes in neointimal formation. Additionally, please note that smooth muscle cell proliferation in neointimal formation is enhanced in models of cholesterol-fed mice on a high-fat diet.

      (3) The main body of the methods section is thin, and it's unclear why the majority of the methods are in the supplemental file. The authors should consider moving these to the main article, especially in an online-only journal.

      We thank the reviewer for this suggestion. We moved the methods to the main manuscript.

      (4) Certain metabolic analyses are suboptimal, including ATP concentration and Complex I activity measurements. The measurement of ATP/ADP and ATP/AMP ratios for energy charge status (luminometer or mass spectrometry), while high-resolution respirometry (Oroboros) to determine mitochondrial complex I activity in permeabilized VSMCs would be more informative.

      ATP/ADP and ATP/AMP ratios were assessed on samples from WT and Miro1-/- VSMCs using an ATP/ADP/AMP Assay Kit (Cat#: A-125; Biomedical Research Service, University at Buffalo, New York). Miro1-/- samples exhibited reduced ATP levels accompanied by elevated concentrations of ADP and AMP. As a result, both ATP/ADP and ATP/AMP ratios were significantly lower in Miro1-/- cells compared to WT, indicating impaired cellular energy homeostasis (Figure 5B, C).
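
      The energy charge status mentioned by the reviewer is commonly summarized with the adenylate energy charge, (ATP + 0.5 x ADP) / (ATP + ADP + AMP). The sketch below shows how the two ratios and the energy charge follow from the three nucleotide measurements. It is only an illustration: the function name and all numbers are hypothetical and are not taken from the manuscript or from the assay kit.

```python
def adenylate_ratios(atp, adp, amp):
    """Return ATP/ADP, ATP/AMP and the adenylate energy charge.

    Inputs are concentrations in any common unit. The function name and
    the example values below are illustrative assumptions, not study data.
    """
    if min(atp, adp, amp) <= 0:
        raise ValueError("concentrations must be positive")
    total = atp + adp + amp
    return {
        "atp_adp": atp / adp,
        "atp_amp": atp / amp,
        # Energy charge: ATP counts fully, ADP counts as half.
        "energy_charge": (atp + 0.5 * adp) / total,
    }


# Hypothetical values chosen only to mimic the direction of the reported change
# (lower ATP, higher ADP and AMP in the knockout).
wt = adenylate_ratios(atp=3.0, adp=0.5, amp=0.1)
ko = adenylate_ratios(atp=2.0, adp=0.8, amp=0.3)
print(wt)
print(ko)
```

      With these made-up inputs, all three readouts fall together, which is the pattern the response describes qualitatively.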

      (5) The statement that 'mitochondrial mobility is not required for optimal ATP production' is poorly supported. XF Seahorse analysis should be performed with nocodazole and also following MIRO1 reconstitution +/- EF hands.

      To evaluate the metabolic effects of nocodazole, we conducted Seahorse metabolic assays on vascular smooth muscle cells (VSMCs) under various conditions. We used WT VSMCs, Miro1-/- VSMCs, and Miro1-/- VSMCs that expressed MIRO1-WT or the KK or ΔTM mutants. Our results demonstrate that nocodazole exposure did not compromise mitochondrial respiratory activity. However, Miro1-/- VSMCs displayed a trend toward reduced basal and maximal mitochondrial respiration when compared to WT cells. This deficit was only partially corrected by the expression of the MIRO1-KK mutant. In contrast, reintroducing MIRO1-WT through adenoviral delivery fully restored mitochondrial respiration to normal levels (Figure 5- Figure supplement 1).

      (6) The authors should consider moving MIRO1 small molecule data into the main figures. A lot of value would be added to the study if the authors could demonstrate that therapeutic targeting of MIRO1 could prevent neointima formation in vivo.

      We appreciate the reviewer's comment and attempted the suggested in vivo experiments using the commercially available Miro1 reducer. For these experiments, we used a pluronic gel to deliver the reducer to the adventitial area surrounding the carotid artery. Despite numerous attempts to optimize the experimental conditions, we were unable to reliably detect a significant effect of the reducer on mitochondria in the vascular wall.

      Reviewer #3 (Public review):

      Summary:

      This study addresses the role of MIRO1 in vascular smooth muscle cell proliferation, proposing a link between MIRO1 loss and altered growth due to disrupted mitochondrial dynamics and function. While the findings are potentially useful for understanding the importance of mitochondrial positioning and function in this specific cell type within health and disease contexts, the evidence presented appears incomplete, with key bioenergetic and mechanistic claims lacking adequate support.

      Strengths:

      (1) The study focuses on an important regulatory protein, MIRO1, and its role in vascular smooth muscle cell (VSMC) proliferation, a relatively underexplored context.

      (2) It explores the link between smooth muscle cell growth, mitochondrial dynamics, and bioenergetics, which is a potentially significant area for both basic and translational biology.

      (3) The use of both in vivo and in vitro systems provides a potentially useful experimental framework to interrogate MIRO1 function in this context.

      Weaknesses:

      (1) The central claim that MIRO1 loss impairs mitochondrial bioenergetics is not convincingly demonstrated, with only modest changes in respiratory parameters and no direct evidence of functional respiratory chain deficiency.

      (2) The proposed link between MIRO1 and respiratory supercomplex assembly or function is speculative, lacking mechanistic detail and supported by incomplete or inconsistent biochemical data.

      (3) Key mitochondrial assays are either insufficiently controlled or poorly interpreted, undermining the strength of the conclusions regarding oxidative phosphorylation.

      (4) The study does not adequately assess mitochondrial content or biogenesis, which could confound interpretations of changes in respiratory activity.

      (5) Overall, the evidence for a direct impact of MIRO1 on mitochondrial respiratory function in the experimental setting is weak, and the conclusions overreach the data.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      (1) Throughout the manuscript, the authors incorrectly use "mobility" to describe the active transport of mitochondria. The appropriate term is "mitochondrial motility," which refers to active, motor-driven movement. "Mobility" implies passive diffusion and is not scientifically accurate in this context.

      (2) "Super complex" should be consistently written as "supercomplex," in line with accepted mitochondrial biology terminology.

      We thank the reviewer for this comment and revised the text accordingly.

      (3) A significant limitation of the in vivo model is the mild phenotype observed, which is expected from an inducible knockout system. The authors should clarify whether a constitutive, tissue-specific knockout was considered and, if not, whether embryonic lethality or another limitation prevented its generation.

      This genetic model was originally developed by Dr. Janet Shaw at the University of Utah. In the original publication, Miro1 was constitutively knocked out in neurons. Germline inactivation of Miro1 was achieved by crossing mice harboring the Miro1F allele with a mouse line expressing Cre recombinase under the control of the hypoxanthine-guanine phosphoribosyltransferase (HPRT) promoter. Mating Miro1+/− mice resulted in Miro1−/− animals, which were cyanotic and died shortly after birth. Due to this outcome, we opted to develop an inducible, smooth muscle-specific model. Additionally, we considered testing whether the acute use of an inhibitor or a knockdown system targeting Miro1 could be evaluated as a potential therapeutic approach.

      (4) In Figure 1A and S1A, the authors use Western blotting to validate the knockout in the aorta and IHC in carotid arteries. The choice of different methods does not seem justified, and qPCR data are shown only for the aorta. IHC appears to be suboptimal for assessing MIRO1 levels in vascular tissue due to high autofluorescence, and IHC in Figure S1A is merely qualitative, with no quantification provided.

      We present complementary approaches to validate the deletion of Miro1. For Western blot analysis, we used the aorta because it provides more material for analysis. The autofluorescence observed via immunofluorescence is characteristic of elastin fibers within the media layer, making our results typical for this technique. As shown in Figure 1- Figure supplement 1, our data demonstrate a significant decrease, if not a complete knockout, of the target protein specifically in smooth muscle cells.

      (5) In Figure 1G, the bottom left panel (magnification) shows a lower green signal than the top left panel, suggesting these may have been collected with different signal intensity. This raises concerns about image consistency and representation.

      The top images in Figure 1G were taken at 63x magnification; the bottom images were taken at 20x. The intensity differs between the two magnifications but is similar between genotypes.

      (6) In Figure S3, the sampling is uncontrolled: the healthy subject and the patient differ markedly in age. The claim of colocalization is not substantiated with any quantitative analysis.

      As outlined in the Methods section, our heart samples were obtained from LVAD patients or explanted hearts from transplant recipients. Due to the limited availability of such samples, there is indeed a difference in age between the healthy subject and the patient. While we acknowledge this limitation, the scarcity of samples made it challenging to control for age. Additionally, we determined that performing a quantitative analysis of colocalization would not yield robust or meaningful data given the constraints of our sample size and variability. 

      (7) Figure S4A lacks statistical analysis, which is necessary for interpreting the data shown.

      This appears to be a misunderstanding. In this manuscript, we do present statistically significant differences and focus on those that are biologically meaningful. Specifically, we highlight differences between PDGF treatment and no treatment within the same genotype, as well as differences between the two genotypes under the same treatment condition (control or PDGF treatment). In this particular case, the only statistically significant difference is between WT+PDGF and SM-Miro1-/-, but since this is not a meaningful comparison, it is not shown. Please note that this approach applies to all figures in the manuscript. Including all comparisons, whether statistically significant or not and whether biologically meaningful or not, may appear rigorous but, in our opinion, ultimately detracts from the main message of this paper.

      (8) The authors state, "given the generally poor proliferation of VSMCs from SM-MIRO1-/- mice, in later experiments we used VSMCs from MIRO1fl/fl mice and infected them with adenovirus expressing cre." This is not convincing, especially since in vivo cre efficiency is generally lower than in vitro. Moreover, the methods indicate that "VSMCs from littermate controls were subjected to the same procedure with empty vector control adenovirus," yet in Figure 2A, the control appears to be MIRO1fl/fl VSMCs transduced with Ad-EV. The logic and consistency of the controls used need clarification.

      For the initial experiments, cells were explanted from SM-MIRO1-/- mice (Figure 2- Figure supplement 1). In these mice, Cre recombination had occurred in vivo, and the cells exhibited very poor growth. In fact, their growth was so limited that we decided not to pursue this experimental approach after three independent experiments.

      For subsequent experiments, cells were explanted from Miro1fl/fl mice and passaged several times, which allowed us to generate the number of cells required for the experiments (Figure 2B). Once sufficient Miro1fl/fl cells were obtained, they were treated with adenovirus expressing Cre, as described in the Methods section. Control cells were treated with an empty vector adenovirus. To clarify, the control cells are Miro1fl/fl cells infected with an empty vector adenovirus, while the MIRO1-/- cells are Miro1fl/fl cells infected with adenovirus expressing Cre. The statement that “littermate controls were used” is incorrect; in fact, Miro1fl/fl cells from the same preparation were infected either with an empty vector adenovirus or with adenovirus expressing Cre. As mentioned, the knockdown was confirmed by Western blotting.

      (9) Figure 2C shows a growth delay in MIRO1-/- cells. Have the authors performed additional time points to determine when these cells return to G1 and quantify the duration of the lag?

      This is an excellent suggestion. So far, we have not performed this experiment.

      (10) In the 24 h time point of Figure 2C, MIRO1-/- cells appear to be cycling, yet no cyclin E signal is detected. How do the authors explain this inconsistency? Additionally, in Figure 2H, the quantification of cyclin E is unreliable, given that lanes 3 and 4 show no detectable signal.

      We agree with the reviewer—the inconsistency is driven by the exposure of the immunoblot presented. We revisited the data, reviewed the quantification, and performed an additional experiment. We are now presenting an exposure that demonstrates levels of cyclin E (Figure 2G).

      (11) In Figure 3D, the authors present mitochondrial probability map vs. distance from center curves. How was the "center" defined in this analysis? Were radial distances normalized across cells (e.g., to the cell radius or maximum extent)? If not, variation in cell and/or nucleus size or shape could significantly affect the resulting profiles. No statistical analysis is provided for this assessment, which undermines its quantitative value. Furthermore, the rationale behind the use of mito95 values is not clearly explained.

      The center refers to the center of the microchip's Y-shaped pattern, to which each cell is attached. Since all Y-shapes on the chip are identical in size, normalization is not required. The size of the optimal Y-shapes was tested as recommended by CYTOO. For further context, please refer to the papers by the Kittler group.

      Additionally, a graph demonstrating the percentage of mitochondria localized at specific distances can be produced for any given distance. Notably, the further from the center of the chip, the more pronounced the differences become.
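
      To make the distance-from-center readout concrete, the sketch below computes the radius that encloses a chosen fraction of the mitochondrial signal around a fixed center. It assumes that mito95 denotes the radius containing 95% of the signal, which this response does not spell out, and that the center is the fixed center of the Y-shaped pattern as described above. The function name, the input layout, and the toy data are hypothetical.

```python
import math


def radial_quantile(pixels, center, fraction=0.95):
    """Radius around `center` that encloses `fraction` of the total signal.

    pixels   : iterable of (y, x, intensity) for mitochondrial pixels
    center   : (y, x) of the pattern center (assumed identical for all cells)
    fraction : 0.95 corresponds to the assumed meaning of "mito95"
    """
    cy, cx = center
    # Sort pixels by distance from the center, keeping their intensity.
    ranked = sorted(
        (math.hypot(y - cy, x - cx), w) for y, x, w in pixels if w > 0
    )
    total = sum(w for _, w in ranked)
    if total == 0:
        raise ValueError("no mitochondrial signal")
    running = 0.0
    for radius, weight in ranked:
        running += weight
        if running >= fraction * total:
            return radius
    return ranked[-1][0]


# Toy example: 100 equally bright pixels on a line starting at the center.
toy = [(0, x, 1.0) for x in range(100)]
print(radial_quantile(toy, center=(0, 0)))
```

      Because the center and pattern size are fixed by the chip, radii computed this way are in the same units for every cell, which is the basis for the statement that no further normalization is required.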

      (12) The authors apply a 72 h oligomycin treatment to assess proliferation and a 16 h treatment to measure ATP levels. This discrepancy in experimental design is not justified in the manuscript. The length of treatment directly impacts the interpretation of the data in Figures 4C, 4D, and 4E, and needs to be addressed.

      Thank you for this comment. We have performed additional experiments to align these time points. In the revised manuscript, we now present proliferation and ATP production measured at the same time point (Figure 4A, B for proliferation and ATP levels).

      (13) The manuscript repeatedly suggests that MIRO1 loss causes a defect in mitochondrial ATP production, yet no direct demonstration of a bioenergetic defect is provided. The claim relies on a modest decrease in supercomplex species (of undefined composition) and a mild reduction in complex I activity that does not support a substantial OXPHOS defect. Notably, the respirometry data in Figure 5I do not align with the BN-PAGE results in Figure 6I. There is increasing evidence that respiratory chain supercomplexes do not confer a catalytic advantage. The authors should directly assess the enzymatic activities of all respiratory complexes. Reported complex I activity in MIRO1-/- cells appears rotenone-like (virtually zero, figure 3K) or ~30% residual (Figure 3L), suggesting a near-total loss of functional complex I, which is not reflected in the BN-PAGE. Additionally, complex I activity has not been normalized to a mitochondrial reference, such as citrate synthase.

      Given that we work in primary cells and are limited by the number of cells we can generate, we concentrated on ETC1 and 5 and performed experiments in cells after expression of MIRO1 WT and MIRO1 mutants (Figure 6- Figure supplement 1). Please note that the addition of rotenone abolishes the slope of NADH consumption (Figure 6- Figure supplement 2F).

      While the ETC1 activity is measured in Figure 6K, the blue native gel shown in Figure 6I was performed without substrate and is thus indicative of protein complex abundance rather than complex activity.

      In additional experiments, we normalized the activity to citrate synthase as requested.

      (14) In the methods section, the complex I activity assay is incorrectly described: complex I is a NADH dehydrogenase, so the assay measures NADH oxidation, not NADPH.

      We thank the reviewer for this comment and have revised the manuscript accordingly.

      (15) The authors have not assessed mitochondrial mass, which is a critical omission. Differences in mitochondrial biogenesis or content could underlie several observed phenotypes and should be controlled for.

      A qPCR assay was used to assess mitochondrial DNA copy number in WT and Miro1-/- VSMCs. We determined the abundance of COX1 and MT-RNR1 DNA as mitochondrial gene targets and NDUFV DNA as the nuclear reference gene. While the results in Miro1-/- cells were highly variable, no statistically significant reduction of copy numbers was detected (Figure 3- Figure supplement 1B).
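
      For orientation, relative mitochondrial DNA copy number is often derived from qPCR threshold cycles by comparing a mitochondrial target with a nuclear reference gene. The sketch below assumes roughly 100% amplification efficiency and a single-copy nuclear reference present at two copies per diploid cell. The response does not state which calculation was used, so the function name, the formula choice, and the Ct values are illustrative assumptions only.

```python
def relative_mtdna_copy_number(ct_mito, ct_nuclear, nuclear_copies=2):
    """Estimate mtDNA copies per cell from qPCR threshold cycles.

    Assumes ~100% amplification efficiency (a doubling per cycle) and a
    nuclear reference gene present at `nuclear_copies` per cell.
    """
    delta_ct = ct_nuclear - ct_mito  # earlier mito Ct means more template
    return nuclear_copies * 2.0 ** delta_ct


# Hypothetical Ct values, not data from the study.
print(relative_mtdna_copy_number(ct_mito=18.0, ct_nuclear=26.0))
```

      A one-cycle shift in the mitochondrial Ct changes the estimate twofold, which is one reason highly variable replicates can mask a moderate difference in copy number.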

      (16) Complex IV signal is missing in Figure 6I. Its omission is not acknowledged or explained.

      Thank you for this comment. We believe this is due to a technical issue. Complex IV can be challenging to detect consistently, as its visibility is highly dependent on sample preparation conditions. In this specific case, we suspect that the buffer used during the isolation process may have influenced the detection of Complex IV.

      (17) Figure 6D does not appear representative of the quantifications shown. C-MYC signal is visibly reduced in the mutant, consistent with the lower levels of interactors such as Sam50 and NDUFA9. Additionally, the SDHA band is aligned at the bottom of the blot box. The list of antibodies used and their catalog numbers is missing or was not provided to the reviewers. It seems plausible that the authors used a cocktail antibody set (e.g., Abcam ab110412), which includes anti-NDUFA9. This would contradict the claim of reduced complex I and SC levels, as the steady-state levels of NDUFA9 appear unchanged.

      We acknowledge that the expression of the myc-MIRO1 mutant is lower compared to myc-MIRO1 WT or myc-MIRO1 KK. Achieving identical expression levels when overexpressing multiple MIRO1 constructs is challenging. We agree that the lower expression of this mutant contributes to a reduced pull-down. Our quantification shows a reduction in association, although it is not statistically significant.

      A list of the antibodies was provided in the Methods section.

      We would like to clarify that we did not use an antibody cocktail in our experiments.

      (18) The title of Figure 6, "Loss of Miro1 leads to dysregulation of ETC activity under growth conditions," is vague. The term "dysregulation" should be replaced with a more specific mechanistic descriptor: what specific regulatory defect is meant?

      We thank the reviewer for this suggestion and rephrased the title.

      (19) In the results text for Figure 6, the authors state: "These data demonstrate that MIRO1 associates with MIB/MICOS and that this interaction promotes the formation of mitochondrial super complexes and the activity of ETC complex I." This conclusion is speculative and not mechanistically supported by the data presented.

      We appreciate the reviewer's feedback. We have revised the text to clarify the relationship between MIRO1, MIB/MICOS, supercomplex formation, and ETC activity. The updated text now states: "These data demonstrate that MIRO1 associates with MIB/MICOS. Additionally, MIRO1 promotes the formation of mitochondrial supercomplexes and enhances the activity of ETC complex I."

      (20) In Figure 7A, it is unclear what the 3x siControl/siMiro1 pairs represent: are these different cell lines or technical replicates of the same line? No loading control is shown. If changes in mitochondrial protein abundance are being evaluated, using COX4 as a loading control is inappropriate. The uneven COX4 signal across samples further complicates interpretation.

      Please note that we used primary cells, not cell lines. The three siControl/siMiro1 pairs represent independent cell isolations, each transfected with either siControl or siMIRO1 mRNA. While the possibility of a difference in mitochondrial mass is an interesting question, the primary objective of this experiment is to demonstrate that the technique effectively results in the knockdown of Miro1, which is exclusively localized to mitochondria and not present in the cytosol. As such, we believe that Cox4 serves as a reasonable loading control. Although Miro1 knockdown may lead to a reduction in mitochondrial mass, the focus of this experiment is not to assess mitochondrial mass but to confirm the reduction in Miro1 protein levels on mitochondria. We also performed anti-VDAC immunoblots on the same membranes as an alternative loading control (Author response image 1).

      Author response image 1.

      (21) Figure 7G is difficult to interpret. Why did the authors choose to use a sensor-based method instead of the chemiluminescent assay to measure ATP in these samples?

      Both methods were employed to assess ATP levels in human samples. ATP measurements obtained with the luminescent assay are provided.

    1. eLife Assessment

      This manuscript provides useful insights into how the brain can simultaneously represent events and the times when they occurred. The results include a comparison between two different basis functions for temporal selectivity and how these generate different predictions for the dynamics of neural populations. The conclusions are partly incomplete because of questions such as the impact of the linear separability assumption and whether joint encodings of event type and time can be made without it.

    2. Joint Public Review:

      Quite obviously, the brain encodes "time", as we are able to tell if something happened before or after something else. How this is done, however, remains essentially not understood. In the context of Working Memory tasks, many experiments have shown that the neural activity during the retention period "encodes" time, besides the stimulus to be remembered; that is, the time elapsed from stimulus presentation can be reliably inferred from the recordings, even if time per se is not important for the task. This implies 'mixed selectivity', in the weak sense of neural activity varying with both stimulus identity and time elapsed (since presentation).

      In this paper, the authors investigate the implications of a specific form of such mixed selectivity, that is, conjunctive coding of what (stimulus) and when (time) at the single-neuron level, on the resulting dynamics of the population activity when 'viewed' through linear dimensionality-reduction techniques, essentially Principal Component Analysis (PCA). The theoretical/modeling results presented provide a useful guide to the interpretation of the experimental results; in particular, with respect to what can, or cannot, be rightfully inferred from those experimental results (using PCA-like techniques). The results are essentially theoretical in nature; there are, however, some conclusions that require a more precise justification, in my opinion. More generally, as the authors themselves discuss in the paper, it is not clear how to generalize this coding scheme to more complicated, but behaviorally and cognitively relevant, situations, such as multi-item WM or WM for sequences.

      (1) It is unclear to me how the conjunctive code that the authors use (i.e., Equation (3)) is constrained by the theoretical desiderata (i.e., compositionality) they list, or whether it is simply an ansatz, partly motivated by theoretical considerations and experimental observations.

      The "what" part: What the authors mean by "relationships" between stimuli is never clearly defined. From their argument (and from Figure 1b), it would seem that what they mean is "angles" between population vectors for all pairs of stimuli. If this is so, then the effect of the passing time can only amount to a uniform rescaling of the components of the population vector (i.e., it must be a similarity transformation; rotations are excluded, if the linear-decoder vectors are to be time-independent); the scaling factor, then, must be a strictly monotonic function of time (increasing or decreasing), if one is to decode time. In other words, the "when" receptive fields must be the same for all neurons.

      The "when" part: The condition, \tau_3=\tau_1+\tau_2, does not appear to be used at all. In fact, it is unclear (to me at least) whether the model, as it is formulated, is able to represent time intervals between stimuli.
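
      The argument in the "what" part can be written out in a few lines. The notation is an assumption introduced here for illustration and is not taken from the manuscript: \mathbf{f}(s) is the population vector evoked by stimulus s, and g(t) > 0 is a temporal factor shared by all neurons. Under this assumption, the angle between the population vectors of any two stimuli does not depend on time, which is the sufficiency direction of the claim:

```latex
\mathbf{r}(s,t) = g(t)\,\mathbf{f}(s)
\quad\Longrightarrow\quad
\cos\theta_{s s'}(t)
= \frac{\mathbf{r}(s,t)\cdot\mathbf{r}(s',t)}
       {\lVert\mathbf{r}(s,t)\rVert\,\lVert\mathbf{r}(s',t)\rVert}
= \frac{g(t)^{2}\,\mathbf{f}(s)\cdot\mathbf{f}(s')}
       {g(t)^{2}\,\lVert\mathbf{f}(s)\rVert\,\lVert\mathbf{f}(s')\rVert}
= \frac{\mathbf{f}(s)\cdot\mathbf{f}(s')}
       {\lVert\mathbf{f}(s)\rVert\,\lVert\mathbf{f}(s')\rVert}.
```

      In the same sketch, \lVert\mathbf{r}(s,t)\rVert = g(t)\,\lVert\mathbf{f}(s)\rVert, so for a known stimulus the elapsed time can be read out from the norm only if g is strictly monotonic.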

      (2) For the specific case considered, i.e., conjunctive coding, it would seem that one should be able to analytically work out the demixed PCA (see Kobak et al., 2016). More generally, it seems interesting to compare the results of the PCA and the demixed PCA in this specific case, even just using synthetic data.

      (3) In the Section "Dimensionality of neural trajectories...", there is some claim about how the dimensionality of the population activity goes up with the observation window T, backed up by numerical results that somehow mimic the results of Cueva et al. (2020) on experimental data. Is this a result that can be formally derived? Related to this point, it would be useful to provide a little more justification for Equation (17). Naively, one would think that the correlation matrix of the temporal component is always full-rank nominally, but that one can get excellent low-rank approximations (depending on T, following your argument).
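
      One way to probe the question in point (3) numerically is to compute an effective dimensionality of the temporal covariance for several window lengths T. The sketch below uses the participation ratio, tr(C)^2 / tr(C^2), as that measure and exponentially decaying temporal profiles with log-spaced time constants as stand-in receptive fields. Both choices are assumptions made for illustration; they are not necessarily the measure or the basis functions used in the manuscript or in Cueva et al. (2020).

```python
import math


def participation_ratio(cov):
    """Effective dimensionality tr(C)^2 / tr(C^2) of a symmetric matrix."""
    trace = sum(cov[i][i] for i in range(len(cov)))
    # For symmetric C, tr(C^2) equals the sum of squared entries.
    trace_sq = sum(v * v for row in cov for v in row)
    return trace * trace / trace_sq


def temporal_covariance(profiles, window, n_samples=200):
    """Neuron-by-neuron covariance of the profiles sampled on [0, window]."""
    times = [window * (k + 0.5) / n_samples for k in range(n_samples)]
    rates = [[f(t) for t in times] for f in profiles]
    means = [sum(r) / n_samples for r in rates]
    centered = [[v - m for v in r] for r, m in zip(rates, means)]
    n = len(profiles)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / n_samples
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical temporal receptive fields: exponential decays, log-spaced taus.
taus = [0.1 * 2 ** k for k in range(8)]
profiles = [lambda t, tau=tau: math.exp(-t / tau) for tau in taus]

for window in (0.5, 2.0, 8.0):
    pr = participation_ratio(temporal_covariance(profiles, window))
    print(f"T = {window:4.1f}  participation ratio = {pr:.2f}")
```

      The participation ratio equals 1 for a rank-one covariance and N for N equal-variance, uncorrelated units, so it captures the distinction the reviewer draws between nominal full rank and a good low-rank approximation.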

    1. eLife Assessment

      The authors provide a scholarly review of intracranial research into the neural correlates of consciousness (NCCs). To our knowledge, this is the first such review, and it therefore may become a must-read for anyone working in the field of consciousness research. It is not so persuasive that intracranial recordings are better suited to identifying pure NCCs than other methods, which appears a problem instead solved through novel paradigms and better-developed theories - but this no doubt reflects an in-depth, timely, and insightful contribution to the literature.

    2. Reviewer #1 (Public review):

      Summary

      In this review paper, the authors describe the concept of neural correlates of consciousness (NCC) and explain how noninvasive neuroimaging methods fall short of being able to properly characterise an unconfounded NCC. They argue that intracranial research is a means to address this gap and provide a review of many intracranial neuroimaging studies that have sought to answer questions regarding the neural basis of perceptual consciousness.

      Strengths

      The authors have provided an in-depth, timely, and scholarly contribution to the study of NCCs. First and foremost, the review surveys a vast array of literature. The authors synthesise findings such that a coherent narrative of what invasive electrophysiology studies have revealed about the neural basis of consciousness can be easily grasped by the reader. The review is also, to the best of my knowledge, the first review to specifically target intracranial approaches to consciousness and to describe their results in a single article. This is a credit to the authors: as it becomes ever harder to apply strict tests to theories of consciousness using methods such as fMRI and M/EEG, it is important to have informative resources describing the results of human intracranial research so that theorists will have to constrain their theories further in accordance with such data. As far as the authors were aiming to provide a complete and coherent overview of intracranial approaches to the study of NCCs, I believe they have achieved their aim.

      Weaknesses

      Overall, I feel positive about this paper. However, there are a couple of aspects to the manuscript that I think could be improved.

      (1) Distinguishing NCCs from their prerequisites or consequences

      This section in the introduction was particularly confusing to me. Namely, in this section, the authors' aim is to explain how intracranial recordings can help distinguish 'pure' NCCs from their antecedents and consequences. However, the authors almost exclusively describe different tasks (e.g., no-report tasks) that have been used to help solve this problem, rather than elaborating on how intracranial recordings may resolve this issue. The authors claim that no-report designs rely on null findings, and invasive recordings can be more sensitive to smaller effects, which can help in such cases. However, this motivation pertains to the previous sub-section (limits of noninvasive methods), since it is primarily concerned with the lack of temporal and spatial resolution of fMRI and M/EEG. It is not, in and of itself, a means to distinguish NCCs from their confounds.

      As such, in its current formulation, I do not find the argument that intracranial recordings are better suited to identifying pure NCCs (i.e. separating them from pre- or post-processing) convincing. To me, this is a problem solved through novel paradigms and better-developed theories. As it stands, the paper justifies my position by highlighting task developments that help to distinguish NCCs from prerequisites and consequences, rather than giving a novel argument as to why intracranial recordings outperform noninvasive methods beyond the reasons they explained in the previous section. Again, this position is justified when, from lines 505-506, the authors describe how none of the reported single-cell studies were able to dissociate NCCs from post-perceptual processing. As such, it seems as if, even with intracranial recording, NCCs and their confounds cannot be disentangled without appropriate tasks.

      The section 'Towards Better Behavioural Paradigms' is a clear attempt to address these issues and, as such, I am sure the authors share the same concerns as I am raising. Still, I remain unconvinced that the distinguishing of NCCs from pre-/post- processing is a fair motivation for using intracranial over noninvasive measures.

      (2) Drawing misleading conclusions from certain studies

      There are passages of the manuscript where the authors draw conclusions from studies that are not necessarily warranted by the studies they cite. For instance:

      Lines 265 - 271: "The results of these two studies revealed a complex pattern: on the one hand, HGA in the lateral occipitotemporal cortex and the ventral visual cortex correlated with stimulus strength. On the other hand, it also correlated with another factor that does not appear to play a role in visibility (repetition suppression), and did not correlate with a non-sensory factor that affects visibility reports (prior exposure). These results suggest that activity in occipitotemporal cortex regions reflecting higher-order visual processing may be a precursor to the NCC but not an NCC proper."

      It's possible to imagine a theory that would predict HGA could correlate with stimulus strength and repetition suppression, or that it would not correlate with prior exposure (e.g. prior exposure could impact response bias without affecting subjective visibility itself). The authors describe this exact ambiguity in interpretation later in the article (line 664), but in its current form, at least in line 270 (when the study is most extensively discussed), the manuscript heavily implies that HGA is not an NCC proper. This generates a false impression that intracranial recordings have conclusively determined that occipitotemporal HGA is not a pure NCC, which is certainly a premature conclusion.

      Line 243: "Altogether, these early human intracranial studies indicate that early-latency visual processing steps, reflected in broadband and low gamma activity, occur irrespective of whether a stimulus is consciously perceived or not. They also identified a candidate NCC: later (>200 ms) activity in the occipitotemporal region responsible for higher-order visual processing."

      The authors claim in this section that later (>200 ms) activity in occipitotemporal regions may be a candidate for an NCC. However, the Fisch et al. (2009) study they describe in support of this conclusion found that early (~150 ms) activity could dissociate conscious and unconscious processing. This would suggest that it is early processing that lays claim to perceptual consciousness. The authors explicitly describe the Fisch et al. results as showing evidence for early markers of consciousness (line 240: '...exhibited an early...response following recognized vs unrecognised stimuli'). Yet only a few lines later they use this to support the conclusion that a candidate NCC is 'later (>200 ms) activity in the occipitotemporal region' (line 245). As such, I am not sure what conclusion the authors want me to make from these studies.

      This problem is repeated in lines 386-387: "Altogether, studies that investigated the cortical correlates of visual consciousness point to a role of neural responses starting ~250 ms after stimulus onset in the non-primary visual cortex and prefrontal cortex."

      This seems to be directly in conflict with the Fisch et al results, which show that correlates of consciousness can begin ~100ms earlier than the authors state in this passage.

      (3) Justifying single-neuron cortical correlates of consciousness

      The purpose of the present manuscript is to highlight why and how intracortical measures of neural activity can help reveal the neural correlates of perceptual consciousness. As such, in the section 'Single-neuron cortical correlates of perceptual consciousness', I think the paper is lacking an argument as to why single-neuron research is useful when searching for the NCC. Most theories of consciousness are based around circuit or system-level analyses (e.g., global ignition, recurrent feedback, prefrontal indexing, etc.) and usually do not make predictions about single cells. Without any elaboration or argument as to why single-cell research is necessary for a science of consciousness, the research described in this section, although excellent and valuable in its own right, seems out of place in the broader discussion of NCCs. A particularly strong interpretation here could be that intracranial recordings mislead researchers into studying single cells simply because it is the finest level of analysis, rather than because it offers helpful insight into the NCCs.

      (4) No mention of combined fMRI-EEG research

      A minor point, but I was surprised that the authors did not mention any combined fMRI-EEG research when they were discussing the limits of noninvasive recordings. Intracortical recordings are one way to surpass the spatial and temporal resolution limits of M/EEG and fMRI respectively, but studies that combine fMRI and EEG are also an alternative means to solve this problem: by combining the spatial resolution of fMRI with the temporal resolution of EEG, researchers can - in theory - compare when and where certain activity patterns (be they univariate ERPs or multivariate patterns) arise. The authors do cite one paper (Dellert et al., 2021 JNeuro) that used this kind of setup, but they discuss it only with respect to the task and ignore the recording method. The argument for using intracranial recordings is weaker for not mentioning a viable, noninvasive alternative that resolves the same issues.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors review the study of the neural correlates of consciousness (NCCs). They discuss several of the difficulties that researchers must face when studying NCCs, and argue that several of these difficulties can be alleviated by using intracranial recordings in humans.

      They describe what constitutes an NCC, and the difficulty of distinguishing an NCC proper from the prerequisites and consequences of conscious processing.

      They also describe the two main types of experimental designs used to study NCCs. These are the contrastive approach (with its report and non-report variants), and the supraliminal approach, each with its own merits and pitfalls.

      They discuss the limitations of non-invasive methods, such as fMRI, EEG and MEG, as well as the limitations of the use of invasive recordings in non-human animals.

      After setting the stage in this way, the authors provide an extensive review of the knowledge acquired by using invasive recordings in humans. This included population-level measurements in vision and in other sensory modalities, as well as single-neuron level studies. The authors also discuss studies of subcortical NCCs.

      The second half of this work discusses the theoretical insights gained through the use of intracranial recordings, as well as their limitations, and a perspective for future work.

      Strengths:

      This work offers an impressive review, which will serve as a useful reference document, both for newcomers to the study of NCC and for experienced researchers. The inclusion of non-visual and subcortical NCCs is of particular merit, as these have been understudied.

      Besides serving as a review, this work includes a perspective, exploring several directions to pursue for the progress of the field.

      Weaknesses:

      The intention of the authors is to argue how some of the problems faced when studying NCCs are alleviated by the use of intracranial recordings in humans. But in some cases, the link between the problems related to the study of NCCs and the advantages of intracranial recordings over non-invasive methods is not clear.

      For example, the authors explain the difficulties in distinguishing true NCCs from their prerequisites and consequences. This constitutes a difficult conceptual problem that plagues all recording techniques. The authors don't provide a convincing explanation of how intracranial recordings offer advantages over EEG or MEG when dealing with these problems.

      For example, the authors explain how the use of non-report designs to rule out post-perceptual processing relies on null results, which, according to them, are harder to interpret given the low resolution of non-invasive methods. But the interpretation of null results is actually more complicated in the case of intracranial recordings. As the coverage achieved by the electrodes is sparse, if a null result is attested, it remains possible that a true effect was present in a nearby patch of cortex out of coverage.

      The authors argue that the spatial resolution of intracranial recordings is better than that of EEG and MEG. While this is technically true (especially compared to EEG), the true spatial scale of the NCCs is unknown. If NCCs' span is in the mm range, then the additional spatial resolution of intracranial recordings might not be an advantage.

      Another factor that should be taken into consideration when assessing the spatial resolution of intracranial recordings is that while the listening zone of individual intracranial contacts is small, coverage is sparse and defined by clinical criteria (something that the authors discuss). In practice, the activity recorded by contacts is usually attributed to anatomically defined ROIs with a scale in the cm range. Given the sparse and uneven (across regions and patients) coverage afforded by intracranial recordings, the advantage of intracranial recordings in terms of spatial resolution is overstated.

      Appraisal of whether the authors achieved their aims:

      In this work, the authors have gathered an impressive review and have discussed several important problems in the field of study of NCCs, as well as provided a perspective on how the field could move forward.

      What is less clear is how the use of intracranial recordings per se holds potential to overcome problems such as the distinction between true NCCs and the prerequisites and consequences of conscious processing.

      Discussion of the likely impact of the work on the field:

      This work has the potential of becoming a must-read for anyone working in the field of consciousness research.

    4. Reviewer #3 (Public review):

      Summary:

      This narrative review provides a clear, well-structured, and comprehensive synthesis of intracerebral recording work on the neural correlates of consciousness. It is written in an accessible manner that will be useful to a broad community of researchers, from those new to iEEG to specialists in the field.

      Strengths:

      The manuscript successfully integrates methodological and theoretical perspectives and offers a balanced overview of current, sometimes contradictory evidence. As such, the manuscript is important as it calls for a concerted and better exploration of NCCs using iEEG in the future.

      Weaknesses:

      The manuscript extensively discusses the use of "report" as a criterion for identifying conscious perception and its limitations for distinguishing between correlates of consciousness and post-conscious processes, yet the term is not defined at the outset. The authors should specify what they mean by "report" (e.g., verbal report, nonverbal self-report, or any meta-cognitive indication of experience). Importantly, this definition should be explicitly linked to the theoretical landscape: whether the authors adopt an access-consciousness perspective in which (self) reportability is central, or whether the review also aims to address phenomenal consciousness. Making this conceptual grounding explicit at the beginning will help readers interpret the empirical work surveyed throughout the review.

      In addition, the review would benefit from an earlier introduction of the distinction between states and contents of consciousness. This distinction becomes important in the later section on anaesthesia, sleep, and epileptic seizures, where the focus shifts from content-specific NCCs to alterations in global states. Presenting these definitions upfront and briefly explaining how states and contents interact would strengthen the coherence of the manuscript.

      Overall, this is an excellent and timely review. With clearer initial theoretical definitions of consciousness, the manuscript will offer an even stronger conceptual framework for interpreting intracerebral studies of consciousness.

    1. eLife Assessment

      This important study establishes a workflow based on environmental sampling for the discovery of bacteriophages capable of infecting antibiotic-resistant pathogens. The experimental design, analysis, and results demonstrating the effectiveness of the workflow are convincing, although a broader sampling scheme and more careful framing of the data within the current limitations of viral taxonomy could strengthen the work. This study will interest researchers working on bacterial infections, environmental microbiology, and phage-based alternatives for addressing antimicrobial resistance.

    2. Reviewer #1 (Public review):

      Summary:

      In the manuscript "Pathogen-Phage Geomapping to Overcome Resistance," Do et al. present an impressive demonstration of using geographical sampling and metagenomics to guide sample choice for enrichment in human-associated microbes and the pathogen of interest to increase the chances of success for isolating phages active against highly resistant bacterial strains. The authors document many notable successes (17!) with highly resistant bacterial isolates and share a thoughtfully structured phage discovery effort, potentially opening the door to similar geomapping efforts across the field. While the work is methodologically strong and valuable for the community, there are a few areas where additional clarification and analysis could better align the claims with the data presented.

      Strengths:

      (1) The manuscript describes a well-executed and transparent example of overcoming a major obstacle in therapeutic virus identification, providing a practical success story that will resonate with researchers in microbiology and medicine.

      (2) Many phage researchers have anecdotally experienced a similar phenomenon, that a particular wastewater treatment plant always seems to have the pathogens you need. Quantifying this with metagenomics modernizes and adds evidence to this phenomenon in a way that could help researchers reproduce this success in a methodical way.

      (3) The methodology of combining environmental sampling, viral screening, and host-range analysis is clearly articulated and reproducible, offering a valuable blueprint for others in the field.

      (4) The data are presented with appropriate analytical rigor, and the results include robust sequencing and metagenomic profiling that deepen understanding of local viral communities.

      (5) The 17 successes yielding 35 phages have a lot of phylogenetic novelty beyond what the Tailor labs have typically found with previous methods.

      (6) The work highlights a practical and innovative solution to an increasingly important clinical problem, supporting the development of personalized antiviral strategies.

      Weaknesses:

      (1) The central concept of geomapping as a broadly applicable strategy is wonderfully supported by the 17 successes documented in the paper. While this is actually, of course, a strength, the study does not include a comparative analysis across multiple sites with varying sampling outcomes for different bacterial types, which would be necessary to validate this claim more generally.

      (2) Some elements, such as beta diversity comparisons and the metagenomics analysis of viral dark matter, would benefit from additional statistical analysis and clearer context.

      (3) Claims about therapeutic cocktails would be better framed as speculative and/or moved to the discussion section.

      (4) The manuscript could be strengthened by elaborating on the scope and composition of the phage and bacterial isolate collections, which are important for interpreting the broader significance of the findings.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Do and colleagues aims to develop a workflow for isolating and identifying bacteriophages with potential applications in phage therapy against antibiotic-resistant pathogens. The workflow integrates geΦmapping as a strategy to identify potential phage sources, ΦHD as a device for phage concentration, and RΦ as a phage library constructed from the initial sampling, resulting in the discovery of 36 new phages. The paper is overall interesting, and the proposed method appears robust and effective.

      Strengths:

      The methods proposed combine state-of-the-art strategies to address the ever-increasing problem of antibiotic resistance. The methods are robust, and the controls are appropriate. The integration of environmental sampling, concentration strategies, and downstream genomic characterization is a clear strength and provides a potentially scalable framework for identifying candidate therapeutic phages. The manuscript is clearly written overall, and the results support the main conclusions.

      Weaknesses:

      While the authors acknowledge several limitations, some aspects require clearer framing or additional clarification. The proposed workflow focuses exclusively on aquatic environments as sources of phages, which may limit the diversity of hosts and phage types recoverable using this approach. Some interpretations, particularly regarding taxonomic classification and sampling saturation, would benefit from more cautious wording given current limitations in viral taxonomy and the observed data.

    1. eLife Assessment

      This important work shows that a history of cocaine self-administration disrupts the orbitofrontal cortex's ability to encode similarities between distinct sensory stimuli that possess identical task information - hidden states. The evidence supporting these conclusions is compelling, with methods and analyses spanning self-administration, a novel 'figure 8' sequential odor task, recordings from 3,881 single units, and sophisticated firing analyses revealing complex orbitofrontal representations of task structure. These results will be of broad interest to psychologists, neuroscientists, and clinicians.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors trained rats on a "figure 8" go/no-go odor discrimination task. Six odor cues (3 rewarded and 3 non-rewarded) were presented in a fixed temporal order and arranged into two alternating sequences that partially overlap (Sequence #1: 5⁺-0⁻-1⁻-2⁺; Sequence #2: 3⁺-0⁻-1⁻-4⁺), forming an abstract figure-8 structure of looping odor cues.

      This task is particularly well-suited for probing representations of hidden states, defined here as the animal's position within the task structure beyond superficial sensory features. Although the task can be solved without explicit sequence tracking, it affords the opportunity to generalize across functionally equivalent trials (or "positions") in different sequences, allowing the authors to examine how OFC representations collapse across latent task structure.

      Rats were first trained to criterion on the task and then underwent 15 days of self-administration of either intravenous cocaine (3 h/day) or sucrose. Following self-administration, electrodes were implanted in lateral OFC, and single-unit activity was recorded while rats performed the figure-8 task.

      Across a series of complementary analyses, the authors report several notable findings. In control animals, lOFC neurons exhibit representational compression across corresponding positions in the two sequences. This compression is observed not only in trials/positions involving overlapping odors (e.g., Position 3 = odor 1 in sequence 1 vs sequence 2), but also in trials/positions involving distinct, sequence-specific odors (e.g., Position 4: odor 2 vs odor 4), indicating generalization across functionally equivalent task states. Ensemble decoding confirms that sequence identity is weakly decodable at these positions, consistent with the idea that OFC representations collapse incidental differences in sensory information into a common latent or hidden state representation. In contrast, cocaine-experienced rats show persistently stronger differentiation between sequences, including at overlapping odor positions.

      Strengths:

      Elegant behavioral design that affords the detection of hidden-state representations.

      Sophisticated and complementary analytical approaches (single-unit activity, population decoding, and tensor component analysis).

      Weaknesses:

      The number of subjects is small, so idiosyncratic, animal-specific effects cannot be fully ruled out.

      Comments

      (1) Emergence of sequence-dependent OFC representations across learning.

      A conceptual point that would benefit from further discussion concerns the emergence of sequence-dependent OFC activity at overlapping positions (e.g., position P3, odor 1). This implies knowledge of the broader task structure. Such representations are presumably absent early in learning, before rats have learned the sequence structure. While recordings were conducted only after rats were well trained, it would be informative if the authors could comment on how they envision these representations developing over learning. For example, does sequence differentiation initially emerge as animals learn the overall task structure, followed by progressive compression once animals learn that certain states are functionally equivalent? Clarifying this learning-stage interpretation would strengthen the theoretical framing of the results.

      (2) Reference to the 24-odor position task

      The reference to the previously published 24-odor position task is not well integrated into the current manuscript. Given that this task has already been published and is not central to the main analyses presented here, the authors may wish to a) better motivate its relevance to the current study or b) consider removing this supplemental figure entirely to maintain focus.

      (3) Missing behavioral comparison

      Line 117: the authors state that absolute differences between sequences differ between cocaine and sucrose groups across all three behavioral measures. However, Figure 1 includes only two corresponding comparisons (Fig. 1I-J). Please add the third measure (% correct) to Figure 1, and arrange these panels in an order consistent with Figure 1F-H (% correct, reaction time, poke latency).

      (4) Description of the TCA component

      Line 220: the authors write that the first TCA component exhibits low amplitude at positions P1 and P4 and high amplitude at positions P2 and P3. However, Figure 3 appears to show the opposite pattern (higher magnitude at P1 and P4 and lower magnitude at P2 and P3). Please check and clarify this apparent discrepancy. Alternatively, a clearer explanation of how to interpret the temporal dynamics and scaling of this component in the figure would help readers correctly understand the result.

      (5) Sucrose control

      Sucrose self-administration is a reasonable control for instrumental experience and reward exposure, but it means that this group also acquired an additional task involving the same reinforcer. This experience may itself influence OFC representations and could contribute to the generalization observed in control animals. A brief discussion of this possibility would help contextualize the interpretation of cocaine-related effects.

      (6) Acknowledge low N

      The number of rats per group is relatively low. Although the effects appear consistent across animals within each group, this sample size does not fully rule out idiosyncratic, animal-specific effects. This limitation should be explicitly acknowledged in the manuscript.

      (7) Figure 3E-F: The task positions here are ordered differently (P1, P4, P2, P3) than elsewhere in the paper. Please reorder them to match the rest of the paper.

    3. Reviewer #2 (Public review):

      In the current study, the authors use an odor-guided sequence learning task described as a "figure 8" task to probe neuronal differences in latent state encoding within the orbitofrontal cortex after cocaine (n = 3) vs sucrose (n = 3) self-administration. The task uses six unique odors which are divided into two sequences that run in series. For both sequences, the 2nd and 3rd odors are the same and predict reward is not available at the reward port. The 1st and 4th odors are unique, and are followed by reward. Animals are well-trained before undergoing electrode implant and catheterization, and then retrained for two weeks prior to recording. The hypothesis under test is that cocaine-experienced animals will be less able to use the latent task structure to perform the task, and instead encode information about each unique sequence that is largely irrelevant. Behaviorally, both cocaine and sucrose-experienced rats show high levels of accuracy on task, with some group differences noted. When comparing reaction times and poke latencies between sequences, more variability was observed in the cocaine-treated group, implying animals treated these sequences somewhat differently. Analyses done at the single unit and ensemble level suggest that cocaine self-administration had increased the encoding of sequence-specific information, but decreased generalization across sequences. For example, the ability to decode odor position and sequence from neuronal firing in cocaine-treated animals was greater than controls. This pattern resembles that observed within the OFC of animals that had fewer training sessions. The authors then conducted tensor component analysis (TCA) to enable a more "hypothesis agnostic" evaluation of their data.

      Overall, the paper is well written and the authors do a good job of explaining quite complicated analyses so that the reader can follow their reasoning. I have the following comments.

      While well-written, the introduction mainly summarises the experimental design and results, rather than providing a summary of relevant literature that informed the experimental design. More details regarding the published effects of cocaine self-administration on OFC firing, and on tests of behavioral flexibility across species, would ground the paper more thoroughly in the literature and explain the need for the current experiment.

      For Fig 1F, it is hard to see the magnitude of the group difference with the graph showing 0-100%. Can the y axis be adjusted to make this difference more obvious? It looks like the cocaine-treated animals were more accurate at P3. Is that right?

      The concluding section is quite brief. The authors suggest that the failure to generalize across sequences observed in the current study could explain why people who are addicted to cocaine do not use information learned e.g. in classrooms or treatment programs to curtail their drug use. They do not acknowledge the limitations of their study e.g. use of male rats exclusively, or discuss alternative explanations of their data.

      Is it a problem that neuronal encoding of the "positions" i.e. the specific odors was at or near chance throughout in controls? Could they be using a simpler strategy based on the fact that two successive trials are rewarded, then two successive trials are not rewarded, such that the odors are irrelevant?

      When looking at the RT and poke latency graphs, it seems the cocaine-experienced rats were faster to respond to rewarded odors, and also faster to poke after P3. Does this mean they were more motivated by the reward?

    1. eLife Assessment

      This important study provides the first direct neuroimaging evidence for the integration-segregation theory of exogenous attention underlying inhibition of return, using an optimized IOR-Stroop fMRI paradigm to dissociate integration and segregation processes and to demonstrate that attentional orienting modulates semantic- and response-level conflict processing. Although the empirical evidence is compelling, clearer justification of the experimental logic, more cautious framing of behavioral and regional interpretations, and greater transparency in reporting and presentation are needed to strengthen the conclusions. The work will be of broad interest to researchers investigating visual attention, perception, cognitive control, and conflict processing.

    2. Reviewer #1 (Public review):

      Summary:

      This study makes a significant and timely contribution to the field of attention research. By providing the first direct neuroimaging evidence for the integration-segregation theory of exogenous attention, it fills a critical gap in our understanding of the neural mechanisms underlying inhibition of return (IOR). The authors employ a carefully optimized cue-target paradigm combined with fMRI to elegantly dissociate the neural substrates of cue-target integration from those of segregation, thereby offering compelling support for the integration-segregation account. Beyond validating a key theoretical hypothesis, the study also uncovers an interaction between spatial orienting and cognitive conflict processing, suggesting that exogenous attention modulates conflict processing at both semantic and response levels. This finding shed new light on the neural mechanisms that connect exogenous attentional orienting with cognitive control.

      Strengths:

      The experimental design is rigorous, the analyses are thorough, and the interpretation is well grounded in the literature. The manuscript is clearly written, logically structured, and addresses a theoretically important question. Overall, this is an excellent, high-impact study that advances both theoretical and neural models of attention.

      Weaknesses:

      While this study addresses an important theoretical question and presents compelling neuroimaging findings, a few additional details would help improve clarity and interpretation. Specifically, more information could be provided regarding the experimental conditions (SI and RI), the justification for the criteria used for excluding behavioral trials, and how the null condition was incorporated into the analyses. In addition, given the non-significant interaction effect in the behavioral results, the claim that the behavioral data "clearly isolated" distinct semantic and response conflict effects should be phrased more cautiously.

    3. Reviewer #2 (Public review):

      Summary:

      This study provides evidence for the integration-segregation theory of an attentional effect, widely cited as inhibition of return (IOR), from a neuroimaging perspective, and explores neural interactions between IOR and cognitive conflict, showing that conflict processing is potentially modulated by attentional orienting.

      Strengths:

      The integration-segregation theory was examined in a sophisticated experimental task that also accounted for cognitive conflict processing, which is phenomenologically related to IOR but "non-spatial" by nature. This study was carefully designed and executed. The behavioral and neuroimaging data were carefully analyzed and largely well presented.

      Weaknesses:

      The rationale for the experimental design was not clearly explained in the manuscript; more specifically, why the current ER-fMRI study would disentangle integration and segregation processes was not explained. The introduction of "cognitive conflict" into the present study was not well reasoned for a non-expert reader to follow.

      The presentation of the results can be further improved, especially the neuroimaging results. For instance, Figure 4 is challenging to interpret. If "deactivation" (or a reduction in activation) is regarded as a neural signature of IOR, this should be clearly stated in the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      This study aims to provide the first direct neuroimaging evidence relevant to the integration-segregation theory of exogenous attention - a framework that has shaped behavioral research for more than two decades but has lacked clear neural validation. By combining an inhibition-of-return (IOR) paradigm with a modified Stroop task in an optimized event-related fMRI design, the authors examine how attentional integration and segregation processes are implemented at the neural level and how these processes interact with semantic and response conflicts. The central goal is to map the distinct neural substrates associated with integration and segregation and to clarify how IOR influences conflict processing in the brain.

      Strengths:

      The study is well-motivated, addressing a theoretically important gap in the attention literature by directly testing a long-standing behavioral framework with neuroimaging methods. The experimental approach is creative: integrating IOR with a Stroop manipulation expands the theoretical relevance of the paradigm, and the use of a genetic-algorithm-optimized fMRI design ensures high efficiency. Methodologically, the study is sound, with rigorous preprocessing, appropriate modeling, and analyses that converge across multiple contrasts. The results are theoretically coherent, demonstrating plausible dissociations between integration-related activity in the fronto-parietal attention network (FEF, IPS, TPJ, dACC) and segregation-related activity in medial temporal regions (PHG, STG). The findings advance the field by supplying much-needed neural evidence for the integration-segregation framework and by clarifying how IOR modulates conflict processing.

      Weaknesses:

      Some interpretive aspects would benefit from clarification, particularly regarding the dual roles ascribed to dACC activation and the circumstances under which PHG and STG are treated as a single versus separate functional clusters. Reporting conventions are occasionally inconsistent (e.g., statistical formatting, abbreviation definitions), which may hinder readability. More detailed reporting of sample characteristics, exclusion criteria, and data-quality metrics, especially regarding the global-variance threshold, would improve transparency and reproducibility. Finally, some limitations of the study, including potential constraints on generalization, are not explicitly acknowledged and should be articulated to provide a more balanced interpretation.

    1. eLife Assessment

      This important work contributes a transcriptional dataset that identifies potential genes involved in axon initial growth and axon regrowth, followed by a characterization of axon phenotypes after knockdown of a subset of these genes. Focused experiments on a single gene, Pmvk, highlight the potential role of the mevalonate pathway in axon regrowth. The methods are convincing, though partially incomplete. The data establish a basis for further studies on axonal development and will be of interest to both developmental neurobiologists and those seeking to develop molecular tools to target, monitor, and manipulate axon morphology and function.

    2. Reviewer #1 (Public review):

      Summary:

      Fahdan et al. present a study investigating the molecular programs underlying axon initial growth and regrowth in Drosophila mushroom body (MB) neurons. The authors leverage the fact that different Kenyon cell (KC) subtypes undergo distinct axonal events on the same developmental timeline: γ KCs prune and then regrow their axons during early pupation, whereas α/β KCs extend their axons for the first time during the same pupal period. Using bulk Smart-seq2 RNA sequencing across six developmental time points, the authors identify genes enriched during γ KC regrowth and α/β KC initial outgrowth, and subsequently perform an RNAi screen to determine which candidates are functionally required for these processes.

      Among these, they focus on Pmvk, a key enzyme in the mevalonate pathway. Both RNAi knockdown and a CRISPR-generated mutant produce strong γ KC regrowth defects. Knockdown of other mevalonate pathway components (Hmgcr, Mvk) partially recapitulates this phenotype. The authors propose that Pmvk promotes axonal regrowth through effects on the TOR pathway.

      Overall, this work identifies new molecular players in developmental axon remodeling and provides intriguing evidence connecting Pmvk to γ KC regrowth.

      While the Pmvk knockdown and loss-of-function data are compelling, the evidence that the mevalonate pathway broadly regulates γ KC axon regrowth is less clear. RNAi knockdown of enzymes upstream of Pmvk (Hmgcr, Mvk) produces only mild phenotypes, and knockdown of several downstream enzymes produces no phenotype. The authors attribute this discrepancy to the possibility of weak RNAi constructs, which is plausible but not fully demonstrated. It would be helpful for the authors to discuss alternative explanations, including non-canonical roles for Pmvk that may not require the full pathway, and clarify the extent to which the current data support the conclusion that the mevalonate pathway, rather than Pmvk specifically, is a core regulator of regrowth.

      It is not clear from the Methods whether γ KCs and α/β KCs were sorted from the same brains using orthogonal binary expression systems (e.g., Gal4 > reporter 1 and LexA > reporter 2), or isolated separately from different fly lines. If the latter, differences in genetic background, staging, or batch effects could influence transcriptional comparisons. This should be explicitly clarified in the Methods, and any associated limitations discussed in the manuscript.

      The authors have made important findings that contribute to our understanding of axon growth and regrowth. As written, some major claims are only partially supported, but these issues can be addressed through reframing and clarification. In particular, the manuscript would benefit from (1) a more cautious interpretation of the mevalonate pathway's role, potentially considering Pmvk non-canonical functions, and (2) addressing methodological ambiguities in the transcriptomic analysis.

    3. Reviewer #2 (Public review):

      Fahdan et al. set out to build upon their previous work outlining the genes involved in axon growth, targeting two axon growth states: initial growth and regrowth. They outline a debate in the field that axon regrowth (for instance, after injury or in the peripheral nervous system) is different from initial axon growth, for which the authors have previously demonstrated distinct mechanisms. The authors set out to directly compare the transcriptomes of initial axon growth and regrowth, specifically within the same neuronal environment and developmental time point. To this end, the authors used the well-characterized genetic tools available in Drosophila melanogaster (the fruit fly) to build a valuable dataset of genes involved at different time points in axon growth (alpha/beta Mushroom Body Kenyon cells) and regrowth (gamma Mushroom Body Kenyon cells). The authors then focus on genes that are upregulated during both initial axon growth and axon regrowth. Then, using this subset of genes, they screen for axonal growth and regrowth deficits by knocking down 300 of these genes. Twelve genes are found to be phenotypically involved in both axon growth and regrowth based on RNAi gene-targeted knockdown in the Mushroom Body. Of these 12 genes, the authors focus on one gene, Pmvk, which is part of the mevalonate pathway. They then highlight other genes in this pathway, but these genes primarily affect axon regrowth, not initial axon growth, implicating metabolic pathways in axon regrowth. This comprehensive RNA-seq dataset will be a valuable resource for the field of axon growth and regrowth, as well as for other researchers studying the Mushroom Body.

      Strengths:

      This paper contains many strengths, including the in-depth sequencing of overlapping developmental time points during the alpha/beta KCs' initial axon growth and gamma KCs' regrowth. This produces a rich dataset of differentially expressed genes across different time points in either cell population during development. In addition, the authors characterized expression patterns at developmental time points for 30 Gal4 lines previously identified as alpha/beta KC-expressing. This is very helpful for Drosophila Mushroom Body researchers because the authors not only characterized alpha/beta expression but also alpha'/beta' expression, gamma expression, and non-MB expression. The authors comprehensively walked through identifying differentially expressed genes during alpha/beta axon growth, identifying a subset of overlapping upregulated genes between cell types, then systematically characterized whether knockdown of a subset of these genes produced an axonal growth defect, and finally selected 1 of 3 cell-autonomous genes important for gamma KC regrowth to study further.

      The authors utilized the developing Mushroom Body in Drosophila melanogaster, which happens to have new neurons developing axons and neurons that have undergone pruning and are regrowing axons at the same developmental time. They are also in the same part of the brain (the Mushroom Body) and, in theory, since the authors implicate a metabolic pathway, they will have similar metabolic growth conditions.

      Identifying Pmvk and two other components of the mevalonate pathway in axon regrowth opens up novel avenues for future studies on the role this metabolic pathway may have in axon growth. The authors of this paper are also very upfront about their negative results, allowing researchers to avoid running redundant experiments and truly build on this work.

      Weaknesses:

      While the dataset produced in this study is a strength, certain aspects make it more challenging to interpret. For instance, the authors state that roughly equal numbers of males and females are used for sequencing, and this vagueness, coupled with only taking a subset of the GFP-labeled neurons during FACS sorting, can introduce confounds into the dataset. This may hold true in imaging studies as well, in which males and females were used interchangeably.

      Additionally, a rationale is needed to explain why random numbers of 1-7 were assigned to zero-expressing genes in the DESeq analysis. This does not conform to the way this analysis is normally performed. It can alter how genes across the dataset are normalized and requires further explanation.

      The display and discussion of the dataset do not always align with the authors' stated goal of having a comprehensive description of the genes that dynamically change during axon growth and regrowth. Displaying more information about genes differentially expressed in the alpha/beta KCs, or any information about the genes differentially expressed in the gamma KCs when using the same criteria as the alpha/beta KCs, or the 676 overlapping upregulated genes, would significantly add to this paper. The authors previously performed a similar study across developmental time points for gamma KCs, and it is not clear whether any overlapping genes were identified. Also, more information on the genes contributing to PC1 and PC3 when showing the PCA would be helpful. Within the text, there is a discussion of why certain genes or gene groups were omitted or selected, such as clusters 1 and 2, and then some of their subgroups based on expected genes. There is also some discussion of omitted gene groups, but this is not complete across the different clusters, nor is there a discussion of why PC2 was not selected or of which genes might exhibit greater variability than cell type. The authors would make a stronger case for the genes they pursued if they showed that groups of genes already known to be involved in axon growth clustered within the selected groups. Since we do not see the gene lists, this is unclear and adds to the sometimes arbitrary nature of the authors' choices about what to pursue in this paper. A larger set of descriptors, such as gene lists and Gene Ontology analysis beyond what is shown, would be very helpful in putting the results in context and determining whether this is a resource beneficial to others.

      While the Pmvk story is interesting, the authors appear to make some arbitrary decisions in what is shown or pursued in this paper. Visually, CadN and Twr appear to be more severe axon regrowth phenotypes, where the peduncle appears intact, and axons are not regrowing in Figures 3N and O. In contrast, Pmvk visually appears to lose neurons in Figure 3M. With a change of the Gal4 driver (Figure 4), Pmvk now produces a gamma axon regrowth phenotype similar to CadN and Twr in Figure 3. This difference in the use of Gal4 for characterizing axonal phenotypes is not discussed, making some interpretations more challenging due to differences in Gal4 expression strength. For instance, the sequencing work was done with a different Gal4 MB expressing line than the characterization of gene knockdowns. Further characterization of Pmvk was performed in the same Gal4 lines as the sequencing (Figure 4), suggesting a potential difference in Gal4 strength that may play a role in their rescue experiments if they are using a slightly weaker Gal4 for gamma lobe expression. A broader discussion of this may make the selection of Pmvk less arbitrary if the phenotype is similar to those of CadN and Twr. Along the lines of the sometimes arbitrary nature of the genes chosen to pursue further, the authors state that they selected genes that showed differential expression at any time point. As they refine their list of genes to pursue further, they seem to prioritize genes that change at 18-21 APF. This appears to be the early period for axon growth in alpha/beta KCs and gamma KCs, based on Figure 1. A stronger case might be made at longer time points when the axon is growing or regrowing.

      The paper would benefit from scaling back the claim that the mevalonate pathway is involved. The authors identified only a subset of genes from the mevalonate pathway, all immediately upstream of Pmvk, with no effect on downstream genes. Along these lines, the paper would benefit from a discussion of non-canonical Pmvk signaling.

      While the ability to take neurons at the same developmental time and from the same brain region is a strength, they are still 2 different types of neurons. Although gamma neuron axon growth occurs very early in development, it would be interesting to know whether the same genes are involved in their initial growth. A caveat to the authors' conclusion is that these are 2 different cell types, and they might use different genetic programs or use overlapping ones at other times. The authors did not show that gamma KCs use these genes in their initial axon growth.

    1. eLife Assessment

      This valuable study characterises the activity of motor units from two of the three anatomical subdivisions ("heads") of the triceps muscle while mice walked on a treadmill at various speeds. Altogether, this is the most thorough characterisation of motor unit activity in walking mice to date, providing solid evidence for probabilistic recruitment of motor units that differed between the two heads.

    2. Reviewer #1 (Public review):

      Summary:

      Here, the authors have addressed the recruitment and firing patterns of motor units (MUs) from the long and lateral heads of triceps in the mouse. They used their newly developed Myomatrix arrays to record from these muscles during treadmill locomotion at different speeds, and they used template-based spike sorting (Kilosort) to extract units. Between MUs from the two heads, the authors observe differences in their firing rates, recruitment probability, phase of activation within the locomotor cycle and interspike interval patterning. Examining different walking speeds, the authors find increases in both recruitment probability and firing rates as speed increases. The authors also observed differences in the relation between recruitment and the angle of elbow extension between motor units from each head. These differences indicate meaningful variation between motor units within and across motor pools, and may reflect the somewhat distinct joint actions of the two heads of triceps.
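
      As a rough illustration of the recruitment-probability measure mentioned in this summary, the sketch below computes the fraction of strides in which a unit fires. It assumes that a unit counts as "recruited" in a stride if it fires at least one spike between consecutive stride onsets; that criterion, and the names `recruitment_probability` and `stride_onsets`, are illustrative assumptions rather than the authors' stated definition.

```python
from bisect import bisect_left


def recruitment_probability(spike_times, stride_onsets):
    """Fraction of strides containing at least one spike.

    Assumption (not from the paper): stride i spans the half-open
    interval [stride_onsets[i], stride_onsets[i + 1]).
    """
    spikes = sorted(spike_times)
    n_strides = len(stride_onsets) - 1
    if n_strides <= 0:
        raise ValueError("need at least two stride onsets")
    recruited = 0
    for start, end in zip(stride_onsets, stride_onsets[1:]):
        # Index of the first spike at or after the stride onset.
        first = bisect_left(spikes, start)
        if first < len(spikes) and spikes[first] < end:
            recruited += 1
    return recruited / n_strides
```

      Under this definition, a unit that fires in two of four strides has a recruitment probability of 0.5, regardless of how many spikes it emits in each stride.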

      Strengths:

      The extraction of MU spike timing for many individual units is an exciting new method that has great promise for exposing the fine detail in muscle activation and its control by the motor system. In particular, the methods developed by the authors for this purpose seem to be the only way to reliably resolve single MUs in the mouse, as the methods used previously in humans and in monkeys (e.g. Marshall et al. Nature Neuroscience, 2022) do not seem readily adaptable for use in rodents.

      The paper provides a number of interesting observations. There are signs of interesting differences in MU activation profiles for individual muscles here, consistent with those shown by Marshall et al. It is also nice to see fine scale differences in the activation of different muscle heads, which could relate to their partially distinct functions. The mouse offers greater opportunities for understanding the control of these distinct functions, compared to the other organisms in which functional differences between heads have previously been described.

      The Discussion is very thorough, providing a very nice recounting of a great deal of relevant previous results.

      Weaknesses:

      The findings are limited to one pair of muscle heads. While the findings are important in their own right, the lack of confirmation from analysis of other muscles acting at other joints leaves the generalization of these findings unclear.

      While differences between muscle heads with somewhat distinct functions are interesting and relevant to joint control, differences between MUs for individual muscles, like those in Marshall et al., are more striking because they cannot be attributed potentially to differences in each head's function. The present manuscript does show some signs of differences for MUs within individual heads (e.g. Figure 2C), but the manuscript falls short of providing a statistical basis for the existence of distinct subpopulations.

    3. Reviewer #2 (Public review):

      The present study, led by Thomas and collaborators, aims to characterise the firing activity of individual motor units in mice during locomotion. To achieve this, the team implanted small arrays of eight electrodes into two heads of the triceps and performed spike sorting using a custom implementation of Kilosort. Concurrently, they tracked the positions of the shoulder, elbow, and wrist using a single camera and a markerless motion capture algorithm (DeepLabCut). Repeated one-minute recordings were conducted in six mice across five speeds, ranging from 10 to 27.5 cm·s⁻¹.

      From these data, the authors demonstrate that:

      - Their recording method and adapted spike-sorting algorithm enable robust decoding of motor unit activity during rapid movements.
      - Identified motor units tend to be recruited during a subset of strides, with recruitment probability increasing with speed.
      - Motor units within individual heads of the triceps likely receive common synaptic inputs that correlate their activity, whereas motor units from different heads exhibit distinct behaviour.

      The authors conclude that these differences arise from the distinct functional roles of the muscles and the task constraints (i.e., speed).

      Strengths:

      - The novel combination of electrode arrays for recording intramuscular electromyographic signals from a larger muscle volume, paired with an advanced spike-sorting pipeline capable of identifying motor unit populations.
      - The robustness of motor unit decoding during fast movements.

      Weaknesses:

      - The data do not clearly indicate which motor units were sampled from each pool, leaving uncertainty as to whether the sample is biased towards high-threshold motor units or representative of the entire pool.
      - The results largely confirm the classic physiological framework of motor unit recruitment and rate coding, offering limited new insights into motor unit physiology.

      I would like to thank the authors for their thorough and insightful revisions. I am particularly pleased with the inclusion of the new analyses demonstrating the robustness of motor unit decoding, as well as the improved transparency regarding spike-sorting yield for each muscle and animal. Additionally, the new analyses illustrating that recruitment within muscle heads is consistent with the presence of common synaptic inputs and orderly recruitment significantly strengthen the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      Using the approach of Myomatrix recording, the authors report that 1) motor units are recruited differently in the two types of muscles, 2) individual units are probabilistically recruited during the locomotion strides, whereas the population bulk EMG has a more reliable representation of the muscle, and 3) the recruitment of units was proportional to walking speed.

      Strengths:

      The new technique provides a unique dataset, and the data analysis is convincing and well-executed.

      Weaknesses:

      After the revision, I no longer see any apparent weaknesses in the study.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, the authors have addressed the recruitment and firing patterns of motor units (MUs) from the long and lateral heads of the triceps in the mouse. They used their newly developed Myomatrix arrays to record from these muscles during treadmill locomotion at different speeds, and they used template-based spike sorting (Kilosort) to extract units. Between MUs from the two heads, the authors observed differences in their firing rates, recruitment probability, phase of activation within the locomotor cycle, and interspike interval patterning. Examining different walking speeds, the authors find increases in both recruitment probability and firing rates as speed increases. The authors also observed differences in the relation between recruitment and the angle of elbow extension between motor units from each head. These differences indicate meaningful variation between motor units within and across motor pools and may reflect the somewhat distinct joint actions of the two heads of triceps.

      Strengths:

      The extraction of MU spike timing for many individual units is an exciting new method that has great promise for exposing the fine detail in muscle activation and its control by the motor system. In particular, the methods developed by the authors for this purpose seem to be the only way to reliably resolve single MUs in the mouse, as the methods used previously in humans and in monkeys (e.g. Marshall et al. Nature Neuroscience, 2022) do not seem readily adaptable for use in rodents.

      The paper provides a number of interesting observations. There are signs of interesting differences in MU activation profiles for individual muscles here, consistent with those shown by Marshall et al. It is also nice to see fine-scale differences in the activation of different muscle heads, which could relate to their partially distinct functions. The mouse offers greater opportunities for understanding the control of these distinct functions, compared to the other organisms in which functional differences between heads have previously been described.

      The Discussion is very thorough, providing a very nice recounting of a great deal of relevant previous results.

      We thank the Reviewer for these comments.

      Weaknesses:

      The findings are limited to one pair of muscle heads. While an important initial finding, the lack of confirmation from analysis of other muscles acting at other joints leaves the general relevance of these findings unclear.

      The Reviewer raises a fair point. While outside the scope of this paper, future studies should certainly address a wider range of muscles to better characterize motor unit firing patterns across different sets of effectors with varying anatomical locations. Still, the importance of results from the triceps long and lateral heads should not be understated as this paper, to our knowledge, is the first to capture the difference in firing patterns of motor units across any set of muscles in the locomoting mouse.

      While differences between muscle heads with somewhat distinct functions are interesting and relevant to joint control, differences between MUs for individual muscles, like those in Marshall et al., are more striking because they cannot be attributed potentially to differences in each head's function. The present manuscript does show some signs of differences for MUs within individual heads: in Figure 2C, we see what looks like two clusters of motor units within the long head in terms of their recruitment probability. However, a statistical basis for the existence of two distinct subpopulations is not provided, and no subsequent analysis is done to explore the potential for differences among MUs for individual heads.

      We agree with the Reviewer and have revised the manuscript to better examine potential subpopulations of units within each muscle as presented in Figure 2C. We performed Hartigan’s dip test on motor units within each muscle to test for multimodal distributions. For both muscles, p > 0.05, so we cannot reject the null hypothesis that the units in each muscle come from a unimodal distribution. However, Hartigan’s test and similar statistical methods have poor statistical power for the small sample sizes (n=17 and 16 for long and lateral heads, respectively) considered here, so the failure to achieve statistical significance might reflect either the absence of a true difference or a lack of statistical resolution.

      Still, the limited sample size warrants further data collection and analysis since the varying properties across motor units may lead to different activation patterns. Given these results, we have edited the text as follows:

      “A subset of units, primarily in the long head, were recruited in under 50% of the total strides and with lower spike counts (Figure 2C). This distribution of recruitment probabilities might reflect a functionally different subpopulation of units. However, the distribution of recruitment probabilities were not found to be significantly multimodal (p>0.05 in both cases, Hartigan’s dip test; Hartigan, 1985). However, Hartigan’s test and similar statistical methods have poor statistical power for the small sample sizes (n=17 and 16 for long and lateral heads, respectively) considered here, so the failure to achieve statistical significance might reflect either the absence of a true difference or a lack of statistical resolution.”

      The statistical foundation for some claims is lacking. In addition, the description of key statistical analysis in the Methods is too brief and very hard to understand. This leaves several claims hard to validate.

      We thank the Reviewer for these comments and have clarified the text related to key statistical analyses throughout the manuscript, as described in our other responses below.

      Reviewer #2 (Public review):

      The present study, led by Thomas and collaborators, aims to describe the firing activity of individual motor units in mice during locomotion. To achieve this, they implanted small arrays of eight electrodes in two heads of the triceps and performed spike sorting using a custom implementation of Kilosort. Simultaneously, they tracked the positions of the shoulder, elbow, and wrist using a single camera and a markerless motion capture algorithm (DeepLabCut). Repeated one-minute recordings were conducted in six mice at five different speeds, ranging from 10 to 27.5 cm·s⁻¹.

      From these data, the authors reported that:

      (1) a significant portion of the identified motor units was not consistently recruited across strides,

      (2) motor units identified from the lateral head of the triceps tended to be recruited later than those from the long head,

      (3) the number of spikes per stride and peak firing rates were correlated in both muscles, and

      (4) the probability of motor unit recruitment and firing rates increased with walking speed.

      The authors conclude that these differences can be attributed to the distinct functions of the muscles and the constraints of the task (i.e., speed).

      Strengths:

      The combination of novel electrode arrays to record intramuscular electromyographic signals from a larger muscle volume with an advanced spike sorting pipeline capable of identifying populations of motor units.

      We thank the Reviewer for this comment.

      Weaknesses:

      (1) There is a lack of information on the number of identified motor units per muscle and per animal.

      The Reviewer is correct that this information was not explicitly provided in the prior submission. We have therefore added Table 1 that quantifies the number of motor units per muscle and per animal.

      (2) All identified motor units are pooled in the analyses, whereas per-animal analyses would have been valuable, as motor units within an individual likely receive common synaptic inputs. Such analyses would fully leverage the potential of identifying populations of motor units.

      Please see our answer to the following point, where we address questions (2) and (3) together.

      (3) The current data do not allow for determining which motor units were sampled from each pool. It remains unclear whether the sample is biased toward high-threshold motor units or representative of the full pool.

      We thank the Reviewer for these comments. To clarify how motor unit responses were distributed across animals and muscle targets, we updated or added the following figures:  

      Figure 2C

      Figure 4–figure supplement 1

      Figure 5–figure supplement 2

      Figure 6–figure supplement 2

      These provide a more complete look at the range of activity within each motor pool, suggesting that we do measure from units with different activation thresholds within the same motor pool, rather than this variation being due to cross-animal differences. For example, Figure 2C illustrates that motor units from the same muscle and animal show a wide variety of recruitment probabilities. However, the limited number of motor units recorded from each individual animal does not allow a statistically rigorous test for examining cross-animal differences.

      (4) The behavioural analysis of the animals relies solely on kinematics (2D estimates of elbow angle and stride timing). Without ground reaction forces or shoulder angle data, drawing functional conclusions from the results is challenging.

      The Reviewer is correct that we did not measure muscular force generation or ground reaction forces in the present study. Although outside the scope of this study, future work might employ buckle force transducers as used in larger animals (Biewener et al., 1988; Karabulut et al., 2020) to examine the complex interplay between neural commands, passive biomechanics, and the complex force-generating properties of muscle tissue.

      Major comments:

      (1) Spike sorting

      The conclusions of the study rely on the accuracy and robustness of the spike sorting algorithm during a highly dynamic task. Although the pipeline was presented in a previous publication (Chung et al., 2023, eLife), a proper validation of the algorithm for identifying motor unit spikes is still lacking. This is particularly important in the present study, as the experimental conditions involve significant dynamic changes. Under such conditions, muscle geometry is altered due to variations in both fibre pennation angles and lengths.

      This issue differs from electrode drift, and it is unclear whether the original implementation of Kilosort includes functions to address it. Could the authors provide more details on the various steps of their pipeline, the strategies they employed to ensure consistent tracking of motor unit action potentials despite potential changes in action potential waveforms, and the methods used for manual inspection of the spike sorting algorithm's output?

      This is an excellent point and we agree that the dynamic behavior used in this investigation creates potential new challenges for spike sorting. In our analysis, Kilosort 2.5 provides key advantages in comparing unit waveforms across multiple channels and in detecting overlapping spikes. We modified this version of Kilosort to construct unit waveform templates using only the channels within the same muscle (Chung et al., 2023), as clarified in the revised Methods section (see “Electromyography (EMG)”):

      “A total of 33 units were identified across all animals. Each unit’s isolation was verified by confirming that no more than 2% of inter-spike intervals violated a 1 ms refractory limit. Additionally, we manually reviewed cross-correlograms to ensure that each waveform was only reported as a single motor unit.”
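
      The isolation criterion quoted above (no more than 2% of inter-spike intervals shorter than 1 ms) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, spike times are assumed to be in seconds, and an interval is assumed to violate the limit when it is strictly shorter than the refractory period.

```python
def refractory_violation_fraction(spike_times_s, refractory_s=0.001):
    """Fraction of inter-spike intervals shorter than the refractory limit."""
    times = sorted(spike_times_s)
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    if not isis:
        return 0.0  # fewer than two spikes: no intervals to violate
    violations = sum(1 for isi in isis if isi < refractory_s)
    return violations / len(isis)


def passes_isolation_check(spike_times_s, max_fraction=0.02, refractory_s=0.001):
    """True if the unit meets the quoted criterion (<= 2% violations at 1 ms)."""
    return refractory_violation_fraction(spike_times_s, refractory_s) <= max_fraction
```

      A unit with one 0.5 ms interval among four intervals would fail this check (25% violations), whereas a train firing regularly every 10 ms would pass.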

      The Reviewer is correct that our ability to precisely measure a unit’s activity based on its waveform will depend on the relationship between the embedded electrode and the muscle geometry, which alters over the course of the stride. As a follow-up to the original text, we have included new analyses to characterize the waveform activity throughout the experiment and stride (also in Methods):

      “We further validated spike sorting by quantifying the stability of each unit’s waveform across time (Figure 1–figure supplement 1). First, we calculated the median waveform of each unit across every trial to capture long-term stability of motor unit waveforms. Additionally, we calculated the median waveform through the stride binned in 50 ms increments using spiking from a single trial. This second metric captures the stability of our spike sorting during the rapid changes in joint angles that occur during the burst of an individual motor unit. In doing so, we calculated each motor unit’s waveforms from the single channel in which that unit’s amplitude was largest and did not attempt to remove overlapping spikes from other units before measuring the median waveform from the data. We then calculated the correlation between a unit’s waveform over either trials or bins in which at least 30 spikes were present. The high correlation of a unit waveform over time, despite potential changes in the electrodes’ position relative to muscle geometry over the dynamic task, provides additional confidence in both the stability of our EMG recordings and the accuracy of our spike sorting.”
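
      One way to picture the waveform-stability check described above is the sketch below. It takes groups of spike snippets (one group per trial or per stride-phase bin), keeps groups with at least 30 spikes, computes a sample-wise median waveform for each, and correlates the medians. Comparing every pair of groups with a Pearson correlation is an assumption made for illustration; the text does not specify how the correlations were paired, and all function names here are hypothetical.

```python
from statistics import median

MIN_SPIKES = 30  # minimum spikes per trial or bin, as quoted in the text


def median_waveform(snippets):
    """Sample-wise median across equal-length spike snippets."""
    return [median(samples) for samples in zip(*snippets)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def waveform_stability(groups):
    """Pairwise correlations between median waveforms of qualifying groups.

    groups: list of snippet lists, one per trial or stride-phase bin.
    Assumption: all pairs of qualifying groups are compared.
    """
    medians = [median_waveform(g) for g in groups if len(g) >= MIN_SPIKES]
    return [
        pearson(medians[i], medians[j])
        for i in range(len(medians))
        for j in range(i + 1, len(medians))
    ]
```

      Because Pearson correlation ignores overall scale, this kind of measure reflects waveform shape rather than amplitude, so a unit whose amplitude changes during the stride while its shape is preserved would still score close to 1.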

      (2) Yield of the spike sorting pipeline and analyses per animal/muscle

      A total of 33 motor units were identified from two heads of the triceps in six mice (17 from the long head and 16 from the lateral head). However, precise information on the yield per muscle per animal is not provided. This information is crucial to support the novelty of the study, as the authors claim in the introduction that their electrode arrays enable the identification of populations of motor units. Beyond reporting the number of identified motor units, another way to demonstrate the effectiveness of the spike sorting algorithm would be to compare the recorded EMG signals with the residual signal obtained after subtracting the action potentials of the identified motor units, using a signal-to-residual ratio.

      Furthermore, motor units identified from the same muscle and the same animal are likely not independent due to common synaptic inputs. This dependence should be accounted for in the statistical analyses when comparing changes in motor unit properties across speeds and between muscles.

      We thank the Reviewer for this comment. Regarding motor unit yield, as described above the newly-added Table 1 displays the yield from each animal and muscle.

      Regarding spike sorting, while signal-to-residual is often an excellent metric, it is not ideal for our high-resolution EMG signals since isolated single motor units are typically superimposed on a “bulk” background consisting of the low-amplitude waveforms of other motor units. Because these smaller units typically cannot be sorted, it is challenging to estimate the “true” residual after subtracting (only) the largest motor unit, since subtracting each sorted unit’s waveform typically has a very small effect on the RMS of the total EMG signal. To further address concerns regarding spike sorting quality, we added Figure 1–figure supplement 1 that demonstrates motor units’ consistency over the experiment, highlighting that the waveform maintains its shape within each stride despite muscle/limb dynamics and other possible sources of electrical noise or artifact.

      Finally, the Reviewer is correct that individual motor units in the same muscle are very likely to receive common synaptic inputs. These common inputs may be reflected in sparsely active motor units being recruited in overlapping rather than different strides. Indeed, in the following passage added to the Results, we report that motor units are recruited with higher probability when additional units are recruited.

      “Probabilistic recruitment is correlated across motor units

      Our results show that the recruitment of individual motor units is probabilistic even within a single speed quartile (Figure 5A-C) and predicts body movements (Figure 6), raising the question of whether the recruitment of individual motor units is correlated or independent. Correlated recruitment might reflect shared input onto the population of motor units innervating the muscle (De Luca, 1985; De Luca & Erim, 1994; Farina et al., 2014). For example, two motor units, each with low recruitment probabilities, may still fire during the same set of strides. To assess the independence of motor unit recruitment across the recorded population, we compared each unit’s empirical recruitment probability across all strides to its conditional recruitment probability during strides in which another motor unit from the same muscle was recruited (Figure 7). Doing this for all motor unit pairs revealed that motor units in both muscles were biased towards greater recruitment when additional units were active (p<0.001, Wilcoxon signed-rank tests for both the lateral and long heads of triceps). This finding suggests that probabilistic recruitment reflects common synaptic inputs that covary together across locomotor strides.”
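The comparison between empirical and conditional recruitment probability can be illustrated with a small sketch. It assumes each unit is summarized as one boolean per stride (recruited or not) with strides aligned across units; the function names are hypothetical, and the Wilcoxon signed-rank test applied across unit pairs is not reproduced here.

```python
def recruitment_probability(recruited):
    """Fraction of strides on which the unit fired at least once."""
    return sum(1 for r in recruited if r) / len(recruited)


def conditional_recruitment_probability(recruited_a, recruited_b):
    """P(unit A recruited | unit B recruited), or None if B was never recruited."""
    strides_with_b = sum(1 for b in recruited_b if b)
    if strides_with_b == 0:
        return None
    joint = sum(1 for a, b in zip(recruited_a, recruited_b) if a and b)
    return joint / strides_with_b
```

If recruitment were independent across units, the conditional value would on average match the unconditional one; a systematic excess across pairs is the signature of shared drive described in the quoted passage.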

      (3) Representativeness of the sample of identified motor units

      However, to draw such conclusions, the authors should exclusively compare motor units from the same pool and systematically track violations of the recruitment order. Alternatively, they could demonstrate that the motor units that are intermittently active across strides correspond to the smallest motor units, based on the assumption that these units should always be recruited due to their low activation thresholds.

      One way to estimate the size of motor units identified within the same muscle would be to compare the amplitude of their action potentials, assuming that all motor units are relatively close to the electrodes (given the selectivity of the recordings) and that motoneurons innervating more muscle fibres generate larger motor unit action potentials.

      We thank the Reviewer for this comment. Below, we provide more detailed analyses of the relationships between motor unit spike amplitude and the recruitment probability as well as latency (relative to stride onset) of activation.

      We generated the below figures to illustrate the relationship between the amplitude of motor units and their firing properties. As suspected, units with larger-amplitude waveforms fired with lower probability and produced their first spikes later in the stride. If we were comfortable assuming that larger spike amplitudes mean higher-force units, then this would be consistent with a key prediction of the size principle (i.e. that higher-force units are recruited later). However, we are hesitant to base any conclusions on this assumption or emphasize this point with a main-text figure, since EMG signal amplitude may also vary due to the physical properties of the electrode and distance from muscle fibers. Thus it is possible that a large motor unit may have a smaller waveform amplitude relative to the rest of the motor pool.

      Author response image 1.

      Relation between motor unit amplitude and (A) recruitment probability and (B) mean first spike time within the stride. Colored lines indicate the outcome of linear regression analyses.

      Currently, the data seem to support the idea that motor units that are alternately recruited across strides have recruitment thresholds close to the level of activation or force produced during slow walking. The fact that recruitment probability monotonically increases with speed suggests that the force required to propel the mouse forward exceeds the recruitment threshold of these "large" motor units. This pattern would primarily reflect spatial recruitment following the size principle rather than flexible motor unit control.

      We thank the Reviewer for this comment. We agree with this interpretation, particularly in relation to the references suggested in later comments, and have added the following text to the Discussion to better reflect this argument:

      “To investigate the neuromuscular control of locomotor speed, we quantified speed-dependent changes in both motor unit recruitment and firing rate. We found that the majority of units were recruited more often and with larger firing rates at faster speeds (Figure 5, Figure 5–figure supplement 1). This result may reflect speed-dependent differences in the common input received by populations of motor neurons with varying spiking thresholds (Henneman et al., 1965). In the case of mouse locomotion, faster speeds might reflect a larger common input, increasing the recruitment probability as more neurons, particularly those that are larger and generate more force, exceed threshold for action potentials (Farina et al., 2014).”

      (4) Analysis of recruitment and firing rates

      The authors currently report active duration and peak firing rates based on spike trains convolved with a Gaussian kernel. Why not report the peak of the instantaneous firing rates estimated from the inverse of the inter-spike interval? This approach appears to be more aligned with previous studies conducted to describe motor unit behaviour during fast movements (e.g., Desmedt & Godaux, 1977, J Physiol; Van Cutsem et al., 1998, J Physiol; Del Vecchio et al., 2019, J Physiol).

      We thank the Reviewer for this comment. In the revised Discussion (see ‘Firing rates in mouse locomotion compared to other species’) we reference several examples of previous studies that quantified spike patterns based on the instantaneous firing rate. We chose to report the peak of the smoothed firing rate because that quantification includes strides with zero spikes or only one spike, which occur regularly in our dataset (and for which ISI rate measures, which require two spikes to define an instantaneous firing rate, cannot be computed). Regardless, in the revised Figure 4B, we present an analysis that uses inter-spike intervals as suggested, which yielded similar ranges of firing rates as the primary analysis.
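The difference between the two rate measures can be made concrete with a short sketch. The function names, the kernel width, the stride window and the time step below are illustrative assumptions, not values from the manuscript; the only point carried over from the response is that an ISI-based rate needs at least two spikes, whereas a kernel-smoothed rate is defined for any spike count.

```python
import math


def peak_isi_rate(spike_times_s):
    """Peak instantaneous rate (Hz) as 1 / shortest ISI; None with fewer than two spikes."""
    times = sorted(spike_times_s)
    if len(times) < 2:
        return None
    return 1.0 / min(later - earlier for earlier, later in zip(times, times[1:]))


def peak_smoothed_rate(spike_times_s, sigma_s=0.0025, t_start=0.0, t_stop=0.5, dt=0.001):
    """Peak (Hz) of the spike train convolved with a unit-area Gaussian kernel."""
    norm = 1.0 / (sigma_s * math.sqrt(2.0 * math.pi))
    n_steps = int(round((t_stop - t_start) / dt))
    peak = 0.0
    for i in range(n_steps + 1):
        t = t_start + i * dt
        # Sum the kernel contribution of every spike at time t.
        rate = sum(
            norm * math.exp(-0.5 * ((t - s) / sigma_s) ** 2) for s in spike_times_s
        )
        peak = max(peak, rate)
    return peak
```

On a stride with a single spike, `peak_isi_rate` returns `None` while `peak_smoothed_rate` returns the kernel height; on a stride with no spikes, the smoothed peak is zero. That is why the smoothed measure can include every stride in the analysis.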

      (5) Additional analyses of behaviour

      The authors currently analyse motor unit recruitment in relation to elbow angle. It would be valuable to include a similar analysis using the angular velocity observed during each stride. More broadly, comparing stride-by-stride changes in firing rates with changes in elbow angular velocity would further strengthen the final analyses presented in the results section.

      We thank the Reviewer for this comment. To address this, we have modified Figure 6 and the associated Supplemental Figures to show relationships in unit activation with both the range of elbow extension and the range of elbow velocity for each stride. These new Supplemental Figures show that the trends shown in main text Figure 6C and 6E (which show data from all speed quartiles on the same axes) are also apparent in both the slower and faster quartiles individually, although single-quartile statistical tests (with smaller sample size than the main analysis) do not reach statistical significance in all cases.

      Reviewer #3 (Public review):

      Summary:

      Using the approach of Myomatrix recording, the authors report that:

      (1) Motor units are recruited differently in the two types of muscles.

      (2) Individual units are probabilistically recruited during the locomotion strides, whereas the population bulk EMG has a more reliable representation of the muscle.

      (3) The recruitment of units was proportional to walking speed.

      Strengths:

      The new technique provides a unique data set, and the data analysis is convincing and well-performed.

      We thank the Reviewer for the comment.

      Weaknesses:

      The implications of "probabilistical recruitment" should be explored, addressed, and analyzed further.

      Comments:

      One of the study's main findings (perhaps the main finding) is that the motor units are "probabilistically" recruited. The authors do not define what they mean by probabilistically recruited, nor do they present an alternative scenario to such recruitment or discuss why this would be interesting or surprising. However, on page 4, they do indicate that the recruitment of units from both muscles was only active in a subset of strides, i.e., they are not reliably active in every step.

      If probabilistic means irregular spiking, this is not new. Variability in spiking has been seen numerous times, for instance in human biceps brachii motor units during isometric contractions (Pascoe, Enoka, Exp physiology 2014) and elsewhere. Perhaps the distinction the authors are seeking is between fluctuation-driven and mean-driven spiking of motor units as previously identified in spinal motor networks (see Petersen and Berg, eLife 2016, and Berg, Frontiers 2017). Here, it was shown that a prominent regime of irregular spiking is present during rhythmic motor activity, which also manifests as a positive skewness in the spike count distribution (i.e., log-normal).

      We thank the Reviewer for this comment and have clarified several passages in response. The Reviewer is of course correct that irregular motor unit spiking has been described previously and may reflect motor neurons’ operating in a high-sensitivity (fluctuation-driven) regime. We now cite these papers in the Discussion (see ‘Firing rates in mouse locomotion compared to other species’). Additionally, the revision clarifies that “probabilistically” - as defined in our paper - refers only to the empirical observation that a motor unit spikes during only a subset of strides, either when all locomotor speeds are considered together (Figure 2) or separately (Figure 5A-C):

      “Motor units in both muscles exhibited this pattern of probabilistic recruitment (defined as a unit’s firing on only a fraction of strides), but with differing distributions of firing properties across the long and lateral heads (Figure 2).”

      “Our findings (Figure 4) highlight that even with the relatively high firing rates observed in mice, there are still significant changes in firing rate and recruitment probability across the spikes within bursts (Figure 4B) and across locomotor speeds (Figure 5F). Future studies should more carefully examine how these rapidly changing spiking patterns derive from both the statistics of synaptic inputs and intrinsic properties of motor neurons (Manuel & Heckman, 2011; Petersen & Berg, 2016; Berg, 2017).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, there are several issues with the statistics that need to be corrected to properly support the claims made in the paper.

      The authors compare the fractions of MUs that show significant variation across locomotor speeds in their firing rate and recruitment probability. However, it is not statistically founded to compare the results of separate statistical tests based on different kinds of measurements and thus have unconstrained differences in statistical power. The comparison of the fractional changes in firing rates and recruitment across speeds that follow is helpful, though in truth, by contemporary standards, one would like to see error bars on these estimates. These could be generated using bootstrapping.

      The Reviewer is correct, and we have revised the manuscript to better clarify which quantities should or should not be compared, including the following passage (see “Motor unit mechanisms of speed control” in Results):

      “Speed-dependent increases in peak firing rate were therefore also present in our dataset, although in a smaller fraction of motor units (22/33) than changes in recruitment probability (31/33). Furthermore, the mean (± SE) magnitude of speed-dependent increases was smaller for spike rates (mean rate_fast/rate_slow of 111% ± 20% across all motor units) than for recruitment probabilities (mean p(recruitment)_fast/p(recruitment)_slow of 179% ± 3% across all motor units). While fractional changes in rate and recruitment probability are not readily comparable given their different upper limits, these findings could suggest that while both recruitment and peak rate change across speed quartiles, increased recruitment probability may play a larger role in driving changes in locomotor speed.”

      The description in the Methods of the tests for variation in firing rates and recruitment probability across speeds are extremely hard to understand - after reading many times, it is still not clear what was done, or why the method used was chosen. In the main text, the authors quote p-values and then state "bootstrap confidence intervals," which is not a statistical test that yields a p-value. While there are mathematical relationships between confidence intervals and statistical tests such that a one-to-one correspondence between them can exist, the descriptions provided fall short of specifying how they are related in the present instance. For this reason, and those described in what follows, it is not clear what the p-values represent.

      Next, the authors refer to fitting a model ("a Poisson distribution") to the data to estimate firing rate and recruitment probability, that the model results agree with their actual data, and that they then bootstrapped from the model estimates to get confidence intervals and compute p-values. Why do this? Why not just do something much simpler, like use the actual spike counts, and resample from those? I understand that it is hard to distinguish between no recruitment and just no spikes given some low Poisson firing rate, but how does that challenge the ability to test if the firing rates or the number of spiking MUs changes significantly across speeds? I can come up with some reasons why I think the authors might have decided to do this, but reasoning like this really should be made explicit.

      In addition, the authors should provide an unambiguous description of the model, perhaps using an equation and a description of how it was fit. For the bootstrapping, a clear description of how the resampling was done should be included. The focus on peak firing rate instead of mean (or median) firing rate should also be justified. Since peaks are noisier, I would expect the statistical power to be lower compared to using the mean or median.

      We thank the Reviewer for the comments and have revised and expanded our discussion of the statistical tests employed. We expanded and clarified our description of these techniques in the updated Methods section:

      “Joint model of rate and recruitment

      We modeled the recruitment probability and firing rate based on empirical data to best characterize firing statistics within the stride. Particularly, this allowed for multiple solutions to explain why a motor unit would not spike within a stride. From the empirical data alone, strides with zero spikes would have been assumed to have no recruitment of a unit. However, to create a model of motor unit activity that includes both recruitment and rate, it must be possible that a recruited unit can have a firing rate of zero. To quantify the firing statistics that best represent all spiking and non-spiking patterns, we modeled recruitment probability and peak firing rate along the following piecewise function:

      where y denotes the observed peak firing rate on a given stride (determined by convolving motor unit spike times with a Gaussian kernel as described above), p denotes the probability of recruitment, and λ denotes the expected peak firing rate from a Poisson distribution of outcomes. Thus, an inactive unit on a given stride may be the result of either non-recruitment or recruitment with a stochastically zero firing rate. The above equations were fit by minimizing the negative log-likelihood of the parameters given the data.
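The piecewise function itself is not reproduced in the text above. One form consistent with the definitions of y, p and λ, and with the later references to "Eq. 1 and 2", is the standard zero-inflated Poisson likelihood. The block below is an assumption offered for orientation, not a transcription of the authors' equations; in particular, it treats y as a non-negative integer count, and how the smoothed peak rate is mapped onto such a count is not stated in the response.

```latex
\begin{align}
P(y = 0 \mid p, \lambda) &= (1 - p) + p\, e^{-\lambda} \tag{1} \\
P(y = k \mid p, \lambda) &= p\, \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 1, 2, \dots \tag{2}
\end{align}
% Negative log-likelihood minimized over p and \lambda for strides i = 1, ..., N:
\begin{equation*}
\mathcal{L}(p, \lambda) = -\sum_{i=1}^{N} \log P(y_i \mid p, \lambda)
\end{equation*}
```

Under this form, the first term of Eq. 1 corresponds to non-recruitment and the second to recruitment with a stochastically zero rate, matching the two explanations for an inactive stride given in the quoted text.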

      “Permutation test for joint model of rate and recruitment and type 2 regression slopes

      To quantify differences in firing patterns across walking speeds, we subdivided each mouse’s total set of strides into speed quartiles and calculated rate (λ, Eq. 1 and 2, Fig. 5A-C) and recruitment probability terms (p, Eq. 1 and 2, Fig. 5D-F) for each unit in each speed quartile. Here we calculated the difference in both the rate and recruitment terms across the fastest and slowest speed quartiles (p_fast − p_slow and λ_fast − λ_slow). To test whether these model parameters were significantly different depending on locomotor speed, we developed a null model combining strides from both the fastest and slowest speed quartiles. After pooling strides from both quartiles, we randomly distributed the pooled set of strides into two groups with sample sizes equal to the original slow and fast quartiles. We then calculated the null model parameters for each new group and found the difference between like terms. To estimate the distribution of possible differences, we bootstrapped this result using 1000 random redistributions of the pooled set of strides. Following the permutation test, the 95% confidence interval of this final distribution reflects the null hypothesis of no difference between groups. Thus, the null hypothesis can be rejected if the true difference in rate or recruitment terms exceeds this confidence interval.

      We followed a similar procedure to quantify cross-muscle differences in the relationship between firing parameters. For each muscle, we estimated the slope across firing parameters for each motor unit using type 2 regression. In this case, the true difference was the difference in slopes between muscles. To test the null hypothesis that there was no difference in slopes, the null model reflected the pooled set of units from both muscles. Again, slopes were calculated for 1000 random resamplings of this pooled data to estimate the 95% confidence interval.”
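The pooling-and-redistribution procedure described above can be sketched generically. This is an illustration under assumptions, not the authors' implementation: `permutation_null_interval` and `fraction_active` are hypothetical names, and the simple fraction of active strides stands in for the fitted model parameters (p and λ), which would be re-estimated on each shuffled group in the actual analysis.

```python
import random


def permutation_null_interval(slow, fast, statistic, n_perm=1000, seed=0):
    """95% interval of statistic(fast) - statistic(slow) under shuffled group labels."""
    rng = random.Random(seed)
    pooled = list(slow) + list(fast)
    null_diffs = []
    for _ in range(n_perm):
        # Redistribute pooled strides into two groups of the original sizes.
        rng.shuffle(pooled)
        new_slow, new_fast = pooled[: len(slow)], pooled[len(slow):]
        null_diffs.append(statistic(new_fast) - statistic(new_slow))
    null_diffs.sort()
    lower = null_diffs[int(0.025 * n_perm)]
    upper = null_diffs[int(0.975 * n_perm) - 1]
    return lower, upper


def fraction_active(spike_counts):
    """Stand-in statistic: fraction of strides with at least one spike."""
    return sum(1 for c in spike_counts if c > 0) / len(spike_counts)


# Usage: per-stride spike counts in the slowest and fastest speed quartiles.
slow_counts = [0] * 80 + [1] * 20
fast_counts = [0] * 20 + [1] * 80
observed = fraction_active(fast_counts) - fraction_active(slow_counts)
lower, upper = permutation_null_interval(slow_counts, fast_counts, fraction_active)
reject_null = observed < lower or observed > upper
```

Resampling here is without replacement (a label shuffle), which is what "random redistributions of the pooled set" describes; the same skeleton would apply to the cross-muscle comparison by pooling units rather than strides and using a regression slope as the statistic.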

      The argument for delayed activation of the lateral head is interesting, but I am not comfortable saying the nervous system creates a delay just based on observations of the mean time of the first spike, given the potential for differential variability in spike timing across muscles and MUs. One way to make a strong case for a delay would be to show aggregate PSTHs for all the spikes from all the MUs for each of the two heads. That would distinguish between a true delay and more gradual or variable activation between the heads.

      This is a good point and we agree that the claim made about the nervous system is too strong given the results. Even with Author response image 2 below that the Reviewer suggested, there is still not enough evidence to isolate the role of the nervous system in the muscles’ activation.

      Author response image 2.

      Aggregate peristimulus time histogram (PSTH) for all motor unit spike times in the long head (top) and lateral head (bottom) within the stride.

      In the ideal case, we would have more simultaneous recordings from both muscles to make a more direct claim on the delay. Still, within the current scope of the paper, to correct this and better describe the difference in timing of muscle activity, we edited the text to the following:

      “These findings demonstrate that despite the synergistic (extensor) function of the long and lateral heads of the triceps at the elbow, the motor pool for the long head becomes active roughly 100 ms before the motor pool supplying the lateral head during locomotion (Figure 3C).”

      The results from Marshall et al. 2022 suggest that the recruitment of some MUs is not just related to muscle force, but also the frequency of force variation - some of their MUs appear to be recruited only at certain frequencies. Figure 5C could have shown signs of this, but it does not appear to. We do not really know the force or its frequency of variation in the measurements here. I wonder whether there is additional analysis that could address whether frequency-dependent recruitment is present. It may not be addressable with the current data set, but this could be a fruitful direction to explore in the future with MU recordings from mice.

      We agree that this would be a fruitful direction to explore; however, the Reviewer is correct that this is not easily addressable with the current dataset. As the Reviewer points out, stride frequency increases with increased speed, potentially offering the opportunity to examine how motor unit activity varies with the frequency, phase, and amplitude of locomotor movements. However, given our lack of force data (either joint torques or ground reaction forces), we cannot dissociate the frequency/phase/amplitude of skeletal kinematics from the frequency/phase/amplitude of muscle force. Marshall et al. (2022) mitigated these issues by using an isometric force-production task. Therefore, while we agree that it would be a major contribution to extend such investigations to whole-body movements like locomotion, given the complexities described above we believe this is a project for the future, and beyond the scope of the present study.

      Minor:

      Page 5: "Units often displayed no recruitment in a greater proportion of strides than for any particular spike count when recruited (Figures 2A, B)," - I had to read this several times to understand it. I suggest rephrasing for clarity.

      We have changed the text to read:

      “Units demonstrated a variety of firing patterns, with some units producing 0 spikes more frequently than any non-zero spike count (Figure 2A, B),...”

      Figure 3 legend: "Mean phase (± SE) of motor unit burst duration across all strides.": It is unclear what this means - durations are not usually described as having a phase. Do we mean the onset phase?

      We have changed the text to read:

      “Mean phase ± SE of motor unit burst activity within each stride”

      Page 9: "suggesting that the recruitment of individual motor units in the lateral and long heads might have significant (and opposite) effects on elbow angle in strides of similar speed (see Discussion)." I wouldn't say "opposite" here - that makes it sound like the authors are calling the long head a flexor. The authors should rephrase or clarify the sense in which they are opposite.

      This is a fair point and we agree we should not describe the muscles as ‘opposite’ when both muscles are extensors. We have removed the phrase ‘and opposite’ from the text.

      Page 11: "in these two muscles across in other quadrupedal species" - typo.

      We have corrected this error.

      Page 16: This reviewer cannot decipher after repeated attempts what the first two sentences of the last paragraph mean. - “Future studies might also use perturbations of muscle activity to dissociate the causal properties of each motor unit’s activity from the complex correlation structure of locomotion. Despite the strong correlations observed between motor unit recruitment and limb kinematics (Fig. 6, Supplemental Fig. 3), these results might reflect covariations of both factors with locomotor speed rather than the causal properties of the recorded motor unit.”

      For better clarity, we have changed the text to read:

      “Although strong correlations were observed between motor unit recruitment and limb kinematics during locomotion (Figure 6, Figure 6–figure supplement 1), it remains unclear whether such correlations actually reflect the causal contributions that those units make to limb movement. To resolve this ambiguity, future studies could use electrical or optical perturbations of muscle contraction levels (Kim et al., 2024; Lu et al., 2024; Srivastava et al., 2015, 2017) to test directly how motor unit firing patterns shape locomotor movements. The short-latency effects of patterned motor unit stimulation (Srivastava et al., 2017) could then reveal the sensitivity of behavior to changes in muscle spiking and the extent to which the same behaviors can be performed with many different motor commands.”

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      Introduction:

      (1) "Although studies in primates, cats, and zebrafish have shown that both the number of active motor units and motor unit firing rates increase at faster locomotor speeds (Grimby, 1984; Hoffer et al., 1981, 1987; Marshall et al., 2022; Menelaou & McLean, 2012)." I would remove Marshall et al. (2022) as their monkeys performed pulling tasks with the upper limb. You can alternatively remove locomotor from the sentence and replace it with contraction speed.

      Thank you for the comment. While we intended to reference this specific paper to highlight the rhythmic activity in muscles, we agree that this deviates from ‘locomotion’ as it is referenced in the other cited papers which study body movement. We have followed the Reviewer’s suggestion to remove the citation to Marshall et al.

      (2) "The capability and need for faster force generation during dynamic behavior could implicate motor unit recruitment as a primary mechanism for modulating force output in mice."

      The authors could add citations to this sentence, of works that showed that recruitment speed is the main determinant of the rate of force development (see for example Dideriksen et al. (2020) J Neurophysiol; J. L. Dideriksen, A. Del Vecchio, D. Farina, Neural and muscular determinants of maximal rate of force development. J Neurophysiol 123, 149-157 (2020)).

      Thank you for pointing out this important reference. We have included this as a citation as recommended.

      Results:

      (3) "Electrode arrays (32-electrode Myomatrix array model RF-4x8-BHS-5) were implanted in the triceps brachii (note that Figure 1D shows the EMG signal from only one of the 16 bipolar recording channels), and the resulting data were used to identify the spike times of individual motor units (Figure 1E) as described previously (Chung et al., 2023)."

      This sentence can be misleading for the reader as the array used by the researchers has 4 threads of 8 electrodes. Would it be possible to specify the number of electrodes implanted per head of interest? I assume 8 per head in most mice (or 4 bipolar channels), even if that's not specifically written in the manuscript.

      Thank you for the suggestion. As described above, we have added Table 1, which includes all array locations, and we edited the statement referenced in the comment as follows:

      “Electrode arrays (32-electrode Myomatrix array model RF-4x8-BHS-5) were implanted in forelimb muscles (note that Figure 1D shows the EMG signal from only one of the 16 bipolar recording channels), and the resulting data were used to identify the spike times of individual motor units in the triceps brachii long and lateral heads (Table 1, Figure 1E) as described previously (Chung et al., 2023).”

      (4) "These findings demonstrate that despite the overlapping biomechanical functions of the long and lateral heads of the triceps, the nervous system creates a consistent, approximately 100 ms delay (Figure 3C) between the activation of the two muscles' motor neuron pools. This timing difference suggests distinct patterns of synaptic input onto motor neurons innervating the lateral and long heads."

      Both muscles don't have fully overlapping biomechanical functions, as one of them also acts on the shoulder joint. Please be more specific in this sentence, saying that both muscles are synergistic at the elbow level rather than "have overlapping biomechanical functions".

      We agree with the above reasoning and that our manuscript should be clearer on this point. We edited the above text in accordance with the Reviewer suggestion as follows:

      "These findings demonstrate that despite the synergistic (extensor) function of the long and lateral heads of the triceps at the elbow, …”  

      (5) "Together with the differences in burst timing shown in Figure 3B, these results again suggest that the motor pools for the lateral and long heads of the triceps receive distinct patterns of synaptic input, although differences in the intrinsic physiological properties of motor neurons innervating the two muscles might also play an important role."

      It is difficult to draw such an affirmative conclusion on the synaptic inputs from the data presented by the authors. The differences in firing rates may solely arise from other factors than distinct synaptic inputs, such as the different intrinsic properties of the motoneurons or the reception of distinct neuromodulatory inputs.

      To better explain our findings, we adjusted the above text in the Results (see “Motor unit firing patterns in the long and lateral heads of the triceps”):

      “Together with the differences in burst timing shown in Figure 3B, these results again suggest that the motor pools for the lateral and long heads of the triceps receive distinct patterns of synaptic input, although differences in the intrinsic physiological properties of motor neurons innervating the two muscles might also play an important role.”

      We also included the following distinction in the Discussion (see “Differences in motor unit activity patterns across two elbow extensors”) to address the other plausible mechanisms mentioned.

      “The large differences in burst timing and spike patterning across the muscle heads suggest that the motor pools for each muscle receive distinct inputs. However, differences in the intrinsic physiological properties of motor units and neuromodulatory inputs across motor pools might also make substantial contributions to the structure of motor unit spike patterns (Martínez-Silva et al., 2018; Miles & Sillar, 2011).”

      (6) "We next examined whether the probabilistic recruitment of individual motor units in the triceps and elbow extensor muscle predicted stride-by-stride variations in elbow angle kinematics."

      I'm not sure that the wording is appropriate here. The analysis does not predict elbow angle variations from parameters extracted from the spiking activity. It rather compares the average elbow angle between two conditions (motor unit active or not active).

      We thank the Reviewer for this comment and agree that the wording could be improved here to better reflect our analysis. To lower the strength of our claim, we replaced usage of the word ‘predict’ with ‘correlates’ in the above text and throughout the paper when discussing this result.
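
      The comparison the Reviewer describes (mean elbow angle in strides where a unit is active versus strides where it is silent) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name, the use of a single angle value per stride, and the toy numbers are all assumptions made for the example.

```python
from statistics import mean


def compare_active_vs_inactive(spike_counts, elbow_angles):
    """Compare mean elbow angle between strides with and without unit activity.

    spike_counts: spikes fired by one motor unit on each stride (hypothetical input).
    elbow_angles: one summary elbow angle per stride, in degrees (hypothetical input).
    Returns (mean_active, mean_inactive, difference).
    """
    if len(spike_counts) != len(elbow_angles):
        raise ValueError("one spike count and one angle are needed per stride")
    active = [a for c, a in zip(spike_counts, elbow_angles) if c > 0]
    inactive = [a for c, a in zip(spike_counts, elbow_angles) if c == 0]
    if not active or not inactive:
        raise ValueError("need at least one stride in each condition")
    return mean(active), mean(inactive), mean(active) - mean(inactive)


# Toy data: six strides, unit recruited on three of them.
toy_counts = [0, 3, 0, 2, 5, 0]
toy_angles = [90.0, 100.0, 92.0, 98.0, 102.0, 94.0]
print(compare_active_vs_inactive(toy_counts, toy_angles))  # (100.0, 92.0, 8.0)
```

      A difference of this kind describes an association between recruitment and kinematics; it does not by itself predict the angle on a given stride, which is why ‘correlates’ is the more accurate word.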

      Methods:

      (7) "Using the four threads on the customizable Myomatrix array (RF-4x8-BHS-5), we implanted a combination of muscles in each mouse, sometimes using multiple threads within the same muscle. [...] Some mice also had threads simultaneously implanted in their ipsilateral or contralateral biceps brachii although no data from the biceps is presented in this study."

      A precise description of the localisation of the array (muscles and the number of arrays per muscle) for each animal would be appreciated.

      (8) "A total of 33 units were identified and manually verified across all animals." A precise description of the number of motor units concurrently identified per muscle and per animal would be appreciated. Moreover, please add details on the manual inspection. Does it involve the manual selection of missing spikes? What are the criteria for considering an identified motor unit as valid?

      As discussed earlier, we added Table 1 to the main text to provide the details mentioned in the above comments.

      Regarding spike sorting, given the very large number of spikes recorded, we did not rely on manually adjusting mislabeled spikes. Instead, as described in the revised Methods section, we verified unit isolation by ensuring that, for each unit, >98% of spikes were separated from one another by more than 1 ms. Moreover, as described above, we have added new analyses (Figure 1–figure supplement 1) confirming the stability of motor unit waveforms across both the duration of individual recording sessions (roughly 30 minutes) and the rapid changes in limb position within individual stride cycles (roughly 250 ms).
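
      One way to read the isolation criterion above is as a check on inter-spike intervals. The sketch below is a hedged illustration under that reading, not the authors' pipeline: the function names and the toy spike times are hypothetical, and the 1 ms and 98% thresholds are taken from the response text.

```python
def fraction_isi_above(spike_times_ms, min_isi_ms=1.0):
    """Fraction of inter-spike intervals longer than min_isi_ms."""
    times = sorted(spike_times_ms)
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    if not isis:
        raise ValueError("need at least two spikes to compute an interval")
    return sum(isi > min_isi_ms for isi in isis) / len(isis)


def passes_isolation_check(spike_times_ms, min_isi_ms=1.0, min_fraction=0.98):
    """True if more than min_fraction of intervals exceed min_isi_ms."""
    return fraction_isi_above(spike_times_ms, min_isi_ms) > min_fraction


# Toy spike trains (times in ms): the second contains one 0.5 ms interval.
clean_unit = [0.0, 10.0, 20.0, 30.0]
suspect_unit = [0.0, 10.0, 20.0, 20.5, 30.0]
print(passes_isolation_check(clean_unit))    # True
print(passes_isolation_check(suspect_unit))  # False
```

      Intervals shorter than the refractory period usually indicate that spikes from more than one unit have been merged, so a low violation rate is a common sanity check on sorting quality.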

      Reviewer #3 (Recommendations for the authors):

      Figure 2 (and supplement) show spike count distributions with strong positive skewness, which is in accordance with the prediction of a fluctuation-driven regime. I suggest plotting these on a logarithmic x-axis (in addition to the linear axis), which should reveal a bell-shaped distribution, maybe even Gaussian, in a majority of the units.

      We thank the Reviewer for the suggestion. We present the requested analysis below, which shows bell-shaped distributions for some (but not all) distributions. However, we believe that investigating why some replotted distributions are Gaussian and others are not falls beyond the scope of this paper, and likely requires a larger dataset than the one we were able to obtain.

      Author response image 3.

      Spike count distributions for each motor unit on a logarithmic x-axis.
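
      The effect of the logarithmic axis can also be summarized numerically: a positively skewed distribution that is roughly lognormal should have skewness near zero after a log transform. The sketch below illustrates this with toy counts; the function names and data are hypothetical, and dropping zero-count strides before taking logarithms is an assumption of the example rather than a step described in the response.

```python
import math
from statistics import mean


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


def log_skewness(counts):
    """Skewness of log10(counts), ignoring zero counts (log is undefined there)."""
    positive = [c for c in counts if c > 0]
    return skewness([math.log10(c) for c in positive])


# Toy counts that are symmetric on a log axis but right-skewed on a linear one.
toy_spike_counts = [1, 10, 100]
print(skewness(toy_spike_counts) > 0)            # True
print(abs(log_skewness(toy_spike_counts)) < 1e-12)  # True
```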

      Why not more data? I tried to get an overview of how much data was collected.

      Supplemental Figure 1 has all the isolated units, which amounts to 38 (are the colors the two muscle types?). Given there are 16 leads in each myomatrix, in two muscles, of six mice, this seems like a low yield. Could the authors comment on the reasons for this low yield?

      Regarding motor unit yield, even with multiple electrodes per muscle and a robust sorting algorithm, we often isolated only a few units per muscle. This yield likely reflects two factors. First, because of the highly dynamic nature of locomotion and high levels of muscle contraction, isolating individual spikes reliably across different locomotor speeds is inherently challenging, regardless of the algorithm being employed. Second, because the results of spike-train analyses can be highly sensitive to sorting errors, we have only included the motor units that we can sort with the highest possible confidence across thousands of strides.

      Minor:

      Figure captions especially Figure 6: The text is excessively long. Can the text be shortened?

      We thank the Reviewer for this comment. Generally, we seek to include a description of the methods and results within the figure captions, but we concede that we can condense the information in some cases. In a number of cases, we have moved some of the descriptive text from the caption to the Methods section.

      References

      Berg, R. W. (2017). Neuronal Population Activity in Spinal Motor Circuits: Greater Than the Sum of Its Parts. Frontiers in Neural Circuits, 11. https://doi.org/10.3389/fncir.2017.00103

      Biewener, A. A., Blickhan, R., Perry, A. K., Heglund, N. C., & Taylor, C. R. (1988). Muscle Forces During Locomotion in Kangaroo Rats: Force Platform and Tendon Buckle Measurements Compared. Journal of Experimental Biology, 137(1), 191–205. https://doi.org/10.1242/jeb.137.1.191

      Chung, B., Zia, M., Thomas, K. A., Michaels, J. A., Jacob, A., Pack, A., Williams, M. J., Nagapudi, K., Teng, L. H., Arrambide, E., Ouellette, L., Oey, N., Gibbs, R., Anschutz, P., Lu, J., Wu, Y., Kashefi, M., Oya, T., Kersten, R., … Sober, S. J. (2023). Myomatrix arrays for high-definition muscle recording. eLife, 12, RP88551. https://doi.org/10.7554/eLife.88551

      De Luca, C. J. (1985). Control properties of motor units. Journal of Experimental Biology, 115(1), 125–136. https://doi.org/10.1242/jeb.115.1.125

      De Luca, C. J., & Erim, Z. (1994). Common drive of motor units in regulation of muscle force. Trends in Neurosciences, 17(7), 299–305. https://doi.org/10.1016/0166-2236(94)90064-7

      Farina, D., Negro, F., & Dideriksen, J. L. (2014). The effective neural drive to muscles is the common synaptic input to motor neurons. The Journal of Physiology, 592(16), 3427–3441. https://doi.org/10.1113/jphysiol.2014.273581

      Hartigan, P. M. (1985). Algorithm AS 217: Computation of the Dip Statistic to Test for Unimodality. Applied Statistics, 34(3), 320. https://doi.org/10.2307/2347485

      Henneman, E., Somjen, G., & Carpenter, D. O. (1965). FUNCTIONAL SIGNIFICANCE OF CELL SIZE IN SPINAL MOTONEURONS. Journal of Neurophysiology, 28(3), 560–580. https://doi.org/10.1152/jn.1965.28.3.560

      Karabulut, D., Dogru, S. C., Lin, Y.-C., Pandy, M. G., Herzog, W., & Arslan, Y. Z. (2020). Direct Validation of Model-Predicted Muscle Forces in the Cat Hindlimb During Locomotion. Journal of Biomechanical Engineering, 142(5), 051014. https://doi.org/10.1115/1.4045660

      Kim, J. J., Wyche, I. S., Olson, W., Lu, J., Bakir, M. S., Sober, S. J., & O’Connor, D. H. (2024). Myo-optogenetics: Optogenetic stimulation and electrical recording in skeletal muscles. https://doi.org/10.1101/2024.06.21.600113

      Lu, J., Zia, M., Baig, D. A., Yan, G., Kim, J. J., Nagapudi, K., Anschutz, P., Oh, S., O’Connor, D., Sober, S. J., & Bakir, M. S. (2024). Opto-Myomatrix: μLED integrated microelectrode arrays for optogenetic activation and electrical recording in muscle tissue. https://doi.org/10.1101/2024.07.01.601601

      Manuel, M., & Heckman, C. J. (2011). Adult mouse motor units develop almost all of their force in the subprimary range: A new all-or-none strategy for force recruitment? Journal of Neuroscience, 31(42), 15188–15194. https://doi.org/10.1523/JNEUROSCI.2893-11.2011

      Marshall, N. J., Glaser, J. I., Trautmann, E. M., Amematsro, E. A., Perkins, S. M., Shadlen, M. N., Abbott, L. F., Cunningham, J. P., & Churchland, M. M. (2022). Flexible neural control of motor units. Nature Neuroscience, 25(11), 1492–1504. https://doi.org/10.1038/s41593-022-01165-8

      Martínez-Silva, M. de L., Imhoff-Manuel, R. D., Sharma, A., Heckman, C. J., Shneider, N. A., Roselli, F., Zytnicki, D., & Manuel, M. (2018). Hypoexcitability precedes denervation in the large fast-contracting motor units in two unrelated mouse models of ALS. eLife, 7(2007), 1–26. https://doi.org/10.7554/eLife.30955

      Miles, G. B., & Sillar, K. T. (2011). Neuromodulation of Vertebrate Locomotor Control Networks. Physiology, 26(6), 393–411. https://doi.org/10.1152/physiol.00013.2011

      Petersen, P. C., & Berg, R. W. (2016). Lognormal firing rate distribution reveals prominent fluctuation–driven regime in spinal motor networks. eLife, 5. https://doi.org/10.7554/elife.18805

      Srivastava, K. H., Elemans, C. P. H., & Sober, S. J. (2015). Multifunctional and Context-Dependent Control of Vocal Acoustics by Individual Muscles. The Journal of Neuroscience, 35(42), 14183–14194. https://doi.org/10.1523/JNEUROSCI.3610-14.2015

      Srivastava, K. H., Holmes, C. M., Vellema, M., Pack, A. R., Elemans, C. P. H., Nemenman, I., & Sober, S. J. (2017). Motor control by precisely timed spike patterns. Proceedings of the National Academy of Sciences of the United States of America, 114(5), 1171–1176. https://doi.org/10.1073/pnas.1611734114

    1. eLife Assessment

      This important model-based study seeks to mimic bat echolocation behavior and flight under conditions of high interference, such as when large numbers of bats leave their roost together. The simulations convincingly suggest that the problem of acoustic jamming in these situations may be less severe than previously thought. This finding will be of broad interest to scientists working in the fields of bat biology and collective behaviour.

    2. Reviewer #1 (Public review):

      Summary:

      Mazar & Yovel 2025 dissect the inverse problem of how echolocators in groups manage to navigate their surroundings despite intense jamming using computational simulations.

      The authors show that despite the 'noisy' sensory environments that echolocating groups present, agents can still access some amount of echo-related information and use it to navigate their local environment. It is known that echolocating bats have strong small and large-scale spatial memory that plays an important role for individuals. The results from this paper also point to the potential importance of an even lower-level, short-term role of memory in the form of echo 'integration' across multiple calls, despite the unpredictability of echo detection in groups. The paper generates a useful basis to think about the mechanisms in echolocating groups for experimental investigations too.

      Strengths:

      The paper builds on a biologically well-motivated and parametrised 2D acoustics and sensory simulation setup to investigate the various key parameters of interest.

      The 'null-model' of echolocators not being able to tell apart objects & conspecifics while echolocating still shows agents successfully emerge from groups - even though the probability of emergence drops severely in comparison to cognitively more 'capable' agents. This is nonetheless an important result showing the direction-of-arrival of a sound itself is the 'minimum' set of ingredients needed for echolocators navigating their environment.

      The results generate an important basis in unraveling how agents may navigate in sensorially noisy environments with a lot of irrelevant and very few relevant cues.

      The 2D simulation framework is simple and computationally tractable enough to perform multiple runs to investigate many variables - while also remaining true to the aim of the investigation.

    3. Reviewer #2 (Public review):

      This manuscript describes a detailed model for bats flying together through a fixed geometry. The model considers elements which are faithful to both bat biosonar production and reception and the acoustics governing how sound moves in air and interacts with obstacles. The model also incorporates behavioral patterns observed in bats, like one-dimensional feature following and temporal integration of cognitive maps. From a simulation study of the model and comparison of the results with the literature, the authors gain insight into how often bats may experience destructive interference of their acoustic signals and those of their peers, and how much such interference may actually negatively affect the groups' ability to navigate effectively. The authors use generalized linear models to test the significance of the effects they observe.

      The work relies on a thoughtful and detailed model which faithfully incorporates salient features, such as acoustic elements like the filter for a biological receiver and temporal aggregation as a kind of memory in the system. At the same time, the authors abstract features that are complicating without being expected to give additional insights, as can be seen in the choice of a two-dimensional rather than three-dimensional system. I thought that the level of abstraction in the model was perfect, enough to demonstrate their results without needless details. The results are compelling and interesting, and the authors do a great job discussing them in the context of the biological literature.

      With respect to the first version of the manuscript, the authors have remedied all my outstanding questions or concerns in the current version. The new supplementary figure 5 is especially helpful in understanding the geometry.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mazar & Yovel 2025 dissect the inverse problem of how echolocators in groups manage to navigate their surroundings despite intense jamming using computational simulations.

      The authors show that despite the 'noisy' sensory environments that echolocating groups present, agents can still access some amount of echo-related information and use it to navigate their local environment. It is known that echolocating bats have strong small and large-scale spatial memory that plays an important role for individuals. The results from this paper also point to the potential importance of an even lower-level, short-term role of memory in the form of echo 'integration' across multiple calls, despite the unpredictability of echo detection in groups. The paper generates a useful basis to think about the mechanisms in echolocating groups for experimental investigations too.

      Strengths:

      The paper builds on a biologically well-motivated and parametrised 2D acoustics and sensory simulation setup to investigate the various key parameters of interest.

      The 'null-model' of echolocators not being able to tell apart objects & conspecifics while echolocating still shows agents successfully emerge from groups - even though the probability of emergence drops severely in comparison to cognitively more 'capable' agents. This is nonetheless an important result showing the direction-of-arrival of a sound itself is the 'minimum' set of ingredients needed for echolocators navigating their environment.

      The results generate an important basis in unraveling how agents may navigate in sensorially noisy environments with a lot of irrelevant and very few relevant cues.

      The 2D simulation framework is simple and computationally tractable enough to perform multiple runs to investigate many variables - while also remaining true to the aim of the investigation.

      Weaknesses:

      Authors have not yet provided convincing justification for the use of different echolocation phases during emergence and in cave behaviour. In the previous modelling paper cited for the details - here the bat-agents are performing a foraging task, and so the switch in echolocation phases is understandable. While flying with conspecifics, the lab's previous paper has shown what they call a 'clutter response' - but this is not necessarily the same as going into a 'buzz'-type call behaviour. As pointed out by another reviewer - the results of the simulations may hinge on the fact that bats are showing this echolocation phase-switching, and thus improving their echo-detection. This is not necessarily a major flaw - but something for readers to consider in light of the sparse experimental evidence at hand currently.

      The use of echolocation phases—defined as the sequential search, approach, and buzz call patterns—has been documented not only during foraging but also in tasks such as landing, obstacle avoidance, clutter navigation, and drinking. Bat call structure has been shown to vary systematically with object proximity, not exclusively in response to prey. During obstacle avoidance, phase transitions were observed, with approach calls emitted in grouped sequences and with reduced durations (Gustafson & Schnitzler, 1979; Schnitzler et al., 1987). In landing contexts, bats have been reported to emit short-duration calls and decrease inter-pulse intervals—buzz-like patterns also observed during prey capture— suggesting shared acoustic strategies across behaviors (Hagino et al., 2007; Hiryu et al., 2008; Melcón et al., 2007, 2009). Comparable patterns have been reported during drinking maneuvers, where “drinking buzzes” have been proposed to guide a precise approach to the water surface, analogous to landing buzzes (Griffiths, 2013; Russo et al., 2016). In response to environmental complexity, bats were found to shorten calls and increase repetition rates when navigating cluttered spaces compared to open ones (Falk et al., 2014; Kalko & Schnitzler, 1993).

      Moreover, field recordings from our study of Rhinopoma microphyllum (Goldshtein et al., 2025) revealed shortened call durations and inter-pulse intervals during dense group flight outside the cave during emergence. These patterns are consistent with the terminal-approach phase that is typical when a bat comes very close to an object (another bat in this case). Author response image 1 shows an approach sequence recorded from a tagged bat approximately 20 meters from the cave entrance, with self-generated echolocation calls marked. These bats use an inter-pulse interval of ca. 20 ms when a reflective object (another bat in this case) is nearby.

      Author response image 1.

      These results provide direct evidence that bats actively employ approach-phase echolocation during swarming likely to avoid collision with other bats. This supports the view that echolocation phase transitions are a general proximity-based sensing strategy, adapted across a variety of behavioral scenarios—not limited to hunting alone. 

      In our simulations, bats predominantly emitted calls in the approach phase, with only rare occurrences of buzz-phase calls.

      See lines 355-363 in the revised manuscript.

      The decision to model direction-of-arrival with such high angular resolution (1-2 degrees) is not entirely justifiable - and the authors may wish to do simulation runs with lower angular resolution. Past experimental paradigms haven't really separated out target-strength as a confounding factor for angular resolution (e.g. see the cited Simmons et al. 1983 paper). Moreover, to this reviewer's reading of the cited paper - it is not entirely clear how this experiment provides source-data to support the DoA-SNR parametrisation in this manuscript. The cited paper has two array-configurations, both of which are measured to have similar received levels upon ensonification. A relationship between angular resolution and signal-to-noise ratio is understandable perhaps - and one can formulate such a relationship, but here the reviewer asks that the origin/justification be made clear. On an independent line, also see the recent contrasting results of Geberl, Kugler, Wiegrebe 2019 (Curr. Biol.) - who suggest even poorer angular resolution in echolocation.

      We thank the reviewer for raising this important point. The acuity of 1.5–3° in horizontal direction-of-arrival (DoA) estimation is based on the classical work of Simmons et al. with Eptesicus fuscus (Simmons et al., 1983). Similar precision was later supported by Erwin et al. (Erwin et al., 2001), who modeled azimuth estimation from measured interaural intensity differences (IIDs), reporting an average error of 0.2° with a standard deviation of ~2.2°, consistent with the behavioral data found by Simmons. The decline in acuity with increasing arrival angle has also been demonstrated in behavioral and physiological studies of binaural IID processing (Erwin et al., 2001; Fay, 1995; Razak, 2012; Wohlgemuth et al., 2016). The error model itself was first introduced in our earlier work (Mazar & Yovel, 2020).

      Importantly, Geberl et al. (Geberl et al., 2019) examined the resolution of weak targets masked by nearby strong flankers and found poor spatial discrimination of ~45 degrees; however, they were studying a detection problem, rather than the horizontal acuity of azimuth estimation. Indeed, our model assumes there is no spatial discrimination at all.

      Overall, while our DoA–SNR parametrization can certainly be critiqued and alternative parameterizations could be tested in future work, we believe it reflects a reasonable and empirically supported assumption. 

      Reviewer #2 (Public review):

      This manuscript describes a detailed model for bats flying together through a fixed geometry. The model considers elements which are faithful to both bat biosonar production and reception and the acoustics governing how sound moves in air and interacts with obstacles. The model also incorporates behavioral patterns observed in bats, like one-dimensional feature following and temporal integration of cognitive maps. From a simulation study of the model and comparison of the results with the literature, the authors gain insight into how often bats may experience destructive interference of their acoustic signals and those of their peers, and how much such interference may actually negatively affect the groups' ability to navigate effectively. The authors use generalized linear models to test the significance of the effects they observe.

      The work relies on a thoughtful and detailed model which faithfully incorporates salient features, such as acoustic elements like the filter for a biological receiver and temporal aggregation as a kind of memory in the system. At the same time, the authors abstract features that are complicating without being expected to give additional insights, as can be seen in the choice of a two-dimensional rather than three-dimensional system. I thought that the level of abstraction in the model was perfect, enough to demonstrate their results without needless details. The results are compelling and interesting, and the authors do a great job discussing them in the context of the biological literature.

      With respect to the first version of the manuscript, the authors have remedied all my outstanding questions or concerns in the current version. The new supplementary figure 5 is especially helpful in understanding the geometry.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Data Availability: This reviewer lauds the authors for switching from a private commercial folder requiring login to one that does not. At the cost of being overly pedantic - the Github repository is not a long-term archival resource. The ideal solution is to upload the code in an academic repository (Zenodo, OSF, etc.) to periodically create a 'static snapshot' of code for archival, while also hosting a 'live' version on Github.

      We have uploaded the code to a Zenodo repository and updated the link in the paper:

      How bats exit a crowded colony when relying on echolocation only - a modeling approach

      In one of the rebuttals to Reviewer #3- the authors have cited a wrong paper (Beleyur & Goerlitz 2019) - while discussing broad bandwidth calls improving detection - and may wish to correct this if possible on record.

      We have removed the incorrect citation from the revised version of the manuscript.

      Specific comments on the 2nd manuscript:

      Figure 5: Table 1 says 1, 2,5,10,20,40,100 bats were simulated (line 138-139) but the conclusion (line 398) says '1 to 100 bats' per 3msq. However, the X-axis only stops at 40 and says 'number of bats', while the legend says bats/3msq....what is actually being plotted? Moreover, in the entire paper there is a constant back-and-forth between density and # of bats - perhaps it is explained beforehand, but it is a bit unsettling - and more can be done to clarify these two conventions.

      While most parameters were tested across the full range of 1 to 100 bats per 3 m², a subset of conditions—including misidentification, multi-call clustering, wall target strength, and conspecific target strength—were simulated only up to 40 bats due to significantly longer run-times. This is now clarified in both the main text and the Table 1 caption.

      In our simulations, the primary parameter was the number of bats placed within a 3 m² starting area, which directly determined the initial density (bats per 3 m²). Throughout the manuscript, we use “number of bats” to refer to the simulation input, while “density” denotes the equivalent ecological measure. Figure 5 and related captions have been revised accordingly to note these conventions and to indicate when results are shown only up to 40 bats (see lines 120–122, 314-317 in the revised text).
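
      The two conventions are related by simple arithmetic, sketched below. The 3 m² starting area and the list of bat counts come from the text above; the function name and the per-square-metre conversion are illustrative assumptions rather than quantities reported in the manuscript.

```python
START_AREA_M2 = 3.0                      # starting area stated in the response
BAT_COUNTS = [1, 2, 5, 10, 20, 40, 100]  # simulated group sizes listed for Table 1


def density_per_m2(n_bats, area_m2=START_AREA_M2):
    """Convert a number of bats in the starting area to bats per square metre."""
    if area_m2 <= 0:
        raise ValueError("area must be positive")
    return n_bats / area_m2


# Each simulation input (number of bats) maps to exactly one initial density.
for n in BAT_COUNTS:
    print(f"{n:>3} bats in 3 m^2 -> {density_per_m2(n):.2f} bats per m^2")
```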

      Table 1: This was made considerably difficult to read given the visual clutter - and I hope I've understood these changes correctly.

      What is in the square brackets of the effect-size (e.g. first row with values 'Exit prob. (%)' says -0.37/bat [63:100] ? What does this 63:100 refer to?

      What is the 'process flag'

      Values in square brackets indicate the minimum and maximum values of the metric across the tested range (e.g., [63:100] shows the range of exit probabilities observed across different bat densities).

      The term “process flag” has been replaced with “with and without multi-call clustering” for clarity.

      Both the table layout and caption have been revised to reduce visual clutter and to make these conventions clearer to the reader. 
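
      For readers parsing a cell such as "-0.37/bat [63:100]", the sketch below shows how a per-bat slope and a [min:max] range could be derived from a series of simulation results. It is only an illustration: reading "/bat" as a fitted slope is an assumption, the ordinary least-squares fit stands in for the generalized linear models used in the manuscript, and the function names and toy numbers are hypothetical.

```python
def summarize_effect(n_bats, metric):
    """Return (slope per bat, minimum, maximum) for a metric across group sizes."""
    n = len(n_bats)
    if n != len(metric) or n < 2:
        raise ValueError("need matching inputs with at least two points")
    mean_x = sum(n_bats) / n
    mean_y = sum(metric) / n
    sxx = sum((x - mean_x) ** 2 for x in n_bats)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(n_bats, metric))
    if sxx == 0:
        raise ValueError("group sizes must not all be equal")
    return sxy / sxx, min(metric), max(metric)


def format_cell(slope, low, high):
    """Format the summary in the table's 'slope/bat [min:max]' style."""
    return f"{slope:.2f}/bat [{low:g}:{high:g}]"


# Toy data: exit probability (%) falling by 10 points per additional bat.
toy_bats = [1, 2, 3]
toy_exit_prob = [100.0, 90.0, 80.0]
print(format_cell(*summarize_effect(toy_bats, toy_exit_prob)))  # -10.00/bat [80:100]
```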

      Lines 562-3: "In our study, due to the dense cave environment, the bats are found to operate in the approach phase nearly all of the time, which is consistent with natural cave emergence behavior" - bats are 'found to' implies there is some experimental data or it is an emergent property. See above for the point questioning the implementation of multiple echolocation phases in the model, but also - here the bat-agents are allowed to show different phases and thus they do so -- it is a constraint of the implementation and not a result per se given the size of the cave and the number of bats involved...

      We removed the sentence from the Methods section, since it could be misinterpreted as an experimental finding rather than a model outcome. Instead, we now discuss this in the Discussion, clarifying that the predominance of the approach phase arises from the cluttered cave environment in our simulations, which is consistent with natural emergence behavior (see lines 355-363). In this context, the use of echolocation phases is presented as a biologically plausible modeling choice rather than an empirical result.

      Lines 659-660: The parametrisation between DoA and SNR is supposedly found in 'Equation 10' - which this reviewer could not find in the manuscript

      The equation was accidentally omitted in the previous revision and has now been reinserted into the manuscript. It defines how direction-of-arrival (DoA) error depends on SNR and azimuth angle (see lines 603-605).

    1. eLife Assessment

      This important mouse study shows that wild-type female progeny of Khdc3 mutants have abnormal gene expression relating to hepatic metabolism, which persists over multiple generations and passes through both female and male lineages. A role for small RNAs on this phenomenon is proposed, and evidence supporting the authors' claims is convincing. Further experiments are required to functionally validate the role of small RNAs in transmission of the phenotype. The work will be of interest to researchers in the field of DNA-independent mechanism of inheritance.

    2. Reviewer #1 (Public review):

      The key discovery of the manuscript is that genetically wild type females descended from Khdc3 mutants show abnormal gene expression relating to hepatic metabolism, which persists over multiple generations and passes through both female and male lineages. The authors also find dysregulation of hepatically-metabolized molecules in the blood of these wild type mice with Khdc3 mutant ancestry. These data provide solid evidence further supporting that phenotypes can be transmitted across multiple generations without altering DNA sequence, supporting the involvement of epigenetic mechanisms. The authors further performed exploratory studies on the small RNA profiles in the oocytes of Khdc3-null females and their wild type descendants, suggesting that altered small RNA expression could contribute to the observed phenotype transmission, although this has not been functionally validated.

      Comments on revisions:

      My previous comments are addressed.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript aimed to investigate the non-genetic impact of KHDC3 mutation on liver metabolism. To do that, the authors analyzed the female liver transcriptome of genetically wild type mice descended from female ancestors with a mutation in the Khdc3 gene. They found that genetically wild type females descended from Khdc3 mutants have hepatic transcriptional dysregulation which persists over multiple generations in the progeny descended from female ancestors with a mutation in the Khdc3 gene. This transcriptomic deregulation was associated with dysregulation of hepatically-metabolized molecules in the blood of these wild type mice with female mutational ancestry. Furthermore, to determine whether small non-coding RNA could be involved in the maternal non-genetic transmission of the hepatic transcriptomic deregulation, they performed small RNA-seq of oocytes from Khdc3-/- mice and genetically wild type female mice descended from female ancestors with a Khdc3 mutation and claimed that oocytes of wild type female offspring from Khdc3-null females have dysregulation of multiple small RNAs.

      Finally, they claimed that their data demonstrates that ancestral mutation in Khdc3 can produce transgenerational inherited phenotypes.

      Comments on revisions:

      I thank the authors for their detailed response to my comments. I have nothing to add.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The key discovery of the manuscript is that genetically wild type females descended from Khdc3 mutants show abnormal gene expression relating to hepatic metabolism, which persists over multiple generations and passes through both female and male lineages. The authors also find dysregulation of hepatically-metabolized molecules in the blood of these wild type mice with Khdc3 mutant ancestry. These data provide solid evidence further supporting that phenotypes can be transmitted across multiple generations without altering DNA sequence, supporting the involvement of epigenetic mechanisms. The authors further performed exploratory studies on the small RNA profiles in the oocytes of Khdc3-null females and their wild type descendants, suggesting that altered small RNA expression could contribute to the observed phenotype transmission, although this has not been functionally validated.

      Reviewer #2 (Public review):

      Summary:

      This manuscript aimed to investigate the non-genetic impact of KHDC3 mutation on liver metabolism. To do so, the authors analyzed the female liver transcriptome of genetically wild type mice descended from female ancestors with a mutation in the Khdc3 gene. They found that genetically wild type females descended from Khdc3 mutants have hepatic transcriptional dysregulation that persists over multiple generations in progeny descended from female ancestors with a mutation in the Khdc3 gene. This transcriptomic dysregulation was associated with dysregulation of hepatically-metabolized molecules in the blood of these wild type mice with female mutational ancestry. Furthermore, to determine whether small non-coding RNAs could be involved in the maternal non-genetic transmission of the hepatic transcriptomic dysregulation, they performed small RNA-seq of oocytes from Khdc3-/- mice and from genetically wild type female mice descended from female ancestors with a Khdc3 mutation, and claimed that oocytes of wild type female offspring from Khdc3-null females have dysregulation of multiple small RNAs.

      Finally, they claimed that their data demonstrate that ancestral mutation in Khdc3 can produce transgenerational inherited phenotypes.

      However, at this stage and considering the information provided in the paper, I think that these conclusions are too preliminary. Indeed, several controls/experiments need to be added to reach those conclusions.

      Additional context you think would help readers interpret or understand the significance of the work

      Line 25: this first sentence is very strong and needs to be documented in the introduction.

      Line 48: Reference 5 is not appropriate, since that paper shows the remodeling of small RNAs during post-testicular maturation of mammalian sperm and their sensitivity to the environment. Please change it.

      Line 51: "implies" is too strong and should be replaced by "suggests".

      Line 67: reference is missing

      Database, the accession numbers are lacking.

      References showing the maternal transmission of non-genetically inherited phenotypes in mice via small RNA need to be added

      Line 378: All RNA-Seq and small RNA-Seq data are available in the NCBI GEO

      We have changed references as requested, and updated portions of the introduction to specifically mention genes that seem to regulate an RNA-based genetic nurture effect. We are not aware of any published work that has demonstrated maternal transmission of non-genetic phenotypes via small RNAs; if the reviewer has a specific reference in mind, we would be happy to read it and add it to our manuscript. We did add a few sentences describing why this work has primarily been performed in males/fathers.

      Reviewer #1 (Recommendations for the authors):

      (1) In addition to the altered hepatic gene expression and metabolites, did the authors notice any overall phenotypes? including body weight, overall growth, eating behavior, etc?

      We have added information on more general phenotypes of the mice, including litter size, birth weights, and weights at 3 and 8 weeks of age.  We have also performed a metabolic analysis of WT****** mice at 8 months of age.  Overall, there are no striking differences in the WT* mice in these broad phenotypic measures, and also no indication that a smaller litter size or larger birthweight are the drivers of our observed hepatic abnormalities.

      (2) When analyzing the small RNAs, the authors mentioned that they mapped the reads against rRNAs. This should have resulted in the identification of many rRNA-derived small RNAs (rsRNAs). The authors should also analyze the differential expression of rsRNAs in this context. Both tsRNAs and rsRNAs have been shown to be involved in epigenetic inheritance (at least in sperm) (Nat Cell Biol 2018, PMID: 29695786).

      In the oocyte small RNA data, we did not notice many differences in either piRNAs or rRNAs between either the WT and KO oocytes, or the WT and WT** oocytes.  The most significant differences by far were in miRNA and tsRNA.  We have added that we do not see any differences in rRNAs.

      Reviewer #2 (Recommendations for the authors):

      To support your conclusion, you should include the following Data/experiments:

      (1) In the abstract, you wrote "Our results demonstrate that ancestral mutation in Khdc3 can produce transgenerational inherited phenotypes". The full phenotypic description (weight at birth, at 3 weeks, and at 8 weeks old, phenotype of the liver...) of each progeny should be carefully described and analyzed.

      Female KHDC3-deficient mice showed reduced fertility with smaller litters. Given that litter size influences early growth and adult physiology (DOI: 10.1016/j.cmet.2020.07.014), all the metabolic effects observed in the paper could be the result of litter size. Information about litter size should be provided. Without this information, it is difficult to evaluate the non-genetic impact of KHDC3 mutation on the metabolism of the progeny.

      We have added information on more general phenotypes of the mice, including litter size, birth weights, and weights at 3 and 8 weeks of age (Figure 3). We have also performed a metabolic analysis of WT****** mice at 8 months of age.  Overall, there were no striking differences in the WT* mice in these broad phenotypic measures, and also no indication that a smaller litter size or larger birthweight are the drivers of our observed hepatic abnormalities.

      We have also added a new figure in order to examine the mechanism of transmission of our observed transcriptional abnormalities (Figure 5).  By transferring serum from WT* mice into wild type recipients, we observe alterations to hepatic gene expression, suggesting that serum-based molecules are driving the altered non-genetic factors in the oocyte.  This lends further support to the conclusion that the observed changes in WT* mice are from inherited germ cell abnormalities (informed by somatic metabolic abnormalities and communicated via blood), and not a consequence of litter sizes or growth rates.

      (2) In addition to the lack of phenotypic information of the progenies, the DEG for the small RNA-seq should be filtered on padj(FDR)<0.05 and not on pvalue<0.05. In Figure 4a, the legend is missing.

      We did not alter the filtering on the small RNA-Seq data.  We are not focusing on any specific small RNA, rather we are stating that these groups (miRNA, tsRNA) of small RNAs are dysregulated; accordingly we believe that using pval is not inappropriate in this circumstance.  The analysis was performed similarly to 4 cell embryo RNA-Seq performed by Harris et al, Cell Reports (PMID 38573852).
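
      To make the distinction between filtering on a raw p-value and on an FDR-adjusted p-value concrete, the following is a minimal sketch of the Benjamini-Hochberg adjustment. The p-values are invented for illustration only and are not taken from the manuscript; the function name is likewise hypothetical and does not reflect the authors' pipeline.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative (made-up) p-values for four hypothetical small RNAs.
raw_p = [0.001, 0.04, 0.03, 0.5]
padj = benjamini_hochberg(raw_p)

hits_raw = sum(p < 0.05 for p in raw_p)   # features passing pvalue < 0.05
hits_fdr = sum(p < 0.05 for p in padj)    # features passing padj < 0.05
print(hits_raw, hits_fdr)
```

      In this toy case three features pass the raw threshold but only one survives the adjustment, which is the kind of gap the reviewer's request is concerned with.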

    1. eLife Assessment

      The study presents data on the possible connection of respiratory pathologies like pneumonia in a cohort of dolphins with altered composition and concomitant perturbed biophysical properties of pulmonary surfactant complexes. Overall, it is a valuable contribution that could be of interest to scientists in the field. However, the study as it is appears somewhat incomplete and additional clarification and discussions are required in order to explain a few methodological questions that may limit the impact of the work considerably.

    2. Reviewer #1 (Public review):

      Summary:

      This paper describes a number of alterations in pulmonary surfactant recovered from bottlenose dolphins. Although the sample consists of only seven diseased and two control animals, this is considered adequate given the difficulty in obtaining these animals. However, conclusions must be considered in view of this small sample size. The authors employ a number of sophisticated techniques to show differences in the composition and in the structure of bilayers formed by these two surfactant samples.

      Strengths:

      The availability of these samples makes this study quite original. The authors apply mass spectrometry to observe an increase of an acidic phospholipid and in the level of plasmalogens in the diseased (i.e. pneumonia) aquatic animals. They suggest these increases contribute to hampered function in vivo. They show alterations in lipid bilayers formed from lipid extracts of these surfactants by electron microscopy, by atomic force microscopy, and by small- and wide-angle X-ray scattering (SAXS/WAXS). They have previously shown that adding small amounts of cardiolipin to the clinical surfactant BLES results in altered bilayer structure, consistent with the current study.

      Weaknesses:

      It seems surprising to me that small changes in cardiolipin can alter surfactant function, i.e., the reduction of surface tension to near zero. As it happens, no surfactant function tests monitoring the reduction in surface tension were conducted. These would add a great deal to the paper. Further, the paper would benefit greatly from the inclusion of a table listing the lipid composition of surfactant recovered from diseased and normal animals and comparing this to the composition of BLES, a clinical surfactant. Finally, there is a possibility that the minor lipid identified by mass spec is the lysosomal marker, bis(monoacylglycero)phosphate, rather than the mitochondrial marker, cardiolipin.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Porras-Gómez et al. analyse the lipid composition and biophysical properties of pulmonary surfactant obtained by bronchoalveolar lavage (BAL) from a group of bottlenose dolphins (Tursiops truncatus), including two healthy individuals and five affected by pneumonia. Through lipidomic analysis, the authors report an exacerbated presence of cardiolipin species in the BAL lipid extracts from diseased dolphins compared to healthy ones. Structural analyses using electron microscopy, atomic force microscopy, and X-ray scattering on rehydrated membrane samples reveal that lipids from diseased animals form membranes with a more pronounced Lβ phase and reduced fluidity. Moreover, the membranes from affected lungs appear more interconnected and less hydrated, as indicated by the X-ray scattering data. These findings provide valuable and convincing insights into how pulmonary disease alters the lipid composition and structural properties of surfactant in diving mammals, and may have broader implications for understanding surfactant dysfunction in marine mammals.

      Strengths:

      The study is well designed, and the experimental techniques were applied in a logical and coherent manner. The results are thoroughly analysed and discussed, and the manuscript is clearly written and well organized, making it both easy to follow and scientifically robust. Although the number of samples is limited, the rarity and logistical challenges of obtaining bronchoalveolar lavage material, particularly from animals affected by respiratory disease, make this study especially valuable and relevant.

      Weaknesses:

      In my opinion, the main issue lies in the treatment of the samples. Pulmonary surfactant is a lipoprotein complex produced by type II pneumocytes of the alveolar epithelium in the form of compact and highly dehydrated structures known as tubular myelin. Once secreted, these structures unfold and, upon contact with the air-liquid interface, form an interfacial monolayer connected to surfactant membranes in the subphase, thereby facilitating respiratory dynamics throughout the breathing cycle.

      When bronchoalveolar lavages are treated using the Bligh and Dyer method to extract the hydrophobic fraction of these samples, the structural complexity of the surfactant is disrupted, and this organization cannot be completely restored once the lipids are rehydrated. Although these extracts contain the hydrophobic proteins SP-B and SP-C, the hydrophilic protein SP-A may play an essential role in the formation of pulmonary surfactant structures. It is well established that SP-A is crucial for the formation of tubular myelin, an intermediate structure between the lamellar bodies newly secreted by type II cells and the interfacial surfactant layers.

      Moreover, and more importantly, bronchoalveolar lavage fluid may contain cells, tissue debris, and even bacteria that can alter the lipid composition of the samples used in the study after extraction by the Bligh and Dyer method. For this reason, most studies include a density gradient centrifugation step to isolate the surfactant membranes. Consequently, the samples used may be contaminated with phospholipids originating from other cells, such as macrophages, pneumocytes, or bacterial cells, particularly in lavages obtained from diseased animals.

      Although the techniques employed provide valuable information about the behaviour of surfactant membranes and allow certain inferences regarding their functionality, no functional studies of these samples have been conducted using methods such as the constrained drop surfactometer or the captive bubble surfactometer. The observed alterations do not necessarily demonstrate that surfactant modulates its properties, as claimed by the authors, but rather indicate that it is altered by the presence of other lipids.

      The spin-coating technique used to form lipid films for analysis by atomic force microscopy is not the most suitable approach to reproduce the structures generated by pulmonary surfactant. However, the results obtained may still provide valuable insights into the biophysical behaviour of its components. The analysis of lung tissue shown in Supplementary Figure S3 presents the same limitation, as the samples were embedded in a cutting compound, and the measurements may have been taken from different regions of the tissue. Therefore, it cannot be ensured that the analysed structures correspond to those generated by pulmonary surfactant.

      The finding that the structures formed in samples obtained from diseased animals are more tightly packed and dehydrated than those derived from the surfactant of healthy animals contrasts with the notion that the high efficiency of lamellar bodies in generating interfacial structures is related to their high degree of packing and dehydration. The formation of these structures involves the participation of the ABCA3 protein, which pumps phospholipids into the interior of lamellar bodies, and SP-B, which facilitates the formation of close membrane contacts.

      While the results are interesting from a comparative perspective, the implications for surfactant performance and respiratory dynamics should be interpreted with caution.

    4. Reviewer #3 (Public review):

      In this manuscript, the authors present data on the supposed composition of pulmonary surfactant obtained from bronchoalveolar lavages (BALs) of a small cohort of dolphins, a group of them suffering from pneumonia. The lipid compositional differences of the sample group are consistent with the different pathological situations of the specimens, suggesting that differences in surfactant composition are somehow associated (as a cause or as a consequence) with the particular pathophysiological contexts. It is particularly remarkable that an increase in cardiolipins and plasmalogens appears as an abnormal composition in pathological surfactants. The study is completed by analyzing the differences in membrane properties (order, packing, phase) of abnormal versus "control" membranes, concluding that pneumonia in dolphins is associated with a significant alteration of surfactant membranes that become more rigid, packed and thicker than those in surfactant from animals with no lung disease.

      In general terms, the data provided are of interest as they somehow offer a framework of effects that may extend what is known about alterations of composition, biophysical properties and functional performance of pulmonary surfactant as a consequence of respiratory pathologies. A collection of pertinent biophysical methodologies (fluorescence, X-ray scattering, AFM) have been applied to complete a full characterization of membrane properties in the different samples.

      However, the way the samples have been processed, i.e. by making organic extracts of hydrophobic (lipid and protein) components before surfactant membranes have been purified or at least separated from bulk lavage, opens the question of how much of the altered composition actually occurs in surfactant and how much comes from other membranes (from cells, bacteria) that have been completely intermixed as a consequence of the organic extraction. Without an appropriate isolation of surfactant membranes, the results of the study should be taken with caution and await confirmation. Specific questions that need to be considered include:

      (1) As said, the direct organic extract of BAL samples ends in a full mix of lipid and protein components that in origin could be part of different membranes, either from different surfactant assemblies, or even from pulmonary cells or membrane debris, or microorganisms, collected within the lavage. Obtaining conclusions about the structure and properties of membranes artefactually reconstituted from such lipid and protein mixtures is far from correct.

      It is mentioned that "subsequentially" to the organic extraction, the samples were subjected to ultracentrifugation to separate debris and membrane cells. I do not see what the ultracentrifugation is going to change if it is done after the organic extraction. It should have been done before the extraction, for the organic solvents to solubilize exclusively the large, and relatively light, surfactant membrane complexes.

      On the other hand, the subsequent reconstitution of the full lipid mixture obtained surely ends in membrane assemblies whose compositional distribution and organization may differ significantly from those in the original membranes.

      Taking all this into account, statements such as "These aggregate forms reproduce the expected membrane microstructures observed in native alveolar hypophase" or "pulmonary membranes can be successfully extracted and reconstituted from BALs of Navy dolphins" are simply not true and should be rephrased.

      One can understand that the limited amount of material may make it difficult to obtain first the purified surfactant membranes and then their organic extract. However, the limitation should be acknowledged to make clear to readers that the actual compositional effects caused in surfactant by pneumonia need confirmation.

      (2) In some of the experiments, i.e. in the AFM characterization, supported membranes were prepared by the spray-dry method applied to organic solutions. Again, the spray-dry of organic lipid solutions ends in a lipid dispersion that may be very far from the real organization of the lipids in actual surfactant membranes.

      (3) Where it is stated that phospholipid concentrations are greater in BAL from pinnipeds than in humans, how has the actual concentration been determined? BAL volumes are typically subject to large variations depending on the conditions used to obtain the lavage (including volume of saline instilled, level of atelectasis in the lung tissue, presence of inflammation and edema, etc). If total amounts of phospholipids in BAL are to be compared, certain normalization procedures should be applied, for instance with respect to the urea concentration in serum.

      (4) All the differences regarding membrane phase and lipid order/packing have been interpreted in terms of the potential coexistence of Lbeta (gel)/Lalpha (liquid crystalline) phases. However, it has been well established that in lipid systems containing cholesterol, such as pulmonary surfactant, phase coexistence can actually be of the type liquid-ordered (Lo)/liquid-disordered (Ld), very different in terms of mobility and true molecular order. Why do the authors consider that Lbeta is the phase observed in the surfactant membranes they have reconstituted? The presence of round-shaped domains seems to indicate that a liquid/liquid phase segregation is actually occurring.

      (5) In the same line as the previous comment, the authors state that SAXS shows that bovine-extracted pulmonary membranes exhibit a coexistence of two lamellar phases, one rich in unsaturated lipids and one in saturated lipids. SAXS and WAXS cannot provide compositional information, but structural parameters such as membrane thickness, or molecular order. This should be clarified.

      (6) It is mentioned that the surfactant monolayer at the air-liquid interface is interconnected to tubular membranous structures (tubular myelin, TM). It is true that TM, when present, appears interconnected with the interface. However, it is widely recognized that there are many other structures connected with the interfacial film, including multilamellar membrane arrays or reservoirs that have not been mentioned here. Furthermore, TM is not required for surfactant function, because it is absent, for instance, in mice lacking expression of surfactant protein SP-A, which can breathe perfectly.

      (7) In the Discussion, the authors mention that "...after squeeze-out, the excluded multilayers remain closely associated with the interfacial monolayer rather than escaping into the subphase". The authors may like to complete this discussion by specifying that the stable association of excluded assemblies with the interfacial film is actually possible thanks to the surfactant proteins.
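
      As an illustration of the normalization raised in point (3), the sketch below applies the commonly used urea dilution approach, in which the ratio of serum (or plasma) urea to lavage urea is taken as the dilution factor of the epithelial lining fluid. All numerical values and function names are hypothetical and chosen only to show the arithmetic; they are not measurements from the study, and the approach assumes urea has equilibrated between serum and lining fluid.

```python
def dilution_factor(urea_serum, urea_bal):
    """Dilution of epithelial lining fluid (ELF) in the lavage.

    Assumes urea concentration in ELF equals that in serum, so the
    ratio serum/BAL estimates how much the ELF was diluted by saline.
    Both concentrations must be in the same units.
    """
    if urea_bal <= 0:
        raise ValueError("BAL urea must be positive")
    return urea_serum / urea_bal


def normalize_to_elf(conc_bal, urea_serum, urea_bal):
    """Convert a concentration measured in BAL fluid to an ELF estimate."""
    return conc_bal * dilution_factor(urea_serum, urea_bal)


# Hypothetical example: two lavages with the same raw phospholipid
# concentration but different recovery/dilution.
sample_a = normalize_to_elf(conc_bal=0.10, urea_serum=5.0, urea_bal=0.05)
sample_b = normalize_to_elf(conc_bal=0.10, urea_serum=5.0, urea_bal=0.10)
print(sample_a, sample_b)
```

      Under these assumed numbers, identical raw concentrations correspond to a twofold difference once dilution is taken into account, which is why unnormalized comparisons across species or lavage protocols are hard to interpret.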

    1. eLife Assessment

      This is a potentially important paper attempting to identify neural correlates of memory engram expression in humans, and how they change during forgetting. The questions posed are clear and novel. The methods employed, namely behavioral analysis, high-resolution functional magnetic resonance imaging, and representational similarity analysis, are advanced, integrative, and appropriate. The experiments are well designed and combine analysis of recollection and familiarity of object/face associations. However, substantial questions remain as to the validity of the incomplete statistical analyses applied to the imaging data, as well as the parsing and interpretation of the behavioral data.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents an ambitious attempt to examine whether episodic memory traces ("engrams") of forgotten associations persist in the human brain and whether these traces continue to influence behavior implicitly. Using 7T fMRI, the authors track 96 one-shot face-object associations across learning, 30-minute retrieval, and 24-hour retrieval, complemented by a recognition test. Participants classify each memory as sure, unsure, or guess, enabling an operational dissociation between consciously accessible and inaccessible memories.

      Strengths:

      The study addresses a timely and theoretically important question arising from rodent engram research, i.e., whether forgotten human memories leave detectable neural signatures. The use of high-resolution 7T fMRI, representational similarity analysis (RSA), and gPPI connectivity analyses aims at a detailed systems-level perspective. The results suggest that correct guess responses (i.e., when participants believe they are guessing) are accompanied by hippocampal activity and connectivity patterns that correlate with behavioral performance, potentially pointing to residual memory traces. The study also presents evidence for divergent consolidation trajectories: consciously accessible memories become more neocortically distributed after sleep, whereas inaccessible memories exhibit strengthened hippocampal signatures.

      Weaknesses:

      Despite the methodological rigor, some interpretational issues merit caution. First, the reliance on participants' subjective "guess" reports to categorize trials as forgotten is problematic. Guess responses at the 30-minute retrieval were at chance level, whereas guess responses during recognition were above chance; interpreting both as "implicit episodic memory" may conflate different mechanisms (episodic retrieval, familiarity, associative priming).

      Second, several analyses raise concerns about circularity or insufficient independence, for example, when contrasting correct vs. incorrect guess trials to locate "engram" activity and then correlating that activity with guessing accuracy. Similarly, the behavioral analyses are fragmented (multiple t-tests across conditions) rather than using a factorial model that accounts for dependencies among confidence levels and timepoints.

      Third, the choice to include only "sure" and "guess" responses discards a substantial portion of trials ("unsure"), reducing power and complicating interpretation, especially given that unsure responses show above-chance performance.

      Finally, the study's two-scanner-sequence design (small-FOV vs. whole-brain) is challenging as it complicates comparisons across analyses, especially when some critical results (e.g., hippocampal reinstatement patterns) do not consistently replicate across sequences.

      Conclusion:

      Overall, the manuscript provides preliminary evidence that neural traces of forgotten episodic memories might persist in humans and could guide behavior in the absence of conscious awareness. While interpretational caution is warranted, especially regarding the nature of "guess"-based retrieval and the independence of neural contrasts, the study makes a valuable contribution to debates on engram persistence, systems consolidation, and the role of consciousness in episodic memory.

    3. Reviewer #2 (Public review):

      Summary:

      The goal of the experiment was to identify the fMRI neural correlates of persistence and recovery of forgotten memories. A forgotten memory was defined behaviorally as successful learning, followed by failure in a recall format task, followed by next-day success in a recognition format task. The comparison is to memories that were not forgotten at any stage of the task. Various univariate, connectivity, and multivariate analyses were used to identify neural correlates of forgotten memories that were recovered, that remained forgotten, and successful memory. Some claims are made about how activity of the "episodic memory network" predicts the persistence of forgotten memories.

      Strengths:

      Studies on the persistence of forgotten memories in rodent models have been used to make some novel claims about the potential properties of engrams. Attempting similar research in humans is a laudable goal.

      Patterns of behavioral responses are consistent across subjects.

      Weaknesses:

      I do not find that the fMRI results fit the narrative provided.

      A major issue is that primary results do not replicate across the two fMRI datasets that were collected using the same task. For example, hippocampal activity associated with correct responses (confident and guess) was identified in the group receiving the fMRI scan that used a small FOV, but not in the group that received an fMRI scan of the whole brain, for both 30-min and 24-hr delays (lines 202-217). This suggests that the main findings are not even replicable internally within the same experiment. There is no reasonable justification for this.

      Next, most of the reported fMRI findings do not meet reasonable thresholds for statistical significance. In many places, the authors acknowledge this in the text by saying that a difference in the fMRI metric "tended towards significant correlation" or that comparisons "revealed non-significant mean value comparisons". It is not clear why these non-significant findings are interpreted as though they are positive findings. Beyond that, many of the reported findings are not meeting the threshold (i.e., p=0.058), without any acknowledgement that they are marginal. Beyond that, the majority of comparisons that are interpreted in the main text are not significant based on the companion information provided in the supplementary tables. That is, they are totally non-significant when using FWE or FDR correction at either the cluster or peak levels.

      Beyond this, the supplementary tables indicate that "clusters identified solely within white matter regions have been excluded." The fact that there are any findings in white matter to ignore indicates that the statistical thresholds are inappropriate. It's tantamount to seeing activation in the brain of a dead fish.

      The overall picture based on these factors is that the statistical tests did not use sufficiently stringent safeguards against false positives given the multiple comparison problem that plagues fMRI. So, there are tons of false positives, which are being selectively interpreted to tell a particular story. That is, each comparison yields lots of findings in many brain areas, and those that do not fit the particular narrative are being ignored (including those in white matter). What's more, when the small FOV fMRI scan is done, the imaging volume is centered on the hippocampus and its close network, so all false positives appear to be exactly in those brain regions about which the authors want to make conclusions. When throwing darts, you will always hit a bullseye if that is all that exists. The fact that the same comparisons done in the companion whole-brain dataset do not yield the same results is telling: the analysis plan is not sufficiently rigorous to yield findings that are replicable.
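
      The scale of the multiple comparison problem referred to here can be illustrated with a short calculation. The sketch below assumes fully independent tests, which is a simplification (fMRI voxels are spatially correlated), and the test counts are arbitrary illustrative values rather than figures from the manuscript.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n independent tests,
    each run at significance level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


# Illustrative test counts only.
for n in (1, 10, 100):
    print(n, round(familywise_error_rate(0.05, n), 3), bonferroni_threshold(0.05, n))
```

      With an uncorrected threshold of 0.05, the chance of at least one spurious finding already exceeds 99% at 100 independent tests, which is the rationale for the FWE or FDR corrections mentioned above.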

      Further, I think that it is highly debatable whether the task measures the recovery of forgotten memories at all. Forgotten memories are defined as those that fail when tested using a recollection format but succeed when tested using a recognition format. The well-characterized distinction between recollection and recognition is thus being construed as telling us something about the fate of engrams. I think the much more likely alternative is that "forgotten" memories are just relatively weak memories that don't meet whatever criteria subjects typically use when making recollection judgments, and not some special category of memory. In terms of brain activation, they seem for the most part to follow the pattern of stronger memory, but weaker.

      Finally, many hypotheses are used as though they are proven. For instance, fMRI activity patterns are called "engrams" even though there are no tests to determine whether they meet reasonable criteria that have been adopted in the engram literature (e.g., necessity, sufficiency). Whatever happens over the 24-hour delay is called "consolidation" even if there is no test that consolidation has occurred. Etc. It becomes hard to differentiate what is an assumption, versus a hypothesis, versus an inference/conclusion.

    1. eLife Assessment

      This valuable study links psychological theories of chunking with a physiological implementation based on short-term synaptic plasticity and synaptic augmentation. The theoretical derivation for increased memory capacity via hierarchical chunking is solid. However, the model robustness and biological grounding of the mechanism - including many aspects that were hard-wired, chunking cues, and parameter ranges - as well as its evaluation in the task settings that motivated the study, are incomplete. Additional simulations to test robustness in more cognitively and biologically realistic settings, a systematic parameter analysis, and stronger links to prior work would substantially strengthen the manuscript and increase its impact across disciplines.

    2. Reviewer #1 (Public review):

      Summary:

      This study extends the short-term synaptic plasticity (STP)-based theory of activity-silent working memory (WM) by introducing a physiological mechanism for chunking that relies on synaptic augmentation (SA) and specialized chunking clusters. The model consists of a recurrent neural network comprising excitatory clusters representing individual items and a global inhibitory pool. The self-connections within each cluster dynamically evolve through the combined effects of STP and SA. When a chunking cue, such as a brief pause in a stimulus sequence, is presented, the chunking cluster transiently suppresses the activity of the item clusters, enabling the grouped items to be maintained as a coherent unit and subsequently reactivated in sequence. This mechanism allows the network to enhance its effective memory capacity without exceeding the number of simultaneously active clusters, which defines the basic capacity. They further derive a new upper limit of WM capacity, the new magic number. When the basic capacity is four, the upper bound for complete recall becomes eight, and the optimal hierarchical structure corresponds to a binary tree of two-item pairs forming four chunks that combine into two meta-chunks. Reanalysis of linguistic data and single-neuron recordings from human epilepsy patients (identifying boundary neurons) provides qualitative support for the model's predictions.

      Strengths:

      This study makes an important contribution to theoretical and computational neuroscience by proposing a physiologically grounded mechanism for chunking based on STP and SA. By embedding these processes in a recurrent neural network, the authors provide a unified account of how chunks can be formed, maintained, and sequentially retrieved through local circuit dynamics, rather than through top-down cognitive strategies. The work is conceptually original, analytically rigorous, and clearly presented, deriving a simple yet powerful capacity law that extends the classical magic number framework from four to eight items under hierarchical chunking. The modeling results are further supported by preliminary empirical evidence from linguistic data and single-neuron recordings in the human medial temporal lobe, lending credibility to the proposed mechanism. Overall, this is a well-designed and well-written study that offers novel insights into the neural basis of working-memory capacity and establishes a solid bridge between theoretical modeling and experimental findings.

      Weaknesses:

      This study is conceptually strong and provides an elegant theoretical framework, but several aspects limit its biological and empirical grounding.

      First, the control mechanism that triggers and suppresses chunking clusters remains only schematically defined. The model assumes that chunking events are initiated by pauses, prosodic cues, or internal control signals, but does not specify the underlying neural circuits (e.g., prefrontal-basal ganglia loops) that could mediate this gating in the brain. Clarifying where, when, and how the chunking clusters are turned on and off will be critical for establishing biological plausibility.

      Second, the network representation is simplified: item clusters are treated as non-overlapping and homogeneous, whereas real cortical circuits exhibit overlapping representations, distinct excitatory/inhibitory populations, and multiscale local and long-range connectivity. It remains unclear how robust the proposed dynamics and derived capacity limit would be under such biologically realistic conditions.

      Third, the model heavily relies on SA operating over a timescale of several seconds, yet in vivo, the time constants and prevalence of SA can vary widely across cortical regions and neuromodulatory states. The stability of the predicted "new magic number" under realistic noise levels and modulatory influences, therefore, needs to be systematically evaluated.

    3. Reviewer #2 (Public review):

      Summary:

      This work extends a previous recurrent neural network model of activity-silent working memory to account for well-established findings from psychology and neuroscience suggesting that working memory capacity constraints can be partially overcome when stimuli can be organized into chunks. This is accomplished via the introduction of specialized chunking clusters of neurons to the original model. When these chunking clusters are activated by a cue (such as a longer delay between stimuli), they rapidly suppress recently active stimulus clusters. This makes these stimulus clusters available for later retrieval via a synaptic augmentation mechanism, thereby expanding the network's overall effective capacity. Furthermore, these chunking clusters can be arranged in a hierarchical fashion, where chunking clusters are themselves chunked by higher-level chunking clusters, further expanding the network's overall effective capacity to a new "magic number", 2^{C-1} (where C is the basic capacity without chunking). In addition to illustrating the basic dynamics of the model with detailed simulations (Figures 1 and 2), the paper also utilizes qualitative predictions from the model to (re-)analyze data collected in previous experiments, including single-unit recordings from human medial temporal lobe as well as behavioral findings from a classic study of human memory.
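
      To make the quoted capacity law concrete, the following is a minimal Python sketch, not the authors' implementation. The function names `chunked_capacity` and `binary_chunk_tree` are hypothetical; the only facts taken from the reviews are the bound 2^{C-1} and the pairwise hierarchy described by Reviewer #1 (pairs forming chunks, chunks combining into two meta-chunks).

```python
def chunked_capacity(basic_capacity: int) -> int:
    """Upper bound on recallable items under hierarchical chunking: 2**(C - 1)."""
    if basic_capacity < 1:
        raise ValueError("basic capacity must be at least 1")
    return 2 ** (basic_capacity - 1)


def binary_chunk_tree(items):
    """Group items pairwise, level by level, until at most two units remain.

    Returns a list of levels: the raw items first, then chunks, then
    meta-chunks, and so on. With an odd count the last unit at a level
    holds a single element (an assumption of this sketch).
    """
    levels = [list(items)]
    while len(levels[-1]) > 2:
        current = levels[-1]
        levels.append(
            [tuple(current[i:i + 2]) for i in range(0, len(current), 2)]
        )
    return levels


if __name__ == "__main__":
    capacity = chunked_capacity(4)
    levels = binary_chunk_tree("ABCDEFGH"[:capacity])
    for depth, level in enumerate(levels):
        print(depth, len(level), level)
```

      With a basic capacity of C = 4 the bound evaluates to eight items, and eight items group into four chunks and then two meta-chunks, which is the binary-tree structure described in the Reviewer #1 summary.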

      Strengths:

      The writing and figures are very clear, and the general topic is relevant to a broad interdisciplinary audience. The work is strongly theory-driven, but also makes some effort to engage with existing data from two empirical studies. The basic results showcasing how chunking can be achieved in an activity-silent working memory model via suppression and synaptic augmentation dynamics are interesting. Furthermore, we agree with the authors that the derivation of their new "magic number" is relatively general and could apply to other models, so those findings in particular may be of interest even to researchers using different modeling frameworks.

      Weaknesses:

      (1) Very important aspects of the model are assumed / hard-coded, raising the concern that it relies too much on an external controller, and that it would therefore be difficult to implement the same principles in a fully behaving model responsible for producing its own outputs from a sequence of stimuli (i.e., without a priori knowledge of the structure of incoming sequences).

      (i) One such aspect is the use of external chunking cues provided to the model at critical times to activate the chunking clusters. The simulations reported in the paper were conducted in a setting where signals to chunk are conveniently indicated by longer delays between stimuli. In this case, it is not difficult to imagine how an external component could detect the presence of such a delay and activate a chunking cluster in response. However, in order for the model to be more broadly applicable to different memory tasks that elicit chunking-related phenomena, a more general-purpose detector would be required (see further comments below and alternative models).

      (ii) Relatedly, and as the authors acknowledge in the discussion, the network relies on a pretty sophisticated external controller that decides when the individual chunking clusters are activated or deactivated during readout/retrieval. This seems especially complex in the hierarchical case. How might a network decide which chunking/meta-chunking clusters are activated/deactivated in which order? This was hard-coded in their simulations, but we imagine that it would be difficult to implement a general solution to this problem, especially in cases where there is ambiguity about which stimuli should be chunked, or where the structure of the incoming sequence is not known in advance.

      (iii) One of the central mechanisms of the model is the rapid synaptic plasticity in the inhibitory connections responsible for binding chunking clusters to their corresponding stimulus clusters. This mechanism again appears to have been hard-coded in the main simulations. Although we appreciate that the authors worked on one possible way that this could be implemented (Methods section D, Supplementary Figure S2), in the end, their solution seems to rely on precisely fine-tuning the timing with which stimuli are presented - a factor that seems unlikely to matter very much in humans/animals. This stands in contrast with models of working memory that rely on persistent activity, which are more robust to changes in timing. Note that we do not discount the possibility of activity-silent WM, and indeed it should be studied in its own right, but it is then even more important to highlight which of its features are dependent on the time constants, etc.

      (2) Another key shortcoming of this work is its limited direct engagement with empirical evidence and alternative computational accounts of chunking in WM. Although the efforts to re-analyze existing empirical results in light of the new predictions made by the model are commendable, in the end, we think they fall short of being convincing. As noted above, the model doesn't actually perform the same two tasks used in the human experiments, so direct quantitative comparisons between the model and human behavior or neural data are not possible. Instead, the authors rely on isolating two qualitative predictions of the model - the "dip" and "ramp" phenomena observed after a chunking cluster is activated (Figure 3), and the new magic number for effective capacity derived from the model in the case where stimuli are chunkable, which approximately converges with human recall performance in a memory study (Figure 4). Below, we highlight some specific issues related to these two sets of analyses, but the larger point is that if the model is making a commitment about how these neural mechanisms relate to behavioral phenomena, it would be important to test if the model can produce the behavioral patterns of data in experimental paradigms that have been extensively used to characterize those phenomena. For example, modern paradigms characterizing capacity limits have been more careful to isolate the contributions of WM per se (whereas the original magic number 7 is now thought to reflect a combination of episodic and working memory; see Cowan 2010). There are several existing models that more directly engage with this literature (e.g., Edin et al., 2009; Matthey et al., 2015; Nassar et al., 2018; Soni & Frank, 2025; Swan & Wyble, 2014; van den Berg et al., 2014; Wei et al., 2012), some of which also account for chunking-related phenomena (e.g., Wei et al., 2012; Nassar et al., 2018; Panichello et al., 2019; Soni & Frank, 2025).
      A number of related proposals suggest that WM capacity limits emerge from fundamentally different mechanisms than the one considered here - for example, content-related interference (Bays, 2014; Ma et al., 2014; Schurgin et al., 2020), or limitations in the number of content-independent pointers that can be deployed at a given time (Awh & Vogel, 2025), and/or the inherent difficulty of learning this binding problem (Soni & Frank, 2025). We think it would be worth discussing how these ideas could be considered complementary or alternatives to the ones presented here.

      (i) Single unit recordings. We found it odd that the authors chose to focus on evidence from single-unit recordings in the medial temporal lobe from a study focused on episodic memory. It was unclear how exactly these data are supposed to relate to their proposal. Is the suggestion that a mechanism similar to the boundary neurons might be operative in the case of working memory over shorter timescales in WM-related areas such as the prefrontal cortex, or that their chunking mechanism may relate not only to working memory but also to episodic memory in the medial temporal lobe?

      (ii) N-gram memory experiment. Our main complaint about the analysis of the behavioral data from the human memory study (Figure 4) is that the model clearly does not account for the main effect observed in that study - namely, the better recall observed for higher-order n-gram approximations to English. We acknowledge that this was perhaps not the main point of the analysis (which related more to the prediction about the absolute capacity limit M*), but it relates to a more general criticism that the model cannot account for chunking behavior associated with statistical learning or semantic similarity. Most of the examples used in the introduction and discussion are of this kind (e.g., expressions such as "Oh my God" or "Easier said than done", etc.). However, the chunking mechanism of the model should not have any preference for segmenting based on statistical regularities or semantic similarity - it should work just as well if statistical anomalies or semantic dissimilarity were used as external chunking cues. In our view, these kinds of effects are likely to relate to the brain's use of distributed representations that can capture semantic similarity and learn statistical regularities in the environment. Although these kinds of effects may be beyond the scope of this model, some effort could be made to highlight this in the discussion. But again, more generally, the paper would be more compelling if the model were challenged to simulate more modern experimental paradigms aimed at testing the nature of capacity limits in WM, or chunking, etc.

      (iii) There are a number of other empirical phenomena that we're not sure the model can explain. In particular, one of the hallmarks of WM capacity limits is a recency bias, where people are more likely to remember the most recent items at the expense of items presented prior to that (Oberauer et al., 2012). [There are also studies showing primacy effects in addition to recency effects, but the primacy effects are generally attributed to episodic rather than working memory - for example, introducing a distractor task abolishes the recency but not the primacy effect.] But the current model seems to make the opposite prediction: when the stimuli exceed its base capacity, it appears to forget the most recent stimuli rather than the earliest ones (Figure 1d). This seems to result from the number of representations that can be reactivated within a cycle and thus seems inherent to the dynamics of the model, but the authors could clarify whether it instead depends on the particular values of certain parameters. (In contrast, this recency effect is captured in other models with chunking capabilities based on attractive dynamics and/or gating mechanisms - e.g., Boboeva et al., 2023; Soni & Frank, 2025.) Relatedly, we're not sure if the model could account for the more recent finding that recall is specifically enhanced when chunks occur in early serial positions compared to later ones (Thalmann, Souza, & Oberauer, 2019).

    4. Reviewer #3 (Public review):

      The paper presents a synaptic mechanism for chunking in working memory, extending previous work of the last author by introducing specialized "chunking clusters", neural populations that can dynamically segment incoming items into chunks. The idea is that this enables hierarchical representations that increase the effective capacity of working memory. They also derive a theoretical bound for working memory capacity based on this idea, suggesting that hierarchical chunking expands the number of retrievable items beyond the basic WM capacity. Finally, they present neural and behavioral data related to their hypothesis.

      Strengths

      A major strength of the paper is its clear theoretical ambition of developing a mechanistic model of working memory chunking.

      Weaknesses

      Despite the inspiration in biophysical mechanisms (short-term synaptic plasticity with different time constants), the model is "cartoonish". It is unclear whether the proposed mechanism would work reliably in the presence of noise and non-zero background activity or in a more realistic implementation (e.g., a spiking network).

      As far as I know, there is no evidence for cyclic neural activation patterns, which are supposed to limit WM capacity (such as in Figure 1d). In fact, I believe there is no evidence for population bursts in WM, which are a crucial ingredient of the model. For example, Panichello et al. 2024 have found evidence for periods during which working memory decoding accuracy decreases, but no population bursts were observed in their data. In brief, my critique is that including some biophysical mechanism in an abstract model does not make the model plausible per se.

      It is claimed that "our proposed chunking mechanism applies to both the persistent-activity and periodic-activity regimes, with chunking clusters serving the same function in each", but this is not shown. If the results and model predictions are the same, irrespective of whether WM is activity-silent or persistent, I suggest highlighting this more and including the corresponding simulations.

      The empirical validations of the model are weak. The single-unit analysis is purely descriptive, without any statistical quantification of the apparent dip-ramp pattern. I agree that the dip-ramp pattern may be consistent with the proposed model, but I don't believe that this pattern is a specific prediction of the proposed model. It seems just to be an interesting observation that may be compatible with several network mechanisms involving some inhibition and a rebound.

      Moreover, the reanalyses of n-gram behavioral data do not constitute a mechanistic test of the model. The "new magic number" depends strongly on structural assumptions about how chunking operates, and it is unclear whether human working memory uses the specific hierarchical scheme required to achieve the predicted limit.

      The presentation of the modeling results is highly compressed in two figures and is rather hard to follow. Plotting the activity of different neural clusters in separate subplots or as heatmaps (x-axis time, y-axis neural population, color = firing rate) would help to clarify (Figure 1d). Also, control signals that activate the chunking clusters should be shown.
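
      The heatmap layout suggested here (x-axis time, y-axis neural population, color = firing rate) could look roughly like the sketch below. It is purely illustrative: the toy rate generator, the function names, the burst shape, and every parameter value are assumptions and do not reproduce the manuscript's model or its Figure 1d.

```python
import math


def simulate_cluster_rates(n_clusters=4, n_steps=200, dt=0.01, period=0.5):
    """Toy firing rates: each cluster bursts once per cycle at a staggered phase."""
    rates = []
    for k in range(n_clusters):
        phase = k * period / n_clusters  # burst time of cluster k within a cycle
        row = []
        for step in range(n_steps):
            t = step * dt
            d = (t - phase) % period
            d = min(d, period - d)  # circular distance to the nearest burst
            row.append(50.0 * math.exp(-((d / 0.02) ** 2)))  # peak of 50 Hz
        rates.append(row)
    return rates


def plot_rate_heatmap(rates, dt, path="cluster_rates.png"):
    """Save a heatmap with time on x, cluster index on y, and rate as color."""
    import matplotlib
    matplotlib.use("Agg")  # render to file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 2.5))
    image = ax.imshow(
        rates,
        aspect="auto",
        origin="lower",
        extent=[0, len(rates[0]) * dt, 0, len(rates)],
    )
    ax.set_xlabel("time (s)")
    ax.set_ylabel("cluster")
    fig.colorbar(image, ax=ax, label="firing rate (Hz)")
    fig.savefig(path, dpi=150)
    plt.close(fig)
```

      Calling `plot_rate_heatmap(simulate_cluster_rates(), dt=0.01)` writes the figure to disk. In this layout each cluster occupies its own row, so staggered bursts appear as diagonal stripes rather than overlapping traces, and a control signal could be added as one more row.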

      Overall, the theoretical proposal is interesting, but its empirical grounding and biological plausibility need to be substantially reinforced.

    1. eLife Assessment

      The granularity with which neural activity in the sensorimotor cortex of mice corresponds to voluntary forelimb motion is a key open question. This paper provides compelling evidence for the encoding of low-level features like joint angles and represents an important step forward toward understanding cortical limb control signals.

    2. Reviewer #1 (Public review):

      Summary:

      This study addresses the encoding of forelimb movement parameters using a reach-to-grasp task in mice. The authors use a modified version of the water-reaching paradigm developed by Galinanes and Huber. Two-photon calcium imaging was then performed with GCaMP6f to measure activity across both the contralateral caudal forelimb area (CFA) and the forelimb portion of primary somatosensory cortex (fS1) as mice perform the reaching behavior. Established methods were used to extract the activity of imaged neurons in layer 2/3, including methods for deconvolving the calcium indicator's response function from fluorescence time series. Video-based limb tracking was performed to track the positions of several sites on the forelimb during reaching and extract numerous low-level (joint angle) and high-level (reach direction) parameters. The authors find substantial encoding of parameters for both the proximal and distal parts of the limb across both CFA and fS1, with individual neurons showing heterogeneous parameter encoding. Limb movement can be decoded similarly well from both CFA and fS1, though CFA activity enables decoding of reach direction earlier and for a more extended duration than fS1 activity. Collectively, these results indicate involvement of a broadly distributed sensorimotor region in mouse cortex in determining low-level features of limb movement during reach-to-grasp.

      Strengths:

      The technical approach is of very high quality. In particular, the decoding methods are well designed and rigorous. The use of partial correlations to distinguish correlation between cortical activity and either proximal or distal limb parameters or either low- or high-level movement parameters was very nice. The limb tracking was also of extremely high quality, and critical here to revealing the richness of distal limb movement during task performance.

      The task itself also reflects an important extension of the original work by Galinanes and Huber. The demonstration of a clear, trackable grasp component in a paradigm where mice will perform hundreds of trials per day expands the experimental opportunities for the field. This is an exciting development.

      The findings here are important and the support for them is solid. The work represents an important step forward toward understanding the cortical origins of limb control signals. One can imagine numerous extensions of this work to address basic questions that have not been reachable in other model systems.

      Collectively, these strengths made this manuscript a pleasure to read and review.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Grier, Salimian, and Kaufman characterize the relationship between the activity of neurons in sensorimotor cortex and forelimb kinematics in mice performing a reach-to-grasp task. First, they train animals to reach to two cued targets to retrieve water reward, measure limb motion with high resolution, and characterize the stereotyped kinematics of the shoulder, elbow, wrist, and digits. Next, they find that inactivation of the caudal forelimb motor area severely impairs coordination of the limb and prevents successful performance of the task. They then use calcium imaging to measure the activity of neurons in motor and somatosensory cortex, and demonstrate that fine details of limb kinematics can be decoded with high fidelity from this activity. Finally, they show reach direction (left vs right target) can be decoded earlier in the trial from motor than from somatosensory cortex.

      Strengths:

      In my opinion, this manuscript is technically outstanding and really sets a new bar for motor systems neurophysiology in the mouse. The writing and figures are clear, and the claims are supported by the data. This study is timely, as there has been a recent trend towards recording large numbers of neurons across the brain in relatively uncontrolled tasks and inferring a widespread but coarse encoding of high-level task variables. The central finding here, that sensorimotor cortical activity reflects fine details of forelimb movement, argues against the resurgent idea of cortical equipotentiality, and in favor of a high degree of specificity in the responses of individual neurons and of the specialization of cortical areas.

      Comment on revised version:

      The authors addressed all my concerns, and in my opinion, the manuscript is suitable for publication of the Version of Record in its current form.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study addresses the encoding of forelimb movement parameters using a reach-to-grasp task in mice. The authors use a modified version of the water-reaching paradigm developed by Galinanes and Huber. Two-photon calcium imaging was then performed with GCaMP6f to measure activity across both the contralateral caudal forelimb area (CFA) and the forelimb portion of primary somatosensory cortex (fS1) as mice perform the reaching behavior. Established methods were used to extract the activity of imaged neurons in layer 2/3, including methods for deconvolving the calcium indicator's response function from fluorescence time series. Video-based limb tracking was performed to track the positions of several sites on the forelimb during reaching and extract numerous low-level (joint angle) and high-level (reach direction) parameters. The authors find substantial encoding of parameters for both the proximal and distal parts of the limb across both CFA and fS1, with individual neurons showing heterogeneous parameter encoding. Limb movement can be decoded similarly well from both CFA and fS1, though CFA activity enables decoding of reach direction earlier and for a more extended duration than fS1 activity. Collectively, these results indicate involvement of a broadly distributed sensorimotor region in mouse cortex in determining low-level features of limb movement during reach-to-grasp.

      Strengths:

      The technical approach is of very high quality. In particular, the decoding methods are well designed and rigorous. The use of partial correlations to distinguish correlation between cortical activity and either proximal or distal limb parameters or either low- or high-level movement parameters was very nice. The limb tracking was also of extremely high quality, and critical here to revealing the richness of distal limb movement during task performance.

      The task itself also reflects an important extension of the original work by Galinanes and Huber. The demonstration of a clear, trackable grasp component in a paradigm where mice will perform hundreds of trials per day expands the experimental opportunities for the field. This is an exciting development.

      The findings here are important and the support for them is solid. The work represents an important step forward toward understanding the cortical origins of limb control signals. One can imagine numerous extensions of this work to address basic questions that have not been reachable in other model systems.

      Collectively, these strengths made this manuscript a pleasure to read and review.

      Thank you!

      Weaknesses:

      In the last section of the results, the authors purport to examine the representation of "higher-level target-related signals," using the decoding of reach direction. While I think the authors are careful in their phrasing here, I think they should be more explicit about what these signals could be reflecting. The "signals" here that are used to decode direction could relate to anything - low-level signals related to limb or postural muscles, or true high-level commands that dictate only what movement downstream motor centers should execute, rather than the muscle commands that dictate how. One could imagine using a partial correlation-type approach again here to extract a signal uncorrelated with all the measured low-level parameters, but there would still be all the unmeasured ones. Again, I think it is still ok to call these "high-level signals," but I think some explicit discussion of what these signals could reflect is necessary.

      Thank you for this excellent suggestion. We have followed both pieces of the reviewer’s advice. First, we performed the suggested analysis, partialing off the kinematics then performing target classification on the residuals. This is now Figure 6S1. The analysis revealed the presence of target-related information in the neural activity after subtracting off all linear correlations with kinematics, supporting our claims that higher-level information is present in both populations. The exact timing of classifier performances varied substantially across mice, potentially due to differences in reach-to-grasp strategy, kinematic tracking fidelity, and exact spatial locations of each recorded FOV. Following the second suggestion, we have made the relevant text more careful. We now conclude simply that higher-level signals, meaning those signals that are largely unrelated to forelimb joint angle kinematics, are present but with variable timing and strengths in each area. That text now reads:

      “Target decoding performance could result from truly higher-level signals that code abstractly for target location, or alternatively could be supported by strong encoding of kinematic variables that differed between targets. To disambiguate these possibilities, we refit the linear classifier to neural data after regressing off variance related to the joint angle kinematics. The strength and exact time course of the resulting target decoding varied somewhat across animals, but the earliest portion of target decoding performance persisted in all animals after the removal of kinematics and performance remained stronger for M1-fl than S1-fl (Fig. 6S1B). We thus conclude that higher-level signals are present in both areas, but differ in their exact timing and strength. However, we note that other possible signals, such as postural changes, could not be controlled for here.”
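
      The control analysis described in this response (regress the kinematics out of the neural activity, then classify the target from the residuals) can be sketched as follows. This is an illustrative example on synthetic data, not the authors' pipeline: the helper names, the nearest-centroid classifier standing in for their linear classifier, the regression fitted on all trials at once, and every simulation parameter are assumptions.

```python
import numpy as np


def regress_out(neural, kinematics):
    """Subtract the least-squares linear fit of kinematics (plus intercept)."""
    design = np.column_stack([kinematics, np.ones(len(kinematics))])
    coef, *_ = np.linalg.lstsq(design, neural, rcond=None)
    return neural - design @ coef


def nearest_centroid_accuracy(features, labels, train, test):
    """Fit class centroids on `train` trials and score accuracy on `test` trials."""
    classes = np.unique(labels)
    centroids = np.stack(
        [features[train][labels[train] == c].mean(axis=0) for c in classes]
    )
    dists = np.linalg.norm(features[test][:, None, :] - centroids[None], axis=2)
    predicted = classes[dists.argmin(axis=1)]
    return float((predicted == labels[test]).mean())


def simulate(n_trials=200, n_neurons=20, n_joints=3, seed=0):
    """Synthetic trials: activity mixes kinematics with a separate target signal."""
    rng = np.random.default_rng(seed)
    labels = np.arange(n_trials) % 2          # two targets, alternating trials
    sign = 2.0 * labels - 1.0
    # Kinematics differ slightly between targets, as real reaches would.
    kinematics = rng.normal(size=(n_trials, n_joints)) + 0.5 * labels[:, None]
    mixing = rng.normal(size=(n_joints, n_neurons))
    target_axis = rng.normal(size=n_neurons)  # target signal beyond kinematics
    noise = 0.5 * rng.normal(size=(n_trials, n_neurons))
    neural = kinematics @ mixing + np.outer(sign, target_axis) + noise
    return neural, kinematics, labels


neural, kinematics, labels = simulate()
residuals = regress_out(neural, kinematics)
train = np.arange(len(labels)) < 150          # first 150 trials for fitting
test = ~train
accuracy = nearest_centroid_accuracy(residuals, labels, train, test)
print(f"target decoding accuracy on residuals: {accuracy:.2f}")
```

      If target information survives in the residuals, decoding stays above chance even though all linear correlation with the measured kinematics has been removed. As the authors note, this cannot rule out contributions from unmeasured variables such as posture.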

      Related to this, I think the manuscript in general does not do an adequate job of explicitly raising the important caveats in interpreting parametric correlations in motor system signals, like those raised by Todorov, 2000. The authors do an expert job of handling the correlations, using PCA to extract uncorrelated components and using the partial correlation approach. However, more clarity about the range of possible signal types the recorded activity could reflect seems necessary.

      This is an important point, and our text could have unintentionally misled readers. We have now attempted to make this point explicit in the Discussion and in the Results for Figure 6. This Discussion text now reads:

      “Moreover, as is widely known (Todorov 2000), the exact role of these kinematically-related signals is challenging to determine from correlative measures alone; thus, determining whether these signals are used for direct movement control or instead indirectly reflect control performed elsewhere is left as a topic for future work.”

      The manuscript could also do a better job of clarifying relevant similarities and differences between the rodent and primate systems, especially given the claims about the rodent being a "first-class" system for examining the cellular and circuit basis of motor control, which I certainly agree with. Interspecies similarities and differences could be better addressed both in the Introduction, where results from both rodents and primates are intermixed (second paragraph), and in the Discussion, where more clarity on how results here agree and disagree with those from primates would be helpful. For example, the ratio of corticospinal projections targeting sensory and motor divisions of the spinal cord differs substantially between rodents and primates. As another example, the relatively high physical proximity between the typical neurons in mouse M1 and S1 compared to primates seems likely to yoke their activity together to a greater extent. There is also the relatively large extent of fS1 from which forelimb movements can be elicited through intracortical microstimulation at current levels similar to those for evoking movement from M1. All of these seem relevant in the context of findings that activity in mouse M1 and S1 are similar.

      We understand two points to address here. The first point is that we needed to be more careful to attribute previous results as being from the rodent vs. monkey. We agree. We have now revised several parts of the paper to make these distinctions clearer. The second point is about the potential benefit of a thorough review of the many ways in which primate and rodent sensorimotor systems differ. We entirely agree that this could be useful for the field. However, this is a sizable endeavor and doing it full justice is beyond what we know how to fit in the space allotted for framing our results here. We therefore sought a compromise, acknowledging how our results correspond to existing results in the primate without exhaustively accounting for how they differ. Future work will be necessary to more carefully disambiguate whether species-specific differences are due to biomechanical, neurological, ethological, or as-of-yet undetermined sources. We have incorporated your final specific points about what could produce similar information in M1 and S1 into the Discussion.

      “This may simply be a consequence of widely distributed representations of movement across mouse cortex (Musall et al. 2019; Steinmetz et al. 2019; Stringer et al. 2019), including forelimb somatosensory areas, or may be a consequence of the close physical proximity of M1-fl and S1-fl hindering development of functionally distinct representations (Tennant et al. 2011).”

      In addition, there are a number of other issues related to the interpretation of findings here that are not adequately addressed. These are described in the Recommendations for improvement.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Grier, Salimian, and Kaufman characterize the relationship between the activity of neurons in sensorimotor cortex and forelimb kinematics in mice performing a reach-to-grasp task. First, they train animals to reach to two cued targets to retrieve water reward, measure limb motion with high resolution, and characterize the stereotyped kinematics of the shoulder, elbow, wrist, and digits. Next, they find that inactivation of the caudal forelimb motor area severely impairs coordination of the limb and prevents successful performance of the task. They then use calcium imaging to measure the activity of neurons in motor and somatosensory cortex, and demonstrate that fine details of limb kinematics can be decoded with high fidelity from this activity. Finally, they show reach direction (left vs right target) can be decoded earlier in the trial from motor than from somatosensory cortex.

      Strengths:

      In my opinion, this manuscript is technically outstanding and really sets a new bar for motor systems neurophysiology in the mouse. The writing and figures are clear, and the claims are supported by the data. This study is timely, as there has been a recent trend towards recording large numbers of neurons across the brain in relatively uncontrolled tasks and inferring a widespread but coarse encoding of high-level task variables. The central finding here, that sensorimotor cortical activity reflects fine details of forelimb movement, argues against the resurgent idea of cortical equipotentiality, and in favor of a high degree of specificity in the responses of individual neurons and of the specialization of cortical areas.

      Thank you!

      Weaknesses:

      It would be helpful for the authors to be more explicit about which models of mouse cortical function their results support or rule out, and how their findings break new conceptual ground.

      We appreciate this feedback and have attempted to make these details clearer through changes to the Introduction and Discussion. One key change is noted below:

      “The presence of detailed kinematic signals in the sensorimotor cortex supports a model of mouse sensorimotor cortex in which M1-fl and S1-fl play a strong role in shaping the fine details of reaching and grasping movements.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In addition to the weaknesses noted above, I suggest the authors also address the following:

      The last results section is generally lacking in statistical support for claims. Statistical support should be added.

      Thank you for pointing this out; we have added more statistical support to this section.

      The consideration in the Discussion of relevant previous findings and potential explanations for the distal limb signals in mouse sensorimotor cortex is somewhat lacking. There are several specific issues:

      (1) In contrast to the present study, the studies cited with regard to a lack of motor cortical involvement did not involve dexterous movements; in fact, Kawai et al. explicitly engineered a task that did not involve dexterity to distinguish the role of motor cortex in learning from its known role in dexterous movement execution. In Kawai et al., the authors note one rat who adopted a more dexterous approach to the lever pressing task; in this rat, a motor cortical lesion did cause a longer-lasting reduction in task performance. In additional experiments reported in Kawai's PhD thesis, performance of a dexterous task does erode with motor cortex lesion, as seen in other studies, like the early rodent reaching work of Whishaw and colleagues.

      (2) Other possible explanations for the persistence of non-dexterous tasks following motor cortical removal are compensation by, or redundant functionality in, other motor system regions.

      (3) It is also worth noting that stimulation in different regions of mouse M1 and S1 alternately evokes digit, wrist, and elbow movements in fairly similar proportions (Tennant, 2011), suggesting that descending pathways substantially target spinal circuits that control all forelimb joints.

      (4) It also seems relevant that although the recovery time course is longer, nonhuman primates also retain substantial hand control after motor cortical removal (e.g. Lashley, 1925; Glees and Cole, 1950; Passingham et al., 1983). Humans, of course, appear to be a different story.

      These are good points. We have tried to make the Discussion better reflect the tension in the literature, including with this new text:

      “However, several other previous results have indirectly suggested that M1 and S1 may be involved in the details of forelimb movement. Performance suffers with inactivation or lesioning of M1 and S1 in skilled, complex manual behaviors (Guo et al 2015, Mizes et al 2024, Whishaw et al 1990) or idiosyncratic use of digits to accomplish non-dexterous tasks (Kawai 2014). The sparing of non-dexterous tasks with these lesions may also reflect redundancy in control as opposed to irrelevance of M1 and S1. Nevertheless, our finding of low-level kinematic information in sensorimotor cortex supports a role for cortex beyond simply providing redundant high-level commands to these subcortical areas.”

      We have avoided mentioning points 3 and 4 in the paper; the stimulation results might follow from activating projections not normally involved in this behavior, and discussing primates in this context would require a long list of caveats. We agree that these points are worth thinking about, but are concerned that they are too circumstantial to include in interpreting the results formally.

      Although similar decoding performance is achieved using neurons from both CFA and fS1, I am left wondering whether you would do substantially better with CFA using activity at additional preceding time points, or when using exclusively time points from the past. The primary model used here appears to use neural signals from corresponding time points to decode limb parameters, but results seemingly could be different when using preceding time points as regressors.

      We appreciate this suggestion and have added the analysis to an additional supplementary panel for Figure 5 (Figure 5S3). Incorporating lags into the decoder via a Wiener filter does indeed improve the decoding performance, but this could simply be due to the increase in the number of predictor variables. This analysis did not, however, further disambiguate M1-fl and S1-fl: the performance improvement was similar across areas for both causal and acausal lag configurations. This could be a consequence of the time resolution of calcium imaging, so further experiments with electrophysiology would be required to rule this possibility out. We now note this new result:

      “Including additional causal (-100 ms preceding) and/or acausal (-100 ms preceding to 100 following) lags improved decoding performance modestly and similarly for both areas (Fig. 5S3E-F).”
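
      For readers less familiar with lagged (Wiener-filter-style) decoding, the following is a minimal sketch of how such a design matrix could be built. It is not the authors' code: the function name `lagged_design`, the zero-padding at the edges, and the 10 ms bin width (taken from the kinematic resampling described in the Methods) are assumptions made for illustration.

```python
import numpy as np

def lagged_design(neural, lags):
    """Stack time-shifted copies of `neural` (time x neurons) side by side.

    A lag of -k pairs each time point with activity k samples in the past;
    a lag of +k pairs it with activity k samples in the future.
    Samples shifted in from outside the recording are zero-padded (assumption).
    """
    blocks = []
    for lag in lags:
        shifted = np.zeros_like(neural)
        if lag < 0:
            shifted[-lag:] = neural[:lag]   # row t holds activity at t - |lag|
        elif lag > 0:
            shifted[:-lag] = neural[lag:]   # row t holds activity at t + lag
        else:
            shifted[:] = neural
        blocks.append(shifted)
    return np.hstack(blocks)

# Hypothetical lag sets for a 100 ms window with 10 ms bins (assumed bin width).
bin_ms = 10
n_bins = 100 // bin_ms
causal_lags = list(range(-n_bins, 1))            # past and present only
acausal_lags = list(range(-n_bins, n_bins + 1))  # past, present, and future
```

      Because each added lag multiplies the number of predictors, a gain in decoding performance from such a matrix can reflect the larger model as much as any real temporal structure, which is the caveat raised in the response above.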

      Related to this, I am also worried about the bleeding of signals across time here. If you deconvolve and interpolate between time points, the interpolation seemingly will pull information into the past, up to half the sampling period, which here is on the order of how long it takes signals to travel to and from the limb. The authors do not make any inappropriate claims about the neural signals here reflecting causes or consequences of what is happening at the limb, but readers (like me) will still try to draw these sorts of conclusions. Is it possible that, although decoding from instantaneous signals is similar for the two regions, the M1 signals are actually motor signals related to future limb state while the S1 signals are sensory consequences? Even if many of the relevant details related to conduction times are not known, perhaps the authors could clarify what can and can't be said related to causal interpretation here.

      Thank you for suggesting further explanation here. We agree that our interpretation could be made more specific. We have added text in the Discussion section to speak more directly to what can and cannot be concluded from our analyses. In short, it is hard to be certain of lags in calcium imaging data for many reasons, and using recording methods with finer temporal resolution (like electrophysiology) will be necessary for determining the precise temporal relationships between kinematics and neural activity. In the absence of these recordings, we limit our claim to kinematic information being present in M1-fl and S1-fl neural activity and leave determining the causal role of this information to future work.

      New clarifying text in the Discussion:

      “The use of calcium imaging further prevents strong conclusions about whether activity reflects future limb states or sensory consequences. Confirming this limitation, inclusion of lagged data in the decoding models, whether causal or acausal, resulted in similar performance changes in both areas.”

      An alternative reason why lift onset is less decodable in CFA is that CFA activates substantially before lift onset, as has been observed in previous rodent studies (Kargo and Nitz, 2004; Miri et al., 2017; Veuthey et al., 2020), perhaps as some sort of movement preparation. S1, on the other hand, may not have this early activity, and so may show a clearer transient at onset when the hand and limb start to move. This seems more likely than the explanations provided by the authors.

      This is a valid possible alternative explanation and we have updated the Discussion to reflect this. This difference in the structure of M1-fl activity versus S1-fl is apparent in the projections of Figure 6A, which show M1-fl projections more clearly aligned to cue-onset than S1-fl projections.

      “Our lift time decoding results are consistent with this view and align with recent observations characterizing mouse proprioceptive forelimb cortex (Alonso et al 2023), although an alternative explanation may be simply that M1-fl activates earlier than S1-fl during reaching (Kargo and Nitz 2004; Miri et al 2017; Veuthey et al 2020).”

      To better clarify relevant similarities and differences between the rodent and primate systems, the Introduction could include some of these similarities and differences exposed by the literature currently cited, and the Discussion could include an additional paragraph specifically relating findings here to previous observations in the primate.

      We appreciate the reviewer’s thoughtfulness on possible framings of our results. When writing this paper, framing was a major challenge for us and we drafted quite a few versions of the Introduction including some that focused more on mouse-primate comparison. In the end, we decided the most critical function of the Intro was to set up our central question, of “levels-of-sensorimotor-control”. The rich primate literature was valuable here, but getting into a protracted compare-and-contrast exercise quickly became a distraction from the point. Further, we sought to highlight the relevance and importance of the question answered in our work as the mouse has gained prominence for filling gaps that are challenging to address with primates. This paper serves as one of many early steps towards the ultimate goal of revealing general properties of sensorimotor cortical function with the mouse model. We have made some subtle changes to the Introduction that we hope will more clearly communicate this narrative. 

      We agree that a Discussion paragraph directly relating our results to those in primates would benefit our conclusions and have added one:

      “These results expand our understanding of the rodent sensorimotor system and highlight similarities to nonhuman primates. We show here evidence in mice of detailed joint angle kinematic signals from the full forelimb in M1 and S1, as has been shown in macaque cortex during tasks involving reaching and grasping objects (Vargas-Irwin et al. 2010; Saleh et al. 2010, 2012; Goodman et al. 2019; Okorokova et al. 2020). Additionally, the earlier onset of movement-related activity in M1-fl compared to S1-fl is similar to macaque M1 and S1 (Tanji and Evarts 1976). Taken together these results suggest that the mouse can be employed to address questions traditionally explored in primates about how cortical activity encodes detailed movement commands.”

      Although this is outside the scope of the present study, it would be interesting to image descending projection neurons to see what signals are conveyed downstream, and to what targets. Some signals observed in layer 2/3 may not be strongly reflected in descending projections.

      We agree that recording from descending projection neurons in this task would be of deep interest – and also agree that these experiments are beyond the scope of the present study. We look forward to performing these additional experiments in future work.

      Minor:

      (1) The use of "CFA" and “fS1” is a bit confusing. S1, like M1, is defined primarily based on histological criteria, while CFA is defined by intracortical microstimulation. CFA contains a substantial fraction of fS1, seemingly most of it based on the maps shown in Tennant et al., 2011. This is not really a criticism, as the field has not reached any sort of consensus on this nomenclature yet.

      We are similarly unhappy with the inconsistency of the terminology in the field, and struggled with how not to make it worse. After much debate and consultation with colleagues, we decided to use “M1” and “S1” to evoke the century of literature on these areas; and “-fl” to indicate forelimb because it is more intuitive than “-ul” and avoids using the illegible “-ll” for hindlimb (relevant to our subsequent paper). For what we called M1-fl, we recorded where we did because anecdotally we saw similar responses across that swath; but note that this definition is also consistent with the definition of “MOp-ul” found with multimodal mapping by Munoz-Castaneda (2021), which extends a little anteriorly of MOp as defined by the Allen CCF. As the field continues to mature, we hope future work can converge on a set of shared terms.

      (2) Page 4: "Inactivations and lesions of M1 and S1 have shown that M1 is required for the execution of dexterous reach-to-grasp movements" - to me, earlier work from Whishaw and colleagues deserves to be cited here.

      We appreciate the suggestion and have updated the references in this section to better reflect the prior work from Whishaw and other researchers.

      (3) Page 5: "evoking sufficient trial-to-trial variability to avoid model overfitting." - what I think the authors are referring to here is a particular kind of "overfitting," the consequence of not exploring the full movement space, as opposed to model overfitting from issues with the model-fitting method itself. Rather than just saying overfitting, the authors could be clearer about what they are referring to.

      The reviewer is right; the phenomenon we intended to refer to is not properly termed overfitting. Specifically, we meant that data with restricted range does not necessarily express global structure, and models can therefore incorrectly fit them. For example, fitting a linear model to data including many periods of a sine wave will correctly show a zero-slope linear component, but fitting to only a portion of a single cycle will typically yield a nonzero slope. This is not overfitting, is not exactly underfitting (because the relevant structure is barely present in the data, as opposed to missed by an insufficiently powerful model), is not bias (the data are fit well), and is not even necessarily a problem (the local relationship may be what you are interested in). Yet, it does not reflect the larger structure of the data.

      We do not know of a standard term for this phenomenon, so instead of dragging the reader through this tangential argument, we have tried to offer a simpler motivation for using multiple targets:

      “Assessing the relationship between neural activity and the details of movement requires striking a balance between achieving repeatable behavior and evoking sufficient trial-to-trial variability to broadly sample movement space”.
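
      The sine-wave example given in the response above can be checked numerically. The sketch below is only an illustration of that argument, with hypothetical helper names (`ols_slope`, `sine_slope`) and arbitrarily chosen sampling ranges: it fits a straight line to a sine wave sampled over many full periods, and then to a sample restricted to part of one rising phase.

```python
import math

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def sine_slope(x_start, x_stop, n_points=2001):
    """Slope of the best-fit line to sin(x) sampled evenly on [x_start, x_stop]."""
    step = (x_stop - x_start) / (n_points - 1)
    xs = [x_start + i * step for i in range(n_points)]
    ys = [math.sin(x) for x in xs]
    return ols_slope(xs, ys)

# Broad sampling: 20 full periods, so the linear component is close to zero.
slope_many_cycles = sine_slope(0.0, 20 * 2 * math.pi)

# Restricted sampling: part of one rising phase, so the fit reports a clear slope.
slope_partial_cycle = sine_slope(-math.pi / 4, math.pi / 4)
```

      The restricted fit is accurate locally but says nothing about the periodic structure, which is the motivation for sampling movement space broadly with more than one target.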

      (4) Page 5: Caudal Forelimb Area should not be capitalized.

      Obviated with the change in area nomenclature.

      (5) Page 7: "of linearly independent degrees of freedom" - for a neuroscience audience, I think it is better to explicitly mention that the resulting PCs are uncorrelated.

      We agree that this section could benefit from clarification. We have attempted to provide additional nuance to indicate what the analysis was intended to test.

      “Despite the strong coupling between the proximal and distal joint angles, rich variation remained in the action of different joints over time. The presence of strong correlations across joints suggested that the kinematics may be well described by a smaller number of independent degrees of freedom than the total number of recorded angles. To assess the number of linearly independent (uncorrelated) degrees of freedom amongst the 24 joint angles and velocities, we used double-cross-validated PCA (Yu et al. 2009; Methods; Fig. 3D), finding intermediate dimensionalities of 7 (median for joint angles) and 10 (velocities; Fig. 3E). This is consistent with the idea that joint angles across the limb are coordinated instead of controlled independently, and that this coordination is flexible enough over time to enable accurately performing reaching and grasping to different targets.”

      (6) Page 7: In the Results, the authors should mention what indicator is being used, the imaging frame rate, and summarize briefly how cells were defined.

      Thank you for the suggestion; these details have been added to the relevant results section for clarity.

      “To do so, we recorded neural activity from neurons in layer 2/3 M1-fl extending into the immediately adjacent secondary motor cortex (M2), and the forelimb region of S1 (S1-fl) using two-photon calcium imaging of GCaMP6f-expressing neurons in layer 2/3 (185-230 μm deep, imaged at 31 Hz, cells extracted with Suite2p (Pachitariu et al 2017)).”

      (7) Page 7: "corrected at n=2" - n doesn't typically refer to the number of tests, so for clarity I would say "corrected for dual tests."

      Thank you for pointing this out; we have corrected the text and added additional explanation in the methods for our approach to determining statistical significance across the targets and locking events.

      “P-values obtained through the ZETA were then Bonferroni corrected for dual tests when measuring the number of cells modulated to a given event and corrected for six tests (2 targets and 3 events) when measuring the overall number of modulated cells.”
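
      As a concrete illustration of the correction described in this passage, the sketch below applies a Bonferroni adjustment to the p-values of a single ROI. The p-values, the `alpha` of 0.05, and the function names are hypothetical and are not taken from the manuscript.

```python
def bonferroni(p_values, n_tests=None):
    """Bonferroni-adjust p-values: multiply by the number of tests, cap at 1."""
    if n_tests is None:
        n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]

def is_modulated(p_values, alpha=0.05):
    """Call an ROI modulated if any adjusted p-value falls below alpha (assumed threshold)."""
    return any(p < alpha for p in bonferroni(p_values))

# Hypothetical ROI, one event, two targets: corrected for dual tests.
single_event = [0.01, 0.04]

# Same hypothetical ROI, three events x two targets: corrected for six tests.
all_events = [0.01, 0.04, 0.20, 0.35, 0.50, 0.60]
```

      With these made-up values the ROI passes the dual-test criterion (0.01 becomes 0.02) but not the six-test criterion (0.01 becomes 0.06), showing how the number of tests entering the correction changes which cells are counted.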

      (8) Page 7: In the Results, when the decoding is introduced, it would be helpful to have a few details without having to hunt through the Methods. For example, were things regularized, how was cross-validation handled, etc?

      Thank you for the suggestion; these details have been added to the relevant results section for clarity.

      A simple linear regression model related the single-trial joint angles at all time points to single-trial neural activity at the corresponding moments. The model was fit with ridge regression, the ridge penalty was determined via a heuristic (Karabatsos 2018), and performance was measured on held-out trials (80/20 train/test split, 50 folds).
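
      A minimal sketch of this kind of decoder is given below for orientation. It is not the authors' implementation and makes several assumptions: the ridge penalty is a fixed placeholder rather than the Karabatsos (2018) heuristic, the “50 folds” are treated as 50 random 80/20 trial splits, performance is summarized as held-out R² per joint angle, and all function and variable names are hypothetical.

```python
import numpy as np

def ridge_fit(X, Y, penalty):
    """Closed-form ridge regression weights with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    gram = Xc.T @ Xc + penalty * np.eye(X.shape[1])
    W = np.linalg.solve(gram, Xc.T @ Yc)
    return W, y_mean - x_mean @ W

def heldout_r2(X_trials, Y_trials, penalty=1.0, n_folds=50, test_frac=0.2, seed=0):
    """Mean held-out R^2 per output over repeated random trial splits.

    X_trials: (trials, time, neurons); Y_trials: (trials, time, angles).
    Whole trials are held out, so test time points never share a trial with training data.
    """
    rng = np.random.default_rng(seed)
    n_trials = X_trials.shape[0]
    n_test = max(1, int(round(test_frac * n_trials)))
    scores = []
    for _ in range(n_folds):
        order = rng.permutation(n_trials)
        test, train = order[:n_test], order[n_test:]
        # Concatenate all time points from the selected trials into sample rows.
        X_tr = X_trials[train].reshape(-1, X_trials.shape[2])
        Y_tr = Y_trials[train].reshape(-1, Y_trials.shape[2])
        X_te = X_trials[test].reshape(-1, X_trials.shape[2])
        Y_te = Y_trials[test].reshape(-1, Y_trials.shape[2])
        W, b = ridge_fit(X_tr, Y_tr, penalty)
        pred = X_te @ W + b
        ss_res = ((Y_te - pred) ** 2).sum(axis=0)
        ss_tot = ((Y_te - Y_te.mean(axis=0)) ** 2).sum(axis=0)
        scores.append(1.0 - ss_res / ss_tot)
    return np.mean(scores, axis=0)
```

      Splitting by trial rather than by time point is the design choice that matters here: neighboring time points within a trial are strongly correlated, so a split by time point would tend to overstate performance.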

      (9) Page 8: I think it is worth noting how much mouse reaching involves shoulder rotation as opposed to movement in other joints, as this seems very different from primates.

      Thank you for pointing this out. We think this is mostly a task difference: our mice were in a quadrupedal stance, whereas monkeys are typically asked to reach from a sitting position. We now mention this in the Results. 

      “Reaching evoked particularly large rotation of the shoulder, likely because the mice reached from a quadrupedal position to targets on either side of the snout.”

      (10) Page 8: Should provide quantification to clarify what is meant by "closely tracked."

      We have updated the text to indicate that this claim was meant to be qualitative, and to more clearly highlight that the interest here is the first demonstration of the ability to reconstruct valid forelimb postures from decoded joint angles in the mouse. Quantifying the reconstruction properly would require substantially more manual data labeling, and the successful decoding itself demonstrates indirectly that the reconstructions are good enough to obtain the results of interest.

      Additionally, we reconstructed the skeletal representation of the forelimb from the decoded joint angles and found that, as intended, the reconstructed postures had strong qualitative resemblance to the true postures, even of “minor” angles like cylindrical paw deformation or digit splay (Fig. 5C,G).

      (11) Page 8: "Overall, these results suggest that instantaneous movement-related signals are similarly distributed across CFA and fS1." - I know we are being succinct here, but this sentence sounds like a non sequitur in the context of this paragraph - perhaps include a conclusion from the results in this paragraph first, then summarize the whole section.

      Thank you for the suggestion; we have updated this text to more clearly conclude the results of this section.

      Overall, these results reveal that neural activity in M1-fl and S1-fl is closely related to the kinematic details of reach-to-grasp movements. The ability to decode substantial variance in proximal and distal joints suggests that this relationship extends to the entire forelimb and the similar performance obtained from each area suggests that this information is similarly distributed across M1-fl and S1-fl. 

      (12) Page 10: Mention of projections from fS1 does not explicitly specify their preferential targeting of the dorsal horn, which seems relevant.

      We appreciate the suggestion and have added this detail to the text.

      Rodent S1-fl is known to influence interneuron populations in the spinal cord through direct and indirect projections that predominantly target the dorsal horn (Ueno et al. 2018), thus these signals may also reflect S1-fl’s important role in modulating reflex circuits to coordinate sensory feedback with movement generation (Moreno-López et al. 2016; Moreno-Lopez et al. 2021; Seki et al. 2003).

      (13) Page 31: Labels on the figure indicating what blue and red stand for would be helpful.

      Thank you for the suggestion; labels have been added to indicate left and right trials for Figure 5 C/F and Figure 6A.

      (14) Page 32: Legend does not include panel D.

      Thank you for catching this; the corresponding caption has been added.

      Reviewer #2 (Recommendations for the authors):

      (1) The Introduction could perhaps set the central question in starker relief. What specifically do the authors mean by high- vs low-level control? As suggested by the cited studies, this has been a fraught issue in primate work for decades, and I think a finer-grained framing of alternative hypotheses would help set up the results. For example, would better performance at decoding joint angles than paw position be evidence for lower-level control? The clarity of the Introduction might also be improved if the facts and unknowns were broken down by species throughout.

      We have tried to further improve the focus of the Introduction on the central question, clarify what we mean, and make clearer in the review of the literature which species a finding comes from.

      The clarifying text from the introduction is quoted below:

      Extensive motor mapping experiments in rodents have revealed that activating different parts of the sensorimotor cortex evokes movements of different body parts or different kinds of movements of the same body part, as it does in primates (for review, see (Harrison and Murphy 2014)). Yet it is unclear how the topography of stimulation-evoked movements relates to the roles of these areas during volitional actions. Perturbations during behavioral tasks in mice involving forelimb lever or reaching movements have provided a coarse-level understanding of how these areas contribute during behavior. Inactivations and lesions of M1 and S1 have shown that M1 is required for the execution of dexterous reach-to-grasp movements (Guo et al. 2015; Sauerbrei et al. 2020; Galiñanes et al. 2018; Wang et al. 2017; Whishaw et al. 1991; Whishaw 2000) and that S1 is essential for adapting learned movements to external perturbations of a joystick (Mathis et al. 2017). However, spinal cord projections from mouse M1 and S1 primarily target spinal interneurons rather than directly synapsing onto motor neurons (Gu et al. 2017; Ueno et al. 2018; Wang et al. 2017), suggesting cortical activity might play a more modulatory role. Further, stimulation of brainstem nuclei alone can evoke naturalistic forelimb actions, including realistic reaching movements involving coordinated flexion and extension of the proximal and distal limb (Esposito et al. 2014; Ruder et al. 2021; Yang et al. 2023). Taken together, these results have raised the question of what role mouse M1 and S1 play in the control of goal-directed forelimb movements. 

      One route to answering this question involves characterizing the signals present in mouse M1 and S1 during movement. If mouse M1 and S1 were to control only high-level aspects of forelimb movements, activity should be dominated by ‘abstract’ signals like target location and reflect little trial-to-trial variability in reach kinematics. If instead M1 and S1 control low-level movement features then activity should correlate strongly with forelimb joint angle kinematics and their trial-to-trial variation when reaching to different targets. While the presence of high- or low-level signals in a cortical area does not necessarily imply that they are causally responsible for these aspects of movement, characterizing what signals are present serves as a first step toward determining how these areas relate to movement.

      (2) The kinematics and calcium traces appear to be highly stereotyped across trials. If the population encodes joint angles, would one expect to find correlations between the neural and kinematic residuals after subtraction of the time-varying means? Some additional analysis and/or discussion on this point would be helpful, especially as there are only two targets.

      This is a great idea. As suggested, we implemented regression models on the residuals for each target in the new Figure 5S3. Figure 5S3 A and B show the performance when decoding the residuals for right trials and C and D show performance for left trials. Decoding remained well above chance, despite shrinking down due to predicting this relatively small within-target variation. This analysis supports our claims from the main regression models in Figure 5 and 5S1-2, and also suggests that movements ipsilateral to the reaching limb (contralateral to the recording hemisphere) may be better encoded than movements contralateral to the reaching limb. We have added a reference to this additional residual analysis in the final paragraph of the decoding section of the Results section:

      “Finally, we tested whether the ability to decode these many joint angles was a direct consequence of inter-joint correlations, and might not be indicative of the presence of “real” information about some of these joints. To do so, we fit partial correlation models that removed correlations between proximal and distal joints, or removed correlations of the joint angles with a high-level parameter – the overall distance of the paw centroid to the spout. Despite substantially lowering the behavioral variance, in each case the residuals could still be decoded from neural activity (Fig 5S2A-D). Similar decoding performance for M1-fl and S1-fl was obtained from models fit to decode single-trial residuals separately for left and right trials (Fig 5S3A-D), indicating that trial-to-trial variations on each basic movement were decodable from these populations.”
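
      The residual analysis requested by the reviewer amounts to removing each target's time-varying mean before decoding. A minimal sketch of that step follows, assuming trials are time-aligned and of equal length; the function name and array layout are hypothetical and not taken from the manuscript.

```python
import numpy as np

def within_target_residuals(data, targets):
    """Subtract the trial-averaged, time-varying mean of each target.

    data: (trials, time, features) array of neural activity or joint angles.
    targets: (trials,) array of target labels (e.g. 0 = left, 1 = right).
    Returns an array of the same shape holding trial-to-trial deviations only.
    """
    residuals = np.empty_like(data, dtype=float)
    for target in np.unique(targets):
        idx = targets == target
        target_mean = data[idx].mean(axis=0, keepdims=True)  # (1, time, features)
        residuals[idx] = data[idx] - target_mean
    return residuals
```

      Applying the same function to both the neural and the kinematic arrays, and then fitting the decoder to the residuals, tests whether trial-to-trial variation around each stereotyped movement is still decodable.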

      Along similar lines, binary classification is used to characterize cue-, lift-, and contact-responsive neurons. Is it possible to exploit trial-to-trial variation in the cue-lift and lift-contact latencies to extract the time-varying marginal effects of each event (e.g., using a GLM)?

      For the detection of single-cell modulations by different events, we have elected to retain our simple statistical test to determine modulation; in our experience, encoding models typically involve a surprising number of steps to get them to do what you actually intend. We leave more extensive encoding model-style analysis to future work, currently in progress.

      (3) The authors mention prior studies suggesting that the control of some forelimb tasks can be gradually transferred from the cortex to the subcortical centers. Have they performed the inactivation at different time points across learning, and if so, do they have evidence for a diminishing effect over time (e.g., blocking of both initiation and coordination early in training)? In addition, the effects of motor cortex inactivation are similar to, but slightly different from, effects shown in reaching tasks in prior studies. Some additional discussion on this point would be useful.

      Our inactivation experiments in this study were intended to coarsely demonstrate the involvement of mouse forelimb sensorimotor cortex in our task. We have not performed the inactivations over learning and leave such experiments to future work. 

      We agree that a little more clarity relating our results to previous ones was warranted. Previous studies (Guo et al. 2015 and Galinanes et al. 2018) have demonstrated inactivation impacts on similar tasks, but for thoroughness we sought to show the same for our task as it varied from the pellet and motorized water spout tasks in both training time and target configurations. Our results are strongly in line with those of Galinanes et al. 2018 which used a fairly similar water spout target configuration. In the inactivation experiments of that paper, 3 out of 13 animals with initiation-triggered inactivations were able to initiate reaching within a time window similar to control trials. Additionally, a proportion of trials across multiple mice proceeded with little perturbation from the inactivations. This is consistent with our observation that M1-fl inactivations may either abolish movement initiation or allow movement initiation but impair task completion on a trial-by-trial and animal-to-animal basis. Further work is required to determine what factors influence these differential responses to inactivation and to determine how these effects differ across task variations (i.e., pellet vs water spout). We have added a brief description of these nuances to the text for clarity. 

      “These inactivations blocked the execution of the reach to grasp sequence, preventing the animal from making contact with the spout during the 3-second laser stimulation period (Fig. 1F; 86.5% control trials with contact within 3 seconds of cue, 5.1% inactivation trials with contact, P < 10⁻¹⁹¹, Mann-Whitney U test, 2 mice, 495 stimulation trials). Interestingly, inactivation at the time of cue often did not prevent reach initiation (mouse 1: 54.7%, mouse 2: 34.2% of inactivation trials with lift within 3 seconds; 93.5%, 86.2% control trials). Yet the movement stalled once the paw and digits extended towards the spout, producing uncoordinated and unsuccessful reaching trajectories (Fig. 1I, two representative datasets). Taken together, these results support the involvement of M1-fl in the water-reaching task and suggest that the strength of inactivation effects may depend on specific task details like training time or target configuration (c.f. Galinanes et al. 2018).”

      Minor points

      (1) The rationale for the multiple comparisons procedure in identifying event-locked responses should be explained in more detail. If I understand correctly, the authors are not correcting for comparisons across ROIs, but instead control the family-wise error rate across brain regions and event types (dividing alpha by two or six). Why not instead control the false discovery rate across ROIs? 

      Thank you for pointing this out; it was confusing as written and we received a similar comment from Reviewer 1. We have fixed the wording now to make it clearer why we did this. We simply aimed to describe how many of the recorded neurons in each area were modulated by the task as a proxy for the engagement of these areas during the behavior, and to use this measure of modulation as a criterion for including the neuron in subsequent analysis. In other words, if the question had been “are any neurons in this area modulated by the task?” then correcting for the number of ROIs would be the correct method; but if the question is, “is this neuron probably modulated and therefore worth including in my decoder?” correcting for the number of ROIs will typically be much too conservative. Thus, we only sought to correct for multiple comparisons across events and targets for each ROI. We have added additional text in the methods to clarify these choices, below. Please also see response to (7) from Reviewer 1 above.

      “Note that we did not correct for the number of ROIs tested for two reasons. First, the goal of this testing was to serve as a criterion for inclusion in subsequent decoding analyses, not to determine whether any neurons in the area at all were modulated; and second, correcting for the number of ROIs would bias comparison between areas if different numbers of ROIs were recorded in one area vs. the other.”
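
      The inclusion criterion described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a Bonferroni threshold (alpha divided by the number of event/target tests) applied within each ROI and no correction across ROIs. The function name and the p-values are hypothetical.

```python
def modulated_rois(p_values, alpha=0.05):
    """Return indices of ROIs with at least one significant test.

    p_values: list of lists, one inner list per ROI holding the p-values
    for each event/target comparison run on that ROI (hypothetical layout).
    """
    selected = []
    for roi_index, roi_p in enumerate(p_values):
        # Assumption: Bonferroni threshold within this ROI only
        # (alpha / number of tests), with no correction across ROIs.
        threshold = alpha / len(roi_p)
        if any(p < threshold for p in roi_p):
            selected.append(roi_index)
    return selected


# Hypothetical p-values: 3 ROIs, 6 tests each.
example_p = [
    [0.20, 0.004, 0.60, 0.90, 0.35, 0.11],  # one test below 0.05 / 6
    [0.03, 0.02, 0.40, 0.50, 0.70, 0.80],   # below 0.05 but not below 0.05 / 6
    [0.30, 0.45, 0.60, 0.75, 0.90, 0.99],   # nothing significant
]
print(modulated_rois(example_p))
```

      Because the threshold depends only on the number of tests per ROI, recording more ROIs in one area does not make the criterion stricter there, which is the second reason the authors give.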

      (2) It appears joint angles are treated as linear variables in the decoding analysis; is this correct? This seems reasonable as long as the range of motion is not too large, but the authors might briefly comment on the issue in the Methods. 

      Yes, all joint angles are treated as linear variables in the linear regression model. We observed empirically (as can be seen in Figure 3B and Figure 5B/F) that the joint angle variables were relatively constrained to specific ranges during the task, with no angles displaying substantial wrap-around during the reaching and grasping movements. It is true that use of nonlinear decoding would almost surely improve performance further. Future work could also compare decoding of joint angles with muscle forces, which correlate and which we made no effort to distinguish here. In this work, though, the demonstration of a substantial relationship between neural activity and kinematics already tells us that fine details of movement are present in the M1 and S1-fl populations, which is a critical fact to understand these areas and was not previously known. We now comment explicitly on this, as suggested.

      “Joint angle or velocity kinematics were linearly interpolated from their original 6.66 ms to 10 ms and smoothed with a Gaussian (15 ms s.d.). These angular variables were then treated linearly in decoding analyses as their ranges were relatively constrained during the reaching and grasping movements; although the true relationships are likely nonlinear, this serves as a sufficient approximation to demonstrate the presence of a relationship between neural activity and kinematics.”
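
      The preprocessing described in this passage (resampling from 6.66 ms to 10 ms, then Gaussian smoothing with a 15 ms s.d.) might look like the sketch below. It uses only the Python standard library. The function names, the kernel truncation at 4 s.d., and the edge renormalization are assumptions made for illustration, not details taken from the manuscript.

```python
import math


def resample_linear(values, dt_in, dt_out):
    """Linearly interpolate evenly sampled values onto a new sample interval."""
    duration = (len(values) - 1) * dt_in
    n_out = int(math.floor(duration / dt_out + 1e-9)) + 1
    out = []
    for k in range(n_out):
        pos = k * dt_out / dt_in  # fractional index into the input
        i = min(int(math.floor(pos)), len(values) - 2)
        frac = pos - i
        out.append(values[i] * (1.0 - frac) + values[i + 1] * frac)
    return out


def gaussian_smooth(values, dt, sigma, truncate=4.0):
    """Smooth with a Gaussian kernel (sigma in the same units as dt).

    Assumption: near the edges the kernel is renormalized over the
    samples that exist, rather than padding the signal.
    """
    half = int(math.ceil(truncate * sigma / dt))
    offsets = range(-half, half + 1)
    weights = [math.exp(-0.5 * ((j * dt) / sigma) ** 2) for j in offsets]
    out = []
    for k in range(len(values)):
        total = 0.0
        norm = 0.0
        for j, w in zip(offsets, weights):
            idx = k + j
            if 0 <= idx < len(values):
                total += w * values[idx]
                norm += w
        out.append(total / norm)
    return out


# Hypothetical joint-angle trace sampled every 6.66 ms (a simple ramp).
angle_raw = [i * 6.66 for i in range(16)]
angle_10ms = resample_linear(angle_raw, dt_in=6.66, dt_out=10.0)
angle_smooth = gaussian_smooth(angle_10ms, dt=10.0, sigma=15.0)
```

      With a 10 ms sample interval, a 15 ms s.d. corresponds to 1.5 samples, so the smoothing is mild relative to the duration of a reach.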

      (3) Are the limb pose estimates mirrored along the mediolateral axis? Figures 1C and 2D appear to show reaches to the left spout on the animal's right.

      Thank you for pointing out the ambiguity in the display of these data. The reach trajectories were not mirrored along the mediolateral axis, but they are displayed from the perspective of the behavioral imaging cameras as shown in Figure 1A. Thus the right target reaches (ipsilateral to the animal’s reaching arm) are on the left side of the camera image and the left target reaches (contralateral to the animal’s reaching arm) are on the right side of the image. We have clarified this in the figure captions.

    1. eLife Assessment

      This important study uses an original method to address the longstanding question of why reaching movements are often biased. The combination of a wide range of experimental conditions and computational modeling is a strength. Convincing evidence is presented in support of the main claim that most of the biases in 2-D movement planning originate in misalignment between visuo-proprioceptive reference frames.

    2. Reviewer #1 (Public review):

      Wang et al. studied an old, still unresolved problem: Why are reaching movements often biased? Using data from a set of new experiments and from earlier studies, they identified how the bias in reach direction varies with movement direction and movement extent, and how this depends on factors such as the hand used, the presence of visual feedback, the size and location of the workspace, the visibility of the start position and implicit sensorimotor adaptation. They then examined whether a target bias, a proprioceptive bias, a bias in the transformation from visual to proprioceptive coordinates and/or biomechanical factors could explain the observed patterns of biases. The authors conclude that biases are best explained by a combination of transformation and target biases.

      A strength of this study is that it used a wide range of experimental conditions with also a high resolution of movement directions and large numbers of participants, which produced a much more complete picture of the factors determining movement biases than previous studies did. The study used an original, powerful and elegant method to distinguish between the various possible origins of motor bias, based on the number of peaks in the motor bias plotted as a function of movement direction. The biomechanical explanation of motor biases could not be tested in this way, but this explanation was excluded in a different way using data on implicit sensorimotor adaptation. This was also an elegant method as it allowed the authors to test biomechanical explanations without the need to commit to a certain biomechanical cost function.

      Overall, the authors have done a good job mapping out reaching biases in a wide range of conditions, revealing new patterns in one of the most basic tasks, and the evidence for the proposed origins is convincing. The study will likely have substantial impact on the field, as the approach taken is easily applicable to other experimental conditions. As such, the study can spark future research on the origin of reaching biases.

      Comments on revisions:

      The authors have addressed my concerns convincingly. The inclusion of the data on movement extent, and the comparison with the data and explanation of Gordon et al. (1994), has strengthened the paper, as it shows that the proposed model can also explain biases in movement extent. I also appreciate the addition of the mathematical analysis, although I suspect that this analysis can be developed further to yield more detailed insights into the conditions under which the 1-, 2- and 4-peaked patterns arise, but that is a more suitable question for follow-up work.

    3. Reviewer #2 (Public review):

      Summary:

      This work examines an important question in the planning and control of reaching movements: where do biases in our reaching movements arise, and what might this tell us about the planning process? The authors compare several different computational models to explain the results from a range of experiments, including those within the literature. Overall, they highlight that motor biases are primarily caused by errors in the transformation between eye and hand reference frames. One strength of the paper is the large number of participants studied across many experiments. However, one weakness is that most of the experiments follow a very similar planar reaching design, with slicing movements through targets rather than stopping within a target. This is partially addressed with Exp 4. This work provides a valuable insight into the biases that govern reaching movements. While the evidence is solid for planar reaching movements, further support in the manner of 3D reaching movements would help strengthen the findings.

      Strengths:

      The work uses a large number of participants both with studies in the laboratory which can be controlled well and a huge number of participants via online studies. In addition, they use a large number of reaching directions allowing careful comparison across models. Together these allow a clear comparison between models which is much stronger than would usually be performed.

      Comments on revisions:

      I thank the authors for all the additions to the manuscript, which have addressed my concerns.

    4. Reviewer #3 (Public review):

      This study makes excellent use of a uniquely large dataset of reaching movements collected over several decades to evaluate the origins of systematic motor biases. The analyses convincingly demonstrate that these biases are not explained by errors in sensed hand position or by biomechanical constraints, but instead arise from a misalignment between eye-centric and body-centric representations of position. By testing multiple computational models across diverse contexts (including different effectors and visible versus occluded start positions), the authors provide strong evidence for their transformation model. My earlier concerns have been addressed, and I find the work to be a significant and timely contribution that will be of broad interest to researchers studying visuomotor control, perception, and sensorimotor integration.

      Comments on revisions:

      None

    5. Author response:

      The following is the authors’ response to the previous reviews

      General recommendations (from the Reviewing Editor):

      The reviewers agreed that addressing some specific concerns would improve the clarity of the paper and the strength of the conclusions. These points are listed below, and described in more detail in the reviewer-specific 'Recommendations for Authors':

      We thank the editor and reviewers for the encouraging feedback and constructive comments. We provide our point-by-point response below.

      (1) The details of the new experiment including number of subjects and a description of the analysis should be provided in the main text.

      We now provide a detailed description of the methods (including the number of subjects; N = 30) and analyses for the new experiment. See our response to Reviewer 2 for more details.

      (2) It would be informative to see how the amplitude biases observed agree with those found by Gordon et al. 1994.

      Addressed. Please see our response to Reviewer 1, comment 1.

      (3) Each of the models leads to a different bias pattern. It would be very helpful to hear the authors' interpretation, ideally with a mathematical explanation, of what leads to these distinct patterns.

      Addressed. Please see our response to Reviewer 1, comment 2.

      Reviewer #1 (Recommendations for the authors):

      (1) Most of my points have been addressed convincingly in this revision. The new experiment, in which biases in movement amplitude were also determined, is a welcome addition to the paper. However, I could not see the results of this study, as the authors did not include Fig. 4 in the manuscript, but repeated Fig. 3. That's unfortunate as I would have liked to see the similarity between the biases in direction and amplitude. Moreover, I would have liked to see how the amplitude biases agree with those found by Gordon et al. EBR (1994) 99:112-130, and to what extent Gordon et al.'s explanation can explain the pattern.

      We apologize for including the incorrect figure in the previous version of our manuscript. We did make a correction and submitted a corrected version, but it appears that it didn’t make its way to you. The correct Figure 4 is now in the manuscript.

      The motor biases in amplitude (extent) observed in Experiment 4 (Author response image 1) are qualitatively similar to the pattern reported by Gordon et al. 1994. While the exact peaks do not match perfectly, both datasets show a two-peaked pattern.

      Gordon et al. (1994) attributed the bias in amplitude to direction-dependent variation in movement speed which, in their view, arises from anisotropies in limb inertia. Specifically, moving the upper arm along its quasiorthogonal direction (i.e., rotation about the elbow) requires lower effective inertia than moving parallel to the upper-arm axis. Given the arm posture in both datasets, the upper limb points toward ~135°/315°, with the orthogonal direction corresponding to ~45°/225°. The two-peaked speed profiles in both our data (Author response image 1) and Gordon et al. are consistent with this prediction.

      Author response image 1.

      Gordon et al. (1994) noted that, while the extent bias function should mirror the speed bias function, the motor planning system might proactively compensate for the speed bias. Indeed, while the extent and speed bias functions are roughly aligned in their study, the two are misaligned in our Experiment 4. For example, the speed function peaks around 45°, which corresponds to a valley in the extent bias function. The difference between their data and ours could be due to a difference in the starting point configuration. However, their model predicts alignment of the speed and extent functions independent of starting point configuration. In contrast, the TR+TG model does predict our observed extent bias function and yields predictions about how this should change with different start point configurations. As such, while heterogeneity in movement speed may contribute to extent bias to some degree, we think the transformation bias and visual-target bias likely play a larger role in determining the extent bias observed at movement endpoint.

      We have added a discussion section about the bias function reported by Gordon et al. (1994) and their account in the manuscript (lines 482-493). We do not repeat it here, as the content largely overlaps with the response above.

      (2) One of the most important new insights from this study is that the three single-source models lead to different bias patterns, with 1, 2 or 4 peaks. However, what I miss in the paper is an intuitive explanation why they do so. Now, the models are described and their predictions are shown, but it remains unclear where these distinct patterns come from. As scientists, we want to understand things, so I would very much appreciate if the authors can provide such an intuitive explanation, for instance using a mathematical proof. That could also identify how general these patterns are, or if there are certain requirements for them to occur (such as a certain shape of the transformation bias).

      Note that the closed-form mathematical expression for the motor bias function is not straightforward. As such, the intuition comes primarily from inspection, that is, from the model simulations themselves, which we show in Figure 1 of the paper. Importantly, the model predictions are insensitive to the parameter values over a reasonable range. Thus, the number of peaks predicted by each model is a core distinguishing feature. We present in the Supplementary Results a formalized mathematical analysis to illustrate how different models produce different numbers of peaks in the movement-bias function.

      (3) I think it's a good idea to change the previous "Visual Bias" into a "Target Bias". This raises the question whether the "Proprioceptive Bias" should not be changed into a "Hand Bias" or "Start Bias"?

      While we appreciate the reviewer’s point here, we prefer the term “Proprioceptive Bias” given that this term has been used in the literature and provides a contrast with sources of bias arising from vision. “Hand Bias” and “Start Bias” seem more ambiguous.

      L51: I think "would fall short" should be replaced by "would overshoot".

      L127: I think "biased toward the vertical axis" should be replaced by "biased away from the vertical axis".

      Figure 3 still contains the old terminology like T+V. Please replace by the new terminology.

      L255: Replace "Exp 1a" by "Exp 1b".

      L376: Replace 60 by 6.

      L831-2: I hope the summed LL was maximized, not minimized.

      Thanks for catching the typos. We have corrected all of them.

      Reviewer #2 (Recommendations for the authors):

      I think that Experiment 4 does not mention how many participants performed the study. (Only in the response to the reviewers I found this)

      We have added information regarding the number of participants to Fig. 4 (N = 30).

      I am very happy that the authors added the biomechanical simulation into the paper. I am not convinced that this addressed my concerns exactly but it is an excellent addition and the authors have now adjusted the text appropriately.

      We appreciate the positive response to our additional assessment of biomechanical factors. We welcome any additional information on how we might fully address this issue.

      line 826: extend -> extent

      Corrected.

      Figure 4. I think that the authors have put the wrong figure here. I cannot see any data for extent. I would need to see this figure (or please correct me), but the caption doesn't match the figure and I don't see the results clearly. (I think the review might have the correct figure.)

      We apologize for this mistake. We have now provided the correct Figure 4 in the paper (also included on the first page of the response letter).

      I am missing the detailed description on when the direction error and distance error were calculated for exp 4 - and what exactly was used? How did the authors examine the values without correction? What time point was used? Did I miss the analysis section for this?

      Participants were instructed to make fast, straight movements without any corrections and were given up to 1 s to complete the movement. Hand position was recorded once the movement speed dropped below 1 cm/s. On 99.8% of trials, movement speed did not increase once this threshold was passed, indicating that the participants adhered to the instructions. On the remaining trials, we detected a secondary corrective movement (increase in speed >5 cm/s). On these trials, we used the position recorded when the movement speed initially dropped below 1 cm/s as the endpoint position. The pattern of results would be the same were we to exclude these trials.

      This information has been added to the Methods section (lines 661-666).
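
      The endpoint rule described in this response can be sketched as below. This is an illustrative reading, not the authors' analysis code: it assumes movement onset is the first sample at or above 1 cm/s, and it interprets "increase in speed >5 cm/s" as speed later rising above 5 cm/s. The function name and speed traces are hypothetical.

```python
def find_endpoint(speed, stop_threshold=1.0, correction_threshold=5.0):
    """Return (endpoint_index, has_correction) for one trial's speed trace in cm/s.

    The endpoint is the first sample, after the movement has started, at which
    speed falls below stop_threshold. The trial is flagged if speed later rises
    above correction_threshold. Returns (None, False) if no endpoint is found.
    """
    moving = False
    for i, s in enumerate(speed):
        if not moving:
            # Assumption: onset is the first sample at or above the stop threshold.
            moving = s >= stop_threshold
        elif s < stop_threshold:
            has_correction = any(v > correction_threshold for v in speed[i + 1:])
            return i, has_correction
    return None, False


# Hypothetical speed traces (cm/s), one value per sample.
clean_trial = [0.2, 3.0, 20.0, 35.0, 18.0, 4.0, 0.6, 0.3, 0.1]
corrected_trial = [0.2, 3.0, 20.0, 35.0, 18.0, 4.0, 0.6, 2.0, 7.5, 3.0, 0.4]

print(find_endpoint(clean_trial))
print(find_endpoint(corrected_trial))
```

      In both traces the endpoint is taken at the first drop below 1 cm/s, so a later corrective movement changes only the flag, not the recorded position.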

    1. eLife Assessment

      This valuable study assesses through simulations how several features of local cortical circuits - interneuron subtypes, their specific targeting of dendritic compartments, and certain brain rhythms - together affect the integration of synaptic inputs by a pyramidal cell. Employing several carefully considered simulation setups, the authors convincingly demonstrate that beta rhythms are best suited to modulate and control dendritic Ca-spikes, while gamma rhythms affect their coupling to somatic spiking, or how basal inputs are directly integrated into somatic spikes. However, the baseline setup may be idealized for the generation of the events in question, and it would be beneficial if the similarity to the in-vivo activity regime was demonstrated further. The results will be relevant for neuroscientists studying local circuits or developing more abstract theories at the systems level.

    2. Reviewer #1 (Public review):

      In this study, the authors explore the implications of two types of rhythmic inhibition - "gamma" (30-80 Hz) and "beta" (13-30 Hz) - for synaptic integration. They study this in a multi-compartmental model L5 pyramidal neuron with Poisson excitation and rhythmic inhibition (16 Hz and 64 Hz), applied either to the perisomatic or apical tuft regions in the neuron. They find that 64 Hz inhibition applied to the cell body is effective in phasic modulation of AP generation, while 16 Hz inhibition applied to the apical tufts is effective in phasic modulation of dendritic spikes (in addition to APs). Switching the location of the two kinds of rhythmic inhibition reduces the overall excitability, but is not effective in phasic modulation of dendritic spikes and only weakly so for somatic APs.

      Strengths:

      The effect of the timescale of rhythmic inhibition on synaptic integration is an interesting question, since a) rhythmic spiking is most strongly evident in inhibitory populations, and b) rhythmic spiking is modulated by behavioral states and the sensory environment. The methods are clear and data are well-presented. The study systematically explores the effect of two frequencies of rhythmic inhibition in a biophysically detailed model. The study considers not only idealized rhythmic inhibition but also the bursty kind that is observed in in-vivo conditions. Both distributed and clustered excitatory synaptic organization are simulated, which covers the two extremes of the spatial organization of excitatory inputs in-vivo.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript illustrates how spatial targeting (perisomatic vs distal, apical and basal dendritic) and timing of inhibition are crucial to its distinct effects on neuronal integration, and shows that beta and gamma oscillations differentially engage dendritic spiking mechanisms.

      Strengths:

      The strength of this study lies in the integrative biophysical modelling of a layer 5 pyramidal neuron by bringing together in vitro and in vivo observations.

      Weaknesses:

      The weaknesses are probably in some of the parameterization of inhibitory synaptic dynamics. A unitary peak conductance of 1 nS is very high for inhibitory synapses. This high value could invariably skew some of the network-level predictions. The authors could obtain specific parameters from the Neocortical Collaboration Portal (https://bbp.epfl.ch/nmc-portal/microcircuit.html), which comes across as an incredible resource for cortical neurons and synapses.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      SOM+ interneurons such as Martinotti cells target the apical tufts of pyramidals in the cortex. Since interneurons in general are strongly implicated in mediating rhythmic population activity over a range of timescales, it is quite appropriate to study the consequence of rhythmic inhibition provided by SOM+ interneurons for synaptic integration, including the phenomenon of dendritic spikes. However, using conclusions from a singular study (ref 22) to identify the beta band as the rhythm mediated by SOM+ is not very accurate. SOM+ interneurons have been implicated in regulating rhythms centered just below 30 Hz (refs 22, 21). It is a range that lies in the grey zone of the traditional definition of beta and gamma. However, it is significantly higher than the 16 Hz rhythms explored in this study. It thus remains unknown how a 25-30 Hz rhythmic inhibition (that has an experimentally suggested role for dendrite targeting SOM+ INs) in apical tufts regulates dendritic spikes.

      We agree with the reviewer that the rhythms arising from SOM+ interneurons can extend their frequencies higher than the 16 Hz analyzed in this study. To address this, we have conducted a new set of simulations where we delivered distal dendritic inhibition across a range of frequencies, from 0.5 to 80 Hz (see new Results section “Frequency specific effects of rhythmic inhibition on neuronal integration”). These results revealed, surprisingly, that at 30 Hz their ability to entrain Ca<sup>2+</sup> and NMDA spikes degrades (but not Na<sup>+</sup> spikes). This suggests that beta rhythms in the 20-30 Hz range are operating at the highest frequency for which dendritically targeting inhibition will be effective. The implications are covered in the Discussion section “Interaction with microcircuitry”. They are:

      “Particularly in the visual cortex, SOM interneurons can generate a rhythm in the 25-30 Hz range [22]. We found this to be at the upper end of the frequency range for dendritic inhibitory rhythms to be effective in modulating NMDA and Ca<sup>2+</sup> spikes. If this rhythm solely recruited SOM interneurons, its effectiveness would be marginal. Potentially compensating for this, recent work has found that PV interneurons also participate in beta/low-gamma [23, 24] (but see [21, 22]). In our model, on its own when beta rhythmic inhibition was delivered perisomatically we found that it was less able to entrain spiking and had an overall hyperpolarizing effect. However, if delivered in conjunction with the distal dendritic inhibition arising from SOM interneurons, this may strengthen entrainment.”

      Distal dendritic inhibition has been previously shown to be more effective in controlling dendritic spikes. However, given the slow timescale of dendritic spikes, it can be hypothesized that high-frequency rhythmic inhibition would be ineffective in entraining the dendritic spikes either in distal or proximal location, as demonstrated by 4H and 5F, and vice versa. A computational study can take this further by exploring the robustness of this hypothesis. By sticking to a single-frequency definition of what constitutes Gamma (64 Hz) and Beta (16 Hz) inhibition, the current exploration does support the core hypothesis. However, given the temporal dynamics of dendritic spikes, it is valuable to learn, for example, the upper bound of "Beta" range (13-30 Hz) inhibition that fails to phasically modulate them. In addition to the reason stated in the earlier paragraph, Alpha band activity (8-12 Hz) has been implicated (e.g. van Kerkoerle, 2014) in signaling of inter-areal feedback to the superficial layer in the cortex, potentially targeting apical tufts of pyramidals from multiple layers and resulting in alpha-range rhythmic inhibition. To make the findings significant, it might therefore be more pertinent to understand the consequences of ~10 Hz rhythmic inhibition (in addition to the ~25-30 Hz Beta/Gamma) in the apical tufts for phasic modulation of dendritic spikes.

      We added an additional set of simulations that address this in the Results section ‘Frequency specific effects of rhythmic inhibition on neuronal integration’. In general, we found that dendritic and perisomatic inhibitory rhythms at lower frequencies could entrain AP generation, but with less functional specialization. This is explored in our Discussion section ‘Interneuron specializations and rhythm timescales’.

      The differential effect of Gamma and Beta range inhibition on basal and apical excitatory clusters is not convincing from the information provided. The basal cluster appears to overlap with perisomatic inhibitory synapses. The description in the methods does not have enough information to negate the visual perception (ln 979-81). With this understanding, it is not surprising that the correlation between excitation and APs is high (during the trough of gamma) for basal and not apical excitation. A more comparable scenario would be a more distal location of the basal excitatory cluster.

      While we stated in the original manuscript that we were contrasting ‘basal’ vs. ‘apical’ clustered inputs, this terminology did not reflect our intent with these analyses. We meant to contrast proximal vs. distal dendritic clustered synaptic inputs, which the reviewer correctly noted is confounded in the apical vs. basal comparison. We have rewritten these results, their discussion, and corresponding figure, to clearly state that we are contrasting proximal vs. distal synaptic input.

      Reviewer #2:

      The weaknesses are probably in some of the parameterizations of inhibitory synaptic dynamics. A unitary peak conductance of 1 nS is very high for inhibitory synapses. This high value could invariably skew some of the network-level predictions. The authors could obtain specific parameters from the Neocortical Collaboration Portal (https://bbp.epfl.ch/nmc-portal/microcircuit.html), which is an incredible resource for cortical neurons and synapses.

      We appreciate the valuable resource mentioned by the reviewer and will consult it when constructing future models. Regarding the present one, our choice of peak conductance was based on previous studies, namely:

      Egger R, Narayanan RT, Guest JM, Bast A, Udvary D, Messore LF, Das S, de Kock CPJ, Oberlaender M (2020) Cortical output is gated by horizontally projecting neurons in the deep layers. Neuron 105, 122-137.e128.

      and

      Xiang Z, Huguenard JR, Prince DA (2002) Synaptic inhibition of pyramidal cells evoked by different interneuronal subtypes in layer v of rat visual cortex. J Neurophysiol 88, 740-750.

      The study by Egger et al. used an inhibitory peak conductance of 1 nS and was simulating circuitry very similar to ours. We validated these synapses in pilot simulations that sought to characterize the resulting IPSPs and IPSCs, and whose results can be seen in Table 1 of our methods. These synapses exhibited IPSCs whose peak amplitudes ranged over values (~24-162 pA) that agreed with the experimental literature, such as Xiang et al.

      Given this, we feel our parameterization of inhibitory synapses does not warrant any changes.

      Reviewer #3:

      What disappointed me a bit was the lack of a concise summary of what we learned beyond the fact that beta and gamma act differently on dendritic integration. The individual paragraphs of the discussion often are 80% summary of existing theories and only a single vague statement about how the results in this study relate. I think a summarizing schematic or similar would help immensely.

      We agree with the reviewer that a summary schematic would help the reader. This has been added to the manuscript as Figure 11. It demonstrates the principal findings of the paper and is referenced in the opening paragraph of the discussion section.

      Orthogonal to that, there were some points where the authors could have offered more depth on specific features. For example, the authors summarized that their "results suggest that the timescales of these rhythms align with the specialized impacts of SOM and PV interneurons on neuronal integration". Here they could go deeper and try to explain why SOM impact is specialized at slower time scales. (I think their results provide enough for a speculative outlook.)

      This discussion has been expanded under the section “Interneuron specializations and rhythm timescales”. The added text is:

      “So, while our results suggest that spatial targeting of SOM and PV interneurons aligns with the timescales of their network-level rhythms, it could also be that their timing and subcellular localization interact to produce specialized neuron-level functions [85]. For instance, NMDA and Ca<sup>2+</sup> spikes in the distal dendrites last for ~50 ms, making the slower beta rhythm more appropriate for bidirectionally controlling them. Both can be described as dynamical systems with distinct phases with differing sensitivity to inhibition. Ca<sup>2+</sup> spikes are dynamical events comprised of an initiation, plateau, and termination phase. Inhibition delivered during the plateau phase shortens their duration [86]. If the beta rhythm is comprised of cycling between periods of elevated excitation (increased NMDA spike generation) followed by elevated inhibition, then Ca<sup>2+</sup> spike initiation will tend to occur during the excitatory phase, and its plateau during the subsequent inhibitory phase. A plateau during the inhibitory phase will more quickly enter termination. This is bidirectional control. On the other hand, slower rhythms (e.g. 1 Hz) initiate Ca<sup>2+</sup> spikes during the excitatory phase that plateau and enter termination autonomously, before the inhibitory phase is reached. The same principle holds for NMDA spikes [87]. As a result, rhythms in the range from 15-30 Hz are optimal for synchronizing the onsets and offsets of dendritic spikes across a population of neurons.

      The integrative effects of gamma (>40 Hz) are also specialized. Low frequency inhibitory rhythms delivered to the soma tended to shift the membrane potential higher or lower with the rhythm’s phase, effectively bringing it closer or farther from AP generation but not changing the neuron’s sensitivity to fast synaptic inputs. In the gamma frequency range, this is reversed, with the mean membrane potential not varying with rhythm phase but with a shifting bias to positive or negative membrane potential fluctuations. In addition, the trough phase of gamma lowers the threshold for AP generation, while slower rhythms like beta only raise the threshold. Consequently, the timing of gamma is ideal for increasing the sensitivity of the neuron to rapid excitation. This agrees with the observation that gamma oscillations accompany rapid excitation-inhibition balancing [88].”

      We also extended our discussion section ‘Relevance to coding’ to explore how beta and gamma rhythms can support sparse vs. dense population coding, respectively. It reads:

      “One interpretation of rhythms arising from local inhibitory feedback is that they maintain the balance between excitation and inhibition. This can be thought of as a normalization operation that maintains activity within a set range. Normalization can be achieved either through a subtractive effect that raises the threshold for initiating an action potential, or a multiplicative effect that lowers the slope of the relationship between excitation and action potential firing rate. When considered at the population level, these normalization effects impact coding in different ways. Subtractive normalization increases sparsity by dropping out neurons whose excitation is below the raised threshold. Multiplicative normalization, however, encourages dense codes by scaling down firing rates and compressing the range of firing rates. This study found that while both perisomatic and distal dendritic inhibition produced subtractive effects, only perisomatic had a multiplicative effect. Tying this to beta and gamma, beta rhythms may encourage sparse population codes while gamma allows for dense.”
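
      The distinction drawn in the quoted passage can be illustrated with a minimal sketch. The drive values, the threshold shift, and the gain below are hypothetical numbers chosen only for illustration; they are not taken from the simulations in the study, and the function names are our own.

```python
import numpy as np

def subtractive(drive, shift):
    """Raise the firing threshold: neurons whose drive is below `shift` fall silent."""
    return np.maximum(drive - shift, 0.0)

def multiplicative(drive, gain):
    """Lower the slope of the input-output relation: every rate is scaled by `gain`."""
    return gain * drive

def active_fraction(rates):
    """Fraction of neurons with a non-zero firing rate (a simple density measure)."""
    return float(np.mean(rates > 0))

# Hypothetical excitatory drive to four neurons (arbitrary units).
drive = np.array([1.0, 2.0, 4.0, 8.0])

sub = subtractive(drive, shift=3.0)    # weakly driven neurons drop out
mul = multiplicative(drive, gain=0.5)  # all neurons stay active, range is compressed

print("subtractive:   ", sub, "active fraction", active_fraction(sub))
print("multiplicative:", mul, "active fraction", active_fraction(mul))
```

      In this toy case the subtractive operation silences half of the population (a sparser code), whereas the multiplicative operation keeps every neuron active while halving the range of firing rates (a denser code).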

      Beyond that, the authors invite the community to reappraise the role of gamma and beta in coding. This idea seems to be hindered by the fact that I cannot find a mention of a release of the model used in this work. The base pyramidal cell model is of course available from the original study, but it would be helpful for follow-up work to release the complete setup, including excitatory and inhibitory synapses and their activation in the different simulation paradigms used, as well as the related code.

      We have added a Code and Data Availability section that addresses this. It reads: “Simulation code is deposited at ModelDB at https://modeldb.science/2019883. The raw simulation data are available from DBH upon request. Analysis code is posted as a GitHub repo at https://github.com/dbheadley/InhibOnDendComp.”

    1. eLife Assessment

      The presented findings are important for the field of cell-cycle control. They provide new insights into the origin of cell size variability in budding yeast. The strength of evidence is solid. However, the conclusions could be more strongly supported by additional analysis.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate the determinants of population-level cell size variability, quantified via the coefficient of variation, in budding yeast populations. Using a combination of computational modeling and experimental readouts, they conclude that mother-daughter division asymmetry is the dominant factor shaping the coefficient of variation of cell size. In particular, through parameter sensitivity analysis of the Chandler-Brown model and empirical perturbations, the authors show that size-control mutations have limited effects on CV, whereas modulating mother-daughter asymmetry, by changing the growth environment, produces substantially larger shifts.

      Strengths:

      (1) The study addresses a fundamental question in biophysics, i.e., what are the mechanisms that produce and maintain population size heterogeneity?

      (2) It provides a conceptual reconciliation for previous observations that size-control mutants often alter mean size but not CV.

      (3) The modeling framework is clearly explained and compared to the data.

      (4) The parameter sensitivity analysis is thoughtfully performed and provides transparent intuition about which parameters influence variability.

      (5) The writing is clear, and the figures are well-organized.

      Weaknesses:

      (1) The work focuses on the Chandler-Brown model, so it is not clear to what extent the conclusions depend on it. A sensitivity or robustness check using an alternative model would strengthen generality.

      (2) CV is the sole descriptor used to quantify heterogeneity; while this is an efficient descriptor, it must be handled with care when used on experimental data, as it may vary due to differences in the chosen observables (e.g., if size is identified via cell volume, length, area, number of proteins, etc.) instead of real differences in the distribution.

      (3) The experimental validation using varied nutrient conditions is interesting; however, the statistical significance of the found correlations should be provided/discussed.

    3. Reviewer #2 (Public review):

      Summary:

      This paper provides a new framework for understanding how cell size variability arises in budding yeast populations. Whereas previous studies emphasized G1/S size control in daughter cells as the main regulator of size homeostasis, the authors show that perturbations to this control checkpoint have only modest effects on population-wide size variability.

      By extending a stochastic model of the yeast cell cycle to include both mother and daughter lineages, the authors demonstrate that division asymmetry, stemming from slower growth and longer post-Start phases in mother cells, is the key factor determining the population coefficient of variation (CV). As mothers grow larger and daughters smaller, the overall size distribution broadens. Experimental measurements across multiple mutants and conditions support the predicted correlation between asymmetry and CV.

      Strengths:

      The main conceptual advance of this study is to consider the full proliferating population, and in particular the dominant mother lineages, rather than single-cycle daughters, thereby offering a population-level explanation for size variability that is consistent with several previous but seemingly conflicting results.

      Weaknesses:

      Nevertheless, the modelling is described superficially and has notable limitations.

      (1) The extended Chandler-Brown model was originally parameterized only for daughter cells, and its generalization to mothers introduces several new assumptions that are not directly tested.

      (2) The model treats asymmetry phenomenologically, without a mechanistic basis, so while it correctly identifies correlations, causality remains uncertain.

      (3) Moreover, since population CVs emerge from steady-state lineage dynamics, they could be sensitive to parameter choices or growth-related details not fully explored in the current analysis.

      In summary, this study provides a useful conceptual synthesis and quantitative framework, but readers should interpret the modeling as heuristic. The central message, that division asymmetry dominates population size variability, remains interesting and well supported at the phenomenological level.

    4. Reviewer #3 (Public review):

      Summary:

      The article studies the origins of random cell size variability in budding yeast. Different strains with different average cell sizes have very similar noise, measured using the coefficient of variation, defined as the standard deviation over the mean. Manipulating the noise in key variables such as the duration of cell stages, the growth rate, or the division strategy (adder, timer, sizer) was not enough to explain the observed noise in mutants. The proposed solution for the origin of most of the cell size noise is related to the asymmetry in the average cell size for cells with two different phenotypes: daughter cells (new cells that have not passed the first division) and mother cells (the rest). The origin of the cell size noise is mainly related to the fact that these phenotypes have different cell size distributions. The article includes simple statistical methods for hypothesis analysis and explanatory figures.

      Strengths:

      The article provides different approaches: experimental (mutants and different growth conditions) and computational (simulations) to explain and test the hypothesis. The methods are based on previous articles with simple conclusions and explanations easy to follow.

      The rigor level in both mathematical and biological approaches looks fair to me. The terms are well defined and consistent throughout the article. Authors use well-established analysis techniques.

      The proposed theoretical analysis is coarse-grained and therefore can explain different strains and mutations using mathematical tools (noise analysis), aiming to reach general (mathematically) claims. This approach strengthens the conclusions and provides a good language to set a bridge between the biological community and mathematicians (quantitative biologists).

      The concept that the population heterogeneity (mothers vs daughters) is a fundamental reason behind the cell size variability is not new, but this article presents a clear experimental justification for the development of complete models of cell size regulation. I consider this contribution very relevant to the community modelling cell size.

      Weaknesses:

      The central concept is that population heterogeneity (mothers and daughters) with different cell size distributions explains the observed size variability in a heterogeneous population. It is not clear how the population composition can affect this heterogeneity. Intuitively, I would expect that the fraction (number of daughters)/(number of mothers) changes in different stages of the population expansion because the mean duration of both stages can change in different growth conditions. I would suggest studying how different (or not) these fractions are in different conditions. The authors should acknowledge this effect and discuss briefly, using for instance simple models of the addition of random variables (adding different fractions of individuals with different cell size distributions), in which cases (different fractions, or different means and noises in their respective distributions) their contribution is relevant. Finally, do different simulations (gradient or sizer, timer) predict different moments (mean and CV) in the distributions of both mother size and daughter size?

      Related to the previous comment, I would also include the fraction (number of daughters)/(number of mothers) or the percentage in different growth conditions with their respective size moments (mean and CV) to test whether the resultant cell size moments are related to the addition of two variables with different fractions with their respective moments.
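
      The kind of simple model suggested in the two comments above can be sketched as a two-component mixture. This is only an illustration under stated assumptions: the function name, the subpopulation fractions, and the means and CVs are hypothetical and are not measurements from the manuscript. The sketch uses the standard mixture identities, mean = Σ wᵢ mᵢ and variance = Σ wᵢ (sᵢ² + mᵢ²) − mean², where wᵢ are the subpopulation weights.

```python
import math

def mixture_mean_cv(fractions, means, cvs):
    """Mean and CV of a population made of subpopulations with given moments.

    fractions: relative abundance of each subpopulation (normalised internally)
    means:     mean size of each subpopulation
    cvs:       coefficient of variation (sd / mean) of each subpopulation
    """
    total = sum(fractions)
    weights = [f / total for f in fractions]
    mean = sum(w * m for w, m in zip(weights, means))
    # Second moment of each component is sd^2 + mean^2, with sd = cv * mean.
    second_moment = sum(
        w * ((cv * m) ** 2 + m ** 2) for w, m, cv in zip(weights, means, cvs)
    )
    variance = max(second_moment - mean ** 2, 0.0)  # guard against rounding
    return mean, math.sqrt(variance) / mean

# Hypothetical example: equal numbers of daughters and mothers,
# each with CV = 0.2, but mothers twice as large on average.
mean, cv = mixture_mean_cv(fractions=[0.5, 0.5], means=[30.0, 60.0], cvs=[0.2, 0.2])
print(f"population mean = {mean:.1f}, population CV = {cv:.3f}")
```

      With identical subpopulation means the population CV stays at the within-group value, whereas separating the means, or changing the daughter-to-mother fraction, shifts the population CV even when the within-group CVs are unchanged.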

      It is interesting how the G1 timer and G1 Sizer are located in different quadrants of Figure 4D, while the studied mutants belong to the other quadrant. I expected them to be closer to the G1 timer, similar to that observed in Figure 4G. I think the authors should discuss this dissimilarity.

      Although the authors are working using a definite model, other models would predict different results, especially in synthetic data. For instance, the same models for obtaining sizers can predict different noise levels.

      Nieto, C. et al., 2024. npj Systems Biology and Applications, 10(1), p.61.

      Barber, Felix, et al., Frontiers in cell and developmental biology 5 (2017): 92.

      Teimouri, H. et al., 2020. The Journal of Physical Chemistry Letters, 11(20), pp.8777-8782.

      I would mention that the noise level also depends on whether the population has reached steady-state conditions. This would require multiple generations and measurements over at least a couple of thousand cells. Therefore, experiments with single-cell-derived colonies would present different levels of noise than those in steady-state conditions, especially if few cells were sampled. However, I acknowledge that the purpose of the article is not a detailed description of the system but rather the presentation of the concept, and for that matter this level of detail is not mandatory.

    1. eLife Assessment

      This important paper presents the discovery of the molecular basis of differential apterous expression during early Drosophila wing disc development. The evidence supporting these conclusions is compelling, ranging from classical genetic approaches to state-of-the-art genetic engineering techniques. By opening new questions, this paper is expected to be of broad interest to developmental biologists and geneticists working on transcriptional regulation.

    2. Reviewer #1 (Public review):

      Summary:

      The Drosophila wing disc is an epithelial tissue, the study of which has provided many insights into the genetic regulation of organ patterning and growth. One fundamental aspect of wing development is the positioning of the wing primordia, which occurs at the confluence of two developmental boundaries, the anterior-posterior and the dorsal-ventral. The dorsal-ventral boundary is determined by the domain of expression of the gene apterous, which is set early in the development of the wing disc. For this reason, the regulation of apterous expression is a fundamental aspect of wing formation.

      In this manuscript, the authors used state-of-the-art genomic engineering and a bottom-up approach to analyze the contribution of a 463 base pair fragment of apterous regulatory DNA. They find compelling evidence about the inner structure of this regulatory DNA and the upstream transcription factors that likely bind to this DNA to regulate apterous early expression in the Drosophila wing disc.

      Strengths:

      This manuscript has several strengths concerning both the experimental techniques used to address a problem of gene regulation and the relevance of the subject. To identify the mode of operation of the 463 bp enhancer, the authors use a balanced combination of different experimental approaches. First, they use bioinformatic analysis (sequence conservation and identification of transcription factor binding sites) to identify individual modules within the 463 bp enhancer. Second, they identify the functional modules through genetic analysis by generating Drosophila strains with individual deletions. Each deletion is characterized by looking at the resulting adult phenotype and also by monitoring apterous expression in the mutant wing discs. They then use a clever method to interfere in a more dynamic manner with the function of the enhancer, by directing the expression of catalytically inactive Cas9 to specific regions of this DNA. Finally, they resort to a more classical genetic approach to uncover the relevance of candidate transcription factors, some of them previously known and others suggested by the bioinformatic analysis of the 463 bp sequence. This workflow is clearly reflected in the manuscript, and constitutes a great example of how to proceed experimentally in the analysis of regulatory DNA.

      Weaknesses:

      The previously noted weaknesses (vg expression, P compartment-specific effects, early vs late analysis of ap expression in mutants) have been thoroughly and satisfactorily addressed by the authors.

    3. Reviewer #3 (Public review):

      In this manuscript, the authors use the Drosophila wing as a model system and combine state-of-the-art genetic engineering to identify and validate the molecular players mediating the activity of one of the cis-regulatory enhancers of the apterous gene involved in the regulation of its expression domain in the dorsal compartment of the wing primordium during larval development. The paper is subdivided into the following chapters/figures:

      (1) In the first couple of figures, the authors describe the methodology to genetically manipulate the apE enhancer (a cartoon summarizing all the previous work with this enhancer might help) and identify two well-conserved domains in the OR463 enhancer required for wing development (the m3 region, whose deletion phenocopies OR463 deletion: loss of wing, and the m1 region, whose deletion gives rise to AP identity changes in the P compartment).

      (2) In the following three figures, the authors characterize the m1 regulatory region, identify HOX and ETS binding sites, functionally validate their role in wing development and the activity of the genes/proteins regulating their activity (e.g., Hth and Pointed) by their ability to phenocopy (when depleted) the m1 loss-of-function wing phenotype. The authors conclude that Hth and Pointed regulate apterous expression through the m1 region.

      (3) In the last few figures, the authors perform similar experiments with the m3 regulatory region to conclude that Grn and Antennapedia regulate apterous expression through the m3 enhancer.

      My comments:

      Technically sound: As stated in my previous review, the work is technically excellent (authors use state-of-the-art genetic engineering to manipulate the enhancer and combine it with genetic analysis through RNAi and CRISPR/Cas9 and phenotypic characterization to functionally validate their findings), figures are nicely done and cartoons are self-explanatory.

      Poor paper writing: The paper is too long and difficult to read/understand, many grammatical mistakes are found, and formatting is in some cases heterodox.

      Science:

      (1) The question of "who is locating the relative position of the AP and DV boundaries in the developing wing?" is not resolved. I would then change the intro or reduce the tone of this question. Having said that, I agree that these results shed light on the wing phenotypes of some apterous alleles related to AP identity and growth and, as such, I congratulate the authors.

      (2) Identification of two TFs (Grain and Antp) mediating the regulation of apterous expression is interesting but some contextualization might be required. Data on Antp is not as convincing as data on Grn. I wonder whether Antp data can be removed at all.

      (3) I am not sure whether the term hemizygous is used properly.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The Drosophila wing disc is an epithelial tissue, the study of which has provided many insights into the genetic regulation of organ patterning and growth. One fundamental aspect of wing development is the positioning of the wing primordia, which occurs at the confluence of two developmental boundaries, the anterior-posterior and the dorsal-ventral. The dorsal-ventral boundary is determined by the domain of expression of the gene apterous, which is set early in the development of the wing disc. For this reason, the regulation of apterous expression is a fundamental aspect of wing formation.

      In this manuscript, the authors used state-of-the-art genomic engineering and a bottom-up approach to analyze the contribution of a 463 base pair fragment of apterous regulatory DNA. They find compelling evidence about the inner structure of this regulatory DNA and the upstream transcription factors that likely bind to this DNA to regulate apterous early expression in the Drosophila wing disc.

      Strengths:

      This manuscript has several strengths concerning both the experimental techniques used to address the problem of gene regulation and the relevance of the subject. To identify the mode of operation of the 463 bp enhancer, the authors use a balanced combination of different experimental approaches. First, they use bioinformatic analysis (sequence conservation and identification of transcription factors binding sites) to identify individual modules within the 463 bp enhancer. Second, they identify the functional modules through genetic analysis by generating Drosophila strains with individual deletions. Each deletion is characterized by looking at the resulting adult phenotype and also by monitoring apterous expression in the mutant wing discs. They then use a clever method to interfere in a more dynamic manner with the function of the enhancer, by directing the expression of catalytically inactive Cas9 to specific regions of this DNA. Finally, they recur to a more classical genetic approach to uncover the relevance of candidate transcription factors, some of them previously known and others suggested by the bioinformatic analysis of the 463 bp sequence. This workflow is clearly reflected in the manuscript, and constitutes a great example of how to proceed experimentally in the analysis of regulatory DNA.

      We thank the reviewer for these positive comments on the manuscript.

      Weaknesses:

      There are several caveats with the data that might be construed as weaknesses. Some of them are intrinsic to this detailed analysis or to the experimental difficulties of dealing with the wing disc in its earliest stages, and others are more conceptual and are offered here in case the authors may wish to consider them.

      (1) The primordium of the wing region of the wing imaginal disc is defined by the expression of the gene vestigial, which is regulated by inputs coming from the dorsal-ventral boundary (Notch and wg) and from the anterior-posterior boundary (Dpp). Having such a principal role in wing primordium specification and expansion, I am surprised that this manuscript does not mention this gene in the main text and only contains indirect references to it. I consider that the manuscript would have benefited a lot by including vestigial in the analysis, at least as a marker of early wing primordium. This might allow us to visualize directly the positioning of the primordium in the apterous mutants generated in this study, adding more verisimilitude to the interpretations that place this domain based on indirect evidence.

      Vg does indeed play a critical role in the formation of the wing disc, and it is an ideal marker for the identification of the wing pouch. In the updated version of the article, we have now followed the expression of vg in some of the OR463 mutants via immunostaining of the Vg protein (Supplementary Figure 6). Cells within posterior wing outgrowths in Δm1 flies were invariably positive for Vg. This result further supports our previous identification of these cells as pouch cells. In those mutants in which no cross-over between DV and AP was observed, vg expression was severely reduced or absent, indicating that the wing pouch had not been specified. We thank the reviewer for this experimental idea, which we believe strengthens the final manuscript.

      We have added to the text:

      “To identify the nature of the posterior outgrowths, we performed anti-Vestigial (Vg) antibody staining of Δm1 mutants (Supplementary Figure 6). Vg is a key regulator of wing specification and also participates in wing growth and patterning (Baena-Lopez & García-Bellido, 2006; Kim et al., 1996; Zecca & Struhl, 2007a). In those discs in which the stripe was extended and the P compartment was enlarged, Vg was detected throughout the outgrowth, supporting the wing pouch identity of this region (Supplementary Figure 6B). Hemizygous Δm3 mutants presented a highly reduced anti-Vg signal, which suggests that no wing pouch is specified in these mutants (Supplementary Figure 6C).”

      (2) The authors place some emphasis on the idea that their work addresses possible coordination between setting the D/V boundary and the A/P boundary:

      Abstract: "Thus, the correct establishment of ap expression pattern with respect to en must be tightly controlled", "...challenging the mechanism by which apE miss-regulation leads to AP defects." "Detailed mutational analyses using CRISPR/Cas revealed a role of apE in positioning the DV boundary with respect to the AP boundary"

      Introduction: "However, little is known about how the expression pattern of ap is set up with respect that of en. In other words, how is the DV boundary positioned with respect to the AP boundary?"

      "How such interaction between ap and the AP specification program arises is unknown."

      Results: "Some of these phenotypes are reminiscent of those reported for apBlot (Whittle, 1979) and point towards a yet undescribed crosstalk between ap early expression and the AP specification program."

      At the same time, they express the notion, with which this reviewer agrees, that all defects observed in A/P patterning arising as a result of apterous miss-regulation are due to the fact that in their mutants, apterous expression is lost mainly in the posterior dorsal compartment, bringing novel confrontations between the A/P and the D/V boundaries.

      To me, the key point is why the expression of apterous in different mutants of the OR463 enhancer affects only the posterior compartment. This should be discussed because it is far from obvious that apterous expression has different regulatory requirements in the anterior and posterior compartments.

      We agree with the reviewer that the differential effect of the mutations on the expression of ap in the A and P compartments is a key factor underlying our explanation of how the phenotypes arise. To clarify this point, we have now extended our first discussion point. Moreover, we have included some other references on differential enhancer regulation in different wing disc compartments. In addition, we have discussed whether this effect arises from different regulation of the enhancer in the A and P compartments or from regulation of downstream effectors.

      Added paragraph:

      “Although apE is active throughout the dorsal compartment, its disruption leads to a preferential loss of ap expression in posterior cells. The asymmetric effect of apE perturbation on the anterior and posterior compartments suggests that apE transcriptional control is not equivalent across the A/P axis. Compartment-dependent differences in enhancer regulation have also been documented in other developmental contexts; for example, the Distal-less DMX-R element is interpreted through distinct cofactor combinations (Sloppy paired anteriorly and Engrailed posteriorly) (Gebelein et al., 2004), and specific mutations within DMX-R preferentially disrupt enhancer function in anterior versus posterior cells. It is possible that apE is more sensitive to misregulation due to differential transcriptional regulation across compartments. Nevertheless, we cannot exclude the possibility that the posterior bias we observe arises not from enhancer logic per se, but from intrinsic differences in tissue architecture or the dynamics of boundary positioning during wing disc development.”

      (3) The description of gene expression in the wing disc of novel apterous mutants is only carried out in late third instar discs (Figs. 2, 3, 5, and 7). This is understandable given the technical difficulties of dealing with early discs, as those shown in the analysis of candidate apterous regulatory transcription factors (Fig. 4F, Fig. 6 C-D). However, because the effects of the mutants on apterous expression are expected to occur much earlier than the time of expression analysis, this fact should be discussed.

      We agree with the reviewer regarding the limitations of our analysis wherever we analyzed third instar larvae to assess the expression of the OR463 enhancer. We have included a statement acknowledging this in the discussion:

      “It is important to acknowledge that all expression analyses were conducted in third-instar discs, a stage that follows the initial establishment of ap expression. Earlier effects are therefore inferred rather than directly observed, as imaging and staging of early discs present significant technical challenges due to their small size and fragility. A direct observation of the early wing disc across mutant conditions would likely help to clarify the role of the discovered factors during early ap expression.”

      Reviewer #2 (Public Review):

      In their manuscript, "Transcriptional control of compartmental boundary positioning during Drosophila wing development," Aguilar and colleagues do an exceptional job of exploring how tissue axes are established across Drosophila development. The authors perform a series of functional perturbations using mutational analyses at the native locus of apterous (ap), and perform tissue-specific enhancer disruption via dCas9 expression. This innovative approach allowed them to explore the spatio-temporal requirements of an apterous enhancer. Combining these techniques allowed the authors to explore the molecular basis of apterous expression, connecting the genotypes to the phenotypical effects of enhancer perturbations. To me, this paper was a beautiful example of what can be done using modern Drosophila genetics to understand classic questions in developmental biology and transcriptional regulation.

      In sum, this was a rigorous paper bridging scales from the molecular to phenotypes, with new insight into how enhancers control compartmental boundary positioning during Drosophila wing development.

      We would like to thank the reviewer for their positive and encouraging comments, as well as for the careful review of the manuscript and figures. We have adopted most of the suggestions in the new manuscript.

      Reviewer #3 (Public Review):

      In this manuscript, the authors use the Drosophila wing as a model system and combine state-of-the-art genetic engineering to identify and validate the molecular players mediating the activity of one of the cis-regulatory enhancers of the apterous gene involved in the regulation of its expression domain in the dorsal compartment of the wing primordium during larval development.

      (1) The authors raise two very important questions in the Introduction: (1) who is locating the relative position of the AP and DV boundaries in the developing wing, and (2) who is responsible for the maintenance of the apterous expression domain late in larval development. Neither of these two questions has been answered and, indeed, the summary of the work (as stated in the conclusions of the last paragraph of the Introduction) does not resolve either of them.

      We believe the results presented, together with those added during the revision, shed some light on the positioning of the boundary. We proposed that the combined integration of four TFs by the OR463 enhancer is fundamental for the correct positioning. Additionally, we proposed a model of how these positioning problems result in the phenotypes observed (Supplementary Figure 7, now also shown in Figure 2D). Our results indicate that ap expression in the PD quadrant is particularly sensitive to mutations in the enhancer, which we have now further elaborated on in the first part of the discussion. Together, we believe that our results do tackle the first problem posed in the introduction, while not completely solving it. As for the second question, we have tried to remove any suggestion that this article tries to explain later regulation of apterous. This misunderstanding probably arises from a sentence in the introduction which has now been deleted. The means of maintenance of ap expression in later stages has been partially explored previously (see Bieli et al., 2015) and is the subject of our current studies.

      (2) The authors have identified two different regions whose deletions give very interesting phenotypes in the adult wing (AP identity change & outgrowths, and loss of wing), and have bioinformatically identified and functionally verified 4 TFs that mediate the activity of these regions by their capacity to phenocopy the wing phenotype. While identification of the 2 TFs acting on the m1 is incremental with respect to previous work on the identification of the enhancer responsible for the early expression of Ap, identification of Antp and Grn does not explain the loss of function phenotype of the m3 enhancer. Do any of these results shed any light on the first two Qs? Do these results explain the compartment boundary position in the wing as stated in the title? Expression of lacZ reporter assays is fundamental to demonstrate their model of Figure 8. The reduction of the PD compartment is difficult to understand by the sole reduction in ap expression in this region (which has not been demonstrated).

      We agree that the identification of Antp and Grn does not by itself explain the loss-of-function phenotype of the m3 enhancer. However, these transcription factors represent the best current candidates for direct regulators for this enhancer. We have clarified in the text that Antp and Grn may not act as instructive inputs but rather play a permissive role in enabling ap expression through m3. Importantly, the dCas9-mediated perturbation experiments directly demonstrate that targeted manipulation of apE in this region is sufficient to produce the characteristic duplications, providing functional evidence that apE activity underlies the observed phenotypes. In addition, lacZ reporter assays confirm that apE expression is indeed affected in all cases where the experimental setup permitted detection. Together, these results validate that the observed morphological phenotypes stem from perturbation of apE activity and support the proposed model for enhancer regulation and its role in compartment boundary maintenance.

      (3) The authors state in one of the sections "Spatio-temporal analysis of apE via dCas9". No temporal manipulation of gene activity is shown. The authors should combine GAL4/UAS with the Gal80ts to demonstrate the temporal requirements of Antp/Grn and Pnt/Hth as depicted in their model of Figure 8.

      We agree with the reviewer that the temporal dimension was not explored in the first version of the manuscript (aside from the temporal constraints of the en-Gal4 driver). As suggested by the reviewer, we have now used a tub-Gal80ts allele to temporally control the enhancer perturbation and delimit its window of activity. The results are included in two new panels in Figure 3 (H and H’). The new data agree with the notion that the apE enhancer is important up to L2 stages but dispensable later in development. We have added the following paragraph to the text:

      “To define the developmental time window during which the apE enhancer remains sensitive to repression, we combined the temperature-sensitive tub-Gal80<sup>ts</sup> system with temporally controlled expression of dCas9. Animals carrying the en-Gal4, tub-Gal80<sup>ts</sup>, UAS-dCas9 and U6-OR463gRNA(4x) transgenes were maintained at 18 °C to suppress dCas9 expression. Independent sets of embryos were then shifted to 29 °C at successive developmental intervals ranging from 0 to 168 h after egg laying (AEL), so that dCas9 induction occurred at distinct time points in development (Figure 3H). Under these conditions, dCas9 transcription was induced only after the temperature shift, while the gRNAs were expressed constitutively. Wing phenotypes were quantified in adult progeny as a readout of apE enhancer perturbation. When dCas9 was expressed from embryonic or early larval stages (0–48 h AEL), nearly all wings (70–90%) displayed severe ap-like phenotypes, including posterior compartment duplication and loss of anterior–posterior boundary integrity. Shifting animals later (48–72 h AEL) still produced a majority (~66%) of abnormal wings, whereas induction after 72 h AEL resulted in progressively weaker effects and complete loss of phenotypes by 96 h AEL (Figure 3H’).

      These results delineate the developmental period during which apE activity is required for proper wing patterning. Perturbation during the first half of the second larval instar (≤ 96 h at 18 °C) was sufficient to elicit strong ap-like transformations, consistent with the enhancer being functionally required during early larval stages and becoming dispensable thereafter. The temporal decline in phenotype penetrance thus reflects the progressive loss of apE sensitivity to dCas9-mediated repression, providing a precise estimate of when its activity is no longer required for wing morphogenesis.”

      (4) The authors have not managed to explain the AP phenotype. Thus, this work opens many unresolved questions and does not resolve the title, which is a big overstatement. Thus, strengths (technically excellent), weakness (there is not much to learn about wing development and apterous regulation from these results besides the incremental identification of 4 additional TFs mediating the regulation of ap expression by their ability to phenocopy regulatory mutations of the apterous gene).

      As mentioned in response to reviewer 1, we indeed have no concrete explanation for why the P compartment seems more sensitive to mutations. We have now further discussed this point (see the paragraph below, now included in the discussion). As for how the adult phenotypes arise from the mutant wing discs, we have a good idea (see Supplementary figure 7 and Figure 2).

      We are pleased to hear that the reviewer considers our article technically valuable. Therefore, we have reformulated the title so that the technical merits play a bigger role in it:

      “in situ mutational screening and CRISPR interference demonstrate that the apterous Early enhancer is required for developmental boundary positioning”

      Paragraph added to the discussion:

      " Although apE is active throughout the dorsal compartment, its disruption leads to a preferential loss of ap expression in posterior cells. The asymmetric effect of apE perturbation on the anterior and posterior compartments suggests that apE transcriptional control is not equivalent across the A/P axis. Compartment-dependent differences in enhancer regulation have also been documented in other developmental contexts; for example, the Distal-less DMX-R element is interpreted through distinct cofactor combinations (Sloppy paired anteriorly and Engrailed posteriorly) (Gebelein et al., 2004), and specific mutations within DMX-R preferentially disrupt enhancer function in anterior versus posterior cells. It is possible that apE is more sensitive to misregulation due to differential transcriptional regulation across compartments. Nevertheless, we cannot exclude the possibility that the posterior bias we observe arises not from enhancer logic per se, but from intrinsic differences in tissue architecture or the dynamics of boundary positioning during wing disc development.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Formatting of references should be checked throughout the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Here, I note a few points that would help clarify the manuscript and connect it with a broader community.

      Figure 1: it could help the reader to add the landing site genetic scheme to the main figure.

      In a first draft, that was exactly the original configuration, but after comparing both versions we determined that the presence of the landing site detracts somewhat from the focus on the phenotypes.

      Figure 1: what species were used for the conservation alignment? Further details would be nice to add here.

      We have now added a section on bioinformatic analysis, which was missing in the original manuscript:

      Sequence conservation of the OR463 fragment within the ap upstream intergenic region was analysed across different dipteran species using the “Cons 124 Insects” multiple-alignment track of the D. melanogaster dm6 genome on the UCSC Genome Browser (Kent et al., 2002, https://genome.ucsc.edu). Conservation scores were obtained from phastCons (Siepel et al., 2005) and used to delineate conserved and less conserved blocks within OR463. Conserved transcription factor binding sites were predicted with MotEvo (Arnold et al., 2011), which defined four conserved modules (m1–m4) and six inter-modules (N1–N6). Additional motif analysis was performed using the JASPAR CORE Insecta database and the Target Explorer tool to cross-validate conserved binding-site predictions and refine motif assignments within the enhancer.

      From Figure 2: I would consider moving the model or portions of it to a main figure. These models, while descriptive, really help make the manuscript more approachable. Note that eLife does not have forced figure requirements.

      We have adopted the reviewer’s suggestion and we are very grateful for it. We think the figure has greatly improved. The final figure now highlights a small part of the model, which is still included in the Supplementary Figure.

      Figure 5: This figure is fantastic, and the results are particularly important. I would recommend increasing the weight of the arrows from D to E, making it more obvious. Did the authors consider any temperature or other perturbations to look at robustness? They mention "robustness" a few times, and this could be an excellent system to explore a bit further. For panels F and G, it would be nice to have a bit of biochemistry here to test the spacing requirements' effects on the distances (but it's great phenotypical data, regardless).

      We have chosen a darker grey to highlight the lines. 

      We appreciate the reviewer’s suggestions. With respect to robustness assays, such as temperature perturbations, we agree that the apE enhancer would be a suitable system for such experiments. However, these analyses would move the study beyond its current scope, which is focused on defining the regulatory logic of boundary positioning through mutational dissection and CRISPRi. We therefore prefer not to expand the work in this direction here, but we note that this would be an interesting avenue for future investigation.

      Similarly, biochemical assays probing spacing requirements would provide additional mechanistic insight but would represent a separate line of work. In this manuscript, we aimed to establish the functional consequences of motif spacing using in vivo genetic and phenotypic analyses, which we believe sufficiently support our conclusions.

      Thank you for the insight.

      Discussion: To the point "most point mutations or short deletions in enhancer regions have little effect on gene expression" I would push the authors to discuss their work in relation to Fuqua et al., (Nature 2020) and Kvon et al., (Cell 2020). Their work is consistent with enhancers being sensitive to mutations, and this warrants further discussion because it could be important for the transcription field.

      Hox genes as pioneer factors, I would recommend citing Loker et al., (Curr Biol 2021), as an example of Hox genes functioning as a pioneer factor.

      We thank the reviewer for this suggestion. We have now added a short paragraph in the Discussion noting how our observations may relate to the mutational patterns described in Fuqua et al. (2020) and Kvon et al. (2020), while keeping the interpretation tentative. The text now says:

      “Recent large-scale enhancer mutagenesis studies have shown that the mutational consequences within enhancers can vary widely. In some cases, many nucleotide positions appear tolerant to single-base changes and only a small subset of mutations produce clear functional effects (Kvon et al., 2020). In other enhancers, regulatory information is distributed more densely, and mutations at multiple positions can alter output (Fuqua et al., 2020). Together, these studies illustrate that enhancer sensitivity is not uniform but depends on enhancer-specific features such as motif organization, cooperativity, and redundancy. Within this broader landscape, the apE enhancer appears to represent a particularly sensitive case.”

      We also included a citation to Loker et al. (2021) in connection with the possible pioneer-like contribution of HOX input to apE.

      We would like to thank all reviewers for their effort.

    1. eLife Assessment

      In this valuable study, Parrotta et al. showed that it is possible to modulate pain perception and heart rate by providing false heart rate (HR) acoustic feedback before administering electrical cutaneous shocks. The evidence supporting the claims of the authors is rather solid, although what they consider an interoceptive signal is not necessarily supported as such by the results. In this regard, including a larger number of trials per participant, increasing the sample size, and adding a measure of actual pain perception after its induction would have strengthened the study. Although mechanisms and some alternative explanations for this effect remain to be addressed, the work will nonetheless be of interest to neuroscientists working on predictions and perception, health psychologists, pain researchers, and placebo researchers.

    2. Reviewer #1 (Public review):

      Summary:

      I read the paper by Parrotta et al with great interest. The authors are asking an interesting and important question regarding pain perception, which is derived from predictive processing accounts of brain function. They ask: If the brain indeed integrates information coming from within the body (interoceptive information) to comprise predictions about the expected incoming input and how to respond to it, could we provide false interoceptive information to modulate its predictions, and subsequently alter the perception of such input? To test this question, they use pain as the input and the sounds of heartbeats (falsified or accurate) as the interoceptive signal.

      Strengths:

      I found the question well-established, interesting and important, with important implications and contributions for several fields, including neuroscience of prediction-perception and pain research. The study is clearly written, the methods are generally adequate, and the results indeed support the claim that false cardiac feedback modulates both pain perception and anticipatory cardiac frequency. Importantly, the authors include a control experiment using exteroceptive auditory feedback to test whether effects are specific to heartbeat-like cues. This addition substantially strengthens interpretability.

      Weaknesses:

      In my view, the authors' central interpretation, namely that the effects arise because the manipulation targets interoceptive rather than exteroceptive or high-level threat-related cues, cannot be fully supported by the current design. The evidence does not rule out the possibility that participants interpret increased heartbeat sounds as a generic danger/threat cue rather than as (manipulated) interoceptive input. I also disagree with several other choices, though these are less critical: for example, the use of specific comparisons without pre-registering them, the use of sensitivity analysis to justify sample size, and the intentional use of only 6 trials per participant.

      Conclusion:

      To conclude, the authors have shown in their findings that predictions about an upcoming aversive (pain) stimulus - and its subsequent subjective perception - can be altered not only by external expectations, or manipulating the pain cue, as was done in studies so far, but also by manipulating a cue that has fundamental importance to human physiological status, namely heartbeats. Whether this is a manipulation of actual interoception as sensed by the brain is, in my view, left to be proven.

      Even if the authors drop this claim, the paper has important implications in several fields of science, ranging from neuroscience prediction-perception research, to pain research, and may have implications for clinical disorders, as the authors propose. Furthermore, it may lead - either the authors or someone else - to further test this interesting question of manipulation of interoception in a different or more controlled manner.

      I salute the authors for coming up with this interesting question and encourage them to continue and explore ways to study it and related follow-up questions.

    3. Reviewer #3 (Public review):

      Parrotta et al provide a convincing and thorough revision of their manuscript "Exposure to false cardiac feedback alters pain perception and anticipatory cardiac frequency". The authors addressed my previous concerns regarding theoretical framing and methodological clarity. For example:

      They provided additional detail on the experimental design, procedure and statistical analyses.

      The predictive coding rationale for the hypotheses has been clarified.

      The limitations of the study are discussed comprehensively

      Additional analyses were performed to investigate the role of learning effects and across-experiment effects

      New supplementary figures allow a closer look at the feedback-related response patterns

      In sum, the revisions improve the manuscript. However, some issues remain present.

      (1) Potential learning/habituation effects. In my first review of the manuscript, I raised the concern that learning effects may have contributed to the observed differences between interoceptive & exteroceptive cues. The authors argue that the small number of six trials per condition could limit aversive effects of differential learning between experiments. However, electric nociceptive stimuli are exceptionally potent in classical conditioning experiments and humans can develop conditioned responses to these types of stimuli after a single trial [1-2]. Therefore, six trials are sufficient to allow for associative or expectancy-based learning processes.

      The authors also present additional analyses, i.e. LME models which included trial rank as a predictor. While these models do not show a statistically significant learning effect, they do indicate a notably larger effect in earlier trials compared to later ones. In my reading, this pattern is compatible with early learning or, alternatively, with non-specific attentional or arousal responses that diminish across repetitions. This is potentially a limitation of the design: repetition-related effects (attention reduction, arousal habituation, early learning) may contribute to the results, and distinguishing between interoceptive inference and non-specific effects remains challenging within this paradigm.

      (1) Haesen K, Beckers T, Baeyens F, Vervliet B. One-trial overshadowing: Evidence for fast specific fear learning in humans. Behav Res Ther. 2017 Mar;90:16-24. doi: 10.1016/j.brat.2016.12.001. Epub 2016 Dec 8. PMID: 27960093.

      (2) Glenn CR, Lieberman L, Hajcak G. Comparing electric shock and a fearful screaming face as unconditioned stimuli for fear learning. Int J Psychophysiol. 2012 Dec;86(3):214-9. doi: 10.1016/j.ijpsycho.2012.09.006. Epub 2012 Sep 21. PMID: 23007035; PMCID: PMC3627354.

      (2) SESOI and power rationale. The authors elaborated on the sensitivity analyses and the rationale of reporting SESOI rather than traditional a-priori power analyses and included this information in the manuscript, which improves transparency.

      (3) Unspecific arousal/attention mechanisms. The authors argue against unspecific arousal mechanisms based on the absence of main effects in pain ratings and heart rate. This reduces the likelihood of a purely unspecific arousal account; however, these unspecific effects need not manifest as main effects. Unspecific mechanisms are likely adding (at least residual) effects onto the results.

      Regarding attention-based mechanisms, the authors have clarified that in Experiment 2 (exteroceptive cue), the participants are instructed that the sound does not have any relation with their heart rate. If participants did not receive any instructions on the meaning of the knocking sounds, they may have simply ignored it - not unlikely, also because the exteroceptive feedback did not elicit any systematic effect on the outcome variables (minus the slowing of HR with slower exteroceptive feedback, which may reflect noise, altering, multiple comparisons?). Ultimately, how the participants did or did not process the exteroceptive cue is unclear.

      (4) The authors provided more context to their hypothesis and strengthened its theoretical motivation (increased pain intensity with incongruent-high cardiac feedback), rooting it in predictive coding accounts of interoception. For instance, their prior study shows that participants report an increased cardiac frequency while anticipating pain. The reasoning behind this study is hence that if pain shapes cardiac perception, cardiac perception should in turn shape pain perception. The introduction has been revised accordingly, adding more references on the interplay between cardiac feedback and pain and emotional responses. While this rooting within the predictive processing framework is now clearly developed, it also underscores a gap between the proposed theoretical mechanism and the current analytical approach. The hypothesis is formulated in a mechanistic, computational-level language, yet the statistical analysis remains primarily descriptive, at a group level, and does not directly test the predictive-coding account.

      New concerns introduced by the revision:

      (1) Some of the newly added paragraphs interrupt the narrative flow. For example, the justification of the supradiaphragmatic focus based on the BPQ questionnaire feels too long for this section and might fit more naturally in the theoretical background or introduction. Similarly, the predictive-coding paragraph appearing after the hypotheses seems better suited to the earlier conceptual framing rather than following the hypothesis statements. It would be better for the argumentative flow if hypotheses followed from theoretical considerations.

      (2) The authors now note that the administration of the BPQ questionnaire was exploratory, explaining the null-results in the methods section as resulting from an underpowered design. But if the design is not appropriate for discovering a connection between self-reported body awareness and pain ratings, why was it administered in the first place? The rationale here is unclear.

      (3) The discussion is longer than before and would benefit greatly from streamlining the arguments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      I read the paper by Parrotta et al with great interest. The authors are asking an interesting and important question regarding pain perception, which is derived from predictive processing accounts of brain function. They ask: If the brain indeed integrates information coming from within the body (interoceptive information) to comprise predictions about the expected incoming input and how to respond to it, could we provide false interoceptive information to modulate its predictions, and subsequently alter the perception of such input? To test this question, they use pain as the input and the sounds of heartbeats (falsified or accurate) as the interoceptive signal.

      Strengths:

      I found the question well-established, interesting, and important, with important implications and contributions for several fields, including neuroscience of prediction-perception, pain research, placebo research, and health psychology. The paper is well-written, the methods are adequate, and the findings largely support the hypothesis of the authors. The authors carried out a control experiment to rule out an alternative explanation of their finding, which was important.

      Weaknesses:

      I will list here one theoretical weakness or concern I had, and several methodological weaknesses.

      The theoretical concern regards what I see as a misalignment between a hypothesis and a result, which could influence our understanding of the manipulation of heartbeats, and its meaning: The authors indicate from prior literature and find in their own findings, that when preparing for an aversive incoming stimulus, heartbeats *decrease*. However, in their findings, manipulating the heartbeats that participants hear to be slower than their own prior to receiving a painful stimulus had *no effect* on participants' actual heartbeats, nor on their pain perceptions. What authors did find is that when listening to heartbeats that are *increased* in frequency - that was when their own heartbeats decreased (meaning they expected an aversive stimulus) and their pain perceptions increased.

      This is quite complex - but here is my concern: If the assumption is that the brain is collecting evidence from both outside and inside the body to prepare for an upcoming stimulus, and we know that *slowing down* of heartbeats predicts an aversive stimulus, why is it that participants responded with a change in pain perception and physiological response when they listened to *increased heartbeats* and not decreased? My interpretation is that the manipulation did not fool the interoceptive signals that the brain collects, but rather the more conscious experience of participants, which may then have been translated to fear/preparation for the incoming stimulus. As the authors indicate in the discussion (lines 704-705), participants do not *know* that decreased heartbeats indicate an upcoming aversive stimulus, and I would even argue the opposite - the common knowledge or intuitive response is to increase alertness when we hear increased heartbeats, like in horror films or similar scenarios. Therefore, the unfortunate conclusion is that what the authors assume is a manipulation of interoception - to me seems like a manipulation of participants' alertness or conscious experience of possible danger. I hope the (important) distinction between the two is clear enough because I find this issue of utmost importance for the point the paper is trying to make. To summarize in one sentence - if it is decreased heartbeats that lead the brain to predict an approaching aversive input, and we assume the manipulation is altering the brain's interoceptive data collection, why isn't it responding to the decreased signal? --> My conclusion is that this is not in fact a manipulation of interoception, unfortunately.

      We thank the reviewer for their comment, which gives us the opportunity to clarify what we believe is a theoretical misunderstanding that we have not sufficiently made clear in the previous version of the manuscript. The reviewer suggests that a decreased heart rate itself might act as an internal cue for a forthcoming aversive stimulus, and questions why our manipulation of slower heartbeats then did not produce measurable effects.

      The central point is this: decreased heart rate is not a signal the brain uses to predict a threat, but is a consequence of the brain having already predicted the threat. This distinction is crucial. The well-known anticipatory decrease of heartrate serves an allostatic function: preparing the body in advance so that physiological responses to the actual stressor (such as an increase in sympathetic activation) do not overshoot. In other words, the deceleration is an output of the predictive model, not an input from which predictions are inferred. It would be maladaptive for the brain to predict threat through a decrease in heartrate, as this would then call for a further decrease, creating a potential runaway cycle.

      Instead, increased heart rate is a salient and evolutionarily conserved cue for arousal, threat, and pain. This association is reinforced both culturally - for example, through the use of accelerating heartbeats in films and media to signal urgency, as R1 mentions - and physiologically, as elevated heart rates reliably occur in response to actual (not anticipated) stressors. Decreased heartrates, in contrast, are reliably associated with the absence of stressors, for example during relaxation and before (and during) sleep. Thus, across various everyday experiences, increased (instead of decreased) heartrates are robustly associated with actual stressors, and there is no a priori reason to assume that the brain would treat decelerating heartrates as cue for threat. As we argued in previous work, “the relationship between the increase in cardiac activity and the anticipation of a threat may have emerged from participants’ first-hand experience of increased heart rates to actual, not anticipated, pain” (Parrotta et al., 2024). The changes in heart rate and pain perception that we hypothesize (and observe) are therefore fully in line with the prior literature on the anticipatory compensatory heartrate response (Bradley et al., 2008, 2005; Colloca et al., 2006; Lykken et al., 1972; Taggart et al., 1976; Tracy et al., 2017; Skora et al., 2022), as well as with Embodied Predictive Coding models (Barrett & Simmons, 2015; Pezzulo, 2014; Seth, 2013; Seth et al., 2012), which assume that our body is regulated through embodied simulations that anticipate likely bodily responses to upcoming events, thereby enabling anticipatory or allostatic regulation of physiological states (Barrett, 2017).

      We now add further explanation to this point to the Discussion (lines 740-758) and Introduction (lines 145-148; 154-156) of our manuscript to make this important point clearer.

      Barrett, L. F., & Simmons, W. K. (2015). Interoceptive predictions in the brain. Nature Reviews Neuroscience, 16(7), 419-429.

      Barrett, L. F. (2017). The theory of constructed emotion: An active inference account of interoception and categorization. Social Cognitive and Affective Neuroscience, 12(1), 1-23.

      Bradley, M. M., Moulder, B., & Lang, P. J. (2005). When good things go bad: The reflex physiology of defense. Psychological Science, 16(6), 468-473.

      Bradley, M. M., Silakowski, T., & Lang, P. J. (2008). Fear of pain and defensive activation. Pain, 137(1), 156-163.

      Colloca, L., Petrovic, P., Wager, T. D., Ingvar, M., & Benedetti, F. (2010). How the number of learning trials affects placebo and nocebo responses. Pain, 151(2), 430-439.

      Lykken, D., Macindoe, I., & Tellegen, A. (1972). Preception: Autonomic response to shock as a function of predictability in time and locus. Psychophysiology, 9(3), 318-333.

      Taggart, P., Hedworth-Whitty, R., Carruthers, M., & Gordon, P. D. (1976). Observations on electrocardiogram and plasma catecholamines during dental procedures: The forgotten vagus. British Medical Journal, 2(6039), 787-789.

      Tracy, L. M., Gibson, S. J., Georgiou-Karistianis, N., & Giummarra, M. J. (2017). Effects of explicit cueing and ambiguity on the anticipation and experience of a painful thermal stimulus. PloS One, 12(8), e0183650.

      Parrotta, E., Bach, P., Perrucci, M. G., Costantini, M., & Ferri, F. (2024). Heart is deceitful above all things: Threat expectancy induces the illusory perception of increased heartrate. Cognition, 245, 105719.

      Pezzulo, G. (2014). Why do you fear the bogeyman? An embodied predictive coding model of perceptual inference. Cognitive, Affective & Behavioral Neuroscience, 14(3), 902-911.

      Seth, A., Suzuki, K., & Critchley, H. (2012). An Interoceptive Predictive Coding Model of Conscious Presence. Frontiers in Psychology, 2. https://www.frontiersin.org/articles/10.3389/fpsyg.2011.00395

      Seth, A. K. (2013). Interoceptive inference, emotion, and the embodied self. Trends in Cognitive Sciences, 17(11), 565-573.

      Skora, L. I., Livermore, J. J. A., & Roelofs, K. (2022). The functional role of cardiac activity in perception and action. Neuroscience & Biobehavioral Reviews, 104655.

      I will add that the control experiment - with an exteroceptive signal (knocking of wood) manipulated in a similar manner - could be seen as evidence of the fact that heartbeats are regarded as an interoceptive signal, and it is an important control experiment. However, to me it seems that what it is showing is the importance of human-relevant signals to pain prediction/perception, and it does not directly prove that the signal is considered interoceptive. For example, it could be experienced as a social cue of human anxiety/fear etc., and induce alertness.

      The reviewer asks us to consider whether our measured changes in pain response happen not because the brain treats the heartrate feedback in Experiment 1 as an interoceptive stimulus, but because heartbeat sounds could have signalled threat on a more abstract, perhaps metacognitive or affective, level, in contrast to the less visceral control sounds in Experiment 2. We deem this highly unlikely for several reasons.

      First, as we point out in our response to Reviewer 3 (Point 3), if this were the case, the different sounds in the two experiments should have induced overall (between-experiment) differences in pain perception and heart rate, induced by the (supposedly) generally more threatening heartbeat sounds. However, when we added such comparisons, no such between-experiment differences were obtained (see Results Experiment 2, and Supplementary Materials, Cross-experiment analysis between-subjects model). Instead, we only find a significant interaction between experiment and feedback (faster, slower). Thus, it is not the heartbeat sounds per se that induce the measured changes to pain perception, but the modulation of their rate; identical changes to the rate of non-heartbeat sounds produce no such effects. In other words, pain perception is sensitive to a change in heart rate feedback, as we predicted, instead of the overall presence of heartbeat sounds (as one would need to predict if heartbeat sounds had more generally induced threat or stress).

      Second, one may suspect that it is precisely the acceleration of heartrate feedback that could act as a cue to arousal, while accelerated exteroceptive feedback would not. However, if this were the case, one would need to predict a general heart rate increase with accelerated feedback, as this is the general physiological marker of increasing alertness and arousal (e.g. Tousignant-Laflamme et al., 2005; Terkelsen et al., 2005; for a review, see Forte et al., 2022). However, the data show the opposite, with real heartrates decreasing when the heartrate feedback increases. This result is again fully in line with the predicted interoceptive consequences of accelerated heartrate feedback, which mandates an immediate autonomic regulation, especially when preparing for an anticipated stressor.

      Third, our view is further supported by neurophysiological evidence showing that heartbeat sounds, particularly under the belief they reflect one’s own body, are not processed merely as generic aversive or “human-relevant” signals. For instance, Vicentin et al. (2024) showed that simulated faster heartbeat sounds elicited stronger EEG alpha-band suppression, indicative of increased cortical activation over frontocentral and right frontal areas, compatible with the localization of brain regions contributing to interoceptive processes (Kleint et al., 2015). Importantly, Kleint et al. also demonstrated via fMRI that heartbeat sounds, compared to acoustically matched tones, selectively activate bilateral anterior insula and frontal operculum, key hubs of the interoceptive network. This suggests that the semantic identity of the sound as a heartbeat is sufficient to elicit internal body representations, despite its exteroceptive nature. Further evidence comes from van Elk et al. (2014), who found that heartbeat sounds suppress the auditory N1 component, a neural marker of sensory attenuation typically associated with self-generated or predicted stimuli. The authors interpret this as evidence that the brain treats heartbeat sounds as internally predicted bodily signals, supporting interoceptive predictive coding accounts in which exteroceptive cues (i.e., auditory cardiac feedback) are integrated with visceral information to generate coherent internal body representations.

      Finally, it is worth noting that the manipulation of heartrate feedback in our study elicited measurable compensatory changes in participants’ actual heart rate. This is striking compared to our previous work (Parrotta et al., 2024), wherein we used a design highly similar to the present one, combined with a very strong threat manipulation. Specifically, we presented participants with highly salient threat cues (knives directed at an anatomical depiction of a heart), which predicted forthcoming pain with 100% validity (compared to flowers, which predicted the absence of pain with 100% validity). In other words, these cues perfectly predicted actual pain, through highly visceral stimuli. Nevertheless, we found no measurable decrease in actual heartrate. From an abstract threat perspective, it is therefore striking that the much weaker manipulation of slightly increased or decreased heartrates we used here would induce such a change. The difference therefore suggests that the response here was not caused by an abstract feeling of threat, but arose because the brain indeed treated the increased heartrate feedback as an interoceptive signal for (stressor-induced) sympathetic activation, which would then be immediately down-regulated.

      Together, we hope you agree that these considerations make a strong case against a non-specific, arousal- or alertness-related explanation of our data. We now make this point clearer in the new paragraph of the Discussion (Accounting for general unspecific contributions, lines 796-830), and have added the relevant between-experiment comparisons to the Results of Experiment 2.

      Forte, G., Troisi, G., Pazzaglia, M., Pascalis, V. D., & Casagrande, M. (2022). Heart rate variability and pain: a systematic review. Brain sciences, 12(2), 153.

      Vicentin, S., Guglielmi, S., Stramucci, G., Bisiacchi, P., & Cainelli, E. (2024). Listen to the beat: behavioral and neurophysiological correlates of slow and fast heartbeat sounds. International Journal of Psychophysiology, 206, 112447.

      Kleint, N. I., Wittchen, H. U., & Lueken, U. (2015). Probing the interoceptive network by listening to heartbeats: an fMRI study. PloS one, 10(7), e0133164.

      Parrotta, E., Bach, P., Perrucci, M. G., Costantini, M., & Ferri, F. (2024). Heart is deceitful above all things: Threat expectancy induces the illusory perception of increased heartrate. Cognition, 245, 105719.

      Terkelsen, A. J., Mølgaard, H., Hansen, J., Andersen, O. K., & Jensen, T. S. (2005). Acute pain increases heart rate: differential mechanisms during rest and mental stress. Autonomic Neuroscience, 121(1-2), 101-109.

      Tousignant-Laflamme, Y., Rainville, P., & Marchand, S. (2005). Establishing a link between heart rate and pain in healthy subjects: a gender effect. The journal of pain, 6(6), 341-347.

      van Elk, M., Lenggenhager, B., Heydrich, L., & Blanke, O. (2014). Suppression of the auditory N1-component for heartbeat-related sounds reflects interoceptive predictive coding. Biological psychology, 99, 172-182.

      Several additional, more methodological weaknesses include the very small number of trials per condition - the methods mention 18 test trials per participant for the 3 conditions, with varying pain intensities, which are later averaged (and whether this is appropriate is a different issue). This means 6 trials per condition, and only 2 trials per condition and pain intensity. I thought that this number could be increased, though it is not a huge concern of the paper. It is, however, needed to show some statistics about the distribution of responses, given the very small trial number (see recommendations for authors). The sample size is also rather small, on the verge of "just right" to meet the required sample size according to the authors' calculations.

      We provide detailed responses to these points in the “Recommendations for The Authors” section, where each of these issues is addressed point by point in response to the specific questions raised.

      Finally, and just as important, the data exists to analyze participants' physiological responses (ECG) after receiving the painful stimulus - this could support the authors' claims about the change in both subjective and objective responses to pain. It could also strengthen the physiological evidence, which is rather weak in terms of its effect. Nevertheless, this is missing from the paper.

      This is indeed an interesting point, and we agree that analyzing physiological responses such as ECG following the painful stimulus could offer additional insights into the objective correlates of pain. However, it is important to clarify that the experiment was not designed to investigate post-stimulus physiological responses. Our primary focus was on the anticipatory processes leading up to the pain event. Notably, in the time window immediately following the stimulus - when one might typically expect to observe physiological changes such as an increase in heart rate - participants were asked to provide subjective ratings of their nociceptive experience. It is therefore not a “clean” interval that would lend itself to measurement, especially as a substantial body of evidence indicates that one’s heart rate is strongly modulated by higher-order cognitive processes, including attentional control, executive functioning, decision-making and action itself (e.g., Forte et al., 2021a; Forte et al., 2021b; Luque-Casado et al., 2016).

      This limitation is particularly important as the induced change in pain ratings by our heart rate manipulation is substantially smaller than the changes in heart rate induced by actual pain (e.g., Loggia et al., 2011). To confirm this for our study, we simply estimated how much change in heart rate is produced by a change in actual stimulus intensity in the initial no-feedback phase of our experiment. There, we find that a change between stimulus intensities 2 and 4 induces an NPS change of 32.95 and a heart rate acceleration response of 1.19 (difference in heart rate response relative to baseline, Colloca et al., 2006), d = .52, p < .001. The change of NPS induced by our implicit heart rate manipulation, however, is only a seventh of this (4.81 on the NPS). This means that the expected effect size of heart rate acceleration produced by our manipulation would only be d = .17. A power analysis, using G*Power, reveals that a sample size of n = 266 would be required to detect such an effect, if it exists. Thus, while we agree that this is an exciting hypothesis to be tested, it requires a specifically designed study, and a much larger sample than was possible here.
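The dependence of the required sample size on the expected effect size can be illustrated with a minimal sketch. It assumes a two-sided paired (one-sample) test, alpha = .05, power = .80, and a normal approximation; none of these settings is stated in the response, and the function name `required_n` is hypothetical. G*Power uses the noncentral t distribution with its own settings, so this approximation is not expected to reproduce the reported n = 266 exactly.

```python
import math
from statistics import NormalDist


def required_n(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size for a two-sided paired test on effect size d.

    Normal approximation: n = ((z_{1 - alpha/2} + z_{power}) / d) ** 2.
    The defaults for alpha and power are assumptions, not the authors' settings.
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / d) ** 2)


# Effect sizes quoted in the response: d = .52 (actual stimulus intensity)
# and d = .17 (expected effect of the feedback manipulation).
for d in (0.52, 0.17):
    print(f"d = {d}: approximately n = {required_n(d)}")
```

Because the required n scales with 1/d², cutting the effect size to roughly a third multiplies the required sample by about nine, which is why the smaller expected effect calls for a much larger study.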

      Colloca, L., Benedetti, F., & Pollo, A. (2006). Repeatability of autonomic responses to pain anticipation and pain stimulation. European Journal of Pain, 10(7), 659-665.

      Forte, G., Morelli, M., & Casagrande, M. (2021a). Heart rate variability and decision-making: Autonomic responses in making decisions. Brain sciences, 11(2), 243.

      Forte, G., Favieri, F., Oliha, E. O., Marotta, A., & Casagrande, M. (2021b). Anxiety and attentional processes: the role of resting heart rate variability. Brain sciences, 11(4), 480.

      Loggia, M. L., Juneau, M., & Bushnell, M. C. (2011). Autonomic responses to heat pain: Heart rate, skin conductance, and their relation to verbal ratings and stimulus intensity. PAIN®, 152(3), 592-598.

      Luque-Casado, A., Perales, J. C., Cárdenas, D., & Sanabria, D. (2016). Heart rate variability and cognitive processing: The autonomic response to task demands. Biological psychology, 113, 83-90.

      I have several additional recommendations regarding data analysis (using an ANOVA rather than multiple t-tests, using raw normalized data rather than change scores, questioning the averaging across 3 pain intensities) - which I will detail in the "recommendations for authors" section.

      We provide detailed responses to these points in the “Recommendations for The Authors” section, where each of these issues is addressed point by point in response to the specific questions raised.

      Conclusion:

      To conclude, the authors have shown in their findings that predictions about an upcoming aversive (pain) stimulus - and its subsequent subjective perception - can be altered not only by external expectations, or manipulating the pain cue, as was done in studies so far, but also by manipulating a cue that has fundamental importance to human physiological status, namely heartbeats. Whether this is a manipulation of actual interoception as sensed by the brain is - in my view - left to be proven.

      Still, the paper has important implications in several fields of science ranging from neuroscience prediction-perception research, to pain and placebo research, and may have implications for clinical disorders, as the authors propose. Furthermore, it may lead - either the authors or someone else - to further test this interesting question of manipulation of interoception in a different or more controlled manner.

      I salute the authors for coming up with this interesting question and encourage them to continue and explore ways to study it and related follow-up questions.

      We sincerely thank the reviewer for the thoughtful and encouraging feedback. We hope our responses to your points below convince you a bit more that what we are measuring does indeed capture interoceptive processes, but we of course fully acknowledge that additional measures - for example from brain imaging (or computational modelling, see Reviewer 3) - could further support our interpretation, and we highlight this in the Limitations and Future directions section.

      Reviewer #2 (Public Review):

      In this manuscript, Parrotta et al. tested whether it is possible to modulate pain perception and heart rate by providing false HR acoustic feedback before administering electrical cutaneous shocks. To this end, they performed two experiments. The first experiment tested whether false HR acoustic feedback alters pain perception and the cardiac anticipatory response. The second experiment tested whether the same perceptual and physiological changes are observed when participants are exposed to a non-interoceptive feedback. The main results of the first experiment showed a modulatory effect for faster HR acoustic feedback on pain intensity, unpleasantness, and cardiac anticipatory response compared to a control (acoustic feedback congruent to the participant's actual HR). However, the results of the second experiment also showed an increase in pain ratings for the faster non-interoceptive acoustic feedback compared to the control condition, with no differences in pain unpleasantness or cardiac response.

      The main strengths of the manuscript are the clarity with which it was written, and its solid theoretical and conceptual framework. The researchers make an in-depth review of predictive processing models to account for the complex experience of pain, and how these models are updated by perceptual and active inference. They follow with an account of how pain expectations modulate physiological responses and draw attention to the fact that most previous studies focus on exteroceptive cues. At this point, they make the link between pain experience and heart rate changes, and introduce their own previous work showing that people may illusorily perceive a higher cardiac frequency when expecting painful stimulation, even though anticipating pain typically goes along with a decrease in HR. From here, they hypothesize that false HR acoustic feedback evokes more intense and unpleasant pain perception, although the actual HR decreases due to the orienting cardiac response. Furthermore, they also test the hypothesis that an exteroceptive cue will lead to no (or less) changes in those variables. The discussion of their results is also well-rooted in the existing bibliography, and for the most part, provides a credible account of the findings.

      Thank you for the clear and thoughtful review. We appreciate your positive comments on the manuscript’s clarity, theoretical framework, and interpretation of results.

      The main weaknesses of the manuscript lie in a few choices in methodology and data analysis that hinder the interpretation of the results and the conclusions as they stand.

      The first peculiar choice is the convoluted definition of the outcomes. Specifically, pain intensity and unpleasantness are first normalized and then transformed into variation rates (sic) or deltas, which makes the interpretation of the results unnecessarily complicated. This is also linked to the definitions of the smallest effect of interest (SESOI) in terms of these outcomes, which is crucial to determining the sample size and gauging the differences between conditions. However, the choice of SESOI is not properly justified, and strangely, it changes from the first experiment to the second.

      We thank the reviewer for this important observation. In the revised manuscript, we have made substantial changes and clarifications to address both aspects of this concern: (1) the definition of outcome variables and their normalization, and (2) the definition of the SESOI.

      First, as explained in our response to Reviewer #1, we have revised the analyses and removed the difference-based change scores from the main results, addressing concerns about interpretability. However, we retained the normalization procedure: all variables (heart rate, pain intensity, unpleasantness) are normalized relative to the no-feedback baseline using a standard proportional change formula, (X − bX)/bX, where X is the feedback-phase mean and bX is the no-feedback baseline. This is a widely used normalization procedure (e.g., Bartolo et al., 2013; Cecchini et al., 2020). This method controls for interindividual variability by expressing responses relative to each participant’s own baseline. The resulting normalized values are then used directly in all analyses, and not further transformed into deltas.
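The proportional-change normalization can be written as a short sketch. The function name and the example ratings are illustrative assumptions and are not taken from the authors' analysis code.

```python
def proportional_change(feedback_mean: float, baseline_mean: float) -> float:
    """Return (X - bX) / bX: the feedback-phase mean X expressed as a
    proportional change from the participant's own no-feedback baseline bX."""
    if baseline_mean == 0:
        raise ValueError("baseline mean must be non-zero")
    return (feedback_mean - baseline_mean) / baseline_mean


# Hypothetical participant: no-feedback baseline rating of 40,
# feedback-phase mean rating of 44 (a 10% increase over baseline).
print(proportional_change(44.0, 40.0))
```

A positive value indicates a response above the participant's own baseline and a negative value a response below it, so participants with different baseline levels are placed on a common scale.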

      To address potential concerns about this baseline correction approach and its interpretability, we also conducted a new set of supplementary analyses (now reported in the supplementary materials) that include the no-feedback condition explicitly in the models, rather than treating it as a baseline for normalization. These models confirm that our main effects are not driven by the choice of normalization and hold even when no-feedback is analyzed as an independent condition. The new analyses and results are now reported in the Supplementary Materials.

      Second, concerning the SESOI values and their justification: The difference in SESOI values between Experiment 1 and Experiment 2 reflects the outcome of sensitivity analyses conducted for each dataset separately, rather than a post-hoc reinterpretation of our results. Specifically, we followed current methodological recommendations (Anderson, Kelley & Maxwell, 2017; Albers & Lakens, 2018; Lakens, 2022), which advise against estimating statistical power based on previously published effect sizes, especially when working with novel paradigms or when effect sizes in the literature may be inflated or imprecise. Instead, we used the sensitivity analysis function in G*Power (Version 3.1) to determine the smallest effect size our design was capable of detecting with high statistical power (90%), given the actual sample size, test type, and alpha level used in each experiment. This is a prospective, design-based estimation rather than a post-hoc analysis of observed effects. The slight differences in SESOI are due to more participants falling below our exclusion criteria in Experiment 2, leading to slightly larger effect sizes that can be detected (d = 0.62 vs d = 0.57). Importantly, both experiments remain adequately powered to detect effects of a size commonly reported in the literature on top-down pain modulation. For instance, Iodice et al. (2019) reported effects of approximately d = 0.7, which is well above the minimum detectable thresholds of our designs.
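A sensitivity analysis of this kind can be sketched as follows. The sketch assumes a two-sided paired (one-sample) test with alpha = .05 and uses a normal approximation; the sample sizes of 34 and 29 are the ones given in Reviewer 3's summary, and the function name `min_detectable_d` is hypothetical. G*Power works with the noncentral t distribution, so the approximation comes out slightly below the reported d = 0.57 and d = 0.62.

```python
import math
from statistics import NormalDist


def min_detectable_d(n: int, alpha: float = 0.05, power: float = 0.90) -> float:
    """Smallest standardized effect detectable with the given power.

    Normal approximation for a two-sided paired test:
    d_min = (z_{1 - alpha/2} + z_{power}) / sqrt(n).
    The test family and alpha are assumptions, not confirmed settings.
    """
    if n <= 0:
        raise ValueError("sample size n must be positive")
    z = NormalDist()  # standard normal distribution
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / math.sqrt(n)


# Sample sizes from Reviewer 3's summary of Experiments 1 and 2.
for label, n in (("Experiment 1", 34), ("Experiment 2", 29)):
    print(f"{label} (n = {n}): d_min is about {min_detectable_d(n):.2f}")
```

The smaller sample in Experiment 2 yields the larger minimum detectable effect, which matches the direction of the difference described above.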

      We have now clarified the logic in the Participant section of Experiment 1 (lines 193-218).

      Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-Size Planning for More Accurate Statistical Power: A Method Adjusting Sample Effect Sizes for Publication Bias and Uncertainty. Psychological Science, 28(11), 1547-1562.

      Bartolo, M., Serrao, M., Gamgebeli, Z., Alpaidze, M., Perrotta, A., Padua, L., Pierelli, F., Nappi, G., & Sandrini, G. (2013). Modulation of the human nociceptive flexion reflex by pleasant and unpleasant odors. PAIN®, 154(10), 2054-2059.

      Cecchini, M. P., Riello, M., Sandri, A., Zanini, A., Fiorio, M., & Tinazzi, M. (2020). Smell and taste dissociations in the modulation of tonic pain perception induced by a capsaicin cream application. European Journal of Pain, 24(10), 1946-1955.

      Lakens, D. (2022). Sample size justification. Collabra: psychology, 8(1), 33267.

      Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of experimental social psychology, 74, 187-195.

      Furthermore, the researchers propose the comparison of faster vs. slower delta HR acoustic feedback throughout the manuscript when the natural comparison is the incongruent vs. the congruent feedback.

      We very much disagree that the natural comparison is congruent vs incongruent feedback. First, please note that congruency simply refers to whether the heartrate feedback was congruent with (i.e., matched) the participant’s heartrate measurements in the no feedback trials, or whether it was incongruent, and was therefore either faster or slower than this baseline frequency. As such, simply comparing congruent with incongruent feedback could only indicate that pain ratings change when the feedback does not match the real heart rate, irrespective of whether it is faster or slower. Such a test can therefore only reveal potential general effects of surprise or salience, when the feedback heartrate does not match the real one.

      We therefore assume that the reviewer specifically refers to the comparison of congruent vs incongruent faster feedback. However, this is not a good test either, as this comparison is, by necessity, confounded with the factor of surprise described above. In other words, if a difference were found, it would not be clear if it emerges because, as we assume, faster feedback is represented as an interoceptive signal for threat, or simply because participants are surprised about heartrate feedback that diverges from their real heartrate. Note that even a non-significant result in the analogous comparison of congruent vs incongruent slower feedback would not be able to resolve this confound, as in null hypothesis testing the absence of a significant effect does, per definition, not indicate that there is no effect - only that it could not be detected here.

      Instead, the only possible test of our hypothesis is the one we have designed our experiment around and focussed on with our central t-test: the comparison of incongruent faster with incongruent slower feedback. This keeps any possible effects of surprise/salience from generally altered feedback constant and allows us to test our specific hypothesis: that real heart rates will decrease and pain ratings will increase when receiving false interoceptive feedback about increased compared to decreased heartrates. Note that this test of faster vs slower feedback is also statistically the most appropriate, as it collapses our prediction onto a single and highest-powered hypothesis test: As faster and slower heartrate feedback are assumed to induce effects in the opposite direction, the effect size of their difference is, per definition, double the averaged effect size for the two separate tests of faster vs congruent feedback and slower vs congruent feedback.
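The doubling argument can be made explicit for raw condition means. The symbols below are introduced here for illustration only: \mu_F, \mu_C and \mu_S denote the mean response under faster, congruent and slower feedback. The identity holds for unstandardized mean differences; a standardized effect size additionally depends on the variance of the respective difference scores.

```latex
\mu_F - \mu_S
  = (\mu_F - \mu_C) + (\mu_C - \mu_S)
  = 2 \cdot \frac{(\mu_F - \mu_C) + (\mu_C - \mu_S)}{2}
```

The faster vs slower contrast therefore equals twice the average of the faster vs congruent and congruent vs slower contrasts whenever the two incongruent conditions shift responses in opposite directions.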

      That being said, we also included comparisons with the congruent condition in our revised analysis, in line with the reviewer’s suggestion and previous studies. These analyses help explore potential asymmetries in the effect of false feedback. While faster feedback (both interoceptive and exteroceptive) significantly modulated pain relative to congruent feedback, the slower feedback did not, consistent with previous literature showing stronger effects for arousal-increasing cues (e.g., Valins, 1966; Iodice et al., 2019). To address this point, in the revised manuscript we have added a paragraph to the Data Analysis section of Experiment 1 (lines 405-437) to make this logic clearer.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      Iodice, P., Porciello, G., Bufalari, I., Barca, L., & Pezzulo, G. (2019). An interoceptive illusion of effort induced by false heart-rate feedback. Proceedings of the National Academy of Sciences, 116(28), 13897-13902.

      This could be influenced by the fact that the faster HR exteroceptive cue in experiment 2 also shows a significant modulatory effect on pain intensity compared to congruent HR feedback, which puts into question the hypothesized differences between interoceptive vs. exteroceptive cues. These results could also be influenced by the specific choice of exteroceptive cue: the researchers imply that the main driver of the effect is the nature of the cue (interoceptive vs. exteroceptive) and not its frequency. However, they attempt to generalize their findings using knocking wood sounds to all possible sounds, but it is possible that some features of these sounds (e.g., auditory roughness or loomingness) could be the drivers behind the observed effects.

      We appreciate this thoughtful comment. We agree that low-level auditory features can potentially introduce confounds in the experimental design, and we acknowledge the importance of distinguishing these factors from the higher-order distinction that is central to our study: whether the sound is perceived as interoceptive (originating from within the body) or exteroceptive (perceived as external). To this end, the knocking sound was chosen not for its specific acoustic profile, but because it lacked bodily relevance, thus allowing us to test whether the same temporal manipulations (faster, congruent, slower) would have different effects depending on whether the cue was interpreted as reflecting an internal bodily state or not. In this context, the exteroceptive cue served as a conceptual contrast rather than an exhaustive control for all auditory dimensions.

      Several aspects of our data make it unlikely that the observed effects are driven by unspecific acoustic characteristics of the sounds used in the exteroceptive and interoceptive experiments (see also our responses to Reviewer 1 and Reviewer 3 who raised similar points).

      First, if the knocking sound had inherent acoustic features that strongly influenced perception or physiological responses, we would expect it to have produced consistent effects across all feedback conditions (Faster, Slower, Congruent), regardless of the interpretive context. This would have manifested as an overall difference between experiments in the between-subjects analyses and in the supplementary mixed-effects models that included Experiment as a fixed factor. Yet, we observed no such main effects in any of our variables. Instead, significant differences emerged only in specific theoretically predicted comparisons (e.g., Faster vs. Slower), and critically, these effects depended on the cue type (interoceptive vs. exteroceptive), suggesting that perceived bodily relevance, rather than a specific acoustic property, was the critical modulator. In other words, any alternative explanation based on acoustic features would need to be able to explain why these acoustic properties would induce not an overall change in heart rate and pain perception (i.e., similarly across slower, faster, and congruent feedback), but the brain’s response to changes in the rate of this feedback – increasing pain ratings and decreasing heartrates for faster relative to slower feedback. We hope you agree that a simple effect of acoustic features would not predict such a sensitivity to the rate with which the sound was played.

      Please refer to our responses to Reviewers 1 and 2 for further aspects of the data that argue strongly against the possibility that other features associated with the sounds (e.g., alertness, arousal) could be responsible for the results, as the data pattern again goes in the opposite direction to that predicted by such accounts (e.g., faster heartrate feedback decreased real heartrate, instead of increasing it, as would be expected if accelerated heartrate feedback increased arousal).

      Finally, to further support this interpretation, we refer to neurophysiological evidence showing that heartbeat sounds are not processed as generic auditory signals, but as internal, bodily relevant cues, especially when believed to reflect one’s own physiological state. For instance, fMRI research (Kleint et al., 2015) shows that heartbeat sounds engage key interoceptive regions such as the anterior insula and frontal operculum more than acoustically matched control tones. EEG data (Vicentin et al., 2024) showed that faster heartbeat sounds produce stronger alpha suppression over frontocentral areas, suggesting enhanced processing in networks associated with interoceptive attention. Moreover, van Elk et al. (2014) found that heartbeat sounds attenuate the auditory N1 response, a neural signature typically linked to self-generated or predicted bodily signals. These findings consistently demonstrate that heartbeat sounds are processed as interoceptive and self-generated signals, which is in line with our rationale that the critical factor at play concerns whether the sound is semantically perceived as reflecting one’s own bodily state, rather than its physical properties.

      We now explicitly discuss these issues in the revised Discussion section (lines 740-758).

      Kleint, N. I., Wittchen, H. U., & Lueken, U. (2015). Probing the interoceptive network by listening to heartbeats: an fMRI study. PloS one, 10(7), e0133164.

      van Elk, M., Lenggenhager, B., Heydrich, L., & Blanke, O. (2014). Suppression of the auditory N1-component for heartbeat-related sounds reflects interoceptive predictive coding. Biological psychology, 99, 172-182.

      Vicentin, S., Guglielmi, S., Stramucci, G., Bisiacchi, P., & Cainelli, E. (2024). Listen to the beat: behavioral and neurophysiological correlates of slow and fast heartbeat sounds. International Journal of Psychophysiology, 206, 112447.

      Finally, it is noteworthy that the researchers divided the study into two experiments when it would have been optimal to test all the conditions with the same subjects in a randomized order in a single cross-over experiment to reduce between-subject variability. Taking this into consideration, I believe that the conclusions are only partially supported by the evidence. Despite the outcome transformations, a clear effect of faster HR acoustic feedback can be observed in the first experiment, which is larger than the proposed exteroceptive counterpart. This work could be of broad interest to pain researchers, particularly those working on predictive coding of pain.

      We appreciate the reviewer’s suggestion regarding a within-subject crossover design. While such a design indeed offers increased statistical power by reducing interindividual variability (Charness, Gneezy, & Kuhn, 2012), we intentionally opted for a between-subjects design due to theoretical and methodological considerations specific to studies involving deceptive feedback. Most importantly, carryover effects are a major concern in deception paradigms. Participants exposed to one type of feedback initially (e.g., interoceptive), and then the other (exteroceptive) would be more likely to develop suspicion or adaptive strategies that would alter their responses. Such expectancy effects could contaminate results in a crossover design, particularly when participants realize that feedback is manipulated. In line with this idea, past studies on false cardiac feedback (e.g., Valins, 1966; Pennebaker & Lightner, 1980) often employed between-subjects or blocked designs to mitigate this risk.

      Pennebaker, J. W., & Lightner, J. M. (1980). Competition of internal and external information in an exercise setting. Journal of personality and social psychology, 39(1), 165.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      Reviewer #3 (Public Review):

      In their manuscript titled "Exposure to false cardiac feedback alters pain perception and anticipatory cardiac frequency", Parrotta and colleagues describe an experimental study on the interplay between false heart rate feedback and pain experience in healthy, adult humans. The experimental design is derived from Bayesian perspectives on interoceptive inference. In Experiment 1 (N=34), participants rated the intensity and unpleasantness of an electrical pulse presented to their middle fingers. Participants received auditory cardiac feedback prior to the electrical pulse. This feedback was congruent with the participant's heart rate or manipulated to have a higher or lower frequency than the participant's true heart rate (incongruent high/ low feedback). The authors find heightened ratings of pain intensity and unpleasantness as well as a decreased heart rate in participants who were exposed to the incongruent-high cardiac feedback. Experiment 2 (N=29) is equivalent to Experiment 1 with the exception that non-interoceptive auditory feedback was presented. Here, mean pain intensity and unpleasantness ratings were unaffected by feedback frequency.

      Strengths:

      The authors present interesting experimental data that was derived from modern theoretical accounts of interoceptive inference and pain processing.

      (1) The motivation for the study is well-explained and rooted within the current literature, whereby pain is the result of a multimodal, inferential process. The separation of nociceptive stimulation and pain experience is explained clearly and stringently throughout the text.

      (2) The idea of manipulating pain-related expectations via an internal, instead of an external cue, is very innovative.

      (3) An appropriate control experiment was implemented, where an external (non-physiological) auditory cue with parallel frequency to the cardiac cue was presented.

      (4) The chosen statistical methods are appropriate, albeit averaging may limit the opportunity for mechanistic insight, see weaknesses section.

      (5) The behavioral data, showing increased unpleasantness and intensity ratings after exposure to incongruent-high cardiac feedback, but not exteroceptive high-frequency auditory feedback, is backed up by ECG data. Here, the decrease in heart rate during the incongruent-high condition speaks towards a specific, expectation-induced physiological effect that can be seen as resulting from interoceptive inference.

      We thank the reviewer for their positive feedback. We are glad that the study’s theoretical foundation, innovative design, appropriate control conditions, and convergence of behavioral and physiological data were well received.

      Weaknesses:

      Additional analyses and/ or more extensive discussion are needed to address these limitations:

      (1) I would like to know more about potential learning effects during the study. Is there a significant change in ∆ intensity and ∆ unpleasantness over time; e.g. in early trials compared to later trials? It would be helpful to exclude the alternative explanation that over time, participants learned to interpret the exteroceptive cue more in line with the cardiac cue, and the effect is driven by a lack of learning about the slightly less familiar cue (the exteroceptive cue) in early trials. In other words, the heartbeat-like auditory feedback might be "overlearned", compared to the less naturalistic tone, and more exposure to the less naturalistic cue might rule out any differences between them w.r.t. pain unpleasantness ratings.

      We thank the reviewer for raising this important point. Please note that the repetitions in our task were relatively limited (6 trials per condition), which limits the potential influence of such differential learning effects between experiments. To address this concern, we performed an additional analysis, reported in the Supplementary Materials, using a Linear Mixed-Effects Model approach. This method allowed us to include "Trial" (the rank order of each trial) as a variable to account for potential time-on-task effects such as learning, adaptation, or fatigue (e.g., Möckel et al., 2015). All feedback conditions (no-feedback, congruent, faster, slower) and all stimulus intensity levels were included.

      Specifically, we tested the following models:

      Likert Pain Unpleasantness Ratings ~ Experiment × Feedback × StimInt × Trial + (StimInt + Trial | Subject)

      Numeric Pain Scale of Intensity Ratings ~ Experiment × Feedback × StimInt × Trial + (StimInt + Trial | Subject)

      In both models, no significant interactions involving Trial × Experiment or Trial × Feedback × Experiment were found. Instead, we just find generally larger effects in early trials compared to later ones (Main effect of Trial within each Experiment), similar to other cognitive illusions where repeated exposure diminishes effects. Thus, although some unspecific changes over time may have occurred (e.g., due to general task exposure), these changes did not differ systematically across experimental conditions (interoceptive vs. exteroceptive) or feedback types. However, we are fully aware that the absence of significant higher-order interactions does not conclusively rule out the possibility of learning-related effects. It is possible that our models lacked the statistical power to detect more subtle or complex time-dependent modulations, particularly if such effects differ in magnitude or direction across feedback conditions.

      We report the full description of these analyses and results in the Supplementary materials 1. Cross-experiment analysis (between-subjects model).
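
One way to make the structure of the two models above explicit is to write them out as below. This is only a sketch in notation introduced here for illustration, not taken from the manuscript: $y_{ij}$ is the rating on trial $i$ of subject $j$, $\mathbf{x}_{ij}$ collects all main effects and interactions of Experiment, Feedback, StimInt and Trial, and the $u$ terms are the subject-level random intercept and slopes. It assumes that the random-effects term follows lme4-style syntax (which adds a random intercept by default) and that StimInt and Trial enter as numeric predictors.

```latex
\begin{aligned}
y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
        + u_{0j} + u_{1j}\,\mathrm{StimInt}_{ij} + u_{2j}\,\mathrm{Trial}_{ij}
        + \varepsilon_{ij},\\
(u_{0j},\, u_{1j},\, u_{2j})^{\top} &\sim \mathcal{N}(\mathbf{0},\, \boldsymbol{\Sigma}_{u}),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0,\, \sigma^{2}).
\end{aligned}
```

In this notation, the differential-learning account raised by the reviewer would show up as nonzero coefficients in $\boldsymbol{\beta}$ for the Trial × Experiment or Trial × Feedback × Experiment terms, whereas a general time-on-task effect corresponds to the coefficient for Trial alone.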

      (2) The origin of the difference in Cohen's d (Exp. 1: .57, Exp. 2: .62) and subsequently sample size in the sensitivity analyses remains unclear, it would be helpful to clarify where these values are coming from (are they related to the effects reported in the results? If so, they should be marked as post-hoc analyses).

      Following recommendations (Anderson, Kelley & Maxwell, 2017; Albers & Lakens, 2018), we do not report theoretical power based on previously reported effect sizes, as this neglects uncertainty around effect size measurements, especially for new effects for which no reliable expected effect size estimates can be derived across the literature. Instead, the power analysis is based on a sensitivity analysis, conducted in G*Power (Version 3.1). Importantly, these are not post-hoc analyses, as they are not based on observed effect sizes in our study, but derived a priori. Sensitivity analyses estimate effect sizes that our design is well-powered (90%) to detect (i.e. given target power, sample size, type of test), for the crucial comparison between faster and slower feedback in both experiments (Lakens, 2022). Following recommendations, we also report the smallest effect size this test can in principle detect in our study (SESOI, Lakens, 2022). This yields effect sizes of d = .57 in Experiment 1 and d = .62 in Experiment 2 at 90% power and SESOIs of d = .34 and .37, respectively. Note that values are slightly higher in Experiment 2, as more participants were excluded based on our exclusion criteria. Importantly, detectable effect sizes in both experiments are smaller than reported effect sizes for comparable top-down effects on pain measurements of d = .7 (Iodice et al., 2019). We have now added more information to the power analysis sections to make this clearer (lines 208-217).

      Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of experimental social psychology, 74, 187-195.

      Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-Size Planning for More Accurate Statistical Power: A Method Adjusting Sample Effect Sizes for Publication Bias and Uncertainty. Psychological Science, 28(11), 1547-1562.

      Lakens, D. (2022). Sample size justification. Collabra: psychology, 8(1), 33267.
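
The logic of the sensitivity analysis described above can be sketched in a few lines. This is a rough illustration rather than the G*Power computation itself: it assumes a two-sided paired comparison at alpha = .05, takes the sample sizes (34 and 29) from Reviewer 3's summary, and uses a normal approximation instead of the noncentral t distribution, so small discrepancies from the reported values are expected. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def detectable_d(n, power=0.90, alpha=0.05):
    """Effect size (Cohen's d for paired differences) detectable with the
    requested power, using a normal approximation to the paired t-test."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / sqrt(n)


def smallest_significant_d(n, alpha=0.05):
    """Smallest observed effect size that would reach significance
    (same normal approximation)."""
    z = NormalDist()
    return z.inv_cdf(1 - alpha / 2) / sqrt(n)


# Sample sizes are assumptions taken from the reviewer's summary of the study.
for label, n in (("Experiment 1", 34), ("Experiment 2", 29)):
    print(label, round(detectable_d(n), 2), round(smallest_significant_d(n), 2))
```

Because both quantities scale with 1/sqrt(n), the smaller sample in Experiment 2 necessarily gives slightly larger detectable effect sizes, in line with the pattern noted in the response.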

      (3) As an alternative explanation, it is conceivable that the cardiac cue may have just increased unspecific arousal or attention to a larger extent than the exteroceptive cue. It would be helpful to discuss the role of these rather unspecific mechanisms, and how it may have differed between experiments.

      We thank the reviewer for raising this important point. We agree that, in principle, unspecific mechanisms such as increased arousal or attention driven by cardiac feedback could be an alternative explanation for the observed effects. However, several aspects of our data indicate that this is unlikely:

      (1) No main effect of Experiment on pain ratings:

      If the cardiac feedback had simply increased arousal or attention in a general (non-specific) way, we would expect a main effect of Experiment (i.e., interoceptive vs exteroceptive condition) on pain intensity or unpleasantness ratings, regardless of feedback frequency. However, such a main effect was never observed when we compared between experiments (see between-experiment t-tests in results, and in supplementary analyses). Instead, effects were specific to the manipulation of feedback frequency.

      (2) Heart rate as an arousal measure:

      Heart rate (HR) is a classical physiological index of arousal. If there had been an unspecific increase in arousal in the interoceptive condition, we would expect a main effect of Experiment on HR. However, no such main effect was found. Instead, our HR analyses revealed a significant interaction between feedback and experiment, suggesting that HR changes depended specifically on the feedback manipulation rather than reflecting a general arousal increase.

      (3) Arousal predicts faster, not slower, heart rates

      In Experiment 1, faster interoceptive cardiac feedback led to a slowdown in heart rates both when compared to slower feedback and to congruent cardiac feedback. This is in line with the predicted compensatory response to faster heart rates. In contrast, if faster feedback had only generally increased arousal, heart rates should have increased instead of decreased, as indicated by several prior studies (Tousignant-Laflamme et al., 2005; Terkelsen et al., 2005; for a review, see Forte et al., 2022), predicting the opposite pattern of responses than was found in Experiment 1.

      Taken together, these findings indicate that the effects observed are unlikely to be driven by unspecific arousal or attention mechanisms, but rather are consistent with feedback-specific modulations, in line with our interoceptive inference framework.

      We have now integrated these considerations in the revised discussion (lines 796-830), and added the relevant between-experiment comparisons to the Results of Experiment 2 and the supplementary analysis.

      Terkelsen, A. J., Mølgaard, H., Hansen, J., Andersen, O. K., & Jensen, T. S. (2005). Acute pain increases heart rate: differential mechanisms during rest and mental stress. Autonomic Neuroscience, 121(1-2), 101-109.

      Tousignant-Laflamme, Y., Rainville, P., & Marchand, S. (2005). Establishing a link between heart rate and pain in healthy subjects: a gender effect. The journal of pain, 6(6), 341-347.

      Forte, G., Troisi, G., Pazzaglia, M., Pascalis, V. D., & Casagrande, M. (2022). Heart rate variability and pain: a systematic review. Brain sciences, 12(2), 153.

      (4) The hypothesis (increased pain intensity with incongruent-high cardiac feedback) should be motivated by some additional literature.

      We thank the reviewer for this helpful suggestion. Please note that the current phenomenon was tested in this experiment for the first time. Therefore, there is no specific prior study that motivated our hypotheses; they were driven theoretically, and derived from our model of interoceptive integration of pain and cardiac perception. The idea that accelerated cardiac feedback (relative to decelerated feedback) will increase pain perception and reduce heart rates is grounded in embodied predictive coding frameworks. Accordingly, expectations and signals from different sensory modalities (sensory, proprioceptive, interoceptive) are integrated both to efficiently infer crucial homeostatic and physiological variables, such as hunger, thirst, and, in this case, pain, and to regulate the body’s own autonomic responses based on these inferences.

      Within this framework, the concept of an interoceptive schema (Tschantz et al., 2022; Iodice et al., 2019; Parrotta et al., 2024; Schoeller et al., 2022) offers the basis for understanding interoceptive illusions, wherein inferred levels of interoceptive states (i.e., pain) deviate from the actual physiological state. Cardiac signals conveyed by the feedback manipulation act as a misleading prior, shaping the internal generative model of pain. Specifically, an increased heart rate may signal a state of threat, establishing a prior expectation of heightened pain. Building on predictive models of interoception, we predict that this cardiac prior is integrated with interoceptive (i.e., actual nociceptive signal) and exteroceptive inputs (i.e., auditory feedback input), leading to a subjective experience of increased pain even when there is no corresponding increase in the nociceptive input.

      This idea is not completely new, but it is based on our previous findings of an interoceptive cardiac illusion driven by misleading priors about anticipated threat (i.e., pain). Specifically, in Parrotta et al. (2024), we tested whether a common false belief that heart rate increases in response to threat leads to an illusory perception of accelerated cardiac activity when anticipating pain. In two experiments, we asked participants to monitor and report their heartbeat while their ECG was recorded. Participants performed these tasks while visual cues reliably predicted a forthcoming harmless (low-intensity) vs. threatening (high-intensity) cutaneous electrical stimulus. We showed that anticipating a painful vs. harmless stimulus causes participants to report an increased cardiac frequency, which does not reflect their real cardiac response, but the common (false) belief that heart rates would accelerate under threat, reflecting the hypothesised integration of prior expectations and interoceptive inputs when estimating cardiac activity.

      Here we tested the counterpart of such a cardiac illusion. We reasoned that if cardiac interoception is shaped by expectations about pain, then the inverse should also be true: manipulating beliefs about cardiac activity (via cardiac feedback) in the context of pain anticipation should influence the perception of pain. Specifically, we hypothesized that presenting accelerated cardiac feedback would act as a misleading prior, leading to an illusory increase in pain experience, even in the absence of an actual change in nociceptive input.
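
The idea that a misleading cardiac prior shifts perceived pain without any change in nociceptive input can be illustrated with a toy precision-weighted integration of two Gaussian sources. This is a didactic sketch only, not a model fitted to or proposed for the present data; all numbers, units and names are hypothetical.

```python
def integrate(prior_mean, prior_precision, input_mean, input_precision):
    """Combine a Gaussian prior with a Gaussian sensory input.

    Returns the posterior mean and precision; the posterior mean is the
    precision-weighted average of the prior mean and the input mean.
    """
    posterior_precision = prior_precision + input_precision
    posterior_mean = (
        prior_precision * prior_mean + input_precision * input_mean
    ) / posterior_precision
    return posterior_mean, posterior_precision


# Hypothetical values in arbitrary pain units.
nociceptive_input = 5.0      # identical stimulus in both conditions
slower_feedback_prior = 4.0  # assumed: slower heartbeat signals less threat
faster_feedback_prior = 7.0  # assumed: faster heartbeat signals more threat

pain_slower, _ = integrate(slower_feedback_prior, 1.0, nociceptive_input, 2.0)
pain_faster, _ = integrate(faster_feedback_prior, 1.0, nociceptive_input, 2.0)

print(pain_slower, pain_faster)
```

With the same nociceptive input, the inferred pain is pulled towards whichever prior the feedback induces, and the size of the pull depends on how precise the prior is relative to the input.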

      Moreover, next to the references already provided in the last version of the manuscript, there is ample prior research that provides more general support for such relationships. Specifically, studies have shown that providing mismatched cardiac feedback in contexts where cardiovascular changes are typically expected (i.e. sexual arousal, Rupp & Wallen, 2008; Valins, 1966; physical exercise, Iodice et al., 2019) can enhance the perception of interoceptive states associated with those experiences. Furthermore, findings that false cardiac feedback can influence emotional experience suggest that it is the conscious perception of physiological arousal, combined with the cognitive interpretation of the stimulus, that plays a key role in shaping emotional responses (Crucian et al., 2000).

      This point is now addressed in the revised Introduction, wherein additional references have been integrated (lines 157-170).

      Crucian, G. P., Hughes, J. D., Barrett, A. M., Williamson, D. J. G., Bauer, R. M., Bowers, D., & Heilman, K. M. (2000). Emotional and physiological responses to false feedback. Cortex, 36(5), 623-647.

      Iodice, P., Porciello, G., Bufalari, I., Barca, L., & Pezzulo, G. (2019). An interoceptive illusion of effort induced by false heart-rate feedback. Proceedings of the National Academy of Sciences, 116(28), 13897-13902.

      Parrotta, E., Bach, P., Perrucci, M. G., Costantini, M., & Ferri, F. (2024). Heart is deceitful above all things: Threat expectancy induces the illusory perception of increased heartrate. Cognition, 245, 105719.

      Rupp, H. A., & Wallen, K. (2008). Sex differences in response to visual sexual stimuli: A review. Archives of sexual behavior, 37(2), 206-218.

      Schoeller, F., Horowitz, A., Maes, P., Jain, A., Reggente, N., Moore, L. C., Trousselard, M., Klein, A., Barca, L., & Pezzulo, G. (2022). Interoceptive technologies for clinical neuroscience.

      Tschantz, A., Barca, L., Maisto, D., Buckley, C. L., Seth, A. K., & Pezzulo, G. (2022). Simulating homeostatic, allostatic and goal-directed forms of interoceptive control using active inference. Biological Psychology, 169, 108266.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      (5) The discussion section does not address the study's limitations in a sufficient manner. For example, I would expect a more thorough discussion on the lack of correlation between participant ratings and self-reported bodily awareness and reactivity, as assessed with the BPQ.

      We thank the reviewer for this valuable observation. In response, we have revised the Discussion section to explicitly acknowledge and elaborate on the lack of significant correlations between participants’ pain ratings and their self-reported bodily awareness and reactivity as assessed with the BPQ.

      We now clarify that the inclusion of this questionnaire was exploratory. While it would be theoretically interesting to observe a relationship between subjective pain modulation and individual differences in interoceptive awareness, detecting robust correlations between within-subject experimental effects and between-subjects trait measures such as the BPQ typically requires much larger sample sizes (often exceeding N = 200) due to the inherently low reliability of such cross-level associations (see Hedge, Powell & Sumner, 2018; the “reliability paradox”). As such, the absence of a significant correlation in our study does not undermine the conclusions we draw from our main findings. Future studies with larger samples will be needed to systematically address this question. We now acknowledge this point explicitly in the revised manuscript (lines 501-504; 832-851).

      Hedge, C., Powell, G., & Sumner, P. (2018). The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behavior Research Methods, 50(3), 1166-1186. https://doi.org/10.3758/s13428-017-0935-1
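
The sample-size argument above can be made concrete with a short calculation. The sketch below combines the classical attenuation formula (the observed correlation equals the true correlation times the square root of the product of the two reliabilities) with the Fisher z approximation for the sample size needed to detect a correlation. The true correlation and the reliabilities are hypothetical values chosen for illustration, not estimates from this study.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist


def attenuated_r(true_r, reliability_x, reliability_y):
    """Correlation expected between two imperfectly reliable measures."""
    return true_r * sqrt(reliability_x * reliability_y)


def required_n(r, power=0.80, alpha=0.05):
    """Approximate N needed to detect correlation r (two-sided test),
    based on the Fisher z transformation."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil((z_total / atanh(r)) ** 2 + 3)


# Hypothetical inputs: a true correlation of .30, an experimental effect
# measured with reliability .50 and a questionnaire with reliability .80.
observed_r = attenuated_r(0.30, 0.50, 0.80)
n_needed = required_n(observed_r)

print(round(observed_r, 2), n_needed, required_n(0.30))
```

Under these illustrative numbers, the attenuated correlation requires a sample of more than 200 participants, compared with fewer than 100 if both measures were perfectly reliable.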

      (a) Some short, additional information on why the authors chose to focus on body awareness and supradiaphragmatic reactivity subscales would be helpful.

      We chose to focus on the body awareness and supradiaphragmatic reactivity subscales because these aspects are closely tied to emotional and physiological processing, particularly in the context of interoception. Body awareness plays a critical role in how individuals perceive and interpret bodily signals, which in turn affects emotional regulation and self-awareness. Supradiaphragmatic reactivity refers specifically to the reactivity of organs located above the diaphragm (i.e., the muscle that separates the chest cavity from the abdomen), including the heart, in contrast to the subdiaphragmatic reactivity subscale, which covers organs below it. Our decision to include these subscales is further motivated by recent research, including the work by Petzschner et al. (2021), which demonstrates that the focus of attention can modulate the heartbeat-evoked potential (HEP), and that this modulation is predicted by participants’ responses on the supradiaphragmatic reactivity subscales. Thus, this subscale, and the more general body awareness scale, allows us to explore the interplay between bodily awareness, physiological reactivity, and emotional processing in our study. We now clarify this point in the revised version of the Methods - Body Perception Questionnaire (lines 384-393).

      (6) The analyses presented in this version of the manuscript allow only limited mechanistic conclusions - a computational model of participants' behavior would be a very strong addition to the paper. While this may be out of the scope of the article, it would be helpful for the reader to discuss the limitations of the presented analyses and outline avenues towards a more mechanistic understanding and analysis of the data. The computational model in [7] might contain some starting ideas.

      Thank you for your valuable feedback. We agree that a computational model would enhance the mechanistic understanding of our findings. While this is beyond the current scope, we now discuss the limitations of our analysis in the Limitations and Future directions section (lines 852-863). Specifically, we acknowledge that future studies could use computational models to better understand the interactions between physiological, cognitive, and perceptual factors.

      Some additional topics were not considered in the first version of the manuscript:

      (1) The possible advantages of a computational model of task behavior should be discussed.

      We agree that a computational model of task behavior could provide several advantages. By formalizing principles of predictive processing and active inference, such a model could generate quantitative predictions about how heart rate (HR) and feedback interact, providing a more precise understanding of their respective contributions to pain modulation. However, this is a first demonstration of a theoretically predicted phenomenon, and computationally modelling it is currently outside the scope of the article. We would be excited to explore this in the future. We have added a brief discussion of these potential advantages in the revised manuscript and suggest that future work could integrate computational modelling to further deepen our understanding of these processes (lines 852-890).

      (2) Across both experiments, there was a slightly larger number of female participants. Research suggests significant sex-related differences in pain processing [1,2]. It would be interesting to see what role this may have played in this data.

      Thank you for your insightful comment. While we acknowledge that sex-related differences in pain processing are well-documented in the literature, we do not have enough participants in our sample to test this in a well-powered way. As such, exploring the role of sex differences in pain perception will need to be addressed in future studies with more balanced samples. It would be interesting if more sensitive individuals, with a more precise representation of pain, also show smaller effects on pain perception. We have noted this point in the revised manuscript (lines 845-851) and suggest that future research could specifically investigate how sex differences might influence the modulation of pain and physiological responses in similar experimental contexts.

      (3) There are a few very relevant papers that come to mind which may be of interest. These sources might be particularly useful when discussing the roadmap towards a mechanistic understanding of the inferential processes underlying the task responses [3,4] and their clinical implications.

      Thank you for highlighting these relevant papers. We appreciate your suggestion and have now cited them in the Limitations and Future directions paragraph (lines 852-863).

      (4) In this version of the paper, we only see plots that illustrate ∆ scores, averaged across pain intensities - to better understand participant responses and the relationship with stimulus intensity, it would be helpful to see a more descriptive plot of task behavior (e.g. stimulus intensity and raw pain ratings)

      To directly address the reviewer’s request, we now provide additional descriptive plots in the supplementary material of the revised manuscript, showing raw pain ratings across different stimulus intensities and feedback conditions. These plots offer a clearer view of participant behavior without averaging across pain levels, helping to better illustrate the relationship between stimulus intensity and reported pain.

      Mogil, J. S. (2020). Qualitative sex differences in pain processing: emerging evidence of a biased literature. Nature Reviews Neuroscience, 21(7), 353-365. https://www.nature.com/articles/s41583-020-0310-6

      Sorge, R. E., & Strath, L. J. (2018). Sex differences in pain responses. Current Opinion in Physiology, 6, 75-81. https://www.sciencedirect.com/science/article/abs/pii/S2468867318300786?via%3Dihub

      Unal, O., Eren, O. C., Alkan, G., Petzschner, F. H., Yao, Y., & Stephan, K. E. (2021). Inference on homeostatic belief precision. Biological Psychology, 165, 108190.

      Allen, M., Levy, A., Parr, T., & Friston, K. J. (2022). In the body's eye: the computational anatomy of interoceptive inference. PLoS Computational Biology, 18(9), e1010490.

      Stephan, K. E., Manjaly, Z. M., Mathys, C. D., Weber, L. A., Paliwal, S., Gard, T., ... & Petzschner, F. H. (2016). Allostatic self-efficacy: A metacognitive theory of dyshomeostasis-induced fatigue and depression. Frontiers in human neuroscience, 10, 550.

      Friston, K. J., Stephan, K. E., Montague, R., & Dolan, R. J. (2014). Computational psychiatry: the brain as a phantastic organ. The Lancet Psychiatry, 1(2), 148-158.

      Eckert, A. L., Pabst, K., & Endres, D. M. (2022). A Bayesian model for chronic pain. Frontiers in Pain Research, 3, 966034.

      We thank the reviewer for highlighting these relevant references which have now been integrated in the revised version of the manuscript.

      Recommendations For The Authors: 

      Reviewer #1 (Recommendations For The Authors):

      At the time I was reviewing this paper, I could not think of a detailed experiment that would answer my biggest concern: Is this a manipulation of the brain's interoceptive data integration, or rather a manipulation of participants' alertness which indirectly influences their pain prediction?

      One incomplete idea that came to mind was delivering this signal in a more "covert" manner (though I am not sure it will suffice), or perhaps correlating the effect size of a participant with their interoceptive abilities, as measured in a different task or through a questionnaire. Another potential idea is to tell participants that this is someone else's HR that they hear and see if that changes the results (though requires further thought). I leave it to the authors to think further, and perhaps this is to be answered in a different paper - but if so, I am sorry to say that I do not think the claims can remain as they are now, and the paper will need a revision of its arguments, unfortunately. I urge the authors to ask further questions if my point about the concern was not made clear enough for them to address or contemplate it.

      We thank the reviewer for raising this important point. As detailed in our previous response, this point invites an important clarification regarding the role of cardiac deceleration in threat processing. Rather than serving as an interoceptive input from which the brain infers the likelihood of a forthcoming aversive event, heart rate deceleration is better described as an output of an already ongoing predictive process, as it reflects an allostatic adjustment of the bodily state aimed at minimizing the impact of the predicted perturbation (e.g., pain) and preventing sympathetic overshoot. It would be maladaptive for the brain to use a decelerating heart rate as evidence of impending threat, since this would paradoxically trigger further parasympathetic activation, initiating a potentially destabilizing feedback loop. Conversely, increased heart rate represents an evolutionarily conserved cue for arousal, threat, and pain. Our results therefore align with the idea that the brain treats externally manipulated increases in cardiac signals as congruent with anticipated sympathetic activation, prompting a compensatory autonomic and perceptual response consistent with embodied predictive processing frameworks (e.g., Barrett & Simmons, 2015; Seth, 2013).

      We would also like to reiterate that our results cannot be explained by general differences induced by the heart rate sounds relative to the exteroceptive sounds (see also our detailed comments to your point above, and our response to a similar point from Reviewer 3), for three main reasons.

      (1) No main effect of Experiment on pain ratings:

      If the cardiac feedback had simply increased arousal or attention in a general (non-specific) way, we would expect a main effect of Experiment (i.e., interoceptive vs exteroceptive condition) on pain intensity or unpleasantness ratings, regardless of feedback frequency. However, such a main effect was never observed. Instead, effects were specific to the manipulation of feedback frequency.

      (2) Heart rate as an arousal measure:

      Heart rate (HR) is a classical physiological index of arousal. If there had been an unspecific increase in arousal in the interoceptive condition, we would expect a main effect of Experiment on HR. However, no such main effect was found. Instead, our HR analyses revealed a significant interaction between feedback and experiment, suggesting that HR changes depended specifically on the feedback manipulation rather than reflecting a general arousal increase.

      (3) Arousal predicts faster, not slower, heart rates

      In Experiment 1, faster interoceptive cardiac feedback led to a slowdown in heart rates both when compared to slower feedback and to congruent cardiac feedback. This is in line with the predicted compensatory response to faster heart rates. In contrast, if faster feedback had only generally increased arousal, heart rates should have increased instead of decreased, as indicated by several prior studies (for a review, see Forte et al., 2022), predicting the opposite pattern of responses than was found in Experiment 1.

      Taken together, these findings indicate that the effects observed are unlikely to be driven by unspecific arousal or attention mechanisms, but rather are consistent with feedback-specific modulations, in line with our interoceptive inference framework. We now integrate these considerations in the general discussion (lines 796-830).

      Barrett, L. F., & Simmons, W. K. (2015). Interoceptive predictions in the brain. Nature reviews neuroscience, 16(7), 419-429.

      Forte, G., Troisi, G., Pazzaglia, M., Pascalis, V. D., & Casagrande, M. (2022). Heart rate variability and pain: a systematic review. Brain sciences, 12(2), 153.

      Seth, A. K. (2013). Interoceptive inference, emotion, and the embodied self. Trends in Cognitive Sciences, 17(11), 565-573.

      Additional recommendations:

      Major (in order of importance):

      (1) Number of trials per participant, per condition: as I mentioned, having only 6 trials for each condition is very little. The minimum requirement to accept so few trials would be to show data about the distribution of participants' responses to these trials, both per pain intensity (which was later averaged across - another issue discussed later), and across pain intensities, and see that it allows averaging across and that it is not incredibly variable such that the mean is unreliable.

      We appreciate the reviewer’s concern regarding the limited number of trials per condition. This choice was driven by both theoretical and methodological considerations.

      First, as is common in body illusion paradigms (e.g., the Rubber Hand Illusion, Botvinick & Cohen, 1998; the Full Body Illusion, Ehrsson, 2007; the Cardio-visual full body illusion, Pratviel et al., 2022) only a few trials are typically employed due to the immediate effects these manipulations elicit. Repetition can reduce the strength of the illusion through habituation, increased awareness, or loss of believability.

      Second, the experiment was already quite long (1.5h to 2h per participant) and cognitively demanding. It would not have been feasible to expand it further without compromising data quality due to fatigue, attentional decline, or participant disengagement.

      Third, the need for a large number of trials is more relevant when using implicit measures such as response times or physiological indices, which are typically indirectly related to the psychological constructs of interest. In contrast, explicit ratings are often more sensitive and less noisy, and thus require fewer repetitions to yield reliable effects (e.g., Corneille et al., 2024).

      Importantly, we also addressed your concern analytically. We therefore ran linear mixed-effects model analyses across all dependent variables (see Supplementary Materials), with Trial (i.e., the rank order of each trial) included as a predictor to account for potential time-on-task effects such as learning, adaptation, or fatigue (e.g., Möckel et al., 2015). These models captured trial-by-trial variability and allowed us to test for systematic changes in heart rate (HR) and pain ratings, including interactions with feedback conditions (e.g., Kliegl et al., 2011; Baayen & Milin, 2010; Ambrosini et al., 2023). The consistent effects of Trial suggest that repetition dampens the illusion, reinforcing our decision to limit the number of exposures.
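
      For orientation, a model of this kind can be written out as follows. This is only a sketch: the random-effects structure (a by-participant random intercept) and the dummy coding of Feedback are assumptions made for illustration, not a specification taken from the manuscript or its Supplementary Materials.

```latex
\begin{aligned}
y_{ij} &= \beta_0
        + \boldsymbol{\beta}_F^{\top}\,\mathrm{Feedback}_{ij}
        + \beta_T\,\mathrm{Trial}_{ij}
        + \boldsymbol{\beta}_{FT}^{\top}\,\bigl(\mathrm{Feedback}_{ij}\times\mathrm{Trial}_{ij}\bigr)
        + u_{0j} + \varepsilon_{ij},\\
u_{0j} &\sim \mathcal{N}(0,\sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0,\sigma^2)
\end{aligned}
```

      Here y_ij is the response (HR or a pain rating) on trial i of participant j, Feedback_ij is a vector of dummy-coded feedback indicators, and the Feedback × Trial coefficients capture how the feedback effect changes with trial rank.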

      In the interoceptive experiment, these analyses revealed a significant Feedback × Trial interaction (F(3, 711.19) = 6.16, p < .001), indicating that the effect of feedback on HR was not constant over time. As we suspected, and in line with other illusion-like effects, the difference between Faster and Slower feedback, which was significant early on (estimate = 1.68 bpm, p = .0007), decreased by mid-session (estimate = 0.69 bpm, p = .0048), and was no longer significant in later trials (estimate = 0.30 bpm, p = .4775). At the end of the session, HR values in the Faster and Slower conditions even numerically converged (Faster: M = 74.4, Slower: M = 74.1), and the non-significant contrast confirms that the difference had effectively vanished (for further details about slope estimation, see Supplementary material).

      The same pattern emerged for pain-unpleasantness ratings. A significant Feedback × Trial interaction (F(3, 675.33) = 3.44, p = .0165) revealed that the difference between Faster and Slower feedback was strongest at the beginning of the session and progressively weakened. Specifically, Faster feedback produced higher unpleasantness than Slower in early trials (estimate = -0.28, p = .0058) and mid-session (estimate = -0.19, p = .0001), but this contrast was no longer significant in the final trials, wherein all the differences between active feedback conditions vanished (all ps > .55).

      Finally, similar results were obtained for pain intensity ratings. A significant Feedback × Trial interaction (F(3, 669.15) = 9.86, p < .001) showed that the Faster vs Slower difference was greatest at the start of the session and progressively vanished over trials. In early trials Faster feedback exceeded Slower (estimate = -8.33, p = .0001); by mid-session this gap had shrunk to 4.48 points (p < .0001); and in the final trials it was no longer significant (all ps > .94).

      Taken together, our results show that the illusion induced by Faster relative to Slower feedback fades with repetition; adding further trials would likely have masked this key effect, which supports our methodological choice to restrict each condition to fewer exposures. To conclude, given that this is the first study to investigate an illusion of pain using heartbeat-based manipulation, we intentionally limited repeated exposures to preserve the integrity of the illusion. The use of mixed models as complementary analyses strengthens the reliability of our conclusions within these necessary design constraints. We now clarify this point in the Procedure paragraph (lines 328-335).

      Ambrosini, E., Peressotti, F., Gennari, M., Benavides-Varela, S., & Montefinese, M. (2023). Aging-related effects on the controlled retrieval of semantic information. Psychology and Aging, 38(3), 219.

      Baayen, R. H., & Milin, P. (2010). Analyzing reaction times. International Journal of Psychological Research, 3(2), 12-28.

      Botvinick, M., & Cohen, J. (1998). Rubber hands ‘feel’ touch that eyes see. Nature, 391(6669), 756-756.

      Corneille, O., & Gawronski, B. (2024). Self-reports are better measurement instruments than implicit measures. Nature Reviews Psychology, 3(12), 835–846.

      Ehrsson, H. H. (2007). The experimental induction of out-of-body experiences. Science, 317(5841), 1048-1048.

      Kliegl, R., Wei, P., Dambacher, M., Yan, M., & Zhou, X. (2011). Experimental effects and individual differences in linear mixed models: Estimating the relation of spatial, object, and attraction effects in visual attention. Frontiers in Psychology, 1, 238. https://doi.org/10.3389/fpsyg.2010.00238

      Möckel, T., Beste, C., & Wascher, E. (2015). The effects of time on task in response selection-an ERP study of mental fatigue. Scientific reports, 5(1), 10113.

      Pratviel, Y., Bouni, A., Deschodt-Arsac, V., Larrue, F., & Arsac, L. M. (2022). Avatar embodiment in VR: Are there individual susceptibilities to visuo-tactile or cardio-visual stimulations?. Frontiers in Virtual Reality, 3, 954808.

      (2) Using different pain intensities: what was the purpose of training participants on correctly identifying pain intensities? You state that the aim of having 5 intensities is to cause ambiguity. What is the purpose of making sure participants accurately identify the intensities? Also, why then only 3 intensities were used in the test phase? The rationale for these is lacking.

      We thank the reviewer for raising these important points regarding the use of different pain intensities. The purpose of using five levels during the calibration and training phases was to introduce variability and increase ambiguity in the participants’ sensory experience. This variability aimed to reduce predictability and prevent participants from forming fixed expectations about stimulus intensity, thereby enhancing the plausibility of the illusion. It also helped prevent habituation to a single intensity and made the manipulation subtler and more credible. We had no specific theoretical hypotheses about this manipulation. Regarding the accuracy training, although the paradigm introduced ambiguity, it was important to ensure that participants developed a stable and consistent internal representation of the pain scale. This step was essential to control for individual differences in sensory discrimination and to ensure that illusion effects were not confounded by participants’ inability to reliably distinguish between intensities.

      As for the use of only three pain intensities in the test phase, the rationale was to focus on a manageable subset that still covered a meaningful range of the stimulus spectrum. This approach followed the same logic as Iodice et al. (2019, PNAS), who used five (rather than all seven) intensity levels during their experimental session. Specifically, they excluded the extreme levels (45 W and 125 W) used during baseline, to avoid floor and ceiling effects and to ensure that each test intensity could be paired with both a “slower” and a “faster” feedback from an adjacent level. This would not have been possible at the extremes of the intensity range, where no adjacent level exists in one direction. We adopted the same strategy to preserve the internal consistency and plausibility of our feedback manipulation.

      We further clarified these points in the revised manuscript (lines 336-342).

      Iodice, P., Porciello, G., Bufalari, I., Barca, L., & Pezzulo, G. (2019). An interoceptive illusion of effort induced by false heart-rate feedback. Proceedings of the National Academy of Sciences, 116(28), 13897-13902.

      (3) Averaging across pain intensities: this is, in my opinion, not the best approach as by matching a participant's specific responses to a pain stimulus before and after the manipulation, you can more closely identify changes resulting from the manipulation. Nevertheless, the minimal requirement to do so is to show data of distributions of pain intensities so we know they did not differ between conditions per participant, and in general - as you indicate they were randomly distributed.

      We thank the reviewer for this thoughtful comment. The decision to average across pain intensities in our main analyses was driven by the specific aim of the study: we did not intend to determine at which exact intensity level the illusion was most effective, and the limited number of trials makes such an analysis difficult. Rather, we introduced variability in nociceptive input to increase ambiguity and reduce predictability in the participants’ sensory experience. This variability was critical for enhancing the plausibility of the illusion by preventing participants from forming fixed expectations about stimulus strength. Additionally, using a range of intensities helped to minimize habituation effects and made the feedback manipulation subtler and more credible.

      That said, we appreciate the reviewer’s point that matching specific responses before and after the manipulation at each intensity level could provide further insights into how the illusion operates across varying levels of nociceptive input. We therefore conducted supplementary analyses using linear mixed-effects models in which all three stimulus intensities were included as a continuous fixed factor. This allowed us to examine whether the effects of feedback were intensity-specific or generalized across different levels of stimulation.

      These analyses revealed that, in both the interoceptive and exteroceptive experiments, the effect of feedback on pain ratings was significantly modulated by stimulus intensity, as indicated by a Feedback × Stimulus Intensity interaction (Interoceptive: unpleasantness F(3, 672.32)=3.90, p=.0088; intensity ratings F(3, 667.07)=3.46, p=.016. Exteroceptive: unpleasantness F(3, 569.16)=8.21, p<.0001; intensity ratings F(3, 570.65)=3.00, p=.0301). The interaction term confirmed that the impact of feedback varied with stimulus strength, yet the pattern that emerged in each study diverged markedly.

      In the interoceptive experiment, the accelerated-heartbeat feedback (Faster) systematically heightened pain relative to the decelerated version (Slower) at every level of noxious input: for low-intensity trials Faster exceeded Slower by 0.22 ± 0.08 points on the unpleasantness scale (t = 2.84, p = .0094) and by 3.87 ± 1.69 units on the numeric intensity scale (t = 2.29, p = .0448); at the medium intensity the corresponding differences were 0.19 ± 0.05 (t = -4.02, p = .0001) and 4.52 ± 1.06 (t = 4.28, p < .0001); and even at the highest intensity, Faster still surpassed Slower by 0.17 ± 0.08 on unpleasantness (t = 2.21, p = .0326) and by 5.16 ± 1.67 on intensity (t = 3.09, p = .0032). This uniform Faster > Slower pattern indicates that the interoceptive manipulation amplifies perceived pain in a stimulus-independent fashion.

      The exteroceptive control experiment told a different story: the Faster-Slower contrast reached significance only at the most noxious setting (unpleasantness: estimate = 0.24 ± 0.07, t = -3.24, p = .0019; intensity: estimate = -5.14 ± 1.82, t = 2.83, p = .0072) and was absent at the medium level (intensity, p = .29; unpleasantness, p = .45), while at the lowest level Slower actually produced numerically higher unpleasantness (2.56 versus 2.40) and intensity ratings (44.7 versus 42.2).

      Thus, although both studies show that feedback effects depend on the actual nociceptive level of the stimulus, the results suggest that the faster vs. slower interoceptive feedback manipulation delivers a robust and intensity-invariant enhancement of pain, whereas the exteroceptive cue exerts a sporadic influence that surfaces solely under maximal stimulation.

      These new results are now included in the Supplementary Materials, where we report the detailed analyses for both the Interoceptive and Exteroceptive experiments on the Likert unpleasantness ratings and the numeric pain intensity ratings.

      (4) Sample size: It seems that the sample size was determined after the experiment was conducted, as the required N is identical to the actual N. I would be transparent about that, and say that retrospective sample size analyses support the ability of your sample size to support your claims. In general, a larger sample size than is required is always recommended, and if you were to run another study, I suggest you increase the sample size.

      As also addressed in our responses to your later comments (see our detailed reply regarding the justification of SESOI and power analyses), the power analyses reported here were not post-hoc power analyses based on obtained results. In line with current recommendations (Anderson, Kelley & Maxwell, 2017; Albers & Lakens, 2018), we did not base our analyses on previously reported effect sizes, as these can carry considerable uncertainty, particularly for novel effects where robust estimates are lacking. Instead, we used sensitivity analyses, conducted using the sensitivity analysis function in G*Power (Version 3.1). Sensitivity analyses allow us to report effect sizes that our design was adequately powered (90%) to detect, given the actual sample size, desired power level, and the statistical test used in each experiment (Lakens, 2022). Following further guidance (Lakens, 2022), we also report the smallest effect size of interest (SESOI) that these tests could reliably detect.

      This approach indicated that our design was powered to detect effect sizes of d = 0.57 in Experiment 1 and d = 0.62 in Experiment 2, with corresponding SESOIs of d = 0.34 and d = 0.37, respectively. The slightly higher value in Experiment 2 reflects the greater number of participants excluded (from an equal number originally tested) based on pre-specified criteria. Importantly, both experiments were well-powered to detect effects smaller than those typically reported in similar top-down pain modulation studies, where effect sizes around d = 0.7 have been observed (Iodice et al., 2019).
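
      To make the logic of a sensitivity analysis concrete, the sketch below computes the smallest standardized paired difference detectable for a given sample size, alpha and power. It relies on a normal approximation to the two-sided paired t-test rather than the noncentral t distribution used by dedicated power software, so its output will be slightly smaller than exact values; the function name and the sample sizes are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def minimal_detectable_d(n, alpha=0.05, power=0.90):
    """Smallest standardized paired difference detectable with n participants.

    Normal approximation to a two-sided paired t-test; exact
    noncentral-t calculations give slightly larger values for small n.
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / n ** 0.5


# Hypothetical sample sizes, for illustration only (not the study's N).
for n in (20, 30, 40):
    print(n, round(minimal_detectable_d(n), 2))
```

      The detectable effect shrinks with the square root of the sample size, which is why excluding more participants in one experiment raises its detectable effect size.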

      We have now clarified this rationale in the revised manuscript, Experiment 1- Methods - Participants (lines 208-217).

      Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of experimental social psychology, 74, 187-195.

      Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-Size Planning for More Accurate Statistical Power: A Method Adjusting Sample Effect Sizes for Publication Bias and Uncertainty. Psychological Science, 28(11), 1547-1562. https://doi.org/10.1177/0956797617723724

      Lakens, D. (2022). Sample size justification. Collabra: psychology, 8(1), 33267.

      (5) Analysis: the use of change scores instead of the actual scores is not recommended, as it is a loss of data, but could have been ignored if it didn't have a significant effect on the analyses conducted. Instead of conducting an RM-ANOVA of conditions (faster, slower, normal heartbeats) across participants, finding significant interaction, and then moving on to specific post-hoc paired comparisons between conditions, the authors begin with the change score but then move on to conduct the said paired comparisons without ever anchoring these analyses in an appropriate larger ANOVA. I strongly recommend the use of an ANOVA but if not, the authors would have to correct for multiple comparisons at the minimum.

      We thank the reviewer for their comment regarding the use of change scores. These were originally derived from the difference between the slower and faster feedback conditions relative to the congruent condition. In line with the reviewer’s recommendation, we have now removed these difference-based change scores from the main analysis. The results remain identical. Please note that we have retained the normalization procedure, relative to each participant’s initial baseline in the no feedback trials, as it is widely used in the interoceptive and pain literature (e.g., Bartolo et al., 2013; Cecchini et al., 2020; Riello et al., 2019). This approach helps to control for interindividual variability and baseline differences by expressing each participant’s response relative to their no-feedback baseline. As before, normalization was applied across all dependent variables (heart rate, pain intensity, and pain unpleasantness).

      To address the reviewer’s concern about statistical validity, we now first report a 1-factor repeated-measures ANOVA (Greenhouse-Geisser corrected) for each dependent variable, with feedback condition (slower, congruent, faster) as the within-subject factor.

      These show in each case a significant main effect, which we then follow with planned paired-sample t-tests comparing:

      Faster vs. slower feedback (our main hypothesis, as this contrast is expected to produce the largest difference and therefore the most powerful test of our hypothesis; see response to Reviewer 3),

      Faster vs. congruent and slower vs. congruent (to test for potential asymmetries, as suggested by previous false heart rate feedback studies).

      The rationale of these analyses is further discussed in the Data Analysis of Experiment 1 (lines 405-437).

      Although we report the omnibus one-factor RM-ANOVAs to satisfy conventional expectations, we note that such tests are not statistically necessary, nor even optimal, when the research question is fully captured by a priori, theory-driven contrasts. Extensive methodological work shows that, in this situation, going straight to planned contrasts maximises power without inflating Type I error and avoids the logical circularity of first testing an effect one does not predict (e.g., Rosenthal & Rosnow, 1985). In other words, an omnibus F is warranted only when one wishes to protect against unspecified patterns of differences. Here our hypotheses were precise (Faster ≠ Slower; potential asymmetry relative to Congruent), so the planned paired comparisons would have sufficed statistically. We therefore include the RM-ANOVAs solely for readers who expect to see them, but our inferential conclusions rest on the theoretically motivated contrasts.

      Rosenthal, R., & Rosnow, R. L. (1985). Contrast analysis. New York: Cambridge.

      (6) Correlations: were there correlations between subjects' own heartbeats (which are considered a predictive cue) and pain perceptions? This is critical to show that the two are in fact related.

      We thank the reviewer for this thoughtful suggestion. While we agree that testing for a correlation between anticipatory heart rate responses and subjective pain ratings is theoretically relevant, we have not conducted this analysis in the current manuscript, as our study was not designed or powered to reliably detect such individual differences. As noted by Hedge, Powell, and Sumner (2018), robust within-subject experimental designs tend to minimize between-subject variability in order to detect clear experimental effects. This reduction in variance at the between-subject level limits the reliability of correlational analyses involving trait-like or individual response patterns. This issue, known as the reliability paradox, highlights that measures showing robust within-subject effects may not show stable individual differences. Correlations with other individual-level variables (like the subjective ratings used here) therefore require much larger samples to produce interpretable results than were available here (and than are commonly used in the literature), typically more than 200 participants. For these reasons, we believe that running such an analysis in our current dataset would not yield informative results and could be misleading.

      We now explicitly acknowledge this point in the revised version of the manuscript (Limitations and future directions, lines 832-851) and suggest that future studies specifically designed to examine individual variability in anticipatory physiological responses and pain perception would be better suited to address this question.

      Hedge, C., Powell, G., & Sumner, P. (2018). The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behavior Research Methods, 50(3), 1166-1186. https://doi.org/10.3758/s13428-017-0935-1

      (7) The direct comparison between studies is great! and finally the use of ANOVA - but why without the appropriate post-hoc tests to support the bold claims in lines 542-544? This is needed. Same for 556-558.

      We apologize if our writing was not clear here, but the result of the ANOVAs fully warrants the claims in 542-544 (now lines 616-618) and 556-558 (now lines 601-603).

      In a 2x2 design, the interaction term is mathematically identical to comparing the difference induced by Factor 1 at one level of Factor 2 with the same difference induced at the other level of Factor 2. In our 2x2 analysis with the factors Experiment (Cardiac feedback, Exteroceptive feedback - between participants) and Feedback Frequency (faster, slower - within participants), the interaction therefore directly tests whether the effect of Feedback Frequency differs statistically (i.e., is larger or smaller) between the participants in the interoceptive and exteroceptive experiments. Thus, the conclusion that “faster feedback affected the perceptual bias more strongly in the Experiment 1 than in Experiment 2” captures the outcome of the significant interaction exactly. Indeed, this test would be statistically equivalent (and would produce identical p values) to a simple between-group t-test comparing each participant’s difference between the faster and slower feedback in the interoceptive group with the analogous differences in the exteroceptive group, as illustrated in standard examples of factorial analysis (see, e.g., Maxwell, Delaney and Kelley, 2017).

      Please note that, for the above reason, mathematically the conclusion of larger effects in one experiment than the other is licensed by the significant interaction even without follow-up t-tests. However, if the reader would like to see these tests, they are simply the main analysis results reported in each of the two experiment sections, where significant (t-test) differences between faster and slower feedback were induced with interoceptive cues (Experiment 1) but not exteroceptive cues (Experiment 2). Reporting them in the between-experiment comparison section again would therefore be redundant.
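
      The equivalence between the interaction term and a between-group t-test on difference scores can be checked numerically. The sketch below is an illustration only: the ratings are made-up toy numbers (not data from either experiment), the function names are hypothetical, and the t-test is the pooled-variance (Student) version, which is the one that matches the ANOVA interaction exactly.

```python
from statistics import mean

# Hypothetical toy ratings, one (faster, slower) pair per participant.
interoceptive = [(62, 55), (58, 54), (70, 61), (66, 63), (59, 50)]
exteroceptive = [(60, 59), (57, 58), (68, 66), (64, 65), (61, 60)]


def interaction_f(groups):
    """F for Group x Feedback in a 2 (between) x 2 (within) mixed design."""
    rows = [row for g in groups for row in g]
    grand = mean(v for row in rows for v in row)
    level_means = [mean(row[b] for row in rows) for b in range(2)]
    ss_inter = 0.0
    ss_error = 0.0
    for g in groups:
        g_mean = mean(v for row in g for v in row)
        for b in range(2):
            cell = mean(row[b] for row in g)
            # Interaction: cell mean minus both marginal effects.
            ss_inter += len(g) * (cell - g_mean - level_means[b] + grand) ** 2
            for row in g:
                # Error: Feedback x Participant within group.
                ss_error += (row[b] - mean(row) - cell + g_mean) ** 2
    df_error = len(rows) - len(groups)
    return ss_inter / (ss_error / df_error)  # interaction df = 1


def pooled_t(x, y):
    """Student's independent-samples t with pooled variance."""
    mx, my = mean(x), mean(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    sp2 = ss / (len(x) + len(y) - 2)
    return (mx - my) / (sp2 * (1 / len(x) + 1 / len(y))) ** 0.5


diff_intero = [f - s for f, s in interoceptive]
diff_extero = [f - s for f, s in exteroceptive]
F = interaction_f([interoceptive, exteroceptive])
t = pooled_t(diff_intero, diff_extero)
print(F, t ** 2)  # the two values agree up to floating-point error
```

      Because F with one numerator degree of freedom equals the square of t with the same error degrees of freedom, the two tests also share the same p value.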

      To avoid this lack of clarity, we have now re-written the Results section of each experiment. First, as noted above, we now precede our main hypothesis test - the crucial t-test comparing heart rate and pain ratings after faster vs slower feedback - with an ANOVA including all three levels (faster, congruent, slower feedback). Moreover, we removed the separate between-experiment comparison section. Instead, in the Results section of the exteroceptive Experiment 2, we now directly compare the (absent or reversed) effects of faster vs slower feedback, with a between-groups t-test, against the effects present in the interoceptive Experiment 1. This shows conclusively, and hopefully more clearly, that the effects in the two experiments differ. We hope that this makes the logic of our analyses clearer.

      Maxwell, S. E., Delaney, H. D., & Kelley, K. (2017). Designing experiments and analyzing data: A model comparison perspective. Routledge.

      (8) The discussion is missing a limitation paragraph.

      Thank you for the suggestion. We have now added a dedicated limitations paragraph in the Discussion section (lines 832-890).

      Additional recommendations:

      Minor (chronological order):

      (1) Sample size calculations for both experiments: what was the effect size based on? A citation or further information is needed. Also, clarify why the effect size differed between the two experiments.

      Please see above

      (2) "Participants were asked to either not drink coffee or smoke cigarettes" - either is implying that one of the two was asked. I suspect it is redundant as both were not permitted.

      The intention was to restrict both behaviors, so we have corrected the sentence to clarify that participants were asked not to drink coffee or smoke cigarettes before the session.

      (3) Normalization of ECG - what exactly was normalized, namely what measure of the ECG?

      The normalized measure was the heart rate, expressed in beats per minute (bpm). We now clarify this in the Data Analysis section of Experiment 1 (“Measures of the heart rate recorded with the ECG (beats per minute) in the feedback phase were normalized”).

      (4) Line 360: "Mean Δ pain unpleasantness ratings were analysed analogously" - this is unclear, if already described in methods then should be removed here, if not - should be further explained here.

      Thank you for your observation. We are no longer using change scores.

      (5) Lines 418-420: "Consequently, perceptual and cardiac modulations associated with the feedback manipulation should be reduced over the exposure to the faster exteroceptive sound." - why reduced and not unchanged? I didn't follow the logic.

      We chose the term “reduced” rather than “unchanged” to remain cautious in our interpretation. Statistically, the absence of a significant effect in one experiment does not necessarily mean that no effect is present; it simply means we did not detect one. For this reason, we avoided using language that would suggest complete absence of modulation. It also more closely matches the results of the between-experiment comparisons that we report in the Results section of Experiment 2, which can in principle only show that the effect in Experiment 2 was smaller than that of Experiment 1, not that it was absent. Even the TOST analysis that we use to show the absence of an effect can only show that any effect that is present is smaller than we could reasonably expect to detect with our experimental design, not its complete absence.

      Also, on a theoretical level, pain is a complex, multidimensional experience influenced not only by sensory input but also by cognitive, emotional, social and expectancy factors. For this reason, we considered it important to remain open to the possibility that other mechanisms beyond the misleading cardiac prior induced by the feedback might have contributed to the observed effects. If such other influences had contributed to the induced differences between faster and slower feedback in Experiment 1, some remainder of this difference could have been observed in Experiment 2 as well.

      Thus, for both statistical and theoretical reasons, we were careful to predict a reduction of the crucial difference, not its complete elimination. However, to allow for the possibility that effects could be completely eliminated, we now write that “perceptual and cardiac modulations associated with the feedback manipulation should be reduced or eliminated with exteroceptive feedback”.

      (6) Study 2 generation of feedback - was this again tailored per participants (25% above and beyond their own HR at baseline + gradually increasing or decreasing), or identical for everyone?

      Yes, in Study 2, the generation of feedback was tailored to each participant, mirroring the procedure of Experiment 1. Specifically, the feedback was set to be 25% above or below their baseline heart rate, with the feedback gradually increasing or decreasing. This individualized approach ensured that each participant experienced feedback relative to their own baseline heart rate. We now clarify this in the Methods section (lines 306-318).
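
      As a rough illustration of participant-tailored feedback, the sketch below ramps a feedback rate from a participant's baseline to 25% above or below it. The linear shape of the ramp, the number of steps and the function names are assumptions made for this example; the actual ramping procedure is described in the Methods of the manuscript.

```python
def feedback_schedule(baseline_bpm, direction, n_steps, max_shift=0.25):
    """Feedback rates (bpm) ramping linearly from baseline to +/- max_shift.

    The linear ramp is an assumption made for illustration only.
    """
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    sign = {"faster": 1.0, "slower": -1.0}[direction]
    return [
        baseline_bpm * (1 + sign * max_shift * k / (n_steps - 1))
        for k in range(n_steps)
    ]


def beat_interval_s(bpm):
    """Interval between successive feedback beats, in seconds."""
    return 60.0 / bpm


# Hypothetical participant with a 60 bpm baseline.
faster = feedback_schedule(60, "faster", 5)
slower = feedback_schedule(60, "slower", 5)
print(faster, slower)
```

      Expressing the target as a proportion of baseline keeps the relative size of the manipulation constant across participants with different resting heart rates.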

      (7) I did not follow why we need the TOST and how to interpret its results.

      We thank the reviewer for raising this important point. In classical null hypothesis significance testing (NHST), a non-significant p-value (e.g., p > .05) only indicates that we failed to find a statistically significant difference, not that there is no difference. It therefore does not allow us to conclude that two conditions are equivalent – only that we cannot confidently say they are different. In our case, to support the claim that exteroceptive feedback does not induce perceptual or physiological changes (unlike interoceptive feedback), we needed a method to test for the absence of a meaningful effect, not just the absence of a statistically detectable one.

      The TOST (Two One-Sided Tests) procedure reverses the logic of NHST by testing whether the observed effect falls within a predefined equivalence interval, whose bounds are set by the smallest effect size of interest (SESOI) that is in principle measurable with our design parameters (e.g., type of test, number of participants). This approach is necessary when the goal is not to detect a difference, but rather to demonstrate that an observed effect is so small that it can be considered negligible, or at the least smaller than we could in principle expect to observe in the given experiment. We used the TOST procedure in Experiment 2 to test for statistical equivalence between the effects of faster and slower exteroceptive feedback on pain ratings and heart rate.
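
      The two one-sided tests can be sketched in a few lines. This is a simplified illustration, not the analysis reported in the manuscript: it uses a normal approximation in place of the t distribution, raw-unit equivalence bounds, made-up paired differences, and a hypothetical function name.

```python
from statistics import NormalDist, mean, stdev


def tost_paired(diffs, low, high):
    """Two one-sided tests on paired differences against bounds [low, high].

    Uses a normal approximation to the t distribution (an assumption made
    to keep the sketch dependency-free).
    """
    se = stdev(diffs) / len(diffs) ** 0.5
    m = mean(diffs)
    # Test 1, H0: true mean difference <= low.
    p_low = 1 - NormalDist().cdf((m - low) / se)
    # Test 2, H0: true mean difference >= high.
    p_high = NormalDist().cdf((m - high) / se)
    # Equivalence is claimed only if BOTH tests reject, so report the larger p.
    return p_low, p_high, max(p_low, p_high)


# Hypothetical faster-minus-slower differences, for illustration only.
diffs = [0.2, -0.1, 0.3, -0.4, 0.1, 0.0, -0.2, 0.25, -0.15, 0.05]
p_low, p_high, p_tost = tost_paired(diffs, low=-0.5, high=0.5)
print(p_tost < 0.05)  # equivalence within the bounds if True
```

      A non-significant ordinary t-test combined with a significant TOST indicates that any remaining effect is smaller than the chosen bounds.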

      We hope that the clearer explanation now provided in the Data Analysis section of Experiment 2 (lines 5589-563) fully addresses the reviewer’s concern.

      (8) Lines 492-3: authors say TOST significant, while p value = 0.065

      We thank the reviewer for spotting this inconsistency. The discrepancy was due to a typographical error in the initial manuscript. During the revision of the paper, we rechecked and fully recomputed all TOST analyses, and the results have now been corrected throughout the manuscript to accurately reflect the statistical outcomes. In particular, for the comparison of heart rate between faster and slower exteroceptive feedback in Experiment 2, the corrected TOST analysis now shows a significant equivalence, with the observed effect size being d = -0.19 (90% CI [-0.36, -0.03]) and both one-sided tests yielding p = .025 and p < .001. These updated results are reported in the revised Results section.

      Reviewer #2 (Recommendations For The Authors):

      I would suggest the authors revise their definition of pain in the introduction, since it is not always a protective experience. The new IASP definition specifically takes this into consideration.

      We thank the reviewer for this suggestion. We have updated the definition of pain in the Introduction (lines 2-4) to align with the most recent IASP definition (2020), which characterizes pain as “an unpleasant sensory and emotional experience associated with, or resembling that associated with, actual or potential tissue damage” (lines 51-53).

      The work on exteroceptive cues does not necessarily neglect the role of interoceptive sources of information, although it is true that it has been comparatively less studied. I suggest rephrasing this sentence to reflect this.

      We thank the reviewer for pointing out this important nuance. We agree that studies employing exteroceptive cues to modulate pain perception do not necessarily neglect the role of interoceptive sources, even though these are not always the primary focus of investigation. Our intention was not to imply a strict dichotomy, but rather to highlight that interoceptive mechanisms have been comparatively under-investigated. We have revised the sentence in the Introduction accordingly to better reflect this perspective (Introduction, lines 110-112, “Although interoceptive processes may have contributed to the observed effects, these studies did not specifically target interoceptive sources of information within the inferential process.”).

      The last paragraph of the introduction (lines 158-164) contains generalizations beyond what can be supported by the data and the results, about the generation of predictive processes and the origins of these predictions. The statements regarding the understanding of pain-related pathologies in terms of chronic aberrant predictions in the context of this study are also unwarranted.

      We have deleted this paragraph now.

      I could not find the study registration (at least in clinicaltrials.gov). This is curious considering that the hypothesis and the experimental design seem in principle well thought out, and a study pre-registration improves the credibility of the research (Nosek et al., 2018). I also find the choice for the smallest effect of interest (SESOI) odd. Besides the unnecessary variable transformations (more on that later), there is no justification for why that particular SESOI was chosen, or why it changes between experiments (Dienes, 2021; King, 2011), which makes the choice look arbitrary. The SESOI is a fundamental component of a priori power analysis (Lakens, 2022), and without rationale and preregistration, it is impossible to tell whether this is a case of SPARKing or not (Sasaki & Yamada, 2023).

      We acknowledge that the study was not preregistered. Although our hypotheses and design were developed a priori and informed by established theoretical frameworks, the lack of formal preregistration is a limitation.

      The SESOI values for Experiments 1 and 2 were derived from sensitivity analyses based on the fixed design parameters (type of test, number of participants, alpha level) of our study, not from any post-hoc interpretation based on observed results; they therefore cannot be a case of SPARKing. Following current recommendations (Anderson, Kelley & Maxwell, 2017; Albers & Lakens, 2018; Lakens, 2022), we avoided basing power estimates on published effect sizes, as no such values exist for novel paradigms and published estimates are typically inflated due to publication and other biases. Instead, sensitivity analyses (using G*Power, v 3.1) allow us to calculate, prospectively, the smallest effect each design could detect with 90% power, given the actual sample size, test type, and α level. Because more participants were excluded in Experiment 2, this design can only detect slightly larger effects (d = 0.62) than Experiment 1 (d = 0.57). Please note that both studies therefore remain well-powered to capture effects of the magnitude typically reported in previous research using feedback manipulations to explore interoceptive illusions (e.g., Iodice et al., 2019, d ≈ 0.7).
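
      The idea of a sensitivity analysis can be illustrated with a short sketch. It is an approximation under stated assumptions, not a reproduction of the G*Power computation: it assumes a two-sided paired (one-sample) test and replaces the noncentral t distribution with a normal approximation, so its output will differ somewhat from the values reported above. The function name `min_detectable_dz` and the sample sizes in the usage note are hypothetical.

```python
import math
from statistics import NormalDist


def min_detectable_dz(n, alpha=0.05, power=0.90, two_sided=True):
    """Approximate smallest standardized effect detectable with n pairs.

    Normal approximation: d = (z_alpha + z_power) / sqrt(n).
    G*Power uses the noncentral t distribution instead, which gives
    slightly larger values for small samples.
    """
    z = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    z_alpha = z.inv_cdf(1 - tail)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) / math.sqrt(n)
```

      Calling `min_detectable_dz(30)` and `min_detectable_dz(26)` with these made-up sample sizes shows the general pattern described above: with the test type, α, and power held fixed, a smaller sample can only detect a larger effect.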

      We have added this clarification to the Participants section of Experiment 1 (Lines 208-217).

      Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-Size Planning for More Accurate Statistical Power: A Method Adjusting Sample Effect Sizes for Publication Bias and Uncertainty. Psychological Science, 28(11), 1547-1562.

      Lakens, D. (2022). Sample size justification. Collabra: psychology, 8(1), 33267.

      Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of experimental social psychology, 74, 187-195.

      In the Apparatus subsection, it is stated that the intensity of the electrical stimuli was fixed at 2 ms. I believe the authors refer to the duration of the stimulus, not its intensity.

      You are right, thank you for pointing that out. The text should refer to the duration of the electrical stimulus, not its intensity. We have corrected this wording in the revised manuscript to avoid confusion.

      It would be interesting to report (in graphical form) the stimulation intensities corresponding to the calibration procedure for the five different pain levels identified for all subjects.

      That's a good suggestion. We have included a supplementary figure showing the stimulation intensities corresponding to the five individually calibrated pain levels across all participants (Supplementary Figure 11).

      It is questionable that researchers state that "pain and unpleasantness should be rated independently" but then the first level of the Likert scale for unpleasantness is "1=no pain". This is particularly relevant since stimulation (and specifically electrical stimulation) can be unpleasant but non-painful at the same time. Since the experiments were already performed, the researchers should at least explain this choice.

      Thank you for raising this point. You are right that the “no pain” label on the pain unpleasantness scale was not ideal, and we now acknowledge this in the text (lines 886-890). Please note that unpleasantness was always the second rating that participants gave (after pain intensity), and the strongest results come from the first rating (pain intensity).

      Discussion.

      I did not find in the manuscript the rationale for varying the frequency of the heart rate by 25% (instead of any other arbitrary quantity).

      We thank the Reviewer for this observation, which prompted us to clarify the rationale behind our choice of a ±25% manipulation of heart rate feedback. False feedback paradigms have historically relied on a variety of approaches to modulate perceived cardiac signals. Some studies have adopted non-individualised values, using fixed frequencies (e.g., 60 or 110 bpm) to evoke states of calm or arousal, independently of participants’ actual physiology (Valins, 1966; Shahidi & Baluch, 1991; Crucian et al., 2000; Tajadura-Jiménez et al., 2008). Others have used the participant’s real-time heart rate as a basis, introducing accelerations or decelerations without applying a specific percentage transformation (e.g., Iodice et al., 2019). More recently, a growing body of work has employed percentage-based alterations of the instantaneous heart rate, offering a controlled and participant-specific manipulation. These include studies using −20% (Azevedo et al., 2017), ±30% (Dey et al., 2018), and even ±50% (Gray et al., 2007).

      These different methodologies - non-individualised, absolute, or proportionally scaled - have all been shown to effectively modulate subjective and physiological responses. They suggest that the impact of false feedback does not depend on a single fixed method, but rather on the plausibility and salience of the manipulation within the context of the task. We chose to apply a ±25% variation because it falls well within the most commonly used range and strikes a balance between producing a detectable effect and maintaining the illusion of physiological realism. The magnitude is conceptually justified as being large enough to shape interoceptive and emotional experience (as shown by Azevedo and Dey), yet small enough to avoid implausible or disruptive alterations, such as those approaching ±50%. We have now clarified this rationale in the revised Procedure paragraph of Experiment 1 (lines 306-318).
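
      As a concrete illustration of a proportional manipulation of this kind, the sketch below converts a participant's actual heart rate into the feedback rate for each condition and the corresponding interval between feedback sounds. It is a hypothetical sketch, not the presentation code used in the experiments; the function names and the example heart rate are assumptions.

```python
def feedback_bpm(actual_bpm, condition, change=0.25):
    """Feedback rate for a given condition, scaled from the actual heart rate."""
    factors = {
        "slower": 1 - change,   # e.g. -25%
        "congruent": 1.0,       # matches the actual heart rate
        "faster": 1 + change,   # e.g. +25%
    }
    return actual_bpm * factors[condition]


def inter_beat_interval_ms(bpm):
    """Time between successive feedback sounds, in milliseconds."""
    return 60_000 / bpm
```

      For an assumed resting rate of 80 bpm, the faster condition would play at 100 bpm (one sound every 600 ms) and the slower condition at 60 bpm (one sound every 1000 ms).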

      Azevedo, R. T., Bennett, N., Bilicki, A., Hooper, J., Markopoulou, F., & Tsakiris, M. (2017). The calming effect of a new wearable device during the anticipation of public speech. Scientific reports, 7(1), 2285.

      Crucian, G. P., Hughes, J. D., Barrett, A. M., Williamson, D. J. G., Bauer, R. M., Bowers, D., & Heilman, K. M. (2000). Emotional and physiological responses to false feedback. Cortex, 36(5), 623-647.

      Dey, A., Chen, H., Billinghurst, M., & Lindeman, R. W. (2018, October). Effects of manipulating physiological feedback in immersive virtual environments. In Proceedings of the 2018 Annual Symposium on Computer-Human Interaction in Play (pp. 101-111).

      Gray, M. A., Harrison, N. A., Wiens, S., & Critchley, H. D. (2007). Modulation of emotional appraisal by false physiological feedback during fMRI. PLoS one, 2(6), e546.

      Shahidi, S., & Baluch, B. (1991). False heart-rate feedback, social anxiety and self-attribution of embarrassment. Psychological reports, 69(3), 1024-1026.

      Tajadura-Jiménez, A., Väljamäe, A., & Västfjäll, D. (2008). Self-representation in mediated environments: the experience of emotions modulated by auditory-vibrotactile heartbeat. CyberPsychology & Behavior, 11(1), 33-38.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      The researchers state that pain ratings collected in the feedback phase were normalized to the no-feedback phase to control for inter-individual variability in pain perception, as established by previous research. They cite three studies involving smell and taste, of which the last two contain the same normalization presented in this study. However, unlike these studies, the outcomes here require no normalization whatsoever, because there should be no (or very little) inter-individual variability in pain intensity ratings. Indeed, pain intensity ratings in this study are anchored to 30, 50, and 70 / 100 as a condition of the experimental design. The researchers go to extreme lengths to ensure this is the case, by adjusting stimulation intensities until at least 75% of stimulation intensities are correctly matched to their pain ratings counterpart in the pre-experiment procedure. In other words, inter-individual variability in this study is in stimulation intensities, and not pain intensity ratings. Even if it could be argued that pain unpleasantness and heart rate still need to account for inter-individual variability, the best way to do this is by using the baseline (no-feedback) measures as covariates in a mixed linear model. Another advantage of this approach is that all the effects can be described in terms of the original scales and are readily understandable, and post hoc tests between levels can be corrected for multiple comparisons. On the contrary, the familywise error rate for the comparisons between conditions in the current analysis is larger than 5% (since there is a "main" paired t-test and additional "simple" tests).

      We disagree that there is little to no variability in the no feedback phase. Participants were tested in their ability to distinguish intensities in an initial pre-experiment calibration phase. In the no feedback phase, participants rated the pain stimuli in the full experimental context.

      In the pre-experiment calibration phase, participants were tested only once in their ability to match five electrical‐stimulation levels to the 0-100 NPS scale, before any feedback manipulation started. During this pre-experiment calibration we required that each level was classified correctly on ≥ 75 % of the four repetitions; “correct” meant falling within ± 5 NPS units of the target anchor (e.g., a response of 25–35 was accepted for the 30/100 anchor). This procedure served one purpose only: to make sure that every participant entered the main experiment with three unambiguously distinguishable stimulation levels (30 / 50 / 70). We integrated this point in the revised manuscript lines 263-270.
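
      The acceptance rule described above can be written out as a small check. This is only a sketch of the stated criterion (at least 75% of the repetitions within ± 5 NPS units of the target anchor); the function name `level_passes` and the example ratings are hypothetical.

```python
def level_passes(ratings, target, tolerance=5, min_proportion=0.75):
    """True if enough ratings fall within +/- tolerance of the target anchor."""
    hits = sum(1 for r in ratings if abs(r - target) <= tolerance)
    return hits / len(ratings) >= min_proportion
```

      With four repetitions, three correct classifications are enough: made-up ratings of 28, 33, 36, and 30 for the 30/100 anchor would pass, because only the rating of 36 falls outside the 25–35 window.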

      Once the real task began, the context changed: shocks are unpredictable, attention is drawn to the heartbeat, and participants must judge both intensity and unpleasantness. In this full experimental setting the no-feedback block indeed shows considerable variability, even for the pain intensity ratings. Participants' mean rating on the NPS scale was 46.4, with a standard deviation of 11.9 - thus participants vary quite strongly in their mean ratings (range 14.5 to 70). Moreover, while all participants show a positive correlation between actual intensities and their ratings (i.e., they rate the higher intensities as more intense than the lower ones), they vary in how much of the scale they use, with differences between reported highest and lowest intensities ranging between 8 and 91, for the participants showing the smallest and largest differences, respectively.

      Thus, while we simplified the analysis to remove the difference scoring relative to the congruent trials and now use these congruent trials as an additional condition in the analysis, we retained the normalisation procedure to account for the between-participant variability that does in fact exist, and to ensure consistency with prior research (Bartolo et al., 2013; Cecchini et al., 2020; Riello et al., 2019) and our a priori analysis plan.

      However, to ensure we fully address your point here (and the other reviewers’ points about potential additional factors affecting the effects, like trial number and stimulus intensity), we also report an additional linear mixed-effects model analysis without normalization. It includes every feedback level as condition (No-Feedback, Congruent, Slower, Faster), plus additional predictors for actual stimulus intensity and trial rank within the experiment (as suggested by the other reviewers). This confirms that all relevant results remain intact once baseline and congruent trials are explicitly included in the model.

      In brief, cross‐experiment analyses demonstrated that the Faster vs Slower contrast was markedly larger when the feedback was interoceptive than when it was exteroceptive. This held for heart-rate deceleration (b = 0.94 bpm, p = .005), for increases in unpleasantness (b = -0.16 Likert units, p = .015), and in pain-intensity ratings (b = -3.27 NPS points, p = .037).

      These findings were then further confirmed by within-experiment analyses. Within the interoceptive experiment, the mixed model on raw scores replicated every original effect: heart rate was lower after Faster than Slower feedback (estimate = –0.69 bpm, p = .005); unpleasantness was higher after Faster than Slower feedback (estimate = 0.19, p < .001); pain-intensity rose after Faster versus Slower (estimate = -4.285, p < .001). In the exteroceptive experiment, however, none of these Faster–Slower contrasts reached significance for heart rate (all ps > .33), unpleasantness (all ps > .43) or intensity (all ps > .10). Because these effects remain significant even with No-Feedback and Congruent trials explicitly included in the model and vanish under exteroceptive control, the supplementary, non-normalised analyses confirm that the faster vs. slower interoceptive feedback uniquely lowers anticipatory heart rate while amplifying both intensity and unpleasantness of pain, independent of data transformation or reference conditions. Please see Supplementary analyses for further details.

      Bartolo, M., Serrao, M., Gamgebeli, Z., Alpaidze, M., Perrotta, A., Padua, L., Pierelli, F., Nappi, G., & Sandrini, G. (2013). Modulation of the human nociceptive flexion reflex by pleasant and unpleasant odors. PAIN®, 154(10), 2054-2059.

      Cecchini, M. P., Riello, M., Sandri, A., Zanini, A., Fiorio, M., & Tinazzi, M. (2020). Smell and taste dissociations in the modulation of tonic pain perception induced by a capsaicin cream application. European Journal of Pain, 24(10), 1946-1955.

      Riello, M., Cecchini, M. P., Zanini, A., Di Chiappari, M., Tinazzi, M., & Fiorio, M. (2019). Perception of phasic pain is modulated by smell and taste. European Journal of Pain, 23(10), 1790-1800.

      I could initially not find a rationale for bringing upfront the comparison between faster vs. slower HR acoustic feedback when in principle the intuitive comparisons would be faster vs. congruent and slower vs. congruent feedback. This is even more relevant considering that in the proposed main comparison, the congruent feedback does not play a role: since Δ outcomes are calculated as (faster - congruent) and (slower - congruent), a paired t-test between Δ faster and Δ slower outcomes equals (faster - congruent) - (slower - congruent) = (faster - slower). I later realized that the statistical comparison (paired t-test) of pain intensity ratings of faster vs. slower acoustic feedback is significant in experiment 1 but not in experiment 2, which in principle would support the argument that interoceptive, but not exteroceptive, feedback modulates pain perception. However, the "simple" t-tests show that faster feedback modulates pain perception in both experiments, although the effect is larger in experiment 1 (interoceptive feedback) compared to experiment 2 (exteroceptive feedback).

      The comparison between faster and slower feedback is indeed crucial, and we regret not having made this clearer in the first version of the manuscript. As noted in our response to your point in the public review, this comparison is both statistically the most powerful and theoretically the most appropriate, as it controls for any influence of salience or surprise when heart rates deviate (in either direction) from what is expected. It therefore provides a clean measure of how much accelerated heart rate affects pain perception and physiological response, relative to an equal change in the opposite direction. However, as noted above, in the new version of the manuscript we have now removed the analysis via difference scores, and directly compared all three relevant conditions (faster, congruent, slower), first via an ANOVA and then with follow-up planned t-tests.

      Please refer to our previous response for further details (i.e., Furthermore, the researchers propose the comparison of faster vs. slower delta HR acoustic feedback throughout the manuscript when the natural comparison is the incongruent vs. the congruent feedback [..]).

      The design of experiment two involves the selection of knocking wood sounds to act as exteroceptive acoustic feedback. Since the purpose is to test whether sound affects pain intensity ratings, unpleasantness, and heart rate, it would have made sense to choose sounds that would be more likely to elicit such changes, e.g. Taffou et al. (2021), Chen & Wang (2022), Zhou et al. (2022), Tajadura-Jiménez et al. (2010). Whereas I acknowledge that there is a difference in effect sizes between experiment 1 and experiment 2 for the faster acoustic feedback, I am not fully convinced that this difference is due to the nature of the feedback (interoceptive vs. exteroceptive), since a similar difference could arguably be obtained by exteroceptive sound with looming or rough qualities. Since the experiment was already carried out and this hypothesis cannot be tested, I suggest that the researchers moderate the inferences made in the Discussion regarding these results.

      Please refer to our detailed answer to this point in the Public Review (i.e., This could be influenced by the fact that the faster HR exteroceptive cue in experiment 2 also shows a significant modulatory effect [..]). As we describe there, we see little grounds to suspect such a non-specific influence of acoustic parameters, as it is specifically the sensitivity to the change in heart rate (faster vs slower) that is affected by our between-experiment manipulation, not the overall response to the different exteroceptive or interoceptive sounds. Moreover, the specific change induced by the faster interoceptive feedback - a heart rate deceleration - is not consistent with a change in arousal or alertness (which would have predicted an increase in heart rate with increasing arousal). See also Discussion-Accounting for general unspecific contributions.

      Additionally, the fact that no significant effects were found for unpleasantness ratings or heart rate (absence of evidence) should not be taken as proof that faster exteroceptive feedback does not induce an effect on these outcomes (evidence of absence). In this case, it could be that there is actually no effect on these variables, or that the experiment was not sufficiently powered to detect those effects. This would depend on the SESOIs for these variables, which as stated before, was not properly justified.

      We very much agree that the absence of significant effects should not be interpreted as definitive evidence of absence. Indeed, we were careful not to overinterpret the null findings for heart rate and unpleasantness ratings, and we conducted additional analyses to clarify their interpretation. First, the TOST analysis shows that any effects in Experiment 2 are (significantly) smaller than the smallest effect size that can possibly be detected in our experiment, given the experimental parameters (number of participants, type of test, alpha level). Second, and more importantly, we ran between-experiment comparisons (see Results Experiment 2, and Supplementary materials, Cross-experiment analysis between-subjects model) of the crucial difference in the changes induced by faster and slower feedback. This showed that the differences were larger with interoceptive (Experiment 1) than exteroceptive cues (Experiment 2). Thus, even if the exteroceptive cues in Experiment 2 induce an effect that is smaller than is in principle detectable, it is smaller than the effect of the interoceptive cues in Experiment 1.

      To ensure we fully address this point, we have now simplified our main analysis (main manuscript) and replicated it with a different analysis (Supplementary material). We also motivate more clearly why the comparison between faster and slower feedback is crucial (Methods Experiment 1), and we make clearer that the difference between these conditions is larger in Experiment 1 than Experiment 2 (Results Experiment 2). Moreover, we went through the manuscript and ensured that our wording does not over-interpret the absence of effects in Experiment 2 as the absence of a difference.

      The section "Additional comparison analysis between experiments" encompasses in a way all possible comparisons between levels of the different factors in both experiments. My original suggestion regarding the use of a mixed linear model with covariates is still valid for this case. This analysis also brings into question another aspect of the experimental design: what is the rationale for dividing the study into two experiments, considering that variability and confounding factors would have been much better controlled in a single experimental session that includes all conditions?

      We thank the reviewer for their comment. We would like to note, first, that the between-experiment analyses did not encompass all possible comparisons between levels, as they included only faster and slower feedback for the within-experiment comparison. Instead, they focus on the specific interaction between faster and slower feedback on the one hand, and interoceptive vs exteroceptive cues on the other. This interaction essentially compares, for each dependent measure (HR, pain unpleasantness, pain intensity), the difference between faster and slower feedback in Experiment 1 with the same difference in Experiment 2 (and would produce identical p values to a between-experiment t-test). The significant interactions therefore indicate larger effects of interoceptive cues than exteroceptive ones for each of the measures. To make this clearer, we have now exchanged the analysis with between-experiment t-tests of the difference between faster and slower feedback for each measure (Results Experiment 2), producing identical results. Moreover, as suggested, we also now report linear mixed model analyses (see Supplementary Materials), which provide a comprehensive comparison across experiments.
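
      The between-experiment comparison described above amounts to computing one faster-minus-slower difference score per participant and comparing those scores across the two groups. The sketch below shows that computation with a pooled-variance (Student's) t statistic; it is an illustration under that assumption, not the analysis code of the manuscript, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def difference_scores(faster, slower):
    """Per-participant faster-minus-slower difference."""
    return [f - s for f, s in zip(faster, slower)]


def pooled_t(group_a, group_b):
    """Student's independent-samples t statistic and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / se, df
```

      Passing the difference scores from each experiment to `pooled_t` gives the statistic for the cue type by feedback speed interaction; a larger positive value indicates a larger faster-minus-slower effect in the first group.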

      Regarding the experimental design, we appreciate the reviewer’s suggestion regarding a within-subject crossover design. While such an approach indeed offers greater statistical power by reducing interindividual variability (Charness, Gneezy, & Kuhn, 2012), we intentionally chose a between-subjects design due to theoretical and methodological considerations specific to deceptive feedback paradigms. First, carryover effects are a major concern in deception studies. Participants exposed to one type of feedback could develop suspicion or adaptive strategies that would alter their responses in subsequent conditions (Martin & Sayette, 1993). Expectancy effects could thus contaminate results in a crossover design, particularly when feedback manipulation becomes apparent. In line with this idea, past studies on false cardiac feedback (e.g., Valins, 1966; Pennebaker & Lightner, 1980) often employed between-subjects or blocked designs to maintain the ecological validity of the illusion.

      Charness, G., Gneezy, U., & Kuhn, M. A. (2012). Experimental methods: Between-subject and within-subject design. Journal of economic behavior & organization, 81(1), 1-8.

      Martin, C. S., & Sayette, M. A. (1993). Experimental design in alcohol administration research: limitations and alternatives in the manipulation of dosage-set. Journal of studies on alcohol, 54(6), 750-761.

      Pennebaker, J. W., & Lightner, J. M. (1980). Competition of internal and external information in an exercise setting. Journal of personality and social psychology, 39(1), 165.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      References

      Chen ZS, Wang J. Pain, from perception to action: A computational perspective. iScience. 2022 Dec 1;26(1):105707. doi: 10.1016/j.isci.2022.105707.

      Dienes Z. Obtaining Evidence for No Effect. Collabra: Psychology 2021 Jan 4; 7 (1): 28202. doi: 10.1525/collabra.28202

      King MT. A point of minimal important difference (MID): a critique of terminology and methods. Expert Rev Pharmacoecon Outcomes Res. 2011 Apr;11(2):171-84. doi: 10.1586/erp.11.9.

      Lakens D. Sample Size Justification. Collabra: Psychology 2022 Jan 5; 8 (1): 33267. doi: 10.1525/collabra.33267

      Nosek BA, Ebersole CR, DeHaven AC, Mellor DT. The preregistration revolution. Proc Natl Acad Sci U S A. 2018 Mar 13;115(11):2600-2606. doi: 10.1073/pnas.1708274114.

      Sasaki K, Yamada Y. SPARKing: Sample-size planning after the results are known. Front Hum Neurosci. 2023 Feb 22;17:912338. doi: 10.3389/fnhum.2023.912338.

      Taffou M, Suied C, Viaud-Delmon I. Auditory roughness elicits defense reactions. Sci Rep. 2021 Jan 13;11(1):956. doi: 10.1038/s41598-020-79767-0.

      Tajadura-Jiménez A, Väljamäe A, Asutay E, Västfjäll D. Embodied auditory perception: The emotional impact of approaching and receding sound sources. Emotion. 2010, 10(2), 216-229. https://doi.org/10.1037/a0018422

      Zhou W, Ye C, Wang H, Mao Y, Zhang W, Liu A, Yang CL, Li T, Hayashi L, Zhao W, Chen L, Liu Y, Tao W, Zhang Z. Sound induces analgesia through corticothalamic circuits. Science. 2022 Jul 8;377(6602):198-204. doi: 10.1126/science.abn4663.

      Reviewer #3 (Recommendations For The Authors):

      The manuscript would benefit from some spelling- and grammar checking.

      Done

      Discussion:

      The discussion section is rather lengthy and would benefit from some re-structuring, editing, and sub-section headers.

      In response, we have restructured and edited the Discussion section to improve clarity and flow.

      I personally had a difficult time understanding how the data relates to the rubber hand illusion (l.623-630). I would recommend revising or deleting this section.

      We thank the reviewer for this valuable feedback. We have revised the paragraph and made the parallel clearer (lines 731-739).

      Other areas are a bit short and might benefit from some elaboration, such as clinical implications. Since they were mentioned in the abstract, I had expected a bit more thorough discussion here (l. 718).

      Thank you for this suggestion. We have expanded the discussion to more thoroughly address the clinical implications of our interoceptive pain illusion (See Limitations and Future Directions paragraph).

      Further, clarification is needed for the following:

      I would like some more details on participant instructions; in particular, the potential difference in instruction between Exp. 1 and 2, if any. In Exp. 1, it says: (l. 280) "Crucially, they were also informed that over the 60 seconds preceding the administration of the shock, they were exposed to acoustic feedback, which was equivalent to their ongoing heart rate". Was there a similar instruction for Exp. 2? If yes, it would suggest a more specific effect of cardiac auditory feedback; if no, the ramifications of this difference in instructions should be more thoroughly discussed.

      Thank you for this suggestion. We have clarified this point in the Procedure of Experiment 2 (lines 548-550).

    1. eLife Assessment

      Using their unique Fish-On-Chips optofluidics platform, the authors make three important findings: the presence of precise coupling between saccades and tail flips can be used to discriminate between turning or gliding behaviours; aversive and appetitive chemosensory cues differentially modulate these behaviours; transformation from cue valence to behaviour is encoded by the pallium. The evidence supporting these findings is solid. The work advances our understanding of the ancient interplay between chemosensation and motor output through the modulation of eye-body coordination.

    2. Reviewer #1 (Public review):

      Summary:

      This study was designed to manipulate and analyze the effects of chemosensory cues on visuomotor control. The authors approach this by analyzing how eye-body coordination and brain-wide activity are altered by specific chemosensation in larval zebrafish. After analyzing the dynamics of coupled saccade-tail coordination sequences - directionally linked and typically coupled to body turns - the authors investigated how sensory cues shown to be either aversive or appetitive affect eye-body coordination in freely swimming zebrafish. Aversive chemicals lead to an increase in saccade-tail sequences in both number and dynamics, seemingly facilitating behaviors like escape. Brain-wide imaging led the authors to neurons in the telencephalic pallium as a target to study eye-body coordination. Pallium neuron activity correlated with both aversive chemicals and coupled saccade-tail movements.

      Recommendations for improvement are minimal. So much of the data is ultimately tabular, and the figures are an impenetrable wall of datapoints. 1c is an excellent example: three concentrations are presented, but it's rare for the three averages to trend appropriately. The key point, which is that aversive odors are repulsive and attractive odors (sometimes) attractive, just gets lost in showing the three concentrations individually; it also makes direct comparisons impossible. Similar challenges abound in the violin plots in 4e-4h, the error bars on the "fits" in 4i-4m, and so on. We recommend selecting an illustrative subset of data to present to permit interpretation and putting the rest in a supplemental table. (Presenting) less is more (effective).

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Sy SKH. et al. on pallium encoded chemosensory impact of eye-body coordination describes how the valence of chemosensory stimuli can affect the coordination of eye saccades with tail flips. They show that aversive valence stimuli can increase both the strength and frequency of tail flips through a pallium-mediated circuit.

      Overall, the manuscript is well-written and easy to follow, although the figures are quite dense. The methodology is mostly sound, and the improvement to the fish on chips system is very interesting. The methods description is thorough and welcome, making the experiments clear. The limited number of animals and the spread between 5 and 6 dpf are a concern, as most of the statistics seem to have been done on the individual events, and not the number of biological samples.

      The initial behavioural experiments are very promising. However, the conclusions surrounding the role of the pallium are a lot more speculative and not supported by the results.

      Comments:

      (1) The fish on chips 2.0 methods show a lot of promise for future studies of chemosensory stimuli, combined with whole-brain imaging. This will provide new avenues of research for zebrafish neuroscientists.

      (2) Chemosensory cues would have a very different timing than visual cues; timing is very important for multisensory integration. How do the authors suggest those are integrated? How would they differentiate between an integration of various cues or a different arousal state, as they describe in the introduction?

      (3) Studies have looked at chemosensation in Drosophila, including multisensory integration, which should be discussed by the authors (see the work of Mark Frye, amongst others).

      (4) In the brain imaging methods, there is a mention of robustly behaving larvae. Does that mean that an exclusion criterion was used to select only 5 larvae? If so, this should be stated clearly. The authors also do not mention how they avoid the switch to a passive state that one of the coauthors has observed in a closed-loop setup. The authors should comment on this point.

      (5) Were the statistics in Figure 2 done with an n of 5, or do they assume that each tail flip and saccade is an independent event? I would imagine the latter would have inflated p-values and should be avoided.

      (7) Page 7: Why do the authors think that the cumulative effect of these minor differences could lead to very different behavioural goals? Especially when comparing to actual startle responses, which are extremely strong and stereotypical. How do their observations compare to the thermosensory navigation of larval zebrafish observed by Martin Haesemeyer, for example, or the work of the RoLi lab?

      (8) Page 8: Figure 5, I am confused by the y-axis of g, in e and f, the values are capped at 2, whereas in g they go up to 6, with apparently a number of cells whose preference is out of the y-axis limit (especially in Q2). Having the number of cells in each quadrant would also help to assess if indeed there is some preference in the pallium towards Q1.

      (9) Figure 6: How is the onset of neuronal activity determined compared to the motor stimulus? Looking at Supplementary Figure 8, it is quite unclear how the pallium is different from the OB or subpallium. The label of onset delay is also confusing in this figure.

      (10) Page 9: I do not think that the small differences observed in the pallium are as clear-cut as the authors make them out to be, or that they provide such strong evidence of their importance. As there are no interventions showing any causality in the presence of these pallium responses and the sensorimotor responses, these could represent different arousal states rather than any integration of sensory information.

    4. Reviewer #3 (Public review):

      The manuscript investigates the coupling of saccadic eye movements (S) with directed tail flips (T). The remarkable discovery is that tail flips that are preceded by a conjugate saccade (S-T) can be credibly classified as specific "volitional" turns that are distinguished from the standard tail movements that seem to be more of a spontaneous and "impulsive" nature.

      They show that 'turning intent', as indicated by a small increase in S, is elevated by aversive odors, while 'gliding intent', as indicated by a decrease in S and an increase in undulation cycles, is elevated by appetitive odors.

      This is a very important finding, which is backed up by a thorough behavioral analysis, and the identification of neural populations in the pallium and sub-pallium that clearly distinguish between these kinds of turns is very promising. Here they identify neuronal populations that are preferentially active during - and predictive of - coupled (S-T) versus isolated (T) tail flips.

      Especially the fact that S-T turns (but not T turns) can be predicted already by pre-event, ramping, pallial activity is intriguing.

      The authors then go on and demonstrate that the frequency of (S-T) turns is modulated in fish exposed to appetitive or aversive odors. Specifically, they quantify the aversiveness and appetitive-ness of several odors in a free-swimming assay. They select a couple of these odors based on their valence, and they demonstrate that these odors induce moderate modulation in the frequency of eye saccades (S) and tail flips (T) and (S-T) turns.

      The study is rigorous and thorough, and the findings are informative and novel.

      In important controls, they confirm that brain-wide imaging can distinguish between appetitive and aversive contexts, and they show that pallial activation by aversive odors is consistent with neural activity in the rhombencephalon that correlates with turning activity, whereas sub-pallial activation by appetitive odors correlates with rhombencephalic activity related to gliding.

      Overall, this manuscript is very good.

    5. Author response:

      We thank the editors and all reviewers for the detailed evaluation of the work and the overall positive remarks, as well as the constructive feedback to improve our manuscript. Based on the integrated comments of the reviewers and advice of the reviewing editor, we will suitably address all comments raised by the reviewers, and we outline our revision plan below:

      Interpretation of findings

      ● We will carefully reframe our interpretation of the data regarding the role of the pallium in the coupled saccade-tail turning events, and clearly state that we have not established a causal role, which requires additional perturbation experiments.

      ● We will also acknowledge the confounding interpretation that the pallial activities recorded may also represent or include arousal state signals.

      Streamlining the presentation

      ● In the introduction, we will better contextualize our study with additional discussions on (i) the advantageous use of zebrafish to study chemosensation, factoring in differences in the spread of chemical cues in water vs. air, and (ii) the disruption of eye-body coordination and underlying neural circuits.

      ● We will streamline the presentation of data in Fig. 1 by keeping the overall responses of the larvae to each chemical across concentrations in the main figure, while moving suitable additional details to a supplementary figure.

      ● Similarly, for each of the subsequent main figures, wherever suitable we will select an illustrative, core set of panels to retain in the main figure, and move other more detailed plots to supplementary figures.

      ● We will incorporate additional references and discussions of the past literature, including relating our findings to (i) chemosensation/multisensory integration in Drosophila, (ii) thermosensation-driven and navigational behavior in larval zebrafish, and (iii) fleeing or escape behavior in zebrafish and other species.

      ● We will clarify our animal subject inclusion criteria, that all larval subjects with sufficiently high-quality, stable imaging were included (i.e., we only excluded larvae because of insufficient quality of imaging, but not other factors).

      ● For applicable plots, we will add suitable additional details to the plots or legends (e.g., clarification of measures, specifying numbers of cells).

      Data analysis and statistics

      We will perform additional data analysis by making comparisons with statistics performed at the fish subject level, and include confidence intervals wherever applicable.
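      To illustrate what a subject-level comparison involves, the following is a minimal sketch and not the authors' analysis code. It assumes event-level measurements are stored as (fish, condition, value) records; the function names and the numbers are hypothetical. Events are first averaged within each fish, so the sample size for any later test is the number of fish rather than the number of tail flips or saccades.

```python
from collections import defaultdict
from statistics import mean


def per_fish_means(events):
    """Collapse event-level measurements to one mean per (fish, condition)."""
    grouped = defaultdict(list)
    for fish_id, condition, value in events:
        grouped[(fish_id, condition)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


def paired_differences(fish_means, cond_a, cond_b):
    """Per-fish difference (cond_b - cond_a) for fish measured in both conditions."""
    fish_ids = sorted({fish_id for fish_id, _ in fish_means})
    return [
        fish_means[(f, cond_b)] - fish_means[(f, cond_a)]
        for f in fish_ids
        if (f, cond_a) in fish_means and (f, cond_b) in fish_means
    ]


# Hypothetical records: (fish_id, condition, measured value per event)
events = [
    ("fish1", "control", 1.0), ("fish1", "control", 3.0), ("fish1", "aversive", 4.0),
    ("fish2", "control", 2.0), ("fish2", "aversive", 3.0), ("fish2", "aversive", 5.0),
]

means = per_fish_means(events)
diffs = paired_differences(means, "control", "aversive")
print(len(diffs), diffs)  # one difference per fish, regardless of event count
```

      The resulting per-fish differences can then be passed to a paired test or used to compute a confidence interval, with n equal to the number of animals.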

    1. eLife Assessment

      This important study examined age-related changes in cerebellar function by testing a large sample of younger and older adults, including 30 over 80 years old, on motor and cognitive tasks linked to the cerebellum and conducting structural imaging. Their findings show that cerebellar-dependent functions are mostly maintained or even enhanced across the lifespan, with cerebellar-mediated motor abilities remaining intact despite degeneration, in contrast to non-cerebellar measures. Overall, the authors provide solid evidence in support of preserved cerebellar function with age. These results highlight the resilience and redundancy of cerebellar circuits and offer key insights into aging and motor behavior.

    2. Reviewer #1 (Public review):

      Summary:

      Witte et al. examined whether canonical behavioral functions attributed to the cerebellum decline with age. To test this, they recruited younger, old, and older-old adults in a comprehensive battery of tasks previously identified as cerebellar-dependent in the literature. Remarkably, they found that cerebellar function is largely preserved across the lifespan, and in some cases even enhanced. Structural imaging confirmed that their older adult cohort was representative in terms of both cerebellar gray- and white-matter volume. Overall, this is an important study with strong theoretical implications and convincing evidence supporting the motor reserve hypothesis, demonstrating that cerebellar-dependent measures remain largely intact with aging.

      Strengths:

      (1) Relatively large sample size.

      (2) Most comprehensive behavioral battery to date assessing cerebellar-dependent behavior.

      (3) Structural MRI confirmation of age-related decline in cerebellar gray and white matter, ensuring representativeness of the sample.

      Weaknesses:

      (1) Although the authors note this was outside the study's scope, the absence of a voxel-based morphometry (VBM) analysis limits anatomical and functional specificity. Such an analysis would clarify which functions are cerebellar-dependent rather than solely inferring this from prior neuropsychological literature.

      (2) As acknowledged in the Discussion, task classification (cerebellar-dependent vs. general measures) remains somewhat ambiguous. Some "general" measures may still rely on cerebellar processes based on the paper's own criteria - for example, tasks in which individuals with cerebellar degeneration show impairments.

      (3) Cerebellar-dependent and general measures may inherently differ in measurement noise, potentially biasing results toward detecting effects in general measures but not in cerebellar-dependent ones.

    3. Reviewer #2 (Public review):

      Summary:

      The authors are investigating cerebellar-mediated motor behaviors in a large sample of adults, including 30 individuals over the age of 80 (a great strength of this work). They employed a large battery of motor tasks that are tied to cerebellar function, in addition to a cognitive task and motor tasks that are more general. They also evaluated cerebellar structure. Across their behavioral metrics, they found that even with cerebellar degeneration, cerebellar-mediated motor behavior remained intact relative to young adults. However, this was not the case for measures not directly tied to cerebellar function. The authors suggest that these functions are preserved and speak to the resiliency and redundancy of function in the cerebellum. They also speculate that cerebellar circuits may be especially good for preserving function in the face of structural change. The tasks are described very well, and their implementation is also well-done with consideration for rigor in the data collection and processing. The inclusion of Bayesian estimates is also particularly useful, given the theoretically important lack of age differences reported. This work is methodologically rigorous with respect to the behavior, and certainly thought-provoking.

      Strengths:

      The methodological rigor, inclusion of Bayesian statistics, and the larger sample of individuals over the age of 80 in particular are all great strengths of this work. Further, as noted in the text, the fact that all participants completed the full testing battery is of great benefit.

      Weaknesses:

      The suggestion of cerebellar reserve, given that at the group level there is a lack of difference for cerebellar-specific behavioral components, could be more robustly tested. That is, the authors suggest that this is a reserve given that the volume of cerebellar gray matter is smaller in the two older groups, though behavior is preserved. This implies volume and behavior are seemingly dissociated. However, there is seemingly a great deal of behavioral variability within each group and likewise with respect to cerebellar volume. Is poorer behavior associated with smaller volume? If so, this would still suggest that volume and behavior are linked, but rather than being age that is critical, it is volume. On the flip side, a lack of associations between behavior and volume would be quite compelling with respect to reserve. More generally, as explicated in the recommendations, there are analyses that could be conducted that, in my opinion, would more robustly support their arguments given the data that they have available. This is a well-executed and thought-provoking investigation, but there is also room for a bit more discussion.

    1. eLife Assessment

      This important study employs functional magnetic resonance spectroscopy (fMRS) to demonstrate that GABAergic inhibition in the parietal cortex actively suppresses goal-irrelevant distractors, thereby facilitating goal-directed visual tracking. The data and analyses are solid, and the methodology is validated. However, the link between the metabolic changes and the purported functional mechanisms is incomplete due to concerns with experimental design and interpretations. The study will be of interest to researchers studying goal-directed behavior and neurochemical dynamics in cognitive processing.

    2. Reviewer #1 (Public review):

      Summary

      Wang et al. address the challenge of tracking goal-relevant visual signals amidst distractions, a fundamental aspect of adaptive visual information processing. By employing functional magnetic resonance spectroscopy (fMRS) during a visual tracking task, they quantify changes in both inhibitory (GABA) and excitatory (glutamate) neurotransmitter concentrations in the parietal and visual cortices. The results reveal that increases in GABA and glutamate in the parietal cortex are closely tied to the number of targets, and individual differences in GABAergic and glutamatergic responses within the parietal cortex predict tracking performance and distractor suppression. These findings underscore a neural mechanism in which GABAergic inhibition in the parietal cortex actively suppresses goal-irrelevant distractors, thereby facilitating goal-directed visual tracking and highlighting the dynamic role of these key metabolites in cognitive control during visual processing. I found the study to be well-written and thoughtful from an experimental standpoint, although it would benefit from some targeted revisions.

      Strengths

      (1) The study employs robust and validated fMRS methodology, allowing for real-time monitoring of metabolite changes during goal-directed tasks.

      (2) Simultaneous measurement of both GABA and Glx in parietal and visual cortices yields nuanced insights into the neurochemical correlates of visual attention.

      (3) The link between neurochemical changes and behavioral performance is clearly established, providing strong evidence for GABAergic involvement in distractor suppression.

      (4) Experimental protocols align with current standards for MEGA-PRESS, bolstering the technical reliability of the findings.

      Weaknesses

      (1) Certain aspects of terminology, methodological reporting, and confound management are inconsistently described throughout the manuscript.

      (2) Important confounding factors are not systematically reported or controlled.

      (3) Opportunities for additional analysis (e.g., behavioral dynamics, use of alternate fitting methods, more comprehensive quality metrics) have not been fully explored.

      (4) Open access data and/or code for the analysis are not shared in the main manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates how the visual system is able to track target objects when these are presented in the visual field together with other irrelevant and distracting visual objects. The authors use functional Magnetic Resonance Spectroscopy to measure the two most important excitatory and inhibitory neurotransmitters, glutamate and GABA, in both the visual and parietal cortex.

      Strengths:

      (1) Well-designed functional challenge.

      (2) Number of subjects.

      (3) Good quality spectra and appropriate reporting of MRS methods and quality assurance.

      (4) Introduction and discussion are clear for non-experts in visual processing.

      Weaknesses:

      (1) Rejection of spectra based on high % CRLB may artificially remove data with the lowest metabolite concentration.

      (2) SN description as percentage does not make sense.

    4. Reviewer #3 (Public review):

      Wang et al. report multiple experiments using functional magnetic resonance spectroscopy (fMRS) in a multiple object tracking (MOT) task to investigate the effect of experimentally manipulating a) the number of targets, b) object size, and c) total number of objects in the display on GABA and glutamate (Glx) concentrations in parietal and visual cortex. Data is analyzed in two orthogonal ways throughout: via condition differences in behavioral performance (inverse efficiency), GABA, and Glx concentrations and through correlations between changes in inverse efficiency and GABA or Glx. All three experimental manipulations affected inverse efficiency, with worse performance with more targets, smaller objects, and a larger total number of objects. However, only the manipulation of the target number produced a condition difference in GABA and Glx, with higher concentrations of both in the parietal VOI and only of Glx in the visual VOI with more targets ('high load'). Correlational analyses revealed that participants with a larger change in GABA in the parietal VOI with a higher number of targets showed a smaller drop in behavioral performance with more targets. The opposite direction of correlation was observed for Glx in both the visual and parietal VOI.

      In the two control experiments, correlations were only investigated in the parietal VOI. There was a negative correlation between change in Glx and change in inverse efficiency with manipulation of object size, i.e. participants exhibiting a positive change in Glx showed no or little difference in performance, but those with an increase in Glx with smaller targets showed a more pronounced drop in performance. There was no correlation with GABA for the manipulation of object size. For the manipulation of total object number, participants exhibiting an increasing GABA concentration with more objects showed a smaller drop in performance.

      The authors' main claim is that GABAergic suppression of goal-irrelevant distractors in parietal cortex is key to goal-directed visual information processing.

      The study is, to my knowledge, the first to employ fMRS in an MOT paradigm, and I read it with great interest. I am admittedly not an expert on the fMRS technique and have therefore refrained from commenting on the technical aspects of its use. Although the application of fMRS to MOT is novel and adds new knowledge to the field, I have some critiques and believe that a much more nuanced interpretation of the findings is warranted.

      Major

      (1) Especially the control experiments lean heavily on Bettencourt and Somers (2009) and adopt and to some extent exaggerate claims from that paper uncritically. This is obvious in referring to the manipulations of object size and object number as high/low enhancement and high/low suppression, as if the association of these physical manipulations of the stimulus display with attentional mechanisms were so obvious and beyond doubt that drawing any distinction between these manipulations and their supposed effects is entirely superfluous. This seems far beyond what is warranted to me. It may seem plausible that adding distractors engages distractor suppression more, but whether this is truly the case is an empirical question, and Bettencourt and Somers (2009) have no direct measure of distractor suppression to substantiate this claim. Their study is purely behavioral, and there is no attempt to assess distractor processing separately. The case for the 'target enhancement' manipulation is even weaker: objects are of a sufficient size and at maximum contrast (white on black screen, but exact details are omitted) to be clearly visible in either condition, so why would smaller objects require more enhancement? Although the present data shows a clear effect of manipulating object size, the corresponding size of the effect in Bettencourt and Somers (2009) is rather underwhelming and does not warrant such a strong conclusion. In summary, the link between the object number and object size manipulations with suppression and enhancement is very far from the 1:1 that the authors seem to assume. Accordingly, I believe that the manipulations should be labelled as object number and object size rather than their hypothesized effects, throughout and that there should be a much more critical discussion as to whether these manipulations are indeed related to these effects as expected.

      (2) The authors' interpretation of the results seems rather uncritical. What is observed (at least in the first experiment) is a change in GABA and Glx concentrations with changes in the number of tracked targets. Is the only conceivable way in which this could happen through target enhancement and distractor suppression? The processing of targets and distractors is not measured directly, so any claims are indirect, at best. The authors cite the recent 'Ten simple rules to study distractor suppression' paper (Wöstmann et al., 2022), which presents a consensus between leading researchers in the field. Neither Bettencourt & Somers (2009) nor the design of the current study live up to the rules established in that paper, so a much more nuanced interpretation and discussion of the current findings seems warranted. It is anything but obvious to me that the only activity in the parietal cortex that could possibly be suppressed by GABA is the representation of distractors. Indeed, cueing more targets (high load) decreases the number of distractors in the first experiment, so the need for distractor suppression in the high load condition is less than in the low load condition. So, shouldn't we observe lower GABA concentrations in the 'high load' condition?

      (3) It seems that the authors included data from both correctly tracked and incorrectly tracked trials in their fMRS analysis. In MOT, attending target objects is the task per se, so task errors indicate that participants did not actually track the targets. So when comparing conditions with different error levels, it is ambiguous whether changes in brain activity reflect the experimental manipulation as such, or rather the different mix of correctly tracked and incorrectly tracked trials that result from this physical manipulation. Are the correlations perhaps driven by the inclusion of different proportions of correctly tracked trials across participants? It seems that the authors may have to separate correct and error trials in the analysis to check for the possibility that effects are due to the inclusion of data from trials in which participants may have stopped tracking at least some of the target objects. Of course, such an analysis is somewhat limited by the fact that only one target was probed, yielding a 50% guessing chance (i.e. even if the response is correct, we do not know whether the other, unprobed, objects were tracked correctly on that trial).

      (4) The key findings from the control experiments are purely correlational. The supposed cause may be what the authors claim, but there is an infinity of alternative explanations. Correlational findings cannot simply be interpreted as if they resulted from an experimental manipulation (...although this is, unfortunately, by no means rare in the cognitive neuroscience literature). The authors should make a rigorous effort to consider the most plausible alternative explanations for these correlations and argue why or why not they believe that they can be discounted.

      (5) Related to the previous point: the experimental manipulations did not produce mean differences in GABA/Glx in the control experiments. Doesn't this speak against the authors' interpretation? They briefly acknowledge this in the discussion, but I think there is a deeper problem. The absence of these effects casts doubt on what these manipulations actually do, and therefore also on the interpretation of the correlations in these experiments. For example, the authors might also have concluded from the same data that the absence of increased GABA in the 'high suppression' condition refutes the very idea that GABA concentrations are related to distractor suppression.

      (6) 'Inverse Efficiency' is a highly unusual measure of MOT performance in the literature, and its use reduces the comparability of the findings with previous work. The standard is to assess the correctness ('accuracy') of responses with no focus on speed. This makes sense as responses are given after the object motion has stopped. At the same time, reaction time can be informative too (e.g., Störmer et al., 2013). I think the authors should justify their use of inverse efficiency as the dependent variable.

      (7) The choice of variable names is problematic: it is sometimes misleading and makes understanding the findings harder (see also points 1 and 6): obvious, unambiguous, and importantly, interpretation free names for conditions such as target number (2/4), object size (small/large), and total object number (8/12) become load (high/low), target enhancement (high/low) and distractor suppression (low/high). This reduces clarity and, especially in the case of enhancement and suppression, conflates the actual manipulation with its interpretation.

    1. eLife Assessment

      This important study shows that a controlled pause in gene reading is required for early heart cells to form during development. The authors demonstrate that loss of this pause prevents the proper activation of the heart-producing program across animal and stem cell systems. The evidence is compelling, supported by careful genomic and functional analyses that clearly define the developmental block. Overall, this work will interest developmental biologists and inspire further studies on the origins of early heart defects.

    2. Reviewer #1 (Public review):

      This is a highly original and impactful study that significantly advances our understanding of transcriptional regulation, in particular RNAPII pausing, during early heart development. The Chen lab has a long history of producing influential studies in cardiac morphogenesis, and this manuscript represents another thorough and mechanistically insightful contribution. The authors have thoroughly addressed this Reviewer's concerns and incorporated all of my suggestions in the revised manuscript. In addition, their responses to the other reviewer's comments are also very clear. As it is, this work is of great interest to the readership of eLife, as well as to the general scientific community.

      The authors reveal a fundamentally new role for Rtf1, a component of the PAF1 complex, in governing promoter-proximal RNAPII pausing in the context of myocardial lineage specification. While transcriptional pausing has been implicated in stress responses and inducible gene programs, its developmental relevance has remained poorly defined. This study fills that gap with rigorous in vivo evidence demonstrating that Rtf1-dependent pausing is indispensable for activating the cardiac gene program from the lateral plate mesoderm.

      Importantly, the study also provides compelling therapeutic implications. Showing that CDK9 inhibition (using either flavopiridol or targeted knockdown) can restore promoter-proximal pausing and rescue cardiomyocyte formation in Rtf1-deficient embryos suggests that modulation of pause-release kinetics may represent a new avenue for correcting transcriptionally driven congenital heart defects. Given that many CDK inhibitors are clinically approved or in active development, this connection significantly elevates the translational impact of the findings.

      In sum, this study is rigorous, innovative, and transformative in its implications for developmental biology and cardiac medicine. I strongly support its publication.

    3. Reviewer #2 (Public review):

      Summary:

      Langenbacher et al. examine the requirement of Rtf1, a component of the PAF1C complex, which regulates transcriptional pausing in cardiac development. The authors first confirm that newly generated rtf1 mutant alleles recapitulate the defects in cardiac progenitor differentiation found using morpholinos from their previous work. The authors then show that conditional loss of Rtf1 in mouse embryos and depletion in mouse ESCs both demonstrate a failure to turn on cardiac progenitor and differentiation marker genes, supporting conservation of Rtf1 in promoting vertebrate cardiac progenitor development. The authors then employ bulk RNA-seq on flow-sorted hand2:GFP+ cells and multiomic single-cell RNA-seq on whole Rtf1-depleted zebrafish embryos at the 10-12 somite stage. These experiments corroborate that gene expression associated with cardiac progenitor differentiation is lost. Furthermore, analysis of differentiation trajectories suggests that the expression of genes associated with cardiac, blood, and endothelial progenitor differentiation is not initiated within the anterior lateral plate mesoderm. Structure-function analysis supports that the Rtf1 Plus3 domain is necessary for its function in promoting cardiac progenitor differentiation. ChIP-seq for RNA Pol II on 10-12 somite stage zebrafish embryos supports that Rtf1 is required for proper promoter pausing at the transcriptional start site. The transcriptional promoter pausing defect and cardiac differentiation can partially be rescued in zebrafish rtf1 mutants through pharmacological inhibition and depletion of Cdk9, a kinase that promotes elongation. Thus, the authors have provided a clear analysis of the requirements and basic mechanism that Rtf1 employs in regulating cardiac progenitor development.

      Strengths and weaknesses:

      Overall, the data presented are strong and the message of the study is clear. The conclusions that Rtf1 is required for transcriptional pause release and promotes vertebrate cardiac progenitor differentiation are supported. Areas of strength include the complementary approaches in zebrafish and mouse embryos, and mouse embryonic stem cells, which together support the conserved requirement for Rtf1 in promoting cardiac differentiation. The bulk and single-cell RNA-sequencing analyses provide further support for this model via examining broader gene expression. In particular, the pseudotime analysis bolsters that there is a broader effect on differentiation of anterior lateral plate mesoderm derivatives. The structure-function analysis provides a relatively clean demonstration of the requirement of the Rtf1 Plus3 domain. The pharmacological and depletion epistasis of Cdk9 combined with the RNA Pol II ChIP-seq nicely support the mechanism implicating Cdk9 in the Rtf1-dependent RNA Pol II promoter pausing. Additionally, this is a revised manuscript. The authors were overall responsive to the previous critiques. The new analysis and revisions have helped to strengthen their hypothesis and improve the clarity of their study. While the revised manuscript is significantly improved, the lack of analysis from the multiomic analysis still represents a lost opportunity to provide further insight into Rtf1 mechanisms within this study. However, the authors have nevertheless achieved their goal for this study. The data sets reported will also be useful tools for further analysis and integration by the cardiovascular development community. Thus, the study will be of interest to scientists studying cardiovascular development and those broadly interested in epigenetic regulation controlling vertebrate development.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary:

      The manuscript submitted by Langenbacher et al., entitled "Rtf1-dependent transcriptional pausing regulates cardiogenesis", describes very interesting and highly impactful observations about the function of Rtf1 in cardiac development. Over the last few years, the Chen lab has published novel insights into the genes involved in cardiac morphogenesis. Here, they used the mouse model, the zebrafish model, cellular assays, single cell transcription, chemical inhibition, and pathway analysis to provide a comprehensive view of Rtf1 in RNAPII (Pol2) transcription pausing during cardiac development. They also conducted knockdown-rescue experiments to dissect the functions of Rtf1 domains.

      Strengths:

      The most interesting discovery is the connection between Rtf1 and CDK9 in regulating Pol2 pausing as an essential step in normal heart development. The design and execution of these experiments also demonstrate a thorough approach to revealing a previously underappreciated role of Pol2 transcription pausing in cardiac development. This study also highlights the potential amelioration of related cardiac deficiencies using small molecule inhibitors against cyclin dependent kinases, many of which are already clinically approved, while many other specific inhibitors are at various preclinical stages of development for the treatment of other human diseases. Thus, this work is impactful and highly significant. 

      We thank the reviewer for appreciating our work.

      Reviewer #2 (Public Review): 

      Summary: 

      Langenbacher et al. examine the requirement of Rtf1, a component of the PAF1C, which regulates transcriptional pausing in cardiac development. The authors first confirm their previous morphant study with newly generated rtf1 mutant alleles, which recapitulate the defects in cardiac progenitor and differentiation gene expression observed previously in morphants. They then examine the conservation of Rtf1 in mouse embryos and embryonic stem cell-derived cardiomyocytes. Conditional loss of Rtf1 in mesodermal lineages and depletion in murine ESCs demonstrates a failure to turn on cardiac progenitor and differentiation marker genes, supporting conservation of Rtf1 in promoting cardiac development. The authors subsequently employ bulk RNA-seq on flow-sorted hand2:GFP+ cells and multiomic single-cell RNA-seq on whole Rtf1-depleted embryos at the 10-12 somite stage. These experiments corroborate that genes associated with cardiac and muscle development are lost. Furthermore, the differentiation trajectories suggest that the expression of genes associated with cardiac maturation is not initiated. Structure-function analysis supports that the Plus3 domain is necessary for its function in promoting cardiac progenitor formation. ChIP-seq for RNA Pol II on 10-12 somite stage embryos suggests that Rtf1 is required for proper promoter pausing. This defect can be partially rescued in rtf1 mutants through use of a pharmacological inhibitor of Cdk9, which inhibits elongation.

      Strengths: 

      Many aspects of the data are strong, which support the basic conclusions of the authors that Rtf1 is required for transcriptional pausing and has a conserved requirement in vertebrate cardiac development. Areas of strength include the genetic data supporting the conserved requirement for Rtf1 in promoting cardiac development, the complementary bulk and single-cell RNA-sequencing approaches providing some insight into the gene expression changes of the cardiac progenitors, the structure-function analysis supporting the requirement of the Plus3 domain, and the pharmacological epistasis combined with the RNA Pol II ChIP-seq, supporting the mechanism implicating Cdk9 in the Rtf1 dependent mechanism of RNA Pol II pausing. 

      We thank the reviewer for the summary and for recognizing many strengths of our work. 

      Weaknesses: 

      While most of the basic conclusions are supported by the data, there are a number of analyses that are confusing as to why they chose to perform the experiments the way they did and some places where the data presently do not support the interpretations. One of the conclusions is that the phenotype affects the maturation of the cardiomyocytes and they are arresting in an immature state. However, this seems to be mostly derived from picking a few candidates from the single cell data in Fig. 6. If that were the case, wouldn't the expectation be to observe relatively normal expression of earlier marker genes required for specification, such as Nkx2.5 and Gata5/6? The in situ expression analysis from fish and mice (Fig. 2 and Fig. 3) and bulk RNA-seq (Fig. 5) seems to suggest that there are pretty early specification and differentiation defects. While some genes associated with cardiac development are not changed, many of these are not specific to cardiomyocyte progenitors and expressed broadly throughout the ALPM. Similarly, it is not clear why a consistent set of cardiac progenitor genes (for instance mef2ca, nkx2.5, and tbx20) was not analyzed for all the experiments, in particular with the single cell analysis.

      A major conclusion of our study is that Rtf1 deficiency impairs myocardial lineage differentiation from mesoderm, as suggested by the reviewer. Thus, the main goal of this study is to understand how Rtf1 drives cardiac differentiation from the LPM, rather than the maturation of cardiomyocytes.  Multiple lines of evidence support this conclusion:

      (a) In situ hybridization showed that Rtf1 mutant embryos do not have nkx2.5+ cardiac progenitor cells and subsequently fail to produce cardiomyocytes (Figs. 2, 3).

      (b) RT-PCR analysis showed that knockdown of Rtf1 in mouse embryonic stem cells causes a dramatic reduction of cardiac gene expression and production of significantly fewer beating patches (Fig.4).

      (c) Bulk RNA sequencing revealed significant downregulation of cardiac lineage genes, including nkx2.5 (Fig. 5).

      (d) Single cell RNA sequencing clearly showed that lateral plate mesoderm (LPM) cells are significantly more abundant in Rtf1 morphants, whereas cardiac progenitors are less abundant (Fig. 6 and Fig.6 Supplement 1-5).

      When feasible, we used cardiac lineage restricted markers in our assays. Nkx2.5 and tbx5a are not highlighted in the single cell analysis because their expression in our sc-seq dataset was too low to examine in the clustering/trajectory analysis.  In this revised manuscript, we provide violin plots showing the low expression levels of these genes in single cells from Rtf1 deficient embryos (Figure 6 Supplement 5).

      The point of the multiomic analysis is confusing. RNA- and ATAC-seq were apparently done at the same time. Yet, the focus of the analysis that is presented is on a small part of the RNA-seq data. This data set could have been more thoroughly analyzed, particularly in light of how chromatin changes may be associated with the transcriptional pausing. This seems to be a lost opportunity. Additionally, how the single cell data is covered in Supplemental Fig. 2 and 3 is confusing. There is no indication of what the different clusters are in the Figure or the legend.

      In this study, we performed single cell multiome analysis and used both scRNAseq and scATACseq datasets to generate reliable clustering.  The scRNAseq analysis reveals how Rtf1 deficiency impacts cardiac differentiation from mesoderm, which inspired us to investigate the underlying mechanism and led to the discovery of defects in Rtf1-dependent transcriptional pause release.

      We agree with the reviewer that deep examination of Rtf1-dependent chromatin changes would provide additional insights into how Rtf1 influences early development and careful examination of the scATACseq dataset is certainly a good future direction.  

      In this revised manuscript, we have revised Fig.6 Supplement 1 to include the predicted cell types and provide an additional excel file showing the annotation of all 39 clusters (Supplementary Table 2). 

      While the effect of Rtf1 loss on cardiomyocyte markers is certainly dramatic, it is not clear how well the mutant fish have been analyzed and how specific the effect is to this population. It is interpreted that the effects on cardiomyocytes are not due to "transfating" of other cell fates, yet supplemental Fig. 4 shows numerous effects on potentially adjacent cell populations. Minimally, additional data needs to be provided showing the live fish at these stages and marker analysis to support these statements. In some images, it is not clear the embryos are the same stage (one can see pigmentation in the eyes of controls that is not in the mutants/morphants), causing some concern about developmental delay in the mutants.

      Single cell RNA sequencing showed an increased abundance of LPM cells and a reduced abundance of cardiac progenitors in Rtf1 morphants (Fig. 6 and Fig.6 Supplement 1-5). The reclustering of anterior lateral plate mesoderm (ALPM) cells and their derivatives further showed that cells representing undifferentiated ALPM were increased whereas cells representing all three ALPM derivatives were reduced. These findings indicate a defect in ALPM differentiation. 

      The reviewer questioned whether we examined stage-matched embryos. In our assay, Rtf1 mutant embryos were collected from crosses of Rtf1 heterozygotes. Each clutch from these crosses consists of ¼ embryos showing rtf1 mutant phenotypes and ¾ embryos showing wild type phenotypes which were used as control. Mutants and their wild type siblings were fixed or analyzed at the same time.

      The reviewer questioned the specificity of the Rtf1 deficient cardiac phenotype and pointed out that Rtf1 mutant embryos do not have pigment cells around the eye.  Rtf1 is a ubiquitously expressed transcriptional regulator.  Previous studies in zebrafish have shown that Rtf1 deficiency significantly impacts embryonic development. Rtf1 deficiency causes severe defects in cardiac lineage and neural crest cell development; consequently, Rtf1 deficient embryos do not have cardiomyocytes and pigmentation (Langenbacher et al., 2011, Akanuma et al., 2007, and Jurynec et al., 2019).  We now provide an image showing a 2-day-old Rtf1 mutant embryo and their wild type sibling to illustrate the cardiac, neural crest, and somitogenesis defects caused by loss of Rtf1 activity (Fig. 2 Supplement 1).

      With respect to the transcriptional pausing defects in the Rtf1 deficient embryos, it is not clear from the data how this effect relates to the expression of the cardiac markers. This could have been directly analyzed with some additional sequencing, such as PRO-seq, which would provide a direct analysis of transcriptional elongation.

      We showed that Rtf1 deficiency results in a nearly genome-wide decrease in promoter-proximal pausing and downregulation of cardiac markers. Attenuating transcriptional pause release could restore cardiomyocyte formation in Rtf1 deficient embryos. In this revised manuscript, we provide additional RNAseq data showing that the expression levels of critical cardiac development genes such as nkx2.5, tbx5a, tbx20, mef2ca, mef2cb, ttn.2, and ryr2b are significantly rescued.  We agree with the reviewer that further analyses using the PRO-seq approach could provide additional insights, but it is beyond the scope of this manuscript.

      Some additional minor issues include the rationale that sequence conservation suggests an important requirement of a gene (line 137), for which there are many examples where this isn't the case; referencing figure panels out of order (in Figs. 4, 7, and 8) relative to how they are described in the text; and using the morphants for some experiments, such as the rescue, that could have been done in a blinded manner with the mutants.

      We have clarified the rationale in this revised manuscript and made the effort to reference figures in order.

      The reviewer commented that rescue experiments “could have been done in a blinded manner with the mutants”. This was indeed how the flavopiridol rescue and cdk9 knockdown experiments were carried out. Embryos from crosses of Rtf1 heterozygotes were collected, fixed after treatment and subjected to in situ hybridization. Embryos were then scored for cardiac phenotype and genotyped (Fig.8 d-g). Morpholino knockdown was used in genomic experiments because our characterization of rtf1 morphants showed that they faithfully recapitulate the rtf1 mutant phenotype during the timeframe of interest (Fig. 2).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      This reviewer has a few suggestions below, aimed at improving the clarity and impact of the current study. Once these items are addressed, the manuscript should be of interest to the eLife reader.

      Item 1. Strengthening the interaction between Rtf1 and CDK9 on Pol2 pausing.

      The authors have convincingly shown that the chemical inhibition of CDK9 by flavopiridol can partially rescue the expression of cardiac genes in the zebrafish model. Although flavopiridol is FDA approved and has been a classical inhibitor for the dissection of CDK9 function, it also inhibits related CDKs: flavopiridol (alvocidib) competes with ATP to inhibit CDKs including CDK1, CDK2, CDK4, CDK6, and CDK9, with IC50 values in the 20-100 nM range. Therefore, this study could be more impactful if the authors can provide evidence on which of these CDKs may be most relevant during Rtf1-dependent cardiogenesis. Whether the observed cardiac defect indicates a preferential role for CDK9, or whether other CDKs may also be able to provide partial rescue, may be clarified using additional, more selective small molecules (e.g., BAY1251152 and LDC000067, which are commercially available).

      The reviewer raised a reasonable concern about the specificity of flavopiridol. We thank the reviewer for the insightful suggestion and share the concern about specificity. To address this question, we used an orthogonal test, morpholino inhibition, in which we directly targeted CDK9 and observed the same level of rescue, supporting a critical role of transcription pausing in cardiogenesis.

      Item 2. Differences between CRISPR lines and morphants 

      Much of the work presented used Rtf1 morphants while the authors have already generated 2 CRISPR lines. What is the difference between morphants and mutants? The authors should comment on the similarities and/or differences between using morphants or mutants in their study and whether the same Rtf1-CDK9 connection also occurs in the CRISPR lines.

      The morphology of our mutants (rtf1<sup>LA2678</sup> and rtf1<sup>LA2679</sup>) resembles the morphants and the previously reported ENU-induced rtf1<sup>KT641</sup> allele. Extensive in situ hybridization analysis showed that the morphants faithfully recapitulate the mutant phenotypes (Fig.2). We have performed rescue experiments (flavopiridol and CDK9 morpholino) using Rtf1 mutant embryos and found that inhibiting Cdk9 restores cardiomyocyte formation (Fig.8). 

      Item 3. Discuss the therapeutic relevance of study 

      The authors have already generated a mouse model of Rtf1 Mesp1-Cre knockout where cardiac muscle development is severely derailed (Fig 3B). Thus, a demonstration of a conserved role for CDK9 inhibitor in rescuing cardiogenesis using mouse cells or the mouse model will provide important information on a conserved pathway function relevant to mammalian heart development. In the Discussion, how this underlying mechanistic role may be useful in the treatment of congenital heart disease should be provided.  

      Thank you for the insight. We have incorporated your comments in the discussion. 

      Item 4. Insights into the role of CDK9-Rtf1 in response to stress versus in cardiogenesis. 

      In the Discussion, the authors commented on the role of additional stress-related stimuli such as heat shock and inflammation that have been linked to CDK9 activity. However, the current ms provides the first, endogenous role of Pol2 pausing in a critical developmental step during normal cardiogenesis. The authors should emphasize the novelty and significance of their work by providing a paragraph on the state of knowledge on the molecular mechanisms governing cardiogenesis, then placing their discovery within this framework. This minor addition will also clarify the significance of this work to the broad readership of eLife. 

      Thank you for the suggestion. We have incorporated your comments and elaborate on the novelty and significance of our work in the discussion. 

      Reviewer #2 (Recommendations For The Authors): 

      (1) It is difficult to assess what the overt defects are in the embryos at any stages. Images of live embryos were not included in the supplement. Do these have a small, malformed heart tube later or are the embryos just deteriorating due to broad defects?

      The Rtf1 deficient embryos do not produce nkx2.5+ cardiac progenitors. Consequently, we never observed a heart tube or detected cells expressing cardiomyocyte marker genes such as myl7. This finding is consistent with previous reports using rtf1 morphants and rtf1<sup>KT641</sup>, an ENU-induced point mutation allele (Langenbacher et al., 2011 and Akanuma et al., 2007). In this revised manuscript, we provide a live image of 2-day-old wild type and rtf1<sup>LA2679/LA2679</sup> embryos (Fig. 2 Supplement 1). After two days, rtf1 mutant embryos undergo broad cell death.

      (2) Fig. 2, although the in situs are convincing, there is not a quantitative assessment of expression changes for these genes. This could have been done for the bulk or single cell RNA-seq experiments, but was not and these genes were not included in the heat maps. A quantitative assessment of these genes would benefit the study.

      The top 40 most significantly differentially expressed genes are displayed in the heatmap presented in Fig.5d. The complete differential gene expression analysis results for our hand2 FACS-based comparison of rtf1 morphants and controls is presented in Supplementary Data File 1.  In this revised manuscript, we provide a new supplemental figure with violin plots showing the expression levels of genes of interest in our single cell sequencing dataset (Fig.6 Supplement 5).

      (3) It does not appear that any statistical tests were used for the comparisons in Fig. 2.

      We now provide the statistical data in the legend and Fig.2 b, d, f, h and i.

      (4) It's not clear the magnifications and orientations of the embryos in Fig. 3b are the same. 

      Embryos shown in Fig.3b are at the same magnification. However, because Rtf1 mutant embryos display severe morphological defects, the orientation of mutant embryos was adjusted to examine the cardiac tissue.

      (5) The n's for analysis of MLC2v in WT Rtf1 CKO embryos in Fig. 3b are only 1. At least a few more embryos should be analyzed to confirm that the phenotype is consistent. 

      We have revised the figure and present the number of embryos analyzed and statistics in Fig.3c. 

      (6) A number of figure panels are referred to out of order in the text. Fig. 4E-G are before Fig. 4C, D; Fig. 7C before 7B; Fig. 8D-I before 8A, B. In general, it is easier for the reader if the figure panels are presented in the order they are referred to in the text.

      Revised as suggested.

      (7) While additional genes can be included, it is not clear why the same sets of genes are not examined in the bulk or single-cell RNA-seq as with the in situs or expression was analyzed in embryos. I suggest including the genes like nkx2.5, tbx20, myl7, in all the sequencing analysis. 

      We used the same set of genes in all analyses when possible. However, the low expression of genes such as nkx2.5 and myl7 in our sc-seq dataset precludes them from the clustering/trajectory analysis. In this revised manuscript, we present violin plots showing their expression in wild type and rtf1 morphants (Fig. 6 Supplement 5).

      (8) If a multiomic approach was used, why wasn't its analysis incorporated more into the manuscript? In general, a clearer presentation and deeper analysis of the single cell data would benefit the study. The integration of the RNA and ATAC would benefit the analysis.

      As addressed in our response to the reviewer’s public review, both datasets were used in clustering. Examining changes in chromatin accessibility is certainly interesting, but beyond the scope of this study. 

      (9) Many of the markers analyzed are not cardiac specific or it is not clear they are expressed in cardiac progenitors at the stage of the analysis. Hand2 has broader expression. Additional confirmation of some of the genes through in situ would help the interpretations. 

      Markers used for the in situ hybridization analysis (myl7, mef2ca, nkx2.5, tbx5a, and tbx20) are known for their critical role in heart development. For sc-seq trajectory analyses, most displayed genes (sema3e, bmp6, ttn.2, mef2cb, tnnt2a, ryr2b, and myh7bb) were identified based on their differential expression along the LPM-cardiac progenitor pseudotime trajectory. Rather than selecting genes based on their cardiac specificity, our goal was to examine the progressive gene expression changes associated with cardiac progenitor formation and compare gene expression of wild type and rtf1 deficient embryos.

      (10) Additional labels of the cell clusters are needed for Supplemental Figs. 2 and 3. 

      The cluster IDs were presented on Supplementary Figures 2 and 3. In this revised version, we added predicted cell types to the UMAP (revised Fig.6 Supplement 1) and provided an excel file with this information (revised Supplementary Table 2). 

      (11) On lines 101-102, the interpretation from the previous data is that differentiation of the LPM requires Rtf1. However, later from the single cell data the interpretation based on the markers is that Rtf1 loss affects maturation. However, it is not clear this interpretation is correct or what changed from the single cell data. If that were the case, one would expect to see maintenance of more early marks and subsequent loss of maturation markers, which does not appear to be the case from the presented data.

      Our data suggests that cardiac progenitor formation is not accomplished by simultaneously switching on all cardiac marker genes. Our pseudotime trajectory analysis highlights tnnt2a, ryr2b, and myh7bb as genes that increase in expression in a lagged manner compared to mef2cb (Fig. 6). Thus, the abnormal activation of mef2cb without subsequent upregulation of tnnt2a, ryr2b, and myh7bb in rtf1 morphants suggests a requirement for rtf1 in the progressive gene expression changes required for proper cardiac progenitor differentiation. Our single cell experiment focuses on the process of cardiac progenitor differentiation and does not provide insights into cardiomyocyte maturation. We have edited the text to clarify these interpretations. 

      (12) The interpretation that there is not "transfating" is not supported by the shown data. Analysis of markers in other tissues, again with in situ, to show spatially would benefit the study. 

      As stated in our response to the reviewer’s public review, we observed a dramatic increase of ALPM cells, but a decrease of ALPM derivatives including the cardiac lineage. We did not observe the expansion of one ALPM-derived subpopulation at the expense of the others. These observations suggest a defect in ALPM differentiation and argue against the notion that the region of the ALPM that would normally give rise to cardiac progenitors is instead differentiating into another cell type.

      (13) The rationale that sequence conservation means a gene is important (lines 137-139) is not really true. There are many examples of highly conserved genes whose mutants don't have defects.

      We have revised the text to avoid confusion. 

      (14) The data showing that the 8 bp mutations do not affect the RNA transcript is not shown or at least indicated in Fig. 7. It would seem that this experiment could have been done in the mutant embryos, in which case the experiment would have been semi-blinded as the genotyping would occur after imaging.

      The modified Rtf1 wt RNA (Rtf1 wt* in revised Fig. 7) robustly rescued nkx2.5 expression in rtf1 deficient embryos, demonstrating that the 8 bp modifications do not negatively impact the activity of the injected RNA. As stated previously, morpholino knockdown was used in some experiments because our characterization of rtf1 morphants showed that they faithfully recapitulate the rtf1 mutant phenotype during the timeframe of interest.

      (15) Using a technique like PRO-seq at the same stage as the ChIP-seq would complement the ChIP-seq and allow a more detailed analysis of the transcriptional pausing on specific genes observed in WT and mutant embryos. 

      As stated in our response to the reviewer’s public review, we appreciate the suggestion but PRO-seq is beyond the scope of this study.

    1. eLife Assessment

      This useful study reports that the exogenous expression of the microRNA miR-195 can partially compensate in early B cell development for the loss of EBF1, one of the key transcription factors in B cells. While this finding will be of interest to those studying lymphocyte development, the evidence, particularly with regard to the molecular mechanisms that underpin the effect of miR-195, is currently incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      Here, the authors propose that miR-195, a microRNA that has been shown to bind and enhance degradation of mRNA targets in the regulation of cell processes, has a novel role in allowing the emergence of CD19+ cells among cells in which Ebf1, a critical B-cell transcription factor, has been genetically removed.

      Strengths:

      That over-expression of miR-195 can allow the emergence of CD19+ cells missing Ebf1 is somewhat novel.

      Their data does perhaps support to a degree the emergence of a transcriptional network that may bypass the absence of Ebf1, including the FOXO1 transcription factor, but this data is not strong or definitive.

      Weaknesses:

      It is unclear whether this observation is in fact physiological. When the authors analyse a knockout model of miR-195, there is not much of a change in the B-cell phenotype. Their findings may therefore be an artefact of an overexpression system.

      The authors have provided insufficient data to allow a thorough appraisal of the step-wise molecular changes that could account for their observed phenotype.

      On review of the resubmitted manuscript, while I note the authors have attempted to address several of my comments, unfortunately, their resubmission is not sufficient to address several of the comments I had previously made.

      In particular, in the resubmitted data that includes western blots for PAX5 and ERG in their EBF1-/- model, Supp Fig S3, the bands they show indicate that PAX5 and ERG expression can still be significantly detected in their EBF1-/- early B-cell model. This should not be the case, as no expression of PAX5 or ERG should be seen, as has been shown in prior literature.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigate miRNA miR-195 in the context of B-cell development. They demonstrate that ectopic expression of miR-195 in hematopoietic progenitor cells can, to a considerable extent, override the consequences of deletion of Ebf1, a central B-lineage defining transcription factor, in vitro and upon short-term transplantation into immunodeficient mice in vivo. In addition, the authors demonstrate that the reverse experiment, genetic deletion of miR-195, has virtually no effect on B-cell development. Mechanistically, the authors identify Foxo1 phosphorylation as one pathway partially contributing to the rescue effect of miR-195. An additional analysis of epigenetics by ATACseq adds potential additional factors that might also contribute to the effect of ectopic expression of miR-195.

      Strengths:

      The authors employ a robust assay system, Ebf1-KO HPC, to test for B-lineage promoting factors. The manuscript overall takes on an interesting perspective rarely employed for analysis of miRNA by overexpressing the miRNA of interest. Ideally, this approach may reveal, if not the physiological function of this miRNA, the role of distinct pathways in developmental processes.

      Weaknesses:

      At the same time, this approach constitutes a major weakness: It does not reveal information on the physiological role of miR-195. In fact, the authors themselves demonstrate in their KO approach, that miR-195 has virtually no role in B-cell development, as has been demonstrated already in 2020 by Hutter and colleagues. While the authors cite this paper, unfortunately, they do so in a different context, hence omitting that their findings are not original.

      Conceptually, the authors stress that a predominant function of miRNA (in contrast to transcription factors, as the authors suggest) lies in fine-tuning. However, there appears to be a misconception. Misregulation of fine tuning of gene expression may result in substantial biological effects, especially in developmental processes. The authors want to highlight that miR-195 is somewhat an exception in that regard, but this is clearly not the case. In addition to miR-150, as referenced by the authors, also the miR-17-92 or miR-221/222 families play a significant role in B-cell development, their absence resulting in stage-specific developmental blocks, and other miRNAs, such as miR-155, miR-142, miR-181, and miR-223 are critical regulators of leukocyte development and function. Thus, while in many instances a single miRNA moderately affects gene expression at the level of an individual target, quite frequently targets converge in common pathways, hence controlling critical biological processes.

      The paper has some methodological weaknesses as well: For the most part, it lacks thorough statistical analysis and only representative FACS plots are provided. Many bar graphs are based on heavy normalization making the T-tests employed inapplicable. No details are provided regarding statistical analysis of microarrays. Generation of the miR-195-KO mice is insufficiently described and no validation of deletion is provided. Important controls are missing as well, the most important one being a direct rescue of Ebf1-KO cells by re-expression of Ebf1. This control is critical to quantify the extent of override of Ebf1-deficiency elicited by miR-195 and should essentially be included in all experiments. A quantitative comparison is essential to support the authors' main conclusion highlighted in the title of the manuscript. As the manuscript currently stands, only negative controls are provided, which, given the profound role of Ebf1, are insufficient, because many experiments, such as assessment of V(D)J recombination, IgM surface expression, or class-switch recombination, are completely negative in controls. In addition, the authors should also perform long-term reconstitution experiments. While it is somewhat surprising that the authors obtain splenic IgM+ B cells after just 10 days, these experiments would certainly be much more informative after longer periods of time. Using "classical" mixed bone marrow chimeras using a combination of B-cell defective (such as mb1/mb1) bone marrow and reconstituted Ebf1-KO progenitors would permit much more refined analyses.

      With regard to mechanism, the authors show that the Foxo1 phosphorylation pathway accounts for the rescue of CD19 expression, but not of other factors, as mentioned in the discussion. The authors then resort to epigenetic analysis, but their rationale remains somewhat vague. It remains unclear how miR-195 is linked to epigenetic changes.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Miyatake et al. present the interesting finding that ectopic expression of miR-195 in EBF1-deficient hematopoietic progenitor cells can partially rescue their developmental block and allows B cells to progress to a B220+ CD19+ cell stage. Notably, this is accompanied by an upregulation of B cell specific genes and, correspondingly, a downregulation of T, myeloid and NK lineage-related genes, suggesting that miR-195 expression is at least in part equivalent to EBF1 activity in orchestrating the complex gene regulatory network underlying B cell development. Strengthening this point, ATAC sequencing of miR-195-expressing EBF1-deficient B220+CD19+ cells and a comparison of these data to public datasets of EBF1-deficient and -proficient cells suggest that miR-195 indirectly regulates gene expression and chromatin accessibility of some, but not all regions regulated by EBF1.

      Mechanistically, the authors identify a subset of potential target genes of miR-195 involved in MAPK and PI3K signalling. Dampening of these pathways has previously been demonstrated to activate FOXO1, a key transcription factor for early B cells downstream of EBF1. Accordingly, the authors hypothesize that miR-195 exerts its function through FOXO1. Supporting this claim, also exogenous FOXO1 expression is able to promote the development of EBF1-deficient cells to the B220+CD19+ stage and thus recapitulates the miR-195 phenotype.

      Strengths:

      The strength of the presented study is the detailed assessment of the altered chromatin accessibility in response to ectopic miR-195 expression. This provides insight into how miR-195 impacts on the gene regulatory network that governs B cell development and allows the formation of mechanistic hypotheses.

      Weaknesses:

      The key weakness of this study is that its findings are based on the artificial and ectopic expression of a miRNA out of its normal context, which in my opinion strongly limits the biological relevance of the presented work.

      While the authors performed qPCRs for miR-195 on different B cell populations and show that its relative expression peaks in early B cells, it remains unclear whether the absolute miR-195 expression is sufficiently high to have any meaningful biological activity. In fact, other miRNA expression data from immune cells (e.g. DOI 10.1182/blood-2010-10-316034 and DOI 10.1016/j.immuni.2010.05.009) suggest that miR-195 is only weakly, if at all, expressed in the hematopoietic system.<br /> Update to this part after revision: The authors now state in the discussion that their study does not aim to uncover and characterize a physiological role of miR-195 in lymphocyte development, but rather reveals "the potential of miR-195 to compensate for EBF1 deficiency". However, in my opinion, the absence of any physiological context still limits this study's relevance.

      The authors support their finding by a CRISPR-derived miR-195 knockout mouse model which displays mild but significant differences in the hematopoietic stem cell compartment and in B cell development. However, they fail to acknowledge and discuss a lymphocyte-specific miR-195 knockout mouse that does not show any B cell defects in the bone marrow or spleen and thus contradicts the authors' findings (DOI 10.1111/febs.15493). Of note, B-1 B cells in particular have been shown to be elevated upon loss of miR-15-16-1 and/or miR-15b-16-2, which contradicts the data presented here for loss of the family member miR-195.

      A second weakness is that some claims by the authors appear overstated or at least not fully backed up by the presented data. In particular, the findings that miR-195-expressing cells can undergo VDJ recombination, express the pre-BCR/BCR and can class switch need to be strengthened. It would be beneficial to include additional controls to these experiments, e.g. a RAG-deficient mouse as a reference/negative control for the ddPCR and the surface IgM staining, and cells deficient in class switching for the IgG1 flow cytometric staining.

      Moreover, the manuscript would be strengthened by a more thorough investigation of the hypothesis that miR-195 promotes the stabilization and activity of FOXO1, e.g. by comparing the authors' ATACseq data to the FOXO1 signature.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      This useful study reports that the exogenous expression of the microRNA miR-195 can partially compensate in early B cell development for the loss of EBF1, one of the key transcription factors in B cells. While this finding will be of interest to those studying lymphocyte development, the evidence, particularly with regard to the molecular mechanisms that underpin the effect of miR-195, is currently incomplete. 

      Public Reviews: 

      Reviewer #1 (Public review):

      Summary: 

      Here, the authors propose that miR-195, a microRNA that has been shown to bind mRNA targets and enhance their degradation in the regulation of cell processes, has a novel role in allowing the emergence of CD19+ cells from cells in which Ebf1, a critical B-cell transcription factor, has been genetically removed. 

      Strengths: 

      That over-expression of miR-195 can allow the emergence of CD19+ cells missing Ebf1 is somewhat novel. 

      Their data does perhaps support to a degree the emergence of a transcriptional network that may bypass the absence of Ebf1, including the FOXO1 transcription factor, but this data is not strong or definitive. 

      Weaknesses: 

      It is unclear whether this observation is in fact physiological. When the authors analyse a knockout model of miR-195, there is not much of a change in the B-cell phenotype. Their findings may therefore be an artefact of an overexpression system. 

      The authors have provided insufficient data to allow a thorough appraisal of the stepwise molecular changes that could account for their observed phenotype. 

      Reviewer #2 (Public review): 

      Summary: 

      The authors investigate miRNA miR-195 in the context of B-cell development. They demonstrate that ectopic expression of miR-195 in hematopoietic progenitor cells can, to a considerable extent, override the consequences of deletion of Ebf1, a central B-lineage-defining transcription factor, in vitro and upon short-term transplantation into immunodeficient mice in vivo. In addition, the authors demonstrate that the reverse experiment, genetic deletion of miR-195, has virtually no effect on B-cell development. Mechanistically, the authors identify Foxo1 phosphorylation as one pathway partially contributing to the rescue effect of miR-195. An additional analysis of epigenetics by ATACseq adds potential additional factors that might also contribute to the effect of ectopic expression of miR-195. 

      Strengths: 

      The authors employ a robust assay system, Ebf1-KO HPC, to test for B-lineage promoting factors. The manuscript overall takes on an interesting perspective rarely employed for the analysis of miRNA by overexpressing the miRNA of interest. Ideally, this approach may reveal, if not the physiological function of this miRNA, the role of distinct pathways in developmental processes. 

      Weaknesses: 

      At the same time, this approach constitutes a major weakness: It does not reveal information on the physiological role of miR-195. In fact, the authors themselves demonstrate in their KO approach, that miR-195 has virtually no role in B-cell development, as has been demonstrated already in 2020 by Hutter and colleagues. While the authors cite this paper, unfortunately, they do so in a different context, hence omitting that their findings are not original. 

      Conceptually, the authors stress that a predominant function of miRNA (in contrast to transcription factors, as the authors suggest) lies in fine-tuning. However, there appears to be a misconception. Misregulation of fine-tuning of gene expression may result in substantial biological effects, especially in developmental processes. The authors want to highlight that miR-195 is somewhat of an exception in that regard, but this is clearly not the case. In addition to miR-150, as referenced by the authors, the miR-17-92 and miR-221/222 families also play a significant role in B-cell development, their absence resulting in stage-specific developmental blocks, and other miRNAs, such as miR-155, miR-142, miR-181, and miR-223, are critical regulators of leukocyte development and function. Thus, while in many instances a single miRNA moderately affects gene expression at the level of an individual target, quite frequently targets converge in common pathways, hence controlling critical biological processes. 

      The paper has some methodological weaknesses as well: For the most part, it lacks thorough statistical analysis, and only representative FACS plots are provided. Many bar graphs are based on heavy normalization, making the t-tests employed inapplicable. No details are provided regarding the statistical analysis of microarrays. Generation of the miR-195-KO mice is insufficiently described and no validation of deletion is provided. Important controls are missing as well, the most important one being a direct rescue of Ebf1-KO cells by re-expression of Ebf1. This control is critical to quantify the extent of override of Ebf1-deficiency elicited by miR-195 and should essentially be included in all experiments. A quantitative comparison is essential to support the authors' main conclusion highlighted in the title of the manuscript. As the manuscript currently stands, only negative controls are provided, which, given the profound role of Ebf1, are insufficient, because many experiments, such as assessment of V(D)J recombination, IgM surface expression, or class-switch recombination, are completely negative in controls. In addition, the authors should also perform long-term reconstitution experiments. While it is somewhat surprising that the authors obtained splenic IgM+ B cells after just 10 days, these experiments would certainly be much more informative after longer periods of time. "Classical" mixed bone marrow chimeras, combining B-cell defective (such as mb1/mb1) bone marrow with reconstituted Ebf1-KO progenitors, would permit much more refined analyses. 

      With regard to mechanism, the authors show that the Foxo1 phosphorylation pathway accounts for the rescue of CD19 expression, but not for other factors, as mentioned in the discussion. The authors then resort to epigenetics analysis, but their rationale remains somewhat vague. It remains unclear how miR-195 is linked to epigenetic changes. 

      Reviewer #3 (Public review): 

      Summary: 

      In this study, Miyatake et al. present the interesting finding that ectopic expression of miR-195 in EBF1-deficient hematopoietic progenitor cells can partially rescue their developmental block and allow B cells to progress to a B220+ CD19+ cell stage. Notably, this is accompanied by an upregulation of B-cell-specific genes and, correspondingly, a downregulation of T, myeloid, and NK lineage-related genes, suggesting that miR-195 expression is at least in part equivalent to EBF1 activity in orchestrating the complex gene regulatory network underlying B cell development. Strengthening this point, ATAC sequencing of miR-195-expressing EBF1-deficient B220+CD19+ cells and a comparison of these data to public datasets of EBF1-deficient and -proficient cells suggest that miR-195 indirectly regulates gene expression and chromatin accessibility of some, but not all regions regulated by EBF1. 

      Mechanistically, the authors identify a subset of potential target genes of miR-195 involved in MAPK and PI3K signaling. Dampening of these pathways has previously been demonstrated to activate FOXO1, a key transcription factor for early B cells downstream of EBF1. Accordingly, the authors hypothesize that miR-195 exerts its function through FOXO1. Supporting this claim, also exogenous FOXO1 expression is able to promote the development of EBF1-deficient cells to the B220+CD19+ stage and thus recapitulates the miR-195 phenotype. 

      Strengths: 

      The strength of the presented study is the detailed assessment of the altered chromatin accessibility in response to ectopic miR-195 expression. This provides insight into how miR-195 impacts the gene regulatory network that governs B-cell development and allows the formation of mechanistic hypotheses. 

      Weaknesses: 

      The key weakness of this study is that its findings are based on the artificial and ectopic expression of a miRNA out of its normal context, which in my opinion strongly limits the biological relevance of the presented work. 

      While the authors performed qPCRs for miR-195 on different B cell populations and show that its relative expression peaks in early B cells, it remains unclear whether the absolute miR-195 expression is sufficiently high to have any meaningful biological activity. In fact, other miRNA expression data from immune cells (e.g. DOI 10.1182/blood-2010-10-316034 and DOI 10.1016/j.immuni.2010.05.009) suggest that miR-195 is only weakly, if at all, expressed in the hematopoietic system. 

      The authors support their finding by a CRISPR-derived miR-195 knockout mouse model which displays mild, but significant differences in the hematopoietic stem cell compartment and in B cell development. However, they fail to acknowledge and discuss a lymphocyte-specific miR-195 knockout mouse that does not show any B cell defects in the bone marrow or spleen and thus contradicts the authors' findings (DOI 10.1111/febs.15493). Of note, B-1 B cells in particular have been shown to be elevated upon loss of miR-15-16-1 and/or miR-15b-16-2, which contradicts the data presented here for loss of the family member miR-195. 

      A second weakness is that some claims by the authors appear overstated or at least not fully backed up by the presented data. In particular, the findings that miR-195-expressing cells can undergo VDJ recombination, express the pre-BCR/BCR and can class switch need to be strengthened. It would be beneficial to include additional controls to these experiments, e.g. a RAG-deficient mouse as a reference/negative control for the ddPCR and the surface IgM staining, and cells deficient in class switching for the IgG1 flow cytometric staining. 

      Moreover, the manuscript would be strengthened by a more thorough investigation of the hypothesis that miR-195 promotes the stabilization and activity of FOXO1, e.g. by comparing the authors' ATACseq data to the FOXO1 signature. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      Miyatake et al., present a manuscript that explores the role of miR-195 in B cell development. 

      Their data suggests a role for this microRNA: 

      Using an Ebf1-knockout fetal liver model of B-cell differentiation, a small population of CD19-expressing cells, with some evidence of V(D)J recombination and capable of class switching, can be derived by transduction of miR-195. 

      In the emergent CD19+ Ebf1-/- cells, the authors provide some evidence that Mapk and Akt3 may be miR-195 targets whose downregulation allows the FOXO1 transcription factor pathway to contribute to the emergence of CD19+ cells following miR-195 transduction. 

      Perhaps less compelling data are provided with regard to a role for miR-195 in normal B-cell development through analysis of a miR-195 knockout model. 

      While there are some interesting preliminary data presented for a role for miR-195 in the context of Ebf1-/- cells, there are some questions I think the authors could consider. 

      Comments: 

      (1-1) It is difficult to ascertain the potential role of miR-195 transduction in allowing the emergence of CD19+ cells from the data provided. miR-195 has been generally shown to destabilize mRNA transcripts by 3' UTR binding that targets mRNA transcripts for degradation. The effect of transduction of miR-195 would therefore be expected to be related to the degradation of factors opposing aspects of B-lineage specification or maintenance. I would be particularly interested in transcriptional or epigenetic regulators that may be modified in this way, at an mRNA as well as protein level.

      We appreciate the reviewerʼs thoughtful comments and agree that miRNAs often exert their effects through the degradation or translational repression of mRNAs encoding regulatory factors. In our study, we attempted to address this point by combining predictive analysis (using TargetScan and starBase) with luciferase reporter assays and qPCR to validate several potential targets of miR-195, including Mapk3 and Akt3. We acknowledge that this is not a comprehensive mechanistic analysis. We agree that a broader and systematic identification of direct targets of miR-195, particularly those involved in transcriptional and epigenetic regulation, would further clarify the mechanisms involved. However, due to limitations in resources and time, we are currently unable to perform global proteomic or ChIP-based validations. Nevertheless, our ATAC-seq and microarray data indicate that miR-195 overexpression leads to increased accessibility and expression of several key B-lineage transcription factors (Pax5, Runx1, Irf8), suggesting that miR-195 indirectly activates transcriptional programs relevant to B cell commitment. We have now clarified this limitation in the revised Discussion section (lines 505‒524), and we emphasize that our current findings represent the potential of miR-195 rather than its physiological role. We hope that this clarification addresses the concern.

      (1-2) While I acknowledge the authors have undertaken TargetScan and starBase analysis to try and predict miR-195 interactions, they do not provide a comprehensive list of putative targets that can be referenced against their cDNA data. Though they postulate Mapk3 and Akt3 as putative miR-195 targets and assay these in luciferase reporter systems (Figure 4), these were not clearly differentially regulated in the microarray data they provided (Figure 1E) as being downregulated on miR-195 transduction in Ebf1-/- cells.

      We thank the reviewer for pointing out the need for a more comprehensive list of predicted miR-195 targets. In response, we have now included Supplementary Tables 4 (human) and 5 (mouse) listing all putative miR-195 targets predicted by TargetScan and starBase. As noted, Mapk3 expression was indeed downregulated upon miR-195 transduction, consistent with our luciferase reporter and qPCR results. For Akt3, we observed variability in the microarray data depending on the probe used, resulting in inconsistent expression levels. We acknowledge this and have added a clarification in the revised manuscript (lines 335‒339), noting that the regulation of Akt3 by miR-195 is potentially probe-dependent and may require further validation. We hope this clarification resolves the concern.

      (1-3) The authors should provide a more comprehensive analysis of the transcriptional changes induced by miR-195, specifically at the preproB cell stage of development, in Ebf1-/- and miR-195 Ebf1-/- cells. The differentially expressed gene list should be provided as a supplemental file. The gene expression data should be provided for the different B-cell differentiation stages, e.g. Ebf1-/- preproB cells, Ebf1-/- miR-195 preproB cells, CD19+ cells, and more differentiated subsets induced by miR-195 transduction.

      We appreciate the reviewerʼs suggestion to provide a more comprehensive transcriptomic analysis at different B-cell differentiation stages. Unfortunately, due to the limited availability of cells and technical constraints, we were unable to perform RNA-seq on miR-195 transduced Ebf1<sup>−/−</sup> pre-pro-B or CD19+ cells. However, to address this point, we referenced publicly available RNA-seq data (GEO accession: GSE92434), which includes transcriptomic profiles of Ebf1<sup>−/−</sup> pro-B cells and wild-type controls. By comparing our microarray data from miR-195 transduced Ebf1<sup>−/−</sup> cells with this dataset, we found partial restoration of expression for several key B-lineage genes, such as Pax5, Runx1, and Irf8, which are normally downregulated in the absence of EBF1. This comparison supports the notion that miR-195 partially reactivates the transcriptional network essential for B cell development. We have added this interpretation to the Discussion section (lines 528‒533).

      (1-4) More replicates (at least 3 of each genotype) are required for their Western Blots for FOXO1 and pFOXO1 (Fig 4C, D). Western blots should also be provided for other known B-lineage transcriptional regulators such as PAX5 and ERG.

      We thank the reviewer for these valuable suggestions. In response, we have now quantified and added the relative band intensities of FOXO1 and pFOXO1 from three independent experiments in the revised Figure 4C, and we include statistical analysis to support the reproducibility of these results. Additionally, as requested, we performed western blotting for PAX5 and ERG using the same samples. The results showed no significant change in these protein levels between miR-195-transduced and control Ebf1<sup>−/−</sup> cells, consistent with the modest upregulation observed in our microarray data. We have included the PAX5 and ERG western blot images in Supplementary Figure S3 and have revised the text in the Results section (lines 351‒35)

      (1-5) The authors have not used ChIP-seq or other methods, such as CUT&Tag or CUT&RUN, to show FOXO1 binding to B-lineage genes in their Ebf1-/- miR-195 CD19+ cells. Demonstrating direct binding to the cis-regulatory regions of "upregulated" genes in these cells would be needed to show definitively that this TF is critical for the emergence of the CD19+ cell phenotype.

      We appreciate the reviewerʼs suggestion regarding the use of ChIP-seq or related methods to demonstrate direct FOXO1 binding to cis-regulatory regions of B-lineage genes in Ebf1<sup>−/−</sup> miR-195 CD19⁺ cells. We agree that such data would provide definitive evidence of FOXO1's direct involvement in promoting the B cell-like transcriptional program. However, due to current technical limitations, including the scarcity of CD19⁺ cells derived from Ebf1<sup>−/−</sup> miR-195 transduction and the requirement for large cell numbers in ChIP-seq or CUT&RUN protocols, we were unable to perform these assays in this study. Nevertheless, our current data provide multiple lines of indirect evidence supporting the involvement of FOXO1:

      miR-195 transduction leads to reduced phosphorylation and increased accumulation of FOXO1 protein (Fig. 4C).

      Overexpression of FOXO1 in Ebf1<sup>−/−</sup> HPCs partially recapitulates the miR-195 phenotype (Fig. 4D).

      ATAC-seq data show increased chromatin accessibility at known FOXO1 target gene loci (e.g., Pax5, Runx1, Irf8) in miR-195-induced CD19⁺ cells, many of which overlap with FOXO1 motifs (Fig. 5).

      These observations collectively suggest that FOXO1 activity is functionally important for the emergence of CD19⁺ cells, even though direct binding has not been confirmed. We have added this limitation to the Discussion (lines 531‒537), and we note that future studies using FOXO1 CUT&RUN in this system would be valuable to further define the underlying mechanism.

      (1-6) The authors have not shown significant upregulation of expression of other critical B-cell regulatory transcription factors, such as Pax5 or Erg, in their Ebf1-/- miR-195 CD19+ cells that could account for the emergence of these cells. The legend in Figure 1E suggests, for example, that the change in expression of Pax5 is modest at best, and no LogFC or western blot data are presented. 

      We thank the reviewer for raising this point. In our microarray analysis (Figure 1D, original Figure 1E), we observed that both Pax5 and Erg mRNA levels were upregulated in Ebf1<sup>−/−</sup> cells upon miR-195 transduction. Specifically, Pax5 showed an increase of approximately log₂FC 1.2, and Erg was also consistently elevated across biological replicates. These changes, although modest, were statistically significant and consistent with the upregulation of other B-lineage-associated transcription factors, such as Runx1 and Irf8. We agree that the magnitude of Pax5 upregulation is not as high as typically seen during full B cell commitment, and therefore may not have been immediately apparent in Figure 1D (original Figure 1E). To clarify this point, we have now revised the text in the Results section (lines 170‒174) to highlight the observed changes in Pax5 and Erg expression. We believe that the upregulation of these transcription factors, together with increased FOXO1 activity and changes in chromatin accessibility (Figure 5), contributes to the partial reactivation of the B cell gene regulatory network in the absence of EBF1.
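      For readers less used to log-scale values, the log₂FC quoted above can be translated into a linear fold change. The following is only a minimal sketch of that arithmetic: the function names are hypothetical and chosen for illustration, and the only value taken from the response is the approximate log₂FC of 1.2 reported for Pax5. It does not reproduce the authors' microarray pipeline.

```python
import math


def log2fc_to_fold_change(log2_fc: float) -> float:
    """Convert a log2 fold change into a linear fold change (a ratio)."""
    return 2.0 ** log2_fc


def fold_change_to_log2fc(fold_change: float) -> float:
    """Convert a linear fold change (must be positive) into a log2 fold change."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    return math.log2(fold_change)


# Approximate Pax5 value quoted in the response (log2FC of about 1.2).
pax5_fold = log2fc_to_fold_change(1.2)
print(f"Pax5 linear fold change: {pax5_fold:.2f}")  # prints 2.30
```

      On this reading, a log₂FC of about 1.2 corresponds to a little more than a twofold increase, which fits the authors' own description of the change as modest.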

      (1-7) Which V(D)J transcripts have been produced? A more detailed analysis other than ddPCR is required to help understand the emergence of this population that can presumably proceed through the preBCR and BCR checkpoints.

      We appreciate the reviewer’s interest in understanding the nature of the V(D)J rearrangements in Ebf1<sup>−/−</sup> miR-195 CD19⁺ cells. As noted, our current data rely on droplet digital PCR (ddPCR), which was used to detect rearranged VH-JH segments in the bone marrow of engrafted mice. While this approach does not allow for detailed mapping of specific V, D, or J gene usage, it provides a sensitive and quantitative measure of V(D)J recombination activity. The detection of rearranged VH-JH fragments in miR-195-transduced Ebf1<sup>−/−</sup> cells suggests that at least partial recombination of the immunoglobulin heavy chain locus is occurring, an essential checkpoint for progression past the pro-B cell stage. Given the lack of such rearrangements in control-transduced Ebf1<sup>−/−</sup> cells, we interpret this as evidence that miR-195 enables cells to initiate the recombination process. We acknowledge the limitations of ddPCR and agree that a more detailed analysis using VDJ-seq or single-cell RNA-seq would be valuable in determining the diversity and completeness of the V(D)J transcripts produced. This is a direction we intend to pursue in future work. We have added this limitation to the Discussion section (lines 538‒543).

      (1-8) The authors reveal that the Foxo1-transduced Ebf1-/- cells (Fig. 4D) do not persist in vitro and are not detected in the transplant assay (line 256), and therefore do not represent truly "rescued" B cells, suggesting that the miR-195-transduced Ebf1-/- CD19+ cells have more B-cell potential. Further characterisation of this cell population is therefore warranted. For instance, can these cells be induced to undergo myeloid differentiation in myeloid cytokine conditions? What other B-lineage transcriptional regulators are expressed in this cell population that could account for VDJ recombination and expression of a B-lineage transcriptional program (see comments 1, 3, and 5), allowing transition through the preBCR and BCR checkpoints as well as class switching?

      We thank the reviewer for this insightful comment. We agree that the persistence and lineage potential of the CD19⁺ cells emerging from Ebf1<sup>−/−</sup> miR-195-transduced progenitors deserve further characterization. Although we were unable to perform additional lineage re-direction assays, our current data provide several lines of evidence suggesting that these cells are stably committed toward the B-lineage:

      Gene expression profiling revealed upregulation of multiple B cell transcriptional regulators, including Pax5, Runx1, and Irf8.

      ATAC-seq analysis showed increased chromatin accessibility at B cell‒specific loci and enrichment of motifs bound by key B-lineage factors such as FOXO1 and E2A.

      The cells express surface IgM and undergo class switch recombination to IgG1 upon stimulation, indicating successful transition through the pre-BCR and BCR checkpoints and acquisition of mature B cell functions.

      Importantly, no upregulation of myeloid- or T-lineage genes was detected in the microarray analysis, arguing against multipotency at this stage.

      We acknowledge that functional tests for lineage plasticity under altered cytokine conditions would provide important insights and plan to address this question in future studies. This limitation has now been noted in the revised Discussion (lines 544‒550).

      (1-9) In the original Ebf1-/- miR-195 CD19+ experiments, a wild-type control should be provided for each experiment. 

      We appreciate the reviewer’s suggestion to include wild-type controls in all experiments. While we did not include wild-type samples side-by-side in every assay, we carefully designed our experiments to include biologically appropriate and informative comparisons. For example, in the bone marrow transplantation experiments (Figure 2), Ebf1<sup>−/−</sup> cells transduced with empty vector served as negative controls, clearly lacking CD19 expression, V(D)J recombination, IgM surface expression, and class switch capability. This allowed us to specifically assess the gain-of-function effects of miR-195 in the EBF1-deficient background. In several analyses, such as the ATAC-seq and microarray comparisons, we did incorporate or refer to existing wild-type datasets (e.g., GSE92434), providing context for the extent of recovery toward a WT-like profile. We agree, however, that including parallel WT controls across all experimental platforms would enhance interpretability.

      (1-10) For ATACseq data, a comparison between Ebf1-/- preproB cells and Ebf1-/- miR-195 CD19+ cells should be undertaken.

      We thank the reviewer for this important point. As suggested, we have performed a direct comparison of chromatin accessibility between Ebf1<sup>−/−</sup> pre-pro-B‒like cells (CD19<sup>−</sup>, control transduction) and Ebf1<sup>−/−</sup> miR-195‒transduced CD19⁺ cells. This comparison is shown in green in Figure 5B and represents the ATAC-seq peaks differentially accessible between these two populations.

      (1-11) I cannot agree with the authors with some of their statements such as Line 242 - "therefore miR-195 considered to have similar function with EBF1 to some extent" - how can this be the case when miR-195 is a miRNA and EBF1 is a transcription factor with pioneering transcriptional activity? Surely the effects of miR-195 must be secondary.

      We thank the reviewer for pointing out the inappropriateness of comparing miR-195 to EBF1 in terms of functional similarity. We agree that miR-195, as a microRNA, operates through post-transcriptional regulation and does not possess the pioneering transcriptional activity characteristic of EBF1. To avoid confusion or overstatement, we have removed the sentence in line 242 ("therefore miR-195 is considered to have similar function with EBF1 to some extent").

      (1-12) It is unclear whether this observation is in fact physiological. When the authors analyse a knockout model of miR-195, there is not much of a change in the B-cell phenotype. Their findings may therefore be an artefact of an overexpression system. The authors should comment on this observation in their discussion.  

      We thank the reviewer for this important observation. We agree that the mild phenotype observed in our miR-195 knockout mice suggests that miR-195 is not essential for B cell development under steady-state physiological conditions. Accordingly, we do not claim a physiological requirement for miR-195. Rather, our study demonstrates that miR-195 possesses the potential to activate a B-lineage program in the absence of EBF1 when ectopically expressed. This functional potential, rather than its endogenous necessity, is the main focus of our work. We have now clarified this distinction in the revised Discussion section (lines 551–560), and we emphasize that our findings highlight an alternative regulatory pathway that can be artificially engaged under specific conditions.

      (1-13) I recommend the authors check spelling and grammar throughout their manuscript.

      We thank the reviewer for the suggestion. In response, we have carefully reviewed the manuscript for spelling, grammar, and clarity. Minor corrections have been made throughout the text to improve readability and ensure consistency. We hope that the revised version addresses any language-related concerns. In addition, the manuscript has been reviewed by a professional editing service to improve the language quality.

      (1-14) In general, I recommend more comprehensive primary data be presented in the manuscript or supplementary files to add value to their submission.

      We thank the reviewer for this helpful suggestion. In response, we have revised the manuscript and supplementary materials to include additional primary data wherever possible. The bar graphs have been updated to include individual data points to show variability and replicate information. Uncropped western blot images are now provided in Supplementary Figure S2. We hope these additions provide greater transparency and value to the manuscript. 

      Reviewer #2 (Recommendations for the authors): 

      I have a number of suggestions with regard to inclusion of details and controls: 

      (2-1) The authors need to provide more details on in vitro differentiation, especially culture times. 

      Thank you for your comment. The culture conditions for in vitro differentiation of Ebf1<sup>−/−</sup> hematopoietic progenitor cells are described in the Methods section (lines 648–649) under “Culture of lineage-negative (Lin−) cells from the fetal liver.” As stated, cells were cultured for more than 7 days under the specified conditions.

      (2-2) In Figure 1E, the authors need to provide information on statistics (FDR or similar). 

      We thank the reviewer for the suggestion. In Figure 1D (original Figure 1E, the microarray analysis), only two biological replicates were available for each condition (n = 2 per group). Due to this limited sample size, we did not perform statistical testing, as the power would be insufficient to produce reliable p-values or adjusted FDRs. Instead, we focused on genes with consistent and biologically meaningful changes in expression, and presented representative examples based on fold change values.

      (2-3) For in vivo experiments (Figure 2) the authors should comment on their use of two different recipient mouse strains despite very low n numbers. As described above, classical mixed BM chimeras would be much more informative. In these experiments, the authors should also show the formation of other lymphoid lineages. This would answer the question of whether miR-195 redirects cells to the B lineage. Most importantly, absolute numbers need to be provided, especially in conjunction with Ebf1 rescue as described above. 

      We thank the reviewer for the thoughtful and detailed suggestions regarding our in vivo experiments. Regarding the use of different recipient mouse strains, our initial intention was to perform the transplantations in BRG mice; however, due to facility restrictions and animal husbandry considerations, we had to switch to NOG mice. All in vivo experiments were performed with n = 3 per group, in accordance with ethical guidelines and efforts to minimize animal use while still ensuring reproducibility. With respect to the suggestion of mixed bone marrow chimeras, we agree that this approach can provide valuable information on lineage competitiveness. However, in our system, miR-195 confers only a very limited B cell developmental potential in Ebf1<sup>−/−</sup> progenitors. In such a setting, the inclusion of wild-type competitor cells would overwhelmingly dominate the B cell compartment, likely masking any measurable effect of miR-195. Therefore, we opted to assess the gain-of-function potential of miR-195 in a noncompetitive setting. Regarding the assessment of other lymphoid lineages, we focused our analysis on the emergence of B-lineage cells, as the frequency of CD19⁺ cells induced by miR-195 is quite low. Given this low efficiency, we consider it unlikely that miR-195 significantly alters the development of non-B lineages, and thus did not observe substantial lineage diversion effects. Our aim was not to demonstrate lineage redirection, but rather to show that miR-195 can confer partial B cell potential in the absence of EBF1.

      Finally, we acknowledge the importance of presenting absolute cell numbers. However, so few cells were recovered from the mice that we could not obtain reliable counts; we have noted this in the manuscript (lines 498-501).

      (2-4) The statistics in Figure 3 are inadequate. No S.D. is provided for WT. How then was normalization performed? Student's T-test cannot be applied to ratios. 

      We thank the reviewer for highlighting the need for more appropriate statistical analysis. Due to considerable inter-batch variability in absolute measurements, we normalized the KO values to their paired WT counterparts from the same experimental batch. Specifically, for each replicate, we calculated the KO/WT ratio to control for batch-specific variation. We then applied a one-sample t-test (against a null hypothesis of ratio = 1) to determine statistical significance. We have now revised the figure to show individual ratio values for each replicate and updated the legend and Methods to clearly explain the statistical approach. We hope this addresses the concern and improves the clarity and rigor of the analysis.
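
      The one-sample test on paired KO/WT ratios described above can be illustrated with a minimal sketch. It assumes one ratio per experimental batch and a null hypothesis of ratio = 1; the ratio values and the function name `one_sample_t` are hypothetical and are not taken from the manuscript. Only the t statistic is computed, and significance is read from a small table of standard two-sided 5% critical values.

```python
import math
import statistics

# Two-sided 5% critical values of Student's t for small degrees of freedom.
T_CRIT_05 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}

def one_sample_t(values, null_value=1.0):
    """Return (t statistic, degrees of freedom) for H0: mean == null_value."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return (mean - null_value) / sem, n - 1

# Hypothetical KO/WT ratios, one per experimental batch (not the authors' data).
ratios = [0.62, 0.70, 0.66]
t_stat, df = one_sample_t(ratios)
significant = abs(t_stat) > T_CRIT_05[df]
print(f"t = {t_stat:.2f}, df = {df}, significant at 5%: {significant}")
```

      Because ratios are bounded below by zero, an analyst might instead test log-ratios against 0; that is a design alternative and not something the response states was done.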

      (2-5) In Figure 4A, the authors should comment on the strong repression of the Akt3UTR. 

      We appreciate the reviewer's observation regarding the strong repression observed with the Akt3 3'UTR construct. Indeed, we also noted that luciferase activity was markedly reduced in the presence of the Akt3 3'UTR, even in cells transduced with a control vector. We hypothesize that the Akt3 3'UTR contains strong post-transcriptional regulatory elements (such as AU-rich elements or binding sites for endogenous miRNAs or RNA-binding proteins), which may suppress mRNA stability or translation independent of miR-195. Alternatively, the secondary structure or length of the UTR may inherently reduce luciferase expression. We have added this limitation to the Discussion section (lines 561–569).

      (2-6) The Western blot in Figure 4C is of insufficient quality. The authors need to provide unspliced versions of the bands including markers. 

      We thank the reviewer for this important comment. In response, we have included the unprocessed, full-length Western blot images corresponding to Figure 4C as Fig. S2. This provides a transparent view of the original data and addresses the concern about image cropping.

      (2-7) The ATACseq experiment in Figure 5 is difficult to comprehend. A simpler design including Ebf1 rescue controls would clearly improve this part. 

      We thank the reviewer for this valuable feedback. We agree that the original presentation of the ATAC-seq data may have been difficult to interpret. To address this, we have included a clear interpretation of the overlapping regions in the revised figure legend (lines 1018-1022). We hope this improves the clarity of the data and facilitates understanding of the chromatin changes mediated by EBF1 and miR-195.

      (2-8) The miR-195 KO mouse lacks validation (RT-PCR, genomic PCR) as well as a clear description of the deleted region and whether miR-497 is affected. In addition, the genetic background and number of backcrosses for the removal of potential off-target effects need to be mentioned. 

      We thank the reviewer for this important comment. The miR-195 knockout mouse was generated via CRISPR/Cas9, and Sanger sequencing confirmed a 628 bp deletion on chromosome 11 (GRCm38/mm10 chr11:70,234,425–70,235,103). This deletion includes the entire miR-497 locus and part of the miR-195 precursor sequence. Although we do not show PCR gel images, the deletion was validated by sequencing, and the results are now clearly described in the revised Methods section (lines 607–619). All transgenic mice in this study were backcrossed to the C57BL/6 background for at least eight generations.

      (2-9) The manuscript requires extensive editing for language. 

      We appreciate the reviewer's comment. The manuscript has now been revised and professionally edited for language by a native English-speaking editor. We believe clarity and readability have been significantly improved.

      Reviewer #3 (Recommendations for the authors): 

      (3-1) What is the expression level of miR-195 after viral overexpression? In Figure 4B, the authors show a 2.5-fold increase, but this appears very low for the experimental system (expression through the MDH1 retroviral construct) and the observed repressive effects (e.g. Figure 4A and B). 

      We thank the reviewer for this insightful comment. We agree that the apparent ~2.5-fold increase in miR-195 levels (Figure 4B) may seem modest in the context of retroviral overexpression and the associated functional effects. However, due to the high sequence similarity within the miR-15/16/195/497 family, it is technically challenging to measure mature miR-195 levels with complete specificity. The baseline signal observed in control samples likely reflects cross-reactivity with endogenous miRNAs such as miR-497 or miR-16, which share similar seed sequences. Therefore, the reported fold-change may underestimate the true level of ectopic miR-195 expression. Despite this, we observed robust repression of validated targets (e.g., Mapk3, Akt3) in both qPCR and luciferase assays, indicating that functionally effective levels of miR-195 were achieved. We have now clarified this limitation and interpretation in the revised Results section (lines 332–335).

      (3-2) In alignment with the transparency of the data, I would encourage the authors to display the individual data points for all bar graphs. 

      We thank the reviewer for this helpful suggestion. In the revised manuscript, we have updated bar graphs to include individual data points to increase transparency and allow better visualization of data variability. In the ddPCR experiments, we provided the raw data in Fig. S1 for full transparency. For Fig. 1A, we checked the miR-195 expression profiles against the deposited data that the reviewer suggested, but miR-195 expression was much lower than we expected. We also performed scRNA-seq on hematopoietic lineage cells from 8-week-old C57BL/6 mice, but we could not reproduce the miR-195 expression profiles. Therefore, we concluded that the original result was an artifact caused by the miR-195 probe used for qPCR, and deleted Fig. 1A.

      (3-3) The references appear to be compromised. For example, the authors state that "The Ebf1−/+ mouse was originally generated by R. Grosschedl (39)" (line 297), but this is not the respective paper. Likewise, the knockout mouse was generated "based on the CRISPR/Cas9 system established by C. Gurumurthy (40)" (line 299), but he/she is not involved in the referenced study. 

      We thank the reviewer for pointing out the discrepancies in the reference citations. When we revised the Methods section to integrate it with the main text, the reference numbering became misaligned. We have corrected the references in the revised manuscript, and we thank the reviewer for bringing this to our attention.

      (3-4) Given that the miRNA Taqman assays the authors used here have difficulties to discriminate closely related miRNAs such as e.g. miR-16 (highly expressed in the hematopoietic system) and miR-195, I would suggest that the authors test their qPCR in an appropriate setup, e.g. in their knockout mouse model. In this context, did the authors use another small RNA as a reference for the qPCR analysis? In the methods, only GAPDH is mentioned, but in my opinion, another RNA that uses the same stemloop-based cDNA synthesis protocol would be better suited.

      We thank the reviewer for this valuable and technically insightful comment.

      As correctly pointed out, TaqMan-based qPCR assays for miRNAs such as miR-195 can show cross-reactivity with closely related family members, particularly miR-16, which is abundantly expressed in hematopoietic cells. Indeed, due to this limitation, we do not treat the qPCR results shown in the original Figures 1A and 4B as definitive quantification of miR-195 expression. Rather, these data are used to provide a rough estimate of overexpression efficiency, while our core functional analyses rely on phenotypic and molecular outcomes such as target gene repression and lineage emergence. With this in mind, although we acknowledge that a small RNA reference based on the same stem-loop cDNA synthesis would offer a more compatible normalization in principle, the inherent variability and lack of absolute specificity in such assays also limit their interpretive value. Therefore, we used GAPDH as a normalization control for consistency with other qPCR analyses in the manuscript. We have now clarified this rationale and limitation in the revised Methods section (lines 712–716), and we thank the reviewer again for highlighting this important technical consideration.

      (3-5) The Western blot data used to support the hypothesis that FOXO1 phosphorylation is reduced upon overexpression of miR-195 are not convincing. The authors should not crop everything but the band. 

      We thank the reviewer for the helpful comment. In response, we have now provided the full-length, uncropped Western blot images corresponding to Figure 4C, including both total FOXO1 and phospho-FOXO1 blots. These images are included in Fig. S2.

    1. eLife Assessment

      In reporting on a valuable "learning proteome" for a C. elegans gustatory associative learning paradigm, this work identifies a new set of genes to be tested for roles in learning and memory, describes molecular pathways involving these genes and relevant for learning and memory in C. elegans, and delivers a new set of tools for prodding worm behavior. The methods and results convincingly support the findings, which will be of interest to neuroscientists and developmental biologists seeking to understand the self-assembly and operation of neural circuits for learning and memory.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      Rahmani et al. utilize the TurboID method to characterize global proteome changes in the worm's nervous system induced by a salt-based associative learning paradigm. Altogether, they uncover 706 proteins tagged by the TurboID method in worms that underwent the memory-inducing protocol. Next, the authors conduct a gene enrichment analysis that implicates specific molecular pathways in salt-associative learning, such as MAP kinase and cAMP-mediated pathways, as well as specific neuronal classes including pharyngeal neurons, and specific sensory neurons, interneurons, and motor neurons. The authors then screen a representative group of hits from the proteome analysis. They find that mutants of candidate genes from the MAP kinase pathway, namely dlk-1 and uev-3, do not affect performance in the learning paradigm. Instead, multiple acetylcholine signaling mutants, as well as a protein-kinase-A mutant, significantly affected performance in the associative memory assay (e.g., acc-1, acc-3, lgc-46, and kin-2). Finally, the authors demonstrate that protein-kinase-A mutants, as well as acetylcholine signaling mutants, do not exhibit a phenotype in a related but distinct conditioning paradigm (aversive salt conditioning), suggesting their effect is specific to appetitive salt conditioning.

      Overall, the authors addressed the concerns raised in the previous review round, including the statistics of the chemotaxis experiments and the systems-level analysis of the neuron class expression patterns of their hits. I also appreciate the further attempt to equalize the sample size of the chemotaxis experiments and the transparent reporting of the sample size and statistics in the figure captions and Table S9. The new results from the pan-neuronal overexpression of the kin-2 gain-of-function allele also contribute to the manuscript. Together, these make the paper more compelling.

    3. Reviewer #2 (Public review):

      Summary:

      In this study by Rahmani and colleagues, the authors sought to define the "learning proteome" for a gustatory associative learning paradigm in C. elegans. Using a cytoplasmic TurboID expressed under the control of a pan-neuronal promoter, the authors labeled proteins during the training portion of the paradigm, followed by proteomics analysis. This approach revealed hundreds of proteins potentially involved in learning, which the authors describe using gene ontology and pathway analysis. The authors performed functional characterization of over two dozen of these genes for their requirement in learning using the same paradigm. They also compared the requirement for these genes across various learning paradigms and found that most hits they characterized appear to be specifically required for the training paradigm used for generating the "learning proteome".

      Strengths:

      - The authors have thoughtfully and transparently designed and reported the results of their study. Controls are carefully thought-out, and hits are ranked as strong and weak. By combining their proteomics with behavioral analysis, the authors also highlight the biological significance of their proteomics findings, and support that even weak hits are meaningful.

      - The authors display a high degree of statistical rigor, incorporating normality tests into their behavioral data which is beyond the field standard.

      - The authors include pathway analysis that generates interesting hypotheses about processes involved in learning and memory.

      - The authors generally provide thoughtful interpretations for all of their results, both positive and negative, as well as any unexpected outcomes.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors used a learning paradigm in C. elegans: when worms are fed on a saltless plate, their chemotaxis to salt is greatly reduced. To identify learning-related proteins, the authors employed nervous system-specific proteome analysis to compare whole proteins in neurons between high-salt-fed animals and saltless-fed animals. The authors identified "learning-specific proteins", which are observed only after saltless feeding. They categorized these proteins by GO analyses, pathway analyses and expression site analyses, and went on to test mutants in selected genes identified by the proteome analysis. They find several mutants that are defective or hyper-proficient for learning, including acc-1/3 and lgc-46 acetylcholine receptors, F46H5.3 putative arginine kinase, and kin-2, a cAMP pathway gene. These mutants were not previously reported to have abnormalities in the learning paradigm.

      Concerns:

      Upon revision, authors addressed all concerns of this reviewer, and the results are now presented in a way that facilitates objective evaluation. Authors' conclusions are supported by the results presented, and the strength of the proteomics approach is persuasively demonstrated.

      Significance:

      (1) Total neural proteome analysis has not been conducted before for learning-induced changes, though transcriptome analysis has been performed for odor learning (Lakhina et al., http://dx.doi.org/10.1016/j.neuron.2014.12.029). This warrants the novelty of this manuscript, because for some genes, protein levels may change even though mRNA levels remain the same. Although in a few reports TurboID has been used in C. elegans, this is the first report of a systematic analysis of tissue-specific differential proteomics.

      (2) The authors found five mutants with abnormal salt learning. These genes have not previously been described as having this abnormality, providing novel knowledge to readers, especially those who work on C. elegans behavioural plasticity. In particular, the involvement of acetylcholine neurotransmission has not been addressed before. Although transgenic rescue experiments have not been performed except for kin-2, and the site of action (neurons involved) has not been tested in this manuscript, the work opens an avenue to further determine how acetylcholine receptors, the cAMP pathway, etc. influence the learning process.

      [Editors' note: this version has been assessed without input from the reviewers.]

    5. Author response:

      The following is the authors’ response to the original reviews

      Comment from the editors at eLife:

      You could consider further strengthening the manuscript with the incorporation of new relevant public datasets for network modeling, but that is entirely your choice.

      We thank the editors and reviewers for their thoughtful and positive feedback on our article. We are particularly appreciative of the eLife assessment describing our work as valuable with a convincing methodology.

      As suggested, we have expanded our neuron class analysis by incorporating transcriptomic data from young adult animals (Kaletsky et al., 2016 Nature; Ghaddar et al., 2023 Science Advances; St Ange et al., 2024 Cell Genomics) to complement our existing analysis of larval stage 4 (L4) animals.

      In addition, we have updated Table S1 to include the outcross status of all strains used in this study, providing clearer information on the genotypes tested. We have also corrected the typographical errors noted by the reviewers. Please note that page and line numbers below refer to the MS Word Document with tracked changes set to ‘simple markup’.

      We greatly appreciate the reviewers’ input and hope these revisions further enhance the value and clarity of our study.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Rahmani et al. utilize the TurboID method to characterize global proteome changes in the worm's nervous system induced by a salt-based associative learning paradigm. Altogether, they uncover 706 proteins tagged by the TurboID method in worms that underwent the memory-inducing protocol. Next, the authors conduct a gene enrichment analysis that implicates specific molecular pathways in salt-associative learning, such as MAP kinase and cAMP-mediated pathways, as well as specific neuronal classes including pharyngeal neurons, and specific sensory neurons, interneurons, and motor neurons. The authors then screen a representative group of hits from the proteome analysis. They find that mutants of candidate genes from the MAP kinase pathway, namely dlk-1 and uev-3, do not affect performance in the learning paradigm. Instead, multiple acetylcholine signaling mutants, as well as a protein-kinase-A mutant, significantly affected performance in the associative memory assay (e.g., acc-1, acc-3, lgc-46, and kin-2). Finally, the authors demonstrate that protein-kinase-A mutants, as well as acetylcholine signaling mutants, do not exhibit a phenotype in a related but distinct conditioning paradigm (aversive salt conditioning), suggesting their effect is specific to appetitive salt conditioning.

      Overall, the authors addressed the concerns raised in the previous review round, including the statistics of the chemotaxis experiments and the systems-level analysis of the neuron class expression patterns of their hits. I also appreciate the further attempt to equalize the sample size of the chemotaxis experiments and the transparent reporting of the sample size and statistics in the figure captions and Table S9. The new results from the pan-neuronal overexpression of the kin-2 gain-of-function allele also contribute to the manuscript. Together, these make the paper more compelling. The additional tested hits provide a comprehensive analysis of the main molecular pathways that could have affected learning. However, the revised manuscript includes more information and analysis, raising additional concerns.

      Major comments:

      As reviewer 4 noted, and as also shown to be relevant for C30G12.6 presented in Figure 6, the backcrossing of the mutants is important, as background mutations may lead to the observed effects. Could the authors add to Table 1, sheet 1, the outcrossing status of the tested mutants?

      We appreciate this important point. A column has now been added to Table S1 to indicate the outcross status of all strains used in this study. Additionally, we have updated the table legend on page 77 to clarify how to interpret the information provided in this column.

      It is important to validate that the results of the positive hits (where learning was affected), such as acc-1, acc-3, and lgc-46, do not stem from background mutations.

      While we agree that confirming the absence of background mutations is important, we have taken alternative steps to address this concern:

      - The outcross status of each strain is now clearly indicated in Table S1.

      - Observed phenotypes were consistent across multiple biological replicates over extended periods (months, sometimes years), reducing the likelihood that results stem from background mutations.

      We believe these measures provide confidence in the validity of our findings.

      The fold change in the number of hits for different neurons in the CENGEN-based rank analysis requires a statistical test (discussed on pages 17-19 and summarized in Table S7). Similar to the other gene enrichment analyses presented in the manuscript, the new rank analysis also requires a statistical test. Since the authors extensively elaborate on the results from this analysis, I think a statistical analysis is especially important for its interpretation. For example, if considering the IL1 neurons, which ranked highest, and assuming random groups of genes-each having the same size as those of the ranked neurons (209 genes in total for IL1 in Table S7)-how common would it be to get the calculated fold change of 1.38 or higher? Such bootstrapping analysis is common for enrichment analysis. Perhaps the authors could consult with an institutional expert (Dr. Pawel Skuza, Flinders University) for the statistical aspects of this analysis.

      We appreciate the suggestion and agree that statistical testing can be valuable for enrichment analyses. However, implementing additional tests such as bootstrapping is beyond the scope of this study. Our aim was to provide a descriptive overview rather than inferential statistics. To ensure transparency and interpretability, we have:

      - Clearly reported fold changes and rankings in Table S7.

      - Discussed the limitations of this approach in the manuscript text (page 18, lines 17–20).

      - Clearly outlined the methods used to perform this analysis (pages 53–54).

      We believe this descriptive analysis provides sufficient context for interpreting these results.
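
      The resampling test that the reviewer describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis used in the manuscript: the fold change is assumed to be the observed overlap divided by the overlap expected by chance, the gene universe and gene names are toy data, and the helper names `fold_change` and `empirical_p` are hypothetical. Random gene sets of the same size as the hit list are drawn from the universe, and the empirical p-value is the fraction of draws whose fold change is at least as large as the observed one.

```python
import random

def fold_change(gene_set, neuron_genes, universe_size):
    """Observed overlap divided by the overlap expected by chance."""
    overlap = len(gene_set & neuron_genes)
    expected = len(gene_set) * len(neuron_genes) / universe_size
    return overlap / expected

def empirical_p(hits, neuron_genes, universe, n_draws=2000, seed=0):
    """Return (observed fold change, one-sided empirical p-value)."""
    rng = random.Random(seed)
    observed = fold_change(hits, neuron_genes, len(universe))
    pool = sorted(universe)  # random.sample needs a sequence, not a set
    as_extreme = 0
    for _ in range(n_draws):
        draw = set(rng.sample(pool, len(hits)))  # random set, same size as hits
        if fold_change(draw, neuron_genes, len(universe)) >= observed:
            as_extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (as_extreme + 1) / (n_draws + 1)

# Toy data (hypothetical): 1000 genes, 200 expressed in one neuron class,
# and 100 hits of which 60 fall in that class.
universe = {f"gene{i}" for i in range(1000)}
neuron_genes = {f"gene{i}" for i in range(200)}
hits = {f"gene{i}" for i in range(60)} | {f"gene{i}" for i in range(500, 540)}

observed, p_value = empirical_p(hits, neuron_genes, universe)
print(f"fold change = {observed:.2f}, empirical p = {p_value:.4f}")
```

      In a real analysis, the choice of background universe (for example, all genes detected in the reference atlas) would strongly affect the result and would need to be justified.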

      The learning phenotypes from Figure S8, concerning acc-1, acc-3, and lgc-46 mutants, are summarized in a scheme in Figure 4; however, the chemotaxis results are found in the supplemental Figure S8. Perhaps I missed the reasoning, but for transparency, I think the relevant Figure S8 results should be shown together with their summary scheme in Figure 4.

      Thank you for this suggestion to improve clarity. We have now moved the panels corresponding to cholinergic signalling components from Figure S8 into Figure 4 on page 21, so that the summary scheme and underlying data are presented together. The figure legends and main text have been updated accordingly to reflect the correct figure numbers.

      Reviewer #2 (Public review):

      Summary:

      In this study by Rahmani and colleagues, the authors sought to define the "learning proteome" for a gustatory associative learning paradigm in C. elegans. Using a cytoplasmic TurboID expressed under the control of a pan-neuronal promoter, the authors labeled proteins during the training portion of the paradigm, followed by proteomics analysis. This approach revealed hundreds of proteins potentially involved in learning, which the authors describe using gene ontology and pathway analysis. The authors performed functional characterization of over two dozen of these genes for their requirement in learning using the same paradigm. They also compared the requirement for these genes across various learning paradigms and found that most hits they characterized appear to be specifically required for the training paradigm used for generating the "learning proteome".

      Strengths:

      The authors have thoughtfully and transparently designed and reported the results of their study. Controls are carefully thought-out, and hits are ranked as strong and weak. By combining their proteomics with behavioral analysis, the authors also highlight the biological significance of their proteomics findings, and support that even weak hits are meaningful.

      The authors display a high degree of statistical rigor, incorporating normality tests into their behavioral data which is beyond the field standard.

      The authors include pathway analysis that generates interesting hypotheses about processes involved in learning and memory.

      The authors generally provide thoughtful interpretations for all of their results, both positive and negative, as well as any unexpected outcomes.

      Weaknesses:

      - The authors use the CeNGEN single-cell transcriptomic atlas to predict where the proteins in the "learning proteome" are likely to be expressed, and use these data to identify neurons that are likely significant to learning and to build a hypothetical circuit. This is an excellent idea; however, the CeNGEN dataset only contains transcriptomic data from juvenile L4 animals, while the authors performed their proteome experiments in Day 1 adult animals. It is well documented that the C. elegans nervous system transcriptome is significantly different between these two stages (Kaletsky et al., 2016; St. Ange et al., 2024), so the authors might be missing important expression data, resulting in inaccurate or incomplete networks. The adult neuronal single-cell atlas data (https://cestaan.princeton.edu/) would be better suited to incorporate into neuronal expression analysis.

      Thank you for highlighting this important point. We have now incorporated transcriptomic data from young adult animals to complement the L4-based CeNGEN dataset. Specifically, we integrated data from CeSTAAN (https://cestaan.princeton.edu/, including St. Ange et al., 2024) and WormSeq (https://wormseq.org/, including Ghaddar et al., 2023), as outlined below. Importantly, CeSTAAN and WormSeq provide data for 79 and 104 neuron classes, respectively (compared to 128 from CeNGEN); for this reason, the main analysis focuses on CeNGEN due to its broader coverage, with additional datasets noted in brackets for completeness. This is stated on page 18, lines 15–17 to ensure transparency regarding our rationale.

      The main text has been updated to describe these datasets and their integration into our analysis (pages 18–20), and further details on how these resources were used have been added to the Experimental Procedures (pages 53–54).

      We also incorporated data from Kaletsky et al. (2016) and St. Ange et al. (2024) into our neuron identity checks for all assigned and unassigned hits (page 16, lines 8–19). This analysis shows that the nervous system is highly represented in our proteome data: 75–87% of assigned hits and 75–83% of all hits correspond to neuron-enriched genes identified by St. Ange et al. and Kaletsky et al.

      In addition, we used several transcriptomic databases to confirm that learning regulators identified in this study through TurboID and validation experiments are expressed in the same neuron classes as suggested by CeNGEN (page 36).

      - The authors offer many interpretations for why mutants in "learning proteome" hits have no detectable phenotype, which is commendable. They are, however, overlooking another important interpretation: it is possible that these changes to the proteome are important for memory, which is dependent upon translation and protein level changes, and is molecularly distinct from learning. It is well established in the field that mutating or knocking down memory regulators in other paradigms will often have no detectable effect on learning. Incorporating this interpretation into the discussion and highlighting it as an area for future exploration would strengthen the manuscript.

      Thank you for this suggestion. We have incorporated this interpretation into the Results section (page 31, lines 17–23), specifying the potential role of these proteomic changes in memory encoding and retention, which are molecularly distinct from learning.

      - A minor weakness: in the discussion, the authors state that Lakhina et al. (2015) used RNA-seq to assess memory transcriptome changes. This study used microarray analysis.

      This has been corrected on page 38, line 5.

      Significance:

      The approach used in this study is interesting and has the potential to further our knowledge about the molecular mechanisms of associative behaviors. There have been multiple transcriptomic studies in the worm looking at gene expression changes in the context of behavioral training. This study complements and extends those studies by examining how the proteome changes in a different training paradigm. The approach could be employed for multiple different training paradigms, presenting a new technical advance for the field. This paper would be of interest to the broader field of behavioral and molecular neuroscience. Though it uses an invertebrate system, many findings in the worm regarding learning and memory translate to higher organisms, making this paper of interest and significant to the broader field of behavioral neuroscience.

      Reviewer #4 (Public review):

      Summary:

      In this manuscript, the authors used a learning paradigm in C. elegans: when worms are fed on a saltless plate, their chemotaxis to salt is greatly reduced. To identify learning-related proteins, the authors employed nervous system-specific proteome analysis to compare whole proteins in neurons between high-salt-fed animals and saltless-fed animals. The authors identified "learning-specific proteins", which are observed only after saltless feeding. They categorized these proteins by GO analyses, pathway analyses and expression site analyses, and further tested mutants in selected genes identified by the proteome analysis. They find several mutants that are defective or hyper-proficient for learning, including acc-1/3 and lgc-46 acetylcholine receptors, F46H5.3 putative arginine kinase, and kin-2, a cAMP pathway gene. These mutants were not previously reported to have abnormalities in the learning paradigm.

      Concerns:

      Upon revision, authors addressed all concerns of this reviewer, and the results are now presented in a way that facilitates objective evaluation. Authors' conclusions are supported by the results presented, and the strength of the proteomics approach is persuasively demonstrated.

      Thank you, we appreciate this positive feedback.

      Significance:

      (1) Total neural proteome analysis has not been conducted before for learning-induced changes, though transcriptome analysis has been performed for odor learning (Lakhina et al., http://dx.doi.org/10.1016/j.neuron.2014.12.029). This warrants the novelty of this manuscript, because for some genes, protein levels may change even though mRNA levels remain the same. Although in a few reports TurboID has been used in C. elegans, this is the first report of a systematic analysis of tissue-specific differential proteomics.

      (2) The authors found five mutants that have abnormalities in salt learning. These genes have not previously been described as having this abnormality, providing novel knowledge to readers, especially those who work on C. elegans behavioural plasticity. In particular, the involvement of acetylcholine neurotransmission has not been addressed before. Although transgenic rescue experiments have not been performed except for kin-2, and the site of action (neurons involved) has not been tested in this manuscript, the work will open an avenue to further determine the way in which acetylcholine receptors, the cAMP pathway, etc. influence the learning process.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors stated in their response to reviewers that "referring to a phenotype as both a trend and non-significant may confuse readers, which was originally stated in the manuscript in two locations," and that such sentences were removed. Unfortunately, in the new text (page 28, lines 18-19), the authors write: "uev-3 mutants showed a lower average CI after training compared with wild-type, but this did not reach statistical significance." As stated before, I find such sentences confusing and not interpretable. If the changes are not significant, then the lower average CI is not informative.

      Thank you for pointing this out. This has been corrected to improve clarity – we say instead that “trained phenotypes between wild-type and uev-3 mutants were not statistically significant” (page 29, lines 21–22).

      In response to reviewers' comments, the authors added more information about the biotinylation efficiency of the experiment, which is also described in the text:

      Page 8, line 27: "we found that biotin exposure increased the signal 1.3-fold for non-Tg and 1.7-fold for TurboID C. elegans."

      Page 10, line 4: "Quantification of the signal within entire lanes showed a 1.1-fold increase in the 'TurboID, control' lane compared with the 'non-Tg, control' lane, and a 1.9-fold increase in the 'TurboID, trained' lane compared with the 'non-Tg, trained' lane."

      Is it common in this field not to show the actual raw quantified numbers? I was expecting either a bar graph or instead that the measured values would appear in the text alongside the fold-change information.

      Table S2 (and its table legend on page 77) have been edited to include raw area values.

      Figure 5: Typo? - "pan neuronal expression of ..." The allele number is written as 139, but I believe it should be 179, as in the rest of the paper.

      The typo has been corrected on page 25.

      The results describing the absence of a learning phenotype in backcrossed C30G12.6 are presented in the main figure. If the authors believe this is an important result, I understand keeping it in the main figure; however, I find this uncommon.

      Thank you for your comment. We consider the absence of a learning phenotype in backcrossed C30G12.6 to be an important control for interpreting the original findings, which is why we have retained it in the main figure.

      Reviewer #4 (Recommendations for the authors):

      I noted a few typos.

      (1) In Fig 5B, the transgene is depicted kin-2(ce139) but it is probably kin-2(ce179).

      The typo has been corrected on page 25.

      (2) In text, R97C and ce179 are used interchangeably, but in fact there is no description that they are identical.

      We now state the following in the manuscript: “We tested worms with the ce179 mutant allele in kin-2, in which a conserved residue in the inhibitory domain (which normally functions to keep PKA turned off in the absence of cAMP) is mutated to cause an R92C amino acid change – this results in increased PKA activity (Schade et al., 2005).” (page 25, lines 1–3).

      (3) p31 line 7, Figure S7 -> Fig S9 C-E

      We apologise for this typographical error. This figure number is meant to correspond to salt associative learning assay data (Fig. S8), not salt aversive learning (Fig. S9). Since the data from Fig. S8 was moved to Fig. 4, the figure citation has been changed from Fig. S7 (which was incorrect) to Fig. 4 (page 32, line 17).

      (4) p45 line 11, Fig S9 -> Fig S6

      The typo has been corrected (page 47, line 12).

    1. eLife Assessment

      This valuable work demonstrates that M. tuberculosis protein PPE2 perturbs adipose tissue biology by modulating adipogenesis, lipolysis, and inflammatory remodeling, thereby contributing to fat loss and insulin resistance during TB. Using M. smegmatis overexpression strains, PPE2-deficient Mtb mutants, and mouse models, the study links PPE2 to downregulation of PPAR-γ, C/EBP-α, adiponectin, and broader transcriptional changes in host fatty acid metabolism. These findings convincingly highlight, for the first time, a direct role for a bacterial virulence factor in TB-associated wasting. However, despite strong associative evidence, the mechanistic basis of PPE2-mediated regulation remains unresolved.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Bisht et al. investigate the role of PPE2, a Mycobacterium tuberculosis (Mtb) secreted virulence factor, in adipose tissue physiology during tuberculosis (TB) infection. Previous work by this group established the significance of PPE proteins in Mtb virulence and their role in modulating the innate immune response. Here, the authors present compelling evidence that PPE2 regulates host cell adipogenesis and lipolysis, thereby establishing a link to the development of insulin resistance during TB infection. These fundamental findings demonstrate, for the first time, that a bacterial virulence factor is directly involved in the profound body fat loss, or "wasting," which is a long-established clinical symptom of active TB.

      Key Strengths:

      The confidence in the major findings of this study is significantly strengthened by the authors' comprehensive approach. They judiciously employ multiple experimental systems, including:

      (1) Purified PPE2 protein.

      (2) A non-pathogenic Mycobacterium strain engineered to express PPE2.

      (3) A pathogenic clinical Mtb strain (CDC1551) utilizing a targeted PPE2 deletion mutant.

      (4) While the presence of Mtb in adipose tissues in human and animal models is well-documented, this study is groundbreaking in demonstrating that an Mtb virulence-associated factor actively modulates host fatty acid metabolism within the adipose tissue.

      Key Weakness:

      Although the manuscript provides solid evidence associating the presence of PPE2 with transcriptional changes in host fatty acid machinery within the adipose tissue, the underlying mechanistic details remain elusive. A focused, deep mechanistic follow-up study will be essential to fully appreciate the complex biological implications of the findings reported here.

    3. Reviewer #2 (Public review):

      Summary:

      In the manuscript entitled "The PPE2 protein of Mycobacterium tuberculosis is responsible for the development of hyperglycemia and insulin resistance during tuberculosis" the authors identify PPE2, a secretory protein of Mycobacterium tuberculosis, as a modulator of adipose function. They show that PPE2 treatment in mice causes fat loss, immune cell infiltration into adipose, reduced gene expression of PPAR-γ, C/EBP-α, and adiponectin, and glucose intolerance. Overall, the authors link PPE2 with adipose tissue perturbation and insulin resistance following infection with M. tuberculosis.

      Strengths:

      While it is known that M. tuberculosis persists in adipose, the mycobacterial factors contributing to adipose dysfunction are unknown. The study uses multiple mechanisms, including recombinant purified protein, non-pathogenic mycobacterium expressing PPE2, and a clinical strain of M. tuberculosis depleted of PPE2, to show that PPE2 may play an important role in causing fat loss, lipolysis, and insulin resistance following infection. The authors show that PPE2, through unknown mechanisms, decreases gene expression of proteins involved in adipogenesis. Although the mechanisms are unclear, this study advances the field as it is the first to identify a secreted factor (PPE2) from M. tuberculosis to play a role in disrupting adipose tissue.

      Weaknesses:

      There is a lack of completeness amongst the figures that greatly diminishes the claims and impact of the manuscript. For example, in Figures 2 and 5, the authors measure adipocyte area in H&E-stained adipose tissue to show adipose hypertrophy. However, this was not completed in Figures 3 and 4 despite the authors claiming that treatment with rPPE2 induces adipose hypertrophy. It is unclear why the adipocyte area was not measured in these figures, and having this included would support the authors' claim and strengthen the manuscript. The same is true for immune cell infiltration, where the authors say there is increased immune cell infiltration following PPE2 treatment. This is based on H&E staining, but the data supporting this are limited. Although the authors measure CD3+ T cell infiltration in adipose tissue from mice infected with the clinical strain where PPE was depleted, staining was performed in only this experiment. Completing these experiments by showing data to support that PPE2 induces immune cell infiltration would greatly strengthen the manuscript.

      The authors state that a Student's t-test was performed to calculate the significance between two samples. However, there is no discussion of what statistical method was used when there were more than 2 groups, which occurs throughout the manuscript, such as in Figure 5, where 4 groups are analyzed. Having the appropriate statistical analysis is important for the impact of the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript titled "The PPE protein of Mycobacterium tuberculosis is responsible for the development of hyperglycemia and insulin resistance during tuberculosis", Bisht et al describe that PPE2 protein from Mtb is a key modulator of adipose tissue physiology that contributes to the development of insulin resistance. The authors have used 3T3-L1 preadipocyte cell lines, M. smegmatis overexpression strain, mice model, and genetically modified Mtb deletion strains to demonstrate that PPE promotes persistence in adipose tissue and regulates glucose homeostasis. Using qPCR and RNA-seq experiments, the authors demonstrate that PPE2 regulates the expression of key genes involved in adipogenesis.

      Strengths:

      Using purified protein, the authors show that PPE2 regulates adipose tissue physiology, and this effect was neutralised in the presence of anti-PPE2. The expression of several adipogenic markers was also reduced in 3T3-L1 adipocytes treated with rPPE2 and in mice infected with M. smegmatis strains overexpressing PPE2. Using a mouse model of infection, the authors show that PPE2 contributes to enhanced mycobacterial survival within fat tissues. The authors also show infiltration of immune cells in the fat tissues of mice infected with wild-type and ppe2-complemented strains compared to the ppe2 KO strain. In order to gain a better mechanistic understanding of how PPE2 regulates adipogenesis, the authors employed an RNA-seq approach and identified 191 genes that were significantly differentially expressed in the fat tissues of mice infected with wild-type and ppe2 KO Mtb strains. The differentially expressed genes included transcripts encoding proteins involved in chemokine/cytokine signalling and the ER stress response. The expression of a few of these markers was also validated by qPCR and western blot analysis. Finally, the authors also show that PPE2 promotes lipolysis by reducing phosphodiesterase levels and activating PKA-HSL signalling. The experimental design is overall reasonable, and the methods used are reliable. Overall, the current study did provide some new information on the contribution of PPE2 in regulating adipose tissue physiology.

      Weaknesses:

      (1) The authors have used several methodologies to show that PPE2 regulates adipose tissue physiology and glucose homeostasis. But the exact mechanism is still not clear.

      (2) Mtb encodes several PE/PPE proteins. The authors have used PPE2 for their study. Will secretory PPE2 homologs also regulate similar cellular processes?

      (3) How do the authors rule out that the differences observed in the fat tissues of mice infected with wild-type and mutant strains are associated with reduced bacterial burdens? Is it possible to include another attenuated Mtb strain as a control in the mouse experiments for a few critical experiments?

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Bisht et al. investigate the role of PPE2, a Mycobacterium tuberculosis (Mtb) secreted virulence factor, in adipose tissue physiology during tuberculosis (TB) infection. Previous work by this group established the significance of PPE proteins in Mtb virulence and their role in modulating the innate immune response. Here, the authors present compelling evidence that PPE2 regulates host cell adipogenesis and lipolysis, thereby establishing a link to the development of insulin resistance during TB infection. These fundamental findings demonstrate, for the first time, that a bacterial virulence factor is directly involved in the profound body fat loss, or "wasting," which is a long-established clinical symptom of active TB.

      Key Strengths:

      The confidence in the major findings of this study is significantly strengthened by the authors' comprehensive approach. They judiciously employ multiple experimental systems, including:

      (1) Purified PPE2 protein.

      (2) A non-pathogenic Mycobacterium strain engineered to express PPE2.

      (3) A pathogenic clinical Mtb strain (CDC1551) utilizing a targeted PPE2 deletion mutant.

      (4) While the presence of Mtb in adipose tissues in human and animal models is well-documented, this study is groundbreaking in demonstrating that an Mtb virulence-associated factor actively modulates host fatty acid metabolism within the adipose tissue.

      We thank the reviewer for appreciating that this work demonstrates, for the first time, that an Mtb virulence factor is directly linked to TB-associated wasting.

      Weakness:

      Although the manuscript provides solid evidence associating the presence of PPE2 with transcriptional changes in host fatty acid machinery within the adipose tissue, the underlying mechanistic details remain elusive. A focused, deep mechanistic follow-up study will be essential to fully appreciate the complex biological implications of the findings reported here.

      We agree with the reviewer that a focused, deep mechanistic follow-up study is necessary to further elucidate the complex biological implications of PPE2 actions. However, we believe that we have uncovered at least one of the possible mechanisms by which PPE2 increases lipolysis and circulating free fatty acids during infection, namely by targeting the cAMP-PKA-HSL pathway (Figure 7). In future studies we will aim to dissect the mechanisms by which PPE2 triggers hyperglycaemia and insulin resistance.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript entitled "The PPE2 protein of Mycobacterium tuberculosis is responsible for the development of hyperglycemia and insulin resistance during tuberculosis" the authors identify PPE2, a secretory protein of Mycobacterium tuberculosis, as a modulator of adipose function. They show that PPE2 treatment in mice causes fat loss, immune cell infiltration into adipose, reduced gene expression of PPAR-γ, C/EBP-α, and adiponectin, and glucose intolerance. Overall, the authors link PPE2 with adipose tissue perturbation and insulin resistance following infection with M. tuberculosis.

      Strengths:

      While it is known that M. tuberculosis persists in adipose, the mycobacterial factors contributing to adipose dysfunction are unknown. The study uses multiple mechanisms, including recombinant purified protein, non-pathogenic mycobacterium expressing PPE2, and a clinical strain of M. tuberculosis depleted of PPE2, to show that PPE2 may play an important role in causing fat loss, lipolysis, and insulin resistance following infection. The authors show that PPE2, through unknown mechanisms, decreases gene expression of proteins involved in adipogenesis. Although the mechanisms are unclear, this study advances the field as it is the first to identify a secreted factor (PPE2) from M. tuberculosis to play a role in disrupting adipose tissue.

      We thank the reviewer for his appreciation of our findings presented in the manuscript.

      Weaknesses:

      (1) There is a lack of completeness amongst the figures that greatly diminishes the claims and impact of the manuscript. For example, in Figures 2 and 5, the authors measure adipocyte area in H&E-stained adipose tissue to show adipose hypertrophy. However, this was not completed in Figures 3 and 4 despite the authors claiming that treatment with rPPE2 induces adipose hypertrophy. It is unclear why the adipocyte area was not measured in these figures, and having this included would support the authors' claim and strengthen the manuscript. The same is true for immune cell infiltration, where the authors say there is increased immune cell infiltration following PPE2 treatment. This is based on H&E staining, but the data supporting this are limited. Although the authors measure CD3+ T cell infiltration in adipose tissue from mice infected with the clinical strain where PPE was depleted, staining was performed in only this experiment. Completing these experiments by showing data to support that PPE2 induces immune cell infiltration would greatly strengthen the manuscript.

      As suggested by the reviewer, in the revised manuscript we will attempt to analyse adipocyte area in both Figures 3 and 4. In the original manuscript, immune cell infiltration analyses (H&E staining and CD3+ staining) were restricted to the M. tuberculosis mouse infection model, which best reflects human tuberculosis pathology. Immune cell infiltration studies will also be carried out in the other experiments involving infection with M. smegmatis expressing PPE2.

      (2) The authors state that a Student's t-test was performed to calculate the significance between two samples. However, there is no discussion of what statistical method was used when there were more than 2 groups, which occurs throughout the manuscript, such as in Figure 5, where 4 groups are analyzed. Having the appropriate statistical analysis is important for the impact of the manuscript.

      We agree with the reviewer that we omitted ANOVA from the statistical analyses. We will include one-way ANOVA where more than two groups are present and state the statistical methods in the figure legends as well as in the text of the revised manuscript.
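
      To make the statistical point concrete, the following is a minimal sketch of how a one-way ANOVA F statistic is computed when more than two groups are compared. It is an illustration only, not the authors' analysis: the function name `one_way_anova_f` and the four groups of measurements are hypothetical, and no values are taken from the manuscript.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of measurements.

    Illustrative helper (hypothetical name); each group is a list of numbers.
    """
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # total number of observations
    grand_mean = mean(x for g in groups for x in g)   # mean over all observations

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: four groups of three measurements each (not from the manuscript)
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7], [6, 7, 8]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      A p-value is obtained by comparing F against the F distribution with the stated degrees of freedom; with SciPy installed, `scipy.stats.f_oneway` returns both the statistic and the p-value. A significant ANOVA only indicates that at least one group differs, so pairwise conclusions would still need a post hoc comparison.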

      Reviewer #3 (Public review):

      Summary:

      In this manuscript titled "The PPE protein of Mycobacterium tuberculosis is responsible for the development of hyperglycemia and insulin resistance during tuberculosis", Bisht et al describe that PPE2 protein from Mtb is a key modulator of adipose tissue physiology that contributes to the development of insulin resistance. The authors have used 3T3-L1 preadipocyte cell lines, M. smegmatis overexpression strain, mice model, and genetically modified Mtb deletion strains to demonstrate that PPE promotes persistence in adipose tissue and regulates glucose homeostasis. Using qPCR and RNA-seq experiments, the authors demonstrate that PPE2 regulates the expression of key genes involved in adipogenesis.

      Strengths:

      Using purified protein, the authors show that PPE2 regulates adipose tissue physiology, and this effect was neutralised in the presence of anti-PPE2. The expression of several adipogenic markers was also reduced in 3T3-L1 adipocytes treated with rPPE2 and in mice infected with M. smegmatis strains overexpressing PPE2. Using a mouse model of infection, the authors show that PPE2 contributes to enhanced mycobacterial survival within fat tissues. The authors also show infiltration of immune cells in the fat tissues of mice infected with wild-type and ppe2-complemented strains compared to the ppe2 KO strain. In order to gain a better mechanistic understanding of how PPE2 regulates adipogenesis, the authors employed an RNA-seq approach and identified 191 genes that were significantly differentially expressed in the fat tissues of mice infected with wild-type and ppe2 KO Mtb strains. The differentially expressed genes included transcripts encoding proteins involved in chemokine/cytokine signalling and the ER stress response. The expression of a few of these markers was also validated by qPCR and western blot analysis. Finally, the authors also show that PPE2 promotes lipolysis by reducing phosphodiesterase levels and activating PKA-HSL signalling. The experimental design is overall reasonable, and the methods used are reliable. Overall, the current study did provide some new information on the contribution of PPE2 in regulating adipose tissue physiology.

      We thank the reviewer for encouraging comments about the manuscript.

      Weaknesses:

      (1) The authors have used several methodologies to show that PPE2 regulates adipose tissue physiology and glucose homeostasis. But the exact mechanism is still not clear.

      We have clearly demonstrated that PPE2 inhibits PPAR-γ and C/EBP-α expression to block adipogenic differentiation. Further, we demonstrated a possible mechanism by which PPE2 triggers lipolysis via activation of ER stress and the cAMP/PKA/HSL pathway, which is responsible for increasing free fatty acids in circulation (Figure 7), as confirmed by our observation that PPE2KO (ppe2 knock-out) Mtb-infected mice had lower NEFA compared with those infected with wild-type Mtb (Figure 7F). Crucially, we showed that this mechanism is clinically relevant, since NEFA levels in the sera of TB patients were higher than in healthy controls (Figure 7G), confirming the presence of dyslipidemia in TB patients, which is an established risk factor for insulin resistance (Karpe et al., 2011; Bhattacharya et al., 2007). As increased free fatty acids have been linked to the development of insulin resistance in several studies, this mechanism links PPE2 with the regulation of glucose homeostasis.

      (2) Mtb encodes several PE/PPE proteins. The authors have used PPE2 for their study. Will secretory PPE2 homologs also regulate similar cellular processes?

      It is known that Mtb encodes several PE/PPE family proteins and some of these have been implicated to play a role in host–pathogen interactions (Mukhopadhyay and Balaji, 2011; Dahiya et al., 2025). However, so far only PPE2 is shown to be present in the circulation (Bisht et al., 2023) which is the main reason we chose it for this study. Presence of PPE2 homologues in the circulation is not known so far.

      (3) How do the authors rule out that the differences observed in the fat tissues of mice infected with wild-type and mutant strains are associated with reduced bacterial burdens? Is it possible to include another attenuated Mtb strain as a control in the mouse experiments for a few critical experiments?

      We agree with the reviewer that differences in bacterial burden can influence host tissue responses. Precisely for this reason, we did not rely on one infection model alone. We used a multi-pronged approach to decouple the effects of PPE2 from the effects of bacterial load:

      (1) In vitro model using recombinantly purified PPE2 protein (rPPE2) (Figure 1): In cultured 3T3-L1 adipocytes, purified rPPE2 protein directly inhibited adipogenesis by downregulating important factors such as PPAR-γ, C/EBP-α and fatty acid synthase (which play a critical role in triglyceride metabolism), demonstrating a direct effect of PPE2 in the complete absence of infection.

      (2) Recombinant protein injection (Figure 3): By injecting recombinantly purified PPE2 protein (rPPE2) into mice, we observed similar metabolic perturbations (fat loss, impaired glucose tolerance) in the complete absence of any bacteria, demonstrating that PPE2 can drive these phenotypes independent of bacterial burden. Furthermore, the rescue of these PPE2 effects in rPPE2-immunized mice strongly confirms the specific role of PPE2 in establishing hyperglycaemia and insulin resistance (Figure 4).

      While the Mtb aerosol model can be questioned for bacterial load effects, it provides crucial in vivo validation that PPE2 function is relevant in the context of mycobacterial infection.

      References

      Bhattacharya S, Dey D, Roy SS. Molecular mechanism of insulin resistance. J Biosci. 2007 Mar;32(2):405-13. doi: 10.1007/s12038-007-0038-8. PMID: 17435330.

      Bisht MK, Pal R, Dahiya P, Naz S, Sanyal P, Nandicoori VK, Ghosh S, Mukhopadhyay S. The PPE2 protein of Mycobacterium tuberculosis is secreted during infection and facilitates mycobacterial survival inside the host. Tuberculosis (Edinb). 2023 Dec;143:102421. doi: 10.1016/j.tube.2023.102421. Epub 2023 Oct 12. PMID: 37879126.

      Dahiya P, Bisht MK, Mukhopadhyay S. Role of PE family of proteins in mycobacterial virulence: Potential on anti-TB vaccine and drug design. Int Rev Immunol. 2025; 44(4):213-228. doi: 10.1080/08830185.2025.2455161. Epub 2025 Jan 31. PMID: 39889764.

      Karpe F, Dickmann JR, Frayn KN. Fatty acids, obesity, and insulin resistance: time for a reevaluation. Diabetes. 2011 Oct;60(10):2441-9. doi: 10.2337/db11-0425. PMID: 21948998; PMCID: PMC3178283.

      Mukhopadhyay S, Balaji KN. The PE and PPE proteins of Mycobacterium tuberculosis. Tuberculosis (Edinb). 2011 Sep;91(5):441-7. doi: 10.1016/j.tube.2011.04.004. Epub 2011 May 6. PMID: 21527209.

    1. eLife Assessment

      Combining connectomics, optogenetics, behavioral analysis and modeling, this study delivers important findings on the role of inhibitory neurons in the generation of leg grooming movements in Drosophila. The results include convincing evidence that the identified neuronal populations are key in the generation of rhythmic leg movements, structured in distinct polysynaptic pathways articulating inhibition and disinhibition of antagonistic sets of motor neurons, as mapped from an electron microscopy volume of the ventral nerve cord, which orchestrate an alternation of flexion and extension. By analyzing limb kinematics upon experimentally silencing specific populations of premotor inhibitory neurons, together with computational modelling, the potential role of these neurons in rhythmic leg movement is shown. This work will be of interest to neuroscientists working in motor control and limbed locomotion.

    2. Reviewer #1 (Public review):

      Summary:

      Syed et al. investigate the circuit underpinnings for leg grooming in the fruit fly. They identify two populations of local interneurons in the right front leg neuromere of the ventral nerve cord, i.e. 62 13A neurons and 64 13B neurons. Hierarchical clustering analysis identifies 10 morphological classes in each population. Connectome analysis reveals their circuit interactions: these GABAergic interneurons provide synaptic inhibition either between the two subpopulations, i.e. 13B onto 13A, or among each other, i.e. 13As onto other 13As, and/or onto leg motoneurons, i.e. 13As and 13Bs onto leg motoneurons. Interestingly, 13A interneurons fall into two categories, with one providing inhibition onto a broad group of motoneurons, being called "generalists", while others project to only a few motoneurons, being called "specialists". Optogenetic activation and silencing of both subsets strongly affects leg grooming. Likewise, activating or silencing subpopulations, i.e. 3 to 6 elements of the 13A and 13B groups, has marked effects on leg grooming, including on frequency and joint positions, and can even interrupt leg grooming. The authors present a computational model with the four circuit motifs found, i.e. feed-forward inhibition, disinhibition, reciprocal inhibition and redundant inhibition. This model can reproduce relevant aspects of the grooming behavior.

      Strengths:

      The authors succeeded in providing evidence for neural circuits interacting by means of synaptic inhibition to play an important role in the generation of a fast rhythmic insect motor behavior, i.e. grooming of the body using legs. Two populations of local interneurons in the fruit fly VNC comprise four inhibitory circuit motifs of neural action and interaction: feed-forward inhibition, disinhibition, reciprocal inhibition and redundant inhibition. Connectome analysis identifies the similarities and differences between individual members of the two interneuron populations. Modulating the activity of small subsets of these interneuron populations markedly affects generation of grooming behavior thereby exemplifying their relevance. The authors carefully discuss strengths and limitations of their approaches and place their findings into the broader context of motor control.

      Weaknesses:

      Effects of modulating activity in the interneuron populations by means of optogenetics were tested in the so-called "closed-loop" condition. This does not allow direct and secondary effects of the experimental modification in neural activity to be differentiated, as feedforward and feedback effects cannot be disentangled. To do so, open-loop experiments, e.g. in deafferented conditions, would be needed. Given that many members of the two populations of interneurons show not one but two or more circuit motifs, it remains to be disentangled which role each individual circuit motif plays in the generation of the motor behavior in intact animals.

      Comments on revisions:

      The authors have carefully revised the manuscript. I have no further suggestions or criticisms.

    3. Reviewer #3 (Public review):

      Summary:

      The authors set out to determine how GABAergic inhibitory premotor circuits contribute to the rhythmic alternation of leg flexion and extension during Drosophila grooming. To do this, they first mapped the ~120 13A and 13B hemilineage inhibitory neurons in the prothoracic segment of the VNC and clustered them by morphology and synaptic partners. They then tested the contribution of these cells to flexion and extension using optogenetic activation and inhibition and kinematic analyses of limb joints. Finally, they produced a computational model representing an abstract version of the circuit to determine how the connectivity identified in EM might relate to functional output. The study makes important contributions to the literature.

      The authors have identified an interesting question and use a strong set of complementary tools to address it:

      They analysed serial‐section TEM data to obtain reconstructions of every 13A and 13B neuron in the prothoracic segment. They manually proofread over 60 13A neurons and 64 13B neurons, then used automated synapse detection to build detailed connectivity maps and cluster neurons into functional motifs.

      They used optogenetic tools with a range of genetic driver lines in freely behaving flies to test the contribution of subsets of 13A and 13B neurons.

      They used a connectome-constrained computational model to determine how the mapped connectivity relates to the rhythmic output of the behavior.

      Comments on revisions:

      I appreciate that the authors have updated the GitHub repository to include the model and analysis code. Still lacking are an explicit separation of empirical findings from modelling inferences in the text, and a supplemental table that makes clear which cell types are included. I should also point out that the code lacks the annotations necessary for the results to be reproduced and the model to be reused.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Syed et al. investigate the circuit underpinnings for leg grooming in the fruit fly. They identify two populations of local interneurons in the right front leg neuromere of the ventral nerve cord, i.e. 62 13A neurons and 64 13B neurons. Hierarchical clustering analysis identifies 10 morphological classes in each population. Connectome analysis reveals their circuit interactions: these GABAergic interneurons provide synaptic inhibition either between the two subpopulations, i.e. 13B onto 13A, or among each other, i.e. 13As onto other 13As, and/or onto leg motoneurons, i.e. 13As and 13Bs onto leg motoneurons. Interestingly, 13A interneurons fall into two categories, with one providing inhibition onto a broad group of motoneurons, being called "generalists", while others project to only a few motoneurons, being called "specialists". Optogenetic activation and silencing of both subsets strongly affects leg grooming. Likewise, activating or silencing subpopulations, i.e. 3 to 6 elements of the 13A and 13B groups, has marked effects on leg grooming, including on frequency and joint positions, and can even interrupt leg grooming. The authors present a computational model with the four circuit motifs found, i.e. feed-forward inhibition, disinhibition, reciprocal inhibition and redundant inhibition. This model can reproduce relevant aspects of the grooming behavior.

      Strengths:

      The authors succeeded in providing evidence for neural circuits interacting by means of synaptic inhibition to play an important role in the generation of a fast rhythmic insect motor behavior, i.e. grooming. Two populations of local interneurons in the fruit fly VNC comprise four inhibitory circuit motifs of neural action and interaction: feed-forward inhibition, disinhibition, reciprocal inhibition and redundant inhibition. Connectome analysis identifies the similarities and differences between individual members of the two interneuron populations. Modulating the activity of small subsets of these interneuron populations markedly affects generation of the motor behavior thereby exemplifying their important role for generating grooming. The authors carefully discuss strengths and limitations of their approaches and place their findings into the broader context of motor control.

      We thank the reviewer for their thoughtful and constructive evaluation of our work.

      Weaknesses:

      Effects of modulating activity in the interneuron populations by means of optogenetics were tested in the so-called closed-loop condition. This does not allow direct and secondary effects of the experimental modification in neural activity to be differentiated, as feedforward and feedback effects cannot be disentangled. To do so, open-loop experiments, e.g. in deafferented conditions, would be important. Given that many members of the two populations of interneurons show not one but two or more circuit motifs, it remains to be disentangled which role each individual circuit motif plays in the generation of the motor behavior in intact animals.

      Our optogenetic experiments show a role for 13A/B neurons in grooming leg movements – in an intact sensorimotor system - but we cannot yet differentiate between central and reafferent contributions. Activation of 13As or 13Bs disinhibits motor neurons and that is sufficient to induce walking/grooming. Therefore, we can show a role for the disinhibition motif.

      Proprioceptive feedback from leg movements could certainly affect the function of these reciprocal inhibition circuits. Given the synapses we observe between leg proprioceptors and 13A neurons, we think this is likely.

      Our previous work (Ravbar et al 2021) showed that grooming rhythms in dusted flies persist when sensory feedback is reduced, indicating that central control is possible. In those experiments, we used dust to stimulate grooming and optogenetic manipulation to broadly silence sensory feedback. We cannot do the same here because we do not yet have reagents to separately activate sparse subsets of inhibitory neurons while silencing specific proprioceptive neurons. More importantly, globally silencing proprioceptors would produce pleiotropic effects and severely impair baseline coordination, making it difficult to distinguish whether observed changes reflect disrupted rhythm generation or secondary consequences of impaired sensory input. Therefore, the reviewer is correct – we do not know whether the effects we observe are feedforward (central), feedback sensory, or both. We have included this in the revised results and discussion section to describe these possibilities and the limits of our current findings.

      Additionally, we have used a computational model to test the role of each motif separately and we show that in the results.  
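
      To make the reciprocal inhibition motif concrete, the following is a minimal sketch of a generic two-unit half-center oscillator (a Matsuoka-style rate model with slow adaptation). It is not the authors' connectome-constrained model: the function names and all parameter values are illustrative assumptions, chosen only so that the two units alternate.

```python
def simulate_half_center(duration=20.0, dt=0.001, tau=0.25, T=0.5,
                         beta=2.5, w=2.5, drive=1.0):
    """Euler-integrate two rate units coupled by reciprocal inhibition.

    x: fast state, v: slow adaptation, y = max(0, x): output rate.
    Every value here is an illustrative assumption, not a fitted parameter.
    """
    x = [0.1, 0.0]  # small asymmetry so the symmetric state is abandoned
    v = [0.0, 0.0]
    rates = []
    for _ in range(int(duration / dt)):
        y = [max(0.0, x[0]), max(0.0, x[1])]
        # Each unit receives constant drive, self-adaptation, and
        # inhibition (weight w) from the opposite unit.
        dx = [(-x[i] - beta * v[i] - w * y[1 - i] + drive) / tau
              for i in range(2)]
        dv = [(-v[i] + y[i]) / T for i in range(2)]
        for i in range(2):
            x[i] += dt * dx[i]
            v[i] += dt * dv[i]
        rates.append((y[0], y[1]))
    return rates


def count_dominance_switches(rates):
    """Count how often the more active unit changes identity."""
    switches = 0
    previous = None
    for a, b in rates:
        if a == b:
            continue
        leader = 0 if a > b else 1
        if previous is not None and leader != previous:
            switches += 1
        previous = leader
    return switches
```

      In this toy system, alternation depends on the inhibitory weight being strong enough to prevent stable co-activation while adaptation is strong enough to release the suppressed unit. A motif-by-motif test of the kind described above would correspond to removing or rescaling individual connections and checking whether alternation persists.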

      Comments on revisions:

      The careful revision of the manuscript improved the clarity of presentation substantially.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Syed et al. presents a detailed investigation of inhibitory interneurons, specifically from the 13A and 13B hemilineages, which contribute to the generation of rhythmic leg movements underlying grooming behavior in Drosophila. After performing a detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits, the authors build on this anatomical framework by performing optogenetic perturbation experiments to functionally test predictions derived from the connectome. Finally, they integrate these findings into a computational model that links anatomical connectivity with behavior, offering a systems-level view of how inhibitory circuits may contribute to grooming pattern generation.

      Strengths:

      (1) Performing an extensive and detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits.

      (2) Making sense of the largely uncharacterized 13A/13B nerve cord circuitry by combining connectomics and optogenetics is very impressive and will lay the foundation for future experiments in this field.

      (3) Testing the predictions from experiments using a simplified and elegant model.

      Thank you for the positive assessment of our work.

      Weaknesses:

      (1) In Figure 4-figure supplement 1, the inclusion of walking assays in dusted flies is problematic, as these flies are already strongly biased toward grooming behavior and rarely walk. To assess how 13A neuron activation influences walking, such experiments should be conducted in undusted flies under baseline locomotor conditions.

      We agree that there are better ways to assay potential contributions of 13A/13B neurons to walking. We intended to focus on how normal activity in these inhibitory neurons affects coordination during grooming, and we included walking because we observed it in our optogenetic experiments and because it also involves rhythmic leg movements. The walking data is reported in a supplementary figure because we think this merits further study with assays designed to quantify walking specifically. We will make these goals clearer in the revised manuscript and we are happy to share our reagents with other research groups more equipped to analyze walking differences.

      (2) Regarding Fig 5: The 70 ms on/off stimulation with a slow opsin seems problematic. CsChrimson's off-kinetics are slow and unlikely to cause actual activity changes in the desired neurons with the temporal precision the authors suggest. Regardless, it is amazing that the authors get the behavior! It would still be important for the authors to mention the optogenetics caveat, and potentially supplement the data with stimulation at different frequencies, or using faster opsins like ChrimsonR.

      We were also intrigued by the behavioral consequences of activating these inhibitory neurons with CsChrimson. We appreciate the reviewer’s point that CsChrimson’s slow off-kinetics limit precise temporal control. To address this, we repeated our frequency analysis using a range of pulse durations (10/10, 50/50, 70/70, 110/110, and 120/120 ms on/off) and compared the mean frequency of proximal joint extension/flexion cycles across conditions. We found no significant difference in frequency (LMM, p > 0.05), suggesting that the observed grooming rhythm is not dictated by pulse period but instead reflects an intrinsic property of the premotor circuit once activated. We now include these results in ‘Figure 5-figure supplement 1’ and clarify in the text that we interpret pulsed activation as triggering, rather than precisely pacing, the endogenous grooming rhythm. We continue to note in the manuscript that CsChrimson’s slow off-kinetics may limit temporal precision. We will try ChrimsonR in future experiments.
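
      As an illustration of the kind of measurement involved, the sketch below estimates the extension/flexion cycle frequency of a joint-angle trace by counting upward crossings of the trace mean. This is an assumed, simplified method and not the authors' analysis pipeline; the function names and the synthetic trace are hypothetical, and the statistical comparison across pulse conditions is not reproduced here.

```python
import math


def cycle_frequency(angles, sample_rate_hz):
    """Estimate cycles per second from upward crossings of the trace mean.

    Returns None when fewer than two crossings are found.
    """
    mean = sum(angles) / len(angles)
    crossings = [i for i in range(1, len(angles))
                 if angles[i - 1] < mean <= angles[i]]
    if len(crossings) < 2:
        return None
    elapsed_s = (crossings[-1] - crossings[0]) / sample_rate_hz
    return (len(crossings) - 1) / elapsed_s


def synthetic_trace(freq_hz, sample_rate_hz, n_samples, phase=0.5):
    """Hypothetical stand-in for a tracked joint angle: a pure sinusoid."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz + phase)
            for i in range(n_samples)]
```

      Applying such an estimator to each pulse condition (10/10 through 120/120 ms) would yield one frequency per trial. If the rhythm were paced by the stimulus, these frequencies would track the pulse period rather than stay constant.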

      Overall, I think the strengths outweigh the weaknesses, and I consider this a timely and comprehensive addition to the field.

      Reviewer #3 (Public review):

      Summary:

      The authors set out to determine how GABAergic inhibitory premotor circuits contribute to the rhythmic alternation of leg flexion and extension during Drosophila grooming. To do this, they first mapped the ~120 13A and 13B hemilineage inhibitory neurons in the prothoracic segment of the VNC and clustered them by morphology and synaptic partners. They then tested the contribution of these cells to flexion and extension using optogenetic activation and inhibition and kinematic analyses of limb joints. Finally, they produced a computational model representing an abstract version of the circuit to determine how the connectivity identified in EM might relate to functional output. The study makes important contributions to the literature.

      The authors have identified an interesting question and use a strong set of complementary tools to address it:

      They analysed serial‐section TEM data to obtain reconstructions of every 13A and 13B neuron in the prothoracic segment. They manually proofread over 60 13A neurons and 64 13B neurons, then used automated synapse detection to build detailed connectivity maps and cluster neurons into functional motifs.

      They used optogenetic tools with a range of genetic driver lines in freely behaving flies to test the contribution of subsets of 13A and 13B neurons.

      They used a connectome-constrained computational model to determine how the mapped connectivity relates to the rhythmic output of the behavior.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I still have the following specific suggestions and questions, which need the attention of the authors:

      P5, 2nd para, li 1: shouldn't "(Figures 1E and 1E')" be (Figures 1G and 1H)?

      P7, last para, li 3: shouldn't "(Figures 2C and 2D)" be (Figures 2A and 2B)?

      P19, para 2, last 2li: "...we observe that optogenetic activation......triggers grooming movements." I could not find the place in the text or a figure where this was reported or shown. Please specify.

      P19, last para: "... shows that 13A neurons can generate rhythmic movements....." Given that the experiments were conducted in closed-loop, i.e. including the loop through the leg and its movements, the following formulation appears more justified: "....shows that 13A neurons significantly contribute to the generation of rhythmic movements,....."

      P28, para 1, li 3 from bottom: "...themselves, rather than solely between antagonistic motor neurons." While the authors are correct that, in the stick insect and locust, alternating inhibitory synaptic drive to flexor and extensor motoneurons has been shown to underlie alternating activity of these two antagonistic motoneuron pools, the previous studies have not shown or claimed that these synaptic inputs arise from direct interactions between these motoneuron pools. Based on this, this text should be moved to the part "feed-forward inhibition" on page 27.

      P28: "redundant inhibition": this motif has been shown to be instrumental in the locust flight CPG, e.g. Robertson & Pearson, 1985, Fig. 16.

      P28: "reciprocal inhibition": The reviewer agrees with the authors that this motif has been shown for the mouse spinal cord, but it has also been shown for other CPGs in vertebrates and invertebrates, e.g. Clione, leech, Xenopus - see the initial comment "(3) Intro and Discussion"

      Thank you, we have incorporated the suggested corrections and clarifications into the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      I'm satisfied with the revised version

      Reviewer #3 (Recommendations for the authors):

      The authors have made a substantial effort to address my original points. They corrected the title, expanded Discussion and Methods sections, reran statistical tests using mixed models, added modelling clarifications and constraints, and fixed or removed confusing figure panels. Those changes have improved clarity and reduced some of the claims that I thought were exaggerated.

      That said, some of my concerns remain only partially addressed, which could be fixed with relatively small tweaks. The authors should:

      (1) Explicitly separate empirical findings from modelling inferences throughout the manuscript, including the Abstract, Results and Discussion (i.e., label claims of "intrinsic rhythmogenesis" as model-based inferences, not direct experimental demonstrations)

      (2) Provide supplemental information on modelling to quantify the role of the black-box input (e.g., quantitative coordination/phase/frequency metrics for full model vs constant-input vs no black box), show pre- vs post-fine-tuning weight changes and the exact tuning constraints/optimization details (I could not find these details)

      (3) To ensure results are reproducible, provide a supplemental table mapping each split line to EM-identified neuron(s) with NBLAST/morphological scores for each match;

      (4) Fully document the statistical models (exact LMM/GLMM formulas, software/packages, etc);

      (5) Deposit model code, trained weights and analysis scripts in a public repository.

      We have updated the GitHub repository with the full statistical analysis documentation and model code, including trained weights and scripts.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) As such a large amount of work has been put into developing this community tool, it would be worth thinking about how it could serve other multiplex-immunofluorescence methods (such as immunoSABER, 4i, etc). One option is adding an extra tab that mentions the particular method using those reagents. This would also help as IBEX itself and related methods evolve in the future.

      We agree and currently support six other methods beyond the original “IBEX2D Manual”, with the most generic being “Multiplexed 2D Imaging”: a standard, single-cycle (non-iterative) imaging method applied to thin, 2D (5-30 micron) tissue sections. We plan to expand the resource to include multiplex IF methods such as Immuno-SABER, 4i, Cell DIVE, etc. The current structure of the reagent resources table can support other immunofluorescence methods without modification. The table contains information for IBEX and related methods; the particular method for which a reagent validation was evaluated is specified in the column titled “Method”. Descriptions of supported methods are given in the reagent glossary.

      (2) It has a rather minimal description of the software. In particular, there is software that has not been developed for IBEX specifically but that could be used for IBEX datasets (ASHLAR, WSIReg, VALIS, WARPY, and QuPath, etc). It would be nice if there was mention of those.

      ASHLAR, WSIReg, VALIS, and Warpy have been added to the Knowledge-Base. These software components are specifically relevant for iterative imaging protocols which require image alignment. With respect to QuPath, Fiji, Napari and other general microscopy image analysis frameworks, these are not listed. Such frameworks provide a wide range of operations relevant for many microscopy image analysis tasks and are likely already familiar to researchers who are interested in the information contained in the Knowledge-Base.

      (3) There is a concern about how the negative data information will be added, as no publication or peer-review process can back it up. Perhaps the particular conditions of the experiment should be very well described to allow future users to assess the validity.

      We agree with this observation and have added the following language to the contribute page:

      “When reporting information that has not appeared in a peer-reviewed publication, both negative and positive results, include more details with respect to experimental conditions and provide sample images as part of the supporting material files. In all cases, peer reviewed or not, we encourage providing additional details in the supporting material that you deem important and are not part of the csv file structure. These include, but are not limited to, lot numbers, versioned protocols used in the work, and any other information which will facilitate validation reproducibility.”

      (4) The proposed scheme where a reagent can be validated or recommended against by up to 4 different labs should be good. It may be good to make sure that researchers who validate belong to different labs and are not merely different ORCID iDs belonging to the same group. The same applies when making a case for recommending against a reagent.

      We generally support this recommendation. Based on our experience, even members within the same laboratory encounter challenges when attempting to validate reagents contributed by current or former colleagues. Additionally, research labs often experience significant personnel turnover, with minimal overlap over a five year span.

      To address these concerns, we have updated the instructions on the contribute page as follows: “We only accept up to 5 ORCID additions in the Agree or Disagree columns. This means that the original contributor’s work was replicated by up to 4 individuals or refuted by up to 5 people. Priority is given to contributions from individuals in laboratories distinct from the original source.”
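
      The endorsement rule quoted above lends itself to an automated check. The sketch below is a hypothetical helper, not part of the Knowledge-Base tooling: the function names are assumptions, and the source does not state how ORCID iDs are stored in the Agree and Disagree columns, so the check takes a plain list of strings. The check-character computation follows the ISO 7064 MOD 11-2 scheme that ORCID iDs use.

```python
import re

ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(first_15_digits):
    """ISO 7064 MOD 11-2 check character, as used by ORCID iDs."""
    total = 0
    for ch in first_15_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True if the string is well formed and its check character matches."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


def endorsement_problems(orcids, max_entries=5):
    """Return a list of problems for one Agree or Disagree entry list."""
    problems = []
    if len(orcids) > max_entries:
        problems.append(
            f"{len(orcids)} entries exceed the limit of {max_entries}")
    if len(set(orcids)) != len(orcids):
        problems.append("duplicate ORCID iDs")
    problems.extend(f"malformed ORCID iD: {o}"
                    for o in orcids if not is_valid_orcid(o))
    return problems
```

      A check like this can enforce only the count and format parts of the rule. Whether endorsers come from distinct laboratories cannot be inferred from ORCID iDs alone and would still need a maintainer's judgment.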

      (5) It is very interesting to keep track of the protocol versions used. Perhaps users should be able to validate independent versions and it will be important to know how information is kept.

      Thank you for your suggestion. We encourage members of the community to cite the latest version of the Knowledge-Base in the “Citing the Knowledge-Base” section.

      (6) The final point I would make is that the need to fork a GitHub repository may deter some people from submitting data. For sporadic contributions, the authors could consider letting users reach out to the main developers and/or providing a submission form that helps users less experienced with the command line and GitHub, while still promoting contributions from the community.

      We have given this significant thought and now support a secondary path for contributing that does not require familiarity with git or GitHub. This path involves downloading a zip file, modifying the contents of the csv files and providing supporting material text files and images. Once the work is completed, the contributor contacts the Knowledge-Base maintainers and we complete the submission together, with the maintainers dealing with the usage of git and GitHub. This information has been added to the notes which are listed at the top of the Contribute page. We have recently completed the first contribution that followed this new workflow.

      We still encourage researchers to familiarize themselves with git and the GitHub repository hosting service. These tools have been shown to be useful for collaborative and reproducible laboratory research.

      Reviewer #2:

      (1) The potential impact of IBEX KB is very clear. However, the paper would benefit by also discussing more on KB maintenance and outreach, and how higher participation could be incentivized.

      We have added the following details to the discussion:

      The KB is actively maintained by its chairs, who meet bi-weekly to ensure its continued development and maintenance. In addition to these regular meetings, we engage with both current and prospective community members to gather feedback, encourage contributions, and expand the collective knowledge supporting the KB. To broaden outreach and foster sustained engagement, the IBEX community will collaborate with synergistic initiatives such as the HuBMAP Affinity Reagents Working Group, the European Society for Spatial Biology (ESSB), and the Global Alliance for Spatial Technologies (GESTALT).

      As a further incentive for participation, we intend to launch an annual “Reagent Validation Week”, a community driven event inspired by software hackathons. During this dedicated week, researchers would focus on validating or reproducing validation for selected reagents and contribute their findings to the KB. We have also discussed hosting an “Around the World” symposium, featuring presentations from both junior and senior scientists across the community, to showcase diverse perspectives and foster global collaboration.

      (2) Use of resources like GitHub may limit engagement from non-coding members of the scientific community. Will there be alternative options like a user-friendly web interface to contribute more easily?

      We agree with this observation and have addressed it. Please see detailed response to point 6 from Reviewer 1.

      Reviewer #3:

      (1) IBEX is a specific immunofluorescence method. However, the utility of the Knowledge base is not limited to the specific IBEX method. Therefore, I suggest removing the unnecessary branding of the term IBEX from the KB and citing potentially other similar cyclic immunofluorescence methods in the manuscript (e.g. CycIF Lin et al 2018). This would also emphasize the wider impact and applicability of the KB to the wider imaging community.

      For now, we have decided to keep the original reference to the IBEX method in the resource name and re-brand it in the next development phase. In that phase we intend to solicit reagent validations for methods unrelated to IBEX. We have added the reference to the CycIF publication. The manuscript text now reads: “We are optimistic that future versions will include extension of the IBEX method to other tissues and species and we intend to solicit contributions of reagent validations for other multiplexed imaging techniques such as CycIF Lin et al. (2015). At that point in time we expect to re-brand the KB as the IBEX++ Knowledge-Base...”

      (2) I believe reporting negative results with reagents is highly valuable. However, reports on antibodies must include more details. To ensure data quality, every report should be linked to a specific protocol + images (or doc with the standard document variations), and sample information. This should be a mandatory requirement.

      We agree that this information is desirable, but we do not agree that it should be mandatory. In the contribution instructions we now explicitly list lot numbers and versioned protocols as examples of details that we encourage contributors to include in their supporting material files. We believe that requiring this information for a contribution sets the bar too high and will deter many from contributing information that can benefit others.

      (3) While cross-validation among researchers is beneficial, even if five individuals fail to reproduce results with a given antibody, their findings may be influenced by technique-specific factors. It is also important to consider whether these researchers come from the same group, institution, or geographical region, as this could impact reproducibility. Additionally, entries that have not been reproduced at least five times using the same protocol should still be considered valuable information. To address this, an “insufficient validation data” flag could be implemented, ensuring that incomplete but useful findings remain accessible.

      The contribution instructions now state that “Priority is given to contributions from individuals in laboratories distinct from the original source”.

      While our goal is to support reproducing reagent validations, we do not expect these types of contributions to be the rule, as the only incentive we can provide to encourage this behavior is co-authorship on the authoritative dataset. As a result, it is likely that many of the validations will have a single endorser, the original contributor. These results are valuable information and we do not think they should be singled out (insufficient validation label). We leave it up to the users of the KB to decide whether they trust recommendations with multiple endorsers or if endorsement by a single highly trusted contributor is sufficient for them. In all cases, issues with contributions can be raised and discussed on the KB discussion forum.

      The rationale for limiting the number of reproduction studies to five was that this is a minimal, yet sufficiently large, number that provides confidence in the results. Placing an upper limit ensures that researchers do not provide reproduction results for widely used and well established reagents just because these results are readily available to them.

      (4) This system could flag reagents with inconsistent reports, highlight potential technique-specific issues, and suggest alternative reagents with stronger validation records. Furthermore, a validation confidence ranking could be introduced, taking into account the number of independent confirmations, protocol consistency, and reproducibility data. These measures would help refine the reporting process while maintaining transparency and scientific rigor.

      We agree that the functionality described here is desirable, but this is not part of the KB. At its core the KB is a dataset and we do not envision developing dedicated tools to perform these tasks. Instead, we foresee using the KB as context for interacting with AI agents. Providing the KB as context to an AI, one can currently use it to answer domain-specific questions and perform related tasks such as designing imaging panels (under subject matter expert supervision). This was added to the sample use cases in the manuscript, and a transcript of an interaction with an AI model using the website as context is provided as supplemental material.

      (5) Regarding image formats for results reporting, while JPG files are convenient due to their small size, TIFF files offer significant advantages, such as preserving metadata and maintaining the integrity of real data values. Proper signal adjustments may not always be applied by researchers, making TIFF crucial for accurate data analysis. In this regard, I suggest making it possible to include a link to the original TIFF data.

      The goal of the supporting material image is similar to that of an image used in a manuscript and it should not be used for data analysis purposes. This is the reason we chose the JPG format. Sharing these images is not intended to be a substitute for publicly sharing the original images and their associated metadata. This is now noted in the contributing instructions.

      (6) Homepage:

      Include a brief summary of the knowledge base’s purpose and tabs to provide clarity for new users. The current homepage is a bit misleading for newcomers.

      The homepage has been modified to include information about the Knowledge-Base, contents and how to use it including as context for interaction with AI agents.

      (7) Reagent Resources Section: Enable users to search for a target name directly, rather than filtering through dropdown options.

      The dropdown menu explicitly shows all available targets and also allows for direct search of a target name. To use it for direct search, select the dropdown and start typing the name of the target; the focus will jump to it. Thus, if looking for “Zrf1” there is no need to scroll through all targets in the dropdown. This also makes it easy to clear a filter: select the dropdown, start typing the word “clear”, then press enter when it is highlighted. This information has been added to the page.

      Provide an option to download the dataset as a CSV file. This feature will be highly valued by non-computational researchers.

      Links to download the reagent resources CSV file and the whole Knowledge-Base have been added.

      Add the same column documentation here as in the contributor instructions. For example, you need to make clear the distinctions between “Recommend,” “Agree,” and “Disagree” ratings, as they may be misleading to those who have not visited the rules to contribute.

      A link to the column documentation in the contributor instructions has been added here. Information on the website is displayed in one location and linked as needed. Duplicated display of information creates uncertainty for users and results in more complex instructions when referring to the information.

      Include additional details in the dataset, such as lot numbers, or the date of the contribution, that could be relevant in different settings.

      Please see response to point 2.

      (8) Data & Software Section:

      Add filtering options in the table based on organism and tissue availability.

      This data is not encoded in the available information in an independent manner, so we do not directly enable filtering. It is usually included in the “Details” free-form text. This text is duplicated from the original dataset descriptions. One can still search this page using the browser's search functionality to achieve behavior similar to filtering. While the “Details” text may not be visible due to the use of the accordion user interface, it is still searchable and will automatically expand when the search text is found under the collapsed accordion button.

      (9) Contributor Section:

      Incorporate figures from the manuscript to make it more visual and improve understanding of rules and standards.

      Figure 4 from the manuscript was added to this page.

      I believe reporting negative results with reagents is highly valuable. However, to ensure data quality, every report should be linked to a specific protocol and sample information. This should be a mandatory requirement. To streamline the process, warnings for certain reagents could be implemented, but a reagent should not be outright labeled as ineffective without proper validation.

      Please see response to point 2.

      Cross-validation among researchers is beneficial, but even if five individuals fail to reproduce results with a given antibody, it may still be due to technique-specific factors, particularly for non-routine antibodies.

      We agree with this observation and have modified the contribution instructions accordingly:

      When overturning previously reported results, the number of ORCIDs in the Disagree column becomes greater than those in the Agree column, we will open the contribution for public discussion on the Knowledge-Base forum before accepting it.

      The intent is to increase the community’s confidence in the results, particularly when dealing with non-routine antibodies. This allows the original contributor and other members of the community to engage with the researchers who were unable to replicate a specific validation, possibly helping them to replicate the original results by adding missing details to the KB, or explicitly identifying and documenting issues with the original work.
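      The overturn rule quoted above can be sketched in a few lines of Python. This is only an illustration of the counting logic as described in the response, not code from the Knowledge-Base itself; the function name, its arguments, and the placeholder identifiers are hypothetical.

```python
def needs_public_discussion(agree_orcids, disagree_orcids, new_disagree_orcid):
    """Return True if a new 'Disagree' entry would overturn the recorded result.

    Hypothetical helper: a result counts as overturned when, after adding the
    new entry, the Disagree column holds more distinct ORCIDs than the Agree
    column.
    """
    agree = set(agree_orcids)
    # Adding to a set keeps each contributor counted once, even if repeated.
    disagree = set(disagree_orcids) | {new_disagree_orcid}
    return len(disagree) > len(agree)


# Placeholder identifiers, not real ORCIDs.
print(needs_public_discussion(["orcid-A"], [], "orcid-B"))           # tie, no overturn
print(needs_public_discussion(["orcid-A"], ["orcid-B"], "orcid-C"))  # overturn
```

      Under this reading, a tie between the two columns does not trigger the forum discussion; only a strict majority of disagreeing contributors does.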

      Regarding image formats, JPG files are convenient due to their small size, but TIFF offers significant advantages, such as preserving metadata and maintaining the integrity of real data values. Proper signal adjustments may not always be applied by researchers, making TIFF crucial for accurate data analysis.

      Please see response to point 5.

    2. eLife Assessment

      The IBEX Knowledge-Base is a fundamental tool that will enhance scientific collaboration by providing a centralized, community-driven resource for immunofluorescence imaging and reagent validation. Its detailed use cases, open-source design, and transparent reporting offer exceptional evidence of its broad utility and impact in the life sciences. It is now up to the community to contribute to its growth. Overall, the resource sets a high standard as a blueprint for future community initiatives in reproducibility and standardization.

    3. Reviewer #1 (Public review):

      IBEX Knowledge Database

      Here, Yanid Z. and colleagues present the IBEX knowledge base, a community tool developed to centralize knowledge and help its adoption by more users. The authors have done a fantastic job, and there is careful consideration of the many aspects of data management and FAIR principles. The manuscript needs no further work, as it is very well written and has detailed descriptions for data contribution as well as describing the KB itself. Overall, it is a great initiative, especially the aim to inform about negative data and non-recommended reagents, which will positively affect the user community and scientific reproducibility.

      This initiative will serve as groundwork for including technical details of other multiplexed immunofluorescence methods (such as immunoSABER, 4i, etc). Including other methods would help the knowledge base itself and related methods to evolve and assist their communities in the future.

      Significant care has been taken to allow the report of negative data. While there might be limitations as to how this information is included, transparency and community usage will ensure the knowledge base offers a fair representation.

      There are two ways to contribute to the knowledge base. While the authors have contributed significantly to its creation, it will be the role of the maintainers to assist potential users and contributors. It is especially appreciated that a path to contribute is possible with no coding skills. I am keen to see how the KB evolves and helps disseminate the use of this and other great techniques.

    4. Reviewer #2 (Public review):

      Summary:

      The paper introduces the IBEX Knowledge-Base (KB), a shared online resource designed to help scientists working with immunofluorescence imaging. It acts as a central hub where researchers can find and share information about reagents, protocols, and imaging methods. The KB is not static like traditional publications; instead, it evolves as researchers contribute new findings and refinements. A key highlight is that it includes results of both successful and unsuccessful experiments, helping scientists avoid repeating failed experiments and saving time and resources. The platform is built on open-access tools ensuring that the information remains available to everyone. Overall, the KB aims to collaboratively accelerate research, improve reproducibility, and reduce wasted effort in imaging experiments.

      Strengths:

      (1) The IBEX KB is built entirely on open-source tools, ensuring accessibility and long-term sustainability. This approach aligns with FAIR data principles and ensures that the KB remains adaptable to future advancements.

      (2) The KB also follows strict data organization standards, ensuring that all information about reagents and protocols is clearly documented and easy to find with little ambiguity.

      (3) The KB allows scientists to report both positive and negative results, which reduces duplication of effort and speeds up the research process.

      (4) The KB is helpful for all researchers, but even more so for scientists in resource-limited settings. It provides guidance on finding affordable alternatives to expensive or discontinued reagents, making it easier for researchers with fewer resources to perform high-quality experiments.

      (5) The KB includes a community discussion forum where scientists can ask for advice, share troubleshooting tips, and collaborate with others facing similar challenges.

      (6) The authors discuss plans for active maintenance of the database and also to incentivize higher participation from the community.

      (7) Even those unfamiliar with GitHub may contribute with the help of the database maintenance team.

      Note: The authors have addressed my comments on the previous version of the article and the current version has been strengthened as a result.

    5. Reviewer #3 (Public review):

      Summary:

      The authors have developed an interactive knowledge base that crowdsources information on antibodies and reagents for immunofluorescence imaging.

      Strengths:

      The authors provide an extremely relevant and needed interface for collaboration through a well-built platform. All the links on their website work, and the information provided (reagents, datasets, videos and protocols) is very informative. The instructions for community researchers to contribute are clear, and the authors provide detailed instructions on how to proceed technically. Additionally, the interface has been refined to enable contributions regardless of the computational expertise of the researcher.

      Weaknesses:

      The Knowledge-Base relies on community contributions without mandatory, standardized metadata and validation criteria. Whilst this enhances the contributions, it limits the reliability of the database.

    1. eLife Assessment

      This manuscript by Kaur et al. identifies differential gene expression in distinct cell populations, specifically myeloid and lymphoid cells, following short-term exposure to e-cigarette aerosols with various flavors. Their findings are useful because they provide a single-cell sequencing data resource for assessing which genes and cellular pathways could be affected by e-cig aerosols and their components. However, the evidence is incomplete due to the limited number of biological replicates per condition, as well as the lack of in vivo validation.

    2. Reviewer #1 (Public review):

      Summary:

      The authors assess the impact of E-cigarette smoke exposure on mouse lungs using single-cell RNA sequencing. Air was used as control and several flavors (fruit, menthol, tobacco) were tested. Differentially expressed genes (DEGs) were identified for each group and compared against the air control. Changes in gene expression in either myeloid or lymphoid cells were identified for each flavor and the results varied by sex. The scRNAseq dataset will be of interest to the lung immunity and e-cig research communities, and some of the observed effects could be important. Unfortunately, the revision did not address the reviewers' main concerns about low replicate numbers and lack of validations. The study remains preliminary and no solid conclusions could be drawn about the effects of E-cig exposure as a whole or any flavor-specific phenotypes.

      Strengths:

      The study is the first to use scRNAseq to systematically analyze the impact of e-cigarettes on the lung. The dataset will be of broad interest.

      Weaknesses:

      This study had only N=1 biological replicate for the single-cell sequencing data per sex per group and some sex-dependent effects were observed. This could have been remedied by validating key observations from the study using traditional methods such as flow cytometry and qPCR, but the limited number of validation experiments did not support the conclusions of the scRNAseq analysis. An important control group (PG:VG) had extremely low cell numbers and therefore could not be used to derive meaningful conclusions. Statistical analysis is lacking in almost all figures. Overall, this is a preliminary study with some potentially interesting observations.

      (1) The only new validation experiment for this revision is the immunofluorescent staining of neutrophils in Figure 4. The images are very low resolution and low quality and it is not clear which cells are neutrophils. S100A8 (calprotectin) is highly abundant in neutrophils but not strictly neutrophil-specific. It's hard to distinguish positive cells from autofluorescence in both ly6g and S100a8 channels. No statistical analysis is presented for the quantified data from this experiment.

      (2) The relevance of Fig. 3A and B is unclear since these numbers only reflect the number of cells captured in the scRNAseq experiment and the biological meaning of this data is not explained. Flow cytometry quantification is presented as cell counts but the percentage of cells from the CD45+ gate should be shown. No statistical analysis is shown, and flow cytometry results do not support the conclusions of scRNAseq data.

    3. Reviewer #3 (Public review):

      This work aims to establish cell-type-specific changes in gene expression upon exposure to different flavors of commercial e-cigarette aerosols compared to control or vehicle. Kaur et al. conclude that immune cells are most affected, with the greatest dysregulation found in myeloid cells exposed to tobacco-flavored e-cigs and lymphoid cells exposed to fruit-flavored e-cigs. The up- and down-regulated genes are heavily associated with innate immune response. The authors suggest that a Ly6G-deficient subset of neutrophils is found to be increased in abundance for the treatment groups, while gene expression remains consistent, which could indicate impaired function. Increased expression of CD4+ and CD8+ T cells along with their associated markers for proliferation and cytotoxicity is thought to be a result of activation following this decline in neutrophil-mediated immune response.

      Strengths:

      Single-cell sequencing data can be very valuable in identifying potential health risks and clinical pathologies of lung conditions associated with e-cigarettes considering they are still relatively new.

      Not many studies have been performed on cell-type-specific differential gene expression following exposure to e-cig aerosols.

      The assays performed address several factors of e-cig exposure such as metal concentration in the liquid and condensate, coil composition, cotinine/nicotine levels in serum and the product itself, cell types affected, which genes are up- or down-regulated and what pathways they control.

      Considerations were made to ensure clinical relevance such as selecting mice whose ages corresponded with human adolescents so that data collected was relevant.

      The discussion addresses the limitations of this study.

      Weaknesses:

      The exposure period of 1 hour a day for 5 days is not representative of chronic use and this time point may be too short to see a full response in all cell types. There is no gold standard in the field.

      Most findings are based on scRNA-seq alone, so interpretations should be made with care as some conclusions are observational.

      This paper provides a good foundation for future follow-up studies that will examine the effects of e-cig exposure on innate immunity.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors assess the impact of E-cigarette smoke exposure on mouse lungs using single cell RNA sequencing. Air was used as control and several flavors (fruit, menthol, tobacco) were tested. Differentially expressed genes (DEGs) were identified for each group and compared against the air control. Changes in gene expression in either myeloid or lymphoid cells were identified for each flavor and the results varied by sex. The scRNAseq dataset will be of interest to the lung immunity and e-cig research communities and some of the observed effects could be important. Unfortunately, the revision did not address the reviewers' main concerns about low replicate numbers and lack of validations. The study remains preliminary, and no solid conclusions could be drawn about the effects of E-cig exposure as a whole or any flavor-specific phenotypes.

      Strengths:

      The study is the first to use scRNAseq to systematically analyze the impact of e-cigarettes on the lung. The dataset will be of broad interest.

      Weaknesses:

      scRNAseq studies may have low replicate numbers due to the high cost of studies but at least 2 or 3 biological replicates for each experimental group is required to ensure rigor of the interpretation. This study had only N=1 per sex per group and some sex-dependent effects were observed. This could have been remedied by validating key observations from the study using traditional methods such as flow cytometry and qPCR, but the limited number of validation experiments did not support the conclusions of the scRNA seq analysis. An important control group (PG:VG) had extremely low cell numbers and was basically not useful. Statistical analysis is lacking in almost all figures. Overall, this is a preliminary study with some potentially interesting observations, but no solid conclusions can be made from the data presented.

      The only new validation experiment is the immunofluorescent staining of neutrophils in Figure 4. The images are very low resolution and low quality and it is not clear which cells are neutrophils. S100A8 (calprotectin) is highly abundant in neutrophils but not strictly neutrophil-specific. It's hard to distinguish positive cells from autofluorescence in both Ly6g and S100a8 channels. No statistical analysis in the quantification.

      We thank the reviewer for identifying the strengths of this study and pointing out the gaps in knowledge. Overall, our purpose in presenting these data is to provide the scRNA seq results as a resource to a wider community. We have used techniques such as flow cytometry, multianalyte cytokine array and immunofluorescence to validate some of the results. We agree with the reviewer that we did not adequately convey the significance of our immunofluorescence findings in the previous revision. We have revised the manuscript and included the quantification for both Ly6G+ and S100A8+ cells in e-cig aerosol exposed and control lung tissues. Briefly, we identified a marked decrease in the staining for S100A8 (marker for neutrophil activation) in tobacco-flavored e-cig exposed mouse lungs as compared to controls. Considering the evidence from scRNA seq and flow cytometry of increased neutrophil percentages in the experimental group, together with the lowered immunofluorescence staining for active neutrophils, we speculate that exposure to e-cig (tobacco) aerosols may alter the neutrophil dynamics within the lungs. Also, co-immunofluorescence identified a more prominent co-localization of the two markers in control samples than in the treatment group, which points towards some changes in the innate immune milieu within the lungs upon exposure. Future work is required to validate these speculations.

      We have now discussed all the above-mentioned points in the Discussion section of the revised manuscript and toned down our conclusions regarding sex-dependent changes from scRNA seq data.

      It is unclear what the meaning of Fig. 3A and B is, since these numbers only reflect the number of cells captured in the scRNAseq experiment and are not biologically meaningful. Flow cytometry quantification is presented as cell counts, but the percentage of cells from the CD45+ gate should be shown. No statistical analysis is shown, and flow cytometry results do not support the conclusions of scRNAseq data.

      We thank the reviewer for this question. However, we would like to highlight that scRNA seq and flow cytometry may show similar trends but cannot be identical, as one relies on cell surface markers (protein) to identify cell types, while the other depends on transcriptomic signatures. In our data, for the myeloid cells (alveolar macrophages and neutrophils), the scRNA seq and flow cytometry data match in trend. However, the trends do not match for the lymphoid cells being studied (CD4 and CD8 T cells). Possible explanations for this finding include high gene dropout rates in scRNA seq, the different analytical resolution of the two techniques, and the pooling of samples in our single-cell workflow. We recognize these shortcomings in our analyses and state them clearly in the Discussion as limitations of our work. It is also important to note that cell frequencies identified in scRNA seq provide only broad indications that need to be further validated, which we tried to accomplish in our work to some degree. Our flow-based results clearly highlight the sex-specific variations in the immune cell percentages (something we could not have anticipated earlier). In future studies, we will include more replicates to tease out sex-based variations upon acute and chronic exposure to e-cig aerosols.

      We have now replotted the graphs in Fig 3A and B and plotted the flow quantification as the percentage of total CD45+ cells. The gating strategy for the flow plots is also included as Figure S6 in the revised manuscript.
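      For readers unfamiliar with the normalization requested here, the conversion from raw event counts to percentages of the parent CD45+ gate is simple arithmetic. The sketch below is a generic illustration only; the function name, population labels, and counts are hypothetical and are not taken from the manuscript.

```python
def percent_of_parent(counts, parent_total):
    """Express each population's event count as a percentage of the parent gate.

    counts: mapping of population name -> events inside that population's gate
    parent_total: events inside the parent gate (here, total CD45+ events)
    """
    if parent_total <= 0:
        raise ValueError("parent gate must contain at least one event")
    return {name: 100.0 * n / parent_total for name, n in counts.items()}


# Hypothetical counts for one sample, chosen only to illustrate the arithmetic.
sample_counts = {"neutrophils": 250, "CD4 T cells": 500}
print(percent_of_parent(sample_counts, parent_total=1000))
```

      Reporting percentages of the parent gate rather than raw counts makes samples with different numbers of acquired events comparable.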

      Reviewer #2 (Public review):

      This study provides some interesting observations on how different flavour e-cigarettes can affect lung immunology; however, there are numerous flaws, including a low replicate number and a lack of effective validation methods, meaning findings may not be repeated. This is a revised article but several weaknesses remain related to the analysis and interpretation of the data.

      Strengths:

      The strength of the study is the successful scRNA-seq experiment which gives some preliminary data that can be used to create new hypotheses in this area.

      Weaknesses:

      Although some text weaknesses have been addressed since resubmission, other specific weaknesses remain: The major weakness is the n-number and analysis methods. Two biological replicates per group is not an acceptable basis for any solid conclusions. The validation data were too limited (only cell % data) and did not always support the findings (e.g. figure 3D does not match 3B/4A). Other examples include:

      There aren't enough cells to justify analysis - only 300-1500 myeloid cells per group with not many of these being neutrophils or the apparent 'Ly6G- neutrophils'.

      We thank the reviewer for the comment, but we disagree with the reviewer regarding the justification of the analyses. All the flavored e-cig aerosol groups were compared with air controls to deduce the outcomes in the current study. We already acknowledge the low sample quality of the PGVG group and have only included the comparisons with PGVG at the reviewer’s request; these are open to interpretation by the reader.

      By that measure, each treatment group (except PGVG group) has over 1000 cells with 24777 genes being analyzed for each cell type, which by the standards of single cell is sufficient. We understand that this strategy should not be used for detection of rare cell populations, which was neither the purpose of this manuscript nor was attempted. We conduct comparisons of broader cell types and mention more samples need to be added in the Discussion section of the revised manuscript.

      As for the Ly6G- neutrophil category, we do not base our results on scRNA seq analyses alone but also perform co-immunofluorescence and multi-analyte analyses and use evidence from previous literature to support our outcome. To avoid overstating our results, we have revised the whole manuscript and toned down our conclusions regarding the presence of Ly6G- neutrophils. We do understand that more work is required in the future, but our work clearly shows a shift in neutrophil dynamics upon exposure, which should be reported, in our opinion.

      The dynamic range of RNA measurement using scRNAseq is known to be limited - how do we know whether genes are not expressed or just didn't hit detection? This links into the Ly6G negative neutrophil comments, but in general the lack of gene expression in this kind of data should be viewed with caution, especially with a low n number and few cells. The data in the entire paper is not strong enough to base any solid conclusion - it is not just the RNA-sequencing data.

      We acknowledge this to be a valid point and have revised the manuscript and toned down our conclusions. However, such limitations exist with any scRNA seq dataset and so must be interpreted accordingly by the readers. We do understand that due to the low cell counts and the limitations of scRNA seq we should not perform DESeq2 analyses for Ly6G+ versus Ly6G- neutrophil categories, which was never attempted in the first place. However, our results from co-immunofluorescence, the multianalyte assay and scRNA expression analyses in the myeloid cluster do point towards a shift in neutrophil activation, which needs to be further investigated. Furthermore, Ly6G deficiency has been linked to immature neutrophils in many previous studies and is not an unlikely outcome that needs to be treated with immense skepticism.

      We wish to make this dataset available as a resource to inform future research. We are aware of its limitations and have been transparent with regard to our experimental design, capture strategy, the quality of the obtained results, and possible caveats, so that it is open for discussion by the readers.

      There is no data supporting the presence of Ly6G negative neutrophils. In the flow cytometry only Ly6G+ cells are shown with no evidence of Ly6G negative neutrophils (assuming equal CD11b expression). There is no new data to support this claim since resubmission and the New figures 4C and D actually show there are no Ly6G negative cells - the cells that the authors deem Ly6G negative are actually positive - but the red overlay of S100A8 is so strong it blocks out the green signal - looking to the Ly6G single stains (green only) you can see that the reported S100A8+Ly6G- cells all have Ly6G (with different staining intensities).

      We thank the reviewer for this query and do understand the skepticism. We have now quantified the data to provide more clarity for interpretation. As we were using paraffin-embedded tissues, some autofluorescence is expected, which could explain some of the reviewer’s concerns. However, we expect that the inclusion of better-quality images and quantification should address some of the concerns raised by the reviewer.

      Eosinophils are heavily involved in lung macrophage biology, but are missing from the analysis - it is highly likely the RNA-sequencing picked out eosinophils as Ly6G- neutrophils rather than the 'digestion issues' the authors claim.

      We thank the reviewer for raising a valid concern. However, the Ly6G- cluster cannot be eosinophils in our case. The literature identifies SiglecF as an important biomarker of eosinophils, and it was absent in the Ly6G- cluster in our scRNA seq analyses, as shown in File S18 and Figure 6B of the revised manuscript. We have now provided a detailed explanation (Lines 476-488; 503-506) of the observed results pertaining to the eosinophil population in the revised manuscript to further address some of the concerns raised by this reviewer.

      After author comments, it appears the schematic in Figure 1A is misleading and there are not n=2/group/sex but actually only n=1/group/sex (as shown in Figure 6A). Meaning the n number is even lower than the previous assumption.

      We concur with the reviewer’s valid concern and so are willing to provide this data as a resource for a wider audience to assist future work. Pooling of samples has been practiced by many groups previously to save resources and expense. We did it for the very same reason. It may not be the preferred approach, but it still has merit considering the vast amount of cell-specific data generated using this strategy. To avoid overstating our results, we have maintained transparency in our reporting and acknowledge all the limitations of this study.

      We do not believe that the strength of scRNA seq lies in drawing conclusive results, but in teasing out possible targets and directions that need to be validated with more work. In that respect, our study does identify the target cell types and biological processes which could be of importance for future studies.

      Reviewer #3 (Public review):

      This work aims to establish cell-type specific changes in gene expression upon exposure to different flavors of commercial e-cigarette aerosols compared to control or vehicle. Kaur et al. conclude that immune cells are most affected, with the greatest dysregulation found in myeloid cells exposed to tobacco-flavored e-cigs and lymphoid cells exposed to fruit-flavored e-cigs. The up- and down-regulated genes are heavily associated with innate immune response. The authors suggest that a Ly6G-deficient subset of neutrophils is found to be increased in abundance for the treatment groups, while gene expression remains consistent, which could indicate impaired function. Increased expression of CD4+ and CD8+ T cells along with their associated markers for proliferation and cytotoxicity is thought to be a result of activation following this decline in neutrophil-mediated immune response.

      Strengths:

      Single cell sequencing data can be very valuable in identifying potential health risks and clinical pathologies of lung conditions associated with e-cigarettes considering they are still relatively new.

      Not many studies have been performed on cell-type specific differential gene expression following exposure to e-cig aerosols.

      The assays performed address several factors of e-cig exposure such as metal concentration in the liquid and condensate, coil composition, cotinine/nicotine levels in serum and the product itself, cell types affected, which genes are up- or down-regulated and what pathways they control.

      Considerations were made to ensure clinical relevance such as selecting mice whose ages corresponded with human adolescents so that data collected was relevant.

      Weaknesses:

      The exposure period of 1 hour a day for 5 days is not representative of chronic use and this time point may be too short to see a full response in all cell types. The experimental design is not well-supported based on the literature available for similar mouse models. Clinical relevance of this short exposure remains unclear.

      We thank the reviewer for this query. However, we would like to emphasize that chronic exposure was never the intention of this study. We wished to design a study for acute nose-only exposure, which is why the study duration was kept short. Shorter durations limit the stress and discomfort to the animal. In vivo nose-only exposure is still a developing approach, with multiple exposure regimens being used by different groups. To our knowledge, there is no widely accepted gold standard for e-cig aerosol exposure other than the CORESTA recommendations, which we followed. Also, we show in our study how the daily exposure to leached metals varies in a flavor-dependent manner, which indicates that the exposure regimen needs more attention in terms of equal dosing, particle distribution and composition, something we have started to address in our follow-up studies. We have included all the explanations in the revised manuscript (Lines 82-85, 425-435, 648-654).

      Several claims lack supporting evidence or use data that is not statistically significant. In particular, there were no statistical analyses to compare results across sex, so conclusions stating there is a sex bias for things like Ly6G+ neutrophil percentage by condition are observational.

      We agree with the reviewer’s comment and have taken it into consideration. We have now revamped the whole manuscript and toned down most of the sex-based conclusions stated in this work. That said, it is important to note that most work relying solely on scRNA-seq, as is the case for this study, is observational in nature and needs to be assessed with this in mind.

      Overall, the paper and its discussion are relatively surface-level and do not delve into the significance of the findings or how they fit into the bigger picture of the field. It is not clear whether this paper is intended to be used as a resource for other researchers or as an original research article.

      We have now reworked the Discussion and incorporated a more in-depth treatment of the results, providing our insights regarding the observations, the discrepancies and their possible explanations. We have also made it clear that this paper is intended to be used as a resource by other researchers (Lines 577-579).

      The manuscript has some validation of findings but not very comprehensive.

      We have now revamped the manuscript. We have included quantification for the immunofluorescence data along with a better representation of the GO analyses. We have worked on the Results and Discussion sections to make this a useful resource for the scientific community.

      This paper provides a strong foundation for follow-up experiments that take a closer look at the effects of e-cig exposure on innate immunity. There is still room to elaborate on the differential gene expression within and between various cell types.

      We thank the reviewer for pointing out the strength of this paper. We refrained from elaborating on the differential gene expression within and between various cell types because of the low sample number and sequencing depth in this study. However, the raw data will be provided with the final publication and will be freely accessible, so that the public can re-analyze the dataset as they deem fit.

      Comments on revisions:

      The authors have addressed major concerns with better validation of data and improved organization of the paper. However, we still have some concerns and suggestions pertaining to the statistical analyses and justifications for experimental design.

      We appreciate the nuance of this experimental design, and the authors have adequately commented on why they chose nose-only exposure over whole body exposure. However, the justification for the duration of the exposure, and the clinical relevance of a short exposure, have not been addressed in the revised manuscript.

      We thank the editor for this query. We have now addressed this query briefly in Lines 82-85, 425-435, 648-654 of the revised manuscript. We would like to add, however, that we intended to design a study for acute nose-only exposure for this project. Shorter durations limit the stress and discomfort to the animal, which is why a duration of 1 hour per day was chosen. In vivo nose-only exposure is still a developing approach, with multiple exposure regimens being used by different groups. Ours is one such study in that direction, intended only to identify cell-specific changes upon exposure. Considering our results in Figure 1B showing variations in the level of metals leached in each flavor per day, the appropriate exposure regimen for a controlled, reproducible experiment needs to be discussed. There could be room for improvement in our strategy, but this was the regimen we found most appropriate based on the literature and our prior knowledge in the field.

      The presentation of cell counts should be represented by a percentage/proportion rather than a raw number of cells. Without normalization to the total number of cells, comparisons cannot be made across groups/conditions. This comment applies to several figures.

      We thank the editor for this comment and have now made the requested change in the revised manuscript.
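
      The normalization requested above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis pipeline: the function name `cell_type_proportions` and all counts are hypothetical and were invented for this example. It shows why raw counts mislead when samples differ in the total number of cells recovered.

```python
def cell_type_proportions(counts):
    """Convert raw per-cell-type counts for one sample into fractions of that sample's total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no cells")
    return {cell_type: n / total for cell_type, n in counts.items()}


# Hypothetical counts, invented for illustration only (not data from the study).
# The exposed sample has twice as many cells of every type simply because
# twice as many cells were captured.
air = {"neutrophil": 50, "T cell": 150, "macrophage": 800}
exposed = {"neutrophil": 100, "T cell": 300, "macrophage": 1600}

air_prop = cell_type_proportions(air)
exposed_prop = cell_type_proportions(exposed)

for cell_type in air:
    print(f"{cell_type}: air={air_prop[cell_type]:.2%}, exposed={exposed_prop[cell_type]:.2%}")
```

      In this made-up case the raw neutrophil count doubles between conditions while the neutrophil fraction is unchanged, which is the kind of artifact that normalization to the total cell number removes.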

      We appreciate that the authors have taken the reviewers' advice to validate their findings. However, we have concerns regarding the immunofluorescent staining shown in Figure 4. If the red channel is showing a pan-neutrophil marker (S100A8) and the green channel is showing only a subset of neutrophils (LY6G+), then the green channel should have far less signal than the red channel. This expected pattern is not what is shown in the figure, with the Ly6G marker apparently showing more expression than S100A8. Additionally, the FACS data states that only 4-5% of cells are neutrophils, but the red channel co-localizes with far more than 4-5% of the DAPI stain, meaning this population is overrepresented, potentially due to background fluorescence (noise). In addition, some of the shapes in the staining pattern do not look like true neutrophils, although it is difficult to tell because there remains a lot of background staining. The authors need to verify that their S100A8 and Ly6G antibodies work and are specific to the populations they intend to target. It is possible that only the brightest spots are truly S100A8+ or Ly6G+.

      We thank the editor for this comment and acknowledge that we may have made broad generalizations in our previous interpretation of the data. We have now revisited the data and quantified the two fluorescence signals for better interpretation of our results. We have also reassessed our conclusions from this data and reworded the manuscript accordingly. Briefly, we believe that Ly6G deficiency could be an indication of the presence of immature neutrophils in the lungs. This is a common part of neutrophil maturation. An active neutrophil population has Ly6G and should also express S100A8, indicating a normal neutrophilic response against stressors. However, our results, despite some autofluorescence, which is common with lung tissues, show a marked decline in the S100A8+ cells in the lungs of tobacco-flavored e-cig aerosol exposed mice as compared to air controls. We also do not see prominent co-localization of the two markers in the exposed group, thus proving a shift in neutrophil dynamics which requires further investigation. We would also like to mention here that S100A8 is predominantly expressed in neutrophils, but is also expressed by monocytes and macrophages, which could explain the over-representation of these cells in our immunofluorescence results. We have now included this in the Discussion section (Lines 489-538) of the revised manuscript.

      Paraffin sections do not always yield the best immunostaining results and the images themselves are low magnification and low resolution.

      We agree with the editor that paraffin sections may not yield the best results. We have worked on the final figure to improve the quality of the displayed results and have zoomed in on parts of the merged image to show the differences in the co-localization patterns of the two markers in our treated and control groups for easier interpretation.

      Please change the scale bars to white so they are more visible in each channel.

      The merged image in Figure 6C now has a white scale bar.

      We appreciate that this is a preliminary test used as a resource for the community, but there is interesting biology regarding immune cells that warrants DEG analysis by the authors. This computational analysis can be easily added with no additional experiments required.

      We thank the editor for this comment and agree that interesting biology regarding immune cells could be explored by performing DEG analyses on individual immune populations. However, due to the small sample size, the low sequencing depth and the pooling of same-sex animals in each treatment group, we refrained from performing that analysis, fearing over-representation of our results. We will provide the link to the raw data with this publication; the data will be freely accessible to the public on the NIH GEO resource, allowing further analyses of this dataset at the judgement of any investigator who uses it as a resource.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (Minor) The pathway analyses in Fig. 6-8 have different fonts than what's used in all other figures.

      We have now made the requested change in the revised manuscript.

    1. eLife Assessment

      This important study identifies a new key factor in orchestrating the process of glial wrapping of axons in Drosophila wandering larvae. The evidence supporting the claims of the authors is convincing and the EM studies are of outstanding quality. After the revision, the authors have addressed most of the concerns and the manuscript has been significantly improved. Both reviewers have agreed on the significance of the work. The work will be of interest to neuroscientists working on glial cell biology.

    2. Reviewer #1 (Public review):

      Summary:

      A central function of glial cells is the ensheathment of axons. Wrapping of larger-diameter axons involves myelin-forming glial classes (such as oligodendrocytes), whereas smaller axons are covered by non-myelin forming glial processes (such as olfactory ensheathing glia). While we have some insights into the underlying molecular mechanisms orchestrating myelination, our understanding of the signaling pathways at work in non-myelinating glia remains limited. As non-myelinating glial ensheathment of axons is highly conserved in both vertebrates and invertebrates, the nervous system of Drosophila melanogaster, and in particular the larval peripheral nerves, have emerged as powerful model to elucidate the regulation of axon ensheathment by a class of glia called wrapping glia. This study seeks to specifically address the question, as to which molecular mechanisms contribute to the regulation of the extent of glial ensheathment focusing on the interaction of wrapping glia with axons.

      Strengths and Weaknesses:

      For this purpose, the study combines state-of-the-art genetic approaches with high-resolution imaging, including classic electron microscopy. The genetic methods involve RNAi mediated knockdown, acute Crispr-Cas9 knock-outs and genetic epistasis approaches to manipulate gene function with the help of cell-type specific drivers. The successful use of acute Crispr-Cas9 mediated knockout tools (which required the generation of new genetic reagents for this study) will be of general interest to the Drosophila community.

      The authors set out to identify new molecular determinants mediating the extent of axon wrapping in the peripheral nerves of third instar wandering Drosophila larvae. They could show that over-expressing a constitutive-active version of the Fibroblast growth factor receptor Heartless (Htl) causes an increase of wrapping glial branching, leading to the formation of swellings in nerves close to the cell body (named bulges). To identify new determinants involved in axon wrapping acting downstream of Htl, the authors next conducted an impressive large-scale genetic interaction screen (which has become rare, but remains a very powerful approach), and identified Uninflatable (Uif) in this way. Uif is a large single-pass transmembrane protein which contains a whole series of extracellular domains, including Epidermal growth factor-like domains. Linking this protein to glial branch formation is novel, as it has so far been mostly studied in the context of tracheal maturation and growth. Intriguingly, a knock-down or knock-out of uif reduces branch complexity and also suppresses htl over-expression defects. Importantly, uif over-expression causes the formation of excessive membrane stacks. Together these observations are in line with the notion that htl may act upstream of uif.

      Further epistasis experiments using this model implicated also the Notch signaling pathway as a crucial regulator of glial wrapping: reduction in Notch signaling reduces wrapping, whereas over-activation of the pathway increases axonal wrapping (but does not cause the formation of bulges). Importantly, defects caused by over-expression of uif can be suppressed by activated Notch signaling. Knock-down experiments in neurons suggest further that neither Delta nor Serrate act as neuronal ligands to activate Notch signaling in wrapping glia, whereas knock-down of Contactin, a GPI anchored Immunoglobulin domain containing protein led to reduced axon wrapping by glia, and thus could act as an activating ligand in this context.

      Based on these results the authors put forward a model proposing that Uif normally suppresses Notch signaling, and that activation of Notch by Contactin leads to suppression of Htl, to trigger the ensheathment of axons. While these are intriguing propositions, future experiments will need to conclusively address whether and how Uif could "stabilize" a specific membrane domain capable to interact with specific axons.

      Moreover, to obtain evidence for Uif suppression by Notch to inhibit "precocious" axon wrapping and for a "gradual increase" of Notch signaling that silences uif and htl, (1) reporters for N and Htl signaling in larvae, (2) monitoring of different stages at a time point when branch extension begins, and (3) a reagent enabling the visualization of Uif expression could be important next tools/approaches. Considering the qualitatively different phenotypes of reduced branching, compared to excessive membrane stacks close to cell bodies, it would perhaps be worthwhile to explore more deeply how membrane formation in wrapping glia is orchestrated at the subcellular level by Uif.

      However, the points raised above remain at present technically difficult to address because of the lack of appropriate genetic reagents. Also, more detailed electron microscopy analyses of early developmental stages and comparisons of effects on cell bodies compared to branches will be very labor-intensive, and indeed may represent a new study.

      In summary, in light of the importance of correct ensheathment of axons by glia for neuronal function, the proposed model for the interactions between Htl, Uif and N to control the correct extent of neuron and glial contacts will be of general interest to the glial biology community.

      Comments on revisions:

      The authors have addressed all my comments. However, the sgRNAs in the Star method table are still all for cleavage just before the transmembrane domain, while the Supplemental figure suggests different locations.

    3. Reviewer #2 (Public review):

      The FGF receptor Heartless has previously been implicated in Drosophila peripheral glial growth and axonal wrapping. Here, the authors performed a large-scale screen of over 2,600 RNAi lines to identify factors regulating the downstream signaling of this process. They identified the transmembrane protein Uninflatable (Uif) as essential for the formation of plasma membrane domains. Furthermore, they found that Notch, a regulatory target of Uif, is required for glial wrapping. Interestingly, additional evidence implies that Notch reciprocally regulates uif and htl, suggesting a feedback loop. Consequently, the authors propose that Uif functions as a 'switch' to regulate the balance between glial growth and axonal wrapping.

      Little is known about how glial cell properties are coordinated with axons, and the identification of Uif provides essential insight into this orchestration. The manuscript is well-written, and the experiments are generally well-controlled. The electron microscopy studies, in particular, are of outstanding quality and help mechanistically dissect the consequences of Uif and Notch signaling in the regulation of glial processes. Together, this important study provides convincing evidence of a new player coordinating the glial wrapping of axons.

      Comments on revisions:

      Overall, the authors have done an excellent job of responding to my substantive concerns in this significantly improved manuscript. In particular, the authors have provided important additional details about the design, prioritization, and outcomes of their screen, and relayed changes that strengthen and extend the impact of their study. I have revised my assessment accordingly, and I expect this study to be of high interest to a variety of researchers in the field.

    4. Author response:

      The following is the authors’ response to the current reviews.

      We would like to proceed with this paper as a Version of Record but we will correct the mistake that we made in the Key resources table. As the reviewer noted we had added the wrong guide RNA sequence here. We are super thankful to the reviewer and apologize for the mistake.


      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This important study identifies a new key factor in orchestrating the process of glial wrapping of axons in Drosophila wandering larvae. The evidence supporting the claims of the authors is convincing and the EM studies are of outstanding quality.

      We are thankful for this kind and very positive judgment.

      However, the quantification of the wrapping index, the role of Htl/Uif/Notch signaling in differentiation vs growth/wrapping, and the mechanism of how Uif "stabilizes" a specific membrane domain capable of interacting with specific axons might require further clarification or discussion.

      This is now addressed.

      Reviewer #1 (Public review):

      Summary:

      A central function of glial cells is the ensheathment of axons. Wrapping of larger-diameter axons involves myelin-forming glial classes (such as oligodendrocytes), whereas smaller axons are covered by non-myelin-forming glial processes (such as olfactory ensheathing glia). While we have some insights into the underlying molecular mechanisms orchestrating myelination, our understanding of the signaling pathways at work in non-myelinating glia remains limited. As non-myelinating glial ensheathment of axons is highly conserved in both vertebrates and invertebrates, the nervous system of Drosophila melanogaster, and in particular the larval peripheral nerves, have emerged as a powerful model to elucidate the regulation of axon ensheathment by a class of glia called wrapping glia. Using this model, this study seeks to specifically address the question, as to which molecular mechanisms contribute to the regulation of the extent of glial ensheathment focusing on the interaction of wrapping glia with axons. 

      Strengths and Weaknesses:

      For this purpose, the study combines state-of-the-art genetic approaches with high-resolution imaging, including classic electron microscopy. The genetic methods involve RNAi-mediated knockdown, acute Crispr-Cas9 knock-outs, and genetic epistasis approaches to manipulate gene function with the help of cell-type specific drivers. The successful use of acute Crispr-Cas9 mediated knockout tools (which required the generation of new genetic reagents for this study) will be of general interest to the Drosophila community. 

      The authors set out to identify new molecular determinants mediating the extent of axon wrapping in the peripheral nerves of third-instar wandering Drosophila larvae. They could show that over-expressing a constitutive-active version of the Fibroblast growth factor receptor Heartless (Htl) causes an increase in wrapping glial branching, leading to the formation of swellings in nerves close to the cell body (named bulges). To identify new determinants involved in axon wrapping acting downstream of Htl, the authors next conducted an impressive large-scale genetic interaction screen (which has become rare, but remains a very powerful approach), and identified Uninflatable (Uif) in this way. Uif is a large single-pass transmembrane protein that contains a whole series of extracellular domains, including Epidermal growth factor-like domains. Linking this protein to glial branch formation is novel, as it has so far been mostly studied in the context of tracheal maturation and growth. Intriguingly, a knock-down or knock-out of uif reduces branch complexity and also suppresses htl over-expression defects. Importantly, uif over-expression causes the formation of excessive membrane stacks. Together these observations are in line with the notion that htl may act upstream of uif.

      Further epistasis experiments using this model implicated also the Notch signaling pathway as a crucial regulator of glial wrapping: reduction in Notch signaling reduces wrapping, whereas over-activation of the pathway increases axonal wrapping (but does not cause the formation of bulges). Importantly, defects caused by the over-expression of uif can be suppressed by activated Notch signaling. Knock-down experiments in neurons suggest further that neither Delta nor Serrate act as neuronal ligands to activate Notch signaling in wrapping glia, whereas knock-down of Contactin, a GPI anchored Immunoglobulin domain-containing protein led to reduced axon wrapping by glia, and thus could act as an activating ligand in this context. 

      Based on these results the authors put forward a model proposing that Uif normally suppresses Notch signaling, and that activation of Notch by Contactin leads to suppression of Htl, to trigger the ensheathment of axons. While these are intriguing propositions, future experiments would need to conclusively address whether and how Uif could "stabilize" a specific membrane domain capable of interacting with specific axons.

      We absolutely agree with the reviewer that it would be fantastic to understand whether and how Uif could stabilize specific membrane domains that are capable of interacting with axons. To address this we need to be able to label such membrane domains and unfortunately we still cannot do so. We analyzed the distribution of PIP2/PIP3 but failed to detect any differences. Thus we still lack wrapping glial membrane markers that are able to label specific compartments.

      Moreover, to obtain evidence for Uif suppression by Notch to inhibit "precocious" axon wrapping and for a "gradual increase" of Notch signaling that silences uif and htl, (1) reporters for N and Htl signaling in larvae, (2) monitoring of different stages at a time point when branch extension begins, and (3) a reagent enabling to visualize Uif expression could be important next tools/approaches. Considering the qualitatively different phenotypes of reduced branching, compared to excessive membrane stacks close to cell bodies, it would perhaps be worthwhile to explore more deeply how membrane formation in wrapping glia is orchestrated at the subcellular level by Uif.

      In the revised version of the manuscript we have now included the use of Notch and RTK-signaling reporters.

      (1) reporters for N and Htl signaling in larvae,

      We had already employed the classic reporter generated by the Bray lab: Gbe-Su(H)-lacZ. This unfortunately failed to detect any activity in larval wrapping glia nuclei but was able to detect Notch activity in the adult wrapping glia (Figure S5C,F).

      As requested, we analyzed an RTK signaling reporter. The activity of sty-lacZ, which we had previously characterized in the lab (Sieglitz et al., 2013), increases by 22% when Notch is silenced. Given the normal distribution of the data points, this shows a trend which, however, does not reach significance. We have not included this in the paper, but would be happy to do so, if requested.

      Author response image 1.

       

      (2) monitoring of different stages at a time point when branch extension begins,

      The reviewer raises an important question; however, it is extremely difficult to tackle experimentally. It would require a detailed electron microscopic analysis of early larval stages, which cannot be done in a reasonable amount of time. We have, however, added information on wrapping glia growth summarizing recently published work from the lab (Kautzmann et al., 2025).

      (3) a reagent enabling to visualize Uif expression could be important next tools/approaches.

      The final comment of the reviewer also addresses an extremely relevant and important issue. We employed antibodies generated by the lab of R. Ward, but they did not allow detection of the protein in larval nerves. We also attempted to generate anti-Uif peptide antibodies but these antibodies unfortunately do not work in tissue. We are still trying to generate suitable reagents but for the current revision cannot offer any solution.

      Lastly, we agree with the reviewer that it would be worthwhile to explore how Uif controls membrane formation at the subcellular level. This, however, is a completely new project and will require the identification of the binding partners of Uif in wrapping glia to start working on a link between Uif and membrane extension. The reduced branching phenotype might well be a direct consequence of excessive membrane formation, as it likely blocks resources needed for efficient growth of glial processes.

      Finally, in light of the importance of correct ensheathment of axons by glia for neuronal function, this study will be of general interest to the glial biology community. 

      We are very grateful for this very positive comment.

      Reviewer #2 (Public review): 

      The FGF receptor Heartless has previously been implicated in Drosophila peripheral glial growth and axonal wrapping. Here, the authors perform a large-scale screen of over 2600 RNAi lines to find factors that control the downstream signaling in this process. They identify a transmembrane protein Uninflatable to be necessary for the formation of plasma membrane domains. They further find that a Uif regulatory target, Notch, is necessary for glial wrapping. Interestingly, additional evidence suggests Notch itself regulates uif and htl, suggesting a feedback system. Together, they propose that Uif functions as a "switch" to regulate the balance between glial growth and wrapping of axons.

      Little is known about how glial cell properties are coordinated with axons, and the identification of Uif is a promising link to shed light on this orchestration. The manuscript is well-written, and the experiments are generally well-controlled. The EM studies in particular are of outstanding quality and really help to mechanistically dissect the consequences of Uif and Notch signaling in the regulation of glial processes. Together, this valuable study provides convincing evidence of a new player coordinating the interactions controlling the glial wrapping of axons.

      Reviewer #1 (Recommendations for the authors): 

      (1) To be reproducible and understandable, it would be important to provide detailed information about crosses and genotypes, as reagents are currently listed individually and genotypes are provided in rather simplified versions. 

      We have added the requested information to the text.

      (2) Neurons are inherently resistant to RNAi-mediated knockdown and it thus may be necessary to introduce the over-expression of UAS-dcr2 when assessing neuronal requirements and to specifically exclude Delta or Serrate as ligands. 

      We agree with the reviewer and have repeated the knockdown experiments using UAS-dcr2 and obtained the same results. To use an RNAi independent approach we also employed sgRNA expression in the presence of Cas9. The neuron specific gene knockout also showed no glial wrapping phenotype. These results are now added to the manuscript.

      (3) Throughout the manuscript, the authors use the terms "growth" and "differentiation" referring to the extent of branch formation versus axon wrapping. However glial differentiation and growth could have different meanings (for instance, growth could implicate changes in cell size or numbers, while differentiation could refer to a change from an immature precursor-like state to a mature cell identity). It may thus be useful to replace these general terms with more specific ones. 

      This is a very good point. When we use the term “growth” we refer only to glial cell growth, and thus to the increase in cell mass. Proliferation is excluded, and this is now explicitly stated in the manuscript. The term “differentiation” is indeed difficult, and we have therefore replaced it with wording that either directly describes the morphology or refers to axon wrapping.

      (4) Page 4. "remake" fibers should be Remak fibers. 

      We have corrected this typo.

      (5) Page 5. "Heartless controls glial growth but does promote axonal wrapping", this sentence is not clear in its message because of the "but".

      We have corrected this sentence.

      (6) Generally, many gene names are used as abbreviations without introductions (e.g. Sos, Rl, Msk on page 7). These would require an introduction.

      All genetic elements are now introduced.

      (7) Page 8. When Cas9 is expressed ubiquitously ... It would be helpful to add how this is done (nsyb-Gal4, nrv2-Gal4, or another Gal4 driver are used to express UAS-Cas9, as the listed Gal4 drivers seem to be specific to neurons or glia?).

      This is now added. We used the following genotype for ubiquitous knockout using the four different uif specific sgRNAs (UAS-uif<sup>sgRNA X</sup>): [w; UAS-Cas9/ Df(2L)ED438; da-Gal4 /UAS-uif<sup>sgRNA X</sup>]. We used the following genotype for a knockout in wrapping glia: [+/+; UAS-Cas9/+; nrv2-Gal4,UAS-CD8::mCherry/UAS-uif<sup>sgRNA X</sup>].

      We had previously shown that nrv2-Gal4 is a wrapping glia specific driver in the larval PNS (Kottmeier et al., 2020).

      Moreover, the authors mention that "This indicates that a putatively secreted version of Uif is not functional". This conclusion would need to be explained in detail.

      First, because it requires quite some detective work to understand the panels in Figure 1 on which this statement is based; second, since the acutely induced double-stranded breaks in the DNA and subsequent repair may cause variable defects, it may indeed not be certain what changes have been induced in each cell; and third, considering that there is a putative cleavage site, would it not be expected that the protein is not functional when it is not cleaved and there is no secreted extracellular part (unless the cleavage site is not required)? The latter could probably only be addressed by rescue experiments with UAS transgenes with identified changes.

      We agree with the reviewer. The rescue experiments are unfortunately difficult, since even expression of a full length uif construct does not fully rescue the uif mutant phenotype (Loubéry et al., 2014). We therefore explained the conclusion taken from the different sgRNA knockout experiments better and also removed the statement that secreted Uif forms are non-functional.

      In the Star Method reagent table, it is not clear, why all 8 oligonucleotides are for "uif cleavage just before transmembrane domain" despite targeting different locations. 

      We are very sorry for this mistake and have now corrected it. Thank you very much for spotting this.

      (8) Page 13. However, we expressed activated Notch,... the word "when" seems to be missing, and it would be helpful to specify how this was done (over-expression of N[ICD]).

      We have now corrected this accordingly.

      (9) To strengthen the point about the similarity of the phenotypes caused by Htl pathway over-activation and Uif over-expression, it would be helpful to also show an electron micrograph of the former.

      We now added an extensive description of the phenotype caused by activated Heartless. This is shown as new Figure 2.

      (10) Figure 4C, the larval nerve seems to be younger, as many extracellular spaces between axons are detected.

      This perception is a misunderstanding and we are sorry for not explaining this better. The third instar larvae are all age matched. The particular specimen in Figure 4C shows some fixation artifacts that result in the loss of material. Importantly, however, membranes are not affected. Similar loss of material is also seen in Figure 6C. For further examples, please see the study on nerve anatomy by Kautzmann et al. (2025).

      (11) The model could be presented as a figure panel in the manuscript. To connect the recommendation section with the above public review, a step forward could be to adjust the model and the wording in the Result section and to move some of the less explored points and thoughts to the discussion.

      We are thankful for this advice and have moved an updated model figure to the end of the main text (now Figure 7).

      Reviewer #2 (Recommendations for the authors):

      (1) Screen and the interest in Uif: Out of the ~62 genes that came out of the RNAi screen, why did the authors prioritize and focus on Uif? What were the other genes that came out of the screen, and did any of those impinge on Notch signaling? 

      We have now described the results of the screen more thoroughly. We selected Uif because it was the only transmembrane/adhesion protein identified and, given the finding that Uif decorates apical membrane domains in epithelial cells, we hoped to identify a protein specific for a similar membrane domain in wrapping glia.

      Notch as well as its downstream transcription factors were not included in the initial screen and were only analyzed once we had seen the contribution of Notch. Interestingly, there is a single hit in our screen linked to Notch signaling: Gp150. Here, however, we have tested additional dsRNA-expressing lines and were not able to reproduce the phenotype. This information is added to the discussion.

      The authors performed a large-scale screen of 2600 RNAi lines, it seems more details about what came out of the screen and why the focus on Uif would benefit the manuscript. 

      See above comment.

      Relatedly, there should be a discussion of the limitations of the screen, and that it was really a screen looking to modify a gain-of-function phenotype from the activated Htl allele; it seems a screen of this design may lead to artifacts that may not reflect endogenous signaling.

      We have now added to the introduction a short paragraph on suppressor screens that employ gain-of-function alleles.

      “In Drosophila, such suppressor screens have been used successfully many times (Macagno et al., 2014; Rebay et al., 2000; Therrien et al., 2000). Possibly, such screens also uncover genes that are not directly linked to the signaling pathway under study but this can be tested in further experiments. Our screen led to the unexpected identification of the large transmembrane protein Uninflatable, which in epithelial cells localizes to the apical plasma membrane. Loss of uninflatable suppresses the phenotype caused by activated RTK signaling. In addition, we find that uif knockdown and uif knockout larvae show impaired glial growth while an excess of Uninflatable leads to the formation of ectopic wrapping membrane processes that, however, fail to interact with axons. uninflatable is also known to inhibit Notch.”

      (2) In general this study relies on RNAi knockdown, and is generally well controlled in using multiple RNAi lines giving the same phenotype, and also controlled for by tissue-specific gene knockout. However, there is little in the way of antibody staining to directly confirm the target of interest is lost/reduced, which would obviously strengthen the study. 

      Lacking the tools or ability to assess RNAi efficiency (qPCR, antibody staining), some conclusions need to be tempered. For example, in the experiments in Figure S6 regarding canonical Notch signaling, the authors do not find a phenotype by Delta or Serrate knockdown, but there are no experiments that show Delta or Serrate are lost. Thus, if the authors cannot directly test for RNAi efficiency, these conclusions should be tempered throughout the manuscript. 

      We agree with the reviewer and now provide information on the use of Dicer in our RNAi experiments and have conducted new sgRNA/Cas9 experiments. In addition, we tempered our wording, stating that Dl and/or Ser are still possible ligands.

      (3) More description is needed regarding how the authors are measuring and calculating the "wrapping index". In principle, the approach seems sound. However, are there cases where axons are "partially" wrapped of various magnitudes, and how are these cases treated in the analysis? Are there additional controls of previously characterized mutants to illustrate the dynamic range of the wrapping index in various conditions?

      This is now explained.

      Further, can the authors quantify the phenotypes in the axonal "bulges" in Figures 1, 3, and 5?

      This is a difficult question. Although we can easily quantify the number of bulges, we cannot quantify the severity of the phenotype, as this would require EM analysis. Sectioning nerves at a specific distance from the ventral nerve cord already requires very careful adjustments. Sectioning at the level of a bulge is considerably more difficult, and it is not possible to obtain the number of sections needed to quantify the bulge phenotype.

      The fact is that all wrapping glial cells develop swellings (bulges) at the position of the nucleus. As there are in general three wrapping glial cells per segmental nerve, the number of bulges is three.

      (4) It seems difficult to clearly untangle the functions of Htl/Uif/Notch in differentiation itself vs subsequent steps in growth/wrapping. For example, if the differentiation steps are not properly coordinated, couldn't this give rise to some observed differences in growth or wrapping at later stages? I'm not sure of any obvious experiments to pursue here, but at least a brief discussion of these issues in the manuscript would be of use.

      We now discuss this more carefully in the Discussion, considering whether the three genes function in differentiation itself or in a stepwise mode of growth and differentiation.

      When comparing the different loss-of-function phenotypes, they all appear the same, which would argue that all three genes act in a common process.

      However, when we look at gain-of-function phenotypes, Htl and Uif behave differently from Notch. This would favor two distinct processes.

      We have now added activity markers for RTK signaling to directly show that Notch silences RTK activity. Unfortunately we were not able to do a similar reciprocal experiment.

      Minor:

      (1) The Introduction is too long, and would benefit from revisions to make it shorter and more concise.

      We have shortened the introduction and hopefully made it more concise.

      (2) A schematic illustrating the model the authors propose about Htl, Uif, and Notch in glial differentiation, growth, and wrapping would benefit the clarity of this work. 

      We have updated the graphical abstract that we had previously added and included it as a figure in the main text.

      References

      Kautzmann, S., Rey, S., Krebs, A., and Klämbt, C. (2025). Cholinergic and glutamatergic axons differentially require glial support in the Drosophila PNS. Glia. 10.1002/glia.70011.

      Kottmeier, R., Bittern, J., Schoofs, A., Scheiwe, F., Matzat, T., Pankratz, M., and Klämbt, C. (2020). Wrapping glia regulates neuronal signaling speed and precision in the peripheral nervous system of Drosophila. Nature communications 11, 4491-4417. 10.1038/s41467-020-18291-1.

      Loubéry, S., Seum, C., Moraleda, A., Daeden, A., Fürthauer, M., and González-Gaitán, M. (2014). Uninflatable and Notch control the targeting of Sara endosomes during asymmetric division. Current biology : CB 24, 2142-2148. 10.1016/j.cub.2014.07.054.

      Macagno, J.P., Diaz Vera, J., Yu, Y., MacPherson, I., Sandilands, E., Palmer, R., Norman, J.C., Frame, M., and Vidal, M. (2014). FAK acts as a suppressor of RTK-MAP kinase signalling in Drosophila melanogaster epithelia and human cancer cells. PLoS Genet 10, e1004262. 10.1371/journal.pgen.1004262.

      Rebay, I., Chen, F., Hsiao, F., Kolodziej, P.A., Kuang, B.H., Laverty, T., Suh, C., Voas, M., Williams, A., and Rubin, G.M. (2000). A genetic screen for novel components of the Ras/Mitogen-activated protein kinase signaling pathway that interact with the yan gene of Drosophila identifies split ends, a new RNA recognition motif-containing protein. Genetics 154, 695-712. 10.1093/genetics/154.2.695.

      Sieglitz, F., Matzat, T., Yuva-Adyemir, Y., Neuert, H., Altenhein, B., and Klämbt, C. (2013). Antagonistic Feedback Loops Involving Rau and Sprouty in the Drosophila Eye Control Neuronal and Glial Differentiation. Science signaling 6, ra96. 10.1126/scisignal.2004651.

      Therrien, M., Morrison, D.K., Wong, A.M., and Rubin, G.M. (2000). A genetic screen for modifiers of a kinase suppressor of Ras-dependent rough eye phenotype in Drosophila. Genetics 156, 1231-1242.

    1. eLife Assessment

      This important study investigates why the 13-lined ground squirrel (13LGS) retina is unusually rich in cone photoreceptors, the cells responsible for color and daylight vision. The authors perform deep transcriptomic and epigenetic comparisons between the mouse and the 13LGS to provide convincing evidence that identifies mechanisms that drive rod vs cone-rich retina development. Overall, this key question is investigated using an impressive collection of new data, cross-species analysis, and subsequent in vivo experiments.

    2. Reviewer #2 (Public review):

      Summary:

      This paper aims to elucidate the gene regulatory network governing the development of cone photoreceptors, the light-sensing neurons responsible for high acuity and color vision in humans. The authors provide a comprehensive analysis through stage-matched comparisons of gene expression and chromatin accessibility using scRNA-seq and scATAC-seq from the cone-dominant 13-lined ground squirrel (13LGS) retina and the rod-dominant mouse retina. The abundance of cones in the 13LGS retina arises from a dominant trajectory from late retinal progenitor cells (RPCs) to photoreceptor precursors and then to cones, whereas only a small proportion of rods are generated from these precursors.

      Strengths:

      The paper presents intriguing insights into the gene regulatory network involved in 13LGS cone development. In particular, the authors highlight the expression of cone-promoting transcription factors such as Onecut2, Pou2f1, and Zic3 in late-stage neurogenic progenitors, which may be driven by 13LGS-specific cis-regulatory elements. The authors also characterize candidate cone-promoting genes Zic3 and Mef2C, which have been previously understudied. Overall, I found that the across-species analysis presented by this study is a useful resource for the field.

      Comments on Revision:

      The authors have addressed my questions, and the revised text now presents their findings more clearly.

    3. Reviewer #3 (Public review):

      Summary:

      The authors perform deep transcriptomic and epigenetic comparisons between mouse and 13-lined ground squirrel (13LGS) to identify mechanisms that drive rod vs cone-rich retina development. Through cross-species analysis, the authors find extended cone generation in 13LGS, gene expression within progenitor/photoreceptor precursor cells consistent with a lengthened cone window, and differential regulatory element usage. Two of the transcription factors, Mef2c and Zic3, were subsequently validated using OE and KO mouse lines to verify the role of these genes in regulating competence to generate cone photoreceptors.

      Strengths:

      Overall, this is an impactful manuscript with broad implications toward our understanding of retinal development, cell fate specification, and TF network dynamics across evolution and with the potential to influence our future ability to treat vision loss in human patients. The generation of this rich new dataset profiling the transcriptome and epigenome of the 13LGS is a tremendous addition to the field that assuredly will be useful for numerous other investigations and questions of a variety of interests. In this manuscript, the authors use this dataset and compare it to data they previously generated for mouse retinal development to identify 2 new regulators of cone generation and shed insights into their regulation and their integration into the network of regulatory elements within the 13LGS compared to mouse.

      The authors have done considerable work to address reviewer concerns from the first draft. The current version of the manuscript is strong and supports the claims.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary 

      In this manuscript, Weir et al. investigate why the 13-lined ground squirrel (13LGS) retina is unusually rich in cone photoreceptors, the cells responsible for color and daylight vision. Most mammals, including humans, have rod-dominant retinas, making the 13LGS retina both an intriguing evolutionary divergence and a valuable model for uncovering novel mechanisms of cone generation. The developmental programs underlying this adaptation were previously unknown. 

      Using an integrated approach that combines single-cell RNA sequencing (scRNAseq), scATACseq, and histology, the authors generate a comprehensive atlas of retinal neurogenesis in 13LGS. Notably, comparative analyses with mouse datasets reveal that in 13LGS, cones can arise from late-stage neurogenic progenitors, a striking contrast to mouse and primate retinas, where late progenitors typically generate rods and other late-born cell types but not cones. They further identify a shift in the timing (heterochrony) of expression of several transcription factors.

      Further, the authors show that these factors act through species-specific regulatory elements. Overall, functional experiments support a role for several of these candidates in cone production. 

      Strengths 

      This study stands out for its rigorous and multi-layered methodology. The combination of transcriptomic, epigenomic, and histological data yields a detailed and coherent view of cone development in 13LGS. Cross-species comparisons are thoughtfully executed, lending strong evolutionary context to the findings. The conclusions are, in general, well supported by the evidence, and the datasets generated represent a substantial resource for the field. The work will be of high value to both evolutionary neurobiology and regenerative medicine, particularly in the design of strategies to replace lost cone photoreceptors in human disease. 

      Weaknesses 

      (1) Overall, the conclusions are strongly supported by the data, but the paper would benefit from additional clarifications. In particular, some of the conclusions could be toned down slightly to reflect that the observed changes in candidate gene function, such as those for Zic3 by itself, are modest and may represent part of a more complex regulatory network.  

      We have revised the text to qualify these conclusions as suggested.

      “Zic3 promotes cone-specific gene expression and is necessary for generating the full complement of cone photoreceptors”

      “Pou2f1 overexpression upregulated an overlapping but distinct, and larger, set of cone-specific genes relative to Zic3, while also downregulating many of the same rod-specific genes, often to a greater extent (Fig. 3C).”

      “This resulted in a statistically significant ~20% reduction in the density of cone photoreceptors in the mutant retina (Fig. 3E,F), while the relative numbers of rods and horizontal cells remained unaffected (Fig. S4A-D).”

      “Our analysis suggests that gene regulatory networks controlling cone specification are highly redundant, with transcription factors acting in complex, redundant, and potentially synergistic combinations. This is further supported by our findings on the synergistic effects of combined overexpression of Zic3 and Pou2f1 increasing both the number of differentially expressed genes and their level of change in expression relative to the modest changes seen with overexpression of either gene alone (Fig. 3) and the relatively mild or undetectable phenotypes observed following loss of function of Zic3 and Mef2c (Fig. 3, Fig. S6), as well as other cone-promoting factors such as Onecut1 and Pou2f1[18,19].”

      (2) Additional explanations about the cell composition of the 13LGS retina are needed. The ratios between cone and rod are clearly detailed, but do those lead to changes in other cell types? 

      The 13LGS retina, like most cone-dominant retinas, shows relatively lower numbers of rod and cone photoreceptors (~20%) than do nocturnal species such as mice (~80%). The difference is made up by increased numbers of inner retinal neurons and Muller glia. While rigorous histological quantification of the abundance of inner retinal cell types has not yet been performed for 13LGS, we can estimate these values using our snATAC-Seq data.  These numbers are provided in Table ST1, and are now discussed in the text.  

      (3) Could the lack of a clear trajectory for rod differentiation be just an effect of low cell numbers for this population? 

      This is indeed likely to be the case. This is now stated explicitly in the text.

      “However, no clear trajectory for rod differentiation was detected, likely due to the very low number of rod cells detected prior to P17 (Fig. 2A).”

      (4) The immunohistochemistry and RNA hybridization experiments shown in Figure S2 would benefit from supporting controls to strengthen their interpretability. While it has to be recognized that performing immunostainings on non-conventional species is not a simple task, negative controls are necessary to establish the baseline background levels, especially in cases where there seems to be labeling around the cells. The text indicates that these experiments are both immunostainings and ISH, but the figure legend only says "immunohistochemistry". Clarifying these points would improve readers' confidence in the data. 

      The figure legend has been corrected, and negative controls for P24 have been added. The figure legend has been modified as follows:

      “Fluorescent in situ hybridization showing co-expression of (A) Pou2f1 and Otx2 or (B) Zic3, Rxrg, and Otx2 in P1, P5, P10, and P24 retinas. Insets show higher power images of highlighted areas. (C) Zic3, Rxrg, and Otx2 fluorescent in situ hybridization from P24 with matched (C’) negative controls. (D) Pou2f1 and Otx2 fluorescent in situ hybridization from P24 with matched (D’) negative controls. (E) Quantification of the fraction of Otx2-positive cells in the outer neuroblastic layer (P1, P5) and ONL (P10, P24) that also express Zic3. (F) Immunohistochemical analysis of Mef2c and Otx2 expression in P1, P5, P10, and P24 retinas. (G) Mef2c and Otx2 immunohistochemistry from P24 with matched (G’) negative controls. Negative controls for fluorescent in situ hybridization omit the probe and for immunohistochemistry omit primary antibodies. Scale bars, 10 µm (S2A-F), 50 µm (S2G) and 5 µm (inset). Cell counts in E were analyzed using one-way ANOVA analysis with Sidak multiple comparisons test and 95% confidence interval. ** = p <0.01, **** = p <0.0001, and ns = non-significant. N=3 independent experiments.”

      (5) Figure S3: The text claims that overexpression of Zic3 alone is sufficient to induce the cone-like photoreceptor precursor cells as well as horizontal cell-like precursors, but this is not clear in Figure S3A nor in any other figure. Similarly, the effects of Pou2f1 overexpression are different in Figure S3A and Figure S3B. In Figure S3B, the effects described (increased presence of cone-like and horizontal-like precursors) are very clear, whereas it is not in Figure S3A. How are these experiments different? 

      These UMAP data represent two independent experiments. Total numbers and relative fractions of each cell type are now included in Table ST5.

      In these experiments, cone-like precursors were identified by both cell type clustering and differential gene expression. Cells from all conditions were found in the cone-like precursor cluster. However, cells electroporated with a plasmid expressing GFP alone only showed GFP as a differentially expressed gene, identifying them most likely as GFP+ rods. In contrast, Zic3 overexpression resulted in increased expression of cone-specific genes and decreased expression of rod-specific genes in both cone-like precursors and rods relative to controls electroporated with GFP alone. Cell type proportions across independent overexpression single-cell experiments could be influenced by a number of factors, including electroporation efficiency and ex vivo growth conditions. 
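
      As an illustration of how per-condition cell totals and relative fractions of the kind reported in Table ST5 can be tabulated from cluster labels, the following is a minimal sketch. It is not the authors' pipeline: the function name, the condition names, and all cell counts are hypothetical placeholders invented for this example, not data from the study.

```python
from collections import Counter


def cell_type_fractions(labels):
    """Return {cell_type: (count, fraction)} for a list of per-cell labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cell_type: (n, n / total) for cell_type, n in counts.items()}


# Hypothetical per-cell cluster labels for two conditions.
# These numbers are invented for illustration only.
conditions = {
    "GFP_control": ["rod"] * 90 + ["cone_like_precursor"] * 10,
    "Zic3_OE": ["rod"] * 70 + ["cone_like_precursor"] * 30,
}

for condition, labels in conditions.items():
    for cell_type, (n, frac) in sorted(cell_type_fractions(labels).items()):
        print(f"{condition}\t{cell_type}\t{n}\t{frac:.2f}")
```

      Reporting counts alongside fractions makes it easier to see when a difference between independent experiments rests on few cells, which matters here because electroporation efficiency and ex vivo growth conditions can shift the totals.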

      (6) The analyses of Zic3 conditional mutants (Figure S4) reveal an increase in many cone, rod, and pan-photoreceptor genes with only a reduction in some cone genes. Thus, the overall conclusion that Zic3 is essential for cones while repressing rod genes doesn't seem to match this particular dataset. 

      We observe that loss of function of Zic3 in developing retinal progenitors leads to a reduction in the total number of cones (Fig. 4E,F). In Fig. S4, we investigate how gene expression is altered in both the remaining cones and in other retinal cell types. We only observed significant changes in mutant cones and Muller glia relative to controls. We observe a mixed phenotype in cones, with a subset of cone-specific genes downregulated (notably including Thrb) and a subset of others upregulated (including Opn1sw). We also find that genes expressed both in rods and cones, as well as rod-specific genes, are downregulated in cKO cones. Since rods are fragile cells that are located immediately adjacent to cones, some level of contamination of rod-specific genes is inevitable in single-cell analysis of dissociated cones (c.f. PMID: 31128945, 34788628), and this reduced level of rod contamination could result from altered adhesion between mutant rods and cones. In mutant Muller glia, in contrast, we see a broad decrease in expression of Muller glia-specific genes, which likely reflects the indirect effects of Zic3 loss of function in retinal progenitors, and an upregulation of both broadly photoreceptor-specific genes and a subset of rod-specific genes, which may also result from altered adhesion between Muller glia and rods. 

      This is consistent with the conclusions in the text, although we have both modified the text and included heatmaps showing downregulation of rod-specific genes in mutant cones, to clarify this finding.

      “In addition, we observe a broad decrease in expression of genes expressed at high levels in both cones and rods (Rpgrip1, Drd4) and rod-specific genes (Rho, Cnga1, Pde6b) in mutant cones (Fig. S4F). Since rods are fragile cells that are located immediately adjacent to cones, some level of contamination of rod-specific genes is inevitable in single-cell analysis of dissociated cones (c.f. PMID: 31128945, 34788628), and this reduced level of rod contamination could result from altered adhesion between mutant rods and cones. In contrast, increased expression of rod-specific genes (Rho, Nrl, Pde6g, Gngt1) and pan-photoreceptor genes (Crx, Stx3, Rcvrn) was observed in Müller glia (Fig. S4G), which may likewise result from altered adhesion between Muller glia and rods. Finally, several Müller glia-specific genes were downregulated, including Clu, Aqp4, and Notch pathway components such as Hes1 and Id3, with the exception of Hopx, which was upregulated (Fig. S4G). This likely reflects the indirect effects of Zic3 loss of function in retinal progenitors. These findings indicate that Zic3 is essential for the proper expression of photoreceptor genes in cones while also playing a role in regulating expression of Müller glia-specific genes.”

      (7) Throughout the text, the authors used the term "evolved". To substantiate this claim, it would be important to include sequence analyses or to rephrase to a more neutral term that does not imply evolutionary inference. 

      We have modified the text as requested to replace “evolved” and “evolutionarily conserved” where possible, with examples of revised text listed below:  

      “These results demonstrate that modifications to gene regulatory networks underlie the development of cone-dominant retina,...”

      “Our results demonstrate that heterochronic expansion of the expression of transcription factors that promote cone development is a key event in the development of the cone-dominant 13LGS retina.”

      “Conserved patterns of motif accessibility, identified using ChromVAR and the TRANSFAC2018 database, (Fig. S1F, Table ST1)...”

      “However, most of these elements mapped to sequences that were not shared between 13LGS and mouse, with intergenic enhancers exhibiting particularly low levels of conservation (Fig. 5B).”

      “We conclude that the development of the cone-dominant retina in 13LGS is driven by novel cis-regulatory elements…”

      “Based on our bioinformatic analysis, the cone-dominant 13LGS retina follows this paradigm, in which species-specific enhancer elements…”

      “Dot plots showing the enrichment of binding sites for Otx2 and Neurod1, TFs which are broadly expressed in both neurogenic RPC and photoreceptor precursors, which are enriched in conserved cis-regulatory elements in both species. (D) Bar plots showing the number of conserved and species-specific enhancers per TSS in four cone-promoting genes between 13LGS and mouse.”

      Reviewer #2 (Public review): 

      Summary: 

      This paper aims to elucidate the gene regulatory network governing the development of cone photoreceptors, the light-sensing neurons responsible for high acuity and color vision in humans. The authors provide a comprehensive analysis through stage-matched comparisons of gene expression and chromatin accessibility using scRNA-seq and scATAC-seq from the cone-dominant 13-lined ground squirrel (13LGS) retina and the rod-dominant mouse retina. The abundance of cones in the 13LGS retina arises from a dominant trajectory from late retinal progenitor cells (RPCs) to photoreceptor precursors and then to cones, whereas only a small proportion of rods are generated from these precursors. 

      Strengths: 

      The paper presents intriguing insights into the gene regulatory network involved in 13LGS cone development. In particular, the authors highlight the expression of cone-promoting transcription factors such as Onecut2, Pou2f1, and Zic3 in late-stage neurogenic progenitors, which may be driven by 13LGS-specific cis-regulatory elements. The authors also characterize candidate cone-promoting genes Zic3 and Mef2C, which have been previously understudied. Overall, I found that the across-species analysis presented by this study is a useful resource for the field. 

      Weaknesses: 

      The functional analysis on Zic3 and Mef2C in mice does not convincingly establish that these factors are sufficient or necessary to promote cone photoreceptor specification. Several analyses lack clarity or consistency, and figure labeling and interpretation need improvement. 

      We have modified the text and figures to more clearly describe the observed roles of Zic3 and Mef2c in cone photoreceptor development as detailed in our responses to reviewer recommendations.

      Reviewer #3 (Public review): 

      Summary: 

      The authors perform deep transcriptomic and epigenetic comparisons between mouse and 13-lined ground squirrel (13LGS) to identify mechanisms that drive rod vs cone-rich retina development. Through cross-species analysis, the authors find extended cone generation in 13LGS, gene expression within progenitor/photoreceptor precursor cells consistent with a lengthened cone window, and differential regulatory element usage. Two of the transcription factors, Mef2c and Zic3, were subsequently validated using OE and KO mouse lines to verify the role of these genes in regulating competence to generate cone photoreceptors. 

      Strengths: 

      Overall, this is an impactful manuscript with broad implications toward our understanding of retinal development, cell fate specification, and TF network dynamics across evolution and with the potential to influence our future ability to treat vision loss in human patients. The generation of this rich new dataset profiling the transcriptome and epigenome of the 13LGS is a tremendous addition to the field that assuredly will be useful for numerous other investigations and questions of a variety of interests. In this manuscript, the authors use this dataset and compare it to data they previously generated for mouse retinal development to identify 2 new regulators of cone generation and shed insights into their regulation and their integration into the network of regulatory elements within the 13LGS compared to mouse. 

      Weaknesses: 

      (1) The authors chose to omit several cell classes from analyses and visualizations that would have added to their interpretations. In particular, I worry that the omission of 13LGS rods, early RPCs, and early NG from Figures 2C, D, and F is notable and would have added to the understanding of gene expression dynamics. In other words, (a) are these genes of interest unique to late RPCs or maintained from early RPCs, and (b) are rod networks suppressed compared to the mouse? 

      We were unable to include 13LGS rods in our analysis due to the extremely low number of cells detected prior to P17. Relative expression levels of cone-promoting transcription factors in 13LGS early RPCs and early NG cells are shown in Fig. 2H. Particularly when compared to mice, we also observe elevated expression of cone-promoting genes in early-stage RPC and/or early NG cells. These include Zic3, Onecut2, Mef2c, and Pou2f1, as well as transcription factors that promote the differentiation of post-mitotic cone precursors, such as Thrb and Rxrg. Contrast this with genes that promote specification and differentiation of both rods and cones, such as Otx2 and Crx, which show similar or even slightly higher expression in mice. Genes such as Casz1, which act in late NG cells to promote rod specification, are indeed downregulated in 13LGS late NG cells relative to mice. We have modified the text to clarify these points, as shown below:

      “To further characterize species-specific patterns of gene expression and regulation during postnatal photoreceptor development, we analyzed differential gene expression, chromatin accessibility, and motif enrichment across late-stage primary and neurogenic progenitors, immature photoreceptor precursors, rods, and cones. Due to their very low number before time point P17, we were unable to include 13LGS rods in the analysis.”

      “In contrast, two broad patterns of differential expression of cone-promoting transcription factors were observed between mouse and 13LGS.”

      “First, transcription factors identified in this network that are known to be required for committed cone precursor differentiation, including Thrb, Rxrg, and Sall3 [25,26,45], consistently showed stronger expression in late-stage RPCs and early-stage primary and/or neurogenic RPCs of 13LGS compared to mice.”

      “Second, transcription factors in the network known to promote cone specification in early-stage mouse RPCs, such as Onecut2 and Pou2f1, exhibited enriched expression in early and late-stage primary and/or neurogenic RPCs of 13LGS, implying a heterochronic expansion of cone-promoting factors into later developmental stages.”

      “In contrast, genes such as Casz1, which act in late neurogenic RPCs to promote rod specification, are downregulated in 13LGS late neurogenic RPCs relative to mice.”

      (2) The authors claim that the majority of cones are generated by late RPCs and that this is driven primarily by the enriched enhancer network around cone-promoting genes. With the temporal scRNA/ATACseq data at their disposal, the authors should compare early vs late born cones and RPCs to determine whether the same enhancers and genes are hyperactivated in early RPCs as well as in the 13LGS. This analysis will answer the important question of whether the enhancers activated/evolved to promote all cones, or are only and specifically activated within late RPCs to drive cone genesis at the expense of rods. 

      This is an excellent question.  We have addressed this question by analyzing both expression of the cone-promoting genes identified in C2 and C3 in Figure 2C and accessibility of their associated enhancer sequences, which are shown in Figure 6B, in early and late-stage RPCs and cone precursors.  The results are shown in Author response image 1 below. We observe that cone-promoting genes consistently show higher expression in both late-stage RPCs and cones.  We do not observe any clear differences in the accessibility of the associated enhancer regions, as determined by snATAC-Seq.  However, since we have not performed CUT&RUN analysis in embryonic retina for H3K27Ac or any other marker of active enhancer elements, we cannot determine whether the total number of active enhancers differs between early and late-stage RPCs. We suspect, however, this is likely to be the case, given the differences in the expression levels of these genes.

      Author response image 1.

      Relative expression levels of cone-promoting genes and accessibility of enhancer elements associated with these genes in early- and late-stage RPCs and cone precursors.

      (3) The authors repeatedly use the term 'evolved' to describe the increased number of local enhancer elements of genes that increase in expression in 13LGS late RPCs and cones. Evolution can act at multiple levels on the genome and its regulation. The authors should consider analysis of sequence level changes between mouse, 13LGS, and other species to test whether the enhancer sequences claimed to be novel in the 13LGS are, in fact, newly evolved sequence/binding sites or if the binding sites are present in mouse but only used in late RPCs of the 13LGS. 

      Novel enhancer sequences here are defined as having divergent sequences rather than simply divergent activity. This point has been clarified in the text, with the following changes made:

      “However, most of these elements mapped to sequences that were not shared between 13LGS and mouse, with intergenic enhancers exhibiting particularly low levels of conservation (Fig. 5B).”

      “...demonstrated far greater motif enrichment in active regulatory elements in 13LGS than in mice, though few of these elements mapped to sequences that were shared between 13LGS and mouse (Fig. 5C,D, Table ST10).”

      (4) The authors state that 'Enhancer elements in 13LGS are predicted to be directly targeted by a considerably greater number of transcription factors than in mice'. This statement can easily be misread to suggest that all enhancers display this, when in fact, this is only the cone-promoting enhancers of late 13LGS RPCs. In a way, this is not surprising since these genes are largely less expressed in mouse vs 13LGS late RPCs, as shown in Figure 2. The manuscript is written to suggest this mechanism of enhancer number is specific to cone production in the 13LGS; it would help prove this point if the authors asked the opposite question and showed that mouse late RPCs do not have similar increased predicted binding of TFs near rod-promoting genes in C7-8.

      The Reviewer’s point is well taken, and we agree that this mechanism is unlikely to be specific to cone photoreceptors, since we are simply looking at genes that show higher expression in late-stage neurogenic RPCs in 13LGS. We have changed the relevant text to now state:

      “Enhancer elements associated with cone-specific genes in 13LGS are predicted to be directly targeted by a considerably greater number of transcription factors in late-stage neurogenic RPCs than in mice, as might be expected, given the higher expression levels of these genes.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) Minor: Clusters C1-C8 (Figure 2) are labeled as "C1-8" in the text but "G1-8" in the figure. 

      This has been done.

      (2) Minor: Showing other neurogenic factors (Olig2, Ascl1, Otx2) and late-stage specific factors (Lhx2, Sox8, Nfia/b) could be shown in Figure 2 to better support the text. 

      This has been done. These motifs are consistent in both species, but Figure 2F shows differential motifs. The reference to Figure 2F has been altered to include Table ST4, while Neurod1 motifs are shown in Fig. 2F.

      Reviewer #2 (Recommendations for the authors): 

      (1) Figure 2 

      2A-B: The exclusion of early-stage data from the species-integrated analysis is puzzling, as it could reveal significant differences between early-stage neurogenic progenitors in mice and late-stage progenitors in 13LGS that both give rise to cones. This analysis would also shed light on how cone-promoting transcription factors are suppressed in mouse early-stage progenitors, limiting the window for cone genesis.

      2C: The figure labels G1-8, while C1-8 are referenced in the text. 

      2F: Neurog2, Olig2, Ascl1, and Neurod1 are mentioned in the text but not labeled in the figure. 

      2A-B: There are indeed substantial differences between early-stage RPCs in 13LGS and late-stage RPCs in mice that are broadly linked to control of temporal patterning, which are mentioned in the text. For instance, early-stage RPCs in both animals express higher levels of Nr2f1/2, Meis1/2, and Foxp1/4, while late-stage RPCs express higher levels of Nfia/b/x, indicating that the core distinction between early- and late-stage RPCs is maintained. What most clearly differs in 13LGS is the sustained expression of a subset of cone-promoting transcription factors in late-stage RPCs that are normally restricted to early-stage RPCs in mice. However, as mentioned in response to Reviewer #3’s first point, we do observe some evidence for increased expression of cone-promoting transcription factors in early-stage RPCs and NG cells of 13LGS relative to mice, although this is much less dramatic than observed at later stages. We have modified the text to directly mention this point. G1-8 has been corrected to C1-8 in the figure, a reference to Table ST4 has been added in discussion of neurogenic bHLH factors, and Fig. 2F has been modified to label Neurod1.

      “First, transcription factors identified in this network that are known to be required for committed cone precursor differentiation, including Thrb, Rxrg, and Sall3 [25,26,45], consistently showed stronger expression in late-stage RPCs and early-stage primary and/or neurogenic RPCs of 13LGS compared to mice.”

      “Second, transcription factors in the network known to promote cone specification in early-stage mouse RPCs, such as Onecut2 and Pou2f1, exhibited enriched expression in early and late-stage primary and/or neurogenic RPCs of 13LGS, implying a heterochronic expansion of cone-promoting factors into later developmental stages.”

      (2) Figure 3 

      In 3F, the cone density in the WT retina is approximately 0.25 cones per micron, while in the Zic3 cKO retina, it is about 0.2 cones per micron. However, the WT control in Figure S6C also shows about 0.2 cones per micron, raising questions about whether there is a genuine decrease in cone number or if it results from quantification variability. Additionally, the proportion of cone cells in the Zic3 cKO scRNA-seq data shown in Figure S4E appears comparable to the WT control, which is inconsistent with the conclusion that Zic3 cKO leads to reduced cone production. Therefore, I found that the conclusion that Zic3 is necessary for cone development is not supported by the data.

      The cone density counts in the two mutant lines and accompanying littermate controls were collected by blinded counting by two different observers (R.A. for the Zic3 cKO and N.P. for the Mef2c cKO). We believe that the ~20% difference in the observed cone density in the two control samples likely represents investigator-dependent differences. These can exceed 20% between even highly skilled observers when quantifying dissociated cells (PMID: 35198419) and are likely to be even higher for immunohistochemistry samples.  Since both controls were done in parallel with littermate mutant samples, we therefore stand by our interpretation of these results.

      (3) Figures 4 and 5

      These figures are duplicates. In Figure 4, Mef2C overexpression in postnatal progenitors leads to increased numbers of neurogenic RPCs, suggesting it may promote cell proliferation rather than inhibit rod cell fate or promote cone cell fate. Electroporation of plasmids into P0 retina typically does not label cone cells, as cones are born prenatally in mice. Given the widespread GFP signal in Figure 4D, the authors should consider that the high background of GFP signal may have misled the quantification of the result.

      The figure duplication has been corrected. We respectfully disagree with the Reviewer’s statement that ex vivo electroporation performed at P0, as is the case here, does not label cones. We routinely observe small numbers of electroporated cones when performing this analysis. Cones at this age are located on the scleral face of the retina and are therefore in direct contact with the buffer solution containing the plasmid in question (c.f. PMID: 20729845, 31128945, 34788628, 40654906). Furthermore, the level of GFP expression that is used to gate electroporated cells for isolation using FACS is typically considerably less than that used to identify a GFP-positive cell using standard immunohistochemical techniques, making it difficult to directly compare the efficiency of cone electroporation between these approaches. We agree, however, that Mef2c overexpression seems to broadly delay the differentiation of rod photoreceptors, and have modified the text to include discussion of this point.

      “Although a few GFP-positive electroporated cells co-expressing the cone-specific marker Gnat2 were detected in control (likely due to the electroporation of cone precursors, which we have previously observed in P0 retinal explants (Clark et al., 2019; Leavey et al., 2025; Lyu et al., 2021; Onishi et al., 2010)), there was a significant increase in double-positive cells in the test condition, matching the novel cone-like precursor population found in the scRNA-Seq (Fig. 4E).”

      “Indeed, overexpression of Mef2c increased the number of both neurogenic RPCs and immature photoreceptor precursors, suggesting that rod differentiation was broadly delayed.”

      (4) Figure S2 

      The figure legend lacks information about panels A and B. It is unclear which panels represent immunohistochemistry and which represent RNA hybridization chain reaction. Overall, the staining results are difficult to interpret, as it appears that all examined RNAs/proteins are positively stained across the sections with varying background levels. Specificity is hard to assess. For instance, in Figure S2B, the background intensity of Zic3 staining varies inconsistently from P1 to P24. The number of Zic3 mRNA dots seems to peak at P5 and decrease at P10, which contradicts the scRNA-seq results showing peak expression in mature cones.

      The figure legend has been corrected. Negative controls are now included for both in situ hybridization (Fig. S2C’) and immunostaining (Fig. S2G) at P24, along with paired experimental data.  We have quantified the total fraction of Otx2+ cells that also contain Zic3 foci, and find that coexpression peaks at P5 and P10.  This is now included as Fig. S2E.

      The number of Zic3 foci is in fact higher at P5 than P10, with XX foci/Otx2+ cell at P5 vs. YY foci/Otx2+ cell at P10.

      “Fluorescent in situ hybridization showing co-expression of (A) Pou2f1 and Otx2 or (B) Zic3, Rxrg, and Otx2 in P1, P5, P10, and P24 retinas. Insets show higher power images of highlighted areas. (C) Zic3, Rxrg, and Otx2 fluorescent in situ hybridization from P24 with matched (C’) negative controls. (D) Pou2f1 and Otx2 fluorescent in situ hybridization from P24 with matched (D’) negative controls. (E) Quantification of the fraction of Otx2-positive cells in the outer neuroblastic layer (P1, P5) and ONL (P10, P24) that also express Zic3. (F) Immunohistochemical analysis of Mef2c and Otx2 expression in P1, P5, P10, and P24 retinas. (G) Mef2c and Otx2 immunohistochemistry from P24 with matched (G’) negative controls. Negative controls for fluorescent in situ hybridization omit the probe and for immunohistochemistry omit primary antibodies. Scale bars, 10 µm (S2A-F), 50 µm (S2G) and 5 µm (inset). Cell counts in E were analyzed using one-way ANOVA with Sidak multiple comparisons test and 95% confidence interval. ** = p <0.01, **** = p <0.0001, and ns = non-significant. N=3 independent experiments.”

      (5) Figure S3

      In S3A and S3B, the UMAPs of the empty vector-treated groups are distinctly different. The same goes for the Zic3+Pou2F1 UMAPs.

      In S3A, Zic3 overexpression alone does not appear to have any impact on cell fate. It is not evident that Zic3, even in combination with Pou2F1, has any significant impact on cone or other cell type production, as the proportions of the cones and cone precursors seem similar across different groups.

      In S3B, Zic3+Pou2F1 seems to increase HC-like precursors without increasing cone-like precursors or cones.

      Moreover, the cone-like precursors described do not seem to contribute to cone generation, as there is no increase in cones in the adult mouse retina; rather, these cells resemble rod-cone mosaic cells with expression of both rod- and cone-specific genes.

      As the Reviewer states, we observe some differences in the proportion of cell types in both control and experimental conditions between the two experiments. Notably, relatively more photoreceptors and correspondingly fewer progenitors, bipolar, and amacrine cells are observed in the samples shown in Fig. S3A relative to Fig. S3B.  However, these represent two independent experiments. Cell type proportions seen across independent ex vivo electroporation experiments such as these can be affected by a number of variables, including precise developmental age of the samples, electroporation efficiency, cell dissociation conditions, and ex vivo growth conditions.  Some differences are inevitable, which is why paired negative controls must always be done for results to be interpretable.

      In both experiments, we observe that overexpression of Zic3, Pou2f1, and most notably Zic3 and Pou2f1 in combination leads to an increase in the relative fraction of cone-like precursors. In the experiment shown in Fig. S3B, we also observe that Zic3 alone, Onecut1 alone, and Zic3 and Pou2f1 in combination promote generation of horizontal-like cells. All treatments likewise induce expression of different subsets of cone-enriched genes in the cone-like precursors, while also suppressing rod-specific genes in these same cells.

      Total numbers and relative fractions of each cell type are now included in Table ST5.

      (6) Figure S4

      The proportion of cone cells in the Zic3 cKO scRNA-seq data shown in Figure S4E appears comparable to the WT control, contradicting the conclusion that Zic3 cKO leads to reduced cone production. 

      Total numbers and relative fractions of each cell type are now included in Table ST6.

      (7) Figure S5

      In Figure S5A, Mef2C overexpression does not decrease expression of the rod gene Nrl. 

      This is correct, and is mentioned in the text.

      “No obvious reduction in the relative number of Nrl-positive cells was observed (Fig. S5A).”

      Reviewer #3 (Recommendations for the authors): 

      (1) The authors make several broad and definitive statements that have the potential to confuse readers. In the first sections of Results: 'retinal ganglion cells and amacrine cells were generated predominantly by early stage progenitors' but later say 'late-stage RPCs in 13LGS retina are competent to generate cone photoreceptors but not other early born cell types.' In the discussion, the authors themselves point out limitations of analyses without birthdating. These definitive statements should be qualified/amended. 

      Both single-cell RNA and ATAC-Seq analysis can be used to accurately profile cells that have recently exited mitosis and committed to a specific cell fate. When applied to data obtained from a developmental timecourse, as is the case here, this can in turn serve as a reasonable proxy for birthdating data. Nonetheless, we have modified the text to state that BrdU/EdU labeling is indeed the gold standard for drawing conclusions about cell birthdates, and should be used to confirm these findings in future studies.

      “The expected temporal patterns of neurogenesis were observed in both species: retinal ganglion cells and amacrine cells were generated predominantly in the early stage, whereas bipolar cells and Müller glia were produced in the late stage.”

      “Though BrdU/EdU labeling would be required to unambiguously demonstrate species-specific differences in birthdating, our findings strongly indicate that 13LGS exhibit a selective expansion of the temporal window of cone generation, extending into late stages of neurogenesis.”

      This sentence does not make a definitive statement about 13LGS RPC competence, and we have left it unaltered. 

      “These findings suggest that late-stage RPCs in 13LGS retina are competent to generate cone photoreceptors but not other early-born cell types…”

      (2) Figure 2C clusters are referred to as C1-8 in the text but G1-8 in the figure. This is confusing and should be fixed. 

      This has been corrected.

      (3) The authors refer to many genes that show differential expression in Figure 2F, but virtually none of these are labelled in the heatmap, making it hard to follow the narrative. 

      Figure 2F represents transcription factor binding motifs that are differentially active between mouse and 13LGS, not gene expression. We have modified the figure to include names of all differentially active motifs discussed in the text, and otherwise refer the reader to Table ST4, which includes a list of all differentially expressed genes.

    1. eLife Assessment

      This valuable retrospective analysis identified three independent components of glucose dynamics - "value," "variability," and "autocorrelation" - which may be used in predicting coronary plaque vulnerability. The study is solid and of interest to a wide range of investigators in the medical field who are interested in the role of glycemia on cardiometabolic health. The manuscript has been substantially strengthened by clarifying methods, improving transparency, and validating key findings, resulting in a coherent and persuasive case for autocorrelation as a meaningful third dimension of glucose dynamics despite remaining design-related limitations.

    2. Reviewer #2 (Public review):

      Summary:

      Sugimoto et al. explore the relationship between glucose dynamics (specifically value, variability, and autocorrelation) and coronary plaque vulnerability in patients with varying glucose tolerance levels. The study identifies three independent predictive factors for %NC and emphasizes the use of continuous glucose monitoring (CGM)-derived indices for coronary artery disease (CAD) risk assessment. By employing robust statistical methods and validating findings across datasets from Japan, America, and China, the authors highlight the limitations of conventional markers while proposing CGM as a novel approach for risk prediction. The study has the potential to reshape CAD risk assessment by emphasizing CGM-derived indices, aligning well with personalized medicine trends.

      Further, the revised version includes expanded biological interpretation, improved statistical justification, and a new web-based calculator for clinical translation. Together, these updates make the study an important contribution to precision risk assessment in diabetes and cardiovascular research.

      Strengths:

      The introduction of autocorrelation as a predictive factor for plaque vulnerability adds a novel dimension to glucose dynamic analysis.

      Inclusion of datasets from diverse regions enhances generalizability.

      The use of a well-characterized cohort with controlled cholesterol and blood pressure levels strengthens the findings.

      The focus on CGM-derived indices aligns with personalized medicine trends, showcasing potential for CAD risk stratification.

      The benchmarking of CGM-derived measures against established CAD risk models (e.g., Framingham Risk Score) enhances interpretability and significance.

      The addition of a web-based computational tool makes the proposed indices accessible for potential clinical and research use.

      Weaknesses:

      The biological mechanism linking glucose autocorrelation to plaque vulnerability, although plausibly associated with insulin clearance pathways, remains largely theoretical.

      The primary cohort size is still modest, and while supported by power analysis and external datasets, broader prospective validation will be important.

      Strict participant selection criteria as employed by the study may reduce applicability to broader populations.

      CGM-derived indices like AC_Var and ADRR may be too complex for routine clinical use without simplified models or guidelines.

      Comments on revised version:

      The authors have thoroughly addressed previous concerns and produced a much stronger manuscript. The study now provides a coherent, validated, and well-reasoned argument for including autocorrelation as a third major dimension of glucose dynamics. It offers both conceptual novelty and translational potential and will likely stimulate further research on temporal glucose metrics in metabolic and cardiovascular risk assessment.

    3. Reviewer #3 (Public review):

      Summary:

      This is a retrospective analysis of 53 individuals over 26 features (12 clinical phenotypes, 12 CGM features, and 2 autocorrelation features) to examine which features were most informative in predicting percent necrotic core (%NC) as a parameter of coronary plaque vulnerability. Multiple regression analysis demonstrated a better ability to predict %NC from 3 selected CGM-derived features than from 3 selected clinical phenotypes. LASSO regularization and partial least squares (PLS) with VIP scores were used to identify the 4 CGM features that contribute most to the prediction of %NC. Using factor analysis, the authors identify 3 components that have CGM-related features: value (relating to the value of blood glucose), variability (relating to glucose variability), and autocorrelation (composed of the two autocorrelation features). These three groupings appeared in the 3 validation cohorts and when performing hierarchical clustering. To demonstrate how these three features change, a simulation was created to allow the user to examine these features under different conditions.

      Summary of Revision 1. This is a Valuable study supported by Solid evidence. The revisions meaningfully strengthen the manuscript by clarifying methods, improving transparency, and refining presentation. The work provides useful conceptual and methodological advances for understanding CGM-derived glucose dynamics and their possible relationship to cardiovascular pathology.

      Strengths:

      The authors have provided a much clearer exposition of how each glycemic component was defined and validated across cohorts. The revised manuscript now includes explicit pairwise correlations, clarified p- and q-value reporting, and better visualization of key associations between CGM indices and %NC. The justification for LASSO and PLS use is now well explained, and additional details on cohort timing relative to PCI, validation dataset structure, and statistical robustness (e.g., VIP stability with covariates) address prior concerns. The inclusion of precise factor definitions and clearer graphics notably improves interpretability.

      Limitations:

      Some limitations remain inherent to the study design, including the modest primary sample size, reliance on retrospective data, and differences between validation datasets in outcome ascertainment. However, these are now acknowledged more openly.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      We appreciate the reviewer for the critical review of the manuscript and the valuable comments. We have carefully considered the reviewer’s comments and have revised our manuscript accordingly.

      The reviewer’s comments in this letter are in Bold and Italics.

      Summary:

      This study identified three independent components of glucose dynamics ("value," "variability," and "autocorrelation") and reported important findings indicating that they play an important role in predicting coronary plaque vulnerability. Although the generalizability of the results needs further investigation due to the limited sample size and validation cohort limitations, this study makes several notable contributions: validation of autocorrelation as a new clinical indicator, theoretical support through mathematical modeling, and development of a web application for practical implementation. These contributions are likely to attract broad interest from researchers in both diabetology and cardiology and may suggest the potential for a new approach to glucose monitoring that goes beyond conventional glycemic control indicators in clinical practice.

      Strengths:

      The most notable strength of this study is the identification of three independent elements in glycemic dynamics: value, variability, and autocorrelation. In particular, the metric of autocorrelation, which has not been captured by conventional glycemic control indices, may bring a new perspective for understanding glycemic dynamics. In terms of methodological aspects, the study uses an analytical approach combining various statistical methods such as factor analysis, LASSO, and PLS regression, and enhances the reliability of results through theoretical validation using mathematical models and validation in other cohorts. In addition, the practical aspect of the research results, such as the development of a Web application, is also an important contribution to clinical implementation.

      We appreciate reviewer #1 for the positive assessment and for the valuable and constructive comments on our manuscript.

      Weaknesses:

      The most significant weakness of this study is the relatively small sample size of 53 study subjects. This sample size limitation leads to a lack of statistical power, especially in subgroup analyses, and to limitations in the assessment of rare events. 

      We appreciate the reviewer’s concern regarding the sample size. We acknowledge that a larger sample size would increase statistical power, especially for subgroup analyses and the assessment of rare events.

      We would like to clarify several points regarding the statistical power and validation of our findings. Our sample size determination followed established methodological frameworks, including the guidelines outlined by Muyembe Asenahabi, Bostely, and Peters Anselemo Ikoha. “Scientific research sample size determination.” (2023). These guidelines balance the risks of inadequate sample size with the challenges of unnecessarily large samples. For our primary analysis examining the correlation between CGM-derived measures and %NC, power calculations (a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4) indicated that a minimum of 47 participants was required. Our sample size of 53 exceeded this threshold and allowed us to detect statistically significant correlations, as described in the Methods section. Moreover, to provide transparency about the precision of our estimates, we have included confidence intervals for all coefficients. 
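
      The power calculation quoted above can be roughly reproduced with the standard Fisher z approximation for detecting a non-zero Pearson correlation. The response does not state which formula or software was used, so the method below, the two-sided test, and the function name `required_n_for_correlation` are assumptions made for illustration only.

```python
import math
from statistics import NormalDist


def required_n_for_correlation(r, alpha=0.05, power=0.8):
    """Approximate sample size needed to detect a correlation of size r.

    Assumption: two-sided test of H0: rho = 0 using the Fisher z
    transformation, n = ((z_alpha/2 + z_power) / atanh(r))^2 + 3.
    This is a common textbook approximation, not necessarily the
    procedure used in the study.
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for type I error
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    fisher_z = math.atanh(abs(r))                  # Fisher z of the expected correlation
    return math.ceil(((z_alpha + z_power) / fisher_z) ** 2 + 3)


if __name__ == "__main__":
    # Parameters quoted in the response: alpha = 0.05, power = 0.8, r = 0.4
    print(required_n_for_correlation(0.4))
```

      Under these assumptions the approximation agrees with the minimum of 47 participants stated in the response. Smaller expected correlations raise the required sample size quickly, which is worth keeping in mind when reading the subgroup analyses.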

      Furthermore, our sample size aligns with previous studies investigating the associations between glucose profiles and clinical parameters, including Torimoto, Keiichi, et al. “Relationship between fluctuations in glucose levels measured by continuous glucose monitoring and vascular endothelial dysfunction in type 2 diabetes mellitus.” Cardiovascular Diabetology 12 (2013): 1-7. (n=57), Hall, Heather, et al. “Glucotypes reveal new patterns of glucose dysregulation.” PLoS biology 16.7 (2018): e2005143. (n=57), and Metwally, Ahmed A., et al. “Prediction of metabolic subphenotypes of type 2 diabetes via continuous glucose monitoring and machine learning.” Nature Biomedical Engineering (2024): 1-18. (n=32).

      Furthermore, the primary objective of our study was not to assess rare events, but rather to demonstrate that glucose dynamics can be decomposed into three main factors - mean, variance and autocorrelation - whereas traditional measures have primarily captured mean and variance without adequately reflecting autocorrelation. We believe that our current sample size effectively addresses this objective. 
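
      To illustrate why autocorrelation carries information that the mean and variance do not, the sketch below computes all three summaries for two hypothetical glucose traces containing the same values in a different order. The traces, the lag choice, and the function names are illustrative assumptions; the study's own indices (for example AC_Var) may be defined differently.

```python
from statistics import fmean, pvariance


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag (biased estimator)."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


def summarize_glucose(series, max_lag=3):
    """Return the mean, variance, and first few autocorrelations of a series."""
    return {
        "mean": fmean(series),
        "variance": pvariance(series),
        "autocorrelation": [autocorrelation(series, k) for k in range(1, max_lag + 1)],
    }


if __name__ == "__main__":
    # Hypothetical readings (mg/dL): identical values, different ordering.
    smooth = [100, 120, 140, 160, 140, 120, 100, 80]
    jagged = [100, 160, 80, 140, 120, 100, 140, 120]
    print(summarize_glucose(smooth))
    print(summarize_glucose(jagged))
```

      Because the two traces are permutations of each other, their mean and variance are identical, while the lag-1 autocorrelation is positive for the smooth trace and negative for the jagged one. This is the sense in which autocorrelation can act as a third, separate dimension of glucose dynamics.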

      Regarding the classification of glucose dynamics components, we have conducted additional validation across diverse populations including 64 Japanese, 53 American, and 100 Chinese individuals. These validation efforts have consistently supported our identification of three independent glucose dynamics components.

      However, we acknowledge the importance of further validation on a larger scale. To address this, we conducted a large follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      To address the sample size considerations, we have added the following sentences in the Discussion section (lines 409-414): 

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.

      We appreciate the reviewer’s feedback and believe that these clarifications improve the manuscript.

      In terms of validation, several challenges exist, including geographical and ethnic biases in the validation cohorts, lack of long-term follow-up data, and insufficient validation across different clinical settings. In terms of data representativeness, limiting factors include the inclusion of only subjects with well-controlled serum cholesterol and blood pressure and the use of only short-term measurement data.

      We appreciate the reviewer’s comment regarding the challenges associated with validation. In terms of geographic and ethnic diversity, our study includes validation datasets from diverse populations, including 64 Japanese, 53 American and 100 Chinese individuals. These datasets include a wide range of metabolic states, from healthy individuals to those with diabetes, ensuring validation across different clinical conditions. In addition, we recognize the limited availability of publicly available datasets with sufficient sample sizes for factor decomposition that include both healthy individuals and those with type 2 diabetes (Zhao, Qinpei, et al. “Chinese diabetes datasets for data-driven machine learning.” Scientific Data 10.1 (2023): 35.). The main publicly available datasets with relevant clinical characteristics have already been analyzed in this study using unbiased approaches.

      However, we fully agree with the reviewer that expanding the geographic and ethnic scope, including long-term follow-up data, and validation in different clinical settings would further strengthen the robustness and generalizability of our findings. To address this, we conducted a large follow-up study of over 8,000 individuals with two years of follow-up (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      Regarding the validation considerations, we have added the following sentences to the Discussion section (lines 409-414, 354-361): 

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.

      Although our LASSO and factor analysis indicated that CGM-derived measures were strong predictors of %NC, this does not mean that other clinical parameters, such as lipids and blood pressure, are irrelevant in T2DM complications. Our study specifically focused on characterizing glucose dynamics, and we analyzed individuals with well-controlled serum cholesterol and blood pressure to reduce confounding effects. While we anticipate that inclusion of a more diverse population would not alter our primary findings regarding glucose dynamics, it is likely that a broader data set would reveal additional predictive contributions from lipid and blood pressure parameters.

      In terms of elucidation of physical mechanisms, the study is not sufficient to elucidate the mechanisms linking autocorrelation and clinical outcomes or to verify them at the cellular or molecular level.

      We appreciate the reviewer’s point regarding the need for further elucidation of the physical mechanisms linking glucose autocorrelation to clinical outcomes. We fully agree with the reviewer that the detailed molecular and cellular mechanisms underlying this relationship are not yet fully understood, as noted in our Discussion section.

      However, we would like to emphasize the theoretical basis that supports the clinical relevance of autocorrelation. Our results show that glucose profiles with identical mean and variability can exhibit different autocorrelation patterns, highlighting that conventional measures such as mean or variance alone may not fully capture inter-individual metabolic differences. Incorporating autocorrelation analysis provides a more comprehensive characterization of metabolic states. Consequently, incorporating autocorrelation measures alongside traditional diabetes diagnostic criteria - such as fasting glucose, HbA1c and PG120, which primarily reflect only the “mean” component - can improve predictive accuracy for various clinical outcomes. While further research at the cellular and molecular level is needed to fully validate these findings, it is important to note that the primary goal of this study was to analyze the characteristics of glucose dynamics and gain new insights into metabolism, rather than to perform molecular biology experiments.
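
      The claim that two glucose profiles can share the same mean and variability yet differ in autocorrelation can be illustrated with a minimal sketch. This is not the authors' simulation: the two traces below are hypothetical, and lag-1 autocorrelation stands in for the autocorrelation-based indices used in the study. Both traces contain exactly the same readings in a different order, so their mean and variance are identical by construction.

```python
from statistics import fmean, pvariance


def lag1_autocorrelation(series):
    """Lag-1 sample autocorrelation of a sequence of glucose readings."""
    mu = fmean(series)
    num = sum((a - mu) * (b - mu) for a, b in zip(series, series[1:]))
    den = sum((x - mu) ** 2 for x in series)
    return num / den


# Hypothetical glucose traces (mg/dL). Both hold the same multiset of values,
# so mean and variance match exactly; only the temporal ordering differs.
smooth = [90, 100, 110, 120, 130, 120, 110, 100] * 6
jagged = [90, 130, 100, 120, 110, 110, 120, 100] * 6

for name, trace in (("smooth", smooth), ("jagged", jagged)):
    print(name, fmean(trace), pvariance(trace),
          round(lag1_autocorrelation(trace), 3))
```

      The smooth trace yields a clearly positive lag-1 autocorrelation and the jagged trace a negative one, even though summary statistics based only on mean and variance cannot distinguish them.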

      Furthermore, our previous research has shown that glucose autocorrelation reflects changes in insulin clearance (Sugimoto, Hikaru, et al. “Improved detection of decreased glucose handling capacities via continuous glucose monitoring-derived indices.” Communications Medicine 5.1 (2025): 103.). The relationship between insulin clearance and cardiovascular disease has been well documented (Randrianarisoa, Elko, et al. “Reduced insulin clearance is linked to subclinical atherosclerosis in individuals at risk for type 2 diabetes mellitus.” Scientific reports 10.1 (2020): 22453.), and the mechanisms described in this prior work may potentially explain the association between glucose autocorrelation and clinical outcomes observed in the present study.

      Rather than a limitation, we view these currently unexplored associations as an opportunity for further research. The identification of autocorrelation as a key glycemic feature introduces a new dimension to metabolic regulation that could serve as the basis for future investigations exploring the molecular mechanisms underlying these patterns.

      While we agree that further research at the cellular and molecular level is needed to fully validate these findings, we believe that our study provides a theoretical framework to support the clinical utility of autocorrelation analysis in glucose monitoring, and that this could serve as the basis for future investigations exploring the molecular mechanisms underlying these autocorrelation patterns, which adds to the broad interest of this study. Regarding the physical mechanisms linking autocorrelation and clinical outcomes, we have added the following sentences in the Discussion section (lines 331-339, 341-352): 

      This study also provided evidence that autocorrelation can vary independently from the mean and variance components using simulated data. In addition, simulated glucose dynamics indicated that even individuals with high AC_Var did not necessarily have high maximum and minimum blood glucose levels. This study also indicated that these three components qualitatively corresponded to the four distinct glucose patterns observed after glucose administration, which were identified in a previous study (Hulman et al., 2018). Thus, the inclusion of autocorrelation in addition to mean and variance may improve the characterization of inter-individual differences in glucose regulation and improve the predictive accuracy of various clinical outcomes.

      Despite increasing evidence linking glycemic variability to oxidative stress and endothelial dysfunction in T2DM complications (Ceriello et al., 2008; Monnier et al., 2008), the biological mechanisms underlying the independent predictive value of autocorrelation remain to be elucidated. Our previous work has shown that glucose autocorrelation is influenced by insulin clearance (Sugimoto et al., 2025), a process known to be associated with cardiovascular disease risk (Randrianarisoa et al., 2020). Therefore, the molecular pathways linking glucose autocorrelation to cardiovascular disease may share common mechanisms with those linking insulin clearance to cardiovascular disease. Although previous studies have primarily focused on investigating the molecular mechanisms associated with mean glucose levels and glycemic variability, our findings open new avenues for exploring the molecular basis of glucose autocorrelation, potentially revealing novel therapeutic targets for preventing diabetic complications.

      Reviewer #2 (Public review):

      We thank the reviewer for the critical review of the manuscript and the valuable comments. We have carefully considered these comments and have revised our manuscript accordingly. The reviewer’s comments in this letter are in bold and italics.

      Sugimoto et al. explore the relationship between glucose dynamics - specifically value, variability, and autocorrelation - and coronary plaque vulnerability in patients with varying glucose tolerance levels. The study identifies three independent predictive factors for %NC and emphasizes the use of continuous glucose monitoring (CGM)-derived indices for coronary artery disease (CAD) risk assessment. By employing robust statistical methods and validating findings across datasets from Japan, America, and China, the authors highlight the limitations of conventional markers while proposing CGM as a novel approach for risk prediction. The study has the potential to reshape CAD risk assessment by emphasizing CGM-derived indices, aligning well with personalized medicine trends.

      Strengths:

      (1) The introduction of autocorrelation as a predictive factor for plaque vulnerability adds a novel dimension to glucose dynamic analysis.

      (2) Inclusion of datasets from diverse regions enhances generalizability.

      (3) The use of a well-characterized cohort with controlled cholesterol and blood pressure levels strengthens the findings.

      (4) The focus on CGM-derived indices aligns with personalized medicine trends, showcasing the potential for CAD risk stratification.

      We thank Reviewer #2 for the positive assessment and for the valuable and constructive comments on our manuscript.

      Weaknesses:

      (1) The link between autocorrelation and plaque vulnerability remains speculative without a proposed biological explanation. 

      We appreciate the reviewer’s point about the need for a clearer biological explanation linking glucose autocorrelation to plaque vulnerability. We fully agree with the reviewer that the detailed biological mechanisms underlying this relationship are not yet fully understood, as noted in our Discussion section.

      However, we would like to emphasize the theoretical basis that supports the clinical relevance of autocorrelation. Our results show that glucose profiles with identical mean and variability can exhibit different autocorrelation patterns, highlighting that conventional measures such as mean or variance alone may not fully capture inter-individual metabolic differences. Incorporating autocorrelation analysis provides a more comprehensive characterization of metabolic states. Consequently, incorporating autocorrelation measures alongside traditional diabetes diagnostic criteria - such as fasting glucose, HbA1c and PG120, which primarily reflect only the “mean” component - can improve predictive accuracy for various clinical outcomes.

      Furthermore, our previous research has shown that glucose autocorrelation reflects changes in insulin clearance (Sugimoto, Hikaru, et al. “Improved detection of decreased glucose handling capacities via continuous glucose monitoring-derived indices.” Communications Medicine 5.1 (2025): 103.). The relationship between insulin clearance and cardiovascular disease has been well documented (Randrianarisoa, Elko, et al. “Reduced insulin clearance is linked to subclinical atherosclerosis in individuals at risk for type 2 diabetes mellitus.” Scientific reports 10.1 (2020): 22453.), and the mechanisms described in this prior work may potentially explain the association between glucose autocorrelation and clinical outcomes observed in the present study. 

      Rather than a limitation, we view these currently unexplored associations as an opportunity for further research. The identification of autocorrelation as a key glycemic feature introduces a new dimension to metabolic regulation that could serve as the basis for future investigations exploring the molecular mechanisms underlying these patterns.

      While we agree that further research at the cellular and molecular level is needed to fully validate these findings, we believe that our study provides a theoretical framework to support the clinical utility of autocorrelation analysis in glucose monitoring, and that this could serve as the basis for future investigations exploring the molecular mechanisms underlying these autocorrelation patterns, which adds to the broad interest of this study. Regarding the physical mechanisms linking autocorrelation and clinical outcomes, we have added the following sentences in the Discussion section (lines 331-339, 341-352): 

      This study also provided evidence that autocorrelation can vary independently from the mean and variance components using simulated data. In addition, simulated glucose dynamics indicated that even individuals with high AC_Var did not necessarily have high maximum and minimum blood glucose levels. This study also indicated that these three components qualitatively corresponded to the four distinct glucose patterns observed after glucose administration, which were identified in a previous study (Hulman et al., 2018). Thus, the inclusion of autocorrelation in addition to mean and variance may improve the characterization of inter-individual differences in glucose regulation and improve the predictive accuracy of various clinical outcomes.

      Despite increasing evidence linking glycemic variability to oxidative stress and endothelial dysfunction in T2DM complications (Ceriello et al., 2008; Monnier et al., 2008), the biological mechanisms underlying the independent predictive value of autocorrelation remain to be elucidated. Our previous work has shown that glucose autocorrelation is influenced by insulin clearance (Sugimoto et al., 2025), a process known to be associated with cardiovascular disease risk (Randrianarisoa et al., 2020). Therefore, the molecular pathways linking glucose autocorrelation to cardiovascular disease may share common mechanisms with those linking insulin clearance to cardiovascular disease. Although previous studies have primarily focused on investigating the molecular mechanisms associated with mean glucose levels and glycemic variability, our findings open new avenues for exploring the molecular basis of glucose autocorrelation, potentially revealing novel therapeutic targets for preventing diabetic complications.

      (2) The relatively small sample size (n=270) limits statistical power, especially when stratified by glucose tolerance levels. 

      We appreciate the reviewer’s concern regarding sample size and its potential impact on statistical power, especially when stratified by glucose tolerance levels. We fully agree that a larger sample size would increase statistical power, especially for subgroup analyses.

      We would like to clarify several points regarding the statistical power and validation of our findings. Our sample size followed established methodological frameworks, including the guidelines outlined by Muyembe Asenahabi, Bostely, and Peters Anselemo Ikoha. “Scientific research sample size determination.” (2023). These guidelines balance the risks of inadequate sample size with the challenges of unnecessarily large samples. For our primary analysis examining the correlation between CGM-derived measures and %NC, power calculations (a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4) indicated that a minimum of 47 participants was required. Our sample size of 53 exceeded this threshold and allowed us to detect statistically significant correlations, as described in the Methods section. Moreover, to provide transparency about the precision of our estimates, we have included confidence intervals for all coefficients. 
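
      The required sample size quoted above can be checked with a short calculation. The response does not state which formula was used, so the sketch below assumes the standard Fisher z-transformation approximation for a two-sided test of a correlation coefficient; the function name `n_for_correlation` is hypothetical.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.8):
    """Approximate sample size needed to detect a correlation r with a
    two-sided test, using the Fisher z-transformation (an assumed method)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the power
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)


print(n_for_correlation(0.4))
```

      Under this approximation, a type I error of 0.05, a power of 0.8, and an expected correlation of 0.4 give a minimum of 47 participants, which agrees with the figure stated in the response.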

      Furthermore, our sample size aligns with previous studies investigating the associations between glucose profiles and clinical parameters, including Torimoto, Keiichi, et al. “Relationship between fluctuations in glucose levels measured by continuous glucose monitoring and vascular endothelial dysfunction in type 2 diabetes mellitus.” Cardiovascular Diabetology 12 (2013): 1-7. (n=57), Hall, Heather, et al. “Glucotypes reveal new patterns of glucose dysregulation.” PLoS biology 16.7 (2018): e2005143. (n=57), and Metwally, Ahmed A., et al. “Prediction of metabolic subphenotypes of type 2 diabetes via continuous glucose monitoring and machine learning.” Nature Biomedical Engineering (2024): 1-18. (n=32).

      Regarding the classification of glucose dynamics components, we have conducted additional validation across diverse populations including 64 Japanese, 53 American, and 100 Chinese individuals. These validation efforts have consistently supported our identification of three independent glucose dynamics components.

      However, we acknowledge the importance of further validation on a larger scale. To address this, we conducted a large follow-up study of over 8,000 individuals with two years of follow-up (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      To address the sample size considerations, we have added the following sentences in the Discussion section (lines 409-414): 

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.

      (3) Strict participant selection criteria may reduce applicability to broader populations. 

      We appreciate the reviewer’s comment regarding the potential impact of strict participant selection criteria on the broader applicability of our findings. We acknowledge that extending validation to more diverse populations would improve the generalizability of our findings.

      Our study includes validation cohorts from diverse populations, including 64 Japanese, 53 American and 100 Chinese individuals. These cohorts include a wide range of metabolic states, from healthy individuals to those with diabetes, ensuring validation across different clinical conditions. However, we acknowledge that further validation in additional populations and clinical settings would strengthen our conclusions. To address this, we conducted a large follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      We have added the following text to the Discussion section to address these considerations (lines 409-414, 354-361):

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.

      Although our LASSO and factor analysis indicated that CGM-derived measures were strong predictors of %NC, this does not mean that other clinical parameters, such as lipids and blood pressure, are irrelevant in T2DM complications. Our study specifically focused on characterizing glucose dynamics, and we analyzed individuals with well-controlled serum cholesterol and blood pressure to reduce confounding effects. While we anticipate that inclusion of a more diverse population would not alter our primary findings regarding glucose dynamics, it is likely that a broader data set would reveal additional predictive contributions from lipid and blood pressure parameters.

      (4) CGM-derived indices like AC_Var and ADRR may be too complex for routine clinical use without simplified models or guidelines. 

      We appreciate the reviewer’s concern about the complexity of CGM-derived indices such as AC_Var and ADRR for routine clinical use. We acknowledge that for these indices to be of practical use, they must be both interpretable and easily accessible to healthcare providers. 

      To address this concern, we have developed an easy-to-use web application that automatically calculates these measures, including AC_Var, mean glucose levels, and glucose variability (https://cgmregressionapp2.streamlit.app/). This tool eliminates the need for manual calculations, making these indices more practical for clinical implementation.

      Regarding interpretability, we acknowledge that establishing specific clinical guidelines would enhance the practical utility of these measures. For example, defining a cut-off value for AC_Var above which the risk of diabetes complications increases significantly would provide clearer clinical guidance. However, given our current sample size limitations and our predefined objective of investigating correlations among indices, we have taken a conservative approach by focusing on the correlation between AC_Var and %NC rather than establishing definitive cut-offs. This approach intentionally avoids problematic statistical practices such as p-hacking. It is not realistic to expect a single study to accomplish everything from proposing a new concept to conducting large-scale clinical trials to establishing clinical guidelines. Establishing clinical guidelines typically requires the accumulation of multiple studies over many years. Recognizing this reality, we have been careful in our manuscript to make modest claims about the discovery of new “correlations” rather than exaggerated claims about immediate routine clinical use.

      To address this limitation, we conducted a large follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which proposed clinically relevant cut-offs and reference ranges for AC_Var and other CGM-derived indices. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, by integrating automated calculation tools with clear clinical thresholds, we expect to make these measures more accessible for clinical use.

      We have added the following text to the Discussion section to address these considerations (lines 415-419):

      While CGM-derived indices such as AC_Var and ADRR hold promise for CAD risk assessment, their complexity may present challenges for routine clinical implementation. To improve usability, we have developed a web-based calculator that automates these calculations. However, defining clinically relevant thresholds and reference ranges requires further validation in larger cohorts.

      (5) The study does not compare CGM-derived indices to existing advanced CAD risk models, limiting the ability to assess their true predictive superiority. 

      We appreciate the reviewer’s comment regarding the comparison of CGM-derived indices with existing CAD risk models. Given that our study population consisted of individuals with well-controlled total cholesterol and blood pressure levels, a direct comparison with the Framingham Risk Score for Hard Coronary Heart Disease (Wilson, Peter WF, et al. “Prediction of coronary heart disease using risk factor categories.” Circulation 97.18 (1998): 1837-1847.) may introduce inherent bias, as these factors are key components of the score.

      Nevertheless, to further assess the predictive value of the CGM-derived indices, we performed additional analyses using linear regression to predict %NC. Using the Framingham Risk Score, we obtained an R² of 0.04 and an Akaike Information Criterion (AIC) of 330. In contrast, our proposed model incorporating the three glycemic parameters - CGM_Mean, CGM_Std, and AC_Var - achieved a significantly improved R² of 0.36 and a lower AIC of 321, indicating superior predictive accuracy. 
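
      For readers less familiar with how R² and AIC are used to compare regression models, the following sketch shows one way to compute both from observed and fitted values. It is not the authors' analysis code: the data are hypothetical, and the AIC is written in its least-squares form up to an additive constant, so only differences between models fitted to the same outcome are meaningful.

```python
from math import log


def r_squared(y, y_hat):
    """Proportion of variance in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    tss = sum((a - mean_y) ** 2 for a in y)
    return 1 - rss / tss


def gaussian_aic(y, y_hat, n_params):
    """AIC of a least-squares fit, up to an additive constant:
    n * ln(RSS / n) + 2 * n_params."""
    n = len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return n * log(rss / n) + 2 * n_params


# Hypothetical outcome and fitted values from two competing models.
y = [10.0, 12.0, 15.0, 11.0, 18.0, 20.0]
fit_three_predictors = [10.5, 12.5, 14.0, 11.5, 17.5, 19.5]  # 4 parameters
fit_one_predictor = [14.0, 14.5, 14.0, 14.5, 14.5, 14.5]     # 2 parameters

print(r_squared(y, fit_three_predictors), gaussian_aic(y, fit_three_predictors, 4))
print(r_squared(y, fit_one_predictor), gaussian_aic(y, fit_one_predictor, 2))
```

      A model is preferred when it has the lower AIC despite the penalty for its extra parameters, which is the pattern reported above for the three CGM-derived indices relative to the Framingham Risk Score.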

      We have added the following text to the Results section (lines 115-122):

      The regression model including CGM_Mean, CGM_Std and AC_Var to predict %NC achieved an R² of 0.36 and an Akaike Information Criterion (AIC) of 321. Each of these indices showed statistically significant independent positive correlations with %NC (Fig. 1A). In contrast, the model using conventional glycemic markers (FBG, HbA1c, and PG120) yielded an R² of only 0.05 and an AIC of 340 (Fig. 1B). Similarly, the model using the Framingham Risk Score for Hard Coronary Heart Disease (Wilson et al., 1998) showed limited predictive value, with an R² of 0.04 and an AIC of 330 (Fig. 1C).

      (6) Varying CGM sampling intervals (5-minute vs. 15-minute) were not thoroughly analyzed for impact on results. 

      We appreciate the reviewer’s comment regarding the potential impact of different CGM sampling intervals on our results. To assess the robustness of our findings across different sampling frequencies, we performed a downsampling analysis by converting our 5-minute interval data to 15-minute intervals. The AC_Var value calculated from 15-minute intervals was significantly correlated with that calculated from 5-minute intervals (R = 0.99, 95% CI: 0.97-1.00). Furthermore, the regression model using CGM_Mean, CGM_Std, and AC_Var from 15-minute intervals to predict %NC achieved an R² of 0.36 and an AIC of 321, identical to the model using 5-minute intervals. These results indicate that our findings are robust to variations in CGM sampling frequency.
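
      The downsampling step described above can be sketched as follows. Several details are assumptions made for illustration only: AC_Var is taken here to be the variance of the autocorrelation coefficients over a range of lags, the lag ranges are chosen so that both resolutions span the same 150-minute window, and the trace is synthetic rather than real CGM data.

```python
from math import pi, sin
from statistics import fmean, pvariance


def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    mu = fmean(series)
    den = sum((x - mu) ** 2 for x in series)
    num = sum((series[i] - mu) * (series[i + lag] - mu)
              for i in range(len(series) - lag))
    return num / den


def ac_var(series, max_lag):
    """Variance of autocorrelation coefficients at lags 1..max_lag
    (assumed definition of AC_Var for this sketch)."""
    return pvariance([autocorrelation(series, k) for k in range(1, max_lag + 1)])


def downsample(series, step=3):
    """Keep every `step`-th reading, e.g. 5-minute to 15-minute sampling."""
    return series[::step]


# Synthetic 24-hour trace at 5-minute resolution (288 readings, mg/dL).
five_min = [110 + 25 * sin(2 * pi * i / 72) + 8 * sin(2 * pi * i / 17)
            for i in range(288)]
fifteen_min = downsample(five_min)

print(len(five_min), len(fifteen_min))
print(ac_var(five_min, max_lag=30), ac_var(fifteen_min, max_lag=10))
```

      Repeating this for each participant and correlating the two sets of AC_Var values would give a comparison of the kind reported in the response.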

      We have added this analysis to the Results section (lines 122-125):

      The AC_Var computed from 15-minute CGM sampling was nearly identical to that computed from 5-minute sampling (R = 0.99, 95% CI: 0.97-1.00) (Fig. S1A), and the regression using the 15‑min features yielded almost the same performance (R² = 0.36; AIC = 321; Fig. S1B).

      Reviewer #3 (Public review):

      We thank the reviewer for the critical review of the manuscript and the valuable comments. We have carefully considered these comments and have revised our manuscript accordingly. The reviewer’s comments in this letter are in bold and italics.

      Summary:

      This is a retrospective analysis of 53 individuals over 26 features (12 clinical phenotypes, 12 CGM features, and 2 autocorrelation features) to examine which features were most informative in predicting percent necrotic core (%NC) as a parameter for coronary plaque vulnerability. Multiple regression analysis demonstrated a better ability to predict %NC from 3 selected CGM-derived features than 3 selected clinical phenotypes. LASSO regularization and partial least squares (PLS) with VIP scores were used to identify 4 CGM features that most contribute to the precision of %NC. Using factor analysis they identify 3 components that have CGM-related features: value (relating to the value of blood glucose), variability (relating to glucose variability), and autocorrelation (composed of the two autocorrelation features). These three groupings appeared in the 3 validation cohorts and when performing hierarchical clustering. To demonstrate how these three features change, a simulation was created to allow the user to examine these features under different conditions.

      We thank Reviewer #3 for the valuable and constructive comments on our manuscript.

      The goal of this study was to identify CGM features that relate to %NC. Through multiple feature selection methods, they arrive at 3 components: value, variability, and autocorrelation. While the feature list is highly correlated, the authors take steps to ensure feature selection is robust. There is a lack of clarity of what each component (value, variability, and autocorrelation) includes as while similar CGM indices fall within each component, there appear to be some indices that appear as relevant to value in one dataset and to variability in the validation. 

      We appreciate the reviewer’s comment regarding the classification of CGM-derived measures into the three components: value, variability, and autocorrelation. As the reviewer correctly points out, some measures may load differently between the value and variability components in different datasets. However, we believe that this variability reflects the inherent mathematical properties of these measures rather than a limitation of our study.

      For example, the HBGI clusters differently across datasets due to its dependence on the number of glucose readings above a threshold. In populations where mean glucose levels are predominantly below this threshold, the HBGI is more sensitive to glucose variability (Fig. S3A). Conversely, in populations with a wider range of mean glucose levels, HBGI correlates more strongly with mean glucose levels (Fig. 3A). This context-dependent behaviour is expected given the mathematical properties of these measures and does not indicate an inconsistency in our classification approach.
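
      The threshold behaviour of HBGI can be made concrete with a short sketch. The response does not give the formula, so the code below assumes the commonly used Kovatchev risk function for glucose in mg/dL, in which readings below roughly 112.5 mg/dL contribute nothing to the index; the three traces are hypothetical.

```python
from math import log


def hbgi(readings_mg_dl):
    """High Blood Glucose Index: mean of the high-side risk values,
    using the Kovatchev risk transform for mg/dL (assumed variant)."""
    risks = []
    for bg in readings_mg_dl:
        f = 1.509 * (log(bg) ** 1.084 - 5.381)
        # Only readings on the high side of the transform carry risk.
        risks.append(10 * f * f if f > 0 else 0.0)
    return sum(risks) / len(risks)


# Hypothetical traces: the first two share a mean of 100 mg/dL.
stable_low = [95, 100, 105, 100] * 10
variable_low = [60, 140] * 20
stable_high = [175, 180, 185, 180] * 10

print(hbgi(stable_low), hbgi(variable_low), hbgi(stable_high))
```

      With a low mean, HBGI is zero unless variability pushes readings above the threshold, so it tracks variability; once mean glucose itself exceeds the threshold, HBGI is driven mainly by the mean.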

      Importantly, our main findings remain robust: CGM-derived measures systematically fall into three components - value, variability, and autocorrelation. Traditional CGM-derived measures primarily reflect either value or variability, and this categorization is consistently observed across datasets. While specific indices such as HBGI may shift classification depending on population characteristics, the overall structure of CGM data remains stable.

      To address these considerations, we have added the following text to the Discussion section (lines 388-396):

      Some indices, such as HBGI, showed variation in classification across datasets, with some populations showing higher factor loadings in the “mean” component and others in the “variance” component. This variation occurs because HBGI calculations depend on the number of glucose readings above a threshold. In populations where mean glucose levels are predominantly below this threshold, the HBGI is more sensitive to glucose variability (Fig. S5A). Conversely, in populations with a wider range of mean glucose levels, the HBGI correlates more strongly with mean glucose levels (Fig. 3A). Despite these differences, our validation analyses confirm that CGM-derived indices consistently cluster into three components: mean, variance, and autocorrelation.

      We are sceptical about statements of significance without documentation of p-values. 

      We appreciate the reviewer’s concern regarding statistical significance and the documentation of p values.

      First, given the multiple comparisons in our study, we used q values rather than p values, as shown in Figure 1D. Q values provide a more rigorous statistical framework for controlling the false discovery rate in multiple testing scenarios, thereby reducing the likelihood of false positives.
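
      As a rough illustration of how q values relate to p values, the sketch below applies the Benjamini-Hochberg adjustment. This is an assumption made for illustration only: the response does not state which false discovery rate procedure was used, and the function name `bh_q_values` and the example p values are hypothetical, not taken from the study.

```python
def bh_q_values(p_values):
    """Return Benjamini-Hochberg adjusted p values (q values) in input order.

    Assumption: the BH procedure is used here only as a common example of
    false discovery rate control; the study's actual procedure may differ.
    """
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, keeping q values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Illustrative p values only (not from the manuscript).
example_p = [0.01, 0.04, 0.03, 0.20]
example_q = bh_q_values(example_p)
```

      A test is then called significant when its q value falls below the chosen false discovery rate, rather than when its raw p value falls below 0.05.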

      Second, our statistical reporting follows established guidelines, including those of the New England Journal of Medicine (Harrington, David, et al. “New guidelines for statistical reporting in the journal.” New England Journal of Medicine 381.3 (2019): 285-286.), which recommend that “reporting of exploratory end points should be limited to point estimates of effects with 95% confidence intervals” and advise authors to “replace p values with estimates of effects or association and 95% confidence intervals”. According to these guidelines, p values should not be reported in this type of study. We determined significance based on whether these 95% confidence intervals excluded zero, a method for determining whether an association is significantly different from zero (Tan, Sze Huey, and Say Beng Tan. "The correct interpretation of confidence intervals." Proceedings of Singapore Healthcare 19.3 (2010): 276-278.).

      For the sake of transparency, we provide p values for readers who may be interested, although we emphasize that they should not be the basis for interpretation, as discussed in the referenced guidelines. Specifically, in Figure 1A-B, the p values for CGM_Mean, CGM_Std, and AC_Var were 0.02, 0.02, and <0.01, respectively, while those for FBG, HbA1c, and PG120 were 0.83, 0.91, and 0.25, respectively. In Figure 3C, the p values for factors 1–5 were 0.03, 0.03, 0.03, 0.24, and 0.87, respectively, and in Figure S8C, the p values for factors 1–3 were <0.01, <0.01, and 0.20, respectively.

      We appreciate the opportunity to clarify our statistical methodology and are happy to provide additional details if needed.

      While hesitations remain, the ability of these authors to find groupings of these many CGM metrics in relation to %NC is of interest. The believability of the associations is impeded by an obtuse presentation of the results with core data (i.e. correlation plots between CGM metrics and %NC) buried in the supplement while main figures contain plots of numerical estimates from models which would be more usefully presented in supplementary tables. 

      We appreciate the reviewer’s comment regarding the presentation of our results and recognize the importance of ensuring clarity and accessibility of the core data. 

      The central finding of our study is twofold: first, that the numerous CGM-derived measures can be systematically classified into three distinct components (mean, variance, and autocorrelation), and second, that each of these components is independently associated with %NC. This insight cannot be derived simply from examining scatter plots of individual correlations, which are provided in the Supplementary Figures. Instead, it emerges from our statistical analyses in the main figures, including multiple regression models that reveal the independent contributions of these components to %NC.

      We acknowledge the reviewer’s concern regarding the accessibility of key data. To improve clarity, we have moved several scatter plots from the Supplementary Figures to the main figures (Fig. 1D-J) to allow readers to more directly visualize the relationships between CGM-derived measures and %NC. We believe this revision improves the transparency and readability of our results while maintaining the rigor of our analytical approach.

      Given the small sample size in the primary analysis, there is a lot of modeling done with parameters estimated where simpler measures would serve and be more convincing as they require less data manipulation. A major example of this is that the pairwise correlation/covariance between CGM_mean, CGM_std, and AC_var is not shown and would be much more compelling in the claim that these are independent factors.

      We appreciate the reviewer’s feedback on our statistical analysis and data presentation. The correlations between CGM_Mean, CGM_Std, and AC_Var were documented in Figure S1B. However, to improve accessibility and clarity, we have moved these correlation analyses to the main figures (Fig. 1F). 

      Regarding our modeling approach, we chose LASSO and PLS methods because they are well-established techniques that are particularly suited to scenarios with many input variables and a relatively small sample size. These methods have been used in the literature as robust approaches for variable selection under such conditions (Tibshirani R. 1996. Regression shrinkage and selection via the lasso. J R Stat Soc 58:267–288. Wold S, Sjöström M, Eriksson L. 2001. PLS-regression: a basic tool of chemometrics. Chemometrics Intellig Lab Syst 58:109–130. Pei X, Qi D, Liu J, Si H, Huang S, Zou S, Lu D, Li Z. 2023. Screening marker genes of type 2 diabetes mellitus in mouse lacrimal gland by LASSO regression. Sci Rep 13:6862. Wang C, Kong H, Guan Y, Yang J, Gu J, Yang S, Xu G. 2005. Plasma phospholipid metabolic profiling and biomarkers of type 2 diabetes mellitus based on high-performance liquid chromatography/electrospray mass spectrometry and multivariate statistical analysis. Anal Chem 77:4108–4116.).

      Lack of methodological detail is another challenge. For example, the time period of CGM metrics or CGM placement in the primary study in relation to the IVUS-derived measurements of coronary plaques is unclear. Are they temporally distant or proximal/ concurrent with the PCI? 

      We appreciate the reviewer’s important question regarding the temporal relationship between CGM measurements and IVUS-derived plaque assessments. As described in our previous work (Otowa‐Suematsu, Natsu, et al. “Comparison of the relationship between multiple parameters of glycemic variability and coronary plaque vulnerability assessed by virtual histology–intravascular ultrasound.” Journal of Diabetes Investigation 9.3 (2018): 610-615.), all individuals underwent continuous glucose monitoring for at least three consecutive days within the seven-day period prior to the PCI procedure. To improve clarity for readers, we have added the following text to the Methods section (lines 440-441):

      All individuals underwent CGM for at least three consecutive days within the seven-day period prior to the PCI procedure.

      A patient undergoing PCI for coronary intervention would be expected to have physiological and iatrogenic glycemic disturbances that do not reflect their baseline state. This is not considered or discussed. 

      We appreciate the reviewer’s concern regarding potential glycemic disturbances associated with PCI. As described in our previous work (Otowa‐Suematsu, Natsu, et al. “Comparison of the relationship between multiple parameters of glycemic variability and coronary plaque vulnerability assessed by virtual histology–intravascular ultrasound.” Journal of Diabetes Investigation 9.3 (2018): 610-615.), all CGM measurements were performed before the PCI procedure. This temporal separation ensures that the glycemic patterns analyzed in our study reflect the baseline metabolic state of the patients, rather than any physiological or iatrogenic effects of PCI. To avoid any misunderstanding, we have clarified this temporal relationship in the revised manuscript (lines 440-441):

      All individuals underwent CGM for at least three consecutive days within the seven-day period prior to the PCI procedure.

      The attempts at validation in external cohorts, Japanese, American, and Chinese are very poorly detailed. We could only find even an attempt to examine cardiovascular parameters in the Chinese data set but the outcome variables are unspecified with regard to what macrovascular events are included, their temporal relation to the CGM metrics, etc. Notably macrovascular event diagnoses are very different from the coronary plaque necrosis quantification. This could be a source of strength in the findings if carefully investigated and detailed but due to the lack of detail seems like an apples-to-oranges comparison. 

      We appreciate the reviewer’s comment regarding the validation cohorts and the need for greater clarity, particularly in the Chinese dataset. We acknowledge that our initial description lacked sufficient methodological detail, and we have expanded the Methods section to provide a more comprehensive explanation.

      For the Chinese dataset, the data collection protocol was previously documented (Zhao, Qinpei, et al. “Chinese diabetes datasets for data-driven machine learning.” Scientific Data 10.1 (2023): 35.). Briefly, trained research staff used standardized questionnaires to collect demographic and clinical information, including diabetes diagnosis, treatment history, comorbidities, and medication use. Physical examinations included anthropometric measurements, and body mass index was calculated using standard protocols. CGM was performed using the FreeStyle Libre H device (Abbott Diabetes Care, UK), which records interstitial glucose levels at 15-minute intervals for up to 14 days. Laboratory measurements, including metabolic panels, lipid profiles, and renal function tests, were obtained within six months of CGM placement. While previous studies have linked necrotic core to macrovascular events (Xie, Yong, et al. “Clinical outcome of nonculprit plaque ruptures in patients with acute coronary syndrome in the PROSPECT study.” JACC: Cardiovascular Imaging 7.4 (2014): 397-405.), we acknowledge the limitations of the cardiovascular outcomes in the Chinese data set. These outcomes were extracted from medical records rather than standardized diagnostic procedures or imaging studies. To address these concerns, we have added the following text to the Methods section (lines 496-504):

      The data collection protocol for the Chinese dataset was previously documented (Zhao et al., 2023). Briefly, trained research staff used standardized questionnaires to collect demographic and clinical information, including diabetes diagnosis, treatment history, comorbidities, and medication use. CGM records interstitial glucose levels at 15-minute intervals for up to 14 days. Laboratory measurements, including metabolic panels, lipid profiles, and renal function tests, were obtained within six months of CGM placement. While previous studies have linked necrotic core to macrovascular events, we acknowledge the limitations of the cardiovascular outcomes in the Chinese data set. These outcomes were extracted from medical records rather than from standardized diagnostic procedures or imaging studies.

      Finally, the simulations at the end are not relevant to the main claims of the paper and we would recommend removing them for the coherence of this manuscript. 

      We appreciate the reviewer’s feedback regarding the relevance of the simulation component of our manuscript. The primary contribution of our study goes beyond demonstrating correlations between CGM-derived measures and %NC; it highlights three fundamental components of glycemic patterns (mean, variability, and autocorrelation) and their independent relationships with coronary plaque characteristics. The simulations are included to illustrate how glycemic patterns with identical means and variability can have different autocorrelation structures. Because temporal autocorrelation can be conceptually difficult to interpret, these visualizations were intended to provide intuitive examples for the readers.
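
      The point that two glucose traces can share a mean and variance yet differ in autocorrelation can be sketched in a few lines. The example below is not the authors' simulation: the sinusoidal trace, its amplitude, and the helper `lag1_autocorr` are all hypothetical choices made for illustration. Shuffling a series keeps exactly the same set of values (so the same mean and variance) while destroying its temporal structure.

```python
import math
import random
from statistics import mean, pvariance


def lag1_autocorr(x):
    """Lag-1 autocorrelation coefficient of a sequence of numbers."""
    m = mean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


# Hypothetical smooth trace: 288 readings (24 h at 5-minute intervals).
smooth = [100 + 20 * math.sin(2 * math.pi * t / 48) for t in range(288)]

# Same values in random order: identical mean and variance by construction.
shuffled = smooth[:]
random.Random(0).shuffle(shuffled)

same_mean = abs(mean(smooth) - mean(shuffled)) < 1e-9
same_variance = abs(pvariance(smooth) - pvariance(shuffled)) < 1e-6
smooth_ac = lag1_autocorr(smooth)      # close to 1: slowly varying trace
shuffled_ac = lag1_autocorr(shuffled)  # close to 0: no temporal structure
```

      Measures built only on the mean and variance cannot tell these two traces apart, which is why an autocorrelation-based index carries separate information.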

      However, we agree with the reviewer’s concern about the coherence of the manuscript. In response, we have streamlined the simulation section by removing simulations that do not directly support our primary conclusions (old version of the manuscript, lines 239-246, 502-526), while retaining only those that enhance understanding of the three glycemic components. Regarding reviewer 2’s minor comment #4, we acknowledge that autocorrelation can be challenging to understand intuitively. To address this, we kept Fig. 4A with a brief description.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Summary:

      The study by Sugimoto et al. investigates the association between components of glucose dynamics (value, variability, and autocorrelation) and coronary plaque vulnerability (%NC) in patients with varying glucose tolerance levels. The research identifies three key factors that independently predict %NC and highlights the potential of continuous glucose monitoring (CGM)-derived indices in risk assessment for coronary artery disease (CAD). Using robust statistical methods and validation across diverse populations, the study emphasizes the limitations of conventional diagnostic markers and suggests a novel, CGM-based approach for improved predictive performance. While the study demonstrates significant novelty and potential impact, several issues must be addressed by the authors.

      Major Comments:

      (1) The study demonstrates originality by introducing autocorrelation as a novel predictive factor in glucose dynamics, a perspective rarely explored in prior research. While the innovation is commendable, the biological mechanisms linking autocorrelation to plaque vulnerability remain speculative. Providing a hypothesis or potential pathways would enhance the scientific impact and practical relevance of this finding.

      We appreciate the reviewer’s point about the need for a clearer biological explanation linking glucose autocorrelation to plaque vulnerability. Our previous research has shown that glucose autocorrelation reflects changes in insulin clearance (Sugimoto, Hikaru, et al. “Improved detection of decreased glucose handling capacities via continuous glucose monitoring-derived indices.” Communications Medicine 5.1 (2025): 103.). The relationship between insulin clearance and cardiovascular disease has been well documented (Randrianarisoa, Elko, et al. “Reduced insulin clearance is linked to subclinical atherosclerosis in individuals at risk for type 2 diabetes mellitus.” Scientific reports 10.1 (2020): 22453.), and the mechanisms described in this prior work may potentially explain the association between glucose autocorrelation and clinical outcomes observed in the present study. We have added the following sentences to the Discussion section (lines 341-352):

      Despite increasing evidence linking glycemic variability to oxidative stress and endothelial dysfunction in T2DM complications (Ceriello et al., 2008; Monnier et al., 2008), the biological mechanisms underlying the independent predictive value of autocorrelation remain to be elucidated. Our previous work has shown that glucose autocorrelation is influenced by insulin clearance (Sugimoto et al., 2025), a process known to be associated with cardiovascular disease risk (Randrianarisoa et al., 2020). Therefore, the molecular pathways linking glucose autocorrelation to cardiovascular disease may share common mechanisms with those linking insulin clearance to cardiovascular disease. Although previous studies have primarily focused on investigating the molecular mechanisms associated with mean glucose levels and glycemic variability, our findings open new avenues for exploring the molecular basis of glucose autocorrelation, potentially revealing novel therapeutic targets for preventing diabetic complications.

      (2) The inclusion of datasets from Japan, America, and China adds a valuable cross-cultural dimension to the study, showcasing its potential applicability across diverse populations. Despite the multi-regional validation, the sample size (n=270) is relatively small, especially when stratified by glucose tolerance categories. This limits the statistical power and applicability to diverse populations. A larger, multi-center cohort would strengthen conclusions.

      We appreciate the reviewer’s concern regarding sample size and its potential impact on statistical power, especially when stratified by glucose tolerance levels. We fully agree that a larger sample size would increase statistical power, especially for subgroup analyses.

      We would like to clarify several points regarding the statistical power and validation of our findings. Our study adheres to established methodological frameworks for sample size determination, including the guidelines outlined by Muyembe Asenahabi, Bostely, and Peters Anselemo Ikoha. “Scientific research sample size determination.” (2023). These guidelines balance the risks of inadequate sample size with the challenges of unnecessarily large samples. For our primary analysis examining the correlation between CGM-derived measures and %NC, power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4 indicated that a minimum of 47 participants was required. Our sample size of 53 exceeded this threshold and allowed us to detect statistically significant correlations, as described in the Methods section.
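
      The stated minimum of 47 participants can be reproduced with the standard Fisher z approximation for detecting a nonzero correlation. The sketch below assumes a two-sided test, which the response does not state explicitly; the function name `n_for_correlation` is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.8):
    """Approximate sample size needed to detect correlation r.

    Uses the Fisher z transformation with a two-sided test (an assumption;
    the source reports only alpha, power, and the expected correlation).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    fisher_z = math.atanh(r)                       # Fisher z of expected r
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


required_n = n_for_correlation(0.4)  # 47 under these assumptions
```

      With a type I error of 0.05, a power of 0.8, and an expected correlation of 0.4, this approximation gives 47, consistent with the threshold quoted above.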

      Furthermore, our sample size aligns with previous studies investigating the associations between glucose profiles and clinical parameters, including Torimoto, Keiichi, et al. “Relationship between fluctuations in glucose levels measured by continuous glucose monitoring and vascular endothelial dysfunction in type 2 diabetes mellitus.” Cardiovascular Diabetology 12 (2013): 1-7. (n=57), Hall, Heather, et al. “Glucotypes reveal new patterns of glucose dysregulation.” PLoS biology 16.7 (2018): e2005143. (n=57), and Metwally, Ahmed A., et al. “Prediction of metabolic subphenotypes of type 2 diabetes via continuous glucose monitoring and machine learning.” Nature Biomedical Engineering (2024): 1-18. (n=32). Moreover, to provide transparency about the precision of our estimates, we have included confidence intervals for all coefficients.

      Regarding the classification of glucose dynamics components, we have conducted additional validation across diverse populations including 64 Japanese, 53 American, and 100 Chinese individuals. These validation efforts have consistently supported our identification of three independent glucose dynamics components. Furthermore, the primary objective of our study was not to assess rare events, but rather to demonstrate that glucose dynamics can be decomposed into three main factors - mean, variance and autocorrelation - whereas traditional measures have primarily captured mean and variance without adequately reflecting autocorrelation. We believe that our current sample size effectively addresses this objective. 

      However, we acknowledge the importance of further validation on a larger scale. To address this, we conducted a large follow-up study of over 8,000 individuals with two years of follow-up (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      To address the sample size considerations, we have added the following sentences to the Discussion section (lines 409-414):

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.

      (3) The study focuses on a well-characterized cohort with controlled cholesterol and blood pressure levels, reducing confounding variables. However, this stringent selection might exclude individuals with significant variability in these parameters, potentially limiting the study's applicability to broader, real-world populations. The authors should discuss how this may affect generalizability and potential bias in the results.

      We appreciate the reviewer’s comment regarding the potential impact of strict participant selection criteria on the broader applicability of our findings. We acknowledge that extending validation to more diverse populations would improve the generalizability of our findings.

      Our validation strategy included multiple cohorts from different regions, specifically 64 Japanese, 53 American and 100 Chinese individuals. These cohorts represent a clinically diverse population, including both healthy individuals and those with diabetes, allowing for validation across a broad spectrum of metabolic conditions. However, we recognize that further validation in additional populations and clinical settings would strengthen our conclusions. To address this, we conducted a large follow-up study of over 8,000 individuals with two years of follow-up (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      We have added the following text to the Discussion section to address these considerations (lines 409-414, 354-361):

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.

      Although our LASSO and factor analysis indicated that CGM-derived measures were strong predictors of %NC, this does not mean that other clinical parameters, such as lipids and blood pressure, are irrelevant in T2DM complications. Our study specifically focused on characterizing glucose dynamics, and we analyzed individuals with well-controlled serum cholesterol and blood pressure to reduce confounding effects. While we anticipate that inclusion of a more diverse population would not alter our primary findings regarding glucose dynamics, it is likely that a broader data set would reveal additional predictive contributions from lipid and blood pressure parameters.

      (4) The study effectively highlights the potential of CGM-derived indices as a tool for CAD risk assessment, a concept that aligns with contemporary advancements in personalized medicine. Despite its potential, the complexity of CGM-derived indices like AC_Var and ADRR may hinder their routine clinical adoption. Providing simplified models or actionable guidelines would facilitate their integration into everyday practice.

      We appreciate the reviewer’s concern about the complexity of CGM-derived indices such as AC_Var and ADRR for routine clinical use. We recognize that for these indices to be of practical use, they must be both interpretable and easily accessible to healthcare providers.

      To address this, we have developed an easy-to-use web application that automatically calculates these measures, including AC_Var, mean glucose levels, and glucose variability. By eliminating the need for manual calculations, this tool streamlines the process and makes these indices more practical for clinical use.

      Regarding interpretability, we acknowledge that establishing specific clinical guidelines would enhance the practical utility of these measures. For example, defining a cut-off value for AC_Var above which the risk of diabetes complications increases significantly would provide clearer clinical guidance. However, given our current sample size limitations and our predefined objective of investigating correlations among indices, we have taken a conservative approach by focusing on the correlation between AC_Var and %NC rather than establishing definitive cutoffs. This approach intentionally avoids problematic statistical practices like p-hacking. It is not realistic to expect a single study to accomplish everything from proposing a new concept to conducting large-scale clinical trials to establishing clinical guidelines. Establishing clinical guidelines typically requires the accumulation of multiple studies over many years. Recognizing this reality, we have been careful in our manuscript to make modest claims about the discovery of new “correlations” rather than exaggerated claims about immediate routine clinical use.

      To address this limitation, we subsequently conducted a large follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which proposed clinically relevant cutoffs and reference ranges for AC_Var and other CGM-derived indices. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, by integrating automated calculation tools with clear clinical thresholds, we expect to make these measures more accessible for clinical use.

      We have added the following text to the Discussion section to address these considerations (lines 415-419):

      While CGM-derived indices such as AC_Var and ADRR hold promise for CAD risk assessment, their complexity may present challenges for routine clinical implementation. To improve usability, we have developed a web-based calculator that automates these calculations. However, defining clinically relevant thresholds and reference ranges requires further validation in larger cohorts.

      (5) The exclusion of TIR from the main analysis is noted, but its relevance in diabetes management warrants further exploration. Integrating TIR as an outcome measure could provide additional clinical insights.

      We appreciate the reviewer’s comment regarding the potential role of time in range (TIR) as an outcome measure in our study. Because TIR is primarily influenced by the mean and variance of glucose levels, it does not fully capture the distinct role of glucose autocorrelation, which was the focus of our investigation.

      To clarify this point, we have expanded the Discussion section as follows (lines 380-388):

      Although time in range (TIR) was not included in the main analyses due to the relatively small number of T2DM patients and the predominance of participants with TIR >70%, our results demonstrate that CGM-derived indices outperformed conventional markers such as FBG, HbA1c, and PG120 in predicting %NC. Furthermore, multiple regression analysis between factor scores and TIR revealed that only factor 1 (mean) and factor 2 (variance) were significantly associated with TIR (Fig. S8C, D). This finding confirms the presence of three distinct components in glucose dynamics and highlights the added value of examining AC_Var as an independent glycemic feature beyond conventional CGM-derived measures.

      (6) While the study reflects a commitment to understanding CAD risks in a global context by including datasets from Japan, America, and China, the authors should provide demographic details (e.g., age, gender, socioeconomic status) and discuss how these factors might influence glucose dynamics and coronary plaque vulnerability.

      We appreciate the reviewer’s comment regarding the potential influence of demographic factors on glucose dynamics and coronary plaque vulnerability. We examined these relationships and found that age and sex had minimal effects on glucose dynamics characteristics, as shown in Figure S8A and S8B. These findings suggest that our primary conclusions regarding glucose dynamics and coronary risk remain robust across demographic groups within our data set.

      To address the reviewer’s suggestion, we have added the following discussion (lines 361-368):

      In our analysis of demographic factors, we found that age and gender had minimal influence on glucose dynamics characteristics (Fig. S8A, B), suggesting that our findings regarding the relationship between glucose dynamics and coronary risk are robust across different demographic groups within our dataset. Future studies involving larger and more diverse populations would be valuable to comprehensively elucidate the potential influence of age, gender, and other demographic factors on glucose dynamics characteristics and their relationship to cardiovascular risk.

      (7) While the article shows CGM-derived indices outperform traditional markers (e.g., HbA1c, FBG, PG120), it does not compare these indices against existing advanced risk models (e.g., Framingham Risk Score for CAD). A direct comparison would strengthen the claim of superiority.

      We appreciate the reviewer’s comment regarding the comparison of CGM-derived indices with existing CAD risk models. Given that our study population consisted of individuals with well-controlled total cholesterol and blood pressure levels, a direct comparison with the Framingham Risk Score for Hard Coronary Heart Disease (Wilson, Peter WF, et al. “Prediction of coronary heart disease using risk factor categories.” Circulation 97.18 (1998): 1837-1847.) may introduce inherent bias, as these factors are key components of the score.

      Nevertheless, to further assess the predictive value of the CGM-derived indices, we performed additional analyses using linear regression to predict %NC. Using the Framingham Risk Score, we obtained an R² of 0.04 and an Akaike Information Criterion (AIC) of 330. In contrast, our proposed model incorporating the three glycemic parameters (CGM_Mean, CGM_Std, and AC_Var) achieved a significantly improved R² of 0.36 and a lower AIC of 321, indicating superior predictive accuracy. We have updated the Results section as follows (lines 115-122):

      The regression model including CGM_Mean, CGM_Std and AC_Var to predict %NC achieved an R² of 0.36 and an Akaike Information Criterion (AIC) of 321. Each of these indices showed statistically significant independent positive correlations with %NC (Fig. 1A). In contrast, the model using conventional glycemic markers (FBG, HbA1c, and PG120) yielded an R² of only 0.05 and an AIC of 340 (Fig. 1B). Similarly, the model using the Framingham Risk Score for Hard Coronary Heart Disease (Wilson et al., 1998) showed limited predictive value, with an R² of 0.04 and an AIC of 330 (Fig. 1C).

      (8) The study mentions varying CGM sampling intervals across datasets (5-minute vs. 15-minute). Authors should employ sensitivity analysis to assess the impact of these differences on the results. This would help clarify whether higher-resolution data significantly improves predictive performance.

      We appreciate the reviewer’s comment regarding the potential impact of different CGM sampling intervals on our results. To assess the robustness of our findings across different sampling frequencies, we performed a downsampling analysis by converting our 5-minute interval data to 15-minute intervals. The AC_Var value calculated from 15-minute intervals was significantly correlated with that calculated from 5-minute intervals (R = 0.99, 95% CI: 0.97-1.00). Consequently, the main findings remained consistent across both sampling frequencies, indicating that our results are robust to variations in temporal resolution. We have added this analysis to the Results section (lines 122-126):

      The AC_Var computed from 15-minute CGM sampling was nearly identical to that computed from 5-minute sampling (R = 0.99, 95% CI: 0.97-1.00) (Fig. S1A), and the regression using the 15‑min features yielded almost the same performance (R<sup>2</sup>  = 0.36; AIC = 321; Fig. S1B).
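
      The downsampling check can be sketched as follows. This is a hedged illustration only: it assumes AC_Var is the variance of the sample autocorrelation over a range of lags, and the lag ranges (30 lags at 5 minutes, 10 lags at 15 minutes, so that both span 150 minutes), the synthetic trace, and all function names are hypothetical choices not taken from the manuscript.

```python
import math
import statistics

def autocorr(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return num / denom

def ac_var(series, max_lag):
    """Variance of the autocorrelation over lags 1..max_lag (assumed definition)."""
    return statistics.pvariance([autocorr(series, k) for k in range(1, max_lag + 1)])

def downsample(series, step):
    """Keep every `step`-th reading, e.g. step=3 turns 5-min data into 15-min data."""
    return series[::step]

# Synthetic 24-hour trace at 5-minute sampling (288 readings); not real CGM data.
glucose_5min = [
    100 + 20 * math.sin(2 * math.pi * i / 72) + 5 * math.sin(2 * math.pi * i / 17)
    for i in range(288)
]
glucose_15min = downsample(glucose_5min, 3)

ac_var_5min = ac_var(glucose_5min, 30)    # 30 lags x 5 min = 150 min
ac_var_15min = ac_var(glucose_15min, 10)  # 10 lags x 15 min = 150 min
```

      Repeating this per participant and correlating the two sets of values would give the kind of agreement statistic quoted above.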

      (9) The identification of actionable components in glucose dynamics lays the groundwork for clinical stratification. The authors could explore the use of CGM-derived indices to develop a simple framework for stratifying risk into certain categories (e.g., low, moderate, high). This could improve clinical relevance and utility for healthcare providers.

      We appreciate the reviewer’s suggestion regarding the potential for CGM-derived indices to support clinical stratification. We completely agree with the idea that establishing risk categories (e.g., low, moderate, high) based on specific thresholds would enhance the clinical utility of these measures. However, given our current sample size limitations and our predefined objective of investigating correlations among indices, we have taken a conservative approach by focusing on the correlation between AC_Var and %NC rather than establishing definitive cutoffs. This approach intentionally avoids problematic statistical practices like p-hacking. It is not realistic to expect a single study to accomplish everything from proposing a new concept to conducting large-scale clinical trials to establishing clinical thresholds. Establishing clinical thresholds typically requires the accumulation of multiple studies over many years. Recognizing this reality, we have been careful in our manuscript to make modest claims about the discovery of new “correlations” rather than exaggerated claims about immediate routine clinical use.

      To address this limitation, we conducted a large follow-up study of over 8,000 individuals in the next study (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which proposed clinically relevant cutoffs and reference ranges for AC_Var and other CGM-derived indices. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper. However, we expect to make these measures more actionable in clinical use by integrating automated calculation tools with clear clinical thresholds.

      We have added the following text to the Discussion section to address these considerations (lines 415-419):

      While CGM-derived indices such as AC_Var and ADRR hold promise for CAD risk assessment, their complexity may present challenges for routine clinical implementation. To improve usability, we have developed a web-based calculator that automates these calculations. However, defining clinically relevant thresholds and reference ranges requires further validation in larger cohorts.

      (10) While the study acknowledges several limitations, authors should also consider explicitly addressing the potential impact of inter-individual variability in glucose metabolism (e.g., age-related changes, hormonal influences) on the findings.

      We appreciate the reviewer’s comment regarding the potential impact of interindividual variability in glucose metabolism, including age-related changes and hormonal influences, on our results. In our analysis, we found that age had minimal effects on glucose dynamics characteristics, as shown in Figure S8A. In addition, CGM-derived measures such as ADRR and AC_Var significantly contributed to the prediction of %NC independent of insulin secretion (I.I.) and insulin sensitivity (Composite index) (Fig. 2). These results suggest that our primary conclusions regarding glucose dynamics and coronary risk remain robust despite individual differences in glucose metabolism.

      To address the reviewer’s suggestion, we have added the following discussion (lines 186-188, 361-368):

      Conventional indices, including FBG, HbA1c, PG120, I.I., Composite index, and Oral DI, did not contribute significantly to the prediction compared to these CGM-derived indices.

      In our analysis of demographic factors, we found that age and gender had minimal influence on glucose dynamics characteristics (Fig. S8A, B), suggesting that our findings regarding the relationship between glucose dynamics and coronary risk are robust across different demographic groups within our dataset. Future studies involving larger and more diverse populations would be valuable to comprehensively elucidate the potential influence of age, gender, and other demographic factors on glucose dynamics characteristics and their relationship to cardiovascular risk.

      (11) It's unclear whether the identified components (value, variability, and autocorrelation) could serve as proxies for underlying physiological mechanisms, such as beta-cell dysfunction or insulin resistance. Please clarify.

      We appreciate the reviewer’s comment regarding the physiological underpinnings of the glucose components we identified. The mean, variance, and autocorrelation components we identified likely reflect specific underlying physiological mechanisms related to glucose regulation. In our previous research (Sugimoto, Hikaru, et al. “Improved detection of decreased glucose handling capacities via continuous glucose monitoring-derived indices.” Communications Medicine 5.1 (2025): 103.), we explored the relationship between glucose dynamics characteristics and glucose control capabilities using clamp tests and mathematical modelling. These investigations revealed that autocorrelation specifically shows a significant correlation with the disposition index (the product of insulin sensitivity and insulin secretion) and insulin clearance parameters.

      Furthermore, our current study demonstrates that CGM-derived measures such as ADRR and AC_Var significantly contributed to the prediction of %NC independent of established metabolic parameters including insulin secretion (I.I.) and insulin sensitivity (Composite index), as shown in Figure 2. These results suggest that the components we identified capture distinct physiological aspects of glucose metabolism beyond traditional measures of beta-cell function and insulin sensitivity. Further research is needed to fully characterize these relationships, but our results imply that these characteristics of glucose dynamics offer supplementary insight into the underlying beta-cell dysregulation that contributes to coronary plaque vulnerability.

      To address the reviewer’s suggestion, we have added the following discussion to the Results section (lines 186-188):

      Conventional indices, including FBG, HbA1c, PG120, I.I., Composite index, and Oral DI, did not contribute significantly to the prediction compared to these CGM-derived indices.

      Minor Comments:

      (1) The use of LASSO and PLS regression is appropriate, but the rationale for choosing these methods over others (e.g., Ridge regression) should be explained in greater detail.

      We appreciate the reviewer’s comment and have added the following discussion to the Methods section (lines 578-585):

      LASSO regression was chosen for its ability to perform feature selection by identifying the most relevant predictors. Unlike Ridge regression, which simply shrinks coefficients toward zero without reaching exactly zero, LASSO produces sparse models, which is consistent with our goal of identifying the most critical features of glucose dynamics associated with coronary plaque vulnerability. In addition, we implemented PLS regression as a complementary approach due to its effectiveness in dealing with multicollinearity, which was particularly relevant given the high correlation among several CGM-derived measures.
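
      The contrast between the two penalties can be made concrete in the special case of an orthonormal design, where both estimators have closed forms in terms of the ordinary least-squares coefficients. The sketch below is illustrative only; the coefficient values and penalty strength are hypothetical, and the closed forms assume the objectives ½‖y − Xβ‖² + λ‖β‖₁ (LASSO) and ½‖y − Xβ‖² + (λ/2)‖β‖² (Ridge) with XᵀX = I.

```python
def lasso_orthonormal(beta_ols, lam):
    """Soft-thresholding: coefficients with |b| <= lam become exactly zero."""
    shrunk = []
    for b in beta_ols:
        if b > lam:
            shrunk.append(b - lam)
        elif b < -lam:
            shrunk.append(b + lam)
        else:
            shrunk.append(0.0)
    return shrunk

def ridge_orthonormal(beta_ols, lam):
    """Proportional shrinkage: coefficients get smaller but stay nonzero."""
    return [b / (1.0 + lam) for b in beta_ols]

# Hypothetical OLS coefficients for three predictors.
beta_ols = [2.0, -0.3, 0.1]
lasso_coefs = lasso_orthonormal(beta_ols, lam=0.5)
ridge_coefs = ridge_orthonormal(beta_ols, lam=0.5)
```

      Here the LASSO solution retains only the first predictor, whereas the Ridge solution keeps all three with reduced magnitude, which is the sparsity property referred to above.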

      (2) While figures are well-designed, adding annotations to highlight key findings (e.g., significant contributors in factor analysis) would improve clarity.

      We appreciate the reviewer’s suggestion to improve the clarity of our figures. In the factor analysis, we decided not to include annotations because indicators such as ADRR and J-index can be associated with multiple factors, which could lead to misleading or confusing interpretations. However, in response to the suggestion, we have added annotations to the PLS analysis, specifically highlighting items with VIP values greater than 1 (Fig. 2D, S2D) to emphasize key contributors.

      (3) The term "value" as a component of glucose dynamics could be clarified. For instance, does it strictly refer to mean glucose levels, or does it encompass other measures?

      We appreciate the reviewer’s question regarding the term “value” in the context of glucose dynamics. Factor 1 was predominantly influenced by CGM_Mean, with a factor loading of 0.99, indicating that it primarily represents mean glucose levels. Given this strong correlation, we have renamed Factor 1 to “Mean” (Fig. 3A) to more accurately reflect its role in glucose dynamics.

      (4) The concept of autocorrelation may be unfamiliar to some readers. A brief, intuitive explanation with a concrete example of how it manifests in glucose dynamics would enhance understanding.

      We appreciate the reviewer’s suggestion. Autocorrelation refers to the relationship between a variable and its past values over time. In the context of glucose dynamics, it reflects how current glucose levels are influenced by past levels, capturing patterns such as sustained hyperglycemia or recurrent fluctuations. For example, if an individual experiences sustained high glucose levels after a meal, the strong correlation between successive glucose readings indicates high autocorrelation. We have included this explanation in the revised manuscript (lines 519-524) to improve clarity for readers unfamiliar with the concept. Additionally, Figure 4A shows an example of glucose dynamics with different autocorrelation.
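
      A small numerical example may help here. The sketch below, which uses hypothetical glucose values and a hypothetical helper `lag1_autocorr`, builds two series from the same six readings, so that their mean and variance are identical, and shows that only the ordering in time changes the lag-1 autocorrelation.

```python
def lag1_autocorr(series):
    """Sample autocorrelation between each reading and the next one."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[i] - mean) * (series[i + 1] - mean) for i in range(n - 1))
    return num / denom

# Same six values (mg/dL), different temporal order; values are illustrative.
sustained = [100, 100, 100, 150, 150, 150]    # one sustained excursion
alternating = [100, 150, 100, 150, 100, 150]  # rapid back-and-forth swings

r_sustained = lag1_autocorr(sustained)      # positive: neighbours resemble each other
r_alternating = lag1_autocorr(alternating)  # negative: neighbours differ
```

      The sustained pattern yields a positive lag-1 autocorrelation and the alternating pattern a negative one, even though summary statistics based on mean and variance cannot tell them apart.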

      (5) Ensure consistent use of terms like "glucose dynamics," "CGM-derived indices," and "plaque vulnerability." For instance, sometimes indices are referred to as "components," which might confuse readers unfamiliar with the field.

      We appreciate the reviewer’s comment about ensuring consistency in terminology. To avoid confusion, we have reviewed and standardized the use of terms such as “CGM-derived indices,” and “plaque vulnerability” throughout the manuscript. Additionally, while many of our measures are strictly CGM-derived indices, several “components” in our analysis include fasting blood glucose (FBG) and glucose waveforms during the OGTT. For these measures, we retained the descriptors “glucose dynamics” and “components” rather than relabelling them as CGM-derived indices.

      (6) Provide a more detailed overview of the supplementary materials in the main text, highlighting their relevance to the key findings.

      We appreciate the reviewer’s suggestion. We revised the manuscript by integrating the supplementary text into the main text (lines 129-160), which provides a clearer overview of the supplementary materials. Consequently, the Supplementary Information section now only contains supplementary figures, while their relevance and key details are described in the main text. 

      Reviewer #3 (Recommendations for the authors):

      Other Concerns:

      (1) The text states the significance of tests, however, no p-values are listed: Lines 118-119: Significance is cited between CGM indices and %NC, however, neither the text nor supplementary text have p-values. Need p-values for Figure 3C, Figure S10. When running the https://cgm-basedregression.streamlit.app/ multiple regression analysis, a p-value should be given as well. Do the VIP scores (Line 142) change with the inclusion of SBP, DBP, TG, LDL, and HDL? Do the other datasets have the same well-controlled serum cholesterol and BP levels?

      We appreciate the reviewer’s concern regarding statistical significance and the documentation of p values.

      First, given the multiple comparisons in our study, we used q values rather than p values, as shown in Figure 1D. Q values provide a more rigorous statistical framework for controlling the false discovery rate in multiple testing scenarios, thereby reducing the likelihood of false positives.
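
      For readers unfamiliar with q values, the sketch below shows one common way to obtain them, the Benjamini-Hochberg adjustment. Which procedure was used in the manuscript is not stated in this response, so the choice of method is an assumption, and the p values in the example are hypothetical.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p values (q values), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity of q.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q[idx] = running_min
    return q

# Hypothetical p values from four tests.
q_values = bh_q_values([0.01, 0.04, 0.03, 0.20])
```

      Declaring tests with q below a chosen level (for example 0.05) as discoveries controls the expected proportion of false discoveries at that level under the assumptions of the procedure.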

      Second, our statistical reporting follows established guidelines, including those of the New England Journal of Medicine (Harrington, David, et al. “New guidelines for statistical reporting in the journal.” New England Journal of Medicine 381.3 (2019): 285-286.), which recommend that “reporting of exploratory end points should be limited to point estimates of effects with 95% confidence intervals” and that “replace p values with estimates of effects or association and 95% confidence intervals”. According to these guidelines, p values should not be reported in this type of study. We determined significance based on whether these 95% confidence intervals excluded zero - a statistical method for determining whether an association is significantly different from zero (Tan, Sze Huey, and Say Beng Tan. “The correct interpretation of confidence intervals.” Proceedings of Singapore Healthcare 19.3 (2010): 276-278.).

      For the sake of transparency, we provide p values for readers who may be interested, although we emphasize that they should not be the basis for interpretation, as discussed in the referenced guidelines. Specifically, in Figure 1A-B, the p values for CGM_Mean, CGM_Std, and AC_Var were 0.02, 0.02, and <0.01, respectively, while those for FBG, HbA1c, and PG120 were 0.83, 0.91, and 0.25, respectively. In Figure 3C, the p values for factors 1–5 were 0.03, 0.03, 0.03, 0.24, and 0.87, respectively, and in Figure S8C, the p values for factors 1–3 were <0.01, <0.01, and 0.20, respectively. We appreciate the opportunity to clarify our statistical methodology and are happy to provide additional details if needed.

      We confirmed that the results of the variable importance in projection (VIP) analysis remained stable after including additional covariates, such as systolic blood pressure (SBP), diastolic blood pressure (DBP), triglycerides (TG), low-density lipoprotein cholesterol (LDL-C), and high-density lipoprotein cholesterol (HDL-C). The VIP values for ADRR, MAGE, AC_Var, and LI consistently exceeded one even after these adjustments, suggesting that the primary findings are robust in the presence of these clinical variables. We have added the following sentences in the Results and Methods section (lines 188-191, 491-494):

      Even when SBP, DBP, TG, LDL-C, and HDL-C were included as additional input variables, the results remained consistent, and the VIP scores for ADRR, AC_Var, MAGE, and LI remained greater than 1 (Fig. S2D).

      Of note, as the original reports document, the validation datasets did not specify explicit cutoffs for blood pressure or cholesterol. Consequently, they included participants with suboptimal control of these parameters.

      (2) Negative factor loadings have not been addressed and consistency in components: Figure 3, Figure S7. All the main features for value in Figure 3A are positive. However, MVALUE in S7B is very negative for value whereas the other features highlighted for value are positive. What is driving this difference? Please explain if the direction is important. Line 480 states that variables with factor loadings >= 0.30 were used for interpretation, but it appears in the text (Line 156, Figure 3) that oral DI was used for value, even though it had a -0.61 loading. Figure 3, Figure S7. HBGI falls within two separate components (value and variability). There is not a consistent component grouping. Removal of MAG (Line 185) and only MAG does not seem scientific. Did the removal of other features also result in similar or different Cronbach's ⍺? It is unclear what Figure S8B is plotting. What does each point mean?

      We appreciate the reviewer’s comment regarding the classification of CGM-derived measures into the three components: value, variability, and autocorrelation. As the reviewer correctly points out, some measures may load differently between the value and variability components in different datasets. However, we believe that this variability reflects the inherent mathematical properties of these measures rather than a limitation of our study.

      For example, the HBGI clusters differently across datasets due to its dependence on the number of glucose readings above a threshold. In populations where mean glucose levels are predominantly below this threshold, the HBGI is more sensitive to glucose variability (Fig. S3A). Conversely, in populations with a wider range of mean glucose levels, HBGI correlates more strongly with mean glucose levels (Fig. 3A). This context-dependent behaviour is expected given the mathematical properties of these measures and does not indicate an inconsistency in our classification approach.

      Importantly, our main findings remain robust: CGM-derived measures systematically fall into three components, namely value, variability, and autocorrelation. Traditional CGM-derived measures primarily reflect either value or variability, and this categorization is consistently observed across datasets. While specific indices such as HBGI may shift classification depending on population characteristics, the overall structure of CGM data remains stable.

      With respect to negative factor loadings, we agree that they may appear confusing at first. However, in the context of exploratory factor analysis, the magnitude, or absolute value, of the loading is most critical for interpretation, rather than its sign. Following established practice, we considered variables with absolute loadings of at least 0.30 to be meaningful contributors to a given component. Accordingly, although the oral DI had a negative loading of –0.61, its absolute magnitude exceeded the threshold of 0.30, so it was considered in our interpretation of the “value” component. Regarding the reviewer’s observation that MVALUE in Figure S7B shows a strongly negative loading while other indices in the same component show positive loadings, we believe this reflects the relative orientation of the factor solution rather than a substantive difference in interpretation. In factor analysis, the direction of factor loadings is arbitrary: multiplying all the loadings for a given factor by –1 would not change the factor’s statistical identity. Therefore, the important factor is not whether a variable loads positively or negatively but rather the strength of its association with the latent component (i.e., the absolute value of the loading).
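
      The sign indeterminacy described above can be checked numerically. In the sketch below, with hypothetical loadings and helper names, multiplying one factor's column of loadings by –1 leaves the covariance implied by the common factors (the product of the loading matrix with its transpose) unchanged.

```python
def implied_covariance(loadings):
    """Common-factor part of the covariance: L times L-transpose."""
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in loadings]
        for row_i in loadings
    ]

def flip_factor(loadings, factor_index):
    """Multiply every loading on one factor by -1."""
    return [
        [-value if k == factor_index else value for k, value in enumerate(row)]
        for row in loadings
    ]

# Hypothetical loadings: three variables (rows) on two factors (columns).
loadings = [
    [0.9, 0.1],
    [0.8, -0.2],
    [-0.6, 0.3],
]
flipped = flip_factor(loadings, 0)
same_fit = implied_covariance(loadings) == implied_covariance(flipped)
```

      Because the fitted covariance is identical, the data cannot distinguish the two orientations; only the relative signs of variables within a factor carry information.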

      The rationale for removing MAG was based on statistical and methodological considerations. As is common practice in reliability analyses, we examined whether Cronbach’s α would improve if we excluded items with low factor loadings or weak item–total correlations. In the present study, we recalculated Cronbach’s α after removing the MAG item because it had a low loading. Its exclusion did not substantially affect the theoretical interpretation of the factor, which we conceptualize as “secretion” (without CGM). MAG’s removal alone is scientifically justified because it was the only item whose exclusion improved Cronbach's α while preserving interpretability. In contrast, removing other items would have undermined the conceptual clarity of the factor or would not have meaningfully improved α. Furthermore, the MAG item has a high factor 2 loading.
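
      The "α if item deleted" check mentioned above can be sketched as follows. The scores are hypothetical and the helper names are illustrative; the formula is the standard Cronbach's α, k/(k − 1) · (1 − Σ item variances / variance of the total score).

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of columns, one list of scores per item."""
    k = len(items)
    item_variance_sum = sum(statistics.variance(column) for column in items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per participant
    return k / (k - 1) * (1 - item_variance_sum / statistics.variance(totals))

def alpha_if_deleted(items):
    """Alpha recomputed after dropping each item in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Hypothetical scores for five participants on three items.
items = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [5, 1, 4, 2, 3],  # poorly related to the other two items
]
alpha_all = cronbach_alpha(items)
alpha_without = alpha_if_deleted(items)
```

      In this toy example, dropping the third item raises α, which is the pattern that would motivate removing a single weakly related item.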

      Each point in Figure S8B (old version) corresponds to an individual participant.

      To address these considerations, we have added the following text to the Discussion and Methods sections (lines 388-396, 600-601) and to the legend of Figure S6B (current version):

      Some indices, such as HBGI, showed variation in classification across datasets, with some populations showing higher factor loadings in the “mean” component and others in the “variance” component. This variation occurs because HBGI calculations depend on the number of glucose readings above a threshold. In populations where mean glucose levels are predominantly below this threshold, the HBGI is more sensitive to glucose variability (Fig. S5A). Conversely, in populations with a wider range of mean glucose levels, the HBGI correlates more strongly with mean glucose levels (Fig. 3A). Despite these differences, our validation analyses confirm that CGM-derived indices consistently cluster into three components: mean, variance, and autocorrelation.

      Variables with absolute factor loadings of ≥ 0.30 were used in interpretation.

      Box plots comparing factors 1 (Mean), 2 (Variance), and 3 (Autocorrelation) between individuals without (-) and with (+) diabetic macrovascular complications. Each point corresponds to an individual. The boxes represent the interquartile range, with the median shown as a horizontal line. Mann–Whitney U tests were used to assess differences between groups, with P values < 0.05 considered statistically significant.

      Minor Concerns:

      (1) NGT is not defined.

      We thank the reviewer for pointing out that the term “NGT” was not clearly defined in the original manuscript. We have added the following text to the Methods section (lines 447-451):

      T2DM was defined as HbA1c ≥ 6.5%, fasting plasma glucose (FPG) ≥ 126 mg/dL or 2‑h plasma glucose during a 75‑g OGTT (PG120) ≥ 200 mg/dL. IGT was defined as HbA1c 6.0–6.4%, FPG 110–125 mg/dL or PG120 140–199 mg/dL. NGT was defined as values below all prediabetes thresholds (HbA1c < 6.0%, FPG < 110 mg/dL and PG120 < 140 mg/dL).
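
      These definitions translate directly into a decision rule. The sketch below is one possible encoding; the function name is hypothetical, and it assumes that the T2DM criteria are checked first and that any value reaching a prediabetes threshold without meeting a T2DM criterion is classified as IGT.

```python
def classify_glucose_tolerance(hba1c, fpg, pg120):
    """Classify as 'T2DM', 'IGT' or 'NGT'.

    hba1c in %, fpg and pg120 in mg/dL. T2DM criteria take precedence
    (an assumption about how overlapping criteria are resolved).
    """
    if hba1c >= 6.5 or fpg >= 126 or pg120 >= 200:
        return "T2DM"
    if hba1c >= 6.0 or fpg >= 110 or pg120 >= 140:
        return "IGT"
    return "NGT"

# Illustrative inputs, not study data.
examples = [
    classify_glucose_tolerance(5.6, 95, 120),   # all below prediabetes thresholds
    classify_glucose_tolerance(6.1, 100, 130),  # HbA1c in the 6.0-6.4% range
    classify_glucose_tolerance(5.8, 130, 150),  # FPG at or above 126 mg/dL
]
```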

      (2) Is it necessary to list the cumulative percentage (Line 173), it could be clearer to list the percentage explained by each factor instead.

      We appreciate the reviewer’s suggestion to list the percentage explained by each factor rather than the cumulative percentage for improved clarity. According to the reviewer’s suggestion, we have revised the results to show the individual contribution of each factor (39%, 21%, 10%, 5%, 5%) rather than the cumulative percentages (39%, 60%, 70%, 75%, 80%) that were previously listed (lines 220-221).
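
      The conversion between the two presentations is a simple differencing step, shown here with the percentages quoted above (variable names are illustrative).

```python
from itertools import accumulate

# Cumulative percentage of variance explained by factors 1-5 (from the text above).
cumulative = [39, 60, 70, 75, 80]

# Individual contribution of each factor: difference between consecutive cumulative values.
individual = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

# Summing the individual contributions recovers the cumulative values.
recovered = list(accumulate(individual))
```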

      (3) Figure S10. How were the coefficients generated for Figure S10? No methods are given.

      We conducted a multiple linear regression analysis in which time in range (TIR) was the dependent variable and the factor scores corresponding to the first three latent components (factor 1 representing the mean, factor 2 representing the variance, and factor 3 representing the autocorrelation) were the independent variables. We have added the following text to the figure legend (Fig. S8C) to provide a more detailed description of how the coefficients were generated:

      Comparison of predicted Time in range (TIR) versus measured TIR using multiple regression analysis between TIR and factor scores in Figure 3. In this analysis, TIR was the dependent variable, and the factor scores corresponding to the first three latent components (factor 1 representing the mean, factor 2 representing the variance, and factor 3 representing the autocorrelation) were the independent variables. Each point corresponds to the values for a single individual.

      (4) In https://cgm-basedregression.streamlit.app/, more explanation should be given about the output of the multiple regression. Regression is spelled incorrectly on the app.

      We thank the reviewer for pointing out the need for a clearer explanation of the multiple regression analysis presented in the online tool (https://cgmregressionapp2.streamlit.app/). We have added a description of the regression and corrected the typographical error in the spelling of “regression” within the app.

      (5) The last section of results (starting at line 225) appears to be unrelated to the goal of predicting %NC.

      We appreciate the reviewer’s feedback regarding the relevance of the simulation component of our manuscript. The primary contribution of our study goes beyond demonstrating correlations between CGM-derived measures and %NC; it highlights three fundamental components of glycemic patterns (mean, variance, and autocorrelation) and their independent relationships with coronary plaque characteristics. The simulations are included to illustrate how glycemic patterns with identical means and variability can have different autocorrelation structures. As reviewer 2 pointed out in minor comment #4, temporal autocorrelation can be difficult to interpret, so these visualizations were intended to provide intuitive examples for readers.

      However, we agree with the reviewer’s concern about the coherence of the manuscript. In response, we have streamlined the simulation section by removing technical simulations that do not directly support our primary conclusions (old version of the manuscript, lines 239-246, 502-526), while retaining only those that enhance understanding of the three glycemic components (Fig. 4A).

      (6) Figure S2. The R2 should be reported.

      We thank the reviewer for suggesting that we report R² in Figure S2. In the revised version, we have added the correlation coefficients and their 95% confidence intervals to Figure 1E.

      (7) Multiple panels have a correlation line drawn with a slope of 1, which does not reflect the data or the r^2 listed. This should be fixed.

      We appreciate the reviewer’s concern that several panels included regression lines with a fixed slope of one that did not reflect the associated R² values. We have corrected Figures 1A–C and 3C to display regression lines representing the estimated slopes derived from the regression analyses.

    1. eLife Assessment

      This study delivers valuable new insights into the neural circuits involved in post-mating responses (PMR) in Drosophila females, supported by convincing evidence that the circuits for mating receptivity and egg-laying are distinct. The new experimental evidence adds to the current understanding of the neural circuits and molecular mechanisms underpinning PMR.

    2. Reviewer #1 (Public review):

      Summary:

      Authors explore how sex-peptide (SP) affects post-mating behaviours in adult females, such as receptivity and egg laying. This study identifies different neurons in the adult brain and the VNC that become activated by SP, largely by using an intersectional gene expression approach (split-GAL4) to narrow down the specific neurons involved. They confirm that SP binds to the well-known Sex Peptide Receptor (SPR), initiating a cascade of physiological and behavioural changes related to receptivity and egg laying.

      Comments on revised version:

      The authors have substantially strengthened the manuscript in response to our main concerns.

      In particular, they now explicitly test multiple established PMR nodes (including SAG/SPSN as well as pC1, OviDN/OviEN/OviIN and vpoDN), which helps separate direct SP targets from downstream PMR circuitry and supports their interpretation that some of these known nodes can affect receptivity without necessarily inducing oviposition. They also addressed key technical/clarity points: the requested head/trunk expression controls are provided (Suppl Fig S1), and the VT003280 annotation is corrected (now FD6 rather than "SAG driver"). Overall, these additions make the central conclusion, that distinct CNS neuron subsets ("SPRINz") are sufficient to elicit PMR components, more convincing, and the added comparisons with genital tract expressing lines further argue against a simple "periphery only" explanation.

    3. Reviewer #2 (Public review):

      Sex peptide (SP) transferred during mating from male to female induces various physiological responses in the receiving female. Among those, the increase in oviposition and decrease in sexual receptivity are very remarkable. Naturally, a long-standing and significant question is the identity of the sex peptide target neurons that express the SP receptor and underlie these responses. Identification of these neurons will eventually lead to the identification of the underlying neuronal circuitry.

      The Soller lab has addressed this important question already several years ago (Haussmann et al. 2013), using relevant GAL4-lines and membrane-tethered SP. The results already showed that the action of SP on receptivity and oviposition is mediated by different neuronal subsets and hence can be separated. The GAL4-lines used at that time were, however, broad, and the individual identity of the relevant neurons remained unclear.

      In the present paper, Nallasivan and colleagues carried this analysis a significant step further, using new intersectional approaches and transsynaptic tracing.

      Strength:

      The intersectional approach is appropriate and state-of-the-art. The analysis is a very comprehensive tour-de-force and experiments are carefully performed to a high standard. The authors also produced a useful new transgenic line (UAS-FRTstopFRT mSP). The finding that neurons in the brain (head) mediate the SP effect on receptivity, while neurons in the abdomen and thorax (ventral nerve cord or peripheral neurons) mediate the SP effect on oviposition, is a significant step forward in the endeavour to identify the underlying neuronal networks and hence a mechanistic understanding of SP action. The analysis identifies a small set of neurons underlying SP responses. Some are part of the post-mating circuitry and influence receptivity, while others are likely involved in higher order sensory processing. Though these results are not entirely unexpected, they are novel and represent a significant step forwards as the analysis is at a much higher resolution than previous work.

      Weakness:

      Though the analysis is at a much higher resolution than previous work on SP targets, it does not yet reach the resolution of single neuronal cell types. The last paragraph in the discussion rightfully speculates about the neurochemical identity of some of the intersection neurons (e.g. dopaminergic P1 neurons, NPF neurons). These suggested identities could have been confirmed by straightforward immunostainings against NPF or TH, for which antisera are available. Moreover, specific GAL4 lines for NPF or P1 or at least TH neurons are available which could be used to express mSP to test whether SP activation of those neurons is sufficient to trigger the SP effect. Moreover, the conclusion that SP target neurons operate as key integrators of sensory information for decision of behavioural outputs needs further experimental confirmation.

    4. Reviewer #3 (Public review):

      Summary:

      This paper reports new findings regarding neuronal circuitries responsible for female post-mating responses (PMRs) in Drosophila. The PMRs are induced by sex peptide (SP) transferred from males during mating. The authors sought to identify SP target neurons using a membrane-tethered SP (mSP) and a collection of GAL4 lines, each containing a fragment derived from the regulatory regions of the SPR, fru, and dsx genes involved in PMR. They identified several lines that induced PMR upon expression of mSP. Using split-GAL4 lines, they identified distinct SP-sensing neurons in the central brain and ventral nerve cord. Analyses of pre- and post-synaptic connection using retro- and trans-Tango placed SP target neurons at the interface of sensory processing interneurons that connect to two common post-synaptic processing neuronal populations in the brain. The authors proposed that SP interferes with the processing of sensory inputs from multiple modalities.

      Strengths:

      Besides the main results described in the summary above, the authors discovered the following:

      (1) Reduction of receptivity and induction of egg-laying are separable by restricting the expression of membrane-tethered SP (mSP): head-specific expression of mSP induces reduction of receptivity only, whereas trunk-specific expression of mSP induces oviposition only. Also, they identified a GAL4 line (SPR12) that induced egg laying but did not reduce receptivity.

      (2) Expression of mSP in the genital tract sensory neurons does not induce PMR. The authors identified three GAL4 drivers (SPR3, SPR 21, and fru9), which robustly expressed mSP in genital tract sensory neurons but did not induce PMRs. Also, SPR12 does not express in genital tract neurons but induces egg laying by expressing mSP.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Areas of improvement and suggestions:

      (1) "These results suggest the SP targets interneurons in the brain that feed into higher processing centers from different entry points likely representing different sensory input" and "All together, these data suggest that the abdominal ganglion harbors several distinct type of neurons involved in directing PMRs"

      The characterization of the post-mating circuitry has been largely described by the group of Barry Dickson and other labs. I suggest ruling out a potential effect of mSP in any of the well-known post-mating neuronal circuitry, i.e., SPSN, SAG, pC1, vpoDN or OviDN neurons. A combination of available split-Gal4 lines should be sufficient to prove this.

      We agree that this information is important to distinguish neurons which are direct SP targets from neurons which are involved in directing reproductive behaviors. We have now tested drivers for these neurons and added these data in Fig 3 (SAG neurons) and as Suppl Figs S4 (SPSN and genital tract neuron drivers SPR3 and SPR21), Suppl Fig S6 (overlap in single cell expression atlas), Suppl Fig S7 (overlap of SPSN split drivers with SPR8, fru11/12 and dsx split drivers in the brain inducing PMRs) and Suppl Fig S9 (pC1, OviDNs, OviENs, OviINs and vpoDN).  

      The newly added data are in full support of our conclusion that SP targets central nervous system neurons, which we termed SP Response Inducing Neurons (SPRINz). In particular, we find lines that express in genital tract neurons, but do not induce an SP response (Supp Figs S4, S7 and S10) or do not express in genital tract neurons and induce an SP response (Fig 2 and Supp Fig S2).

      We have analysed the expression of SPSN in the brain and VNC and find expression in few neurons (Suppl Fig S4). This result is consistent with expression of the genes driving SPSN expression in the single cell expression atlas indicating overlap of expression in very few neurons (Suppl Fig S6). We have already shown that FD6 (VT003280) which is part of the SPSN splitGal4 driver, expresses in the brain and VNC and can induce PMRs from SP expression (Fig 4).

      We have taken this further to test another SPSN driver (VT058873) in combination with SPR8, fru11/12 and dsx and find PMRs induced by mSP expression (Suppl Fig S7). Moreover, if we restrict expression of mSP to the brain with otdflp we can induce PMRs from mSP expression and obtain the same response by activating these brain neurons (Suppl Fig S7). We note that the VT058873 ∩ fru11/12 intersection in combination with otdflp stopmSP or stopTrpA1 in the head, did not result in PMRs. Here, PMR inducing neurons likely reside in the VNC, but currently no tools are available to test this further.

      We further tested pC1, OviDNs, OviENs, OviINs and vpoDN for induction of PMRs from expression of mSP. We are pleased to see that OviEN-SS2s, OviIN-SS1 and vpoDN splitGal4 drivers can reduce receptivity, but not induce oviposition (Suppl Fig S8). We predicted such drivers based on previously published data (Haussmann et al. 2013), which we now validated.

      (2) Authors must show how specific is their "head" (elav/otd-flp) and "trunk" (elav/tsh) expression of mSP by showing images of the same constructs driving GFP.

      The expression pattern for tshGAL, which expresses in the trunk, is already published (Soller et al., 2006). We have added images for “head” expression for tshGAL and adjusted our statement to say that expression is predominantly in the VNC (Suppl Fig 1).

      (3) VT3280 is termed as a SAG driver. However, VT3280 is a SPSN specific driver (Feng et al., 2014; Jang et al., 2017; Scheunemann et al., 2019; Laturney et al., 2023). The authors should clarify this.

      According to the reviewer's suggestion, we have clarified the specificity of VT003280 and now say that this is FD6.

      (4) Intersectional approaches must rule out the influence of SP on sex-peptide sensing neurons (SPSN) in the ovary by combining their constructs with an SPSN-Gal80 construct. In line with this, most of their lines target the SAG circuit (4I, J and K). Again, here they need to rule out the involvement of SPSN in their receptivity/egg laying phenotypes, especially because "In the female genital tract, these split-Gal4 combinations show expression in genital tract neurons with innervations running along oviduct and uterine walls (Figures S3A-S3E)".

      We agree with this reviewer that we need a higher resolution of expression to only one cell type. However, this is a major task that we will continue in follow up studies.

      In principle, use of GAL80 is a valid approach to restrict expression, if levels of GAL80 are higher than those of GAL4, because GAL80 binds GAL4 to inhibit its activity. Hence, if levels of GAL80 are lower, results could be difficult to interpret.

      (5) The authors separate head (brain) from trunk (VNC) responses, but they don't narrow down the neural circuits involved in each response. A detailed characterization of the involved circuits, especially in the case of the VNC, is needed to show (a) that the intersectional approach is indeed labelling distinct subtypes and (b) how these distinct neurons influence oviposition.

      Again, we agree with this reviewer that we need a higher resolution of expression to only one cell type. However, this is a major task that we will continue in follow up studies.

      Reviewer #2 (Public Review):

      Strength:

      The intersectional approach is appropriate and state-of-the-art. The analysis is a very comprehensive tour-de-force and experiments are carefully performed to a high standard. The authors also produced a useful new transgenic line (UAS-FRTstopFRT mSP). The finding that neurons in the brain (head) mediate the SP effect on receptivity, while neurons in the abdomen and thorax (ventral nerve cord or peripheral neurons) mediate the SP effect on oviposition, is a significant step forward in the endeavour to identify the underlying neuronal networks and hence a mechanistic understanding of SP action. Though this result is not entirely unexpected, it is novel as it was not shown before.

      We thank reviewer 2 for recognizing the advance of our work.

      Weakness:

      Though the analysis identifies a small set of neurons underlying SP responses, it does not go the last step to individually identify at least a few of them. The last paragraph in the discussion rightfully speculates about the neurochemical identity of some of the intersection neurons (e.g. dopaminergic P1 neurons, NPF neurons). At least these suggested identities could have been confirmed by straightforward immunostainings against NPF or TH, for which antisera are available. Moreover, specific GAL4 lines for NPF or P1 or at least TH neurons are available which could be used to express mSP to test whether SP activation of those neurons is sufficient to trigger the SP effect.

      We appreciate this reviewer's recognition of our previous work showing that receptivity and oviposition are separable. As pointed out, we have now gone one step further and identified, in a tour de force approach, subsets of neurons in the brain and VNC.

      We agree with this reviewer that we need a higher resolution of expression to only one cell type. As pointed out by this reviewer, determining the neurochemical identity is an excellent suggestion and will help to further restrict expression to just one type of neuron. However, this is a major task that we will continue in follow up studies.

      Reviewer #3 (Public Review):

      Strengths:

      Besides the main results described in the summary above, the authors discovered the following:

      (1) Reduction of receptivity and induction of egg-laying are separable by restricting the expression of membrane-tethered SP (mSP): head-specific expression of mSP induces reduction of receptivity only, whereas trunk-specific expression of mSP induces oviposition only. Also, they identified a GAL4 line (SPR12) that induced egg laying but did not reduce receptivity.

      (2) Expression of mSP in the genital tract sensory neurons does not induce PMR. The authors identified three GAL4 drivers (SPR3, SPR 21, and fru9), which robustly expressed mSP in genital tract sensory neurons but did not induce PMRs. Also, SPR12 does not express in genital tract neurons but induces egg laying by expressing mSP.

      We thank reviewer 3 for recognizing these two important points regarding the SP response that point to a revised model for how the underlying circuitry induces the post-mating response. To further substantiate these findings, we have now added a splitGal4 nSyb ∩ ppk which expresses in genital tract neurons, but does not induce PMRs from mSP expression.

      Weaknesses:

      (1) Intersectional expression involving ppk-GAL4-DBD was negative in all GAL4AD lines (Supp. Fig.S5). As the authors mentioned, ppk neurons may not intersect with SPR, fru, dsx, and FD6 neurons in inducing PMRs by mSP. However, since there was no PMR induction and no GAL4 expression at all in any combination with GAL4-AD lines used in this study, I would like to have a positive control, where intersectional expression of mSP in ppk-GAL4-DBD and other GAL4-AD lines (e.g., ppk-GAL4-AD) would induce PMR.

      We have added a positive control for ppk expression by combining the ppk-DBD line with an nSyb-AD, which expresses in all neurons (Supp Fig S8). This experiment confirms our previous observations that ppk splitGal4 in combination with other drivers does not induce an SP response despite driving expression in genital tract neurons. We have expanded the discussion section to point out that we have identified additional cells in the brain expressing ppkGAL4, but expression of split-GAL4 ppk is absent in these cells. Part of this work has previously been published (Nallasivan et al. 2021). Accordingly, we amended the text to say when expression was achieved with ppkGAL or ppk splitGAL4.

      (2) The results of SPR RNAi knock-down experiments are inconclusive (Figure 5). SPR RNAi cancelled the PMR in dsx ∩ fru11/12 and partially in SPR8 ∩ fru 11/12 neurons. SPR RNAi in dsx ∩ SPR8 neurons turned virgin females unreceptive; it is unclear whether SPR mediates the phenotype in SPR8 ∩ fru 11/12 and dsx ∩ SPR8 neurons.

      We agree with this reviewer that the interpretation of the SPR RNAi results is complicated by the fact that SP has additional receptors (Haussmann et al 2013). The results are conclusive for all three intersections when expressing UAS mSP in SPR RNAi with respect to oviposition, i.e., egg laying is not induced in the absence of SPR. For receptivity, the results are conclusive for dsx ∩ fru11/12 and partially for SPR8 ∩ fru 11/12.

      Potentially, SPR RNAi knock-down does not sufficiently reduce SPR levels to completely reduce receptivity in some intersection patterns, likely also because splitGal4 expression is less efficient.

      Why SPR RNAi in dsx ∩ SPR8 neurons turned virgin females unreceptive is unclear, but we anticipate that we need a higher resolution of expression to only one cell type to resolve this unexpected result. However, this is a major task that we will continue in follow up studies.

      SPR RNAi knock-down experiments may also help clarify whether mSP worked autocrine or juxtacrine to induce PMR. mSP may produce juxtacrine signaling, which is cell non-autonomous.

      Whether membrane-tethered SP induces the response in an autocrine manner is an important aspect in the interpretation of the results from mSP expression.

      Removing SPR by SPR RNAi and expression of mSP in the same neurons did not induce egg laying for all three intersections and did not reduce receptivity for dsx ∩ fru11/12 and for SPR8 ∩ fru 11/12. Accordingly, we can conclude that for these neurons the response is induced in an autocrine manner.

      We have added this aspect to the discussion section.

  2. Dec 2025
    1. eLife Assessment

      This study investigates the function of Chi3l1 in hepatic macrophages in the context of MASLD, providing useful insights at a time when the distinct roles of Kupffer cells or monocyte-derived macrophages in this disease remain incompletely defined. The data suggest that CHI3L1 in Kupffer cells modulates glucose handling in obesity and mitigates systemic metabolic dysfunction and hepatic steatosis during high-fat, high-fructose feeding. However, the loss-of-function studies employing Kupffer cell-restricted versus pan-myeloid Cre lines are not sufficient to support the assertion that CHI3L1 activity is confined to resident Kupffer cells. Additionally, the flow-cytometric analyses reveal a modest depletion of Kupffer cells and no recruitment of TIM4low monocyte-derived macrophages, indicating that the system reflects simple steatosis rather than substantial macrophage turnover or niche remodelling. While the findings are intriguing, further experimentation is required to clarify the cellular specificity and mechanistic basis of the phenotypes observed.

    2. Reviewer #1 (Public review):

      The manuscript by Shan et al seeks to define the role of the CHI3L1 protein in macrophages during the progression of MASH. The authors argue that the Chil1 gene is expressed highly in hepatic macrophages. Subsequently, they use Chil1 flx mice crossed to Clec4F-Cre or LysM-Cre to assess the role of this factor in the progression of MASH using a high-fat, high-fructose diet (HFFC). They found that loss of Chil1 in KCs (Clec4F Cre) leads to enhanced KC death and worsened hepatic steatosis. Using scRNA seq they also provide evidence that loss of this factor promotes gene programs related to cell death. From a mechanistic perspective they provide evidence that CHI3L serves as a glucose sink and thus loss of this molecule enhances macrophage glucose uptake and susceptibility to cell death. Using a bone marrow macrophage system and KCs they demonstrate that cell death induced by palmitic acid is attenuated by the addition of rCHI3L1. While the article is well written and potentially highlights a new mechanism of macrophage dysfunction in MASH, and the authors have addressed some of my concerns, there are some concerns about the current data that continue to limit my enthusiasm for the study. Please see my specific comments below.

      Major:

      (1) The authors' interpretation of the results from the KC (Clec4F) and MdM KO (LysM-Cre) experiments is flawed. The authors have added new data that suggests LysM-Cre only leads to a 40% reduction of Chil1 in KCs and that this explains the difference in the phenotype compared to the Clec4F-Cre. However, this claim would be made stronger using flow sorted TIM4hi KCs, as the plating method can lead to heterogeneous populations and thus an underestimation of knockdown by qPCR. Moreover, in the supplemental data the authors show that Clec4f-Cre x Chil1flx leads to a significant knockdown of this gene in BMDMs. As BMDMs do not express Clec4f, this data calls into question the rigor of the data. I am still concerned that the phenotype differences between Clec4f-Cre and LysM-Cre are not related to the degree of knockdown in KCs but rather to some other aspect of the model (microbiota etc). It would be more convincing if the authors could show the CHI3L reduction via IF in the tissue of these mice.

      (2) Figure 4 suggests that KC death is increased with KO of Chil1. The authors have added new data with TIM4 that better characterizes this phenotype. The lack of TIM4 low, F4/80 hi cells further supports that their diet model is not producing any signs of the inflammatory changes that occur with MASLD and MASH. This is also supported by no meaningful changes in the CD11b hi, F4/80 int cells (that are predominantly monocytes and early MdMs). It is also concerning that loss of KCs does not lead to an increase in Mo-KCs as has been demonstrated in several studies (PMID: 37639126, PMID: 33997821). This would suggest that the degree of resident KC loss is trivial.

      (3) The authors demonstrated that Clec4f-Cre itself was not responsible for the observed phenotype, which mitigates my concerns about this influencing their model.

      (4) I remain somewhat concerned about the conclusion that Chil1 is highly expressed in liver macrophages. The authors agree that mRNA levels of this gene are hard to see in the datasets; however, they argue that IF demonstrates clear evidence of the protein, CHI3L. The IF in the paper only shows a high-power view of one KC. I would like to see what percentage of KCs express CHI3L and how this changes with HFHC diet. In addition, showing the knockout IF would further validate the IF staining patterns.

      Minor:

      (1) The authors have answered my question about liver fibrosis. In line with their macrophage data their diet model does not appear to induce even mild MASH.

    3. Reviewer #2 (Public review):

      In the revised version of the manuscript, the authors have attempted to address my questions, however, a number of my original concerns still remain.

      Firstly, I had asked for a validation of the different Cre lines used - LysM and Clec4f. The authors have now looked at BMDMs and KCs (steady state) from these animals. They conclude LysM only targets BMDMs not KCs, while Clec4f targets both KCs and BMDMs. This I do not understand: BMDMs do not express Clec4f, so why are they targeted with this Cre? Additionally, BMDMs are not the correct control here; rather, the authors should look at the incoming moMFs in the livers of these mice in the MASLD setting. Similarly, the KO in the MASLD KCs should be verified.

      Then I had asked for validation of macrophage expression of Chil1 in other MASLD human and mouse datasets. The authors have looked into this, but the data provided do not suggest it is highly expressed by these cells either in the other mouse models or in the human. Nevertheless, they include a statement suggesting a similar expression pattern (although also being expressed by other cells). This is not an accurate discussion of the data and hence must be revised. This also prompted me to take another look at their data and this has left me querying the data in Figure 1D. Is the percent expressed 1%? In Figure 1C the scale goes from 0-100 but here 0-1. If we are talking about expression in 1% of cells which would fit with the additional public mouse data now analysed then how relevant are any of these claims? How sure are the authors that the effects seen are through KCs/moMFs? In figure 1D all cells profiled by scRNA-seq should be shown not just MFs to get a better sense of this data. What is macrophage expression of Chil1 compared with all other liver cells?

      The cell death data had also previously concerned me, as 40-60% of KCs were TUNEL +ve. I do not understand how 60% are +ve at 8 weeks but then they have more or less the same number of TIM4+ cells at 16 weeks. How can this be? Why do the TUNEL +ve cells not die? This concern remains, as I don't understand how they reached these numbers given the images. Additionally, larger images were not provided to be sure that the images in the figure are representative. Now in the images provided, there are clearly cells which are TIM4+ where the TUNEL does not overlap; likely it is in an LSEC or other neighbouring cell. Indeed, also taking Fig S11b as an example, there are ~7 KCs and at best 1 is TUNEL +ve, so how do they get to 60%?

    4. Reviewer #3 (Public review):

      This paper investigates the role of Chi3l1 in regulating the fate of liver macrophages in the context of metabolic dysfunction leading to the development of MASLD. I do see value in this work, but some issues exist that should be addressed as well as possible.

      Here are my comments:

      (1) Chi3l1 has been linked to macrophage functions in MASLD/MASH, acute liver injury, and fibrosis models before (e.g., PMID: 37166517), which limits the novelty of the current work. It has even been linked to macrophage cell death/survival (PMID: 31250532) in the context of fibrosis, which is a main observation from the current study.

      (2) The LysCre-experiments differ from experiments conducted by Ariel Feldstein's team (PMID: 37166517). What is the explanation for this difference? - The LysCre system is neither specific to macrophages (it also depletes in neutrophils, etc), nor is this system necessarily efficient in all myeloid cells (e.g., Kupffer cells vs other macrophages). The authors need to show the efficacy and specificity of the conditional KO regarding Chi3l1 in the different myeloid populations in the liver and the circulation.

      (3) The conclusions are exclusively based on one MASLD model. I recommend confirming the key findings in a second, ideally a more fibrotic, MASH model.

      (4) Very few human data are being provided (e.g., no work with own human liver samples, work with primary human cells). Thus, the translational relevance of the observations remains unclear.

      Comments on revisions:

      The authors have done a thorough job addressing my comments. However, I am not convinced about the MCD diet model, which is somewhat hidden in the Supplementary Files. MASH does not appear to differ, nor are any fibrosis data shown to support the conclusions. I am not satisfied with this part of the revised manuscript, and I do not agree that the second MASH model would support the conclusions.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      The manuscript by Shan et al seeks to define the role of the CHI3L1 protein in macrophages during the progression of MASH. The authors argue that the Chil1 gene is expressed highly in hepatic macrophages. Subsequently, they use Chil1 flx mice crossed to Clec4F-Cre or LysM-Cre to assess the role of this factor in the progression of MASH using a high-fat, high-cholesterol diet (HFHC). They found that loss of Chil1 in KCs (Clec4F Cre) leads to enhanced KC death and worsened hepatic steatosis. Using scRNA seq, they also provide evidence that loss of this factor promotes gene programs related to cell death. From a mechanistic perspective, they provide evidence that CHI3L serves as a glucose sink and thus loss of this molecule enhances macrophage glucose uptake and susceptibility to cell death. Using a bone marrow macrophage system and KCs they demonstrate that cell death induced by palmitic acid is attenuated by the addition of rCHI3L1. While the article is well written and potentially highlights a new mechanism of macrophage dysfunction in MASH, there are some concerns about the current data that limit my enthusiasm for the study in its current form. Please see my specific comments below.

      (1) The authors' interpretation of the results from the KC (Clec4F) and MdM KO (LysM-Cre) experiments is flawed. For example, in Figure 2 the authors present data that knockout of Chil1 in KCs using Clec4f Cre produces worse liver steatosis and insulin resistance. However, in supplemental Figure 4, they perform the same experiment in LysM-Cre mice and find a somewhat different phenotype. The authors appear to be under the impression that LysM-Cre does not cause recombination in KCs and therefore interpret this data to mean that Chil1 is relevant in KCs and not MdMs. However, LysM-Cre DOES lead to efficient recombination in KCs and therefore Chil1 expression will be decreased in both KCs and MdM (along with PMNs) in this line.

      Therefore, a phenotype observed with KC-KO should also be present in this model unless the authors argue that loss of Chil1 from the MdMs has the opposite phenotype of KCs and therefore attenuates the phenotype. The Cx3Cr1 CreER tamoxifen inducible system is currently the only macrophage Cre strategy that will avoid KC recombination. The authors need to rethink their results with the understanding that Chil1 is deleted from KCs in the LysM-Cre experiment. In addition, it appears that only one experiment was performed, with only 5 mice in each group for both the Clec4f and LysM-Cre data. This is generally not enough to make a firm conclusion for MASH diet experiments.

      We thank the reviewer for raising this important point regarding our data interpretation. We have carefully examined the deletion efficiency of Chi3l1 in primary Kupffer cells (KCs) from Lyz2<sup>∆Chil1</sup> (LysM-Cre) mice. Our results show roughly a 40% reduction in Chi3l1 expression at both the mRNA and protein levels (Revised Manuscript, Figure S7B and C). Given this modest decrease, Chi3l1 deletion in KCs of Lyz2<sup>∆Chil1</sup> mice was incomplete, which likely accounts for the phenotypic differences observed between Clec4f<sup>∆Chil1</sup> and Lyz2<sup>∆Chil1</sup> mice in the MASLD model.

      Furthermore, we have increased the sample size in both the Clec4f- and LysM-Cre experiments to 9–12 mice per group following the HFHC diet, thereby strengthening the statistical power and reliability of our findings (Revised Figures 2 and S8).

      (2) The mouse weight gain is missing from Figure 2 and Supplementary Figure 4. This data is critical to interpret the changes in liver pathology, especially since they have worse insulin resistance.

      We thank the reviewer for this valuable comment. We have now included the mouse body weight data in the revised manuscript (Figure 2A, B and Figures S8A, B). Compared with mice on a normal chow diet (NCD), all groups exhibited progressive weight gain during HFHC diet feeding. Notably, Clec4f<sup>∆Chil1</sup> mice gained significantly more body weight than Chil1<sup>fl/fl</sup> controls, whereas Lyz2<sup>∆Chil1</sup> mice showed a similar weight gain trajectory to Chil1<sup>fl/fl</sup> mice under the same conditions.

      (3) Figure 4 suggests that KC death is increased with KO of Chil1. However, this data cannot be concluded from the plots shown. In Supplementary Figure 6 the authors provide a more appropriate gating scheme to quantify resident KCs that includes TIM4. The TIM4 data needs to be shown and quantified in Figure 4. As shown in Supplementary Figure 6, the F4/80 hi population is predominantly KCs at baseline; however, this is not true with MASH diets. Most of the recruited MoMFs also reside in the F4/80 hi gate where they can be identified by their lower expression of TIM4. The MoMF gate shown in this figure is incorrect. The CD11b hi population is predominantly PMNs, monocytes, and cDC2, not MoMFs (PMID: 33997821). In addition, the authors should stain the tissue for TIM4, which would also be expected to reveal a decrease in the number of resident KCs.

      We thank the reviewer for raising this critical point regarding the gating strategy and interpretation of KC death. We have now refined our flow cytometry gating based on the reviewer’s suggestion. Specifically, we analyzed TIM4 expression and attempted to identify TIM4<sup>low</sup> MoMF populations in our model. However, we did not detect a distinct TIM4<sup>low</sup> population, likely because our mice were fed the HFHC diet for only 16 weeks and had not yet developed liver fibrosis. We therefore reason that MoMFs have not fully acquired TIM4 expression at this stage.

      To improve our analysis, we referred to published strategies (PMID: 41131393; PMID: 32562600) and gated KCs as CD45<sup>+</sup>CD11b<sup>+</sup>F4/80<sup>hi</sup> TIM4<sup>hi</sup> and MoMFs as CD45<sup>+</sup>Ly6G<sup>-</sup>CD11b<sup>+</sup>F4/80<sup>low</sup> TIM4<sup>low/-</sup>. Using this approach, we observed a gradual reduction of KCs and a corresponding increase in MoMFs in WT mice, with a significantly faster loss of KCs in Chil1<sup>-/-</sup> mice (Revised Figure 4C, D; Figure S10A).

      Furthermore, immunofluorescence staining for TIM4 combined with TUNEL or cleaved caspase-3 confirmed an increased number of dying KCs in Chil1<sup>-/-</sup> mice compared to WT following HFHC diet feeding (Revised Figure 4E; Figure S10B).

      (4) While the Clec4F Cre is specific to KCs, there is also less data about the impact of the Cre system on KC biology. Therefore, when looking at cell death, the authors need to include some mice that express Clec4F cre without the floxed allele to rule out any effects of the Cre itself. In addition, if the cell death phenotype is real, it should also be present in LysM Cre system for the reasons described above. Therefore, the authors should quantify the KC number and dying KCs in this mouse line as well.

      We thank the reviewer for raising this important point. During our study, we indeed observed an increased number of KCs in Clec4f-Cre mice compared to WT controls, suggesting that the Clec4f-Cre system itself may modestly affect KC homeostasis. To address this, we compared KC numbers between Clec4f<sup>∆Chil1</sup> and Clec4f-Cre mice and found that Clec4f<sup>∆Chil1</sup> mice displayed a significant reduction in KC numbers following HFHC diet feeding. Moreover, co-staining for TIM4 and TUNEL revealed a marked increase in KC death in Clec4f<sup>∆Chil1</sup> mice relative to Clec4f-Cre mice, indicating that the observed phenotype is attributable to Chil1 deletion rather than Cre expression alone. These data have been reported in our related manuscript (He et al., bioRxiv, 2025.09.26.678483; doi: 10.1101/2025.09.26.678483).

      In addition, we quantified KC numbers and KC death in the Lyz2-Cre line. TIM4/TUNEL co-staining showed comparable levels of KC death between Chil1<sup>fl/fl</sup> and Lyz2<sup>∆Chil1</sup> mice (Revised Figure S11B). Consistently, flow cytometry analyses revealed no significant differences in KC numbers between these two groups before (0 weeks) or after (20 weeks) HFHC diet feeding (Revised Figures S11C, D). As discussed in our response to Comment 1, this may be due to the incomplete deletion of Chi3l1 in KCs (<50%) in the Lyz2-Cre line, which likely attenuates the phenotype.

      (5) I am somewhat concerned about the conclusion that Chil1 is highly expressed in liver macrophages. Looking at our own data and those from the Liver Atlas it appears that this gene is primarily expressed in neutrophils. At a minimum, the authors should address the expression of Chil1 in macrophage populations from other publicly available datasets in mouse MASH to validate their findings (several options include - PMID: 33440159, 32888418, 32362324). If expression of Chil1 is not present in these other data sets, perhaps an environmental/microbiome difference may account for the distinct expression pattern observed. Either way, it is important to address this issue.

      We thank the reviewer for this insightful comment and agree that analysis of scRNA-seq data, including our own and those reported in the Liver Atlas as well as in the referenced studies (PMID: 33440159, 32888418, 32362324), indicates that Chil1 is predominantly expressed in neutrophils.

      However, our immunofluorescence staining under normal physiological conditions revealed that Chi3l1 protein is primarily localized in Kupffer cells (KCs), as demonstrated by strong co-staining with TIM4 (Revised Figure 1E). In MASLD mouse models induced by HFHC or MCD diets, we observed that both KCs and monocyte-derived macrophages (MoMFs) express Chi3l1, with particularly high levels in MoMFs.

      We speculate that the apparent discrepancy between scRNA-seq datasets and our in situ findings may reflect differences in cellular proportions and detection sensitivity. Since hepatic macrophages (particularly KCs and MoMFs) constitute a larger proportion of total liver immune cells compared with neutrophils, their contribution to total Chi3l1 protein levels in tissue staining may appear dominant, despite lower transcript abundance per cell in sequencing datasets. We have included a discussion of this point in the revised manuscript to clarify this distinction (Revised manuscript, page 8, lines 341-350).

      Minor points:

      (1) Were there any changes in liver fibrosis or liver fibrosis markers present in these experiments?

      We assessed liver fibrosis using Sirius Red staining and α-SMA Western blot analysis.

      We found no induction of liver fibrosis in our HFHC-induced MASLD model (Revised Figure S1A, B), but a clear elevation of fibrosis markers in the MCD-induced MASH model (Revised Figure S6A, B).

      (2) In Supplementary Figure 3, the authors do a western blot for CHI3L1 in BMDMs. This should also be done for KCs isolated from these mice. Does this antibody work for immunofluorescence? Staining liver tissue would provide valuable information on the expression patterns.

      We have included qPCR and western blot for Chi3l1 in isolated primary KCs from Lyz2<sup>∆Chil1</sup> mice. The data show a slight, non-significant reduction in both mRNA and protein levels in KCs (Revised Figure S7B, C). Immunofluorescence staining of liver tissue showed that Chi3l1 is most likely localized to the plasma membrane of TIM4<sup>+</sup> F4/80<sup>+</sup> KCs under both NCD and HFHC diets (Revised Figure 1E).

      (3) What is the impact of MASH diet feeding on Chil1 expression in KCs or in the liver in general?

      In both our MASLD and MASH models, diet feeding consistently upregulates Chi3l1 in KCs or in the liver in general (Revised Figure 1F, G, S6C,D).

      (4) In Figure S1 the authors show tSNE plots of various monocyte and macrophage genes in the liver. Are these plots both diets together? How do things look when comparing these markers between the STD and HFHC diet? The population of recruited LAMs seems very small for 16 weeks of diet. Moreover, Chil1 should also be shown on these tSNE plots as well.

      Yes, these plots show both diets together. When compared separately, the core marker expression is consistent between NCD and HFHC diets. However, the HFHC diet induces a relative increase in KC marker expression within the MoMF cluster, suggesting phenotypic adaptation (Author response image 1A, below). Chil1 expression is also shown on the t-SNE plot (Author response image 1B, below). However, compared to lineage-specific marker genes, Chil1 expression is rather low.

      Author response image 1.

      Gene expression levels of lineage-specific marker genes in monocytes/macrophages clusters between NCD and HFHC diets. (A) UMAP plots show the scaled expression changes of lineage-specific markers in KCs/monocyte/macrophage clusters from mice under NCD and HFHC diets. Color represents the level of gene expression. (B) UMAP plots show the scaled expression changes of Chil1 in KCs/monocyte/macrophage clusters from mice under NCD and HFHC diets. Color represents the level of gene expression.

      (5) In Figure 5, the authors demonstrate that CHI3L1 binds to glucose. However, given that all chitin molecules bind to carbohydrates, is this a new finding? The data showing that CHI3L is elevated in the serum after diet is interesting. What happens to serum levels of this molecule in KC KO or total macrophage KO mice? Do the authors think it primarily acts as a secreted molecule or in a cell-intrinsic manner?

      We thank the reviewer for these insightful comments, which helped us clarify the novelty of our findings.

      (1) Novelty of CHI3L1-Glucose Binding:

      While chitin-binding domains are known to interact with carbohydrate polymers, our key discovery is that CHI3L1 (YKL-40), a mammalian chitinase-like protein lacking enzymatic activity, specifically binds to glucose, a simple monosaccharide. This differs fundamentally from canonical binding to insoluble polysaccharides such as chitin and reveals a potential role for CHI3L1 in monosaccharide recognition, linking it to glucose metabolism and energy sensing. We clarified this point in the revised manuscript (page 9, lines 374-379).

      (2) Serum CHI3L1 in Knockout Models:

      Consistent with the reviewer’s suggestion, serum Chi3l1 levels are altered in our knockout models:

      KC-specific KO (Clec4f<sup>ΔChil1</sup>): Under normal chow, serum CHI3L1 is markedly reduced compared to controls and remains lower following HFHC feeding (Author response image 2A, below), indicating that Kupffer cells are the main source of circulating CHI3L1 under basal and disease conditions.

      Macrophage KO (Lyz2<sup>ΔChil1</sup>): No significant changes were observed between Chil1<sup>fl/fl</sup> and Lyz2<sup>ΔChil1</sup> mice under either diet (Author response image 2B, below), likely due to minimal monocyte-derived macrophage recruitment in this HFHC model (see Revised Figure 4C,D).

      (3) Secreted vs. Cell-Intrinsic Role:

      CHI3L1 predominantly localizes to the KC plasma membrane, consistent with a secreted role, and the reduction of serum CHI3L1 in KC-specific knockouts supports the physiological relevance of the secreted protein. While cell-intrinsic effects have been reported elsewhere, our current data do not address this in KCs, and the question warrants future investigation.

      Author response image 2.

      Chi3l1 expression in serum before and after HFHC in CKO mice. (A) Western blot to detect Chi3l1 expression in serum of Chil1<sup>fl/fl</sup> and Clec4f<sup>ΔChil1</sup> mice before and after 16 weeks’ HFHC diet. n=3 mice/group. (B) Western blot to detect Chi3l1 expression in serum of Chil1<sup>fl/fl</sup> and Lyz2<sup>ΔChil1</sup> mice before and after 16 weeks’ HFHC diet. n=3 mice/group.

      Reviewer #2 (Public review):

      The manuscript from Shan et al., sets out to investigate the role of Chi3l1 in different hepatic macrophage subsets (KCs and moMFs) in MASLD following their identification that KCs highly express this gene. To this end, they utilise Chi3l1KO, Clec4f-CrexChi3l1fl, and Lyz2-CrexChi3l1fl mice and WT controls fed a HFHC for different periods of time.

      Major:

      Firstly, the authors perform scRNA-seq, which led to the identification of Chi3l1 (encoded by Chil1) in macrophages. However, this is on a limited number of cells (especially in the HFHC context), and hence it would also be important to validate this finding in other publicly available MASLD/Fibrosis scRNA-seq datasets. Similarly, it would be important to examine if cells other than monocytes/macrophages also express this gene, given the use of the full KO in the manuscript. Along these lines, utilisation of publicly available human MASLD scRNA-seq datasets would also be important to understand where the increased expression observed in patients comes from and the overall relevance of macrophages in this finding.

      We thank the reviewer for this valuable suggestion and acknowledge the limited number of cells analyzed under the HFHC condition in our original dataset. To strengthen our findings, we have now examined four additional publicly available scRNA-seq datasets: two from mouse models and two from human MASLD patients (Revised Figure S3, manuscript page 4, lines 164-172). Across these datasets, the specific cell type showing the highest Chil1 expression varied somewhat between studies, likely reflecting model differences and disease stages. Nevertheless, Chil1 expression was consistently enriched in hepatic macrophage populations, including both Kupffer cells and infiltrating macrophages, in mouse and human livers. Notably, Chil1 expression was higher in infiltrating macrophages compared to resident Kupffer cells, supporting its upregulation during MASLD progression. These additional analyses confirm the robustness and cross-species relevance of our finding that macrophages are the primary Chil1-expressing cell type in the liver.

      Next, the authors use two different Cre lines (Clec4f-Cre and Lyz2-Cre) to target KCs and moMFs respectively. However, no evidence is provided to demonstrate that Chil1 is only deleted from the respective cells in the two CRE lines. Thus, KCs and moMFs should be sorted from both lines, and a qPCR performed to check the deletion of Chil1. This is especially important for the Lyz2-Cre, which has been routinely used in the literature to target KCs (as well as moMFs) and has (at least partial) penetrance in KCs (depending on the gene to be floxed). Also, while the Clec4f-Cre mice show an exacerbated MASLD phenotype, there is currently no baseline phenotype of these animals (or the Lyz2Cre) in steady state in relation to the same readouts provided in MASLD and the macrophage compartment. This is critical to understand if the phenotype is MASLD-specific or if loss of Chi3l1 already affects the macrophages under homeostatic conditions.

      We thank the reviewer for raising this important point.

      (1) Chil1 deletion efficiency in Clec4f-Cre and Lyz2-Cre lines:

      We have assessed the efficiency of Chil1 deletion in both Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice by evaluating mRNA and protein levels of Chi3l1. For the Lyz2<sup>∆Chil1</sup> mice, we measured Chi3l1 expression in bone marrow-derived macrophages (BMDMs) and primary Kupffer cells (KCs). Both qPCR (for mRNA) and Western blotting (for protein) reveal that Chi3l1 is almost undetectable in BMDMs from Lyz2<sup>∆Chil1</sup> mice when compared to Chil1<sup>fl/fl</sup> controls. In contrast, we observe no significant reduction in Chi3l1 expression in KCs from these animals (Revised Figure S7B, C), suggesting that Chil1 is deleted in BMDMs but not in KCs in the Lyz2-Cre line.

      For the Clec4f<sup>∆Chil1</sup> mice, both mRNA and protein levels of Chi3l1 are barely detectable in BMDMs and primary KCs when compared to Chil1<sup>fl/fl</sup> controls (Revised Figure S4B, C). However, we did observe a faint Chi3l1 band in KCs of Clec4f<sup>∆Chil1</sup> mice, which we suspect is due to contamination from LSECs during the KC isolation process, given that the TIM4 staining for KCs was approximately 90%. Overall, Chil1 is deleted in both KCs and BMDMs in the Clec4f-Cre line.

      Notably, since we observed a pronounced MASLD phenotype in Clec4f-Cre mice but not in Lyz2-Cre mice, these findings further underscore the critical role of Kupffer cells in the progression of MASLD.

      (2) Whether the phenotype is MASLD-specific or whether loss of Chi3l1 already affects the macrophages under homeostatic conditions: We have now included phenotypic data from Clec4f<sup>ΔChil1</sup> mice (KC-specific KO) and Lyz2<sup>∆Chil1</sup> mice (MoMF-specific KO) fed an NCD for 16 weeks (Revised Figure 2A-F, S8A-F). In brief, there is no baseline difference between Chil1<sup>fl/fl</sup> and Clec4f<sup>ΔChil1</sup> or Lyz2<sup>∆Chil1</sup> mice in steady state in relation to the same readouts provided in MASLD.

      Next, the authors suggest that loss of Chi3l1 promotes KC death. However, to examine this, they use Chi3l1 full KO mice instead of the Clec4f-Cre line. The reason for this is not clear, because in this regard, it is now not clear whether the effects are regulated by loss of Chi3l1 from KCs or from other hepatic cells (see point above). The authors mention that Chi3l1 is a secreted protein, so does this mean other cells are also secreting it, and are these needed for KC death? In that case, this would not explain the phenotype in the CLEC4F-Cre mice. Here, the authors do perform a basic immunophenotyping of the macrophage populations; however, the markers used are outdated, making it difficult to interpret the findings. Instead of F4/80 and CD11b, which do not allow a perfect discrimination of KCs and moMFs, especially in HFHC diet-fed mice, more robust and specific markers of KCs should be used, including CLEC4F, VSIG4, and TIM4.

      We thank the reviewer for raising this important point. We performed experiments in the Clec4f<sup>∆Chil1</sup> (KC-specific KO) model. The phenotype in these mice closely mirrors that of the full KO: following an HFHC diet, we observed a significant reduction in KC numbers and a concurrent increase in KC death in Clec4f<sup>∆Chil1</sup> mice compared to Clec4f-cre mice. We have reported these data in the following related manuscript (Figure 6 D-G). This confirms that the loss of CHI3L1 specifically from KCs is sufficient to drive this effect.

      Hyperactivated Glycolysis Drives Spatially-Patterned Kupffer Cell Depletion in MASLD Jia He, Ran Li, Cheng Xie, Xiane Zhu, Keqin Wang, Zhao Shan bioRxiv 2025.09.26.678483; doi: https://doi.org/10.1101/2025.09.26.678483

      While other hepatic cells (e.g., neutrophils and liver sinusoidal endothelial cells) also express Chi3l1, our data indicate that KC-secreted Chi3l1 plays a dominant and cell-autonomous role in maintaining KC viability. The potential contribution of other cellular sources to this phenotype remains an interesting direction for future study.

      We apologize for the lack of clarity in our initial immunophenotyping. We have revised the flow cytometry data to clearly show that KCs are rigorously defined as TIM4+ cells (Revised Figure 4C, D).

      Additionally, while the authors report a reduction of KCs in terms of absolute numbers, there are no differences in proportions. Thus, coupled with a decrease also in moMF numbers at 16 weeks (when one would expect an increase if KCs are decreased, based on previous literature) suggests that the differences in KC numbers may be due to differences in total cell counts obtained from the obese livers compared with controls. To rule this out, total cell counts and total live CD45+ cell counts should be provided. Here, the authors also provide tunnel staining in situ to demonstrate increased KC death, but as it is typically notoriously difficult to visualise dying KCs in MASLD models, here it would be important to provide more images. Similarly, there appear to be many more Tunel+ cells in the KO that are not KCs; thus, it would be important to examine this in the CLEC4F-Cre line to ascertain direct versus indirect effects on cell survival.

      We thank the reviewer for raising this important point. We have now included the total cell counts and total live CD45<sup>+</sup> cell counts, which were similar between WT and Chil1<sup>-/-</sup> mice after the HFHC diet (Author response image 3A, below).

      Moreover, we included cleaved caspase-3 and TIM4 co-staining in WT and Chil1<sup>-/-</sup> mice before and after HFHC diets, which confirmed increased KC death in Chil1<sup>-/-</sup> mice (Revised Figure S10B). We have compared KC number and KC death between Clec4f-cre and Clec4f<sup>∆Chil1</sup> mice under NCD and HFHC diet in the following manuscript (Figure 6 D-G). The data showed similar KC numbers under NCD and reduced KC numbers in Clec4f<sup>∆Chil1</sup> mice compared to Clec4f-cre mice under HFHC diet, which confirms that the effect on cell survival is a direct consequence of Chi3l1 loss rather than of cre insertion.

      Hyperactivated Glycolysis Drives Spatially-Patterned Kupffer Cell Depletion in MASLD Jia He, Ran Li, Cheng Xie, Xiane Zhu, Keqin Wang, Zhao Shan bioRxiv 2025.09.26.678483; doi: https://doi.org/10.1101/2025.09.26.678483

      Author response image 3.

      Number of total cells and total live CD45<sup>+</sup> cells in liver of WT and Chil1<sup>-/-</sup> mice. (A) Numbers of total cells and total live CD45<sup>+</sup> cells per liver were statistically analyzed. n = 3-4 mice per group.

      Finally, the authors suggest that Chi3l1 exerts its effects through binding glucose and preventing its uptake. They use ex vivo/in vitro models to assess this with rChi3l1; however, here I miss the key in vivo experiment using the CLEC4F-Cre mice to prove that this in KCs is sufficient for the phenotype. This is critical to confirm the take-home message of the manuscript.

      We agree that it is essential to confirm the in vivo relevance of Chi3l1-mediated glucose regulation in Kupffer cells (KCs). Our data suggest that KCs undergo cell death not because they express Chi3l1 per se, but because they exhibit a glucose-hungry metabolic phenotype that makes them uniquely dependent on Chi3l1-mediated regulation of glucose uptake. To directly assess this mechanism in vivo, we injected 2-NBDG, a fluorescent glucose analog, into overnight-fasted and refed mice and quantified its uptake in hepatic KCs. Notably, Chi3l1-deficient KCs exhibited significantly increased 2-NBDG uptake compared with controls, and this effect was markedly suppressed by co-treatment with recombinant Chi3l1 (rChi3l1) (Revised Figure 6G, H). These findings demonstrate that Chi3l1 regulates glucose uptake by KCs in vivo, supporting our proposed mechanism that Chi3l1 controls KC metabolic homeostasis through modulation of glucose availability.

      Minor points:

      (1) Some key references of macrophage heterogeneity in MASLD are not cited: PMID: 32362324 and PMID: 32888418.

      We thank the reviewer for highlighting these critical references and have included them in the introduction (Revised manuscript, page 2, line 64-73).

      (2) In the discussion, Figure 3H is referenced (Serum data), but there is no Figure 3H. If the authors have this data (increased Chi3l1 in serum of mice fed HFHC diet), what happens in CLEC4F-Cre mice fed the diet? Is this lost completely? This comes back to the point regarding the specificity of expression.

      We apologize for the mistake. The correct reference in the revised version is Figure 5F, in which serum Chi3l1 was significantly upregulated after HFHC diet. Moreover, under a normal chow diet (NCD), serum CHI3L1 is significantly lower in Clec4f<sup>ΔChil1</sup> mice compared to controls (Chil1<sup>fl/fl</sup>). Following an HFHC diet, levels increase in both genotypes but remain relatively lower in the KC-KO mice (please see Author response image 2A above). These data strongly suggest that Kupffer cells (KCs) are the primary source of serum CHI3L1 under basal conditions and a major contributor during MASLD progression.

      Reviewer #3 (Public review):

      This paper investigates the role of Chi3l1 in regulating the fate of liver macrophages in the context of metabolic dysfunction leading to the development of MASLD. I do see value in this work, but some issues exist that should be addressed as well as possible.

      (1) Chi3l1 has been linked to macrophage functions in MASLD/MASH, acute liver injury, and fibrosis models before (e.g., PMID: 37166517), which limits the novelty of the current work. It has even been linked to macrophage cell death/survival (PMID: 31250532) in the context of fibrosis, which is a main observation from the current study.

      We thank the reviewer for this insightful comment regarding the novelty of our findings. We agree that Chi3l1 has previously been linked to macrophage survival and function in models of liver injury and fibrosis (e.g., PMID: 37166517, 31250532). However, our study focuses specifically on the early stage of MASLD, prior to the onset of fibrosis, revealing a distinct mechanistic role for CHI3L1 in this context.

      We demonstrate that CHI3L1 directly interacts with extracellular glucose to regulate its cellular uptake—a previously unrecognized biochemical function. Furthermore, we show that CHI3L1’s protective role is metabolically dependent, safeguarding glucose-dependent Kupffer cells (KCs) but not monocyte-derived macrophages (MoMFs). This metabolic dichotomy and the direct link between CHI3L1 and glucose sensing represent conceptual advances beyond previous studies of CHI3L1 in fibrotic or injury models.

      (2) The LysCre-experiments differ from experiments conducted by Ariel Feldstein's team (PMID: 37166517). What is the explanation for this difference? - The LysCre system is neither specific to macrophages (it also depletes in neutrophils, etc), nor is this system necessarily efficient in all myeloid cells (e.g., Kupffer cells vs other macrophages). The authors need to show the efficacy and specificity of the conditional KO regarding Chi3l1 in the different myeloid populations in the liver and the circulation.

      We thank the reviewer for this important comment and the opportunity to clarify both the efficiency and specificity of our conditional knockouts, as well as the differences from the study by Feldstein’s group (PMID: 37166517).

      (1) Chil1 deletion efficiency in Clec4f-Cre and Lyz2-Cre lines:

      We have assessed the efficiency of Chil1 deletion in both Lyz2<sup>∆Chil1</sup> and Clec4f<sup>∆Chil1</sup> mice by evaluating mRNA and protein levels of Chi3l1. For the Lyz2<sup>∆Chil1</sup> mice, we measured Chi3l1 expression in bone marrow-derived macrophages (BMDMs) and primary Kupffer cells (KCs). Both qPCR (for mRNA) and Western blotting (for protein) reveal that Chi3l1 is almost undetectable in BMDMs from Lyz2<sup>∆Chil1</sup> mice when compared to Chil1<sup>fl/fl</sup> controls. In contrast, we observe no significant reduction in Chi3l1 expression in KCs from these animals (Revised Figure S7B, C), suggesting that Chil1 is deleted in BMDMs but not in KCs in the Lyz2-Cre line.

      For the Clec4f<sup>∆Chil1</sup> mice, both mRNA and protein levels of Chi3l1 are barely detectable in BMDMs and primary KCs when compared to Chil1<sup>fl/fl</sup> controls (Revised Figure S4B, C). However, we did observe a faint Chi3l1 band in KCs of Clec4f<sup>∆Chil1</sup> mice, which we suspect is due to contamination from LSECs during the KC isolation process, given that the TIM4 staining for KCs was approximately 90%. Overall, Chil1 is deleted in both KCs and BMDMs in the Clec4f-Cre line.

      Notably, since we observed a pronounced MASLD phenotype in Clec4f-Cre mice but not in Lyz2-Cre mice, these findings further underscore the critical role of Kupffer cells in the progression of MASLD.

      (2) Explanation for Differences from Feldstein et al. (PMID: 37166517):

      Our findings differ from those reported by Feldstein’s group primarily due to differences in disease stage and model. We used a high-fat, high-cholesterol (HFHC) diet to model early-stage MASLD characterized by steatosis and inflammation without fibrosis (Revised Figure S1A, B). In this context, we observed KC death but minimal MoMF infiltration (Revised Figure 4D). Accordingly, deletion of Chi3l1 in MoMFs (Lyz2<sup>∆Chil1</sup>) had no measurable effect on insulin resistance or steatosis, consistent with limited MoMF involvement at this stage. In contrast, the Feldstein study employed a CDAA-HFAT diet that models later-stage MASH with fibrosis. In that setting, Lyz2<sup>∆Chil1</sup> mice showed reduced recruitment of neutrophils and MoMFs, which likely underlies the attenuation of fibrosis and disease severity reported. Together, these data support a model in which KCs and MoMFs play temporally distinct roles during MASLD progression: KCs primarily drive early lipid accumulation and metabolic dysfunction, whereas MoMFs contribute more substantially to inflammation and fibrosis at later stages.

      (3) The conclusions are exclusively based on one MASLD model. I recommend confirming the key findings in a second, ideally a more fibrotic, MASH model.

      We thank the reviewer for this valuable suggestion to validate our findings in an additional MASH model. We have now included data from a methionine- and choline-deficient (MCD) diet–induced MASH model, which exhibits pronounced hepatic lipid accumulation and fibrosis (Revised Figure S6A, B). Consistent with our HFHC results, Clec4f<sup>∆Chil1</sup> mice displayed exacerbated MASH progression in this model, including increased lipid deposition, inflammation, and fibrosis (Revised Figure S6E-G). These findings confirm that CHI3L1 deficiency in Kupffer cells promotes hepatic lipid accumulation and disease progression across distinct MASLD/MASH models.

      (4) Very few human data are being provided (e.g., no work with own human liver samples, work with primary human cells). Thus, the translational relevance of the observations remains unclear.

      We thank the reviewer for this important comment regarding translational relevance. We fully agree that validation in human liver samples would further strengthen our study. However, obtaining tissue from early-stage steatotic livers is challenging due to the asymptomatic nature of this disease stage. Nonetheless, multiple studies have consistently reported Chi3l1 upregulation in human fibrotic and steatotic liver disease (PMID: 31250532, 40352927, 35360517), supporting the clinical significance of our mechanistic findings. We have now expanded the Discussion to highlight these human data and better contextualize our results within the spectrum of human MASLD/MASH progression (Revised manuscript, page 9, lines 390-394).

      Minor points:

      The authors need to follow the new nomenclature (e.g., MASLD instead of MAFLD, e.g., in Figure 1).

      We now use "MASLD" throughout.

      We again thank the reviewers for their rigorous critique. We thank eLife for fostering an environment of fairness and transparency that enables authors to communicate openly and present their data honestly.

      Reference

      (1) Tran S, Baba I, Poupel L, et al. (2020) Impaired Kupffer Cell Self-Renewal Alters the Liver Response to Lipid Overload during Non-alcoholic Steatohepatitis. Immunity 53, 627-640.

    1. eLife Assessment

      This study describes a genetic screen to identify deubiquitinases (DUBs) that counteract the activity of small molecule degraders (PROTACs). The presented data is valuable, identifying OTUD6A and UCHL5 as DUBs that impact the efficacy and potency of PROTAC-mediated degradation in distinct subcellular compartments. While the conclusions are broadly supported and the methods employed are solid, the validation of OTUD6A and UCHL5 mechanisms requires additional study. Overall, these findings merit further evaluation by the targeted protein degradation community when developing and optimizing PROTACs and efforts to achieve compartment-specific degradation.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors investigate the role of deubiquitinases (DUBs) in modulating the efficacy of PROTAC-mediated degradation of the cell-cycle kinase AURKA. Using a focused siRNA screen of 97 human DUBs, they identify UCHL5 and OTUD6A as negative regulators of AURKA degradation by PROTACs. They further offer a mechanistic explanation of enhanced AURKA degradation in the nucleus via OTUD6A expression being restricted to the cytosol, thereby protecting the cytoplasmic pool of AURKA. These findings provide important insight into how subcellular localization and DUB activity influence the efficiency of targeted protein degradation strategies, which could have implications for therapy.

      Strengths:

      The manuscript is well-structured, with clearly defined objectives and well-supported conclusions.

      The study employs a broad range of well-validated techniques (including live-cell imaging, proximity ligation assays, HiBiT reporter systems, and ubiquitin pulldowns) to dissect the regulation of PROTAC activity.

      The authors use informative experimental controls, including assessment of cell-cycle progression effects, rescue experiments with siRNA-resistant constructs to confirm specificity, and the application of both AURKA-targeting PROTACs with different warheads and orthogonal degrader systems (e.g., dTAG-13 and dTAGv-1) to differentiate between target- and ligase-specific effects.

      The identification of OTUD6A as a cytosol-restricted DUB that protects cytoplasmic but not nuclear AURKA is novel and may have therapeutic relevance for selectively targeting oncogenic nuclear AURKA pools.

      Weaknesses:

      Although UCHL5 and OTUD6A are shown to limit AURKA degradation, direct physical interaction was not assessed.

      While the authors suggest that combining PROTACs with DUB inhibition could enhance degradation, this was not experimentally tested.

      The authors acknowledge the apparent discrepancy between the enhanced degradation observed with CRBN-recruiting PROTACs and the lack of change in ubiquitination following UCHL5 knockdown, yet they do not propose any mechanistic explanation.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors present a screening approach to identify deubiquitylases that may impact PROTAC efficacy/potency, specifically in this case using a previously reported AURKA PROTAC as an initial model. The authors claim that UCHL5 is able to control the level of degradation of both AURKA and dTAG when using CRBN mediated PROTACs, however that VHL is not impacted by UCHL5 activity. They additionally claim that OTUD6A is able to control extent of AURKA degradation in a target protein-specific manner and that this effect is specific to cytoplasm located AURKA.

      Overall, the endeavour is of interest and important. Some of the claims made were overly generalised, and in the main, the effects observed when knocking down the respective DUBs were small. In addition, the systems used are highly artificial, and the data are not presented in a way that makes absolute (rather than relative) changes easy to understand.

      Strengths:

      The topic is of high interest and relevance and explores an underappreciated and understudied area of the PROTAC mechanism of action. If further supported and understood, the findings would certainly bring value to the field.

      Weaknesses:

      The overall effects observed are sometimes limited in real terms. The data provided often omit the absolute changes in protein abundance observed. Data on endogenous/less engineered systems and/or with higher resolution read-outs would greatly strengthen some conclusions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We are grateful for the insightful and constructive feedback received from reviewers. As outlined in our previous response to the public reviews of the manuscript, we have made only minor changes to the manuscript to clarify some points noted by Reviewers 1 and 3. Firstly, we identify the DUB shown in the correlation plot (Fig 3B) - whose knockdown enhances PROTAC sensitivity without significantly altering cell cycle progression - as BAP1. Secondly, we explain in more detail how we selected DUB hits for further study, and thirdly, we acknowledge that the result in Figure 5G is unexpected given prevailing knowledge in the field.

      Please see below the detailed list of changes we have made to the manuscript.

      In response to Reviewer 1 (Point 2 of public review and Point 2 in recommendations to author)

      We have labelled one of the hits (as BAP1) in Figure 3B

      In response to Reviewer 1 (Point 2 of public review and Point 2 in recommendations to author) and Reviewer 3 (Point 6 in recommendations to authors)

      We have rewritten our description of Figure 3 in order to make clarifications about how we selected which hits to take forwards in our study

      In response to Reviewer 3 (Point 1 in the recommendation to authors)

      We corrected a typo in the first subtitle of the results section

      In response to Reviewer 3 (Point 2 in the recommendation to authors)

      We added information requested about how we selected our top hits

      In response to Reviewer 1 (Point 4 in public review and Point 4 in recommendation to authors)

      We pointed out the seemingly contradictory nature of the UCHL5 result in Figure 5G for the reader

      All of the changes have been aimed at clarifying our narrative, without any change to data content, analysis or interpretation, and we hope these improvements can be agreed by editorial review.

    1. eLife Assessment

      This important study contributes to our understanding of how epithelial cells establish polarity by identifying a hierarchy in which Par3 acts upstream of centrosome positioning and apical membrane initiation. The evidence supporting the main conclusions is convincing, although several aspects of the model remain only partially supported due to unresolved questions about microtubule organization and the need for clearer integration of quantitative and conceptual points raised in review. The work will be of interest to cell and developmental biologists, but the conclusions would be strengthened by greater precision in methodology, terminology, and interpretation.

    2. Reviewer #1 (Public review):

      Summary:

      Wang, Po-Kai et al., utilized the de novo polarization of MDCK cells cultured in Matrigel to assess the interdependence between polarity protein localization, centrosome positioning and apical membrane formation. They show that the inhibition of Plk4 with Centrinone does not prevent apical membrane formation, but does result in its delay, a phenotype the authors attribute to the loss of centrosomes due to the inhibition of centriole duplication. However, the targeted mutagenesis of specific centrosome proteins implicated in the positioning of centrosomes in other cell types (CEP164, ODF2, PCNT and CEP120), as well as the use of dominant negative constructs to inhibit centrosomal microtubule nucleation did not affect centrosome positioning in 3D cultured MDCK cells. A screen of proteins previously implicated in MDCK polarization revealed that the polarity protein Par-3 was upstream of centrosome positioning, similar to other cell types.

      Strengths:

      The investigation into the temporal requirement and interdependence of previously proposed regulators of cell polarization and lumen formation is valuable. The authors have provided a detailed analysis of many of these components at defined stages of polarity establishment, and well demonstrate that centrosomes are not necessary for apical polarity formation, but are involved in the efficient establishment of the apical membrane.

      Weaknesses:

      Key questions remain regarding the structure of the intracellular cytoskeleton following depletion of centrosomes, centrosome proteins, or abrogation of centrosome microtubule nucleation. The authors strengthen their model that centrosomes are positioned independently of microtubule nucleation using dominant negative Cdk5RAP2 and NEDD-1 constructs; however, the structure of the intracellular microtubule network remains unresolved and will be an important avenue for future investigation.

    3. Reviewer #3 (Public review):

      Here, Wang et al. resubmit their manuscript describing the events in the establishment of polarity in MDCK cells cultured in vitro. As with the original version, the description is thorough and is important to the field to report as it establishes a hierarchy of events in polarization, placing Par3 upstream of centrosome positioning and apical membrane component trafficking. Unfortunately, in the revised version, the authors addressed almost none of my points. They did a cursory job of responding in the rebuttal letter but made little attempt to actually address what was being asked or to incorporate any of my suggestions into the manuscript. The particularly egregious examples are cited below:

      Comments on revisions:

      (1) My original main experimental concern was not addressed: I had originally asked what role microtubules play in the process of polarization (either centrosomal or non-centrosomal). An obvious model is that Gp135, Rab11, etc. are delivered to the AMIS on centrosomal microtubules. Centrosomes might also be pulled to the AMIS via cortically derived microtubules as is the case in the C. elegans intestine where the centrosome moves apically on apical microtubules via dynein-directed transport to the cortically anchored minus ends. The authors do not explore the role of microtubules in the revision, citing that it was not possible to observe the microtubules directly or to perform nocodazole experiments during polarization. Instead, the authors use a relatively new genetic tool to disrupt centrosomal microtubules. They appear to succeed in displacing centrosomal γ-tubulin using this tool, but without being able to observe microtubules, a remaining caveat of this experiment is that it is still unclear whether the authors have removed centrosomal microtubules. Compounding this issue is that this tool has never been used in MDCK cells. The authors conclude "we found that cells lacking centrosomal microtubules were still able to polarize and position the centrioles apically.", but they have not shown this; instead, the data suggest this conclusion and the authors should acknowledge the caveat that they have no idea whether centrosomal microtubules are abolished. Similarly, the authors also state: "Additionally, although PCNT knockout cells show reduced microtubule nucleation ability, they still recruit a small amount of γ-tubulin". Where are the data that show that microtubule nucleation is reduced in these PCNT knockout cells?

      (2) Many of my comments were addressed in the rebuttal, but not in the text.

      The non-centrosomal GP135 in Figure 2 is not acknowledged or explained.

      That the polarity index does not actually measure polarity, but nuclear-centrosome distance is not acknowledged or explained in the paper.

      I still don't believe that the quantification in Figure 3D matches the images I am being shown in Figure 3A. In the centrinone treatment condition, there is certainly an enrichment of GP135 at the AMIS that is not detected in the quantification. The method described in the rebuttal might miss this enrichment if it is offset from line drawn between the centroid of the two nuclei.

      Cell height changes in the centrosome depleted cysts are still referenced in the text ("the cell heights of the centrosome-depleted cysts are less uniform"), but no specific data or image is called out. Currently, Figure 3G is referenced, but that is a graph of GP135 intensity.

      In my original review, I called on the authors to comment on the striking similarity of the mechanisms they documented in MDCK cells to what has been shown in in vivo systems. The authors did not do this, instead restating in the rebuttal some features of what they found. But the mechanisms shown here are remarkably similar to the polarization of primordia that generate tubular organs in vivo. Perhaps most striking is the similarity to the C. elegans intestine where Par3 localizes to the cortex at the site of an apical MTOC that pulls the centrosome to the apical surface via dynein (Feldman and Priess, 2012). Instead of discussing this similarity, the authors state: "Par3 is likely to regulate centrosome positioning through some intermediate molecules or mechanisms, but its specific mechanism is still unclear and requires further investigation." Given the acetylated tubulin signal emanating from the Par3 positive patch in Figure 5E and F, I suspect similar mechanisms to the C. elegans intestine are at play here. Such a parallel should be noted in the Discussion.

      I had originally commented that "I find the results in Figure 6G puzzling. Why is ECM signaling required for Gp135 recruitment to the centrosome. Could the authors discuss what this means?" The authors responded that "The data in Figure 6G do not indicate that ECM signaling is required for the recruitment of Gp135 to the centrosome". In Figure 6G, the localization of GP135 to the centrosome appears significantly delayed compared to its localization to the centrosome in images where cells were cultured in Matrigel. Indeed, the authors argue that the centrosomal localization precedes and contributes to its localization to the AMIS. In the absence of ECM, GP135 localizes to the membrane before it localizes to the centrosome and its localization to the centrosome appears significantly reduced. Thus, my original and current interpretation is that ECM signaling is somehow required for the centrosomal targeting of GP135. One could make a competition argument, i.e. that the cortex in the absence of ECM is somehow a more desirable place to localize than the centrosome, but this experiment also argues that the centrosome does not need to be a source of this material in order for it to end up on the cortex.

      (3) There needs to be precision in the language used in many places:

      I don't understand this line in the abstract: "When cultured in Matrigel, de novo polarization of a single epithelial cell is often coupled with mitosis." If a cell has divided, it is no longer a single cell.

      The authors state in the Introduction "Because of its strong ability to nucleate microtubules, the centrosome functions as the primary microtubule organizing center", but then state "In polarized epithelial cells, the centrosome is localized at the apical region during interphase, which contributes to the construction of an asymmetric microtubule network conducive to polarized vesicle trafficking". In the latter statement, I assume the authors are describing the well-characterized apical microtubule network in epithelial cells that is non-centrosomal. Thus, the latter sentence is at odds with the former.

      The authors continually refer to Par3 as a tight junction protein: "Par3, which controls tight junction assembly to partition the apical surface from the basolateral surface". To my knowledge, PARD3 is an apical protein with similar localization to C. elegans PAR-3 and Drosophila Bazooka. PARD3B is a junctional protein. I assume that the antibody that the authors are using is to PARD3 and not PARD3B? Can the authors please clarify this in the text?

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Wang, Po-Kai, et al., utilized the de novo polarization of MDCK cells cultured in Matrigel to assess the interdependence between polarity protein localization, centrosome positioning, and apical membrane formation. They show that the inhibition of Plk4 with Centrinone does not prevent apical membrane formation, but does result in its delay, a phenotype the authors attribute to the loss of centrosomes due to the inhibition of centriole duplication. However, the targeted mutagenesis of specific centrosome proteins implicated in the positioning of centrosomes in other cell types (CEP164, ODF2, PCNT, and CEP120) did not affect centrosome positioning in 3D cultured MDCK cells. A screen of proteins previously implicated in MDCK polarization revealed that the polarity protein Par-3 was upstream of centrosome positioning, similar to other cell types.

      Strengths:

      The investigation into the temporal requirement and interdependence of previously proposed regulators of cell polarization and lumen formation is valuable to the community. Wang et al., have provided a detailed analysis of many of these components at defined stages of polarity establishment. Furthermore, the generation of PCNT, p53, ODF2, Cep120, and Cep164 knockout MDCK cell lines is likely valuable to the community.

      Weaknesses:

      Additional quantifications would highly improve this manuscript, for example it is unclear whether the centrosome perturbation affects gamma tubulin levels and therefore microtubule nucleation, it is also not clear how they affect the localization of the trafficking machinery/polarity proteins. For example, in Figure 4, the authors measure the intensity of Gp134 at the apical membrane initiation site following cytokinesis, but there is no measure of Gp134 at the centrosome prior to this.

      We thank the reviewer for this important suggestion. Previous studies have shown that genes encoding appendage proteins and CEP120 do not regulate γ-tubulin recruitment to centrosomes (Betleja, Nanjundappa, Cheng, & Mahjoub, 2018; Vasquez-Limeta & Loncarek, 2021). Although the loss of PCNT reduces γ-tubulin levels, this reduction is partially compensated by AKAP450. Even in the case of PCNT/AKAP450 double knockouts, low levels of γ-tubulin remain at the centrosome (Gavilan et al., 2018), suggesting that it is difficult to completely eliminate γ-tubulin by perturbing centrosomal genes alone.

      To directly address this question, in the revised manuscript (Page 8, Paragraph 4; Figure 4—figure supplement 3), we employed a recently reported method to block γ-tubulin recruitment by co-expressing two constructs: the centrosome-targeting carboxy-terminal domain (C-CTD) of CDK5RAP2 and the γ-tubulin-binding domain of NEDD1 (N-gTBD). This approach effectively depleted γ-tubulin and abolished microtubule nucleation at the centrosome (Vinopal et al., 2023). Interestingly, despite the reduced efficiency of apical vesicle trafficking, these cells were still able to establish polarity, with centrioles positioned apically. These results suggest that microtubule nucleation at the centrosomes (centrosomal microtubules) facilitates—but is not essential for—polarity establishment.

      Regarding Figure 4, we assume the reviewer was referring to Gp135 rather than Gp134. In the revised manuscript (Page 8, Paragraph 2; Figure 4I), we observed a slight decrease in Gp135 intensity near PCNT-KO centrosomes at the pre-Abs stage. However, its localization at the AMIS following cytokinesis remained unaffected. These results suggest that the loss of PCNT has a limited impact on Gp135 localization. 

      Reviewer #2 (Public review):

      Summary:

      The authors decoupled several players that are thought to contribute to the establishment of epithelial polarity and determined their causal relationship. This provides a new picture of the respective roles of junctional proteins (Par3), the centrosome, and endomembrane compartments (Cdc42, Rab11, Gp135) from upstream to downstream.

      Their conclusions are based on live imaging of all players during the early steps of polarity establishment and on the knock-down of their expression in the simplest ever model of epithelial polarity: a cell doublet surrounded by ECM.

      The position of the centrosome is often taken as a readout for the orientation of the cell polarity axis. There is a long-standing debate about the actual role of the centrosome in the establishment of this polarity axis. Here, using a minimal model of epithelial polarization, a doublet of daughter MDCK cells cultured in Matrigel, the authors made several key observations that bring new light to our understanding of a mechanism that has been studied for many years without being fully explained:

      (1) They showed that centriole can reach their polarized position without most of their microtubule-anchoring structures. These observations challenge the standard model according to which centrosomes are moved by the production and transmission of forces along microtubules.

      (2) (However) they showed that epithelial polarity can be established in the absence of a centriole.

      (3) (Somehow more expectedly) they also showed that epithelial polarity can't be established in the absence of Par3.

      (4) They found that most other polarity players that are transported through the cytoplasm in lipid vesicles, and finally fused to the basal or apical pole of epithelial cells, are moved along an axis which is defined by the position of centrosome and orientation of microtubules.

      (5) Surprisingly, two non-daughter cells that were brought in contact (for 6h) could partially polarize by recruiting a few Par3 molecules but not the other polarity markers.

      (6) Even more surprisingly, in the absence of ECM, Par3 and centrosomes could move to their proper position close to the intercellular junction after cytokinesis but other polarity markers (at least GP135) localized to the opposite, non-adhesive, side. So the polarity of the centrosome-microtubule network could be dissociated from the localisation of GP135 (which was believed to be transported along this network).

      Strengths:

      (1) The simplicity and reproducibility of the system allow a very quantitative description of cell polarity and protein localisation.

      (2) The experiments are quite straightforward, well-executed, and properly analyzed.

      (3) The writing is clear and conclusions are convincing.

      Weaknesses:

      (1) The simplicity of the system may not capture some of the mechanisms involved in the establishment of cell polarity in more physiological conditions (fluid flow, electrical potential, ion gradients,...).

      We agree that certain mechanisms may not be captured by this simplified system. However, the model enables us to observe intrinsic cellular responses, minimize external environmental variables, and gain new insights into how epithelial cells position their centrosomes and establish polarity. 

      (2) The absence of centriole in centrinone-treated cells might not prevent the coalescence of centrosomal protein in a kind of MTOC which might still orient microtubules and intracellular traffic. How are microtubules organized in the absence of centriole? If they still form a radial array, the absence of a centriole at the center of it somehow does not conflict with classical views in the field.

      Previous studies have shown that in the absence of centrioles, centrosomal proteins can relocate to alternative microtubule-organizing centers (MTOCs), such as the Golgi apparatus (Gavilan et al., 2018). Furthermore, centriole loss leads to increased nucleation of non-centrosomal microtubules (Martin, Veloso, Wu, Katrukha, & Akhmanova, 2018). However, these microtubules typically do not form the classical radial array or a distinct star-like organization. 

      While this non-centrosomal microtubule network can still support polarity establishment, it does so less efficiently—similar to what is observed in p53-deficient cells undergoing centriole-independent mitosis (Meitinger et al., 2016). Thus, although the absence of centrioles does not completely prevent microtubule-based organization or polarity establishment, it impairs their spatial coordination and reduces overall efficiency compared to a centriole-centered microtubule-organizing center (MTOC). 

      (3) The mechanism is still far from clear and this study shines some light on our lack of understanding. Basic and key questions remain:

      (a) How is the centrosome moved toward the Par3-rich pole? This is particularly difficult to answer if the mechanism does not imply the anchoring of MTs to the centriole or PCM.

      Previous studies have shown that Par3 interacts with dynein, potentially anchoring it at the cell cortex (Schmoranzer et al., 2009). This interaction enables dynein, a minus-end-directed motor, to exert pulling forces on microtubules, thereby promoting centrosome movement toward the Par3-enriched pole.

      In our experiments (Figure 4), we attempted to disrupt centrosomal microtubule nucleation by knocking out multiple genes involved in centrosome structure and function, including ODF2 and PCNT. Under these perturbations, γ-tubulin still remained detectable at the centrosome, and we were unable to completely eliminate centrosomal microtubules. 

      To address this question more directly, we employed a strategy to deplete γ-tubulin from centrosomes by co-expressing the centrosome-targeting C-terminal domain (C-CTD) of CDK5RAP2 and the γ-tubulin-binding domain of NEDD1 (N-gTBD). As shown in the new data of the revised manuscript (Page 8, Paragraph 4; Figure 4—figure supplement 3), this approach effectively depleted γ-tubulin from centrosomes, thereby abolishing microtubule nucleation at the centrosome. 

      Surprisingly, even under these conditions, centrioles remained apically positioned (Page 8, Paragraph 4; Figure 4—figure supplement 3), indicating that centrosomal microtubules are not essential for centrosome movement during polarization.

      Given these findings, we agree that the precise mechanism by which the Par3-enriched cortex attracts or guides centrosome movement remains unclear. Although dynein–Par3 interactions may contribute, further studies are needed to elucidate how centrosome repositioning occurs in the absence of microtubule-based pulling forces from the centrosome itself.

      (b) What happens during cytokinesis that organises Par3 and intercellular junction in a way that can't be achieved by simply bringing two cells together? In larger epithelia cells have neighbours that are not daughters, still, they can form tight junctions with Par3 which participates in the establishment of cell polarity as much as those that are closer to the cytokinetic bridge (as judged by the overall cell symmetry). Is the protocol of cell aggregation fully capturing the interaction mechanism of non-daughter cells?

      We speculate that a key difference between cytokinesis and simple cell-cell contact lies in the presence or absence of actomyosin contractility during the process of cell division. Specifically, contraction of the cytokinetic ring generates mechanical forces between the two daughter cells, which are absent when two non-daughter cells are simply brought together. While adjacent epithelial cells can indeed form tight junctions and recruit Par3, the lack of shared cortical tension and contractile actin networks between non-daughter cells may lead to differences in how polarity is initiated. This mechanical input during cytokinesis may serve as an organizing signal for centrosome positioning. This idea is supported by recent work showing that the actin cytoskeleton can influence centrosome positioning (Jimenez et al., 2021), suggesting that contractile actin structures formed during cytokinesis may contribute to spatial organization in a manner that cannot be replicated by simple aggregation. 

      In our experiments, we simply captured two cells that were in contact within Matrigel. We cannot say for sure that it captures all the interaction mechanisms of non-daughter cells, but it does provide a contrast to daughter cells produced by cytokinesis. 

      Reviewer #3 (Public review):

      Here, Wang et al. aim to clarify the role of the centrosome and conserved polarity regulators in apical membrane formation during the polarization of MDCK cells cultured in 3D. Through well-presented and rigorous studies, the authors focused on the emergence of polarity as a single MDCK cell divided in 3D culture to form a two-cell cyst with a nascent lumen. Focusing on these very initial stages, rather than in later large cyst formation as in most studies, is a real strength of this study. The authors found that conserved polarity regulators Gp135/podocalyxin, Crb3, Cdc42, and the recycling endosome component Rab11a all localize to the centrosome before localizing to the apical membrane initiation site (AMIS) following cytokinesis. This protein relocalization was concomitant with a repositioning of centrosomes towards the AMIS. In contrast, Par3, aPKC, and the junctional components E-cadherin and ZO1 localize directly to the AMIS without first localizing to the centrosome. Based on the timing of the localization of these proteins, these observational studies suggested that Par3 is upstream of centrosome repositioning towards the AMIS and that the centrosome might be required for delivery of apical/luminal proteins to the AMIS.

      To test this hypothesis, the authors generated numerous new cell lines and/or employed pharmacological inhibitors to determine the hierarchy of localization among these components. They found that removal of the centrosome via centrinone treatment severely delayed and weakened the delivery of Gp135 to the AMIS and single lumen formation, although normal lumenogenesis was apparently rescued with time. This effect was not due to the presence of CEP164, ODF2, CEP120, or Pericentrin. Par3 depletion perturbed the repositioning of the centrosome towards the AMIS and the relocalization of the Gp135 and Rab11 to the AMIS, causing these proteins to get stuck at the centrosome. Finally, the authors culture the MDCK cells in several ways (forced aggregation and ECM depleted) to try and further uncouple localization of the pertinent components, finding that Par3 can localize to the cell-cell interface in the absence of cell division. Par3 localized to the edge of the cell-cell contacts in the absence of ECM and this localization was not sufficient to orient the centrosomes to this site, indicating the importance of other factors in centrosome recruitment.

      Together, these data suggest a model where Par3 positions the centrosome at the AMIS and is required for the efficient transfer of more downstream polarity determinants (Gp135 and Rab11) to the apical membrane from the centrosome. The authors present solid and compelling data and are well-positioned to directly test this model with their existing system and tools. In particular, one obvious mechanism here is that centrosome-based microtubules help to efficiently direct the transport of molecules required to reinforce polarity and/or promote lumenogenesis. This model is not really explored by the authors except by Pericentrin and subdistal appendage depletion and the authors do not test whether these perturbations affect centrosomal microtubules. Exploring the role of microtubules in this process could considerably add to the mechanisms presented here. In its current state, this paper is a careful observation of the events of MDCK polarization and will fill a knowledge gap in this field. However, the mechanism could be significantly bolstered with existing tools, thereby elevating our understanding of how polarity emerges in this system.

      We agree that further exploration of microtubule dynamics could strengthen the mechanistic framework of our study. In our initial experiments, we disrupted centrosome function through genetic perturbations (e.g., knockout of PCNT, CEP120, CEP164, and ODF2). However, consistent with previous reports (Gavilan et al., 2018; Tateishi et al., 2013), we found that single-gene deletions did not completely eliminate centrosomal microtubules. Furthermore, imaging microtubule organization in 3D culture presents technical challenges. Due to the increased density of microtubules during cell rounding, we were unable to obtain clear microtubule filament structures—either using α-tubulin staining in fixed cells or SiR-tubulin labeling in live cells. Instead, the signal appeared diffusely distributed throughout the cytosol.

      To overcome this, we employed a recently reported approach by co-expressing the centrosome-targeting carboxy-terminal domain (C-CTD) of CDK5RAP2 and the γ-tubulin-binding domain (gTBD) of NEDD1 to completely deplete γ-tubulin and abolish centrosomal microtubule nucleation (Vinopal et al., 2023). In our new data presented in the revised manuscript (Page 8, Paragraph 4; Figure 4—figure supplement 3), we found that cells lacking centrosomal microtubules were still able to polarize and position the centrioles apically. However, the efficiency of polarized transport of Gp135 vesicles to the apical membrane was reduced. These findings suggest that centrosomal microtubules are not essential for polarity establishment but may contribute to efficient apical transport. 

      References

      Betleja, E., Nanjundappa, R., Cheng, T., & Mahjoub, M. R. (2018). A novel Cep120-dependent mechanism inhibits centriole maturation in quiescent cells. Elife, 7. doi:10.7554/eLife.35439

      Gavilan, M. P., Gandolfo, P., Balestra, F. R., Arias, F., Bornens, M., & Rios, R. M. (2018). The dual role of the centrosome in organizing the microtubule network in interphase. EMBO Rep, 19(11). doi:10.15252/embr.201845942

      Jimenez, A. J., Schaeffer, A., De Pascalis, C., Letort, G., Vianay, B., Bornens, M., . . . Thery, M. (2021). Acto-myosin network geometry defines centrosome position. Curr Biol, 31(6), 1206-1220 e1205. doi:10.1016/j.cub.2021.01.002

      Martin, M., Veloso, A., Wu, J., Katrukha, E. A., & Akhmanova, A. (2018). Control of endothelial cell polarity and sprouting angiogenesis by non-centrosomal microtubules. Elife, 7. doi:10.7554/eLife.33864

      Meitinger, F., Anzola, J. V., Kaulich, M., Richardson, A., Stender, J. D., Benner, C., . . . Oegema, K. (2016). 53BP1 and USP28 mediate p53 activation and G1 arrest after centrosome loss or extended mitotic duration. J Cell Biol, 214(2), 155-166. doi:10.1083/jcb.201604081

      Schmoranzer, J., Fawcett, J. P., Segura, M., Tan, S., Vallee, R. B., Pawson, T., & Gundersen, G. G. (2009). Par3 and dynein associate to regulate local microtubule dynamics and centrosome orientation during migration. Curr Biol, 19(13), 1065-1074. doi:10.1016/j.cub.2009.05.065

      Tateishi, K., Yamazaki, Y., Nishida, T., Watanabe, S., Kunimoto, K., Ishikawa, H., & Tsukita, S. (2013). Two appendages homologous between basal bodies and centrioles are formed using distinct Odf2 domains. J Cell Biol, 203(3), 417-425. doi:10.1083/jcb.201303071

      Vasquez-Limeta, A., & Loncarek, J. (2021). Human centrosome organization and function in interphase and mitosis. Semin Cell Dev Biol, 117, 30-41. doi:10.1016/j.semcdb.2021.03.020

      Vinopal, S., Dupraz, S., Alfadil, E., Pietralla, T., Bendre, S., Stiess, M., . . . Bradke, F. (2023). Centrosomal microtubule nucleation regulates radial migration of projection neurons independently of polarization in the developing brain. Neuron, 111(8), 1241-1263 e1216. doi:10.1016/j.neuron.2023.01.020.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Figures:

      (1) Figure 3 B+C - Although in comparison to Figure 2 it appears the p53 mutation does not affect θN-C, or Lo-c. the figure would benefit from direct comparison to control cells.

      We appreciate your suggestion to improve the clarity of the figure. In response, we have revised Figure 3B+C to include control cell data, allowing for clearer side-by-side comparisons in the updated figures. 

      (2) Figure 3D - Clarify if both were normalized to time point 0:00 of the p53 KO. The image used appears that Gp135 intensity increases substantially between 0:00 and 0:15 in the figure, but the graph suggests that the intensity is the same if not slightly lower.

      Figure 3D – The data were normalized to the respective 0:00 time point for each condition. Because the intensity profile was measured along a line connecting the two nuclei, Gp135 signal could only be detected if it appeared along this line. However, the images shown are maximum-intensity projections, meaning that Gp135 signals from peripheral regions are projected onto the center of the image. This may create the appearance of increased intensity at certain time points (e.g., Figure 3A, p53-KO + CN, 0:00–0:15). 
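
      The distinction drawn in this response can be illustrated with a toy calculation. The sketch below is not the authors' analysis pipeline: the array sizes, the single-plane line measurement, and the helper names `line_profile` and `normalized_line_signal` are assumptions made purely for illustration. It shows how a signal that appears away from the measured plane leaves the normalized line profile unchanged while still brightening the centre of a maximum-intensity projection.

```python
import numpy as np

def line_profile(image, p0, p1, n=50):
    """Sample a 2D image along the straight line p0 -> p1 (nearest pixel)."""
    rows = np.linspace(p0[0], p1[0], n).round().astype(int)
    cols = np.linspace(p0[1], p1[1], n).round().astype(int)
    return image[rows, cols]

def normalized_line_signal(frames, p0, p1):
    """Mean line-profile intensity per frame, divided by the first frame (0:00)."""
    means = np.array([line_profile(f, p0, p1).mean() for f in frames])
    return means / means[0]

# Hypothetical data: two time points, each a z-stack of shape (z, y, x).
t0 = np.ones((5, 21, 21))
t1 = np.ones((5, 21, 21))
t1[0, 10, 10] = 10.0  # bright spot at t1, located away from the measured plane

# Hypothetical line between the two nuclei, lying in plane z = 2.
p0, p1 = (10, 2), (10, 18)

# Measurement restricted to the plane containing the line.
plane_signal = normalized_line_signal([t0[2], t1[2]], p0, p1)

# Same line drawn on maximum-intensity projections of the full stacks.
proj_signal = normalized_line_signal([t0.max(axis=0), t1.max(axis=0)], p0, p1)

print("single plane:", plane_signal)
print("max projection:", proj_signal)
```

      In this toy case the single-plane value stays at its 0:00 level, whereas the projected value rises, which is the kind of mismatch between a displayed projection and a line-based quantification that the response describes.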

      (3) Figure 4A: The diagram does not accurately represent the effect of the mutations, for example, PCNT mutation likely doesn't completely disrupt PCM (given gamma-tubulin is still visible in the staining), but instead results in its disorganization, Cep164 also wouldn't be expected to completely ablate distal appendages.

      Thank you for your comment. We have modified the figure in the revised manuscript (Figure 4A) to more clearly depict the defective DAs. 

      (4) Figure 4 + Supplements: A more in-depth characterization of the mutations would help address the previous comment and strengthen the manuscript. Especially as these components have previously been implicated in centrosome transport.

      Thank you for your valuable suggestion. As noted in previous studies, CEP164 is essential for distal appendage function and basal body docking, with its loss resulting in blocked ciliogenesis (Tanos et al., 2013); CEP120 is required for centriole elongation and distal appendage formation, and its loss also results in blocked ciliogenesis (Comartin et al., 2013; Lin et al., 2013; Tsai, Hsu, Liu, Chang, & Tang, 2019); ODF2 functions upstream in the formation of subdistal appendages, and its loss eliminates these structures and impairs microtubule anchoring (Tateishi et al., 2013); and PCNT functions as a PCM scaffold, necessary for the recruitment of PCM components and for microtubule nucleation at the centrosome (Fong, Choi, Rattner, & Qi, 2008; Zimmerman, Sillibourne, Rosa, & Doxsey, 2004). 

      Given that the phenotypes of these mutants have been well characterized in the literature, we focus here on their roles in centrosome migration and polarized vesicle trafficking within the specific context of our study. 

      (5) Figure 4: It would be interesting to measure the Gp135 intensity at the centrosomes, given that the model proposes it is trafficked from the centrosomes to the AMIS.

      Thank you for your suggestion. We have included measurements of Gp135 intensity at the centrosomes during the Pre-Abs stage in the revised figure (Figure 4I). Our data show no significant differences in Gp135 intensity between wild-type (WT) and CEP164-, ODF2-, or CEP120-knockout (KO) cell lines. However, a slight decrease in Gp135 intensity was observed in PCNT-KO cells. 

      (6) Figure 6F shows that in suspension culture polarity is reversed, however, in Figure 6G gp135 still localizes to the cytokinetic furrow prior to polarity reversal. Given this paper demonstrates Par-3 is upstream of centrosome positioning, it would be important to have temporal data of how Par-3 localizes prior to the ring observed in 6F.

      Thank you for your comment. We have included a temporal analysis of Par3 localization using fixed-cell staining in the revised figure (Figure 6—figure supplement 1D). This analysis shows that Par3 also localizes to the cytokinesis site during the Pre-Abs stage, prior to ring formation observed during the Post-CK stage (Figure 6F). Interestingly, during the Pre-Abs stage, the centrosomes also migrate toward the center of the cell doublets in suspension culture, and Gp135 surrounding the centrosomes is also recruited to a region near the center (Figure 6—figure supplement 1E). These data suggest that Par3 also is initially recruited to the cytokinesis site before polarity reversal, potentially promoting centrosome migration. The main difference from Matrigel culture is the peripheral localization of Par3 and Gp135 in suspension, which is likely due to the lack of external ECM signaling. 

      Results:

      (1) Page 7 Paragraph 1 - consistently use AMIS (Apical membrane initiation site) rather than "the apical site".

      Thank you for your helpful comment. We have revised the manuscript (Page 7, Paragraph 1) and will now use "AMIS" (Apical Membrane Initiation Site) instead of "the apical site" throughout the text. 

      (2) Page 7 Paragraph 4 - A single sentence explaining why the p53 background had to be used for the Cep120 deletion would be beneficial. Did the cell line have a reduced centrosome number? Does this effect apical membrane initiation similar to centrinone?

      We have revised the text (Page 7, Paragraph 4) to clarify that we were unable to generate a CEP120-KO line in p53-WT cells for unknown reasons. CEP120-KO cells have a normal number of centrosomes, but their centrioles are shorter. Because this KO line still contains centrioles, the effect differs from that of centrinone treatment, which results in a complete loss of centrioles. 

      (3) Page 10 paragraph 4 - This paragraph is confusing to read. I understand that in the cysts and epithelial sheet the cytokinetic furrow is apical, therefore a movement towards the AMIS could be due to its coincidence with the furrow. However, the phrasing "....we found that centrosomes move towards the apical membrane initiation site direction before bridge abscission. Taken together these findings indicate the position is strongly associated with the site of cytokinesis but not with the apical membrane" is confusing to the reader.

      We have revised the manuscript (Page 11, Paragraph 4) to replace "the AMIS" with "the center of the cell doublet". During de novo epithelial polarization, the apical membrane has not yet formed at the Pre-Abs stage. However, at the Pre-Abs stage, the centrosome has already migrated toward the site of cytokinesis, suggesting that centrosome positioning is correlated with the site of cell division. A similar phenomenon occurs in fully polarized epithelial cysts and sheets, where the centrosomes also migrate before bridge abscission. Thus, we propose that the position of the centrosome is closely associated with the site of cytokinesis and is independent of apical membrane formation. 

      Discussion

      (1) Page 11, Paragraph 2 - citations needed when discussing previous studies.

      Thank you for your suggestion. We have included the necessary references to the discussion of the previous studies in the revised manuscript (Page 12, Paragraph 2). 

      (2) Page 12, Paragraph 2 - This section of the discussion would be strengthened by discussing the role of the actomyosin network in defining centrosome position (Jimenez et al., 2021). It seems plausible that the differences observed in the different conditions could be due to altered actomyosin architecture. Especially where the cells haven't undergone cytokinesis.

      We appreciate the suggestion of a role for the actomyosin network in determining centrosome positioning. Recent studies have indeed highlighted the role of the actomyosin network in regulating centrosome centering and off-centering (Jimenez et al., 2021). During the pre-abscission stage of cell division, the actomyosin network undergoes significant dynamic changes, with the contractile ring forming at the center and actin levels decreasing at the cell periphery. In contrast, under aggregated cell conditions—meaning cells that have not undergone division—the actomyosin network does not exhibit such dynamic changes. The loss of actomyosin remodeling may therefore influence whether the centrosome moves. Thus, alterations in actomyosin architecture may contribute to the differences observed under various conditions, particularly when cells have not yet completed cytokinesis. We have revised Paragraph 2 on Page 13 to briefly mention the referenced study and to propose that the actomyosin network may influence centrosome positioning, contributing to our observed results. This addition strengthens the discussion and clarifies our findings. 

      (3) Page 12 paragraph 3 - Given that centrosome translocation during cytokinesis in MDCK cells (this study) appears to be similar to that observed in HeLa cells and the zebrafish Kupffers vesicle (Krishnan et al., 2022) it would be interesting to discuss why Rab11a and PCNT may not be essential to centrosome positioning in MDCK cells.

      Thank you for your insightful comment. We agree that it is interesting that centrosome translocation during cytokinesis in MDCK cells (as observed in our study) is similar to that observed in HeLa cells and zebrafish Kupffer's vesicle (Krishnan et al., 2022). However, there are notable differences between these systems that may help explain why Rab11a and PCNT are not essential for centrosome positioning in MDCK cells.

      Our study used 3D culture of MDCK cells, while the reference study examined adherent culture of HeLa cells. In the adherent culture, cells attached to the culture surface form large actin stress fibers on their basal side, which weakens the actin networks in the apical and intercellular regions. In contrast, the 3D culture system used in our study better preserves cell polarity and the integrity of the actin network, which might contribute to centrosome positioning independent of Rab11a and PCNT. Differences in culture conditions and actin network architecture may explain why Rab11a and PCNT are not required for centrosome positioning in MDCK cells.

      Furthermore, the referenced study focused on Rab11a and PCNT in zebrafish embryos at 3.3–5 hours post-fertilization (hpf), a time point before the formation of the Kupffer’s vesicle. At this stage, the cells they examined may not yet have become epithelial cells, which may also influence the requirement of Rab11a and PCNT for centrosome positioning. We hypothesize that during the pre-abscission stage, centrosome migration toward the cytokinetic bridge occurs primarily in epithelial cells, and that the polarity and centrosome positioning mechanisms in these cells may differ from those in other cell types, such as zebrafish embryos.

      Furthermore, data from Krishnan et al. (2022) suggest that cytokinesis failure in pcnt+/- heterozygous embryos and Rab11a functional-blocked embryos may be due to the presence of supernumerary centrosomes. Consistent with this, our data show that blocking cytokinesis inhibits centrosome movement in MDCK cells. However, in our MDCK cell lines with PCNT or Rab11a knockdown, we did not observe significant cytokinesis failure, and centrosome migration proceeded normally. 

      Reviewer #2 (Recommendations for the authors):

      Suggestions for experiments:

      (1) A description of the organization of microtubules in the absence of centriole, or in the absence of ECM would be interesting to understand how polarity markers end up where you observed them. This easy experiment may significantly improve our understanding of this system.

      Previous studies have shown that in the absence of centrioles, microtubule organization undergoes significant changes. Specifically, the number of non-centrosomal microtubules increases, and these microtubules are not radially arranged, leading to the absence of focused microtubule-organizing centers in centriole-deficient cells (Martin, Veloso, Wu, Katrukha, & Akhmanova, 2018). This disorganized microtubule network reduces the efficiency of vesicle transport during de novo epithelial polarization at the mitotic pre-abscission stage. 

      In contrast, the organization of microtubules under ECM-free conditions remains less well characterized. Here, we show that while the ECM plays a critical role in establishing the direction of epithelial polarity, it does not influence the positioning of the centrosome, the microtubule-organizing center (MTOC).  

      (2) Would it be possible to knock down ODF2 and pericentrin to completely disconnect the centrosome from microtubules?

      ODF2 forms the base of subdistal appendages. Knocking out ODF2 affects the recruitment of all downstream proteins to the subdistal appendages (Mazo, Soplop, Wang, Uryu, & Tsou, 2016). One study has shown that ODF2 knockout cells almost completely lost subdistal appendage structures and had significantly reduced microtubule asters surrounding the centrioles (Tateishi et al., 2013). However, although pericentrin (PCNT) is the main scaffold of the pericentriolar matrix (PCM) of centrosomes, the microtubule-organizing ability of centrosomes can be compensated by AKAP450, a paralog of PCNT, after PCNT knockout. A previous study has even shown that in cells with a double knockout of PCNT and AKAP450, γ-tubulin can still be recruited to the centrosomes, and centrosomes can still nucleate microtubules (Gavilan et al., 2018). This suggests that there are other proteins or pathways that promote microtubule nucleation on centrosomes. We are unsure whether the triple knockout of ODF2, PCNT, and AKAP450 can completely disconnect the centrosome from microtubules. However, a recent study reported a simpler approach involving the expression of dominant-negative fragments of the γ-tubulin-binding protein NEDD1 and the activator CDK5RAP2 at the centrosome (Vinopal et al., 2023). In our revised manuscript (Page 8, Paragraph 4; Figure 4—figure supplement 3), we applied this strategy, which resulted in the depletion of nearly all γ-tubulin from the centrosome. This indicates a strong suppression of centrosomal microtubule nucleation and an effective disconnection of the centrosome from the microtubule network. 

      (3) The study does not distinguish the role of cytokinesis from the role of tight junctions, which form only after cytokinesis and not simply by bringing cells into contact. Would it be feasible and interesting to study the polarization after cytokinesis in cells that could not form tight junctions (due to the absence of Ecad or ZO1 for example)?

      Studying cell polarization after cytokinesis in cells unable to form tight junctions is a promising area of research.

      Recent studies have shown that mouse embryonic stem cells (mESCs) cultured in Matrigel can form ZO-1-labelled tight junctions at the midpoint of cell–cell contact even in the absence of cell division. However, in the absence of E-cadherin, ZO-1 localization is significantly impaired. Interestingly, despite the loss of E-cadherin, the Golgi apparatus and centrosomes remain oriented toward the cell–cell interface (Liang, Weberling, Hii, Zernicka-Goetz, & Buckley, 2022). These findings suggest that cell polarity can be maintained independently of tight junction formation, highlighting the potential value of studying polarization in cells that lack tight junctions.

      Furthermore, while studies have explored the effects of knocking down tight junction components such as JAM-A and Cingulin on lumen formation in MDCK 3D cultures (Mangan et al., 2016; Tuncay et al., 2015), the role of ZO-1 in this context remains underexplored. Cingulin knockdown has been shown to disrupt endosome targeting and the formation of the AMIS, while both JAM-A and Cingulin knockdown result in actin accumulation at multiple points, leading to the formation of multi-lumen structures rather than a reversal of polarity. However, previous research has not specifically investigated centrosome positioning in JAM-A and Cingulin knockdown cells, an area that could provide valuable insights into how polarity is maintained in the absence of tight junctions. 

      Writing details:

      (1) The migration of the centrosome in the absence of appendages or PCM is proposed to be ensured by compensatory mechanisms ensuring the robustness of microtubule anchoring to the centrosome. It could also be envisaged that the centrosome motion does not require this anchoring and that other yet unknown moving mechanisms, based on an actin network for example, might exist.

      Thank you for your valuable comments. We agree that there may indeed be some unexpected mechanisms that allow centrosomes to move independently of microtubule anchoring to the centrosome, such as mechanisms based on actin filaments or noncentrosomal microtubules; these mechanisms are worth further investigation.

      In response to your suggestion, in Paragraph 5 of the Discussion section, we further clarified that while a microtubule anchoring mechanism might be one explanation, other mechanisms could also influence centrosome movement in the absence of appendages or PCM. Additionally, we revised Paragraph 4 regarding the possibility of actin network-driven centrosome movement and emphasized the importance of future research for a deeper understanding of these processes. 

      (2) The actual conclusion of the study of Martin et al (eLife 2018) is not simply that centrosome is not involved in cell polarization but that it hinders cell polarization!

      Thank you for your valuable feedback. We agree with the findings of Martin et al. (eLife 2018) that the centrosome is not irrelevant to cell polarity but rather hinders cell polarization. Therefore, we have revised the manuscript (Page 2, Paragraph 2) to more accurately reflect this viewpoint. 

      (3) This study recalls some conclusions of the study by Burute et al (Dev Cell 2017), in particular the role of Par3 in driving centrosome toward the intercellular junction of daughter cells after cytokinesis. It would be welcome to comment on the results of this study in light of their work.

      Thank you for your valuable feedback. The study by Burute et al. (Dev Cell, 2017) showed that in micropatterned cultures of MCF10A cells, the cells exhibit polarity and position their centrosomes toward the intercellular junction, while downregulation of Par3 expression disrupts this centrosome positioning. This result is similar to our findings in 3D-cultured MDCK cells and consistent with previous studies in C. elegans intestinal cells and migrating NIH 3T3 cells (Feldman & Priess, 2012; Schmoranzer et al., 2009), indicating that Par3 indeed influences centrosome positioning in different cellular systems. However, Par3 does not directly localize to the centrosome; rather, it localizes to the cell cortex or cell-cell junctions. Therefore, Par3 likely regulates centrosome positioning through other intermediary molecules or mechanisms, but the specific mechanism remains unclear and requires further investigation. 

      (4) Could the term apico-basal be used in the absence of a basement membrane to form a basal pole?

      We understand that using the term "apico-basal" in the absence of a basement membrane might raise some questions. Traditionally, the apico-basal axis refers to the polarity of epithelial cells, where the apical surface faces the lumen or external environment, and the basal surface is oriented toward the basement membrane. However, in the absence of a basement membrane, such as in certain in vitro systems or under specific experimental conditions, polarity along a similar axis can still be observed. In such cases, the term "apico-basal" can still be used to describe the polarity between the apical domain and the region where it contacts the substrate or adjacent cells. 

      (5) The absence of centrosome movement to the intercellular bridge in spread cells in culture is not so surprising considering the work of Lafaurie-Janvore et al (Science 2018) about the role of cell spreading in the regulation of bridge tension and abscission delay.

      Thank you for your valuable comment. Indeed, previous studies have shown that in some cell types, the centrosome does move toward the intercellular bridge in spread cells (Krishnan et al., 2022; Piel, Nordberg, Euteneuer, & Bornens, 2001), but other studies have suggested that this movement may not be significant and may not be universally observed across all cell types (Jonsdottir et al., 2010). In our study, we aim to demonstrate that this phenomenon is more pronounced in 3D culture systems than in 2D spread-cell culture systems. Previous studies and our work have observed that centrosome migration occurs during the pre-abscission stage, but whether this migration is directly related to cytokinetic bridge tension or the timing of abscission remains an open question. Further research is needed to explore the potential relationship between centrosome positioning, cytokinetic bridge tension, and the timing of abscission. 

      (6) GP135 (podocalyxin) has been proposed to have anti-adhesive/lubricant properties (hence its pro-invasive effect). Could it be possible that once localized at the cell surface it is systematically moved away from regions that are anchored to either the ECM or adjacent cells? So its localization away from the centrosome in an ECM-free experiment would not be a consequence of defective targeting but relocalization after reaching the plasma membrane?

      Thank you for your valuable comment. We agree that GP135 may indeed move directly across the cell surface, away from the region where it interacts with the ECM or adjacent cells. This relocalization could be due to its anti-adhesive or lubricating properties, which may facilitate its displacement from these adhesive sites. Validating this would require a higher-resolution real-time imaging system to observe the dynamic behavior of GP135 on the cell surface.

      However, this does not contradict our main conclusion. Under suspension culture conditions without ECM, the centrosome positioning in cell doublets is indeed decoupled from apical membrane orientation. This suggests that the localization of the centrosome and the apical membrane is regulated by different mechanisms. Specifically, the GP135 protein tends to accumulate away from areas of contact with the ECM or adjacent cells, possibly through movement within the cell membrane or by recycling endosome transport. In contrast, centrosome positioning is closely related to the cytokinesis site. Our study clearly elucidates the differences between these two polarity properties. 

      Reviewer #3 (Recommendations for the authors):

      Major:

      (1) To me, a clear implication of these studies is that Gp135, Rab11, etc. are delivered to the AMIS on centrosomal microtubules. The authors do not explore this model except to say that depletion of SD appendage or pericentrin has no effect on the protein relocalization to the AMIS. However, the authors do not observe microtubule association with the centrosome in these KO conditions. This analysis is imperative to interpret existing results since these are new KO conditions in this cell/culture system and parallel pathways (e.g. CDK5RAP2) are known to contribute to microtubule association with the centrosome. An ability to comment on the mechanism by which the centrosome contributes to the efficiency of polarization would greatly enhance the paper.

      Microtubule requirement could also be tested in numerous additional ways requiring varying degrees of new experiments:

      (a) faster live cell imaging at abscission to see if the deposition of those components appears to traffic on MTs;

      (b) live cell imaging with microtubules (e.g. SPY-tubulin) and/or EB1 to determine the origin and polarity of microtubules at the pertinent stages;

      For (a) and (b), because the cells were cultured in Matrigel, they tended to round up, with a dense internal structure that made observation difficult. In contrast, under adherent culture conditions, the cells were flattened with more dispersed internal structures, making them easier to observe. We had previously used SPY-tubulin to label microtubules for live cell imaging; however, due to the dense microtubule structure in 3D culture, the image contrast was reduced, and we could not clearly observe the microtubule network within the cells. 

      (c) acute nocodazole treatment at abscission to determine the effect on protein localization.

      Regarding the method of using nocodazole to study microtubule requirements at the abscission stage, we believe that nocodazole treatment may lead to cytokinesis failure. Cell division failure results in the formation of binucleated cells, which are unable to establish cell polarity. Furthermore, nocodazole treatment cannot distinguish between centrosomal and non-centrosomal microtubules, making it unsuitable for studying the specific role of centrosomal microtubules in this process.

      In the new data presented in the revised manuscript (Figure 4—figure supplement 3), we employed a recently reported method, co-expressing the centrosome-targeting carboxy-terminal domain (C-CTD) of CDK5RAP2 and the γ-tubulin-binding domain (gTBD) of NEDD1, to completely deplete γ-tubulin and abolish centrosomal microtubule nucleation (Vinopal et al., 2023). We found that cells lacking centrosomal microtubules were still able to polarize and position the centrioles apically. However, the efficiency of polarized transport of Gp135 vesicles to the apical membrane was reduced. These findings suggest that centrosomal microtubules are not essential for polarity establishment but may facilitate efficient apical transport. 

      (2) Similar to the expanded analysis of the role of microtubules in this system, it would be excellent if the author could expand on the role of Par3 and the centrosome, although this reviewer recognizes that the authors have already done substantial work. For example, what are the consequences of Gp135 and/or Rab11 getting stuck at the centrosome? Do the authors have any later images to determine when and if these components ever leave the centrosome? Existing literature focuses on the more downstream consequence of Par3 removal on single-lumen formation. 

      Similarly, could the authors expand on the description of polarity disruption following centrinone treatment? It is clear that Gp135 recruitment is disrupted, but how and when do things get fixed and what else is disrupted at the very earliest stages of AMIS formation? The authors have an excellent opportunity to really expand on what is known about the requirements for these conserved components.

      Regarding the use of centrinone in treatment, we speculate that Gp135 can still accumulate at the AMIS over time, although the efficiency of its recruitment may be reduced.

      Furthermore, under similar conditions, other apical membrane components (such as the Crumbs3 protein) may exhibit similar characteristics to Gp135 protein. 

      (3) Perhaps satisfying both of the above asks, could the authors do a faster time-lapse at the relevant time points, i.e. as proteins are being recruited to the AMIS (time points between 1Aiv and v)? This type of imaging again might help shed light on the mechanism.

      We believe the above questions are very important and may require further experimental verification in the future. 

      Minor:

      (1) What is the green patch of Gp135 in Figure 2A that does not colocalize with the centrosome? Is this another source of Gp135 that is being delivered to the AMIS? This type of patch is also visible in Figure 3A 15 and 30-minute panels.

      During mitosis, membrane-composed organelles such as the Golgi apparatus are typically dispersed throughout the cytoplasm. However, during the pre-abscission stage, these organelles begin to reassemble and cluster around the centrosome. Furthermore, they also accumulate in the region between the nucleus and the cytokinetic bridge, corresponding to the “patch” mentioned in Figure 2A. 

      Live cell imaging results showed that this Gp135 patch initially appears in a region not associated with the centrosome. Subsequently, it was either transported directly to the AMIS or fused with the centrosome-associated Gp135 and transported together with it. Notably, this patch was only observed when Gp135 was overexpressed in cells. No such distinct protein patches were observed when staining endogenous Gp135 protein (Figure 1A), suggesting that overexpression of Gp135 protein may lead to a localized increase in its concentration in that region. 

      (2) I am confused by the "polarity index" quantification as this appears to just be a nucleus centrosome distance measurement and wouldn't, for example, distinguish if the centrosomes separated from the nucleus but were on the basal side of the cell.

      The position of the centrosome within the cell (i.e., its distance from the nucleus) can indeed serve as an indicator of cell polarity (Burute et al., 2017). We acknowledge that this quantitative method does not directly capture the specific direction in which the centrosome deviates from the cell center. To address this limitation, we have incorporated information about the angle between the nucleus and the centrosome, which allows for a more accurate description of changes in cell polarity (Rodriguez-Fraticelli, Auzan, Alonso, Bornens, & Martin-Belmonte, 2012). 
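
      A distance-plus-angle description of this kind can be sketched as follows. This is only an illustration under stated assumptions, not the authors' quantification: the function name `nc_distance_and_angle` is hypothetical, and the choice of reference point (for example, the center of the cell doublet) is an assumption, since the response does not specify how the angle is defined.

```python
import math


def nc_distance_and_angle(nucleus, centrosome, reference):
    """Return (distance, angle_in_degrees) for one cell.

    distance: length of the nucleus -> centrosome vector.
    angle:    angle between the nucleus -> centrosome vector and the
              nucleus -> reference vector (0 means the centrosome lies
              exactly toward the reference point).
    All points are (x, y, z) tuples in the same units. The reference
    point is an assumption of this sketch.
    """
    v = [c - n for c, n in zip(centrosome, nucleus)]
    r = [t - n for t, n in zip(reference, nucleus)]
    norm_v = math.sqrt(sum(a * a for a in v))
    norm_r = math.sqrt(sum(a * a for a in r))
    if norm_v == 0 or norm_r == 0:
        raise ValueError("vectors must have non-zero length")
    cos_theta = sum(a * b for a, b in zip(v, r)) / (norm_v * norm_r)
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return norm_v, math.degrees(math.acos(cos_theta))


# Hypothetical coordinates: centrosome displaced toward the reference point.
distance, angle = nc_distance_and_angle((0, 0, 0), (3, 0, 0), (10, 0, 0))
```

      Reporting the angle alongside the distance distinguishes a centrosome displaced toward the reference point (small angle) from one displaced by the same distance toward the opposite side of the cell (angle near 180 degrees).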

      (3) How is GP135 "at AMIS" measured? Is an arbitrary line drawn? This is important later when comparing to centrinone treatment in Figure 3D where the quantification does not seem to accurately capture the enrichment of Gp135 that is seen in the images.

      To measure the expression level of Gp135 in the "AMIS" region of the cell, we first connected the centers of the two cell nuclei in three-dimensional space to form a straight line. Then, we used the Gp135 expression intensity at the midpoint of this line as the representative value for the AMIS region. This method is based on the assumption that the AMIS region is most likely located between the centers of the two cell nuclei. Therefore, this quantitative method provides a standardized assessment tool for comparing Gp135 expression levels under different conditions. 
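
      The midpoint measurement described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names (`midpoint`, `sample_nearest`, `amis_intensity`), the `[z][y][x]` voxel indexing, and the nearest-voxel sampling are all assumptions made for the example.

```python
def midpoint(p, q):
    """Midpoint of the straight line connecting two 3D points."""
    return tuple((a + b) / 2.0 for a, b in zip(p, q))


def sample_nearest(volume, point):
    """Read the voxel nearest to `point` from a volume indexed [z][y][x].

    `point` is given in voxel units as (z, y, x). Nearest-voxel lookup is
    an assumption of this sketch; interpolation would be an alternative.
    """
    z, y, x = (int(round(c)) for c in point)
    return volume[z][y][x]


def amis_intensity(volume, nucleus_a, nucleus_b):
    """Intensity at the midpoint between the two nuclear centers."""
    return sample_nearest(volume, midpoint(nucleus_a, nucleus_b))


# Hypothetical 3 x 3 x 5 volume with a single bright voxel between the nuclei.
volume = [[[0.0 for _ in range(5)] for _ in range(3)] for _ in range(3)]
volume[1][1][2] = 7.0
value = amis_intensity(volume, (0, 1, 0), (2, 1, 4))
```

      Because only the midpoint voxel is sampled, signal that lies off the nucleus-nucleus line does not contribute to the value, which is the limitation the reviewer raises for Figure 3D.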

      (4) The authors reference cell height (p.7) but no data for this measurement are shown

      Thank you for the comment. Although we did not perform quantitative measurements, the differences in cell height are clearly visible in Figure 3E (p53-KO + CN), which visually illustrates this phenomenon. 

      (5) Can the authors comment on the seeming reduction of Par3 in p53 KO cells?

      We did not observe a reduction of Par3 in p53-KO cells in our experiments.

      (6) Can the authors make sense of the E-cad localization: Figure 5, Supplement 2.

      Our study revealed that E-cadherin begins to accumulate at the cell-cell contact sites during the pre-abscission stage. Its appearance is similar to that of ZO-1, which also appears near the cell division site during this phase. Therefore, the behavior of E-cadherin contrasts sharply with that of Gp135, further highlighting the unique trafficking mechanisms of apical membrane proteins during this process. 

      (7) I find the results in Figure 6G puzzling. Why is ECM signaling required for Gp135 recruitment to the centrosome. Could the authors discuss what this means?

      We appreciate the reviewer’s valuable comments and thank you for the opportunity to clarify this point. The data in Figure 6G do not indicate that ECM signaling is required for the recruitment of Gp135 to the centrosome. Rather, our findings suggest that even in the absence of ECM, the centrosomes can migrate to a polarized position similar to that in Matrigel culture. This suggests that centrosome migration and the orientation of the nucleus–centrosome axis may be independent of ECM signaling and are primarily driven by cytokinesis alone. 

      Regarding the localization of Gp135, previous studies have shown that ECM signaling through integrin promotes endocytosis, which is crucial for the internalization of Gp135 from the cell membrane and its subsequent transport to the AMIS (Buckley & St Johnston, 2022). Our study found that, prior to its accumulation at the AMIS, Gp135 transiently localizes around the centrosome. In the absence of ECM, due to reduced endocytosis, Gp135 primarily remains on the cell membrane and does not undergo intracellular trafficking.  

      (8) The authors end the Discussion stating that these studies may have implication for in vivo settings, yet do not discuss the striking similarities to the C. elegans and Drosophila intestine or the findings from any other more observational studies of tubular epithelial systems in vivo (e.g. mouse kidney polarization, zebrafish neuroepithelium, etc.). These models should be discussed.

      Thank you for your valuable comment. Indeed, all types of epithelial tissues or tubular epithelial systems in vivo share some common features during cell division, which have been well-documented across various species. 

      These features include: during interphase, the centrosome is located at the apical surface of the cells; after the cell enters mitosis, the centrosome moves to the lateral side of the cell to regulate spindle orientation; and during cytokinesis, the cleavage furrow ingresses asymmetrically from the basal to the apical side, with the cytokinetic bridge positioned at the apical surface. Our study using MDCK 3D culture and transwell culture systems successfully mimicked these key features, demonstrating that these in vitro models are of significant value for studying cell polarization dynamics. 

      Based on our observations, we speculate that the centrosome may return to the apical surface after anaphase, just before bridge abscission. This is consistent with our findings from studies using MDCK 3D cultures and transwell systems, which showed that the centrosome relocates prior to the final stages of cytokinesis.

      Additionally, we propose that de novo polarization of the kidney tubule in vivo may not solely depend on the aggregation and mesenchymal-epithelial transition (MET) of the metanephric mesenchyme. It may also be related to the cell division process, which triggers centrosome migration and polarized vesicle trafficking. These processes likely contribute to enhancing cell polarization, as we observed in our in vitro models.

      We hope this further clarifies the potential implications of our findings for in vivo model studies, as well as their broader impact on the field of tubular epithelial cell polarization research. 

      (9) There are several grammatical issues/typos throughout the paper. A careful readthrough is required. For example:

      this sentence makes no sense "that the centrosome acts as a hub of apical recycling endosomes and centrosome migration during cytokinetic pre-abscission before apical membrane components are targeted to the AMIS"

      We carefully reviewed the paper and made necessary revisions to address the issues raised. In particular, we revised certain sentences to improve clarity and readability (Page 5, Paragraph 3). 

      (10) P.8: have been previously reported [to be] involved in MDCK...

      We appreciate the reviewer's valuable suggestions. We have revised the sentence accordingly (Page 9, Paragraph 2). 

      (11) This sentence seems misplaced: "Cultured conditions influence cellular polarization preferences."

      The sentence itself is fine, but to improve the coherence and clarity of the paragraph, we adjusted the paragraph structure and added some transitional phrases (Page 13, Paragraph 1).  

      (12) "Play a downstream role in Par3 recruitment" doesn't make sense, this should just be downstream of Par3 recruitment.

      Thank you for your suggestion. We have revised the wording accordingly, changing it to "downstream of Par3 recruitment" (Page 10, Paragraph 2).  

      References

      Buckley, C. E., & St Johnston, D. (2022). Apical-basal polarity and the control of epithelial form and function. Nat Rev Mol Cell Biol, 23(8), 559-577. doi:10.1038/s41580-022-00465-y

      Burute, M., Prioux, M., Blin, G., Truchet, S., Letort, G., Tseng, Q., . . . Thery, M. (2017). Polarity Reversal by Centrosome Repositioning Primes Cell Scattering during Epithelial-to-Mesenchymal Transition. Dev Cell, 40(2), 168-184. doi:10.1016/j.devcel.2016.12.004

      Comartin, D., Gupta, G. D., Fussner, E., Coyaud, E., Hasegan, M., Archinti, M., . . . Pelletier, L. (2013). CEP120 and SPICE1 cooperate with CPAP in centriole elongation. Curr Biol, 23(14), 1360-1366. doi:10.1016/j.cub.2013.06.002

      Feldman, J. L., & Priess, J. R. (2012). A role for the centrosome and PAR-3 in the hand-off of MTOC function during epithelial polarization. Curr Biol, 22(7), 575-582. doi:10.1016/j.cub.2012.02.044

      Fong, K. W., Choi, Y. K., Rattner, J. B., & Qi, R. Z. (2008). CDK5RAP2 is a pericentriolar protein that functions in centrosomal attachment of the gamma-tubulin ring complex. Mol Biol Cell, 19(1), 115-125. doi:10.1091/mbc.e07-04-0371

      Gavilan, M. P., Gandolfo, P., Balestra, F. R., Arias, F., Bornens, M., & Rios, R. M. (2018). The dual role of the centrosome in organizing the microtubule network in interphase. EMBO Rep, 19(11). doi:10.15252/embr.201845942

      Jimenez, A. J., Schaeffer, A., De Pascalis, C., Letort, G., Vianay, B., Bornens, M., . . . Thery, M. (2021). Acto-myosin network geometry defines centrosome position. Curr Biol, 31(6), 1206-1220 e1205. doi:10.1016/j.cub.2021.01.002

      Jonsdottir, A. B., Dirks, R. W., Vrolijk, J., Ogmundsdottir, H. M., Tanke, H. J., Eyfjord, J. E., & Szuhai, K. (2010). Centriole movements in mammalian epithelial cells during cytokinesis. BMC Cell Biol, 11, 34. doi:10.1186/1471-2121-11-34

      Krishnan, N., Swoger, M., Rathbun, L. I., Fioramonti, P. J., Freshour, J., Bates, M., . . . Hehnly, H. (2022). Rab11 endosomes and Pericentrin coordinate centrosome movement during preabscission in vivo. Life Sci Alliance, 5(7). doi:10.26508/lsa.202201362

      Liang, X., Weberling, A., Hii, C. Y., Zernicka-Goetz, M., & Buckley, C. E. (2022). E-cadherin mediates apical membrane initiation site localisation during de novo polarisation of epithelial cavities. EMBO J, 41(24), e111021. doi:10.15252/embj.2022111021

      Lin, Y. N., Wu, C. T., Lin, Y. C., Hsu, W. B., Tang, C. J., Chang, C. W., & Tang, T. K. (2013). CEP120 interacts with CPAP and positively regulates centriole elongation. J Cell Biol, 202(2), 211-219. doi:10.1083/jcb.201212060

      Mangan, A. J., Sietsema, D. V., Li, D., Moore, J. K., Citi, S., & Prekeris, R. (2016). Cingulin and actin mediate midbody-dependent apical lumen formation during polarization of epithelial cells. Nat Commun, 7, 12426. doi:10.1038/ncomms12426

      Martin, M., Veloso, A., Wu, J., Katrukha, E. A., & Akhmanova, A. (2018). Control of endothelial cell polarity and sprouting angiogenesis by non-centrosomal microtubules. Elife, 7. doi:10.7554/eLife.33864

      Mazo, G., Soplop, N., Wang, W. J., Uryu, K., & Tsou, M. F. (2016). Spatial Control of Primary Ciliogenesis by Subdistal Appendages Alters Sensation-Associated Properties of Cilia. Dev Cell, 39(4), 424-437. doi:10.1016/j.devcel.2016.10.006

      Piel, M., Nordberg, J., Euteneuer, U., & Bornens, M. (2001). Centrosome-dependent exit of cytokinesis in animal cells. Science, 291(5508), 1550-1553. doi:10.1126/science.1057330

      Rodriguez-Fraticelli, A. E., Auzan, M., Alonso, M. A., Bornens, M., & Martin-Belmonte, F. (2012). Cell confinement controls centrosome positioning and lumen initiation during epithelial morphogenesis. J Cell Biol, 198(6), 1011-1023. doi:10.1083/jcb.201203075

      Schmoranzer, J., Fawcett, J. P., Segura, M., Tan, S., Vallee, R. B., Pawson, T., & Gundersen, G. G. (2009). Par3 and dynein associate to regulate local microtubule dynamics and centrosome orientation during migration. Curr Biol, 19(13), 1065-1074. doi:10.1016/j.cub.2009.05.065

      Tanos, B. E., Yang, H. J., Soni, R., Wang, W. J., Macaluso, F. P., Asara, J. M., & Tsou, M. F. (2013). Centriole distal appendages promote membrane docking, leading to cilia initiation. Genes Dev, 27(2), 163-168. doi:10.1101/gad.207043.112

      Tateishi, K., Yamazaki, Y., Nishida, T., Watanabe, S., Kunimoto, K., Ishikawa, H., & Tsukita, S. (2013). Two appendages homologous between basal bodies and centrioles are formed using distinct Odf2 domains. J Cell Biol, 203(3), 417-425. doi:10.1083/jcb.201303071

      Tsai, J. J., Hsu, W. B., Liu, J. H., Chang, C. W., & Tang, T. K. (2019). CEP120 interacts with C2CD3 and Talpid3 and is required for centriole appendage assembly and ciliogenesis. Sci Rep, 9(1), 6037. doi:10.1038/s41598-019-42577-0

      Tuncay, H., Brinkmann, B. F., Steinbacher, T., Schurmann, A., Gerke, V., Iden, S., & Ebnet, K. (2015). JAM-A regulates cortical dynein localization through Cdc42 to control planar spindle orientation during mitosis. Nat Commun, 6, 8128. doi:10.1038/ncomms9128

      Vinopal, S., Dupraz, S., Alfadil, E., Pietralla, T., Bendre, S., Stiess, M., . . . Bradke, F. (2023). Centrosomal microtubule nucleation regulates radial migration of projection neurons independently of polarization in the developing brain. Neuron, 111(8), 1241-1263 e1216. doi:10.1016/j.neuron.2023.01.020

      Zimmerman, W. C., Sillibourne, J., Rosa, J., & Doxsey, S. J. (2004). Mitosis-specific anchoring of gamma tubulin complexes by pericentrin controls spindle organization and mitotic entry. Mol Biol Cell, 15(8), 3642-3657. doi:10.1091/mbc.e03-11-0796

    1. eLife Assessment

      This study uses a novel 3D imaging method to identify the Periportal Lamellar Complex (PLC), an important new structure. Although the methodological advancement and morphological descriptions are convincing, the evidence for its proposed function is incomplete, relying on transcriptomic correlation rather than direct experimental validation. The work would therefore be strengthened by focusing its claims on the robust methodological advancement and detailed morphological characterization.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chengjian Zhao et al. focused on the interactions between vascular, biliary, and neural networks in the liver microenvironment, addressing the critical bottleneck that the lack of high-resolution 3D visualization has hindered understanding of these interactions in liver disease.

      Strengths:

      This study developed a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized CUBIC tissue clearing. This method enables the simultaneous 3D visualization of spatial networks of the portal vein, hepatic artery, bile ducts, and central vein in the mouse liver. The authors reported a perivascular structure termed the Periportal Lamellar Complex (PLC), which is identified along the portal vein axis. This study clarifies that the PLC comprises CD34⁺Sca-1⁺ dual-positive endothelial cells with a distinct gene expression profile, and reveals its colocalization with terminal bile duct branches and sympathetic nerve fibers under physiological conditions.

      Comments on revisions:

      The authors very nicely addressed all concerns from this reviewer. There are no further concerns or comments.

    3. Reviewer #2 (Public review):

      Summary:

      The present manuscript of Xu et al. reports a novel clearing and imaging method focusing on the liver. The authors simultaneously visualized the portal vein, hepatic artery, central vein, and bile duct systems by injecting metal compound nanoparticles (MCNPs) with different colors into the portal vein, heart left ventricle, vena cava inferior, and the extrahepatic bile duct, respectively. The method involves: trans-cardiac perfusion with 4% PFA, the injection of MCNPs with different colors, clearing with the modified CUBIC method, cutting 200 micrometer thick slices by vibratome, and then microscopic imaging. The authors also perform various immunostaining (DAB or TSA signal amplification methods) on the tissue slices from MCNP-perfused tissue blocks. With the application of this methodical approach, the authors report dense and very fine vascular branches along the portal vein. The authors name them as 'periportal lamellar complex (PLC)' and report that PLC fine branches are directly connected to the sinusoids. The authors also claim that these structures co-localize with terminal bile duct branches and sympathetic nerve fibers and contain endothelial cells with a distinct gene expression profile. Finally, the authors claim that PLC-s proliferate in liver fibrosis (CCl4 model) and act as a scaffold for proliferating bile ducts in ductular reaction and for ectopic parenchymal sympathetic nerve sprouting.

      Strengths:

      The simultaneous visualization of different hepatic vascular compartments and their combination with immunostaining is a potentially interesting novel methodological approach.

      Weaknesses:

      This reviewer has some concerns about the validity of the microscopic/morphological findings as well as the transcriptomics results, and suggests that the conclusions of the paper should be viewed critically. Namely, at this point, it is still not fully clear whether the 'periportal lamellar complex (PLC)' that the authors describe really exists as a distinct anatomical or functional unit, or whether these are fine portal branches that connect the larger portal veins to the adjacent sinusoids. Also, in my opinion, the only way to identify the molecular characteristics of such small and spatially highly organized structures as those fine radial portal branches is to perform high-resolution spatial transcriptomics (instead of data mining in an existing liver single-cell database and performing Venn diagram intersection analysis in hepatic endothelial subpopulations). Yet, the existence of such structures with a distinct molecular profile cannot be excluded. Further research with advanced imaging and omics techniques (such as high-resolution volume imaging and spatial transcriptomics/proteomics) is needed to reproduce these initial findings.

    4. Reviewer #3 (Public review):

      Summary:

      In the revised version of the manuscript, the authors addressed multiple comments, clarifying especially the methodological part of their work and the identification of the PLC as a novel morphological feature of the adult liver portal veins. The text is now also much clearer and has better flow.

      The additional assessment of the smartSeq2 data from Pietilä et al., 2025 strengthens the transcriptomic profiling of the CD34+Sca1+ cells and the discussion of the possible implications for liver homeostasis and injury response. While it may suffer from a similar bias as other scRNA-seq datasets (multiple cell fate signatures arising from mRNA contamination from proximal cells during dissociation), it is less likely that such contamination would yield such similar results.

      Nevertheless, a more thorough assessment by functional experimental approaches is needed to decipher the functional molecules and definite protein markers before establishing the PLC as the key hub governing the activity of biliary, arterial, and neuronal liver systems.

      The work does bring a clear new insight into the liver structure and functional units and greatly improves the methodological toolbox to study it even further, and thus fully deserves the attention of the eLife readers.

      Strengths:

      The authors clearly demonstrate an improved technique tailored to the visualization of the liver vasculo-biliary architecture in unprecedented resolution.

      This work proposes a new morphological feature of adult liver facilitating interaction between the portal vein, hepatic arteries, biliary tree, and intrahepatic innervation, centered at previously underappreciated protrusions of the portal veins - the Periportal Lamellar Complexes (PLCs).

      Weaknesses:

      The importance of CD34+Sca1+ endothelial cell subpopulation for PLC formation and function was not tested and warrants further validation.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chengjian Zhao et al. focused on the interactions between vascular, biliary, and neural networks in the liver microenvironment, addressing the critical bottleneck that the lack of high-resolution 3D visualization has hindered understanding of these interactions in liver disease.

      Strengths:

      This study developed a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized CUBIC tissue clearing. This method enables the simultaneous 3D visualization of spatial networks of the portal vein, hepatic artery, bile ducts, and central vein in the mouse liver. The authors reported a perivascular structure termed the Periportal Lamellar Complex (PLC), which is identified along the portal vein axis. This study clarifies that the PLC comprises CD34⁺Sca-1⁺ dual-positive endothelial cells with a distinct gene expression profile, and reveals its colocalization with terminal bile duct branches and sympathetic nerve fibers under physiological conditions.

      Weaknesses:

      This manuscript is well-written, organized, and informative. However, there are some points that need to be clarified.

      (1) After MCNP-dye injection, does it remain in the blood vessels, adsorb onto the cell surface, or permeate into the cells? Does the MCNP-dye have cell selectivity?

      The experimental results showed that after injection, the MCNP series nanoparticles predominantly remained within the lumens of blood vessels and bile ducts, with their tissue distribution determined by physical perfusion. No diffusion of the dye signal into the surrounding parenchymal tissue was observed, nor was there any evidence of adsorption onto the cell surface or entry into cells. The newly added Supplementary Figure S2A–H further confirmed this feature, demonstrating that the dye signals were strictly confined to the luminal space, clearly delineating the continuous course of blood vessels and the branching morphology of bile ducts. These findings strongly support the conclusion that “MCNP dyes are distributed exclusively within the luminal compartments.”

      Therefore, the MCNP dyes primarily serve as intraluminal tracers within the tissue rather than as labels for specific cell types.

      (2) All MCNP-dyes were injected after the mice were sacrificed, and the mice's livers were fixed with PFA. After the blood flow had ceased, how did the authors ensure that the MCNP-dyes were fully and uniformly perfused into the microcirculation of the liver?

      Thank you for the reviewer’s valuable comments. Indeed, since all MCNP dyes were perfused after the mice were euthanized and blood circulation had ceased, we cannot fully ensure a homogeneous distribution of the dye within the hepatic microcirculation. The vascular labeling technique based on metallic nanoparticle dyes used in this study offers clear imaging, stable fluorescence intensity, and multiplexing advantages; however, it also has certain limitations. The main issue is that the dye distribution within the hepatic parenchyma can be affected by factors such as lobular overlap, local tissue compression, and variations in vascular pathways, resulting in regional inhomogeneity of dye perfusion. This is particularly evident in areas where multiple lobes converge or where anatomical structures are complex, leading to local dye accumulation or over-perfusion.

      In our experiments, we attempted to minimize local blockage or over-perfusion by performing PBS pre-flushing and low-pressure, constant-speed perfusion. Nevertheless, localized dye accumulation or uneven distribution may still occur in lobe junctions or structurally complex regions. Such variation represents one of the methodological limitations. Overall, the dye signals in most samples remained confined to the vascular and biliary lumens, and the distribution pattern was highly reproducible.

      We have addressed this issue in the Discussion section but would like to emphasize here that, although this system has clear advantages, it remains sensitive to anatomical variability in the liver—such as lobular overlap and vascular heterogeneity. At vascular junctions, local perfusion inhomogeneity or dye accumulation may occur; therefore, injection strategies and perfusion parameters should be adjusted according to liver size and vascular condition to improve reproducibility and imaging quality. It should also be noted that the results obtained using this method primarily aim to visualize the overall and fine anatomical structures of the hepatic vascular system rather than to quantitatively reflect hemodynamic processes. In the future, we plan to combine in vivo perfusion or dynamic fluid modeling to further validate the diffusion characteristics of the dyes within the hepatic microcirculation.

      (3) It is advisable to present additional 3D perspective views in the article, as the current images exhibit very weak 3D effects. Furthermore, it would be better to supplement with some videos to demonstrate the 3D effects of the stained blood vessels.

      Thank you for the reviewer’s valuable comments. In response to the suggestion, we have added perspective-rendered images generated from the 3D staining datasets to provide a more intuitive visualization of the spatial morphology of the hepatic vasculature. These images have been included in Figure S2A–J. In addition, we have prepared supplementary videos (available upon request) that dynamically display the three-dimensional distribution of the stained vessels, further enhancing the spatial perception and visualization of the results.

      (4) In Figure 1-I, the authors used MCNP-Black to stain the central veins; however, in addition to black, there are also yellow and red stains in the image. The authors need to explain what these stains are in the legend.

      Thank you for the reviewer’s constructive comment. In Figure 1I, MCNP-Black labels the central vein (black), MCNP-Yellow labels the portal vein (yellow), MCNP-Pink labels the hepatic artery (pink), and MCNP-Green labels the bile duct (green). We have revised the Figure 1 legend to include detailed descriptions of the color signals and their corresponding structures to avoid any potential confusion.

      (5) There is a typo in the title of Figure 4F; it should be "stem cell".

      Thank you for the reviewer’s careful correction. We have corrected the spelling error in the title of Figure 4F to “stem cell” and updated it in the revised manuscript.

      (6) Nuclear staining is necessary in immunofluorescence staining, especially for Figure 5e. This will help readers distinguish whether the green color in the image corresponds to cells or dye deposits.

      We thank the reviewer for the valuable suggestion. We understand that nuclear staining can help determine the origin of fluorescence signals. However, in our three-dimensional imaging system, the deep signal acquisition range after tissue clearing often causes nuclear dyes such as DAPI to generate highly dense and widespread fluorescence, especially in regions rich in vascular structures, which can obscure the fine vascular and perivascular details of interest. Therefore, this study primarily focuses on high-resolution visualization of the spatial architecture of the vascular and biliary systems. We have added an explanation regarding this point in Figures S2I–J.

      Reviewer #2 (Public review):

      Summary:

      The present manuscript of Xu et al. reports a novel clearing and imaging method focusing on the liver. The authors simultaneously visualized the portal vein, hepatic artery, central vein, and bile duct systems by injecting metal compound nanoparticles (MCNPs) with different colors into the portal vein, heart left ventricle, inferior vena cava, and the extrahepatic bile duct, respectively. The method involves: trans-cardiac perfusion with 4% PFA, the injection of MCNPs with different colors, clearing with the modified CUBIC method, cutting 200 micrometer thick slices by vibratome, and then microscopic imaging. The authors also perform various immunostaining (DAB or TSA signal amplification methods) on the tissue slices from MCNP-perfused tissue blocks. With the application of this methodical approach, the authors report dense and very fine vascular branches along the portal vein. The authors name them as 'periportal lamellar complex (PLC)' and report that PLC fine branches are directly connected to the sinusoids. The authors also claim that these structures co-localize with terminal bile duct branches and sympathetic nerve fibers, and contain endothelial cells with a distinct gene expression profile. Finally, the authors claim that PLC-s proliferate in liver fibrosis (CCl4 model) and act as a scaffold for proliferating bile ducts in ductular reaction and for ectopic parenchymal sympathetic nerve sprouting.

      Strengths:

      The simultaneous visualization of different hepatic vascular compartments and their combination with immunostaining is a potentially interesting novel methodological approach.

      Weaknesses:

      This reviewer has several concerns about the validity of the microscopic/morphological findings as well as the transcriptomics results. In this reviewer's opinion, the introduction contains overstatements regarding the potential of the method, there are severe caveats in the method descriptions, and several parts of the Results are not fully supported by the documentation. Thus, the conclusions of the paper may be critically viewed in their present form and may need reconsideration by the authors.

      We sincerely thank the reviewer for the thorough evaluation and constructive comments on our study. We fully understand and appreciate the reviewer’s concerns regarding the methodological validity and interpretation of the results. In response, we have made comprehensive revisions and additions to the manuscript as follows:

      First, we have carefully revised the Introduction and Discussion sections to provide a more balanced description of the methodological potential, removing statements that might be considered overstated, and clarifying the applicable scope and limitations of our approach (see the revised Introduction and Discussion).

      Second, we have substantially expanded the Methods section with detailed information on model construction, imaging parameters, data processing workflow, and technical aspects of the single-cell transcriptomic reanalysis, to enhance the transparency and reproducibility of the study.

      Third, we have added additional references and explanatory notes in the Results section to better support the main conclusions (see Section 6 of the Results).

      Finally, we have rechecked and validated all experimental data, and conducted a verification analysis using an independent single-cell RNA-seq dataset (Figure S6). The results confirm that the morphological observations and transcriptomic findings are consistent and reproducible across independent experiments.

      We believe these revisions have greatly strengthened the reliability of our conclusions and the overall scientific rigor of the manuscript. Once again, we sincerely appreciate the reviewer’s valuable comments, which have been very helpful in improving the logic and clarity of our work.

      Reviewer #3 (Public review):

      Summary:

      In the reviewed manuscript, researchers aimed to overcome the obstacles of high-resolution imaging of intact liver tissue. They report successful modification of the existing CUBIC protocol into Liver-CUBIC, a high-resolution multiplex 3D imaging method that integrates multicolor metallic compound nanoparticle (MCNP) perfusion with optimized liver tissue clearing, significantly reducing clearing time and enabling simultaneous 3D visualization of the portal vein, hepatic artery, bile ducts, and central vein spatial networks in the mouse liver. Using this novel platform, the researchers describe a previously unrecognized perivascular structure they termed Periportal Lamellar Complex (PLC), regularly distributed along the portal vein axis. The PLC originates from the portal vein and is characterized by a unique population of CD34⁺Sca-1⁺ dual-positive endothelial cells. Using available scRNAseq data, the authors assessed the CD34⁺Sca-1⁺ cells' expression profile, highlighting the mRNA presence of genes linked to neurodevelopment, biliary function, and hematopoietic niche potential. Different aspects of this analysis were then addressed by protein staining of selected marker proteins in the mouse liver tissue. Next, the authors addressed how the PLC and biliary system react to CCl4-induced liver fibrosis, implying that the PLC dynamically extends, acting as a scaffold that guides the migration and expansion of terminal bile ducts and sympathetic nerve fibers into the hepatic parenchyma upon injury.

      The work clearly demonstrates the usefulness of the Liver-CUBIC technique and the improvement of both resolution and complexity of the information, gained by simultaneous visualization of multiple vascular and biliary systems of the liver at the same time. The identification of PLC and the interpretation of its function represent an intriguing set of observations that will surely attract the attention of liver biologists as well as hepatologists; however, some claims need more thorough assessment by functional experimental approaches to decipher the functional molecules and the sequence of events before establishing the PLC as the key hub governing the activity of biliary, arterial, and neuronal liver systems. Similarly, the level of detail of the methods section does not appear to be sufficient to exactly recapitulate the performed experiments, which is of concern, given that the new technique is a cornerstone of the manuscript.

      Nevertheless, the work does bring a clear new insight into the liver structure and functional units and greatly improves the methodological toolbox to study it even further, and thus fully deserves the attention of readers.

      Strengths:

      The authors clearly demonstrate an improved technique tailored to the visualization of the liver vasculo-biliary architecture in unprecedented resolution.

      This work proposes a new biological framework between the portal vein, hepatic arteries, biliary tree, and intrahepatic innervation, centered at previously underappreciated protrusions of the portal veins - the Periportal Lamellar Complexes (PLCs).

      Weaknesses:

      Possible overinterpretation of the CD34+Sca1+ findings was built on re-analysis of one scRNAseq dataset.

      Lack of detail in the materials and methods section greatly limits the usefulness of the new technique to other researchers.

      We thank the reviewer for this important comment. We agree that when conclusions are mainly based on a single dataset, overinterpretation should be avoided. In response to this concern, we have carefully re-evaluated and clearly limited the scope of our interpretation of the scRNA-seq analysis. In addition, we performed a validation analysis using an independent single-cell RNA-seq dataset (see new Figure S6), which consistently confirmed the presence and characteristic transcriptional profile of the periportal CD34⁺Sca1⁺ endothelial cell population. These supplementary analyses strengthen the robustness of our findings and address the reviewer’s concern regarding potential overinterpretation.

      In the revised manuscript, we have also greatly expanded the Materials and Methods section by providing detailed information on sample preparation, imaging parameters, data processing workflow, and single-cell reanalysis procedures. These revisions substantially improve the transparency and reproducibility of our methodology, thereby enhancing the usability and reference value of this technique for other researchers.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Introduction

      (1) In general, the Introduction is very lengthy and repetitive. It needs extensive shortening to a maximum of 2 A4 pages.

      We thank the reviewer for the valuable suggestions. We have thoroughly condensed and restructured the Introduction, removing redundant content and merging related paragraphs to make the theme more focused and the logic clearer. The revised Introduction has been shortened to within two A4 pages, emphasizing the scientific question, innovation, and technical approach of the study.

      (2) Please correct this erroneous sentence:

      '...the liver has evolved the most complex and densely n organized vascular network in the body, consisting primarily of the portal vein system, central vein system, hepatic artery system, biliary system, and intrahepatic autonomic nerve network [6, 7].'

      We thank the reviewer for pointing out this spelling error. The revised sentence is as follows:

      “…the liver has evolved the most complex and densely organized ductal-vascular network in the body, consisting primarily of the portal vein system, central vein system, hepatic artery system, biliary system, and intrahepatic autonomic nerve network [6, 7].”

      (3) '...we achieved a 63.89% improvement in clearing efficiency and a 20.12% increase in tissue transparency'

      Please clarify what you exactly mean by 'clearing efficiency' and 'increased tissue transparency'.

      We thank the reviewer for the valuable comments and have clarified the relevant terminology in the revised manuscript.

      “Clearing efficiency” refers to the reduction in the time required for the liver tissue to become completely transparent when treated with the optimized Liver-CUBIC protocol (40% urea + H₂O₂), compared with the conventional CUBIC method. In this study, the clearing time was reduced from 9 days to 3.25 days, a 63.89% reduction in clearing time.

      “Tissue transparency” refers to the ability of the cleared tissue to transmit visible light. We quantified the optical transparency by measuring light transmittance across the 400–900 nm wavelength range using a microplate reader. The results showed that the average transmittance increased by 20.12%, indicating that Liver-CUBIC treatment markedly enhanced the optical clarity of the liver tissue.
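      As a quick check of the arithmetic behind the clearing-time figure, the short sketch below recomputes the percentage from the two durations quoted above (9 days and 3.25 days). The function and variable names are illustrative assumptions, not taken from the manuscript. The 20.12% transmittance figure is not recomputed, because the underlying transmittance values are not given in this response.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, expressed in percent."""
    return (before - after) / before * 100.0


# Clearing times quoted in the author response (days).
conventional_cubic_days = 9.0   # conventional CUBIC
liver_cubic_days = 3.25         # optimized Liver-CUBIC

reduction = percent_reduction(conventional_cubic_days, liver_cubic_days)
print(f"Clearing time reduced by {reduction:.2f}%")  # prints "Clearing time reduced by 63.89%"
```

      The figure is therefore a relative reduction in clearing time, (9 − 3.25) / 9, rather than a change in any optical property of the tissue.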

      (4) I am concerned about claiming this imaging method as real '3D imaging'. Namely, while the authors clear full lobes, they actually cut the cleared lobes into 200-micrometer-thick slices and perform further microscopy imaging on these slices. Considering that they focus on ductular structures of the liver (such as vasculature, bile duct system, and innervations), a 200-micrometer thickness allows a very limited 3D overview, particularly in comparison with the whole-mount immuno-imaging methods combined with light sheet microscopy (such as Adori 2021, Liu 2021, etc). In this context, I feel several parts of the Introduction to be an overstatement: besides emphasizing the advantages of the technique (such as simultaneous visualization of different hepatic vascular compartments and the bile duct system by MCNPs, the combination with immunostainings), the authors must honestly discuss the limitations (such as limited tissue overview, potential dye perfusion problems - uneven distribution of the dye etc).

      We appreciate the reviewer’s insightful comments. It is true that most of the imaging depth in this study was limited to approximately 200 μm, and thus it could not achieve whole-liver three-dimensional imaging comparable to light-sheet microscopy. However, the primary focus of our study was to resolve the microscopic intrahepatic architecture, particularly the spatial relationships among blood vessels, bile ducts, and nerve fibers. Through high-resolution imaging of thick tissue sections, combined with MCNP-based multichannel labeling and immunofluorescence co-staining, we were able to accurately delineate the three-dimensional distribution of these microstructures within localized regions.

      In addition to thick-section imaging, we also obtained whole-lobe dye perfusion data (as shown in Figure S1F), which comprehensively depict the three-dimensional branching patterns and distribution of the vascular systems within the liver lobe. These images were acquired from intact liver lobes perfused with MCNP dyes, revealing a continuous vascular network extending from major trunks to peripheral branches, thereby demonstrating that our approach is also capable of achieving organ-level visualization.

      We have added this image and a corresponding description in the revised manuscript to more comprehensively present the coverage of our imaging system, and we have incorporated this clarification into the Discussion section.

      Method

      (5) More information may be needed about MCNPs:

      a) As reported, there are nanoparticles with different colors in brightfield microscopy, but the particles are also excitable in fluorescence microscopy. Would you please provide a summary about excitation/emission wavelengths of the different MCNPs? This is crucial to understand to what extent the method is compatible with fluorescence immunohistochemistry.

      We thank the reviewer for the careful attention and professional suggestion. We fully agree that this issue is critical for evaluating the compatibility of our method with fluorescent immunohistochemistry. Different types of metal compound nanoparticles (MCNPs) have clearly distinguishable spectral properties:

      - MCNP-Green and MCNP-Yellow: AF488-matched spectra, with excitation/emission wavelengths of 495/519 nm.

      - MCNP-Pink: Designed for far-red spectra, with excitation/emission wavelengths of 561/640 nm.

      - MCNP-Black: Non-fluorescent, appearing black under bright-field microscopy only.

      The above information has been added to the Materials and Methods section.

      b) Also, is there more systematic information available concerning the advantage of these particles compared to 'traditional' fluorescence dyes, such as Alexa fluor or Cy-dyes, in fluorescence microscopy and concerning their compatibility with various tissue clearing methods (e.g., with the frequently used organic-solvent-based methods)?

      We thank the reviewer for the detailed question. Compared with conventional organic fluorescent dyes, MCNP offers the following advantages:

      - Enhanced photostability: Its inorganic core-shell structure resists fading even after hydrogen peroxide bleaching.

      - High signal stability: Fluorescence is maintained during aqueous-based clearing (e.g., CUBIC) and multiple rounds of staining without quenching.

      In our Liver-CUBIC system, MCNP nanoparticles exhibited excellent multi-channel labeling stability and fluorescence signal retention. Regarding compatibility with other clearing methods (e.g., SCAFE, SeeDB, CUBIC), these methods have limited effectiveness for whole-liver clearing (see Figure 2 of Tainaka et al., 2014) and cannot meet the requirements for high-resolution microstructural imaging in this study, so we consider further testing of their compatibility unnecessary.

      In summary, MCNP dye demonstrates superior signal stability and spectral separation compared with conventional organic fluorescent dyes in multi-channel, long-term, high-transparency three-dimensional tissue imaging.

      c) When you perfuse these particles, to which structures do they bind inside the ducts (vessels, bile ducts)? Is the 48h post-fixation enough to keep them inside the tubes/bind them to the vessel walls? Is there any 'wash-out' during the complex cutting/staining procedure? E.g., in Figure 2D: the 'classical' hepatic artery in the portal triad is not visible - but the MCNP apparently penetrated to the adjacent sinusoids at the edge of the lobulus. Also, in Figure 3B, there is a significant mismatch between the MCNP-green (bile duct) signal and the CD19 (epithelium marker) immunostaining. Please discuss these.

      The experimental results showed that following injection, MCNP nanoparticles primarily remained within the vascular and biliary lumens, and their tissue distribution depended on physical perfusion. No dye signal was observed to diffuse into the surrounding parenchyma, nor did the particles adhere to cell surfaces or enter cells. The newly added Supplementary Figures S2A–H further confirm this feature: the dye signal is strictly confined within the lumens, clearly delineating continuous vascular paths and biliary branching patterns, strongly supporting the conclusion that “MCNP dye is distributed only within luminal spaces.”

      Thus, MCNP dye mainly serves as an intraluminal tracer rather than a label for specific cell types.

      We provide the following explanations and analyses regarding MCNP distribution in the hepatic vascular and biliary systems and its post-fixation stability:

      - Potential signal displacement during sectioning/immunostaining: During slicing and immunostaining, a small number of particles may be washed away due to mechanical cutting or washing steps; however, the overall three-dimensional structure retains high spatial fidelity.

      - Observation in Figure 2D: MCNP was seen entering the sinusoidal spaces at the lobule periphery, but hepatic arteries were not visible, likely due to limitations in section thickness. Although arteries were not apparent in this slice, arterial distribution around the portal vein is visible in Figure 2C. It should be noted that Figures 2C, D, and E do not represent whole-liver imaging, so not all regions necessarily contain visible hepatic arteries. For easier identification, the main hepatic artery trunk is highlighted in cyan in Figure 2E.

      - Incomplete biliary signal in Figure 3B: This may be because CK19 labeling only covers biliary epithelial cells, whereas MCNP-green distributes throughout the biliary lumen. In Figure 3B, the terminal MCNP-green signal exhibits irregular polygonal structures, which we interpret as the canalicular regions.

      (6) Which fixative was used for 48h of postfixation (step 6) after MCNP injections?

      After MCNP injection, mouse livers were post-fixed in 4% paraformaldehyde (PFA) for 48 hours. This fixation condition effectively “locks” the MCNP particles within the vascular and biliary lumens, maintaining their spatial positions, while also being compatible with subsequent sectioning and multi-channel immunostaining analyses.

      The above information has been added to the Materials and Methods section.

      (7) What is the 'desired thickness' in step 7? In the case of immunostained tissue, a 200-micrometer slice thickness is mentioned. However, based on the Methods, it is not completely clear what the actual thickness of the tissue was that was examined ultimately in the microscopes, and whether or not the clearing preceded the cutting or vice versa.

      We appreciate the reviewer’s question. The “desired thickness” referred to in step 7 of the manuscript corresponds to the thickness of tissue sections used for immunostaining and high-resolution microscopic imaging, which is typically around 200 µm. We selected 200 µm because this thickness is sufficient to observe the PLC structure in its entirety, allows efficient staining, and preserves tissue architecture well. Other researchers may choose different section thicknesses according to their experimental needs.

      In this study, the processing order for immunostained tissue samples was sectioning followed by clearing, as detailed below:

      Section Thickness

      To ensure antibody penetration and preservation of three-dimensional structure, tissue sections were typically cut to ~200 µm. Thicker sections can be used if more complete three-dimensional structures are required, but adjustments may be needed based on antibody penetration and fluorescence detection conditions.

      Clearing Sequence

      After sectioning, slices were processed using the Liver-CUBIC aqueous-based clearing system.

      (8) More information is needed concerning the 'deep-focus microscopy' (Keyence), the applied confocal system, and the THUNDER 'high resolution imaging system': basic technical information, resolutions, objectives (N.A., working distance), lasers/illumination, filters, etc.

      Imaging Systems and Settings

      VHX-6000 Extended Depth-of-Field Microscope: Objective: VH-Z100R, 100×–1000×; resolution: 1 µm (typical); illumination: coaxial reflected; transmitted illumination on platform: ON.

      Zeiss Confocal Microscope (980): Objectives: 20× or 40×; image size: 1024 × 1024. Fluorescence detection was set up in three channels:

      - Channel 1: 639 nm laser, excitation 650 nm, emission 673 nm, detection range 673–758 nm, corresponding to Cy5-T1 (red).

      - Channel 2: 561 nm laser, excitation 548 nm, emission 561 nm, detection range 547–637 nm, corresponding to Cy3-T2 (orange).

      - Channel 3: 488 nm laser, excitation 493 nm, emission 517 nm, detection range 490–529 nm, corresponding to AF488-T3 (green).

      Leica THUNDER Imager 3D Tissue: Fluorescence detection in two channels:

      - Channel 1: FITC channel (excitation 488 nm, emission ~520 nm).

      - Channel 2: Orange-red channel (excitation/emission 561/640 nm).

      The system is equipped with matching filter sets to ensure signal separation.

      The above information has been added to the Materials and Methods section.
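
      The confocal channel settings listed above can be written down as a small configuration table, which makes it easy to ask which detection window covers a given emission peak. The sketch below is an illustration only: the `Channel` class and `channels_detecting` helper are hypothetical names, and the peak-in-window test is a simplification that ignores the width of real emission spectra and filter roll-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """One fluorescence detection channel (all wavelengths in nm)."""
    name: str
    laser_nm: int
    detect_min_nm: int
    detect_max_nm: int

    def detects(self, wavelength_nm: float) -> bool:
        """True if the wavelength lies inside the detection window."""
        return self.detect_min_nm <= wavelength_nm <= self.detect_max_nm


# Laser lines and detection ranges as listed for the Zeiss confocal above.
CONFOCAL_CHANNELS = [
    Channel("Cy5-T1", 639, 673, 758),
    Channel("Cy3-T2", 561, 547, 637),
    Channel("AF488-T3", 488, 490, 529),
]


def channels_detecting(wavelength_nm: float) -> list[str]:
    """Names of all channels whose detection window contains the wavelength."""
    return [c.name for c in CONFOCAL_CHANNELS if c.detects(wavelength_nm)]


# 519 nm is the emission peak given for MCNP-Green and MCNP-Yellow.
print(channels_detecting(519))
```

      A fuller compatibility check would compare complete excitation and emission spectra against the filter sets rather than single peak values.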

      (9) Liver-CUBIC, step 2: which lobe(s) did you clear (...whole liver lobes...).

      In this study, all liver lobes (left, right, caudate, and quadrate lobes) were subjected to Liver-CUBIC aqueous-based clearing to ensure uniform visualization of MCNP fluorescence and immunolabeling throughout the three-dimensional imaging of the entire liver.

      The above information has been added to the Materials and Methods section.

      (10) For the DAB and TSA IHC stainings, did you use free-floating slices, or did you mount the vibratome sections and do the staining on mounted sections?

      In this study, fixed livers were first sectioned into thick slices (~200 µm) using a vibratome. Subsequently, DAB and TSA immunohistochemical (IHC) staining were performed on free-floating sections. During the entire staining process, the slices were kept floating in the solutions, ensuring thorough antibody penetration in the thick sections while preserving the three-dimensional tissue architecture, thereby facilitating multiple rounds of staining and three-dimensional imaging.

      (11) Regarding the 'transmission quantification': this was measured on 1 mm thick slices. While it is interesting to make a comparison between different clearing methods in general, one must note that it is relatively easy to clear 1mm thick tissue slices with almost any kind of clearing technique and in any tissues. The 'real' differences come with thicker blocks, such as >5mm in the thinnest dimension. Do you have such experiences (e.g., comparison in whole 'left lateral liver lobes')?

      In this study, we performed three-dimensional visualization of entire liver lobes to depict the distribution of MCNPs and the overall spatial architecture of the vascular and biliary systems (Figure S1F). However, due to the limitations of the plate reader and fluorescence imaging systems in terms of spatial resolution and light penetration depth, quantitative analyses were conducted only on tissue sections approximately 1 mm thick.

      Regarding the comparative quantification of different clearing methods, as the reviewer noted, nearly all aqueous- or organic solvent–based clearing techniques can achieve relatively uniform transparency in 1 mm-thick tissue sections, so differences at this thickness are limited. We have not yet conducted systematic comparisons on whole-lobe sections thicker than 5 mm and therefore cannot provide “true” difference data for thicker tissues.

      (12) There is no method description for the ELMI studies in the Methods.

      Transmission Electron Microscopy (TEM) Analysis of MCNPs

      Before imaging, the MCNP dye solution was centrifuged at 14,000 × g for 10 minutes at 4 °C to remove aggregates and impurities. The supernatant was collected, diluted 50-fold, and 3–4 μL of the sample was applied onto freshly glow-discharged Quantifoil R1.2/1.3 copper grids (Electron Microscopy Sciences, 300 mesh). The sample was allowed to sit for 30 seconds to enable particle adsorption, after which excess liquid was gently wicked away with filter paper and the grid was air-dried at room temperature. The sample was then negatively stained with 1% uranyl acetate for 30 seconds and air-dried again before imaging.

      Negative-stain TEM images were acquired using a JEOL JEM-1400 transmission electron microscope operating at 120 kV and equipped with a CCD camera. Data acquisition followed standard imaging conditions.

      The above information has been added to the Materials and Methods section.

      (13) Please, provide a method description for the applied CCl4 cirrhosis model. This is completely missing.

      (1) Under a fume hood, carbon tetrachloride (CCl₄) was dissolved in corn oil at a 1:3 volume ratio to prepare a working solution, which was filtered through a 0.2 μm filter into a 30 mL glass vial. In our laboratory, to mimic chronic injury, mice in the experimental group were intraperitoneally injected at a dose of 1 mL/kg body weight per administration.

      (2) Mice were carefully removed from the cage and placed on a scale to record body weight for calculation of the injection volume.

      (3) The needle cap was carefully removed, and the required volume of the pre-prepared CCl₄ solution was drawn into the syringe. The syringe was gently flicked to remove any air bubbles.

      (4) Mice were placed on a textured surface (e.g., wire cage) and restrained. When the mouse was properly positioned, ideally with the head lowered about 30°, the left lower or right lower abdominal quadrant was identified.

      (5) Holding the syringe at a 45° angle, with the bevel facing up, the needle was inserted approximately 4–5 mm into the abdominal wall, and the calculated volume of CCl₄ was injected.

      (6) Mice were returned to their cage and observed for any signs of discomfort.

      (7) Needles and syringes were disposed of in a sharps container without recapping. A new syringe or needle was used for each mouse.

      (8) To establish a progressive liver fibrosis model, injections were administered twice per week (e.g., Monday and Thursday) for 3 or 6 consecutive weeks (n=3 per group). Control mice were injected with an equal volume of corn oil for 3 or 6 weeks (n=3 per group).

      (9) Forty-eight hours after the last injection, mice were euthanized by cervical dislocation, and livers were rapidly harvested. Portions of the liver were processed for paraffin embedding and histological sectioning, while the remaining tissue was either immediately frozen or used for subsequent molecular biology analyses.

      The above information has been added to the Materials and Methods section.
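
      The dosing arithmetic in the protocol above can be sketched as follows. This is a minimal illustration under two assumptions that the response does not state explicitly: that the 1 mL/kg dose refers to the CCl₄/corn oil working solution, and that the 1:3 ratio means one part CCl₄ to three parts corn oil. The function names and the 25 g example weight are hypothetical.

```python
def injection_volume_ul(body_weight_g: float, dose_ml_per_kg: float = 1.0) -> float:
    """Injection volume in microliters (1 mL/kg is the same as 1 uL per gram)."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return body_weight_g * dose_ml_per_kg


def ccl4_content_ul(solution_ul: float, ccl4_parts: float = 1.0, oil_parts: float = 3.0) -> float:
    """Volume of CCl4 in the working solution, assuming a ccl4:oil volume ratio."""
    return solution_ul * ccl4_parts / (ccl4_parts + oil_parts)


def total_injections(weeks: int, per_week: int = 2) -> int:
    """Number of injections for a schedule of `per_week` injections per week."""
    return weeks * per_week


volume = injection_volume_ul(25.0)  # hypothetical 25 g mouse
print(f"inject {volume:.1f} uL of working solution "
      f"({ccl4_content_ul(volume):.2f} uL CCl4 under the 1:3 assumption)")
print(f"3-week group: {total_injections(3)} injections; "
      f"6-week group: {total_injections(6)} injections")
```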

      (14) Please provide a method description for the quantifications reported in Figures 5D, 5F, and 6E.

      ImageJ software was used to analyze 3D stained images (Figs. 5F, 6E), and the ultra-depth-of-field 3D analysis module was used to analyze 3D DAB images (Fig. 5D). The specific steps are as follows:

      Figure 5D: DAB-stained 3D images from the control group and the CCl₄ 6-week (CCl₄-6W) group were analyzed. For each group, 20 terminal bile duct branch nodes were randomly selected, and the actual path distance along the branch to the nearest portal vein surface was measured. All measurements were plotted as scatter plots to reflect the spatial extension of bile ducts relative to the portal vein under different conditions.

      Figure 5F: TSA 3D multiplex-stained images from the control group, CCl₄ 3-week (CCl₄-3W), and CCl₄ 6-week (CCl₄-6W) groups were analyzed. For each group, 5 terminal bile duct branch nodes were randomly selected, and the actual path distance along the branch to the nearest portal vein surface was measured. Measurements were plotted as scatter plots to illustrate bile duct spatial extension.

      Figure 6E: TSA 3D multiplex-stained images from the control, CCl₄-3W, and CCl₄-6W groups were analyzed. For each group, 5 terminal nerve branch nodes were randomly selected, and the actual path distance along the branch to the nearest portal vein surface was measured. Scatter plots were generated to depict the spatial distribution of nerves under different treatment conditions.
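
      The "actual path distance along the branch" used in all three quantifications can be illustrated as the summed length of a traced 3D polyline. The sketch below is not the ImageJ or Keyence workflow described above; it only shows the underlying geometry, and the function name and example coordinates are hypothetical.

```python
import math


def path_length(points: list[tuple[float, float, float]]) -> float:
    """Length of a traced branch given as ordered 3D points (e.g. in micrometers).

    The path distance is the sum of straight-line distances between
    consecutive points, so it follows the branch rather than cutting across it.
    """
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Hypothetical trace from a terminal branch node to the portal vein surface.
trace = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
print(f"path distance: {path_length(trace):.1f}")
print(f"straight-line distance: {math.dist(trace[0], trace[-1]):.1f}")
```

      For a curved branch the path distance is longer than the straight-line distance between its endpoints, which is why the two are reported separately in the example.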

      (15) Please provide a method description for the human liver samples you used in Figure S6. Patient data, fixation, etc...

      The human liver tissue samples shown in Figure S6 were obtained from adjacent non-tumor liver tissues resected during surgical operations at West China Hospital, Sichuan University. All samples used were anonymized archived tissues, which were applied for scientific research in accordance with institutional ethical guidelines and did not involve any identifiable patient information. After being fixed in 10% neutral formalin for 24 hours, the tissues were routinely processed for paraffin embedding (FFPE), and sectioned into 4 μm-thick slices for immunostaining and fluorescence imaging.

      Results

      (16) While it is stated in the Methods that certain color MCNPs were used for labelling different structures (i.e., yellow: hepatic artery; green: bile duct; portal vein: pink; central veins: black), in some figures, apparently different color MCNPs are used for the respective structures. E.g., in Figure 1J, the artery is pink and the portal vein is green. Please clarify this.

      The color assignment of MCNP dyes is not fixed across different experiments or schematic illustrations. MCNP dyes of different colors are fundamentally identical in their physical and chemical properties and do not exhibit specific binding or affinity for particular vascular structures. We select different colors based on experimental design and imaging presentation needs to facilitate distinction and visualization, thereby enhancing recognition in 3D reconstruction and image display. Therefore, the color labeling in Figure 1F is primarily intended to illustrate the distribution of different vascular systems, rather than indicating a fixed correspondence to a specific dye or injection color.

      (17) In Figure 1J, the hepatic artery is extremely shrunk, while the portal vein is extremely dilated - compared to the physiological situation. Does it relate to the perfusion conditions?

      We appreciate the reviewer’s attention. Under normal physiological conditions, the hepatic arteries labeled by CD31 are naturally narrow. Therefore, the relatively thin hepatic arteries and thicker portal veins shown in Figure 1J are normal and unrelated to the perfusion conditions (see Figure 1E of Adori et al., 2021).

      (18) Re: MCNP-black labelled 'oval fenestrae': the Results state 50-100 nm, while they are apparently 5-10-micron diameter in Figure 1I. Accordingly, the comparison with the ELMI studies in the subsequent paragraph is inappropriate.

      We thank the reviewer for the correction. The previous statement was a typographical error. In fact, the diameter of the “elliptical windows” marked by MCNP-black is 5–10 μm, so the diameter of 5–10 μm shown in Figure 1I is correct.

      (19) Please, correct this erroneous sentence: 'Pink marked the hepatic arterial system by injection extrahepatic duct (Figure 2B).'

      Original sentence: “The hepatic arterial system was labeled in pink by injection through the extrahepatic duct (Figure 2B).”

      Revised sentence: “The hepatic arterial system was labeled in pink by injection through the left ventricle (Figure 2B).”

      (20) How do you define the 'primary portal vein tract'?

      We thank the reviewer for the question. The term “primary portal vein tract” refers to the first-order branches of the portal vein that enter the liver from the hepatic hilum. These are the major branches arising directly from the main portal vein trunk and are responsible for supplying blood to the respective hepatic lobes. This definition corresponds to the concept of the first-order portal vein in hepatic anatomy.

      (21) I am concerned about whether the 'periportal lamellar complex (PLC)' that the Authors describe really exists as a distinct anatomical or functional unit. I also see these in 3D scans - in my opinion, these are fine, lower-order portal vein branches that connect the portal veins to the adjacent sinusoid. The strong MCNP-labelling of these structures may be caused by the 'sticking' of the perfused MCNP solutions in these 'pockets' during the perfusion process. What do these structures look like with SMA or CD31 immunostaining? Also, one may consider that the anatomical evaluation of these structures may have limitations in tissue slices. Have you ever checked MCNP-perfused, cleared full liver lobes in light sheet microscope scans? I think this would be very useful to have a comprehensive morphological overview. Unfortunately, based on the presented documentation, I am also not convinced that PLCs 'co-localize' with fine terminal bile duct branches (Figure 3E, S3C), or with TH+ 'neuronal bead chain networks' (Fig 6C). More detailed and more convincing documentation is needed here.

      We thank the reviewer for the detailed comments. Regarding the existence and function of the periportal lamellar complex (PLC), our observations are based on MCNP-Pink labeling of the portal vein, through which we were able to identify the PLC structure surrounding the portal branches. It should be noted that the PLC represents a very small anatomical structure. Although we have not yet performed light-sheet microscopy scanning, we anticipate that such imaging would primarily visualize larger portal vein branches. Nevertheless, this does not affect our overall conclusions.

      We also appreciate the reviewer’s suggestion that the observed structures might result from MCNP adherence during perfusion. To verify the structural characteristics of the PLC, we performed immunostaining for SMA and CD31, which revealed a specific arrangement pattern of smooth muscle and endothelial markers rather than simple perfusion-induced deposition (Figures 4F and S6B).

      Regarding the apparent colocalization of the PLC with terminal bile duct branches (Figures 3E and S3C) and TH⁺ neuronal bead-like networks (Figure 6C), we acknowledge that current literature evidence remains limited. Therefore, we have carefully described these observations as possible spatial associations rather than definitive conclusions. Future studies integrating high-resolution three-dimensional imaging with functional analyses will help to further clarify the anatomical and physiological significance of the PLC.

      (22) 'Extended depth-of-field three-dimensional bright-field imaging revealed a strict 1:1 anatomical association between the primary portal vein trunk (diameter 280 ± 32 μm) and the first-order bile duct (diameter 69 ± 8 μm) (Figures 3A and S3A)'.

      How do you define '1:1 anatomical association'? How do you define and identify the 'order' (primary, secondary) of vessel and bile duct branches in 200-micrometer slices?

      We thank the reviewer for the question. In this study, the term “1:1 anatomical association” refers to the stable paired spatial relationship between the main portal vein trunk and its corresponding primary bile duct within the same portal territory. In other words, each main portal vein branch is accompanied by a primary bile duct of matching branching order and trajectory, together forming a “vascular–biliary bundle.”

      The definitions of “primary” and “secondary” branches were based on extended-depth 3D bright-field reconstructions, considering both branching hierarchy and vessel/duct diameters: primary branches arise directly from the main trunk at the hepatic hilum and exhibit the largest diameters (averaging 280 ± 32 μm for the portal vein and 69 ± 8 μm for the bile duct), whereas secondary branches extend from the primary branches toward the lobular interior with smaller calibers.

      (23) In my opinion, the applied methodical approach in the single cell transcriptomics part (data mining in the existing liver single cell database and performing Venn diagram intersection analysis in hepatic endothelial subpopulations) is largely inappropriate and thus, all the statements here are purely speculative. In my opinion, to identify the molecular characteristics of such small and spatially highly organized structures like those fine radial portal branches, the only way is to perform high-resolution spatial transcriptomics.

      We thank the reviewer for the comment. We fully acknowledge the importance of high-resolution spatial transcriptomics in identifying the fine structural characteristics of portal vein branches. Due to current funding and technical limitations, we were unable to perform such high-resolution spatial transcriptomic analyses. However, we validated the molecular features of the PLC using another publicly available liver single-cell RNA-sequencing dataset, which provided preliminary supporting evidence (Figures S6B and S6C). In the manuscript, we have carefully stated that this analysis is exploratory in nature and have avoided overinterpretation. In future studies, high-resolution spatial omics approaches will be invaluable for more precisely delineating the molecular characteristics of these fine structures.

      (24) 'How the autonomic nervous system regulates liver function in mice despite the apparent absence of substantive nerve fiber invasion into the parenchyma remains unclear.'

      Please consider the role of gap junctions between hepatocytes (e.g., Miyashita, 1991; Seseke, 1992).

      In this study, we analyzed the spatial distribution of hepatic nerves in mice using immunofluorescence staining and found that nerve fibers were almost exclusively confined to the portal vein region (Figure S6A). Notably, this distribution pattern differs markedly from that in humans. Previous studies have shown that, in human livers, nerves are not only located around the portal veins but also present along the central veins, interlobular septa, and within the parenchymal connective tissue (Miller et al., 2021; Yi, la Fleur, Fliers & Kalsbeek, 2010).

      Further research has provided a physiological explanation for this interspecies difference: even among species with distinct sympathetic innervation patterns in the parenchyma—i.e., with or without direct sympathetic input—the sympathetic efferent regulatory functions may remain comparable (Beckh, Fuchs, Ballé & Jungermann, 1990). This is because signals released from aminergic and peptidergic nerve terminals can be transmitted to hepatocytes through gap junctions as electrical signals (Hertzberg & Gilula, 1979; Jensen, Alpini & Glaser, 2013; Seseke, Gardemann & Jungermann, 1992; Taher, Farr & Adeli, 2017).

      However, the scarcity of nerve fibers within the mouse hepatic parenchyma suggests that the mechanisms by which the autonomic nervous system regulates liver function in mice may differ from those in humans. This observation prompted us to further investigate the potential role of PLC endothelial cells in this process.

      (25) Please, correct typos throughout the text.

      We thank the reviewer for this comment. We have carefully proofread the entire manuscript and corrected all typographical errors and minor language issues throughout the text.

      Reviewer #3 (Recommendations for the authors):

      (1) A strong recommendation - the authors ought to challenge their scRNAseq re-analysis with another scRNAseq dataset, namely a recently published atlas of adult liver endothelial, but also mesenchymal, immune, and parenchymal cell populations https://pubmed.ncbi.nlm.nih.gov/40954217/, performed with Smart-seq2 approach, which is perfectly suitable as it brings higher resolution data, and extensive cluster identity validation with stainings. Pietilä et al. indicate a clear distinction of portal vein endothelial cells into two populations that express Adgrg6, Jag1 (e2c), from Vegfc double-positive populations (e5c and e2c). Moreover, the dataset also includes the arterial endothelial cells that were shown to be part of the PLC, but were not followed up with the scRNAseq analysis. This distinction could help the authors to further validate their results, better controlling for cross-contaminations that may occur during scRNAseq preparation.

      We thank the reviewer for the valuable suggestion. As noted, we have further validated the molecular characteristics of the PLC using a recently published atlas of adult liver endothelial cells (Pietilä et al., 2023, PMID: 40954217). This dataset, generated using the Smart-seq2 technique, provides high-resolution transcriptomic profiles. By analyzing this dataset, we identified a CD34⁺LY6A⁺ portal vein endothelial cell population within the e2 cluster, which is localized around the portal vein. We then examined pathways and gene expression patterns related to hematopoiesis, bile duct formation, and neural signaling within these cells. The results revealed gene enrichment patterns consistent with those observed in our primary dataset, further supporting the robustness of our analysis of the PLC’s molecular characteristics.
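
      The selection of a double-positive population such as CD34⁺LY6A⁺ cells can be illustrated with a minimal sketch. This is not the authors' analysis code, which is not shown in the response: the function name, the gene symbols as dictionary keys, the toy expression values, and the "expression above zero" threshold are all assumptions made for illustration.

```python
def double_positive(
    cells: dict[str, dict[str, float]],
    gene_a: str = "Cd34",
    gene_b: str = "Ly6a",
    threshold: float = 0.0,
) -> list[str]:
    """Return IDs of cells expressing both genes above `threshold`.

    `cells` maps a cell ID to its gene -> normalized expression values;
    genes missing from a cell are treated as not expressed.
    """
    return [
        cell_id
        for cell_id, expression in cells.items()
        if expression.get(gene_a, 0.0) > threshold
        and expression.get(gene_b, 0.0) > threshold
    ]


# Toy expression values, for illustration only.
toy_cells = {
    "cell_1": {"Cd34": 2.1, "Ly6a": 1.4},
    "cell_2": {"Cd34": 0.0, "Ly6a": 3.0},
    "cell_3": {"Cd34": 1.2},
    "cell_4": {"Cd34": 0.8, "Ly6a": 0.5},
}
print(double_positive(toy_cells))
```

      In practice the threshold and normalization would need to be chosen per dataset, since a plain above-zero cutoff is sensitive to dropout and ambient RNA.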

      (2) Improving the methods section is highly recommended, this includes more detailed information for material and protocols used - catalog numbers; protocol details of the usage - rocking platforms, timing, and tubes used for incubations; GitHub or similar page with code used for the scRNA seq re-analysis.

      We thank the reviewer for the valuable suggestion. We have added more detailed information regarding the materials and experimental procedures in the Methods section, including catalog numbers, incubation conditions (such as the type of shaker, incubation time, and tube specifications), and other relevant parameters.

      (3) In Figure 2A, the authors claim the size of the nanoparticle is 100nm, while based on the image, the size is ~150-180nm. A more thorough quantification of the particle size would help users estimate the usability of their method for further applications.

      We thank the reviewer for the comment. In the TEM image shown in Figure 2A, the nanoparticles indeed appear to be approximately 150–200 nm in size. We have re-verified the particle dimensions and will update the corresponding description in the Methods section to allow readers to more accurately assess the applicability of this approach.

      (4) In Figure 3E, it is not clear what is labeled by the pink signal. Please consider labeling the structures in the figure.

      We thank the reviewer for the valuable comment. The pink signal in Figure 3E was originally intended to label the hepatic artery. However, a slight spatial misalignment occurred during the labeling process, making its position appear closer to the central vein rather than the portal vein in the image. To avoid misunderstanding, we will add clear annotations to the image and clarify this deviation in the figure legend in the revised version. It should also be noted that this figure primarily aims to illustrate the spatial relationship between the bile duct and the portal vein, and this minor deviation does not affect the reliability of our experimental conclusions.

      (5) The following statement is not backed by quantification, as it ought to be: “Dual-channel three-dimensional confocal imaging combined with CK19 immunostaining revealed that the sites of dye leakage did not coincide with the CK19-positive terminal bile duct epithelium, but instead were predominantly localized within regions adjacent to the PLC structures”.

      We thank the reviewer for the valuable comment. We have added the corresponding quantitative analysis to support this conclusion. Quantitative assessment of the extended-depth imaging data revealed that dye leakage predominantly occurred in regions adjacent to the PLC structure, rather than in the perivenous sinusoidal areas. The corresponding results have been presented in the revised Figure 3G.

      (6) Similarly, Figure 4F is central to the Sca1CD34 cell type identification but lacks any quantification; providing it would strengthen the key statement of the article. A possible way to approach this is also by FACS sorting the double-positive cells and bulk/qRT validation.

      We thank the reviewer for raising this point. We agree that quantitative validation of the Sca1⁺CD34⁺ population by FACS sorting could further support our conclusions. However, the primary focus of this study is on the spatial localization and transcriptional features of PLC endothelial cells. The identification of the Sca1⁺CD34⁺ subset is robustly supported by multiple complementary approaches, including three-dimensional imaging, co-staining with pan-endothelial markers, and projection mapping analyses. Collectively, these lines of evidence provide a solid basis for characterizing this unique endothelial population.

      (7) The images in Figure S4D are not comparable, as the Sca1-stained image shows a longitudinal section of the PV, but the other stainings are cross-sections of PVs.

      We thank the reviewer for the careful comment. We agree that the original Sca1-stained image, being a longitudinal section of the portal vein, was not optimal for direct comparison with other cross-sectional images. We have replaced it with a cross-sectional image of the portal vein to ensure comparability across all images. The updated image has been included in the revised Supplementary Figure S4D.

      (8) I might be wrong, but Figure 4J is entirely missing, and only a cartoon is provided. Either remove the results part or provide the data.

      We appreciate the reviewer’s careful observation. Figure 4J was intentionally designed as a schematic illustration to summarize the structural relationships and spatial organization of the portal vein, hepatic artery, and PLC identified in the previous panels (Figures 4A–4I). It does not represent newly acquired experimental data, but rather serves to provide a conceptual overview of the findings.

      To avoid misunderstanding, we have clarified this point in the figure legend and the main text, stating that Figure 4J is a schematic summary rather than an experimental image. Therefore, we respectfully prefer to retain the schematic figure to aid readers’ interpretation of the preceding results.

      (9) The methods section lacks information about the CCl<sub>4</sub> concentration, and it is thus hard to estimate the dosage of CCl<sub>4</sub> received (ml/kg). This is important for the interpretation of the severity of the fibrosis and presence of cirrhosis, as different doses may or may not lead to cirrhosis within the short regimen performed by the authors [PMID: 16015684 DOI: 10.3748/wjg.v11.i27.4167]. Validation of the fibrosis/cirrhosis severity is, in this case, crucial for the correct interpretation of the results. If the level of cirrhosis is not confirmed, only progressive fibrosis should be mentioned in the manuscript, as these two terms cannot be used interchangeably.

      Thank you for the reviewer’s comment. We indeed omitted the information on the concentration of carbon tetrachloride (CCl<sub>4</sub>) in the Methods section. In our experiments, mice received intraperitoneal injections of CCl<sub>4</sub> at a dose of 1 mL/kg body weight, twice per week, for a total of six weeks. We have revised the manuscript accordingly, using the term “progressive fibrosis” to avoid confusion between fibrosis and cirrhosis.

      (10) The following statement is not backed by any correlation analysis: "Particularly during liver fibrosis progression, the PLC exhibits dynamic structural extension correlating with fibrosis severity,.. ".

      We thank the reviewer for the comment. The original statement that the “PLC correlates with fibrosis severity” lacked support from quantitative analysis. To ensure a precise description, we have revised the sentence as follows: “During liver fibrosis progression, the PLC exhibits dynamic structural extension.”

      (11) Similarly, the following statement is not followed by data that would address the impact of innervation on liver function: "How the autonomic nervous system regulates liver function in mice despite the apparent absence of substantive nerve fiber invasion into the parenchyma remains unclear.".

      This section has been revised. In this study, we analyzed the spatial distribution of nerves in the mouse liver using immunofluorescence staining. The results showed that nerve fibers were almost entirely confined to the portal vein region (Figure S6A). Notably, this distribution pattern differs significantly from that in humans. Previous studies have demonstrated that in the human liver, nerves are not only distributed around the portal vein but also present in the central vein, interlobular septa, and connective tissue of the hepatic parenchyma (Miller et al., 2021; Yi, la Fleur, Fliers & Kalsbeek, 2010).

      Previous studies have further explained the physiological basis for this difference: even among species with differences in parenchymal sympathetic innervation (i.e., species with or without direct sympathetic input), their sympathetic efferent regulatory functions may still be similar (Beckh, Fuchs, Ballé & Jungermann, 1990). This is because signals released by adrenergic and peptidergic nerve terminals can be transmitted to hepatocytes as electrical signals through intercellular gap junctions (Hertzberg & Gilula, 1979; Jensen, Alpini & Glaser, 2013; Seseke, Gardemann & Jungermann, 1992; Taher, Farr & Adeli, 2017). However, the scarcity of nerve fibers in the mouse hepatic parenchyma suggests that the mechanism by which the autonomic nervous system regulates liver function in mice may differ from that in humans. This finding also prompts us to further explore the potential role of PLC endothelial cells in this process.

      (12) Could the authors discuss their interpretation of the results in light of the fact that the innervation is lower in cirrhotic patients? https://pmc.ncbi.nlm.nih.gov/articles/PMC2871629/. Also, while ADGRG6 (Gpr126) may play important roles in liver Schwann cells, it is likely not through affecting myelination of the nerves, as the liver nerves are not myelinated https://pubmed.ncbi.nlm.nih.gov/2407769/ and https://www.pnas.org/doi/10.1073/pnas.93.23.13280.

      We have revised the text to state that although most hepatic nerves are unmyelinated, GPR126 (ADGRG6) may regulate hepatic nerve distribution via non-myelination-dependent mechanisms. Studies have shown that GPR126 exerts both Schwann cell–dependent and –independent functions during peripheral nerve repair, influencing axon guidance, mechanosensation, and ECM remodeling (Mogha et al., 2016; Monk et al., 2011; Paavola et al., 2014).

      (13) The manuscript would benefit from text curation that would:

      a) Unify the language describing the PLC, so it is clear that (if) it represents protrusions of the portal veins.

      We have standardized the description of the PLC throughout the manuscript, clearly specifying its anatomical relationship with the portal vein. Wherever appropriate, we indicate that the PLC represents protrusions associated with the portal vein, avoiding ambiguous or inconsistent statements.

      b) Increase the accuracy of the statements.

      Examples: "bile ducts, and the central vein in adult mouse livers."

      We have refined all statements for accuracy.

      c) Reduce the space given to discussion and results in the introduction, moving them to the respective parts. The same applies to the results section, where discussion occurs at more places than in the Discussion part itself.

      We have edited the Introduction, removing detailed results and functional explanations, and retaining only a concise overview.

      Examples: "The formation of PLC structures in the adventitial layer may participate in local blood flow regulation, maintenance of microenvironmental homeostasis, and vascular-stem cell interactions."

      "This finding suggests that PLC endothelial cells not only regulate the periportal microcirculatory blood flow, but also establish a specialized microenvironment that supports periportal hematopoietic regulation, contributing to stem cell recruitment, vascular homeostasis, and tissue repair. "

      "Together, these findings suggest the PLC endothelium may act as a key regulator of bile duct branching and fibrotic microenvironment remodeling in liver cirrhosis. " This one in particular would require further validation with protein stainings and similar, directly in your model.

      d) Provide a clear reference for the used scRNA seq so it's clear that the data were re-analyzed.

      Example: "single-cell transcriptomic analysis revealed significant upregulation of bile duct-related genes in the CD34<sup>+</sup>Sca-1<sup>+</sup> endothelium of PLC in cirrhotic liver, with notably high expression of Lgals1 (Galectin-1) and HGF(Figure 5G) "

      When describing the transcriptional analysis of PLC endothelial cells, we explicitly cited the original scRNA-seq dataset (Su et al., 2021), clarifying that these data were reanalyzed rather than newly generated.

      e) Introducing references for claims that, in places, are crucial for further interpretation of experiments.

      Examples: "It not only guides bile duct branching during development but also"; the authors show no data from liver development.

      Thank you for pointing this out. We have revised the relevant statement to ensure that the claim is accurate and well-supported.

      f) The Results sentence "Instead, bile duct epithelial cells at the terminal ducts extended partially along the canalicular network without directly participating in the formation of the bile duct lumen." lacks a callout to the respective figure.

      We would like to thank the reviewers for pointing out this issue. In the revised manuscript, the relevant image (Figure 3D) has been clearly annotated with white arrows to indicate the phenomenon of terminal cholangiocytes extending along the bile canaliculi network. Additionally, the schematic diagram on the right side clearly shows the bile canaliculi, cholangiocytes, and bile flow direction using arrows and color coding, thus intuitively corresponding to the textual description.

      (14) Formal text suggestions: The manuscript text contains a lot of missed or excessive spaces and several typos that ought to be fixed. A few examples follow:

      a) "densely n organized vascular network "

      b) "analysis, while offering high spatial "

      c) "specific differences, In the human liver, "

      d) Figure 4F has a typo in the description.

      e) "generation of high signal-to-noise ratio, multi-target " SNR abbreviation was introduced earlier.

      f) Canals of Hering, CoH abbreviation comes much later than the first mention of the Canals of Hering.

      We thank the reviewer for the helpful comment regarding textual consistency. We have carefully reviewed and revised the entire manuscript to improve the accuracy, clarity, and consistency of the text.

    1. eLife Assessment

      In this valuable study, the authors present traces of bone modification on ~1.8 million-year-old proboscidean remains from Tanzania, which they infer to be the earliest evidence for stone-tool-assisted megafaunal consumption by hominins. Challenging published claims, the authors argue that persistent megafaunal exploitation roughly coincided with the earliest Acheulean tools. Notwithstanding the rich descriptive and spatial data, the behavioral inferences about hominin agency rely on traces (such as bone fracture patterns and spatial overlap) that are not unequivocal; the evidence presented to support the inferences thus remains incomplete. Given the implications of the timing and extent of hominin consumption of nutritious and energy-dense food resources, as well as of bone toolmaking, the findings of this study will be of interest to paleoanthropologists and other evolutionary biologists.

    2. Reviewer #1 (Public review):

      Domínguez-Rodrigo and colleagues make a largely convincing case for habitual elephant butchery by Early Pleistocene hominins at Olduvai Gorge (Tanzania), ca. 1.8-1.7 million years ago. They present this at a site scale (the EAK locality, which they excavated), as well as across the penecontemporaneous landscape, analyzing a series of findspots that contain stone tools and large-mammal bones. The latter are primarily elephants, but giraffids and bovids were also butchered in a few localities.

      The authors claim that this is the earliest well-documented evidence for elephant butchery; doing so requires debunking other purported cases of elephant butchery in the literature, or in one case, reinterpreting elephant bone manipulation as being nutritional (fracturing to obtain marrow) rather than technological (to make bone tools). The authors' critical discussion of these cases may not be consensual, but it surely advances the scientific discourse. The authors conclude by suggesting that an evolutionary threshold was achieved at ca. 1.8 Ma, whereby regular elephant consumption rich in fats and perhaps food surplus, more advanced extractive technology (the Acheulian toolkit), and larger human group size had coincided.

      The fieldwork and spatial statistics methods are presented in detail and are solid and helpful, especially the excellent description (all too rare in zooarchaeology papers) of bone conservation and preservation procedures. The results are detailed and clearly presented.

      The authors achieved their aims, showcasing recurring elephant butchery in 1.8-1.7 million-year-old archaeological contexts. The authors cautiously emphasize the temporal and spatial correlation of 1) elephant butchery, 2) Acheulian toolkits, and 3) larger sites, and discuss how these elements may be causally related.

      Overall, this is an interesting manuscript of broad interest that presents original data and interpretations from the Early Pleistocene archaeology of Olduvai Gorge. These observations and the authors' critical review of previously published evidence are an important contribution that will form the basis for building models of Early Pleistocene hominin adaptation.

    3. Reviewer #2 (Public review):

      The manuscript makes a valuable contribution to the Olduvai Gorge record, offering a detailed description of the EAK faunal assemblage. In particular, the paper provides a high-resolution record of a juvenile Elephas recki carcass, associated lithic artifacts, and several green-broken bone specimens. These data are inherently valuable and will be of significant interest to researchers studying Early Pleistocene taphonomy. My concerns do not relate to the quality or importance of the data themselves, but rather to the interpretive inferences drawn from these data, particularly regarding the strength of the claim for unambiguous proboscidean butchery.

      This review follows the authors' response to an earlier round of reviewer feedback and addresses points raised in that exchange. In their rebuttal, the authors state that some of my initial concerns reflect misunderstandings of their analysis, but after carefully re-reading both the manuscript and their responses, I do not believe this is the case.

      In their response, the authors state that they do not treat the EAK evidence as decisive, yet the manuscript repeatedly characterizes the assemblage in very definitive terms. For example, EAK is described as "the oldest unambiguous proboscidean butchery site at Olduvai" and as "the oldest secure proboscidean butchery evidence." These phrases communicate a high level of confidence that does not align with the more qualified position articulated in the rebuttal and extends beyond what the documented evidence securely supports.

      I appreciate the authors' clarification regarding the fracture features, and I agree that these are well-established outcomes of dynamic hammerstone percussion. At the same time, several of these traits have been documented in non-anthropogenic contexts, including helicoidal spiral fractures resulting from trampling and carnivore activity (Haynes 1983), adjacent or flake-like scars created by carnivore gnawing (Villa and Bartram 1996), hackled break surfaces produced by heavy passive breakage such as trampling or sediment pressure (Haynes 1983), and impact-related bone flakes observed in carnivore-modified assemblages (Coil et al. 2020). One of the biggest issues is that there is no quantitative data or images of the bone fracture features that the authors refer to as the main diagnostic criteria at EAK. The only figures that show EAK specimens (S21, S22, S23) illustrate general green-bone fracture morphology but none of the specific traits listed in the text. In contrast, clear examples of similar features come from other Olduvai assemblages, which may be misleading to readers if they mistakenly interpret those as images from EAK. The manuscript also states that these traits "co-occur," but it is not defined whether this refers to multiple features on the same fragment or within the broader assemblage. Without images or counts that document these traits on EAK fossils, readers cannot evaluate the strength of the interpretation. Including that information would substantially strengthen the manuscript.

      Regarding the statement that "natural elephant long limb breaks have been documented only in pre or peri-mortem stages when an elephant breaks a leg, and only in femora (Haynes et al., 2021)," it is not entirely clear what this example is intended to illustrate in relation to the EAK assemblage. My understanding is that the authors are suggesting that naturally produced green bone fractures in elephants are very limited, perhaps occurring only in pre or peri-mortem broken leg cases, and that fractures on other elements should therefore be attributed to hominin activity. If that is not the intended argument, I would encourage clarifying this point. This appears to conflate pre-mortem injury with the broader issue of equifinality. My original comment was not referring to pre-mortem breaks but to the range of natural (i.e., non-hominin) and post-mortem processes that can generate spiral or green bone fractures similar to those described by the authors.

      I fully understand the spatial analyses, and I realize that the association between bones and lithics is statistically significant. My original concern was not about whether the correlation exists, but about how that correlation is interpreted. That point still stands. Statistical co-occurrence cannot distinguish among the multiple depositional and post-depositional processes that can generate similar spatial patterns. However, I agree that the spatial correlation is intriguing, particularly when viewed alongside the possible butchery evidence. The pattern is notable and worthy of publication, even if the behavioral interpretation requires caution.

      Finally, in considering the authors' response on the Nyayanga material, I still find the basis for their dismissal of that evidence difficult to follow and the contrasting treatment of the Nyayanga and EAK evidence raises concerns about interpretive consistency. Plummer et al. (2023) specify that bone surface modifications were examined using low-power magnification (10×-40×) and strong light sources to identify modifications and that they attributed agency (e.g., hominin, carnivore) to modifications only after excluding possible alternatives. The rebuttal does not engage with the procedures reported. The existence of newer analytical techniques does not diminish the validity of long-standing methods that have been applied across many studies. It is also unclear why abrasion is presented as a more likely explanation than stone tool cutmarks. The authors dismiss the Nyayanga images as "blurry," but this is irrelevant to the interpretation, since the analysis was based on the fossils, not the photographs. The Nyayanga dataset is dismissed without a thorough engagement, while the EAK material, despite similar uncertainties and potential for alternative explanations, is treated as definitive.

      These concerns do not diminish the significance of the EAK assemblage, and addressing them would allow the interpretations to more fully reflect the scope of the available data.

      Literature Cited:

      Coil, R., Yezzi-Woodley, K., & Tappen, M. (2020). Comparisons of impact flakes derived from hyena and hammerstone long bone breakage. Journal of Archaeological Science, 120, 105167.

      Haynes, G. (1983). A guide for differentiating mammalian carnivore taxa responsible for gnaw damage to herbivore limb bones. Paleobiology, 9(2), 164-172.

      Haynes, G., Krasinski, K., & Wojtal, P. (2021). A study of fractured proboscidean bones in recent and fossil assemblages. Journal of Archaeological Method and Theory, 28(3), 956-1025.

      Plummer, T. W., et al. (2023). Expanded geographic distribution and dietary strategies of the earliest Oldowan hominins and Paranthropus. Science, 379(6632), 561-566.

      Villa, P., & Bartram, L. (1996). Flaked bone from a hyena den. Paléo, Revue d'Archéologie Préhistorique, 8(1), 143-159.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Domínguez-Rodrigo and colleagues make a moderately convincing case for habitual elephant butchery by Early Pleistocene hominins at Olduvai Gorge (Tanzania), ca. 1.8-1.7 million years ago. They present this at the site scale (the EAK locality, which they excavated), as well as across the penecontemporaneous landscape, analyzing a series of findspots that contain stone tools and large-mammal bones. The latter are primarily elephants, but giraffids and bovids were also butchered in a few localities. The authors claim that this is the earliest well-documented evidence for elephant butchery; doing so requires debunking other purported cases of elephant butchery in the literature, or in one case, reinterpreting elephant bone manipulation as being nutritional (fracturing to obtain marrow) rather than technological (to make bone tools). The authors' critical discussion of these cases may not be consensual, but it surely advances the scientific discourse. The authors conclude by suggesting that an evolutionary threshold was achieved at ca. 1.8 Ma, whereby regular elephant consumption rich in fats and perhaps food surplus, more advanced extractive technology (the Acheulian toolkit), and larger human group size had coincided.

      The fieldwork and spatial statistics methods are presented in detail and are solid and helpful, especially the excellent description (all too rare in zooarchaeology papers) of bone conservation and preservation procedures. However, the methods of the zooarchaeological and taphonomic analysis - the core of the study - are peculiarly missing. Some of these are explained along the manuscript, but not in a standard Methods paragraph with suitable references and an explicit account of how the authors recorded bone-surface modifications and the mode of bone fragmentation. This seems more of a technical omission that can be easily fixed than a true shortcoming of the study. The results are detailed and clearly presented.

      By and large, the authors achieved their aims, showcasing recurring elephant butchery in 1.8-1.7 million-year-old archaeological contexts. Nevertheless, some ambiguity surrounds the evolutionary significance part. The authors emphasize the temporal and spatial correlation of (1) elephant butchery, (2) Acheulian toolkits, and (3) larger sites, but do not actually discuss how these elements may be causally related. Is it not possible that larger group size or the adoption of Acheulian technology have nothing to do with megafaunal exploitation? Alternative hypotheses exist, and at least, the authors should try to defend the causation, not just put forward the correlation. The only exception is briefly mentioning food surplus as a "significant advantage", but how exactly, in the absence of food-preservation technologies? Moreover, in a landscape full of aggressive scavengers, such excess carcass parts may become a death trap for hominins, not an advantage. I do think that demonstrating habitual butchery bears very significant implications for human evolution, but more effort should be invested in explaining how this might have worked.

      Overall, this is an interesting manuscript of broad interest that presents original data and interpretations from the Early Pleistocene archaeology of Olduvai Gorge. These observations and the authors' critical review of previously published evidence are an important contribution that will form the basis for building models of Early Pleistocene hominin adaptation.

      This is a good example of the advantages of the eLife reviewing process. It has become much too common, among traditional peer-reviewing journals, to reject articles when there is no coincident agreement in the reviews, regardless of the heuristics (i.e., empirically-supported weight) of the arguments of both reviewers. Reviewers 1 and 2 provide contrasting evaluations, and the eLife dialogue between authors and reviewers enables us to address their comments differentially. Reviewer 1 (R1), whose evaluation is overall positive, remarks that the methods of the zooarchaeological and taphonomic analysis are missing. We have added them now in the revised version of our manuscript. R1 also remarks that our work highlights correlation of events, but not necessarily causation. We did not establish causation because such interpretations bear a considerable amount of speculation (and they might have fostered further criticism by R2); however, in the revised version, we expanded our discussion of these issues substantially. Establishing causation among the events described is impossible, but we certainly provide arguments to link them.

      Reviewer #2 (Public review):

      The authors argue that the Emiliano Aguirre Korongo (EAK) assemblage from the base of Bed II at Olduvai Gorge shows systematic exploitation of elephants by hominins about 1.78 million years ago. They describe it as the earliest clear case of proboscidean butchery at Olduvai and link it to a larger behavioral shift from the Oldowan to the Acheulean.

      The paper includes detailed faunal and spatial data. The excavation and mapping methods appear to be careful, and the figures and tables effectively document the assemblage. The data presentation is strong, but the behavioral interpretation is not supported by the evidence.

      The claim for butchery is based mainly on the presence of green-bone fractures and the proximity of bones and stone artifacts. These observations do not prove human activity. Fractures of this kind can form naturally when bones break while still fresh, and spatial overlap can result from post-depositional processes. The studies cited to support these points, including work by Haynes and colleagues, explain that such traces alone are not diagnostic of butchery, but this paper presents them as if they were.

      The spatial analyses are technically correct, but their interpretation extends beyond what they can demonstrate. Clustering indicates proximity, not behavior. The claim that statistical results demonstrate a functional link between bones and artifacts is not justified. Other studies that use these methods combine them with direct modification evidence, which is lacking in this case.

      The discussion treats different bodies of evidence unevenly. Well-documented cut-marked specimens from Nyayanga and other sites are described as uncertain, while less direct evidence at EAK is treated as decisive. This selective approach weakens the argument and creates inconsistency in how evidence is judged.

      The broader evolutionary conclusions are not supported by the data. The paper presents EAK as marking the start of systematic megafaunal exploitation, but the evidence does not show this. The assemblage is described well, but the behavioral and evolutionary interpretations extend far beyond what can be demonstrated.

      We disagree with the arguments provided by Reviewer 2 (R2). The arguments are based on two issues: bone breakage and spatial association. We will treat both separately here.

      Bone breakage

      R2 argues that:

      “The claim for butchery is based mainly on the presence of green-bone fractures and the proximity of bones and stone artifacts. These observations do not prove human activity. Fractures of this kind can form naturally when bones break while still fresh, and spatial overlap can result from post-depositional processes. The studies cited to support these points, including work by Haynes and colleagues, explain that such traces alone are not diagnostic of butchery, but this paper presents them as if they were.”

      In our manuscript, we argued that green-breakage provides equally good (or even better) taphonomic evidence of butchery if documented following clear taphonomic indicators. Not all green breaks are equal and not all “cut marks” are unambiguously identifiable as such. First, “natural” elephant long limb breaks have been documented only in pre/peri-mortem stages when an elephant breaks a leg. As a matter of fact, they have only been reported in publication on femora, the thinnest long bone (Haynes et al., 2021). Unfortunately, they have been studied many months after the death of the individuals, and the published diagnosis is made under the assumption that no other process intervened in the modification of those bones during this vast time span. Most of the breaks resulting from pre-mortem fractures produce long smooth, oblique/helical outlines. Occasionally, some flake scarring may occur on the cortical surface. This has been documented as uneven, small-sized, and spaced, and we are not sure whether it resulted from rubbing of broken fragments while the animal was alive and attempting to walk, or whether some may have resulted from desiccation of the bone after one year. When looked at in detail, such breaks sometimes contain step-microfractures and angular (butterfly-like) outlines. Sometimes, they may be accompanied by pseudo-notches, which are distinct and not comparable to the deep notches that hammerstone breaking generates on the same types of bones. Commonly, the edges of the breaks show some polishing, probably from separate break planes rubbing against each other. It should be emphasized that the experimental work on hammerstone breaking documented by Haynes et al. (2021) is based on bone fracture properties of bones that are no longer completely green. The cracking documented in their hammerstone experimentation, with very irregular outlines, differs from the cracking that we have documented in butchery of recently dead elephants.

      All this contrasts with the overlapping notches and flake scars (mostly occurring on the medullary side of the bone), both of them bigger in size, with clear smooth, spiral and longitudinal trajectories, with a more intensive modification on the medullary surface, and with sharp break edges resulting from hammerstone breaking of the green bone. No “natural” break has been documented replicating the same morphologies displayed in the Supplementary File to our paper. We display specimens with inflection points, hackle marks on the breaks, overlapping scarring on the medullary surface, with several specimens displaying percussion marks and pitting (also most likely percussion marks). Most importantly, we document this patterned modification on elements other than femora, for which no example has been documented of purported morphological equifinality caused by pre-mortem “natural” breaking. In contrast, such morphologies are documented in hammerstone-broken completely green bones (work in progress). We cited the works of Haynes to support this, because they do not show otherwise. As a matter of fact, Haynes himself had the courtesy of making a thorough reading of our manuscript and did not encounter any contradiction with his work. 

      Spatial association

      R2 argues in this regard:

      “The spatial analyses are technically correct, but their interpretation extends beyond what they can demonstrate. Clustering indicates proximity, not behavior. The claim that statistical results demonstrate a functional link between bones and artifacts is not justified. Other studies that use these methods combine them with direct modification evidence, which is lacking in this case.”

      We should emphasize that there is some confusion in the use and interpretation of clustering by R2 when applied to EAK. R2 appears to interpret clustering as the typical naked-eye perception of the spatial association of different items. In contrast, we rely on the statistical concept of clustering, more specifically on spatial interdependence or covariance, which is different. Items may appear visually clustered but still be statistically independent. This could, for example, result from two independent depositional episodes that happen to overlap spatially. In such cases, the item-to-item relationship does not necessarily show any spatial interdependence between classes other than simple clustering (i.e., spatial coincidence in intensity).

      Spatial statistical interdependence, on the other hand, reflects a spatial relationship or co-dependence between different items. This goes beyond the mere fact that classes appear clustered: items between classes may show specific spatial relationships — they may avoid each other or occupy distinct positions in space (regular co-dependence), or they may interact within the same spatial area (clustering co-dependence). Our tests indicate the latter for EAK.

      Such patterns are difficult to explain when depositional events are unrelated, since the probability that two independent events would generate identical spatial patterns in the same loci is very low. They are also difficult to reconcile when post-depositional processes intervene and resediment part of the assemblage (Domínguez-Rodrigo et al. 2018).
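
      As a purely illustrative sketch (not the analysis reported in the manuscript), the difference between visual clustering and cross-class spatial interdependence can be expressed with a cross-type Ripley's K function compared against a Monte Carlo envelope built from random toroidal shifts of one class. The function names, the rectangular study window, and the omission of edge correction are simplifying assumptions made here for illustration only.

```python
import math
import random


def cross_k(points_a, points_b, radius, area):
    """Naive cross-type Ripley's K between two point classes.

    Counts A-B pairs closer than `radius` and scales by the window area.
    No edge correction is applied (an assumption made for brevity).
    """
    pairs = sum(
        1
        for ax, ay in points_a
        for bx, by in points_b
        if math.hypot(ax - bx, ay - by) <= radius
    )
    return area * pairs / (len(points_a) * len(points_b))


def independence_envelope(points_a, points_b, radius, width, height,
                          n_sim=99, seed=0):
    """Min/max of cross-K under independence of the two classes.

    Class B is shifted rigidly by a random offset and wrapped around the
    rectangular window (toroidal shift). This keeps the internal structure
    of each class intact while breaking any relationship between them.
    """
    rng = random.Random(seed)
    area = width * height
    simulated = []
    for _ in range(n_sim):
        dx, dy = rng.uniform(0, width), rng.uniform(0, height)
        shifted = [((x + dx) % width, (y + dy) % height) for x, y in points_b]
        simulated.append(cross_k(points_a, shifted, radius, area))
    return min(simulated), max(simulated)
```

      In such a sketch, an observed cross-K above the upper bound at a given radius would point to attraction between the two classes, and a value below the lower bound to segregation. Two classes that are each clustered but unrelated to one another would tend to fall inside the envelope.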

      Finally, R2 concludes:

      “The discussion treats different bodies of evidence unevenly. Well-documented cut-marked specimens from Nyayanga and other sites are described as uncertain, while less direct evidence at EAK is treated as decisive. This selective approach weakens the argument and creates inconsistency in how evidence is judged.”

      The modifications on the Nyayanga hippo remains are not well-documented cut marks. Neither R2 nor we can differentiate those marks from those inflicted by natural abrasive processes in the coarse-grained sedimentary contexts where the carcasses are found. The fact that the observable microscopic features (seen through the low-quality photographs that appear in the original publication) differ between the cut marks documented on smaller animals and those inferred for the hippo remains makes them even more ambiguous. Nowhere in our manuscript do we treat the EAK evidence (or any other evidence) as decisive, but rather as the most likely interpretation given the methods used and the results reported.

      References

      Haynes G, Krasinski K, Wojtal P. 2021. A Study of Fractured Proboscidean Bones in Recent and Fossil Assemblages. Journal of Archaeological Method and Theory 28:956–1025.

      Domínguez-Rodrigo M, Cobo-Sánchez L, Yravedra J, Uribelarrea D, Arriaza C, Organista E, Baquedano E. 2018. Fluvial spatial taphonomy: a new method for the study of post-depositional processes. Archaeological and Anthropological Sciences 10:1769–1789.

      Recommendations for authors:

      Reviewer #1 (Recommendations for the authors):

      I have several recommendations that, in my opinion, could enhance the communication of this study to the readers. The first point is the only crucial one.

      (1) A detailed zooarchaeological methods section must be added, with explanations (or references to them) of precisely how the authors defined and recorded bone-surface modifications and mode of bone fragmentation.

      This appears in the revised version of the manuscript in the form of a new sub-section within the Methods section.

      (2) The title could be improved to better represent the contents of the paper. It contains two parts: the earliest evidence for elephant butchery (that's ok), and revealing the evolutionary impact of megafaunal exploitation. The latter point is not actually revealed in the manuscript, just alluded to here and there (see also below).

      We have elaborated on this in the revised version, linking megafaunal exploitation and anatomical changes (which are discussed in much more detail in the references indicated).

      (3) The abstract does not make it clear whether the authors think that the megafaunal adaptation strongly correlates with the Acheulian technocomplex. It seems that they do, so please make this point apparent in the abstract.

      From a functional point of view, we document the correlation but do not believe in causation, since most butchering tools around these megafaunal carcasses are typologically non-Acheulian. We have indicated so in the abstract.

      (4) Please define what you mean by "megafauna". How large should an animal be to be considered as megafauna in this particular context?

      We have added this definition: we identify as “megafauna” those animals heavier than 800 kg.

      (5) In the literature survey, consider also this Middle Pleistocene case-study of elephant butchery, including a probable bone tool: Rabinovich, R., Ackermann, O., Aladjem, E., Barkai, R., Biton, R., Milevski, I., Solodenko, N., and Marder, O., 2012. Elephants at the middle Pleistocene Acheulian open-air site of Revadim Quarry, Israel. Quaternary International, 276, pp.183-197.

      Added to the revised version.

      (6) The paragraph in lines 123-160 is unclear. Do the authors argue that the lack of evidence for processing elephant carcasses for marrow and grease is universal? They bring forth a single example of a much later (MIS 5) site in Germany. Then, the authors state the huge importance of fats for foragers (when? Where? Surely not in all latitudes and ecosystems). This left me confused - what exactly are you trying to claim here?

      We have explained this a little more in the revised text. What we pointed out was that most prehistoric (and modern) elephant butchery sites leave grease-containing long bones intact. Evidence of anthropogenic breakage of these elements is rather limited. The most probable reason is the overabundance of meat and fat from the rest of the carcass and the time-consuming effort needed to access the medullary cavity of elephant long bones.

      (7) The paragraph in lines 174-187 disrupts the flow of the text, contains previously mentioned information, ends with an unclear sentence, and could be cut.

      (8) Results: please provide the MNI for the EAK site (presumably 1, but this is never mentioned).

      Done in the revised version.

      (9) Lines 292 - 295: The authors found no traces of carnivoran activity (carnivoran remains, coprolites, or gnawing marks on the elephant bones), yet they attribute the absence of some non-dense skeletal elements to carnivore ravaging. I cannot understand this rationale, given that other density-mediated processes could have deleted the missing bones and epiphysis.

      This interpretation stems from our observations of several elephant carcasses in the Okavango delta in Botswana. Those that were monitored showed deletion of remains (i.e., disappearance of certain bones, like feet) without necessarily imprinting damage on the rest of the carcass. Carnivore intervention in an elephant death site can result in deletion of a few remains without much damage (if any), or if hyena clans access the carcass, much more conspicuous damage can be documented. There is a whole range of carnivore signatures in between. We are currently working on our study of several elephant carcasses subjected to these highly variable degrees of carnivore impact.

      (10) Lines 412 - 422: "The clustering of the elephant (and hippopotamus) carcasses in the areas containing the highest densities of landscape surface artifacts is suggestive of a hominin agency in at least part of their consumption and modification." - how so? It could equally suggest that both hominins and elephants were drawn to the same lush environments.

      We agree. Both hominins and megafauna must have been drawn to the same ecological loci for interaction to emerge. However, the fact that the highest-density clusters of artifacts coincide with the highest density of carcasses “showing evidence of having been broken” is suggestive of hominin use and consumption.

      (11) Discussion: I suggest starting the Discussion with a concise appraisal of the lines of evidence detailed in the Results and their interpretation, and only then, the critical reassessment of other studies. Similarly, a new topic starts in line 508, but without any subheading or an introductory sentence that could assist the readers.

      We added the introductory lines of the former Conclusion section to the revised Discussion section, as suggested by R1.

      (12) Line 607: Neumark-Nord are Late Pleistocene sites (MIS 5), not Middle Pleistocene.

      Corrected.

      (13) Regarding the ambiguity in how megafaunal exploitation may be causally related to the other features of the early Acheulian, the authors can develop the discussion. Alternatively, they should explicitly state that correlation is not causation, and that the present study adds the megafaunal exploitation element to be considered in future discussion of the shifts in lifestyles 1.8 million years ago.

      We have done so.

      Reviewer #2 (Recommendations for the authors):

      The following detailed comments are provided to help clarify arguments, ensure accurate representation of cited literature, and strengthen the logical and methodological framing of the paper. Line numbers refer to the version provided for review.

      (1) Line 55: Such concurrency (sometimes in conjunction with other variables)

      The term "other variables" is very vague. I would suggest expanding on this or taking it out altogether.

      (2) Line 146: Megafaunal long bone green breakage (linked to continuous spiral fractures on thick cortical bone) is probably a less ambiguous trace of butchery than "cut marks", since many of the latter could be equifinal and harder to identify, especially in contexts of high abrasion and trampling (Haynes et al., 2021, 2020).

      This reasoning is not supported by the evidence or the cited sources. Green-bone spiral fractures only show that a bone broke while it was fresh and do not reveal who or what caused it. Carnivore feeding, trampling, and natural sediment pressure can all create the same patterns, so these fractures are not clearer evidence of butchery than cut marks. Cut marks, when they are preserved and morphologically clear, remain the most reliable indicator of human activity. The Haynes papers actually show the opposite of what is claimed here. They warn that spiral fractures and surface marks can form naturally and that fracture patterns alone cannot be used to infer butchery. This section should be revised to reflect what those studies actually demonstrate.

      The reasoning referred to in line 146 is further explained below in the original text as follows:

      “Despite the occurrence of green fractures on naturally-broken bones, such as those trampled by elephants (Haynes et al., 2020), those occurring through traumatic fracturing or gnawed by carnivores (Haynes and Hutson, 2020), these fail to reproduce the elongated, extensive, or helicoidal spiral fractures (uninterrupted by stepped sections), accompanied by the overlapping conchoidal scars (both cortical and medullary), the reflected scarring, the inflection points, or the impact hackled break surfaces and flakes typical of dynamic percussive breakage. Evidence of this type of green breakage had not been documented earlier for the Early Pleistocene proboscidean or hippopotamid carcasses, beyond the documentation of flaked bone with the purpose of elaboration of bone tools (Backwell and d’Errico, 2004; Pante et al., 2020; Sano et al., 2020).”

      The problem with the way that R2 uses Haynes et al.'s works is that R2 treats features separately. Natural breaks occurring while the bone is green can generate smooth spiral breaks, for example, but it is not the presence of a single feature that invalidates the diagnosis of agency or that is taphonomically relevant, but the concurrence of several of them. The best example of a naturally (pre-mortem) broken bone was published by Haynes et al.

      The natural break shows helical fractures, subjugated to linear (angular) fracture outlines. Notice how the crack displays a zig-zag. The break is smooth but most damage occurs on the cortical surface, with flaking adjacent to the break and step micro-fracturing on the edges. The cortical scarring is discontinuous (almost marginal) and very small, almost limited to the very edge of the break. No modification occurs on the medullary surface. No extensive conchoidal fractures are documented, and certainly none inside the medullary surface of the break.

      Compare with Figures S8, S10, S17 and S34 (all specimens are shown on their medullary surface):

      In these examples, we see clearly modified medullary surfaces with multiple green breaks and large step fractures, accompanied in some examples by hackle marks. Some show large overlapping scars (substantially bigger than those documented in the natural break image). Not a single example of a naturally broken bone has been documented displaying these morphologies simultaneously. It is the comprehensive analysis of the co-occurrence of these features, and not their marginal and isolated occurrence in naturally broken bones, that makes a difference in the attribution of agency. Likewise, no example of a naturally broken bone has been published that could mimic either of the two green-broken bones documented at EAK. In contrast, we do have bones from our ongoing experimentation with green elephant carcasses that jointly reproduce these features. See also Figure 6 of the article for another example without any modern referent among the naturally broken bones documented.

      We should emphasize that R2 is inaccurately portraying what Haynes et al.'s results really document. Contrary to R2's assertion, trampling does not reproduce any of the examples shown above. Neither do carnivores. It should be stressed that Haynes & Harrod only document similar overlapping scarring on the medullary surface of bones when using much smaller animals. In all the carnivore damage repertoire that they document for elephants, durophagous spotted hyenas can only inflict furrowing on the ends of the biggest long bones, especially those of adults. Long bone midshafts remain inaccessible to them. The midshaft portions of bones that we document in our Supplementary File and at EAK cannot be the result of hyena (or other carnivore) damage for this reason, and also because their intense gnawing on elephant bones leaves tooth marking on most of the elements that they modify, and such marking is absent from our sample.

      (3) Line 176: other than hominins accessed them in different taphonomically-defined stages- stages - the "Stages" is repeated twice

      Defined in the revised version.

      (4) Line 174: Regardless of the type of butchery evidence - and with the taphonomic caveat that no unambiguous evidence exists to confirm that megafaunal carcasses were hunted or scavenged other than hominins accessed them in different taphonomically-defined stages- stages - the principal reasons for exploring megafaunal consumption in early human evolution is its origin, its episodic or temporally-patterned occurrence, its impact on hominin adaptation to certain landscapes, and its reflection on hominin group size and site functionality.

      This sentence is confusing and needs to be rewritten for clarity. It tries to combine too many ideas at once, and the phrasing makes it hard to tell what the main point is. The taphonomic caveat in the middle interrupts the sentence and obscures the argument. It should be broken into separate, clearer statements that distinguish what evidence exists, what remains uncertain, and what the broader goals of the discussion are.

      We believe the ideas are displayed clearly.

      (5) Line 179: landscapes, and its reflection on hominin group size and site functionality. If hominins actively sought the exploitation of megafauna, especially if targeting early stages of carcass consumption, the recovery of an apparent surplus of resources reflects a substantially different behavior from the small-group/small-site pattern documented at several earlier Oldowan anthropogenic sites (Domínguez-Rodrigo et al., 2019) -or some modern foragers, like the Hadza, who only exploit megafaunal carcasses very sporadically, mostly upon opportunistic encounters (Marlowe, 2010; O'Connell et al., 1992; Wood, 2010; Wood and Marlowe, 2013).

      This sentence makes a reasonable point, but is written in a confusing way. The idea that early, deliberate access to megafauna would represent a different behavioral pattern from smaller Oldowan or modern foraging contexts is valid, but the sentence is awkward and hard to follow. It should be rephrased to make the logic clearer and more direct.

      We believe the ideas are displayed clearly.

      (6) Line 186: When the process started of becoming megafaunal commensal started has major implications for human evolution.

      This sentence is awkward and needs to be rewritten for clarity. The phrasing "when the process started of becoming megafaunal commensal started" is confusing and grammatically incorrect. It could be revised to something like "Determining when hominins first began to interact regularly with megafauna has major implications for human evolution," or another version that clearly identifies the process being discussed.

      Modified in the revised version.

      (7) Line189: The multiple taphonomic biases intervening in the palimpsestic nature of most of these butchery sites often prevent the detection of the causal traces linking megafaunal carcasses and hominins. Functional links have commonly been assumed through the spatial concurrence of tools and carcass remains; however, this perception may be utterly unjustified as we argued above. Functional association of both archaeological elements can more securely be detected through objective spatial statistical methods. This has been argued to be foundational for heuristic interpretations of proboscidean butchery sites (Giusti, 2021). Such an approach removes ambiguity and solidifies spatial functional association, as demonstrated at sites like Marathousa 1 (Konidaris et al., 2018) or TK Sivatherium (Panera et al., 2019). This method will play a major role in the present study.

      This section overstates what spatial analysis can demonstrate and misrepresents the cited studies. The works by Giusti (2021), Konidaris et al. (2018), and Panera et al. (2019) do use spatial statistics to examine relationships between artifacts and faunal remains, but they explicitly caution that spatial overlap alone does not prove functional or behavioral association. These studies argue that clustering can support such interpretations only when combined with detailed taphonomic and stratigraphic evidence. None of them claims that spatial analysis "removes ambiguity" or "solidifies" functional links. The text should be revised to reflect the more qualified conclusions of those papers and to avoid implying that spatial statistics can establish behavioral causation on their own.

      We disagree. Both works (Giusti and Panera) use spatial statistical tools to create an inferential basis reinforcing a functional association of lithics and bones. In both cases, the anthropogenic agency inferred is based on that. We should stress that this only provides a basis for argumentation, not a definitive causation. Again, those analyses show much more than just apparent visual clustering.

      (8) Line 200: Here, we present the discovery of a new elephant butchery site (Emiliano Aguirre Korongo, EAK), dated to 1.78 Ma, from the base of Bed II at Olduvai Gorge. It is the oldest unambiguous proboscidean butchery site at Olduvai.

      It is fine to state the main finding in the introduction, but the phrasing here is too strong. Calling EAK "the oldest unambiguous proboscidean butchery site" asserts certainty before the evidence is presented. The claim should be stated more cautiously, for example, "a new site that provides early evidence for proboscidean butchery," so that the language reflects the strength of the data rather than pre-judging it.

      We understand the caution by R2, but in this case, EAK is the oldest taphonomically-supported evidence of elephant butchery at Olduvai (see discussion about FLK North in the text). Whether this is declared at the beginning or the end of the text is irrelevant.

      (9) Line 224: The drying that characterizes Bed II had not yet taken place during this moment.

      This sentence reads like a literal translation. It should be rewritten for clarity.

      Modified in the revised version.

      (10) Line 233: During the recent Holocene, the EAK site was affected by a small landslide which displaced the...

      This section contains far more geological detail than is needed for the argument. The reader only needs to know that the site block was displaced by a small Holocene landslide but retains its stratigraphic integrity. The extended discussion of regional faults, seismicity, and slope processes goes well beyond what is necessary for context and distracts from the main focus of the paper.

      We disagree. The geological information is what is most commonly missing from most archaeological reports. Here, it is relevant because of the atypical process and because it has been documented only twice with elephant butchery sites. Explaining the dynamic geological process that shaped the site helps to understand its spatial properties.

      (11) Line 264: In June 2022, a partial elephant carcass was found at EAK on a fragmented stratigraphic block...

      This section reads like field notes rather than a formal site description. Most of the details about the discovery sequence, trench setup, and excavation process are unnecessary for the main text. Only the basic contextual information about the find location, stratigraphic position, and anatomical composition is needed. The rest could be condensed or moved to the methods or supplementary material.

      We disagree. See reply above.

      (12) Line 291: hominins or other carnivores. Ongoing restoration work will provide an accurate estimate of well-preserved and modified fractions of the assemblage.

      This sentence is unclear and needs to specify what kind of restoration work is being done and what is meant by well-preserved and modified fractions. It is not clear whether modified refers to surface marks, diagenetic alteration, or something else. If the bones are still being cleaned or prepared, the analysis is incomplete, and the counts cannot be considered final. If restoration only means conservation or stabilization, that should be stated clearly so the reader understands that it does not affect the results. As written, it is not clear whether the data presented here are preliminary or complete.

      We added: For this reason, until restoration is concluded, we cannot make any assertion about the presence or absence of bone surface modifications.

      (13) Line 294: The tibiae were well preserved, but the epiphyseal portions of the femora were missing, probably removed by carnivores, which would also explain why a large portion of the rib cage and almost all vertebrae are missing.

      This explanation is not well supported. The missing elements could be the result of other forms of density-mediated destruction, such as sediment compaction or post-depositional fragmentation, especially since no tooth marks were found. Given the low density of ribs, vertebrae, and femoral epiphyses, these processes are more likely explanations than carnivore removal. The text should acknowledge these alternatives rather than attributing the pattern to carnivore activity without direct evidence.

      Sediment compaction and post-depositional processes can break bones but cannot make them disappear. Our excavation process was careful enough to detect bone if present. Their absence indicates two possibilities: erosion through the years at the front of the excavation, or carnivore intervention. Carnivores can take elephant bones without impacting the remaining assemblage (see our reply above to a similar comment).

      (14) Line 304: The fact that the carcass was moved while encased in its sedimentary context, along with the close association of stone tools with the elephant bones, is in agreement with the inference that the animal was butchered by hominins. A more objective way to assess this association is through spatial statistical analysis.

      The authors state that "the carcass was moved while encased in its sedimentary context, along with the close association of stone tools with the elephant bones, is in agreement with the inference that the animal was butchered by hominins." This does not logically follow. Movement of the block explains why the bones and tools remain together, not how that association was created. The preserved association alone does not demonstrate butchery, especially in the absence of cut marks or other direct evidence of hominin activity.

      Again, we are sorry that R2 is completely overlooking the strong signal detected by the spatial statistical analysis. The way the block moved preserved the original association of bones and tools. This statement is meant to clarify that, despite the allochthonous nature of the block, the original autochthonous depositional process of both types of archaeological materials has been preserved. The spatial association, as statistically demonstrated, indicates that the functional link is more likely than any alternative process. The additional fact that nowhere else in that portion of the outcrop do we identify scatters of tools (all appear clustered at a landscape scale with the elephant) adds more support to this interpretation. This would no doubt have been further supported by the presence of cut marks, but their absence does not indicate a lack of functional association since, as Haynes' works have clearly shown, most bulk defleshing of modern elephants leaves no traces on most bones.

      (15) Line 370: This also shows that the functional connection between the elephant bones and the tools has been maintained despite the block post-sedimentary movement.

      The spatial analyses appear to have been carried out appropriately, and the interpretations of clustering and segregation are consistent with the reported results. However, the conclusion that the "functional connection" between bones and tools has been maintained goes beyond what spatial correlation alone can demonstrate. These analyses show spatial proximity and scale-dependent clustering but cannot, by themselves, confirm a behavioral or functional link.

      R2 is making this comment repeatedly and we have addressed it more than once above. We disagree and we refer to our replies above to sustain it.

      (16) Line 412: The clustering of the elephant (and hippopotamus) carcasses in the areas containing the highest densities of landscape surface artifacts is suggestive of a hominin agency in at least part of their consumption and modification. The presence of green broken elephant long bone elements in the area surveyed is only documented within such clusters, both for lower and upper Bed II. This constitutes inverse negative evidence for natural breaks occurring on those carcasses through natural (i.e., non-hominin) pre- and peri-mortem limb breaking (Haynes et al., 2021, 2020; Haynes and Hutson, 2020). In this latter case, it would be expected for green-broken bones to show a more random landscape distribution, and occur in similar frequencies in areas with intense hominin landscape use (as documented in high density artifact deposition) and those with marginal or non-hominin intervention (mostly devoid of anthropogenic lithic remains).

      The clustering of green-bone fractures with stone tools is intriguing but should be interpreted cautiously. The Haynes references are misrepresented here. Those studies address both cut marks and green-bone (spiral) fractures, emphasizing that each can arise through non-hominin processes such as trampling, carcass collapse, and sediment loading. They do not treat green fractures as clearer evidence of butchery; in fact, they caution that such breakage patterns can occur naturally and even form clustered distributions in areas of repeated animal activity. The claim that these studies support spiral fractures as unambiguous indicators of hominin activity, or that natural breaks would be randomly distributed, is not accurate.

      We would like to emphasize again that the Haynes references are not misrepresented here. See our extensive reply above. If R2 can provide evidence of natural breakage patterns, caused by pre-mortem limb breaking or post-mortem trampling, in which all limb bones are affected and show smooth spiral breaks accompanied by extensive and overlapping scarring on the medullary surface, in conjunction with the other features described in our replies above, then we would be willing to reconsider. With the evidence reported until now, these features do not occur simultaneously on specimens from studies of modern elephant bones.

      R2 seems to contradict him(her)self here by saying that Haynes' studies show that cut marks are not reliable because they can also be reproduced via trampling. Until this point, R2 had been saying that only cut marks could demonstrate a functional link and support butchery. Haynes' studies do not deal experimentally with sediment loading.

      (17) Line 424: This indicates that from lower Bed II (1.78 Ma) onwards, there is ample documented evidence of anthropogenic agency in the modification of proboscidean bones across the Olduvai paleolandscapes. The discovery of EAK constitutes, in this respect, the oldest evidence thereof at the gorge. The taphonomic evidence of dynamic proboscidean bone breaking across time and space supports, therefore, the inferences made by the spatial statistical analyses of bones and lithics at the site.

      This conclusion is overstated. The claim of "ample documented evidence of anthropogenic agency" is too strong, given that the main support comes from indirect indicators like green-bone fractures and spatial clustering rather than clear butchery marks. It would be more accurate to say that the evidence suggests or is consistent with possible hominin involvement. The final sentence also conflates association with causation; spatial and taphonomic data can indicate a relationship, but do not confirm that the carcasses were butchered by hominins.

      The evidence is based on the spatial clustering (at a landscape scale) of tools and elephant (and other megafaunal taxa) bones, in conjunction with a large number of green-broken elements. This interpretation is supported even more strongly when compared against modern referents. In the past few years, we have been conducting work on modern naturally dead elephant carcasses in Botswana and Zambia, and in the several carcasses that we have seen, we have not identified a single case of long bone shaft breaks like those described by Haynes as natural or like those we describe here as anthropogenic. This probably means that they are highly unlikely or marginal occurrences at a landscape scale. This seems to be supported by Haynes' work too. Out of the hundreds of elephant carcasses that he has monitored and studied over the years for different works, we have managed to identify only two instances where he described natural pre-mortem breaks. This certainly qualifies as extremely marginal.

      Most of the Results section is clearly descriptive, but beginning with "The clustering of the elephant (and hippopotamus) carcasses..." the text shifts from reporting observations to drawing behavioral conclusions. From this point on, it interprets the data as evidence of hominin activity rather than simply describing the patterns. This part would be more appropriate for the Discussion, or should be rewritten in a neutral, descriptive way if it is meant to stay in the Results.

      This is discussed extensively in the Discussion section, but the data presented in the Results are also interpreted in that section, following a clear chain of argument.

      (18) Line 433: A recent discovery of a couple of hippopotamus partial carcasses at the 3.0-2.6 Ma site of Nyayanga (Kenya), spatially concurrent with stone artifacts, has been argued to be causally linked by the presence of cut marks on some bones (Plummer et al., 2023). The only evidence published thereof is a series of bone surface modifications on a hippo rib and a tibial crest, which we suggest may be the result of byproduct of abiotic abrasive processes; the marks contrast noticeably with the well-defined cut marks found on smaller mammal bones (Plummer et al.'s 2023: Figure 3C, D) associated with the hippo remains (Plummer et al., 2023).

      The authors suggest that the Nyayanga marks could result from abiotic abrasion, but this claim does not engage with the detailed evidence presented by Plummer et al. (2023). Plummer and colleagues documented well-defined, morphologically consistent cut marks and considered the sedimentary context in their interpretation. Raising abrasion as a general possibility without addressing that analysis gives the impression of selective skepticism rather than an evaluation grounded in the published data.

      We disagree again on this matter. R2 does not clarify what he/she means by well-defined or morphologically consistent. We provide an alternative interpretation of those marks that fits their morphology and features and that Plummer et al. did not successfully exclude. We also emphasize that the interpretation of the Nyayanga marks was made descriptively, without any analytical approach and with a high degree of subjectivity on the part of the researcher. All of this disqualifies the approach as well defined and reflects an outdated view of modern taphonomy. Descriptive taphonomy is a thing of the 1980s. Today there is a plethora of analytical methods, from multivariate statistics to geometric morphometrics to AI computer vision (so far the most reliable), which represent how taphonomy (and more specifically, the analysis of bone surface modifications) should be conducted in the 21st century. These approaches would reinforce interpretations such as those preliminarily published by Plummer et al., provided they reject alternative explanations like those that we have provided.

      (19) Line 459: It would have been essential to document that the FLK N6 tools associated with the elephant were either on the same depositional surface as the elephant bones and/or on the same vertical position. The ambiguity about the FLK N6 elephant renders EAK the oldest secure proboscidean butchery evidence at Olduvai, and also probably one of the oldest in the early Pleistocene elsewhere in Africa.

      The concern about vertical mixing is fair, but the tone makes it sound like the association is definitely not real. It would be more accurate to say that the evidence is ambiguous, not that it should be dismissed altogether.

      We have done precisely that. We do not dismiss it, but we cannot take it as solid evidence, since we excavated the site and showed how easily one could infer functional associations when the third dimension is ignored. It is not a secure butchery site. This is what we said, and we stand by this statement.

      (20) Line 479: In all cases, these wet environments must have been preferred places for water-dependent megafauna, like elephants and hippos, and their overlapping ecological niches are reflected in the spatial co-occurrence of their carcasses. Both types of megafauna show traces of hominin use through either cutmarked or percussed bones, green-broken bones, or both (Supplementary Information).

      The environmental part is good, but the behavioral interpretation is too strong. Saying elephants and hippos "must have been" drawn to these areas is too certain, and claiming that both "show traces of hominin use" makes it sound like every carcass was modified. It should be clearer that only some have possible evidence of this.

      The sentence refers to the two types of fauna only taxonomically. No inference can be drawn from it that all carcasses are modified.

      (21) Line 496: In most green-broken limb bones, we document the presence of a medullary cavity, despite the continuous presence of trabecular bone tissue on its walls.

      This sentence is confusing and doesn't seem to add anything meaningful. All limb bones naturally have a medullary cavity lined with trabecular bone, so it's unclear why this is noted as significant. The authors should clarify what they mean here or remove it if it's simply describing normal bone structure.

      No. Modern elephant long bones do not have a hollow medullary cavity. All the medullary volume is composed of trabecular tissue. Some elephants in the past had hollow medullary cavities, which probably contained larger amounts of marrow and fat. 

      (22) Line 518: We are not confident that the artefacts reported by de la Torre et al are indeed tools.

      While I generally agree with this statement, the paragraph reads as defensive rather than comparative. It would help if they briefly summarized what de la Torre et al. actually argued before explaining why they disagree.

      We devote two full pages of the Discussion section precisely to doing so.

      (23) Lines 518-574: They are similar to the green-broken specimens that we have reported here...

      This part is very detailed but inconsistent. They argue that the T69 marks could come from natural processes, but they use similar evidence (green fractures, overlapping scars) to argue for human activity at EAK. If equifinality applies to one, it applies to both.

      We are confused by this misinterpretation. Features like green fractures and overlapping scars (among others) can be used to detect anthropogenic agency in elephant bone breaking; that is, any given specimen can be determined to be an “artifact” (in the sense of a human-created item). However, there is a large distance between identifying an artifact and interpreting it as a tool. Whereas an artifact (something made by a human) can be created indirectly through several processes (for example, demarrowing a bone, which results in long bone fragments), a tool implies intentional manufacture, use, or both. That is the difference between de la Torre et al.'s interpretation and ours. We believe that they are showing anthropogenically made items, but they have provided no proof that these were tools.

      (24) Line 576: A final argument used by the authors to justify the intentional artifactual nature of their bone implements is that the bone tools were found in situ within a single stratigraphic horizon securely dated to 1.5 million years ago, indicating systematic production rather than episodic use. This is taphonomically unjustified.

      The reasoning here feels uneven in how clustering evidence is used. At EAK, clustering of bones and artifacts is taken as meaningful evidence of hominin activity, but here the same pattern at T69 is treated as a natural by-product of butchery or carnivore activity. If clustering alone cannot distinguish between intentional and incidental association, the authors should clarify why it is interpreted as diagnostic in one case but not in the other.

      Again, we are confused by this misinterpretation. It applies to two different scenarios/questions:

      a) Is there a functional link between tools and bones at EAK and T69? We have statistically demonstrated this at EAK, and we think de la Torre et al. are trying to do the same for T69, although using a different method.

      b) Are the purported tools at T69 tools? Are those that we report here tools? In this regard, there is no evidence in either case. Given that several bones from T69 come from animals smaller than elephants, we do not rule out that carnivores might have been responsible for those, whereas hominin butchery might have been responsible for the intense long limb breaking at that site. It remains to be seen how many (if any) of those specimens were tools.

      (25) Line 600: If such a bone implement was a tool, it would be the oldest bone tool documented to date (>1.7 Ma).

      The comparison to prior studies is useful, and the point about missing use-wear traces is well taken. However, the last lines feel speculative. If no clear use evidence has been found, it's premature to suggest that one specimen "would be the oldest bone tool." That claim should be either removed or clearly stated as hypothetical.

      It clearly reads as hypothetical.

      (26) Line 606: Evidence documents that the oldest systematic anthropogenic exploitation of proboscidean carcasses are documented (at several paleolandscape scales) in the Middle Pleistocene sites of Neumark-Nord (Germany)(Gaudzinski-Windheuser et al., 2023a, 2023b).

      This is the first and only mention of Neumark-Nord in the paper, and it appears without any prior discussion or connection to the rest of the study. If this site is being used for comparison or as part of a broader temporal framework, it needs to be introduced and contextualized earlier. As written, it feels out of place and disconnected from the rest of the argument.

      This is a Late Pleistocene site and we do not see the need to present it earlier, given that the scope of this work is Early Pleistocene.

      (27) Line 608: Evidence of at least episodic access to proboscidean remains goes back in time (see review in Agam and Barkai, 2018; Ben-Dor et al., 2011; Haynes, 2022).

      The distinction between "systematic" and "episodic" exploitation is useful, but the authors should clarify what criteria define each. The phrase "episodic access...goes back in time" is vague and could be replaced with a clearer statement summarizing the nature of the earlier evidence.

      It is self-explanatory.

      (28) Line 610: Redundant megafaunal exploitation is well documented at some early Pleistocene sites from Olduvai Gorge (Domínguez-Rodrigo et al., 2014a, 2014b; Organista et al., 2019, 2017, 2016).

      The phrase "redundant megafaunal exploitation" needs clarification. "Redundant" is not standard terminology in this context. Does this mean repeated, consistent, or overlapping behaviors? Also, while these same Olduvai sites are mentioned earlier, this phrasing also introduces new interpretive language not used before and implies a broader behavioral generalization than what the data actually show.

      Webster: Redundant means repetitive, occurring multiple times.

      (29) Line 612: At the very same sites, the stone artifactual assemblages, as well as the site dimensions, are substantially larger than those documented in the Bed I Oldowan sites (Diez-Martín et al., 2024, 2017, 2014, 2009).

      The placement and logic of this comparison are unclear. The discussion moves from Middle Pleistocene Neumark-Nord to early Pleistocene Olduvai sites, then to Bed I Oldowan contexts without clearly signaling the temporal or geographic transitions. If the intent is to contrast Acheulean vs. Oldowan site scale or organization, that connection needs to be made explicit. As written, it reads as a disjointed shift rather than a continuation of the argument.

      We disagree. Here, we conclude by bringing in some more recent assemblages where hominin agency is not in question.

      (30) Line 616: Here, we have reported a significant change in hominin foraging behaviors during Bed I and Bed II times, roughly coinciding with the replacement of Oldowan industries by Acheulian tool kits -although during Bed II, both industries co-existed for a substantial amount of time (Domínguez-Rodrigo et al., 2023; Uribelarrea et al., 2019, 2017).

      This section should be restructured for flow. The reference to behavioral change during Bed I-II and the overlap of Oldowan and Acheulean industries is important, but feels buried after a long detour. Consider moving this earlier or rephrasing so the main conclusion (behavioral change across Beds I-II) is clearly stated first, followed by supporting examples.

      It is not within the scope of this work and is properly described in the references mentioned.

      (31) Line 620: The evidence presented here, together with that documented by de la Torre et al. (2025), represents the most geographically extensive documentation of repeated access to proboscidean and other megafaunal remains at a single fossil locality.

      The phrase "most geographically extensive documentation of repeated access" overstates what has been demonstrated. The evidence presented is site-specific and does not justify such a broad superlative. This should be toned down or supported with comparative quantitative data.

      We disagree. There is no other example where such an abundant record of green-broken elements from megafauna is documented. Neumark-Nord is the most similar because it shows extensive evidence of butchery, but not so much of degreasing.

      (32) Line 623: The transition from Oldowan sites, where lithic and archaeofaunal assemblages are typically concentrated within 30-40 m2 clusters, to Acheulean sites that span hundreds or even over 1000 m2 (as in BK), with distinct internal spatial organization and redundancy in space use across multiple archaeological layers spanning meters of stratigraphic sequence (Domínguez-Rodrigo et al., 2014a, 2009b; Organista et al., 2017), reflects significant behavioral and technological shifts.

      This sentence about site size and spatial organization repeats earlier claims without adding new insight. If it's meant as a synthesis, it should explicitly say how the spatial expansion relates to changes in behavior or mobility, not just describe the difference.

      In the Conclusion section these correlations have been explained in more detail to add some causation.

      (33) Line 628: This pattern likely signifies critical innovations in human evolution, coinciding with major anatomical and physiological transformations in early hominins (Dembitzer et al., 2022; Domínguez-Rodrigo et al., 2021, 2012).

      The conclusion that this "signifies critical innovations in human evolution" is too sweeping, given the data presented. It introduces physiological and anatomical transformation without connecting it to any evidence in this paper. Either cite the relevant findings or limit the claim to behavioral implications.

      The references cited elaborate on this extensively. The revised version of the Conclusion section also elaborates on this.

      Overall, the conclusions section reads as a loosely connected set of assertions rather than a focused synthesis. It introduces new interpretations and terminology not supported or developed earlier in the paper, and the argument jumps across temporal and geographic scales without clear transitions. The discussion should be restructured to summarize key results, clarify the scope of interpretation, and avoid speculative or overstated claims about evolutionary significance.

      We have done so, supported by the references used, and have also extended some of the arguments.

      (34) Line 639: The systematic excavation of the stratigraphic layers involved a small crew.

      This sentence is not necessary.

      No comment

      (35) Line 643: The orientation and inclination of the artifacts were recorded using a compass and an inclinometer, respectively.

      What were these measurements used for (e.g., post-depositional movement analysis, spatial patterning)? A short note on the purpose would make this more meaningful.

      Fabric analysis has been added to the revised version.

      (36) Line 659: Restoration of the EAK elephant bones

      This section could be streamlined and clarified. It includes procedural detail that doesn't contribute to scientific replicability (e.g., the texture of gauze, number of consolidant applications), while omitting some key information (such as how restoration may have affected analytical results). It also contains interpretive comments ("most of the assemblage has been successfully studied") that don't belong in Methods.

      No comment

      (37) Line 689: In the field laboratory, cleaning of the bone remains was carried out, along with adhesion of fragments and their consolidation when necessary.

      Clarify whether cleaning or adhesion treatments might obscure or alter bone surface modifications, as this has analytical implications.

      Current protocols no longer affect bone surfaces in that way.

      (38) Line 711: (b) Percussion Tools - Includes hammerstones or cobbles exhibiting diagnostic battering, pitting, and/or impact scars consistent with percussive activities.

      Define how diagnostic features (battering, pitting) were identified - visual inspection, magnification, or quantitative criteria?

      Both macroscopically and microscopically.

      (39) Line 734: We conducted the analysis in three different ways after selecting the spatial window, i.e., the analysed excavated area (52.56 m²).

      Clarify why the 52.56 m² spatial window was chosen. Was this the total excavated area or a selected portion?

      It was what was left of the elephant accumulation after erosion.

      (40) Line 728: The spatial statistical analyses of EAK.

      Adding one or two sentences at the start explaining the analytical objective, such as testing spatial association between faunal and lithic materials, would help readers understand how each analysis relates to the broader research questions.

      This is well explained in the main text.

      (41) Line 782: An intensive survey seeking stratigraphically-associated megafaunal bones was carried out in the months of June 2023 and 2024.

      It would help to specify whether the same areas were resurveyed in both field seasons or if different zones were covered each year. This information is important for understanding sampling consistency and potential spatial bias.

      Both areas were surveyed in both field seasons. We were very consistent.

      (42) Line 787: We focused on proboscidean bones and used hippopotamus bones, some of the most abundant in the megafaunal fossils, as a spatial control.

      Clarify how the hippopotamus remains function as a "spatial control." Are they used as a proxy for water-associated taxa to test habitat patterning, or as a baseline for comparing carcass distribution? The meaning of "control" in this context is ambiguous.

      As a proxy for megafaunal distribution, given their greater abundance relative to any other megafaunal taxon.

      (43) Line 789: Stratigraphic association was carried out by direct observation of the geological context and with the presence of a Quaternary geologist during the whole survey.

      This is good methodological practice, but it would be helpful to describe how stratigraphic boundaries were identified in the field (for example, by reference to tuffs or marker beds). That information would make the geological framework more replicable.

      This is basic geological work. Of course, both tuffs and marker beds were followed.

      (44) Line 791: When fossils found were ambiguously associated with specific strata, these were excluded from the present analysis.

      You might specify what proportion of the total finds were excluded due to uncertain stratigraphic association. Reporting this would indicate the strength of the stratigraphic control.

      This was not quantified but it was a very small amount compared to those whose stratigraphic provenience was certain.

      (45) Line 799: The goals of this survey were: a) collect a spatial sample of proboscidean and megafaunal bones enabling us to understand if carcasses on the Olduvai paleolandscapes were randomly deposited or associated to specific habitats.

      You might clarify how randomness or habitat association was tested.

      Randomness was tested spatially and by comparing density according to ecotone. The same was done for habitat association.

      (46) The Methods section provides detailed information about excavation, restoration, and spatial analyses but omits critical details about the zooarchaeological and taphonomic procedures. There is no explanation of how faunal remains were analyzed once recovered, including how cut marks, percussion marks, or green bone fractures were identified or what magnification or diagnostic criteria were used. The authors also do not specify the analytical unit used for faunal quantification (e.g., NISP, MNI, MNE, or other), making it unclear how specimen counts were generated for spatial or taphonomic analyses. Even if these details are provided in the Supplementary Information, the main text should include at least a concise summary describing the analytical framework, the criteria for identifying surface modifications and fracture morphology, and the quantification system employed. This information is essential for transparency, replicability, and proper evaluation of the behavioral interpretations.

      See reply above. There is a new subsection on taphonomic methods now.

      Supplementary information:

      (47) The Supplementary Information includes a large number of green-broken proboscidean specimens from other Olduvai localities (BK, LAS, SC, FLK West), but it is never explained why these are shown or how they relate to the EAK study. The main analysis focuses entirely on the EAK elephant, including so much unrelated material without any stated purpose, which makes the supplement confusing. If these examples are meant only to illustrate the appearance of green fractures, that should be stated. Otherwise, the extensive inclusion of non-EAK material gives the impression that they were part of the analyzed assemblage when they were not.

      This is stated in the opening paragraph to the section.

      (48) Line 96: A small collection of green-broken elephant bones was retrieved from the lower and upper Bed II units.

      It would help to clarify whether these specimens are part of the EAK assemblage or derive from other Bed II localities. As written, it is not clear whether this description refers to material analyzed in the main text or to comparative examples shown only in the Supplementary Information.

      No, EAK only occupies the lower Bed II section. They belong in the Bed II paleolandscape units.

      (49) Line 97: One of them, a proximal femoral shaft found within the LAS unit, has all the traces of having been used as a tool (Figure 6).

      This says the bone tool in Figure 6 is from LAS, but the main text caption identifies it as from EAK. If I am not mistaken, EAK is a site at the base of Bed II, and LAS is a separate stratigraphic unit higher in the sequence, so the authors should clarify which is correct.

      Our mistake. Its provenience is LAS, in the vicinity of EAK.

      (50) Line 186: Figure S20. Example of other megafaunal long bone shafts showing green breaks.

      Not cited in text or SI narrative. No indication where these bones come from or why they are relevant.

      It appears justified in the revised version.

      (51) Line 474: Figure S28-S30. Hyena-ravaged giraffe bones from Chobe (Botswana).

      These figures are not discussed in the text or SI, and their relevance to the study is unclear. The authors should explain why these modern comparative examples were included and how they inform interpretations of the Olduvai assemblages.

      It appears justified in the revised version.

      (52) Line 498: Figure S31. Bos/Bison bone from Bois Roche (France).

      This figure is not mentioned in the text or Supplementary Information. The authors should specify why this specimen is shown and how it contributes to the study's taphonomic or behavioral comparisons.

      It appears justified in the revised version.

      (53) Line 504: Figure S32. Miocene Gomphotherium femur from Spain.

      This figure is never referenced in the paper. The authors should clarify the purpose of including a Miocene specimen from outside Africa and explain what it adds to the interpretation of Bed II material.

      It appears justified in the revised version.

      (54) Line 508: Figure S33. Elephant femoral shaft from BK (Olduvai).

      This figure appears to show comparative material but is not cited or discussed in the text. The authors should explain why the BK material is presented here and how it relates to EAK or the broader analysis.

      There are two figures labeled S33.

      It appears justified in the revised version.

      (55) Line 515: Figure S33. Tibia fragment from a large medium-sized bovid displaying multiple overlapping scars on both breakage planes inflicted by carnivore damage.

      Because this figure repeats the S33 label and is not cited or explained in the text, it is unclear why this specimen is included or how it contributes to the study. The authors should correct the duplicate numbering and clarify the purpose of this figure.

      It appears justified in the revised version.

      (56) Line 522: Same specimen as shown in Figure S30, viewed on its medial side.

      This is not the same bone as S30. This figure is not discussed in the text or Supplementary Information. The authors should clarify why it is included and how it relates to the rest of the analysis.

      It appears justified in the revised version.

    1. eLife Assessment

      This manuscript presents a fundamental advance in our understanding of nuclear receptor pharmacology by expanding on previous work demonstrating dual ligand occupancy in the peroxisome proliferator-activated receptor-gamma (PPARγ). Using a compelling combination of biophysical, biochemical, and cellular approaches, the authors show that covalent inverse agonists with enhanced efficacy shift the receptor conformation toward a transcriptionally repressive state that limits orthosteric ligand co-binding more effectively. This revised manuscript further strengthens support for a proximal, bidirectional allosteric model of dual ligand occupancy by sharpening the distinction between prior and new findings, adding clear conceptual figures, and strengthening statistical rigor.

    2. Reviewer #1 (Public review):

      Summary:

      This paper focuses on understanding how covalent inhibitors of peroxisome proliferator-activated receptor-gamma (PPARg) show improved inverse agonist activities. This work is important because PPARg plays essential roles in metabolic regulation, insulin sensitization, and adipogenesis. Like other nuclear receptors, PPARg is a ligand-responsive transcriptional regulator. Its important role, coupled with its ligand-sensitive transcriptional activities, makes it an attractive therapeutic target for diabetes, inflammation, fibrosis, and cancer. Traditional non-covalent ligands like thiazolidinediones (TZDs) show clinical benefit in metabolic diseases, but utility is limited by off-target effects and transient receptor engagement. In previous studies, the authors characterized and developed covalent PPARg inhibitors with improved inverse agonist activities. They also showed that these molecules engage unique PPARg ligand binding domain (LBD) conformations whereby the C-terminal helix 12 penetrates into the orthosteric binding pocket to stabilize a repressive state. In the nuclear receptor superclass of proteins, helix 12 is an allosteric switch that governs pharmacologic responses, and this new conformation was highly novel. In this study, the authors did a more thorough analysis of how two covalent inhibitors, SR33065 and SR36708, influence the structural dynamics of PPARg LBD.

      Strengths:

      (1) The authors employed a compelling integrated biochemical and biophysical approach.

      (2) The cobinding studies are unique for the field of nuclear receptor structural biology, and I'm not aware of any similar structural mechanism described for this class of proteins.

      (3) Overall, the results support their conclusions.

      (4) The results open up exciting possibilities for the development of new ligands that exploit the potential bidirectional relationship between the covalent versus non-covalent ligands studied here.

      Weaknesses:

      All weaknesses were addressed by the Authors in revision.

    3. Reviewer #2 (Public review):

      Summary:

      The authors use ligands (inverse agonists, partial agonists) for PPAR, and coactivators and corepressors, to investigate how ligands and cofactors interact in a complex manner to achieve functional outcomes (repressive vs. activating).

      Strengths:

      The data (mostly biophysical data) are compelling from well-designed experiments. Figures are clearly illustrated. The conclusions are supported by these compelling data. These results contribute to our fundamental understanding of the complex ligand-cofactor-receptor interactions.

      Weaknesses:

      Breaking down a complex system into a simpler model system, when possible, provides a unique lens with which to probe systems with mechanistic insight. While it works well in this particular paper, in general, caution should be taken when using simplified models to study a complex system.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This paper focuses on understanding how covalent inhibitors of peroxisome proliferator-activated receptor-gamma (PPARg) show improved inverse agonist activities. This work is important because PPARg plays essential roles in metabolic regulation, insulin sensitization, and adipogenesis. Like other nuclear receptors, PPARg is a ligand-responsive transcriptional regulator. Its important role, coupled with its ligand-sensitive transcriptional activities, makes it an attractive therapeutic target for diabetes, inflammation, fibrosis, and cancer. Traditional non-covalent ligands like thiazolidinediones (TZDs) show clinical benefit in metabolic diseases, but utility is limited by off-target effects and transient receptor engagement. In previous studies, the authors characterized and developed covalent PPARg inhibitors with improved inverse agonist activities. They also showed that these molecules engage unique PPARg ligand binding domain (LBD) conformations whereby the C-terminal helix 12 penetrates into the orthosteric binding pocket to stabilize a repressive state. In the nuclear receptor superclass of proteins, helix 12 is an allosteric switch that governs pharmacologic responses, and this new conformation was highly novel. In this study, the authors did a more thorough analysis of how two covalent inhibitors, SR33065 and SR36708, influence the structural dynamics of PPARg LBD.

      Strengths: 

      (1) The authors employed a compelling integrated biochemical and biophysical approach.  

      (2) The cobinding studies are unique for the field of nuclear receptor structural biology, and I'm not aware of any similar structural mechanism described for this class of proteins.  

      (3) Overall, the results support their conclusions.  

      (4) The results open up exciting possibilities for the development of new ligands that exploit the potential bidirectional relationship between the covalent versus non-covalent ligands studied here. 

      Weaknesses: 

      (1) The major weakness in this work is that it is hard to appreciate what these shifting allosteric ensembles actually look like on the protein structure. Additional graphical representations would really help convey the exciting results of this study. 

      We thank the reviewer for the comments. In response to the specific recommendations below, we added two new figures (Figure 1 and Figure 8 in this resubmission) that hopefully address the weakness identified by the reviewer.

      Reviewer #2 (Public review): 

      Summary: 

      The authors use ligands (inverse agonists, partial agonists) for PPAR, and coactivators and corepressors, to investigate how ligands and cofactors interact in a complex manner to achieve functional outcomes (repressive vs. activating). 

      Strengths: 

      The data (mostly biophysical data) are compelling from well-designed experiments. Figures are clearly illustrated. The conclusions are supported by these compelling data. These results contribute to our fundamental understanding of the complex ligand-cofactor-receptor interactions. 

      Weaknesses: 

      This is not the weakness of this particular paper, but the general limitation in using simplified models to study a complex system. 

      We appreciate the reviewer’s comments. Breaking down a complex system into a simpler model system, when possible, provides a unique lens with which to probe systems with mechanistic insight. Simplified models may not always explain the complexity of systems in cells. However, our recently published work showed that a simplified model system (biochemical assays using reconstituted PPARγ ligand-binding domain (LBD) protein and peptides derived from coregulator proteins, similar to the assays in this current work, and protein NMR structural biology studies using PPARγ LBD) can explain the activity of ligand-induced PPARγ activation and repression to a high degree (Pearson/Spearman correlation coefficients ~0.7-0.9):

      MacTavish BS, Zhu D, Shang J, Shao Q, He Y, Yang ZJ, Kamenecka TM, Kojetin DJ. Ligand efficacy shifts a nuclear receptor conformational ensemble between transcriptionally active and repressive states. Nat Commun. 2025 Feb 28;16(1):2065. doi: 10.1038/s41467-025-57325-4. PMID: 40021712; PMCID: PMC11871303.

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors): 

      (1) More set-up is needed in the results section. The first paragraph is unclear on what is new to this study versus what was done previously. Likewise, a brief description of the assays used and the meaning behind differences in signals would help the general reader along. 

      We modified the last paragraph of the introduction and first results section to hopefully better set the stage for what was done previously vs. what is new/recollected in this study. In our results section, we also include more description about what the assays measure.

      (2) Since this paper is building on previous work, additional figures are needed in the introduction and discussion. Graphical depictions of what was found in the first study on how these ligands uniquely influence PPARg LBD conformation. A new model/depiction in the discussion for what was learned and its context with the rest of the field. 

      Our revised manuscript includes a new Figure 1 describing the possible allosteric mechanism by which a covalent ligand inhibits binding of other non-covalent ligands that was inferred from our previous study; and a new Figure 8 with a model for what has been learned.

      (3) It is stated that the results shown are representative data for at least two biological replicates. However, I do not see the other replicates shown in the supplementary information. 

      We appreciate the Reviewer’s emphasis on data reproducibility and rigor. We confirm that the biochemical and cellular assay data presented are indeed representative of consistent findings observed across two or more biological replicates—and we show representative data in our figures but not the extensive replicate data in supplementary information consistent with standard practices.

      (4) Figure 1a could benefit from labels of antagonists, inverse agonist, etc., next to each chemical structure. Likewise, if any co-crystal or other models are available it would be helpful to include those for comparison. 

      We added the pharmacological labels to Figure 2a (old Figure 1a).

      (5) The figure legends don't seem to match up completely with the figures. For example, the legend of Figure 2b states that fitted Ki values +/- standard deviation are shown, but the figure shows log Ki.

      We revised the figure legends to ensure they display the appropriate errors as reported from the data fitting.

      (6) EC50, IC50, Ki, and Kd values alongside reported errors and R2 values for the fits should be reported in a table. 

      Our revised manuscript now includes a Source Data file (Figure 5—source data 1.xlsx) of the data (n=2) plotted in Figure 5 (old Figure 4) so that readers can regenerate the plots and calculate the errors and R2 values if desired. Otherwise, fitted values and errors are reported in the figures when Prism was able to fit the data and report errors; when Prism was unable to fit the data or the error, n.d. (not determined) is specified.

      (7) Statistical analysis is missing in some places, for example, Figure 1b. 

      We revised Figure 2b (old Figure 1b) to include statistical testing.

      Reviewer #2 (Recommendations for the authors): 

      I suggest that the authors discuss the following points to broaden the significance of the results: 

      (1) The two partial agonists (MRL24 and nTZDpa) are "partial" in the coactivator and corepressor recruitment assays, but are "complete" in the TR-FRET ligand displacement assay (Figure 2). Please explain that a partial agonist is defined based on the functional outcome (cofactor recruitment in this study) but not binding affinity/efficacy.

      We added the following sentence to describe the partial agonist activity of these compounds: “These high affinity ligands are partial agonists as defined on their functional outcome in coregulator recruitment and cellular transcription; i.e., they are less efficacious than full agonists at recruiting peptides derived from coactivator proteins in biochemical assays (Chrisman et al., 2018; Shang et al., 2019; Shang and Kojetin, 2024) and increasing PPARγ-mediated transcription (Acton et al., 2005; Berger et al., 2003).”

      (2) Will the discovery reported here be broadly applicable? 

      (a) Applicable if other partial agonists and inhibitors are used? 

      (b) Applicable if different coactivators/corepressors, or different segments of the same cofactor, are used?

      (c) Applicable to other NRs (their AF-2 are similar but with sequence variation)?

      (d) The term "allosteric" might mean different things to different people - many readers might think that it means a "distal and unrelated" binding pocket. It might be helpful to point out that in this study, the allosteric site is actually "proximal and related". 

      We expanded our introduction and/or discussion sections to expand upon these concepts; specific answers as follows:

      (a) Orthosteric partial agonists? Yes, because helix 12 would clash with an orthosteric ligand. Other covalent inhibitors? It depends on whether the covalent inhibitor stabilizes helix 12 in the orthosteric pocket.

      (b) yes with some nuanced exceptions where certain segments of the same coregulator protein bind with high affinity and others apparently do not bind or bind with low affinity

      (c) it is not clear yet if other NRs share a similar ligand-induced conformational ensemble to PPARγ

      (d) we addressed this point in the 4th paragraph of the introduction “...the non-covalent ligand binding event we previously described at the alternate/allosteric site, which is proximal to the orthosteric ligand-binding pocket, …”

    1. eLife Assessment

      This study addresses an important problem in gene regulation, namely, which features of chromatin regulate potential RNA Polymerase 2 activity at a locus. The authors provided evidence that specific post-translational modifications of histones within the gene body are correlated with Pol II transcription, that these modifications are dynamic, and that they can be regulated by Pol II activity. The manuscript contributes to the concept of "fragile nucleosomes" as a unifying framework for key epigenetic drivers of transcription; however, the quality of the evidence provided is inadequate in support of the claims made, and further evidence teasing out the mechanistic aspects of the work would strengthen its impact. This work will be of interest to the fields of transcriptional regulation, chromatin structure, and epigenetics.

    2. Reviewer #1 (Public review):

      Summary:

      This study aims to explore how different forms of "fragile nucleosomes" facilitate RNA Polymerase II (Pol II) transcription along gene bodies in human cells. The authors propose that pan-acetylated, pan-phosphorylated, tailless, and combined acetylated/phosphorylated nucleosomes represent distinct fragile states that enable efficient transcription elongation. Using CUT&Tag-seq, RNA-seq, and DRB inhibition assays in HEK293T cells, they report a genome-wide correlation between histone pan-acetylation/phosphorylation and active Pol II occupancy, concluding that these modifications are essential for Pol II elongation.

      Strengths:

      (1) The manuscript tackles an important and long-standing question about how Pol II overcomes nucleosomal barriers during transcription.

      (2) The use of genome-wide CUT&Tag-seq for multiple histone marks (H3K9ac, H4K12ac, H3S10ph, H4S1ph) alongside active Pol II mapping provides a valuable dataset for the community.

      (3) The integration of inhibition (DRB) and recovery experiments offers insight into the coupling between Pol II activity and chromatin modifications.

      (4) The concept of "fragile nucleosomes" as a unifying framework is potentially appealing and could stimulate further mechanistic studies.

      Weaknesses:

      (1) Misrepresentation of prior literature

      The introduction incorrectly describes findings from Bintu et al., 2012. The cited work demonstrated that pan-acetylated or tailless nucleosomes reduce the nucleosomal barrier for Pol II passage, rather than showing no improvement. This misstatement undermines the rationale for the current study and should be corrected to accurately reflect prior evidence.

      (2) Incorrect statement regarding hexasome fragility

      The authors claim that hexasome nucleosomes "are not fragile," citing older in vitro work. However, recent studies clearly showed that hexasomes exist in cells (e.g., PMID 35597239) and that they markedly reduce the barrier to Pol II (e.g., PMID 40412388). These studies need to be acknowledged and discussed.

      (3) Inaccurate mechanistic interpretation of DRB

      The Results section states that DRB causes a "complete shutdown of transcription initiation (Ser5-CTD phosphorylation)." DRB is primarily a CDK9 inhibitor that blocks Pol II release from promoter-proximal pausing. While recent work (PMID: 40315851) suggests that CDK9 can contribute to CTD Ser5/Ser2 di-phosphorylation, the manuscript's claim of initiation shutdown by DRB should be revised to better align with the literature. The data in Figure 4A indicate that 1 µM DRB fully inhibits Pol II activity, yet much higher concentrations (10-100×) are needed to alter H3K9ac and H4K12ac levels. The authors should address this discrepancy by discussing the differential sensitivities of CTD phosphorylation versus histone modification turnover.

      (4) Insufficient resolution of genome-wide correlations

      Figure 1 presents only low-resolution maps, which are insufficient to determine whether pan-acetylation and pan-phosphorylation correlate with Pol II at promoters or gene bodies. The authors should provide normalized metagene plots (from TSS to TTS) across different subgroups to visualize modification patterns at higher resolution. In addition, the genome-wide distribution of another histone PTM with a different localization pattern should be included as a negative control.

      (5) Conceptual framing

      The manuscript frequently extrapolates correlative genome-wide data to mechanistic conclusions (e.g., that pan-acetylation/phosphorylation "generate" fragile nucleosomes). Without direct biochemical or structural evidence, such causality statements should be toned down.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors use various genomics approaches to examine nucleosome acetylation, phosphorylation, and PolII-CTD phosphorylation marks. The results are synthesized into a hypothesis that 'fragile' nucleosomes are associated with active regions of PolII transcription.

      Strengths:

      The manuscript contains a lot of genome-wide analyses of histone acetylation, histone phosphorylation, and PolII-CTD phosphorylation.

      Weaknesses:

      This reviewer's main research expertise is in the in vitro study of transcription and its regulation in purified, reconstituted systems. I am not an expert at the genomics approaches and their interpretation, and overall, I had a very hard time understanding and interpreting the data that are presented in this manuscript. I believe this is due to a problem with the manuscript, in that the presentation of the data is not explained in a way that's understandable and interpretable to a non-expert. For example:

      (1) Figure 1 shows genome-wide distributions of H3K9ac, H4K12ac, Ser2ph-PolII, mRNA, H3S10ph, and H4S1ph, but does not demonstrate correlations/coupling - it is not clear from these data that pan-acetylation and pan-phosphorylation are coupled with Pol II transcription.

      (2) Figure 2 - It's not clear to me what Figure 2 is supposed to be showing.

      (A) Needs better explanation - what is the meaning of the labels at the top of the gel lanes?

      (B) This reviewer is not familiar with this technique, its visualization, or its interpretation - more explanation is needed. What is the meaning of the quantitation graphs shown at the top? How were these calculated (what is on the y-axis)?

      (3) To my knowledge, the initial observation of DRB effects on RNA synthesis also concluded that DRB inhibited initiation of RNA chains (pmid:982026) - this needs to be acknowledged.

      (4) Again, Figures 4B, 4C, 5, and 6 are very difficult to understand - what is shown in these heat maps, and what is shown in the quantitation graphs on top?

    4. Reviewer #3 (Public review):

      Summary:

      Li et al. investigated the prevalence of acetylated and phosphorylated histones (using H3K9ac, H4K12ac, H3S10ph & H4S1ph as representative examples) across the gene body of human HEK293T cells, as well as mapping elongating Pol II and mRNA. They found that histone acetylation and phosphorylation were dominant in gene bodies of actively transcribing genes. Genes with acetylation/phosphorylation restricted to the promoter region were also observed. Furthermore, they investigated and reported a correlation between histone modifications and Pol II activity, finding that inhibition of Pol II activity reduced acetylation/phosphorylation levels, while resuming Pol II activity restored them. The authors then proposed a model in which pan-acetylation or pan-phosphorylation of histones generates fragile nucleosomes; the first round of transcription is accompanied by pan-acetylation, while subsequent rounds are accompanied by pan-phosphorylation.

      Strengths:

      This study addresses a highly significant problem in gene regulation. The author provided riveting evidence that certain histone acetylation and/or phosphorylation within the gene body is correlated with Pol II transcription. The author furthermore made a compelling case that such transcriptionally correlated histone modification is dynamic and can be regulated by Pol II activity. This work has provided a clearer view of the connection between epigenetics and Pol II transcription.

      Weaknesses:

      The title of the manuscript, "Fragile nucleosomes are essential for RNA Polymerase II to transcribe in eukaryotes", suggests that fragile nucleosomes lead to transcription. While this study shows a correlation between histone modifications in gene bodies and transcription elongation, a causal relationship between the two has not been demonstrated.

    5. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This study aims to explore how different forms of "fragile nucleosomes" facilitate RNA Polymerase II (Pol II) transcription along gene bodies in human cells. The authors propose that pan-acetylated, pan-phosphorylated, tailless, and combined acetylated/phosphorylated nucleosomes represent distinct fragile states that enable efficient transcription elongation. Using CUT&Tag-seq, RNA-seq, and DRB inhibition assays in HEK293T cells, they report a genome-wide correlation between histone pan-acetylation/phosphorylation and active Pol II occupancy, concluding that these modifications are essential for Pol II elongation.

      Strengths: 

      (1) The manuscript tackles an important and long-standing question about how Pol II overcomes nucleosomal barriers during transcription. 

      (2) The use of genome-wide CUT&Tag-seq for multiple histone marks (H3K9ac, H4K12ac, H3S10ph, H4S1ph) alongside active Pol II mapping provides a valuable dataset for the community. 

      (3) The integration of inhibition (DRB) and recovery experiments offers insight into the coupling between Pol II activity and chromatin modifications.

      (4) The concept of "fragile nucleosomes" as a unifying framework is potentially appealing and could stimulate further mechanistic studies. 

      We really appreciate the reviewer's positive comments.

      Weaknesses: 

      (1)  Misrepresentation of prior literature 

      The introduction incorrectly describes findings from Bintu et al., 2012. The cited work demonstrated that pan-acetylated or tailless nucleosomes reduce the nucleosomal barrier for Pol II passage, rather than showing no improvement. This misstatement undermines the rationale for the current study and should be corrected to accurately reflect prior evidence. 

      Our statement follows the original report in the publication (Bintu et al., Cell, 2012). Here is the relevant passage from the report:

      Page 739 (Bintu, L. et al., Cell, 2012; PMID: 23141536):

      “Overall transcription through tailless and acetylated nucleosomes is slightly faster than through unmodified nucleosomes (Figure 1C), with crossing times that are generally under 1 min (39.5 ± 5.7 and 45.3 ± 7.6 s, respectively). Both the removal and acetylation of the tails increase efficiency of NPS passage: 71% for tailless nucleosomes and 63% for acetylated nucleosomes (Figures 1C and S1), in agreement with results obtained using bulk assays of transcription (Ujvári et al., 2008).”

      We will cite this original sentence in our revision.

      (2) Incorrect statement regarding hexasome fragility

      The authors claim that hexasome nucleosomes "are not fragile," citing older in vitro work. However, recent studies clearly showed that hexasomes exist in cells (e.g., PMID 35597239) and that they markedly reduce the barrier to Pol II (e.g., PMID 40412388). These studies need to be acknowledged and discussed. 

      The term “hexasome” was introduced in the transcription field four decades ago. Later, several groups claimed that the “hexasome” is fragile and could be generated during Pol II transcription elongation. However, the original definition was based on the detection in vivo of ~100 bp MNase-resistant DNA fragments by micrococcal nuclease sequencing (MNase-seq), which is the right length to wrap one hexasome histone subunit (two H3/4 and one H2A/2B) to form the sub-nucleosome of a hexasome. We should all agree that acetylation or phosphorylation of the histone tails of nucleosomes will compromise the interaction between DNA and histone subunits, which could make the intact naïve nucleosome fragile, easy to disassemble, and easy for MNase to access. Fragile nucleosomes give MNase better access to the DNA that wraps around the histone octamer, producing shorter DNA fragments (~100 bp instead of ~140 bp). In this regard, we believe that these ~100 bp fragments are the products of fragile nucleosomes (fragile nucleosome --> hexasome), instead of the other way around (hexasome --> fragile).

      Actually, two early reports from Dr. David J. Clark’s group at the NIH raised questions about the existence of hexasomes in vivo (PMID: 28157509; PMID: 25348398).

      According to the report of PMID: 35597239, depletion of INO80 leads to a reduction of “hexasomes” for a group of genes, and the distribution of both “nucleosomes” and “hexasomes” within the gene bodies gets fuzzier (lower signal to noise). In a recent theoretical model (PMID: 41425263), the corresponding PI found that chromatin remodelers could act as drivers of histone modification complexes to carry out different modifications along gene bodies. The PI found that INO80 could drive NuA3 (an H3 acetyltransferase) to carry out pan-acetylation of H3, and possibly H2B as well, in the later rounds of Pol II transcription for a group of genes (SAGA-dependent). This suggests that depletion of INO80 will reduce the pan-acetylation of nucleosomes, which leads to a drop in pan-acetylated fragile nucleosomes and subsequently a drop in “hexasomes”. This explains why depletion of INO80 leads to fuzzier results for both nucleosomes and “hexasomes” in PMID: 35597239. The result of PMID: 35597239 could be a strong piece of evidence supporting the model proposed by the corresponding PI (PMID: 41425263).

      In a recent report (PMID: 40412388), the authors claimed that FACT could bind to nucleosomes to generate “hexasomes”, which are fragile and allow Pol II to overcome the resistance of nucleosomes. It is well established that FACT enhances the processivity of Pol II in vivo via its chaperone activity. However, the exact working mechanism of FACT remains ambiguous. A report from Dr. Cramer’s group showed that FACT enhances the elongation of regular genes but has the opposite effect on pausing-regulated genes (PMID: 38810649). An excellent review by Drs. Tim Formosa and Fred Winston showed that FACT is not required for the survival of a group of differentiated cells (PMID: 33104782), suggesting that FACT is not always required for transcription. It is quite tricky to generate naïve hexasomes in vitro according to early reports from the late Dr. Widom’s group. Most importantly, the speed of Pol II in the new data from PMID: 40412388 (the best value, on bare DNA, is ~27 bp/s) is much slower than the speed of Pol II in vivo: ~2.5 kb/min or ~40 bp/s. From our recovery experiments (Fig. 4C, as mentioned by Reviewer #3), over the 20 minutes between the 10-minute and 30-minute time points, all Pol II molecules move at a uniform speed of ~2.5 kb/min in vivo. (Because Pol II remains active after cells are collected for CUT&Tag-seq, there is a long delay before Pol II stops completely during the procedure, so the first time period, with an apparent speed of ~5 kb/min, does not reflect the actual speed of Pol II.) Interestingly, a recent report from Dr. Shixin Liu’s group (PMID: 41310264) showed that when SPT4/5 is added to an in vitro transcription system with bare DNA, the speed of Pol II reaches ~2.5 kb/min, exactly the same as the one we derived in vivo. Similar to the original report (PMID: 23141536), the current report (PMID: 40412388) does not mimic in vivo conditions exactly.

      There is an urgent need to revisit the current definition of the “hexasome”, which is claimed to be fragile and to be generated during Pol II elongation in vivo. MNase is an enzyme that only works when the substrate is accessible. In inactive regions of the genome, due to the tight packing of chromatin, individual nucleosomes within gene bodies or upstream of promoters are not accessible to MNase, which is why we only see a phased/spaced or clear distribution of nucleosomes at transcription start sites, while it becomes fuzzy downstream or upstream of promoters. On the other hand, for fragile nucleosomes, the accessibility to MNase should increase dramatically, which leads to the ~100 bp fragments. Based on the uniform rate (2.5 kb/min) of Pol II for all genes derived from human 293T cells and the similar rate (2.5 kb/min) of Pol II on bare DNA in vitro, it is unlikely that Pol II pauses in the middle of nucleosomes to generate “hexasomes” in order to continue elongation along gene bodies. Similar to RNAPs in bacteria (no nucleosomes) and archaea (tailless nucleosomes), there should be no resistance when Pol II transcribes along all fragile nucleosomes within gene bodies in all eukaryotes, as we characterized in this manuscript.

      (3)  Inaccurate mechanistic interpretation of DRB 

      The Results section states that DRB causes a "complete shutdown of transcription initiation (Ser5-CTD phosphorylation)." DRB is primarily a CDK9 inhibitor that blocks Pol II release from promoter-proximal pausing. While recent work (PMID: 40315851) suggests that CDK9 can contribute to CTD Ser5/Ser2 di-phosphorylation, the manuscript's claim of initiation shutdown by DRB should be revised to better align with the literature. The data in Figure 4A indicate that 1 µM DRB fully inhibits Pol II activity, yet much higher concentrations (10-100×) are needed to alter H3K9ac and H4K12ac levels. The authors should address this discrepancy by discussing the differential sensitivities of CTD phosphorylation versus histone modification turnover.

      Yes, it was reported that DRB is also an inhibitor of CDK9. However, if the reviewer agrees with us and the current view in the field, the phosphorylation of Ser5-CTD of Pol II is the initiation of transcription for all Pol II-regulated genes in eukaryotes. CDK9 is only required to work on the already phosphorylated Ser5-CTD of Pol II to release the paused Pol II, which only happens in metazoans. A series of works by us and others shows that CDK9 is unique to metazoans and is required only for pausing-regulated genes, not for regular genes. We found that CDK9 works on initiated Pol II (Ser5-CTD phosphorylated Pol II) and generates a unique phosphorylation pattern on the CTD of Pol II (Ser2ph-Ser2ph-Ser5ph-CTD of Pol II), which is required to recruit JMJD5 (via the CID domain) to generate a tailless nucleosome at +1 from the TSS to release paused Pol II (PMID: 32747552). Interestingly, the report from Dr. Jesper Svejstrup’s group (PMID: 40315851) showed that CDK9 could generate a unique phosphorylation pattern (Ser2ph-Ser5ph-CTD of Pol II), which is not recognized by the popular 3E10 antibody that recognizes the single Ser2ph-CTD of Pol II. This interesting result is consistent with our earlier report showing that the unique phosphorylation pattern (Ser2ph-Ser2ph-Ser5ph-CTD of Pol II) is specifically generated by CDK9 in animals and is not recognized by 3E10 either (PMID: 32747552). An early report from Dr. Dick Eick’s group (PMID: 26799765) showed the difference in the phosphorylation pattern of the CTD of Pol II between animal cells and yeast cells. We have characterized how CDK9 is released from 7SK snRNP and recruited onto paused Pol II via the coupling of JMJD6 and BRD4 (PMID: 32048991), which was published in eLife. It is well established that CDK9 works after CDK7 or CDK8. From our PRO-seq data (Fig. 3) and CUT&Tag-seq data of active Pol II (Fig. 4), adding DRB completely shuts down all genes by inhibiting the initiation of Pol II (generation of Ser5ph-CTD of Pol II). Because CDK9 is unique to metazoans, it is not required for the activation of CDK12 or CDK13 (they are orthologs of CTK1 in yeast), as we demonstrated recently (PMID: 41377501). Instead, we found that CDK11/10 acts as the ortholog of Bur1 kinase from yeast and is essential for the phosphorylation of Spt5, the link of CTD of Pol II, and CDK12 (PMID: 41377501).

      (4) Insufficient resolution of genome-wide correlations 

      Figure 1 presents only low-resolution maps, which are insufficient to determine whether pan-acetylation and pan-phosphorylation correlate with Pol II at promoters or gene bodies. The authors should provide normalized metagene plots (from TSS to TTS) across different subgroups to visualize modification patterns at higher resolution. In addition, the genome-wide distribution of another histone PTM with a different localization pattern should be included as a negative control.

      A popular view in the field is that the majority of the genome is inactive since it does not produce coding RNAs, which are responsible for the ~20,000 protein candidates characterized in animals. However, our genome-wide characterization using the four histone modification marks, active Pol II, and RNA-seq tells a different story. Figure 1 shows that most of the human genome in HEK293T cells is active in producing not only protein-coding RNAs but also non-coding RNAs (the majority of them). We believe that Figure 1 could change our current view of the activity of the entire genome, and should be of great interest to general readers as well as genomics researchers. Furthermore, it is the basis for Figure 2, which is a zoom-in of Figure 1.

      (5) Conceptual framing 

      The manuscript frequently extrapolates correlative genome-wide data to mechanistic conclusions (e.g., that pan-acetylation/phosphorylation "generate" fragile nucleosomes). Without direct biochemical or structural evidence, such causality statements should be toned down.

      The reviewer is right; we should tone down the strong sentences. However, we believe that our data are strong enough to support the general conclusion. The reviewer may agree with us that the entire field of transcription and epigenetics has been stagnant in recent decades, and there is an urgent need for fresh ideas to change the current situation. Our novel discoveries, although additional supporting data are certainly needed, should open up a brand-new avenue for people to explore. We believe that a new era of transcription will emerge based on our novel discoveries. We hope that this manuscript will attract more people to these topics. As Reviewer #3 pointed out, this story establishes the connection between transcription and epigenetics in the field.

      Reviewer #2 (Public review): 

      Summary: 

      In this manuscript, the authors use various genomics approaches to examine nucleosome acetylation, phosphorylation, and PolII-CTD phosphorylation marks. The results are synthesized into a hypothesis that 'fragile' nucleosomes are associated with active regions of PolII transcription. 

      Strengths: 

      The manuscript contains a lot of genome-wide analyses of histone acetylation, histone phosphorylation, and PolII-CTD phosphorylation. 

      Weaknesses: 

      This reviewer's main research expertise is in the in vitro study of transcription and its regulation in purified, reconstituted systems. 

      Indeed, the pioneering establishment of in vitro transcription assays in Dr. Robert Roeder’s group led to numerous groundbreaking discoveries in the transcription field. In vitro work was key to exploring the complexity of eukaryotic transcription in the early days and remains important today.

      I am not an expert at the genomics approaches and their interpretation, and overall, I had a very hard time understanding and interpreting the data that are presented in this manuscript.  I believe this is due to a problem with the manuscript, in that the presentation of the data is not explained in a way that's understandable and interpretable to a non-expert.

      Thanks for your suggestions. You are right that we have not expressed our ideas clearly in this manuscript, which could confuse readers. We will revise the text according to your suggestions.

      For example: 

      (1) Figure 1 shows genome-wide distributions of H3K9ac, H4K12ac, Ser2phPolII, mRNA, H3S10ph, and H4S1ph, but does not demonstrate correlations/coupling - it is not clear from these data that pan-acetylation and pan-phosphorylation are coupled with Pol II transcription. 

      Figure 1 shows the overall genome-wide distribution of the four major histone modifications, active Pol II, and mRNA in human HEK293T cells. It tells general readers that the genome is far more active than the prevailing view suggests, namely that most of the genome is inactive because only a small portion of it expresses coding RNAs (~20,000 in animals). Figure 1 shows that the majority of the genome is active and expresses not only coding mRNAs but also non-coding RNAs. It is also the basis of Figure 2, which is a zoom-in of Figure 1. However, a discussion of the non-coding RNAs is beyond the scope of this manuscript.

      (2) Figure 2 - It's not clear to me what Figure 2 is supposed to be showing. 

      (A) Needs better explanation - what is the meaning of the labels at the top of the gel lanes? 

      Figure 2 is a zoom-in for the individual gene, which shows how histone modifications are coupled with Pol II activity on the individual gene. We will give a more detailed explanation of the figure per the reviewer’s suggestions.

      (B) This reviewer is not familiar with this technique, its visualization, or its interpretation - more explanation is needed. What is the meaning of the quantitation graphs shown at the top? How were these calculated (what is on the y-axis)? 

      These are good suggestions, and we will modify the figure and its legend accordingly.

      (3) To my knowledge, the initial observation of DRB effects on RNA synthesis also concluded that DRB inhibited initiation of RNA chains (pmid:982026) - this needs to be acknowledged. 

      Thanks for the reference, which is the first report to show that DRB inhibits initiation of Pol II in vivo. We will cite it in the revision.

      (4) Again, Figures 4B, 4C, 5, and 6 are very difficult to understand - what is shown in these heat maps, and what is shown in the quantitation graphs on top? 

      Thanks for the suggestions, we will give a more detailed description of the Figures.  

      Reviewer #3 (Public review): 

      Summary: 

      Li et al. investigated the prevalence of acetylated and phosphorylated histones (using H3K9ac, H4K12ac, H3S10ph & H4S1ph as representative examples) across the gene body of human HEK293T cells, as well as mapping elongating Pol II and mRNA. They found that histone acetylation and phosphorylation were dominant in gene bodies of actively transcribing genes. Genes with acetylation/phosphorylation restricted to the promoter region were also observed. Furthermore, they investigated and reported a correlation between histone modifications and Pol II activity, finding that inhibition of Pol II activity reduced acetylation/phosphorylation levels, while resuming Pol II activity restored them. The authors then proposed a model in which pan-acetylation or pan-phosphorylation of histones generates fragile nucleosomes; the first round of transcription is accompanied by pan-acetylation, while subsequent rounds are accompanied by pan-phosphorylation. 

      Strengths: 

      This study addresses a highly significant problem in gene regulation. The author provided riveting evidence that certain histone acetylation and/or phosphorylation within the gene body is correlated with Pol II transcription. The author furthermore made a compelling case that such transcriptionally correlated histone modification is dynamic and can be regulated by Pol II activity. This work has provided a clearer view of the connection between epigenetics and Pol II transcription. 

      Thanks for the insightful comments, which are exactly what we want to present in this manuscript. 

      Weaknesses: 

      The title of the manuscript, "Fragile nucleosomes are essential for RNA Polymerase II to transcribe in eukaryotes", suggests that fragile nucleosomes lead to transcription. While this study shows a correlation between histone modifications in gene bodies and transcription elongation, a causal relationship between the two has not been demonstrated. 

      Thanks for the suggestions. What we want to express is that the generation of fragile nucleosomes precedes transcription or, more specifically, transcription elongation. The corresponding PI has proposed a hypothetical model of how pan-acetylation is generated by the coupling of chromatin remodelers and acetyltransferase complexes along gene bodies, in which chromatin remodelers act as drivers that carry acetyltransferases along gene bodies to generate pan-acetylation of nucleosomes (PMID: 41425263). We have published a series of studies showing how “tailless nucleosomes” at +1 from transcription start sites are generated to release paused Pol II in metazoans (PMID: 28847961) (PMID: 29459673) (PMID: 32747552) (PMID: 32048991). We still do not know how pan-phosphorylation along gene bodies is generated. This should be one of the focuses of our future research.

    1. eLife Assessment

      This is an important study on the sensory roles of cerebrospinal fluid-contacting neurons (CSF-cNs) in mammals. The authors identify PKD2L1 as the predominant pH-sensing channel in CSF-cNs and show how the apical extension is used as an amplifier of chemical changes in the content of the cerebrospinal fluid. The evidence is solid in experimental design but limited in mechanistic interpretation, as the electrophysiological analyses require re-evaluation.

    2. Reviewer #1 (Public review):

      This study by Vitar et al. probes the molecular identity and functional specialization of pH-sensing channels in cerebrospinal fluid-contacting neurons (CSFcNs). Combining patch-clamp electrophysiology, laser-based local acidification, immunohistochemistry, and confocal imaging, the authors propose that PKD2L1 channels localized to the apical protrusion (ApPr) function as the predominant dual-mode pH sensor in these cells.

      The work establishes a compelling spatial-physiological link between channel localization and chemosensory behavior. The integration of optical and electrical approaches is technically strong, and the separation of phasic and sustained response modes offers a useful conceptual advance for understanding how CSF composition is monitored.

      Several aspects of data interpretation, however, require clarification or reanalysis, most notably the single-channel analyses (event counts, Po metrics, and mixed parameters), the statistical treatment, and the interpretation of purported "OFF currents." Additional issues include PKD2L1-TRPP3 nomenclature consistency, kinetic comparison with ASICs, and the physiological relevance of the extreme acidification paradigm. Addressing these points will substantially improve reproducibility and mechanistic depth.

      Overall, this is a scientifically important and technically sophisticated study that advances our understanding of CSF sensing, provided that the analytical and interpretative weaknesses are satisfactorily corrected.

      (1) The authors should re-analyze electrophysiological data, focusing on macroscopic currents rather than statistically unreliable Po calculations. Remove or revise the Po analysis, which currently conflates current amplitude and open probability.

      (2) PKD2L1-TRPP3 nomenclature should be clarified and all figure labels, legends, and text should use consistent terminology throughout.

      (3) The authors should reinterpret the so-called OFF currents as pH-dependent recovery or relaxation phenomena, not as distinct current species. Remove the term "OFF response" from the manuscript.

      (4) Evidence for physiological relevance should be provided, including data from milder acidification (pH 6.5-6.8) and, where appropriate, comparisons with ASIC-mediated currents to place PKD2L1 activity in context.

      (5) Terminology and data presentation should be unified, adopting consistent use of "predominant" (instead of "exclusive") and "sustained" (instead of "tonic"), and all statistical formats and units should be standardized.

      (6) The Discussion should be expanded to address potential Ca²⁺-dependent signaling mechanisms downstream of PKD2L1 activation and their possible roles in CSF flow regulation and central chemoreception.

    3. Reviewer #2 (Public review):

      Summary:

      Cerebrospinal fluid contacting neurons (CSF-cNs) are GABAergic cells surrounding the spinal cord central canal (CC). In mammals, their soma lies sub-ependymally, with a dendritic-like apical extension (AP) terminating as a bulb inside the CC.

      How this anatomy (soma and AP in distinct extracellular environments) relates to their multimodal CSF-sensing function remains unclear.

      The authors confirm that in GATA3:GFP mice, where these cells are labeled, CSFcNs exhibit prominent spontaneous electrical activity mediated by PKD2L1 (TRPP2) channels, non-selective cation channels with ~200 pS conductance modulated by protons and mechanical forces.

      They investigated PKD2L1 pH sensitivity and its effects on CSFcN excitability. They uncovered that PKD2L1 generates both phasic and tonic currents, bidirectionally modulated by pH with high sensitivity near physiological values.

      Combining electrophysiology (intact and isolated AP recordings) with elegant laser-photolysis, they show that functional PKD2L1 channels localize specifically to the apical extension (AP).

      This spatial segregation, coupled with PKD2L1's biophysical properties (high conductance, pH sensitivity) and the AP's unique features (very high input resistance), renders CSFcN excitability highly sensitive to PKD2L1 modulation. Their findings reveal how the AP's properties are optimised for its sensory role.

      Strengths:

      This is a very convincing demonstration using elegant and challenging approaches (uncaging, outside out patch of the AP) together to form a complete understanding of how these sensory cells can detect the changes of pH in the CSF so finely.

      Weaknesses:

      The following do not constitute weaknesses; rather, they are minor requests that this reviewer considers would complete this beautiful study.

      (1) It would be nice to quantify further the relation in spontaneous as well as in acidic or basic pH between the effects observed on channel opening and holding current: do they always vary together and in a linear way?

      (2) Since CSF-cNs also respond to changes in osmolarity (Orts Dell Immagine 2013) & mechanosensory stimulations in a PKD2L1 dependent manner (Sternberg NC 2018), it would be nice to test whether the same results hold true on the role of PKD2L1 in AP for pressure application of changes in osmolarity.

      In mice, like in fish (Sternberg et al, NC 2018), we can observe throughout the figures that a large fraction of the channel activity occurs with partial and very fast openings of the PKD2L1 channel. I recommend the authors analyse the points below:

      (a) To what extent do these partial openings of the channel contribute to the changes in holding current and resting potential?

      (b) In the trace from the outside out AP, it looks like the partial transient openings are gone. Can the authors verify whether these partial openings are only present in somatic recordings?

      (3) Previous studies have observed expression of metabotropic Glutamate receptors in CSF-cNs (transcriptome from Prendergast et al CB 2023). The authors only used blockers for ionotropic glutamate receptors in their recordings: could it be that these metabotropic receptors influence the response to uncaging of MNI-Glu when glutamate is co-released with a proton?

      (4) In the outside out patch of the AP, PKD2L1 unitary currents appear rare. Could it be that the disruption in the cilium or underlying actin/myosin cytoskeleton drastically alters the open probability of the channel?

      (5) Could the authors use drugs against ASIC to specify which ASIC channels contribute to the pH response in the soma?

      (6) This is out of the scope of this study, but we did observe in fish a very rarely-opening channel in the PKD2L1KO mutant. I wonder if the authors have similar observations in the conditions where PKD2L1 is mainly in the closed state.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This study by Vitar et al. probes the molecular identity and functional specialization of pH-sensing channels in cerebrospinal fluid-contacting neurons (CSFcNs). Combining patch-clamp electrophysiology, laser-based local acidification, immunohistochemistry, and confocal imaging, the authors propose that PKD2L1 channels localized to the apical protrusion (ApPr) function as the predominant dual-mode pH sensor in these cells.

      The work establishes a compelling spatial-physiological link between channel localization and chemosensory behavior. The integration of optical and electrical approaches is technically strong, and the separation of phasic and sustained response modes offers a useful conceptual advance for understanding how CSF composition is monitored.

      Several aspects of data interpretation, however, require clarification or reanalysis, most notably the single-channel analyses (event counts, Po metrics, and mixed parameters), the statistical treatment, and the interpretation of purported "OFF currents." Additional issues include PKD2L1-TRPP3 nomenclature consistency, kinetic comparison with ASICs, and the physiological relevance of the extreme acidification paradigm. Addressing these points will substantially improve reproducibility and mechanistic depth.

      Overall, this is a scientifically important and technically sophisticated study that advances our understanding of CSF sensing, provided that the analytical and interpretative weaknesses are satisfactorily corrected.

      (1) The authors should re-analyze electrophysiological data, focusing on macroscopic currents rather than statistically unreliable Po calculations. Remove or revise the Po analysis, which currently conflates current amplitude and open probability.

      We agree with the reviewer that the Po analysis has strong limitations, particularly in experiments where the recording times are short, such as when extracellular pH is changed via photolysis (Figure 4D) or puff application (Figure 3Aa). To circumvent this problem and not rely solely on Po estimations, we used alternative methods, including an analysis of the total membrane charge (extensively used throughout the manuscript, as in Figures 3A and 4D) and an analysis of event latencies (Figure 4G). Nevertheless, single channel recordings contain information that is not included in the macroscopic current analysis. In the revised version, we intend to stress that the elementary current amplitude is conserved during manipulations such as pH changes, leaving the total number of channels (N) and the channel open probability (Po) as possible culprits for the current changes. Since these changes are rapid and reversible, it is likely that N remains constant while Po changes. To address the reviewer’s concern, we propose the following changes/reanalysis: (i) report in each condition the minimum N (based on the maximum number of simultaneously open channels; for example, in Figure 3Aa, the minimum N goes from 4-5 in control conditions to 1 during the puff of the pH 6.4 solution). Although imperfect, this method provides a tentative estimate of Po; (ii) report the fraction of time that the channels remain open; (iii) revise the text and figures to use the expression “apparent Po” instead of “Po”, acknowledging the limitations of the measurement in short recordings. We also acknowledge that some traces (Figure 3Aa, top) may appear confusing, as they seem to show macroscopic currents. We will modify these figures by including the amplitude histograms (as in Figure 1Bb) to clearly demonstrate that recordings from CSFcNs primarily reflect single-channel activity when challenged with pH changes.
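      The quantities proposed in points (i) to (iii) can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: it assumes an idealized record in which every sample has already been assigned an integer number of open channels, and the function name, the sampling interval and the unitary amplitude are hypothetical.

```python
def summarize_idealized_trace(open_levels, dt_s, unitary_pA):
    """Summarize an idealized single-channel record.

    open_levels : list of ints, number of simultaneously open channels per sample
    dt_s        : sampling interval in seconds (assumed constant)
    unitary_pA  : elementary current amplitude in pA (assumed unchanged by pH)
    """
    if not open_levels:
        raise ValueError("empty record")
    n_samples = len(open_levels)

    # (i) Minimum N: the largest number of channels observed open at once.
    min_n = max(open_levels)

    # (ii) Fraction of the record during which at least one channel is open.
    fraction_open = sum(1 for k in open_levels if k > 0) / n_samples

    # N*Po: time-averaged number of open channels.
    npo = sum(open_levels) / n_samples

    # (iii) Apparent Po, using min_n as a stand-in for the unknown N.
    apparent_po = npo / min_n if min_n > 0 else 0.0

    # Total charge: current (pA) integrated over time (s) gives pC.
    charge_pC = sum(k * unitary_pA for k in open_levels) * dt_s

    return {
        "min_n": min_n,
        "fraction_open": fraction_open,
        "npo": npo,
        "apparent_po": apparent_po,
        "charge_pC": charge_pC,
    }


# Hypothetical 8-sample record at 1 kHz with a -10 pA unitary current.
example = summarize_idealized_trace([0, 1, 2, 1, 0, 0, 1, 0], dt_s=0.001, unitary_pA=-10.0)
```

      Because the maximum number of simultaneously open channels can only underestimate the true channel count, the apparent Po computed this way is an upper bound on the true Po, which is one reason to report it alongside the total charge.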

      (2) PKD2L1-TRPP3 nomenclature should be clarified and all figure labels, legends, and text should use consistent terminology throughout.

      We agree with the reviewer that the nomenclature for the polycystin protein family is confusing. In this manuscript, we have followed the nomenclature proposed in a recent comprehensive review on polycystin channels by Palomero, Larmore and DeCaen (Palomero et al. 2023), which refers to the channels by their gene names. As indicated in that review, the PKD2L1 channel corresponds to TRPP2 (previously known as TRPP3, see their Table 1). However, in another recent review on TRP channels, the PKD2L1 channel is referred to as TRPP3 (Zhang et al. 2023). To prevent any ambiguity, we will remove references to the TRPP nomenclature from the text and exclusively use the PKD2L1 acronym.

      (3) The authors should reinterpret the so-called OFF currents as pH-dependent recovery or relaxation phenomena, not as distinct current species. Remove the term "OFF response" from the manuscript.

      Although the term “OFF response” is widely used in the literature, we concur with the reviewer that it is not very helpful from a biophysical perspective, as it may imply the existence of a distinct current. Consequently, we will remove the terms “OFF response” and “OFF current” from the revised manuscript and replace them with the term “photolysis-evoked PKD2L1 current”. Furthermore, to improve the logical flow, we will condense the two sections (“The proton-induced current is an off-current” and “The off-current is mediated by the activation of PKD2L1 channels”) into a single, new section titled “The photolysis-induced current is mediated by PKD2L1 channels”. This consolidation will prevent the artificial separation of the description of this current. Finally, we will revise the discussion to better characterize this photolysis-evoked phenomenon as a recovery current.

      (4) Evidence for physiological relevance should be provided, including data from milder acidification (pH 6.5-6.8) and, where appropriate, comparisons with ASIC-mediated currents to place PKD2L1 activity in context.

      This point is partly addressed in Figure 3. The data indicate that PKD2L1 channels are highly sensitive to pH variations within the physiological range. To strengthen this conclusion, we will add the EC50 values derived from the curve fittings to the figure. Regarding ASIC-mediated currents, one of our main conclusions is that ASICs are not present in the apical process (ApPr), as the effects of proton photolysis in the ApPr are not blocked by ASIC antagonists. Our results suggest that PKD2L1 channels are the exclusive pH-sensitive channels in the ApPr. ASIC channels likely mediate acid sensitivity in the soma, although we have not investigated the latter in detail. We intend to modify the Discussion in order to provide a physiological framework linking channel activity with physiological and pathophysiological pH changes.

      (5) Terminology and data presentation should be unified, adopting consistent use of "predominant" (instead of "exclusive") and "sustained" (instead of "tonic"), and all statistical formats and units should be standardized.

      Following the reviewer’s suggestions, we will thoroughly rephrase the text to unify the terminology and data presentation and to correct remaining errors.

      (6) The Discussion should be expanded to address potential Ca²⁺-dependent signaling mechanisms downstream of PKD2L1 activation and their possible roles in CSF flow regulation and central chemoreception.

      This is indeed a very interesting and currently unresolved point in the physiology of CSFcNs. Published data indicate that calcium influx through PKD2L1 channels is a key regulator of apical process (ApPr) physiology. These channels are calcium permeable yet are also inhibited by intracellular calcium (DeCaen et al. 2016). Additionally, ultrastructural data show that the ApPr is rich in mitochondria and tubulo-vesicular structures resembling the Golgi apparatus (Bruni and Reddy 1987; Bjugn et al. 1988; Nakamura et al. 2023), intracellular organelles critical for calcium homeostasis. Altogether, this evidence suggests that intra-ApPr calcium concentration must be finely regulated, both in space and time, for the ApPr to fulfill its physiological roles. Based on the existing literature, we can speculate that these calcium signals are decoded by several systems: (i) calcium may act as a second messenger, linking the activation of the multimodal PKD2L1 channels to changes in CSFcN excitability, which in turn regulates spinal neuronal networks controlling locomotor activity; (ii) calcium could initiate the neurosecretion of various molecules from the ApPr into the central canal, as proposed by the Wyart group in the zebrafish in the context of bacterial infections (Prendergast et al. 2023); (iii) calcium could activate the Hedgehog signaling pathway (as has been shown by Delling et al. 2013); (iv) calcium could modulate CSF flow by modulating the ciliary activity of ependymal cells. Resolving these downstream pathways is essential to fully define the role of CSFcNs as integrators of cerebrospinal fluid homeostasis. We will expand on this topic in the Discussion section of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      Cerebrospinal fluid contacting neurons (CSF-cNs) are GABAergic cells surrounding the spinal cord central canal (CC). In mammals, their soma lies sub-ependymally, with a dendritic-like apical extension (AP) terminating as a bulb inside the CC.

      How this anatomy (soma and AP in distinct extracellular environments) relates to their multimodal CSF-sensing function remains unclear.

      The authors confirm that in GATA3:GFP mice, where these cells are labeled, CSFcNs exhibit prominent spontaneous electrical activity mediated by PKD2L1 (TRPP2) channels, non-selective cation channels with ~200 pS conductance modulated by protons and mechanical forces.

      They investigated PKD2L1 pH sensitivity and its effects on CSFcN excitability. They uncovered that PKD2L1 generates both phasic and tonic currents, bidirectionally modulated by pH with high sensitivity near physiological values.

      Combining electrophysiology (intact and isolated AP recordings) with elegant laser-photolysis, they show that functional PKD2L1 channels localize specifically to the apical extension (AP).

      This spatial segregation, coupled with PKD2L1's biophysical properties (high conductance, pH sensitivity) and the AP's unique features (very high input resistance), renders CSFcN excitability highly sensitive to PKD2L1 modulation. Their findings reveal how the AP's properties are optimised for its sensory role.

      Strengths:

      This is a very convincing demonstration using elegant and challenging approaches (uncaging, outside out patch of the AP) together to form a complete understanding of how these sensory cells can detect the changes of pH in the CSF so finely.

      Weaknesses:

      The following do not constitute weaknesses; rather, they are minor requests that this reviewer considers would complete this beautiful study.

      (1) It would be nice to quantify further the relation in spontaneous as well as in acidic or basic pH between the effects observed on channel opening and holding current: do they always vary together and in a linear way?

      Following the reviewer’s suggestion, we performed a Spearman’s rank correlation test. The analysis revealed a significant correlation between the changes in the apparent open probability and the holding current in paired experiments (control vs pH 6.4 pressure applications; p < 0.05, Spearman r = 0.72 and critical value = 0.67). The Pearson correlation coefficient calculated on the same data set was r = 0.63 (critical value = 0.632), indicating that the correlation is not linear. We thank the reviewer for raising this point and will add this analysis to the manuscript.
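      The distinction between the two coefficients can be illustrated with a short sketch. The values below are made up for illustration and are not the authors' measurements, and the helper names are hypothetical. Spearman's coefficient is Pearson's coefficient computed on ranks, so it detects any monotonic relation, whereas Pearson's coefficient measures only linear association.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical paired changes with a monotonic but non-linear relation.
delta_po = [1, 2, 3, 4, 5]
delta_holding = [1, 8, 27, 64, 125]
rho = spearman_r(delta_po, delta_holding)   # perfect rank agreement
r = pearson_r(delta_po, delta_holding)      # lower, the relation is curved
```

      In this toy data set the ranks agree perfectly while the raw values follow a curve, so the rank-based coefficient exceeds the linear one. A difference in the same direction is what the comparison in the response above relies on.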

      (2) Since CSF-cNs also respond to changes in osmolarity (Orts Dell Immagine 2013) & mechanosensory stimulations in a PKD2L1 dependent manner (Sternberg NC 2018), it would be nice to test whether the same results hold true on the role of PKD2L1 in AP for pressure application of changes in osmolarity.

      This is a very important point. As the reviewer notes, previous experimental evidence indicates that CSFcNs are also sensitive to osmolarity changes and mechanical stimulation in a PKD2L1-dependent manner. It is therefore reasonable to assume that, similar to pH sensitivity, osmotic and mechanical sensitivity depend on channels localized to the apical process (ApPr). Regarding mechanosensitivity, this spatial segregation could be tested by mechanically stimulating either the ApPr or the soma with a piezo-controlled blunt pipette (see, for example, Hao et al. 2013). Assessing sensitivity to osmotic changes, however, is more challenging, as pressure application lacks the spatial resolution to discriminate between compartments in such a compact cell. In theory, a highly localized osmotic jump could be achieved via photolysis, provided a caged compound that releases many osmotic particles simultaneously is used. In typical photolysis experiments, a localized osmotic change is produced, but its amplitude is very low (on the order of 1 to 2 mOsm).

      In mice, like in fish (Sternberg et al, NC 2018), we can observe throughout the figures that a large fraction of the channel activity occurs with partial and very fast openings of the PKD2L1 channel. I recommend the authors analyse the points below:

      (a) To what extent do these partial openings of the channel contribute to the changes in holding current and resting potential?

      As the reviewer indicates, these partial and rapid openings are characteristic of PKD2L1 single-channel activity and appear to be conserved across species. However, estimating their precise contribution to the sustained current would require a detailed channel model, which is currently lacking. Indeed, the exact mechanism underlying this prominent sustained current in CSFcNs remains unknown and should definitely be addressed in future work.

      (b) In the trace from the outside out AP, it looks like the partial transient openings are gone. Can the authors verify whether these partial openings are only present in somatic recordings?

      The outside-out recordings from the apical process also show some partial openings (see the upper trace in Figure 4Db). We will specifically mention this important point in the revised version of the manuscript.

      (3) Previous studies have observed expression of metabotropic Glutamate receptors in CSF-cNs (transcriptome from Prendergast et al CB 2023). The authors only used blockers for ionotropic glutamate receptors in their recordings: could it be that these metabotropic receptors influence the response to uncaging of MNI-Glu when glutamate is co-released with a proton?

      We thank the reviewer for pointing out the presence of metabotropic glutamate receptors in CSFcNs. However, our evidence indicates that metabotropic receptors do not contribute to the response when uncaging MNI-glutamate. This conclusion is supported by the fact that two responses are the same: (i) the response obtained when uncaging MNI-γLGG, which does not release glutamate (Figure 5Ab), and (ii) the response obtained when uncaging protons from DPNI-GABA (data not shown; DPNI-GABA is a GABA cage with photochemistry similar to MNI cages that also releases a proton upon photolysis, Trigo et al. 2009). In both experiments (uncaging MNI-γLGG or DPNI-GABA) a clear photolysis-evoked PKD2L1 current is observed.

      (4) In the outside out patch of the AP, PKD2L1 unitary currents appear rare. Could it be that the disruption in the cilium or underlying actin/myosin cytoskeleton drastically alters the open probability of the channel?

      The reviewer is correct in noting that the opening frequency of PKD2L1 channels appears lower in outside-out patches than in whole-ApPr recordings, although we have not quantified this. We interpreted this difference as reflecting a lower channel number. However, as the reviewer suggests, a plausible alternative explanation is that the channel's biophysical properties are altered when removed from its native ionic environment or when it loses interactions with regulatory proteins. We will address this point in the Discussion.

      (5) Could the authors use drugs against ASIC to specify which ASIC channels contribute to the pH response in the soma?

      As described in the manuscript, we performed experiments with ASIC antagonists, although we did not attempt to characterize the specific ASIC subtype mediating the somatic response. Based on the published literature, we used both psalmotoxin-1, which blocks ASIC1 channels, and APETx2, which blocks ASIC3 channels. The presence of ASIC1 in mouse CSFcNs has been demonstrated previously (Orts-Del’immagine et al. 2012; Orts-Del’Immagine et al. 2016), while ASIC3 has been identified in lamprey CSFcNs (Jalalvand et al. 2016). When applying an acidic solution to the soma, we recorded an inward current that was substantially blocked by psalmotoxin-1, although a small residual component persisted, consistent with the earlier findings of Orts-Del’Immagine et al. We did not attempt to block this remaining psalmotoxin-1-insensitive component.

      (6) This is out of the scope of this study, but we did observe in fish a very rarely-opening channel in the PKD2L1KO mutant. I wonder if the authors have similar observations in the conditions where PKD2L1 is mainly in the closed state.

      We have never observed openings of this kind in our recordings, either when the channel is closed or in the presence of dibucaine.

      References

      Bjugn, R, H K Haugland, et P R Flood. 1988. “Ultrastructure of the mouse spinal cord ependyma”. Journal of Anatomy 160 (octobre): 117‑25.

      Bruni, J. E., et K. Reddy. 1987. “Ependyma of the Central Canal of the Rat Spinal Cord: A Light and Transmission Electron Microscopic Study”. Journal of Anatomy 152 (juin): 55‑70.

      Delling, Markus, Paul G. DeCaen, Julia F. Doerner, Sebastien Febvay, et David E. Clapham. 2013. ”Primary cilia are specialized calcium signaling organelles”. Nature 504 (7479): 311‑14 https://doi.org/10.1038/nature12833.

      Hao, Jizhe, Jérôme Ruel, Bertrand Coste, Yann Roudaut, Marcel Crest, et Patrick Delmas. 2013. “Piezo-Electrically Driven Mechanical Stimulation of Sensory Neurons”. In Ion Channels, édité par Nikita Gamper, vol. 998. Methods in Molecular Biology. Humana Press. https://doi.org/10.1007/978-1-62703-351-0_12.

      Jalalvand, Elham, Brita Robertson, Hervé Tostivint, Peter Wallén, and Sten Grillner. 2016. “The Spinal Cord Has an Intrinsic System for the Control of pH”. Current Biology: CB 26 (10): 1346‑51. https://doi.org/10.1016/j.cub.2016.03.048.

      Nakamura, Yuka, Miyuki Kurabe, Mami Matsumoto, et al. 2023. “Cerebrospinal Fluid-Contacting Neuron Tracing Reveals Structural and Functional Connectivity for Locomotion in the Mouse Spinal Cord”. eLife 12 (February): e83108. https://doi.org/10.7554/eLife.83108.

      Orts-Del’Immagine, Adeline, Riad Seddik, Fabien Tell, et al. 2016. “A Single Polycystic Kidney Disease 2-like 1 Channel Opening Acts as a Spike Generator in Cerebrospinal Fluid-Contacting Neurons of Adult Mouse Brainstem”. Neuropharmacology 101 (February): 549‑65. https://doi.org/10.1016/j.neuropharm.2015.07.030.

      Orts-Del’Immagine, Adeline, Nicolas Wanaverbecq, Catherine Tardivel, Vanessa Tillement, Michel Dallaporta, and Jérôme Trouslard. 2012. “Properties of Subependymal Cerebrospinal Fluid Contacting Neurones in the Dorsal Vagal Complex of the Mouse Brainstem”. The Journal of Physiology 590 (16): 3719‑41. https://doi.org/10.1113/jphysiol.2012.227959.

      Prendergast, Andrew E., Kin Ki Jim, Hugo Marnas, et al. 2023. “CSF-Contacting Neurons Respond to Streptococcus Pneumoniae and Promote Host Survival during Central Nervous System Infection”. Current Biology 33 (5): 940-956.e10. https://doi.org/10.1016/j.cub.2023.01.039.

      Trigo, Federico F., George Papageorgiou, John E. T. Corrie, and David Ogden. 2009. “Laser photolysis of DPNI-GABA, a tool for investigating the properties and distribution of GABA receptors and for silencing neurons in situ”. Journal of Neuroscience Methods 181 (2): 159‑69. https://doi.org/10.1016/j.jneumeth.2009.04.022.

    1. eLife Assessment

      This study presents important findings on how cardiac regenerative capacity diverges across species by examining heart repair in two species of livebearers, platyfish and swordtails. In contrast to zebrafish, the livebearer species show persistent scarring after cryo-injury, and the work highlights how lineage-specific anatomical and immunological traits may constrain regenerative competence. The study is compelling, the data are convincing, and the results contribute to our understanding of the mechanisms underlying heart regeneration across vertebrates.